Chapter 2: Data & Descriptive Measures
All data and the variables we measure are either 1) ___ or 2) ___
1) Quantitative 2) Qualitative
Tabular and graphic empirical frequency distributions are useful for what?
1) describing data 2) extracting information from a set of data
(epidemiology = the study of the DISTRIBUTION and determinants of health-related states or events (e.g., body weight & diet) in specified human populations, and the application of this study to prevent and control health problems) The word DISTRIBUTION refers to 1)___ and 2)___
1) frequency 2) pattern
What are the 3 scales of measurement used in epidemiology?
1.) Nominal Scale (qualitative or categorical observations; ex: sex, race, marital status, exposed (yes or no), disease (yes or no), education status) 2.) Ordinal Scale (qualitative or categorical observations with an inherent order; ex: preference rating or rank-order scale) 3.) Numerical Scale (quantitative observations; there are 2 types: continuous (interval) scales, which have values on a continuum, and discrete scales, which have integer values)
Some advantages of studying samples instead of populations are what?
1.) They can be studied more quickly and at a lower cost 2.) It may be impossible to access the entire population 3.) Sample results may be more accurate than results based on a population
•Among epidemiologic study designs, only the ___ design is a qualitative description of the facts in chronological order •May be a case report (involves a description of a single individual) or a case series (involves a description of a small number of cases with a similar diagnosis) •May be thought of as a snapshot description of a problem or situation for an individual or group •Useful for providing in-depth descriptions of the disease state, providing clues about a new disease or adverse health effect resulting from an exposure or experience, and identifying potential areas of new research •Conclusions stemming from one are limited to the individual, group, and/or context under study and can't be used to establish a causal relationship
Case Study Design
•An ___ is an object (person or thing) upon which we collect data
Experimental Unit
•___ are pieces of information and may be thought of as observations or measurements of a phenomenon of interest •They are obtained by observing or measuring some characteristic or property of the population of interest
Data
A ___ is a tabular summary of a set of data that shows the frequency or number of data items that fall in each of several distinct classes; also known as a frequency table
Frequency Distribution
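A minimal sketch of how a frequency distribution could be tabulated in Python. The marital-status observations and the function name frequency_distribution are hypothetical and used only for illustration.

```python
from collections import Counter

# Hypothetical marital-status observations (illustrative data only).
observations = ["single", "married", "married", "divorced", "single", "married"]


def frequency_distribution(values):
    """Return a dict mapping each distinct class to the number of items in it."""
    return dict(Counter(values))


table = frequency_distribution(observations)

# Print a simple two-column frequency table, one class per row.
for category, count in sorted(table.items()):
    print(f"{category:<10}{count}")
```

The counts across all classes add up to the total number of observations, which is a quick check that no item was dropped or counted twice.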
A frequency ____ or ____ is the number of observations (e.g., cases) falling into each of several values or ranges of values (e.g., time periods)
Frequency Table or Distribution •Frequency distributions are portrayed as a frequency table or graph •The strength of a frequency distribution table is that it allows us to readily see the overall pattern of the data, and it easily communicates the information
•The data set that represents the target of interest is called a ___ •A set or collection of items of interest in a study; in public health, where the focus is on human ____, a ____ refers to a collection of individuals who share one or more observable personal or observational characteristics from which data may be collected and evaluated •Examples of factors that may characterize these: social, economic, family (marriage and divorce), work and labor force, and geographic factors •In biostatistics, the ___ is not limited to people, and in statistics, the ___ is not limited to living organisms; however, when epidemiology applies biostatistics, it involves human ___
Population
A ___ is the number of observations with the characteristic of interest divided by the total number of observations. It is used to summarize counts.
Proportion
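The definition above can be sketched as a one-line calculation. The function name proportion and the example counts (30 exposed people out of 120 observed) are assumptions chosen for illustration.

```python
def proportion(count_with_characteristic, total_observations):
    """Number of observations with the characteristic divided by all observations."""
    if total_observations <= 0:
        raise ValueError("total_observations must be positive")
    return count_with_characteristic / total_observations


# Hypothetical example: 30 of 120 observed individuals were exposed.
p = proportion(30, 120)
print(p)
```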
•___ data are non-numerical and can only be classified into one of a group of categories; observations that can only be classified into one of a group of non-numerical categories; a general description of properties that can't be described numerically •Can also think of it as a way to describe qualities (hot, yellow, and longer) •Examples: -marital status -racial/ethnic classification -place of residence
Qualitative Data
___ data are observations measured on a numerical scale and can be measured as how many, how long, how much, and so on; observations or measurements that are numerical •Examples: -biometric measures (bp, cholesterol, glucose) -the number of patients a crisis center will serve during a given week -the dose of radiation
Quantitative Data
•A ____ is the most common type of sampling procedure; it is a sample in which every element in the population has an equal chance of being selected •It is used to obtain a representative subgroup of the population •This method is relatively easy to implement if the population is small, but becomes more difficult with larger populations. With larger populations, we can only approximate random sampling •Most epidemiologic studies involving it rely on statistical software packages (Excel, SAS, SPSS, Minitab) with random number generators to automatically obtain it
Random Sample
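A minimal sketch of drawing a random sample with a random number generator, here using Python's standard random module rather than the packages named above. The population of 100 ID numbers, the sample size of 10, and the seed are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every element has an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: ID numbers 1 through 100.
population_ids = list(range(1, 101))
sample_ids = simple_random_sample(population_ids, 10, seed=42)
print(sample_ids)
```

Sampling is done without replacement, so no individual appears in the sample more than once.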
A ___ is the number of cases of a particular outcome divided by the size of the population in that time period, multiplied by a base (ex: 100, 1,000, 10,000, or 100,000). Combining the frequency of cases (nominal scale variable) for a selected time interval with the corresponding at-risk population produces a ___. A ___ is calculated by summarizing the frequency of cases during a specified time period and then dividing the total number of cases by the population at risk of becoming a case. Deriving ___s for different subgroups of the population (age, sex, geographic area, and exposure history) can assist us in identifying high-risk groups and provide clues about causality; such information is a prerequisite to the development and targeting of appropriate prevention and control measures.
Rate
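The calculation described above (cases divided by the population at risk, multiplied by a base) can be sketched as follows. The function name rate and the figures (50 cases in a population at risk of 200,000) are hypothetical.

```python
def rate(cases, population_at_risk, base=100_000):
    """Cases during the period divided by the population at risk, times a base."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return cases * base / population_at_risk


# Hypothetical example: 50 cases in a population at risk of 200,000.
per_100k = rate(50, 200_000, base=100_000)
per_1k = rate(50, 200_000, base=1_000)
print(per_100k, "per 100,000")
print(per_1k, "per 1,000")
```

Both results describe the same underlying frequency; the base only changes how the number is expressed.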
For nominal or ordinal data, we present the number of values in the data set that fall in each level of the variable. Along with frequencies reported for each level of the variable, ____ are often presented in the table; ___ is the proportion of cases that fall into each level of the variable. The ___ of a category is the frequency of that category divided by n, where n is the total number of observations (i.e., the sample size): ___ = Frequency/n
Relative Frequency
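A short sketch of the Frequency/n calculation for each level of a variable. The exposure-level observations and the function name relative_frequencies are hypothetical.

```python
from collections import Counter


def relative_frequencies(values):
    """Return each level's frequency divided by n, the total number of observations."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return {level: count / n for level, count in counts.items()}


# Hypothetical ordinal exposure levels for 8 individuals.
exposure = ["low", "high", "none", "low", "medium", "low", "none", "high"]
rel = relative_frequencies(exposure)

for level, value in rel.items():
    print(f"{level:<8}{value:.3f}")
```

The relative frequencies across all levels sum to 1.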
•Many populations are too large to observe or measure because of time and cost. Thus, we are often required to select a subset of values from the population. Inferences about the population are then made, based on information contained in the ___ •A ___ is a subset of items that have been selected from the population; it is always smaller than the population
Sample
•The ___ scale of measurement is sometimes called qualitative observations because it describes a quality of a person or thing being studied •___ may also be called a categorical observation because the levels of the variable fit into categories •A ___ variable is dichotomous (binary) if it has 2 levels, or multichotomous if it has more than 2 levels •If there were an outbreak of cholera, you could determine case status (___ data) and identify the number of cases in a defined area (discrete data) •If you were interested in assessing the risk of death from leukemia, death from leukemia (yes, no) is ___ data and dose of radiation exposure is continuous data; we could group the exposure level into exposed or unexposed to radiation (___ data) or into no exposure, low exposure, medium exposure, and high exposure (ordinal data)
The Nominal Scale of Measurement (1 of the 3 measurement scales used in epidemiology)
•The study of the distribution of health-related states or events in human populations is the essence of ____ epidemiology and relies heavily on biostatistics •It is used to describe the health of communities and to identify health problems and priorities according to person (who?), place (where?), and time (when?) factors •It also involves characterizing the nature of the health problem (what?)
descriptive epidemiology
Tabular and graphic formats are generally known as ____
empirical frequency distributions
The purpose of the ____ (a multiplier of 10 to the nth power applied to the rate) is to help us better understand, interpret, and communicate the result of our calculations ex:
rate base
The ___ in which a characteristic is measured has implications for the way the information is summarized and displayed. The ___ of measurement involves the precision with which a characteristic is measured, which also determines the methods for summarizing, organizing, and analyzing the data. There are 3 ___ of measurement used in epidemiology: nominal, ordinal, and numerical.
scale
The properties being observed or measured are called ____. A ___ is a characteristic that varies from one observation to the next and can be measured or categorized.
variable(s)
A case study design may be a case report or a case series. What is the difference between the two?
•A case report involves a description of a single individual •A case series involves a description of a small number of cases with a similar diagnosis