PUB301: Study designs, Bias and Confounding
Surveillance, Epidemiology, and End Results (SEER) National Cancer Registry
A type of surveillance system that compiles demographics, primary tumor site, tumor morphology, stage at diagnosis, first course of treatment, and follow-up.
National Death Index
A type of surveillance system that counts and surveys deaths and monitors mortality trends.
Validity
Accuracy. Definition: the degree to which an observation is free of error. Goal: to ascribe any observed association as truly the effect of the exposure.
Internal validity
Accuracy of observations made within a study: how accurate are your MEASURES WITHIN the study itself? Influenced by: 1. Selection of study groups; 2. Measurement error
External validity
Accuracy of study observations in other groups (GENERALIZABILITY): how well do the measurements in the SMALL group REFLECT the LARGER group that the small group is intended to represent? Influenced by: 1. Representativeness of participants; 2. Measurement error
Ecological Studies
BIG PICTURE STUDY. Uses aggregate data, which is also its biggest limitation. Exposure/Outcome: AVERAGE exposure/outcome in a group. Strengths: cost and time efficient. Results MUST be confirmed by a study with better measurement.
Causal inferences require:
Biology of the disease; biology of the agent; internal and external environment; host susceptibility to the disease
Blinding
Concealing group assignment; only the biostatistician will know group assignments
Single-blinding
Concealing group assignments from just the participants
Double-blinding
Concealing group assignments from the participants and the experimenters/investigators
Placebo control
Condition is indistinguishable from the intervention but does not confer physiological effects. To be effective, it has to look, taste, smell, and be packaged identically to the actual intervention while having no biological effects. Is a form of blinding. "Sugar pill"
Hill's criteria for causal inference
Criteria for judgement of causal associations: 1. Temporal sequence: Did exposure precede outcome? 2. Strength of association: How strong is the effect (measured as a risk or odds ratio)? 3. Consistency of association: Has the effect been seen by others? 4. Biological gradient (dose-response relation): Does increased exposure result in more of the outcome? 5. Specificity of association: Does the exposure only lead to the outcome? 6. Biological plausibility: Does the association make sense? 7. Coherence with existing knowledge: Is the association consistent with available knowledge? 8. Experimental evidence: Has a randomized controlled trial been done to support findings? 9. Analogy: Is the association similar to others? Note that you have to be able to show that EXPOSURE COMES FIRST. Guidelines act as a ROADMAP and NOT a CHECKLIST
Hypothesis generating studies
Descriptive studies: ecological and cross-sectional
Bias
Deviation from the truth due to systematic error: a systematic (ONE DIRECTION) deviation of an observed association from the truth. When present, the OR and RR are BIGGER or SMALLER than they should be. Results from systematically HIGHER or LOWER counts of exposure (case-control) or incident disease (prospective study) in the numerator or the denominator because of: 1. The process used to select study groups; 2. Differences in measurement accuracy between groups; 3. Inaccurate measurements; 4. Confounding
Primary prevention
Does the intervention REDUCE first disease exposure? Intervenes in the PRECLINICAL stage. The objective is to PREVENT first disease occurrence.
Secondary prevention
Does the intervention reduce disease recurrence or death? Individuals already have the disease or are already sick. The objective is to DECREASE the progression of the disease and to extend survival. It is NOT a cure; it just CONTROLS the disease and PREVENTS complications.
Experimental Studies
Doing experiments and manipulating people. Comparing incidence of disease between two groups. Studies: clinical trials and field trials
Controls
EXTREMELY difficult to identify; must reflect the distribution of characteristics in the base population from which the cases come. Can be selected from the same hospital as the cases but with different admission criteria. Should look like everyone who is NOT a case, rather than like cases who simply lack the disease.
Strengths of Case-control studies
Efficient with regard to time, cost, and effort. Good for diseases with long latency. Good for rare diseases. Can measure multiple exposures with a single outcome.
Randomized trials
Experimental and clinical. HIGHEST level of evidence for cause and effect. Test by manipulating people (two groups, with an exposure imposed on one). Considered the gold standard study design for establishing cause and effect. Used to make consensus statements regarding clinical guidelines and public health policy.
Data Safety Monitoring Board (DSMB)
External group of scientists independent of the study who develop criteria for monitoring adverse effects and the trial's stopping rules. The group has no relationship with the scientific team; it is external and unbiased. Can stop the trial if harm outweighs benefits.
Limitations of Randomized Trials
Feasibility issues: sample size, time, cost, burden. Loss to follow-up. Compliance and completion issues among participants. Participants may change groups during the intervention. An appropriate control condition must be established. Adverse effects may occur that impact the health of participants.
Cross-Sectional Strengths
Feasible (large sample size, time, and cost). Measures multiple exposures and multiple outcomes. Surveillance. Hypothesis generating. Potential for cohort studies.
Strengths of Prospective Cohort Study
Good for RARE exposures. Can measure multiple exposures and multiple outcomes. Minimizes bias. DIRECT MEASURE OF RISK. Can easily establish temporality between exposure and disease occurrence.
Experimental studies have...
High strength of evidence for causality between exposure and disease
Analytic studies
Hypothesis testing. Case-control and cohort. Does it look like an exposure is linked to a specific disease?
Descriptive Studies
Hypothesis generating. OBSERVATIONAL. Person, place, time. Distribution of disease and exposure. Ecological and cross-sectional.
In case-control studies, selection of cases and controls is...
INDEPENDENT of EXPOSURE status
Prospective Cohort Research Question
In a defined population that is initially without the disease of interest, is there a difference in disease incidence between exposed and non-exposed?
Ecological Fallacy
Inappropriate conclusions about individual-level relationships based upon aggregate data. Applying a group mean to an individual and assuming it applies to everyone.
Correlation coefficient (R)
Indicates the degree of linear relationship. Less than 0.30 = weak; between 0.30 and 0.70 = moderate; greater than 0.70 = strong (thresholds apply to the absolute value of R).
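The cutoffs on this card can be sketched in code. This is only an illustration: the function names `pearson_r` and `classify_strength` are hypothetical, the data are made up, and treating exactly 0.30 and 0.70 as "moderate" is an assumption, since the card does not say which side the boundaries fall on.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)

def classify_strength(r):
    """Label the strength of a correlation using the card's cutoffs."""
    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude < 0.30:
        return "weak"
    if magnitude <= 0.70:  # assumption: boundary values count as moderate
        return "moderate"
    return "strong"

# Made-up aggregate data: y rises perfectly with x, so r is about 1 ("strong")
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
label = classify_strength(r)
```

A negative R of the same size is just as strong; only the direction of the relationship differs.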
Limitations of Prospective Cohort Studies
Inefficient for rare diseases. Feasibility issue: requires a large sample size, is very expensive, and takes a lot of time. Data quality may be questionable. Loss of participants during follow-up.
Limitation of Case-Control studies
Inefficient for rare exposures. Does not DIRECTLY measure incidence. Temporality between disease and exposure is hard to establish. Prone to selection and recall bias.
Case-Control Research Question
Is the exposure history different in persons that have a specific disease (cases) compared to persons that do not have the disease (controls)?
Randomized Trial Research Question
Is there a difference in disease incidence (i.e. a prospective cohort study with an intervention) between participants randomized to an intervention group and participants randomized to a control group?
Observational studies have...
LOW strength of evidence for causality between exposure and disease
Randomization
Makes the experimental design POWERFUL because each person has the same chance of group assignment. Maximizes comparability of outcomes between groups (the group that individuals were put in should be the only thing that caused them to get sick)
Study design of a prospective cohort study
Moving FORWARD in time and waiting for people to get sick or die. Study the population at risk (base population minus prevalent disease) and COUNT the number of new disease cases over time. Compare incidence in the exposed and non-exposed using risk ratios.
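The arithmetic behind this design can be sketched as follows. All counts are invented for illustration and the function names are hypothetical; the sketch assumes simple cumulative incidence (new cases divided by the population at risk) with no loss to follow-up.

```python
def population_at_risk(base_population, prevalent_cases):
    """Base population minus people who already have the disease."""
    return base_population - prevalent_cases

def cumulative_incidence(new_cases, at_risk):
    """New cases during follow-up divided by the population at risk."""
    if at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / at_risk

def risk_ratio(new_exposed, at_risk_exposed, new_unexposed, at_risk_unexposed):
    """Incidence in the exposed divided by incidence in the non-exposed."""
    risk_exposed = cumulative_incidence(new_exposed, at_risk_exposed)
    risk_unexposed = cumulative_incidence(new_unexposed, at_risk_unexposed)
    if risk_unexposed == 0:
        raise ValueError("risk ratio is undefined when the non-exposed risk is 0")
    return risk_exposed / risk_unexposed

def percent_higher(rr):
    """Express a risk ratio as a percent change in risk relative to the non-exposed."""
    return (rr - 1.0) * 100.0

# Hypothetical cohort: remove prevalent disease first, then count new cases
exposed_at_risk = population_at_risk(1100, 100)    # 1000 people
unexposed_at_risk = population_at_risk(2050, 50)   # 2000 people
rr = risk_ratio(30, exposed_at_risk, 20, unexposed_at_risk)  # about 3
change = percent_higher(rr)                                   # about 200
```

With these made-up numbers the exposed group has roughly three times the risk of the non-exposed group, which reads as a risk about 200% higher.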
Design layout of a randomized trial
Must pose minimal risk to participants, as safety is extremely important. 1. Define the base population. 2. Recruit potential participants. 3. Screen those who are interested, willing, and eligible with STRICT inclusion criteria. 4. Obtain consent from eligible participants. 5. Randomize to intervention and control groups (put people into groups based on chance, with characteristics balanced between control and experimental groups). 6. Implement intervention and follow-up. 7. Data analyses and hypothesis testing. 8. Close out study.
Error
Must understand where it is coming from. There is a lot of opportunity for this to occur in epidemiological studies. Once it occurs, you CANNOT get rid of it; you can only correct for it using statistics. Identifying
Observational epidemiology
Observational Tells us the who, what, where, when and how many?
Case-Control Studies
Observational analytic study based upon associations. KNOW DISEASE STATUS; DON'T KNOW EXPOSURE. Looking into the past to find what exposure was different between those who got sick and those who are healthy (the answer lies in the past). NOT based on incidence data. Uses an odds ratio to compare the odds of exposure between cases and controls. Cannot compare disease incidence since the disease status is already known.
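A minimal sketch of the odds ratio from a 2x2 table of exposure counts. The counts are invented and the function name `odds_ratio` is hypothetical; it uses the usual cross-product form (exposed cases x unexposed controls) / (unexposed cases x exposed controls).

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is 0")
    # Cross-product ratio of the 2x2 table
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Hypothetical study: 100 cases (40 exposed), 100 controls (20 exposed)
or_estimate = odds_ratio(
    exposed_cases=40,
    unexposed_cases=60,
    exposed_controls=20,
    unexposed_controls=80,
)
# (40 * 80) / (60 * 20) = 3200 / 1200, about 2.67
```

Here the odds of past exposure are roughly 2.7 times higher among cases than among controls. No incidence is computed anywhere, because the investigator fixed the number of cases and controls.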
In a case-control study, what do we know and what do we not know?
We know the disease status but we don't know the exposure
In a Prospective Cohort study, what do we know and what do we not know?
We know the exposure status and we do not know individuals' disease status
Confounding
When all or part of an observed association between the exposure and disease is explained by an extraneous factor. Is an actual phenomenon; reflects the complexity of biological pathways of disease. Try to understand this phenomenon.
Cases
Standardized and objective definitions must be used. Cases are selected from hospital admissions (primary source) and disease registries.
Causality
The POTENTIAL that a specified change in a factor (cause) produces a predictable change in the event (effect).
Recall bias
Systematic error that arises when groups (e.g. cases and controls) differ in their ability to recall past exposure information accurately
Population surveillance
The monitoring of specific health characteristics of a population which reflects the effective flow of information
Inference
To generalize findings from a small group and apply them to a large group
Analytic epidemiology
Tries to tell us what the relationship between exposure and disease is. Uses analytic studies: case-control and cohort studies. Looking for associations.
Adverse Effects
Untoward effects (risks, side effects) of the intervention. Measured by the risk:benefit ratio.
Population-based Cohorts
Used to enroll cohorts independent of the exposure with SPECIFIC characteristics. E.g.: Framingham Heart Study and Nurses' Health Study
Exposure-Based Cohorts
Used with rare exposures to ensure a sufficient number of exposed participants. E.g.: Agricultural Health Study
What does the risk ratio represent in a prospective cohort study?
"The risk of disease is X% higher if you have the exposure than if you don't have the exposure."
Evaluating potential confounder
1. Compare unadjusted (crude) and adjusted measures. 2. If meaningfully different (a change of 10% or more), confounding may exist.
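The 10% rule of thumb can be sketched like this. The function names and the numbers are hypothetical, and dividing by the adjusted estimate is an assumption: the card does not say which estimate is the reference, and some texts divide by the crude estimate instead.

```python
def percent_difference(crude, adjusted):
    """Absolute difference between crude and adjusted estimates, as a percent.

    Assumption: the adjusted estimate is used as the reference (denominator).
    """
    if adjusted == 0:
        raise ValueError("adjusted estimate must be non-zero")
    return abs(crude - adjusted) / abs(adjusted) * 100.0

def may_be_confounded(crude, adjusted, threshold_percent=10.0):
    """True when the estimates differ by at least the threshold (10% on the card)."""
    return percent_difference(crude, adjusted) >= threshold_percent

# Hypothetical risk ratios before and after adjusting for a third factor
large_shift = may_be_confounded(crude=2.0, adjusted=1.5)    # about a 33% change
small_shift = may_be_confounded(crude=1.52, adjusted=1.50)  # about a 1% change
```

A result of `True` only flags possible confounding; the factor should still meet the criteria for a confounder before it is treated as one.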
Hierarchy of a research design
1. I: Evidence obtained from at least one properly conducted randomized controlled trial. 2. II-1: Evidence obtained from a well-designed controlled trial without randomization. 3. II-2: Evidence obtained from well-designed cohort or case-control analytic studies, preferably from more than one center or research group. 4. II-3: Evidence obtained from multiple time series with or without the intervention (NOTE: dramatic results in uncontrolled experiments can also be regarded as this type). 5. III: Opinions of respected authorities based on clinical experience, descriptive studies, and case reports or reports from expert committees.
Controlling for a potential confounder
1. Randomization (in experimental designs): maximizes comparability due to chance and is the best way to control. 2. Restriction: remove the source of the confounder; this can introduce selection bias into the study. 3. Matching: used in case-control studies to make cases and controls similar on the confounder; matching too closely makes cases and controls very similar overall, which is something you do not want. 4. Statistical strategies: stratification; multivariable adjustment
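Stratification can be illustrated with a small sketch. The card names stratification but not a specific method, so the Mantel-Haenszel pooled odds ratio used here is a swapped-in, commonly taught choice rather than the card's own. The counts and function names are hypothetical.

```python
def crude_odds_ratio(a, b, c, d):
    """Cross-product odds ratio: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)

def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across strata of the potential confounder.

    Each stratum is a tuple (a, b, c, d) with the same cell meaning as above.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these strata")
    return numerator / denominator

# Made-up data split by a third factor (e.g. two age groups)
strata = [
    (80, 20, 40, 10),  # stratum 1: odds ratio = (80*10)/(20*40) = 1
    (10, 40, 20, 80),  # stratum 2: odds ratio = (10*80)/(40*20) = 1
]
# Collapsing the strata gives the crude table
crude = crude_odds_ratio(90, 60, 60, 90)       # 8100 / 3600 = 2.25
adjusted = mantel_haenszel_odds_ratio(strata)  # about 1
```

In this invented example the crude odds ratio suggests an association, but within each stratum there is none, so the crude result is explained by the stratifying factor.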
Effect of a potential confounder
1. Reduces (attenuate) the true association (Makes it weaker) 2. Increases (amplifies) the true association (Makes it stronger)
Criteria for a potential confounder
1. Risk factor for the disease 2. Associated with the exposure 3. Not in the causal pathway between exposure and the disease (i.e. the exposure does not cause the confounder to exist)
Single factor model
19th century. Reasonable when studying infectious diseases. An agent of overwhelming pathogenicity and virulence produces disease. EASY TO ARGUE FOR CAUSALITY because the agents are infectious and there is a single agent.
Multifactorial model
20th century. Reasonable when studying chronic disease. A single agent is NOT likely (agents associated with the disease are often ubiquitous in the environment). Uncertain of the cause: complex multifactorial relationships. Which factor is the cause of disease? Do all factors need to be present to produce the disease? What combination of factors makes individuals most susceptible?
Systematic Error
A deviation from the true value in ONE DIRECTION, above or below the true value. Pushes the observed value away from the true value, producing BIAS. Threatens ACCURACY and VALIDITY. Sources of error include selection of groups, ways of measuring data, and confounding.
Cause
A factor that gives rise to an event (disease)
Selection Bias
By the way we select someone for the study, we influence a factor inadvertently. The process of selecting participants is correlated with the count measures in the respective study design: Exposure (case-control) and Incident disease counts (cohort). Particularly problematic in case-control studies
Design objectives of a randomized trial
COMPARE disease INCIDENCE between individuals randomly assigned to intervention (the exposed) and control (the non-exposed) groups by using a risk ratio
Prospective Cohort Studies
Observational, analytic study that assesses associations. The observational study with the highest strength of evidence for causality between an exposure and a disease. INCIDENCE STUDY (direct measure of incidence). Know: EXPOSURE status. Don't know: DISEASE status. Exposure comes BEFORE the disease. Moving FORWARD in time. Uses risk ratios.
What does it mean to close out a randomized trial?
Once the study is complete, you must tell the participants what group they belonged to and possibly the outcomes of the study
Cross-sectional Limitations
Prevalence only. Temporality between exposure and disease is hard to establish. Sicker cases of disease may have died (survivor bias). Ecological fallacy.
Strengths of Randomized trials
Randomization of participants to groups, which maximizes comparability of outcomes. Temporality of exposure and disease occurrence is easily measured. DIRECT MEASURE OF RISK. Extensive characterization of participants. Multiple outcomes. Safety is measured carefully via the risk:benefit ratio.
Random Error
Readings fall above and below the true value: positive and negative fluctuations around the true value. Threatens PRECISION and RELIABILITY. Sampling error and measurement error are the first places to potentially introduce error (the effect is reduced by a larger sample size).
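One way to see why a larger sample reduces random error is the standard error of a mean, which shrinks with the square root of the sample size. The card does not mention this formula, so it is brought in here only as an illustration; the function name and numbers are hypothetical.

```python
import math

def standard_error(standard_deviation, sample_size):
    """Standard error of a sample mean: SD divided by the square root of n."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return standard_deviation / math.sqrt(sample_size)

# Same spread in the measurements (SD = 10), different sample sizes
se_small = standard_error(10.0, 25)   # 10 / 5
se_large = standard_error(10.0, 100)  # 10 / 10
```

Quadrupling the sample size halves the standard error. A systematic error would not shrink this way, because every reading is shifted in the same direction regardless of how many are taken.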
Cross-Sectional Study
SNAP-SHOT. *Prevalence study*; uses the odds ratio. Observational study at a single point in time. Acts as a surveillance measure. Biggest limitation: don't know whether the exposure or the disease came first because they are measured at the same time.
Information Bias
Systematic differences in the accuracy of measures on exposure and disease. Misclassification: Misclassifying data within a 2x2 table