SPHG 712 Unit 1
Negative Predictive Value (NPV)
% of negative tests that are truly negative. Formula: true negatives/(true negatives + false negatives). Like sensitivity, it is driven by false negatives, but the denominator is the total # of people who test negative. High NPV = small % of false negatives among all the individuals with negative test results.
What are the limitations of ecologic studies?
1) Ecological fallacy 2) Confounding bias: when other risk factors distort the apparent effect of the exposure of interest because they are correlated with it 3) Cross-level bias: occurs when the confounding factor is associated with the background rate of disease differentially across groups 4) specifically for cross-sectional ecologic studies, investigators cannot be confident that exposure preceded the outcome 5) migration into and out of the community can bias interpretation of the data
Advantages of case-control studies
1) Most efficient study design for rare diseases 2) Require smaller study sample than cohort studies 3) Avoids logistical challenges associated with following a large sample over time 4) Allow for more intensive evaluation of exposures of cases and controls 5) less cost + less time 6) can examine multiple exposures 7) likely to be replicable in other populations 8) if done correctly, OR can provide accurate estimate of RR
What are some examples of group-level exposures/outcomes?
1) laws and policies 2) country level GDP 3) environmental exposures at a group level 4) maternal mortality rate
Beyond these biases, what are other threats to the validity of a randomized trial?
1) limiting analysis to only those subjects that were compliant with the protocol, especially when compliance may be correlated with other risk factors for the treatment effect 2) Loss to Follow-Up 3) Non-compliance 4) Crossovers
What types of distortion do we see with confounding?
1. Bias towards the null: the study value is closer to the null than the true value (underestimation of the strength of association, for both hazardous and protective exposures) 2. Bias away from the null: the true value is closer to the null than the study value (overestimation of the strength of association, for both protective and hazardous exposures)
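A minimal sketch of the two directions for a ratio measure, comparing distances from the null (1.0) on the log scale so that hazardous and protective exposures are treated the same way. The function name and the example values are illustrative assumptions, not part of the course material.

```python
import math


def bias_direction(observed_ratio: float, true_ratio: float) -> str:
    """Classify bias in a ratio measure (RR, OR, rate ratio) relative to the null of 1.0."""
    observed = math.log(observed_ratio)  # log scale: the null value 1.0 maps to 0
    true = math.log(true_ratio)
    if math.isclose(observed, true):
        return "no bias"
    if observed * true < 0:
        # Estimates on opposite sides of the null: the direction of effect changed.
        return "direction reversed"
    if abs(observed) < abs(true):
        return "toward the null"
    return "away from the null"


# Hypothetical examples: (study value, true value)
for study_value, true_value in [(1.5, 2.0), (0.8, 0.5), (3.0, 2.0), (0.3, 0.5)]:
    print(study_value, true_value, bias_direction(study_value, true_value))
```

Note that for a protective exposure (true ratio below 1.0), bias toward the null makes the study value numerically larger, which is why the comparison is made on distance from the null rather than on raw size.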
ANY INTERPRETATION FOR MEASURES:
Always include person, place, and time!!!
Selection bias in cohort studies
Can occur if the rates of participation or the rates of loss to follow up differ by both exposure and health outcome status
Selection bias in cross sectional studies
Can occur if those who participate in the study are systematically different from those who don't participate with respect to their exposure-outcome relationship
Null value
Definition: The number corresponding to no association between an exposure and an outcome. In statistical tests: the result you would obtain if the null hypothesis (the statement predicting there is no association between exposure and the health outcome) were, indeed, true. For ratio measures the null value is 1.0 (e.g., risk ratio = 1.0, odds ratio = 1.0, prevalence ratio = 1.0); for difference measures it is 0. An estimate whose confidence interval does not include the null value allows us to reject the null hypothesis in favor of the alternative hypothesis.
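The confidence-interval check can be sketched as a one-line test. The function name and the example intervals below are hypothetical; the default null of 1.0 applies to ratio measures, and 0.0 would be passed for a difference measure.

```python
def excludes_null(ci_lower: float, ci_upper: float, null_value: float = 1.0) -> bool:
    """Return True when the confidence interval does not contain the null value."""
    return not (ci_lower <= null_value <= ci_upper)


# Hypothetical 95% confidence intervals
print(excludes_null(1.2, 2.5))                     # risk ratio CI that excludes 1.0
print(excludes_null(0.8, 1.4))                     # odds ratio CI that includes 1.0
print(excludes_null(-0.02, 0.05, null_value=0.0))  # risk difference CI that includes 0
```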
Odds
Definition: - ratio of cases to non-cases Formula: - Cases (can be either incident or prevalent)/non-cases What kind of studies use this measure? - Case-control studies, where we have a group of individuals with the health outcome and a group of individuals without the health outcome of interest that are selected
What are the limitations of modeling?
Methods and results are less transparent. Adding too many covariates may decrease precision while doing little to improve validity!
random vs systematic error
Random Error: - Variability in the data that cannot be readily explained - Called variability, random or statistical variation, poor precision, uncertainty, or "noise" in the data - the width of the 95% CI is a measure of it - decreases as the size of the study increases Systematic Error: - results from flaws in the design, conduct, or analysis of a study that are not due to chance alone! - leads to deviation of observed measures from the true measures - doesn't change in magnitude regardless of the size of the study - worse bc it affects internal validity (obscures an association, suggests an association exists when there is none, over- or underestimates the true effect, or changes the direction of the effect)
Do specificity or sensitivity errors tend to produce larger biases?
Specificity
How do we define a cohort?
This is a group of people sharing a common characteristic - like SES, race, gender, education, etc.
Measures of Association
Used to quantify the association between a specified exposure and health outcome, and to compare two or more populations with differences in exposure or health outcome status - but doesn't imply that the association is causal. We compare with division (ratio effect measures like risk/odds/rate ratio) or subtraction (difference effect measures like risk difference).
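A short sketch of the division vs. subtraction idea, using risks from a hypothetical cohort (30 cases among 100 exposed, 10 cases among 100 unexposed). The counts and function names are assumptions chosen only for illustration.

```python
def risk(incident_cases: int, population_at_risk: int) -> float:
    """Risk = incident cases / population at risk at the start of follow-up."""
    return incident_cases / population_at_risk


risk_exposed = risk(30, 100)    # hypothetical exposed group
risk_unexposed = risk(10, 100)  # hypothetical unexposed group

risk_ratio = risk_exposed / risk_unexposed       # ratio measure, null value 1.0
risk_difference = risk_exposed - risk_unexposed  # difference measure, null value 0

print(f"risk ratio = {risk_ratio:.2f}")
print(f"risk difference = {risk_difference:.2f}")
```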
Multivariate Regression Models
Used to control for confounders. You determine, though, from your conceptual model and the literature, which variables are the confounders, and then include them in your regression models.
Intention-to-treat analysis
When subjects are analyzed according to the treatment group they were randomized to, regardless of whether they adhered to that treatment (adherence and non-adherence)
Selection bias
when subjects who are selected for, participate in, or are followed in a study are systematically different with respect to their exposure-disease association compared with those who 1. are not selected (missingness) 2. don't participate (ex: refusals or unable to contact) 3. are lost to follow up. Also occurs when one conditions on a common effect of two independent variables, resulting in an association between those parent variables that differs from the true causal effect
Positive Predictive Values (PPV)
% of positive tests that are truly positive. Formula: true positives/(true positives + false positives). Like specificity, it is driven by false positives, but the denominator is the total # of people who test positive. High PPV = small % of false positives among all the people who get positive results.
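Both predictive values can be computed from the four cells of a test-vs-truth table. This is a minimal sketch; the counts are hypothetical and the function names are not from the course material.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): share of positive tests that are truly positive."""
    return true_positives / (true_positives + false_positives)


def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN): share of negative tests that are truly negative."""
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical screening results: 80 TP, 20 FP, 880 TN, 20 FN
ppv = positive_predictive_value(80, 20)
npv = negative_predictive_value(880, 20)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```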
What strategies do we have when building models?
- Directed acyclic graphs - Minimal set of confounders - Literature based rationale
What are some complications in predicting the direction of misclassification bias?
- Non-differential misclassification of a polychotomous exposure variable (3 or more categories) may result in bias away from null, though this is less likely than bias towards the null. - Non-differential misclassification of a health outcome limited to a loss of sensitivity of detecting the health outcome without any loss in specificity does not bias toward null, whereas a loss of specificity always biases toward the null.
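For contrast with these complications, the baseline case (non-differential misclassification of a dichotomous exposure) can be sketched with expected counts. The case-control counts, the sensitivity of 0.8, and the specificity of 0.9 are all hypothetical values; the same error rates are applied to cases and controls, which is what makes the misclassification non-differential.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Cross-product odds ratio from a case-control 2x2 table."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed


# Hypothetical true counts
true_cases = (100, 100)    # exposed, unexposed
true_controls = (50, 150)  # exposed, unexposed
true_or = odds_ratio(*true_cases, *true_controls)

# Same sensitivity and specificity in cases and controls (non-differential)
observed_cases = misclassify(*true_cases, sensitivity=0.8, specificity=0.9)
observed_controls = misclassify(*true_controls, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(*observed_cases, *observed_controls)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

In this setup the observed odds ratio lands between 1.0 and the true value, i.e. it is biased toward the null.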
What are the advantages of using ecological studies/using aggregate data?
1) Aggregate data is often easily available and regularly maintained through state and national databases 2) Aggregate data is available on a wide number of exposures of interest 3) Aggregate data is available at a low cost 4) Aggregate data can be compared before and after an intervention to see if it made an impact on the community/group of interest 5) ecologic studies are useful for studying the effect of short-term variations in exposure within the same community, ex: temperature on mortality
What are examples of sources of cases?
1) Cases diagnosed in a hospital or clinic 2) Cases entered into a disease registry, e.g. cancer, birth defects, deaths 3) Cases identified through mass screening, e.g. hypertensives, diabetics 4) Cases identified through a prior cohort study, e.g. lung cancers in an occupational asbestos cohort
Types of Ecologic Study Designs
1) Cross-sectional ecologic studies: Compares aggregate exposures and outcomes over the same period of time across communities. Ex: bladder cancer mortality rate in cities with surface drinking water sources that contain chlorine by-products compared to rates in cities with ground drinking water sources that don't contain chlorine by-products. USE: Prevalence 2) Time-trend ecologic studies Compares aggregate exposures and outcomes within the same community over a period of time. Ex: do hospital admission rates for cardiac disease in LA increase on days when carbon monoxide levels are higher? USE: rates or risks 3) Solely descriptive ecologic studies Compares aggregate exposures and outcomes between communities at the same time OR within the same community over time. Ex: What are the differences in lung cancer mortality among cities in NC? What is the secular trend of lung cancer mortality between 1960 and 2010 for the entire state of NC?
What are examples of sources for controls?
1) Population controls - non-cases sampled from the source population producing cases (ex: census block groups or DMV adults) 2) Neighborhood or friend controls if they didn't develop the outcome of interest 3) Hospital controls - but may not be from the same source population + not have similar exposure prevalence + have diseases resulting from the exposure of interest that are different from the outcome of interest 4) Controls with another disease - ex: if we are studying lung cancer, important to look at other cancers related to the study exposure of interest
What considerations do we need to take into account w/ RCTs that we don't usually see with observational research?
1) Unless researchers are genuinely uncertain about the potential harms or benefits of a treatment (equipoise), it is typically unethical to assign it to one group of people while withholding it from others - which limits the kinds of questions we can ask.
What questions do you ask to identify the study design?
1) When were the exposure and outcome measured (with respect to investigator)? - Exposure before outcome - Outcome before exposure - Outcome and exposure at the same time 2) What was the unit of analysis? -> Group or individual
What are solutions for interviewer bias?
1) blind data collectors regarding exposure or health outcome status 2) develop well standardized data collection protocols 3) train interviewers to obtain data in a standardized manner 4) seek same information about exposure from two different sources, e.g. index subject and spouse in case-control study
Disadvantages of case-control studies
1) no measure of rate or risk bc we don't know the population at risk 2) subject to recall bias if exposure is measured via interviews and if recall of exposure differs between cases and controls (can be avoided by accessing historical records to measure exposure) 3) difficult to choose the correct source population (can add to selection bias) 4) not efficient for studying rare exposures (i.e. less than 10% of controls are exposed) bc then very large #s of cases + controls are needed to actually understand effects of rare exposures 5) sometimes the time sequence between exposure and outcome is uncertain
What are source populations for case-control studies?
1) population of particular interest (ex: postmenopausal women at risk of breast cancer). Overall, as long as they represent the restricted source population from which cases should arise, and not all non-cases in the TOTAL population, we should be set!
What methods do we have to randomize study participants?
1) roll a die or use a random number table to put individuals in different groups 2) randomize study participants through stratified random allocation, where individuals are first stratified by a baseline risk factor (like smoking status) and then randomized into the control or treatment group within each stratum - this is used when the investigator wants to ensure that a strong external risk factor is equalized at baseline between the two groups
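Stratified random allocation can be sketched as: group participants by the baseline risk factor, shuffle within each stratum, then split each stratum between the two arms. Everything below (participant IDs, the smoking-status lookup, the function name, the fixed seed) is a hypothetical illustration rather than a prescribed procedure.

```python
import random
from collections import defaultdict


def stratified_allocation(participants, stratum_of, seed=0):
    """Randomize within strata so each arm gets about half of every stratum."""
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    strata = defaultdict(list)
    for person in participants:
        strata[stratum_of(person)].append(person)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        half = len(members) // 2
        for position, person in enumerate(members):
            assignment[person] = "treatment" if position < half else "control"
    return assignment


# Hypothetical participants 1-8; participants 1-4 are smokers at baseline.
smoking_status = {pid: ("smoker" if pid <= 4 else "non-smoker") for pid in range(1, 9)}
groups = stratified_allocation(list(smoking_status), smoking_status.get, seed=42)
for pid, arm in sorted(groups.items()):
    print(pid, smoking_status[pid], arm)
```

With an even number of people per stratum, each arm ends up with the same number of smokers, which is the baseline balance the method is meant to guarantee.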
Steps in a Case Control Study
1. A subset of the source population is identified to be potential members of the study population Groups: study population and remaining population 2. A sample of potential participants is assessed for inclusion criteria, and study participants are selected. From Study Population -> study participants and non-participants 3. Investigators select eligible cases and then select eligible controls. Cases and controls may come from different study populations From Study Participants -> Case Group and Control Group 4. Investigators assess prior exposure status (exposure at some specified point before disease onset) From either Case Group/Control Group: Exposed or Unexposed
How do we select controls in case-control studies?
1. Base or case-base sampling 2. Cumulative density or survivor sampling 3. Incidence density or risk-set sampling
REVIEW: Distortion towards & away from the null
1. Bias towards the null: the study value is closer to the null than the true value (underestimation of the strength of association, for both hazardous and protective exposures) 2. Bias away from the null: the true value is closer to the null than the study value (overestimation of the strength of association, for both protective and hazardous exposures)
Other Models for Causality
1. Component Cause Theory - circumstances leading to a health outcome as being parts of one pie chart, or a "causal pie." - without each component in place, the disease or health outcome would not have occurred at that specific point in time. ex: low calcium intake + impaired balance + shoe without grip + icy path + walking dog + dog chasing squirrel = person falling 2. Counterfactual Models - comparing an exposed group of people to a fictional group of people who are exactly the same except they are unexposed to the key variable. - if ONLY the exposure is changed, it helps show how the outcome would change 3. Directed Acyclic Graphs - creates a conceptual diagram that maps the relationships between the main exposure, outcome of interest, and all potential confounders for a given study
What are the key steps of case-control study?
1. Define and select cases - Determine diagnostic criteria, and incident cases are typically preferable 2. Define and select controls - can have multiple controls per case (will increase statistical power) - can have multiple control groups (consistency = credibility) 3. Compare exposure prevalence
What are the advantages of selecting controls using risk set sampling or incidence density sampling?
1. Direct estimate of the rate ratio is possible 2. The estimates are not biased by differential loss to follow-up among the exposed vs. unexposed controls
Confounding vs. EMM Questions
1. EMM: Does the relationship between racial and ethnic microaggressions and depression differ between men and women? 2. Confounding: Does gender/sex obscure the relationship between racial and ethnic microaggressions and depression?
Types of selection bias
1. Hospital patient bias - when hospital controls are used in a case-control study 2. Differential loss to follow up - when the exposure-outcome association differs between those who drop out of the study vs those who remain enrolled 3. Healthy worker effect - when workers of a particular occupation are compared to people in the general population, workers tend to be healthier than the general population because workers must be healthy to work - so the unexposed comparison group should also be made up of workers 4. Volunteer or nonresponse bias - when the association between exposures and outcomes differs based on whether you are a volunteer or not, bc those who volunteer tend to have different characteristics than the typical individual in the target population. 5. Prevalent user bias - may inadvertently include only those individuals who have survived the exposure without developing the outcome (ex: recruiting people who are using the drug of interest, most of whom have been using it for years, without realizing that the drug was dangerous for some people who died early on or quit, so they aren't included in the study)
What are the main steps in epidemiological research?
1. Identify a public health problem 2. Develop a hypothesis - define your exposure and disease 3. Pick a study design to fit your question - taking into account the nature of your exposure and disease, the time available for you to do this study, the resources and funds available, and the ease of collecting data or analyzing already available data.
What are the limitations of using cross-sectional studies to evaluate risks?
1. SURVIVAL BIAS 2. SELECTIVE MIGRATION 3. ANTECEDENT CONSEQUENT BIAS
Nine Factors that Determine Causality (Bradford Hill)
1. Strength of association - causal relationships are more likely to show strong associations than non-causal ones (however, a weak association doesn't rule out causality, ex: when the outcome is common, AND a strong one may merely be due to the presence of confounding) 2. Consistency of data - reproducibility of results in various populations and situations (however, lack of consistency doesn't rule out a causal association, as some causal agents only act in the presence of other co-factors tied to particular populations + situations) 3. Specificity - a single cause leads to a single effect (ex: a specific pathogen -> tuberculosis) - again, may not always be true (ex: smoking doesn't lead only to lung cancer, but also to a number of other diseases) 4. Temporality - MOST ESSENTIAL - for an agent to be causal, its presence must precede the development of the outcome -> if this is not met, causality can be ruled out! 5. Dose-response - a relationship (often shown as a graph) in which increasing levels of exposure are associated with either an increasing or a decreasing risk of the outcome - absence doesn't negate causality, as some diseases + health outcomes don't have a dose-response relationship with exposure (ex: risk doesn't change after a threshold) 6. Biological plausibility - the proposed causal association is consistent with existing biological and medical knowledge 7. Coherence - for a causal association to be supported, new data should not conflict with the current evidence - however, new information can be biased or incorrect, so a conflict doesn't negate causality 8. Experimental evidence - the presence of experimental evidence from a well-controlled study can support causality by demonstrating that "altering the cause alters the effect" 9. Analogy - one of the weaker criteria, as it is speculative in nature + depends on the opinions of the researcher - ex: while infection may cause fever, not all fevers are due to infection - absence does not negate causality.
When should we be concerned about confounding?
1. When we are evaluating an exposure-health outcome association 2. When we are quantifying the degree of association between an exposure and health outcome - for example, to what degree does being overweight increase risk of CVD? In this case, if age is a confounder, we would want to control for it 3. When multiple causal pathways may lead to the health outcome - we can't have confounding if there's only one way for the outcome or disease to happen, but this is rare!
What are solutions for recall or reporting bias?
1. add a case group unlikely to be related to exposure 2. add measures of symptoms or health outcomes unlikely to be related to exposure
When do we construct DAGs?
A DAG should be constructed for a study question prior to collecting data and performing data analysis, based on a priori knowledge from previous literature.
What does a highly specific test mean?
A large percent of individuals without disease are classified correctly as not having disease.
Period prevalence
A measure of how many individuals were affected by the disease during a specified time period.
What is confounding?
A third variable (or multiple variables, which is more typical) that distorts the true relationship between exposure and disease/outcome. Also considered a form of bias because it can result in a distortion in the measure of association between an exposure and health outcome - we want to minimize this. Examples: 1. Does Gen X cause reproductive cancers among those living near the Cape Fear River? (Confounding: is the association being obscured by the effect of long-chained fluorocarbons?) SEE CHECKLIST IN ANOTHER CARD
What is selective migration in cross-sectional studies?
Additionally, disease or the threat of developing the disease may cause out migration of cases from an environment thought to cause disease (ex: workers impacted by toxic exposures in a plant may quit, while more resistant workers will stay).
How do we indicate that we have adjusted for bias in a DAG?
Adjustment is represented by a box drawn around the adjusted variable
What are advantages and disadvantages of cohort studies?
Advantages: - directly estimate rates and risks - good for rare exposures - good for multiple outcomes - provide evidence of temporality between exposure and outcome Disadvantages: - expensive - time consuming - resource intensive - inefficient for rare outcomes - potential loss to follow up
What are the advantages and disadvantages of hospital-based control selection?
Advantages: - can assume that whatever selection factors influenced a case's decision to use a certain medical facility will also be operating on the controls Disadvantages: - difficult to define source population, and controls may not be representative of the true exposure rate in the target pop
Closed back-door paths in DAGs
Any path from exposure to outcome with arrowheads pointing into the same node. EX: Colliders are factors that have arrowheads pointing to them from two other factors. WE DON'T CONTROL FOR THESE bc of collider stratification bias
Open back-door paths in DAGs
Any path from exposure to outcome without arrowheads pointing into the same node. EX: Confounders are factors that are ancestors of both the exposure and outcome (and are not affected by the exposure). WE CONTROL FOR THESE
How are risk factors defined in this context?
Any variable that is: 1. Already known to be "causally related" to the health outcome or disease (but not necessarily a direct cause) 2. Antecedent temporally to the health outcome or disease on the basis of substantive knowledge or theory, and/or on previous research findings
Measures of incidence in case-control studies
Because we don't know the proportion of cases in the entire population-at-risk, we can't calculate incidence! While the controls are representative of the population at risk (bc they don't have the disease), they are only a sample - so we still can't figure out the population-at-risk. Without pop-at-risk, we can't calculate 1) incidence, 2) risks, or 3) rates. BUT: we can calculate odds and therefore odds ratios as an estimate of the risk ratio or rate ratio. Odds of exposure among cases = (exposed cases)/(unexposed cases). Odds of exposure among controls = (exposed controls)/(unexposed controls). Odds Ratio = (exposed cases * unexposed controls)/(unexposed cases * exposed controls)
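The odds ratio can be worked through with a small 2x2 table. The counts below are hypothetical (40 exposed and 60 unexposed cases, 20 exposed and 80 unexposed controls), and the function names are illustrative.

```python
def exposure_odds(exposed: int, unexposed: int) -> float:
    """Odds of exposure within one group (cases or controls)."""
    return exposed / unexposed


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Cross-product form of the exposure odds ratio."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# Hypothetical case-control data
odds_cases = exposure_odds(40, 60)
odds_controls = exposure_odds(20, 80)
or_from_odds = odds_cases / odds_controls
or_cross_product = odds_ratio(40, 60, 20, 80)

print(f"odds among cases = {odds_cases:.3f}")
print(f"odds among controls = {odds_controls:.3f}")
print(f"odds ratio = {or_cross_product:.3f}")
```

Dividing the two exposure odds and taking the cross-product give the same number; the cross-product is just the algebraically simplified form.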
Differential misclassification on exposure or health outcome
Bias is towards the null when fewer cases are considered to be exposed or fewer exposed are considered to have the health outcome. Bias is away from the null when more cases are considered to be exposed or more exposed are considered to have the health outcome.
What would an ideal test be like?
Both highly sensitive and highly specific, where disease would be detected in 100% of those who truly have disease (100% sensitivity), and disease would be ruled out in 100% of those who are truly disease-free (100% specificity).
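Sensitivity and specificity can be computed against the gold-standard classification as follows. This is a minimal sketch with hypothetical counts; an ideal test would return 1.0 for both.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly diseased people whom the test classifies as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of truly disease-free people whom the test classifies as negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical validation study: 100 diseased and 1000 disease-free people
print(f"sensitivity = {sensitivity(90, 10):.2f}")
print(f"specificity = {specificity(950, 50):.2f}")
```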
What can you do about information bias?
CANNOT UNDO IT - so work hard to avoid it in the design phase. But you can assess the effects of information bias using sensitivity analysis - which estimates the effect of various degrees of bias and produces hypothetical estimates that are "corrected" for bias
What is the midpoint idea for calculating person-years, and when should we use it?
CURRENTLY ADDING
What study designs have confounding?
Can be present in any study design - but ecological studies are most susceptible to confounding - it is more difficult to control for confounders at the aggregate level of data
Selection bias in case control studies
Can occur as a result of the procedure used to select study participants when the selection probabilities of exposed and unexposed cases and controls from the target population are different and not proportional - like when the exposure status influences selection. Ex: selective survival (use of prevalent vs. incident cases) can introduce selection bias Ex 2: Hospital-based controls differ from population-based controls with respect to exposure pattern (Berkson's bias)
What are case-control studies?
Case-control studies are used to see if there is an association between an exposure and a specific health outcome; however, we work from health outcome to exposure rather than from exposure to outcome. Typically used to assess whether exposure is disproportionately distributed across cases and controls, which could be used to justify whether that exposure is a risk factor or not. As such, cases are chosen because they have the outcome of interest, and controls are individuals randomly chosen from the population who don't have the outcome of interest. OFTEN USED FOR RARE HEALTH OUTCOMES OR DISEASES
Loss to Follow Up
Circumstance in which researchers lose contact with study participants, resulting in unavailable outcome data on those people. Bias can be introduced if LTF is correlated with exposure to the treatment and/or exposure to other risk factors
What are types of randomized trials?
Clinical Trials: Experiment w/ patients (either those who are already sick or those at high risk of being sick) as the subjects, with a goal of finding a treatment for a disease or determining the effectiveness of an intervention in the progression of a disease. Typically the control group uses a placebo or the current standard of treatment as a baseline. NIH recommends asking the following questions when determining whether a study is a clinical trial: 1. Does the study involve human participants? 2. Are the participants prospectively assigned to an intervention? 3. Is the study designed to evaluate the effect of the intervention on the participants? 4. Is the effect that is being evaluated a health-related biomedical or behavioral outcome? Individual/Family Trials: Community Trials: Experiment w/ an entire community as the subject (a town, school, factory, office, or classroom are all options). In this case, individuals in the same unit of observation are experimentally exposed to the same intervention but may not receive the same level of exposure.
Construct validity vs. Information bias
Construct validity: Is x really x? Is y really y? EX: How well does a questionnaire measure resilience? Information bias: Are those with disease outcomes reporting exposure differently than those without disease outcomes? Are those with exposure reporting outcomes differently than those without the exposure? EX: Are those with disease outcomes more likely to report low level of resilience (as the exposure) compared with those that are not diseased?
Base sampling or case-base sampling
Control selection from a source population such that every person has the same chance of being included as a control When do we use this? - When we have a previously defined cohort of individuals Does it allow the odds ratio to be a good estimate of RR? - Yes, and does so w/o assuming that the disease is rare in the source population
Incidence density or risk-set sampling
Control selection when the cases are incident cases and the controls are selected from the at-risk source population at the EXACT same time cases occur (chosen in pairs). Controls provide an estimate of the proportion of the total person-time for exposed and unexposed cohorts in the source population. Note: controls must be eligible to become a case if the health outcome occurs in that control during our period of observation Does it allow the odds ratio to be a good estimate of RR? - Yes, and does so w/o assuming that the disease is rare in the source population
Cumulative density or survivor sampling
Control selection when the controls are chosen from the individuals who are free of the health outcome at the end of follow-up, and have never had the outcome during the entire course of the follow-up Risk: hard to account for LTF because we choose controls at the end of the study, meaning that if there are LTF they are not available for selection at the end of the study Does it allow the odds ratio to be a good estimate of RR? - Yes, but only if the health outcome is rare
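Why the rare-outcome condition matters can be sketched numerically: with controls drawn from those still disease-free at the end of follow-up, the odds ratio behaves like the ratio of disease odds in the underlying cohort, which stays close to the risk ratio only when risks are small. The risks below are hypothetical and the sketch ignores sampling error and loss to follow-up.

```python
def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    return risk_exposed / risk_unexposed


def odds_ratio_from_risks(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of disease odds (risk / (1 - risk)) in exposed vs. unexposed."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Both scenarios have a true risk ratio of 2.0
rare = (0.02, 0.01)    # rare outcome
common = (0.40, 0.20)  # common outcome

for label, (r1, r0) in [("rare", rare), ("common", common)]:
    print(f"{label}: RR = {risk_ratio(r1, r0):.2f}, OR = {odds_ratio_from_risks(r1, r0):.2f}")
```

In the rare scenario the odds ratio is nearly equal to the risk ratio; in the common scenario it overstates it.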
What criteria must be met for potential confounders to actually be confounders?
Criteria 1: The potential confounder must be a known risk factor for the health outcome or disease. Criteria 2: The potential confounder must be associated with the exposure. Note: the confounding factor must be predictive of the health outcome or disease occurrence apart from its association with exposure - i.e. among the unexposed, the potentially confounding factor should STILL be related to the health outcome or disease. Criteria 3: The potential confounder must not be on the causal pathway (not on the road from exposure to disease; not an effect of the exposure)
Crude estimates vs Adjusted Estimates
Crude Estimates: Simple measures that do not account for other factors that may be driving the estimate Adjusted Estimates: Measures that account for other factors that may be driving the estimate; allow for controlling of confounders or accounting for effect modifiers in analyses via gender, race, SES, smoking status, family history, etc.
Crude vs. adjusted
Crude odds ratio: odds ratio for the entire group, means only one independent variable Adjusted odds ratio: stratified by specific groups of interest, typically means multiple independent variables
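One way to see the difference is to compare the crude odds ratio with a stratum-adjusted one. The sketch below pools strata with the Mantel-Haenszel estimator, which is a common choice but an assumption here (the card does not name a method; regression adjustment is another option). The two strata are hypothetical and were built so that each has an odds ratio of 1.0.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Pooled OR: sum(a*d/n) / sum(b*c/n) over strata, with n the stratum total."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of a confounder (e.g. smokers, non-smokers): (a, b, c, d)
strata = [(80, 20, 40, 10), (10, 40, 20, 80)]

crude_table = [sum(cells) for cells in zip(*strata)]  # collapse over the confounder
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_odds_ratio(strata)

print(f"crude OR = {crude_or:.2f}, adjusted OR = {adjusted_or:.2f}")
```

Here the crude estimate suggests an association that disappears once the confounder is accounted for.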
Prevalence
Definition: - The proportion of the population living with a health condition at a specific period or point in time - Used when it is difficult to determine the onset of the health outcome or if the disease has a long duration Limitations: - Tends to favor the inclusion of chronic diseases over acute ones - Hard to infer causality because exposure and outcome are measured simultaneously Formulas: - (New + existing cases)/total study population at the point of inquiry - alternatively: incidence rate * average duration of disease (an approximation that holds when prevalence is low and stable) How to interpret: - A population with a heart disease prevalence of 0.25 indicates that 25% of the population is affected by heart disease at a specific moment in time.
Rate
Definition: - The number of new cases of the health condition per unit of person-time at risk over the specified amount of follow-up time. What does this measure? - Measures how quickly the health outcome is occurring in the population Formula: - Incident Cases/Total Person-time at-risk during the study period (usually years) How to interpret? - A rate of 0.1 cases per person-year indicates that on average, for every 10 person-years contributed, 1 new case of the health outcome will develop. - Commonly multiplied by 1k or 10k person-years, and written as X cases per 1k or 10k person-years. Also known as: - incidence rate or incidence density Note 1: The denominator of person-time-at-risk changes as persons originally at risk develop the health outcome during the observation period and stop contributing person-time (they move from the denominator into the numerator)
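The person-time bookkeeping can be sketched with a handful of hypothetical participants, each contributing time at risk until they develop the outcome or follow-up ends. The data and function name are assumptions for illustration.

```python
def incidence_rate(follow_up):
    """follow_up: list of (person_years_at_risk, developed_outcome) pairs."""
    incident_cases = sum(1 for _, developed in follow_up if developed)
    total_person_years = sum(years for years, _ in follow_up)
    return incident_cases / total_person_years


# Hypothetical 5-year study: two people develop the outcome (at years 2 and 3)
# and stop contributing time at risk; three remain outcome-free for all 5 years.
follow_up = [(2.0, True), (5.0, False), (5.0, False), (3.0, True), (5.0, False)]

rate = incidence_rate(follow_up)
print(f"{rate:.2f} cases per person-year")
print(f"{rate * 1000:.0f} cases per 1,000 person-years")
```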
What are randomized controlled trials?
Definition: Epidemiological studies in which a direct comparison is made between two or more treatment groups, one of which serves as a control and the other serves as the treatment group (or multiple treatment groups). Assignment into groups is random, and all are followed over time to observe the effect of the treatments and see if the health outcome of interest develops or not. NOTE: the intervention group and control group should be comparable in all aspects for this to work
Cross-sectional studies
Definition: Study design that looks at the prevalence of disease and exposure at one moment in time, rather than following individuals throughout a period of time to look at risks/rates as seen in cohort studies. "Snapshot" looking at health outcomes like death, disease, etc. How do you choose the population? Select a sample population and then obtain the data to classify individuals as having or not having the health outcome - different from cohort studies, which initially select a population that is free of the disease but at risk of developing it.
Gold-standard
Definition: the most definitive diagnostic procedure, e.g. microscopic examination of a tissue specimen, or the best available laboratory test, e.g. serum antibodies to HIV. Can be costly, invasive, and/or uncomfortable! New tests being developed are usually compared to the gold standard in order to determine if there is an improvement in the accuracy! Used in validation studies to ask: how well did we measure our exposure/outcome, and can we estimate the degree of misclassification to adjust later on?
Risk
Definition: - The number of new cases divided by the total at-risk population at the beginning of the follow-up period. What does it measure? - Measures the probability of an unaffected individual developing a specified health outcome over a given period of time. Formula: - Incident Cases/Population at-risk at start of the study How to interpret: - 0.x cases per person or XX cases per 100 or 1000 persons -> so a five-year risk of 0.10 means that an individual at risk has a 10% chance of developing the health outcome over a five-year period What kind of studies use this measure? - Prospective studies because the population at risk is easy to determine and follow - NOT case-control studies because the total population cannot be determined, SO we use odds instead. Also known as: - incidence, cumulative incidence, or attack rate
Attributable Rate (AR) or Rate Difference
Definition: a measure that allows us to find the absolute effect of an exposure, also known as the excess rate among the exposed population attributed to exposure In words: Among those who texted while driving, the rate of traffic accidents was 6.02 cases per 100 PY higher than the rate among those who did not text while driving. (note: we include units here) How to interpret: If + -> excess rate is due to the exposure If - -> exposure of interest has a protective effect against the outcome (ex: vaccinations). If 0 -> exposure has no association Formula: Incidence rate in exposed (a/total person-time exposed) - incidence rate in unexposed (c/total person-time unexposed)
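The rate difference formula and its sign interpretation can be sketched as below. The counts are invented for illustration, and `rate_difference` and `describe` are hypothetical helper names, not part of any course material.

```python
def rate_difference(cases_exposed, pt_exposed, cases_unexposed, pt_unexposed):
    """Incidence rate in the exposed minus incidence rate in the unexposed."""
    return cases_exposed / pt_exposed - cases_unexposed / pt_unexposed


def describe(rd):
    """Map the sign of the rate difference to its interpretation."""
    if rd > 0:
        return "excess rate among the exposed"
    if rd < 0:
        return "exposure appears protective"
    return "no association"


# Hypothetical data: 30 cases over 400 person-years (exposed),
# 10 cases over 500 person-years (unexposed).
rd = rate_difference(30, 400, 10, 500)

# Report with units, since a difference measure keeps them.
print(f"{rd * 100:.1f} cases per 100 person-years: {describe(rd)}")
```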
Equipoise
Definition: genuine uncertainty about the benefits/harms of treatments and exposures
Compliance/adherence
Definition: whether or not participants follow treatment recommendations
P-value
Definition: Expression of the probability that the difference between the observed value and the null value occurred by chance or because of sampling variability. Alternatively, p-values are the statistical probability of occurrence of a given finding by chance alone in comparison with the known distribution of possible findings, considering the kinds of data, the techniques of analysis, and the numbers of observations. Interpretation: The smaller the p-value, the less likely it is that sampling variability accounts for the difference. When there is statistical significance, the p-value is less than alpha (p<0.05) and the confidence interval does not include the null value, so we reject the null hypothesis that there is no association between exposure and outcome. In words: With alpha set at 0.05, a p-value below 0.05 means there is less than a 5% probability that the observed difference between the rate ratio, risk ratio, or odds ratio and 1.0 is due to sampling variability
Types of Cross Sectional Studies
Descriptive cross-sectional studies: Characterize the prevalence (either point or period) of a health outcome BUT NOT EXPOSURE in a specified population Analytical cross sectional studies: - Data on the prevalence of both exposure and health outcome are obtained for the purpose of comparing health outcome differences between the exposed and unexposed - Note: also compare the proportion of exposed people who are diseased w/ the proportion of non-exposed people who are diseased, which we can't do in descriptive cross-sectional studies
Types of Case Control Studies
Differences on when the cases develop the health outcome: Some use prevalent cases, some use incident cases Differences in how the cases/controls are sampled/identified: Some are population-based while others are hospital-based cases
Pros and Cons of Specificity vs. Sensitivity
EXAMPLE 1: - if a disease is not life threatening if left untreated, - the costs of treatment are high - invasive surgery is required THEN: a very specific diagnostic test is preferred over a more sensitive test. EXAMPLE 2: - If the disease under study is life threatening if left untreated - the survival rate is improved with immediate treatment, THEN: the sensitivity of a diagnostic test is of greater importance than its specificity.
How does randomization avoid bias?
Eliminates the baseline differences in risk between control and treatment groups by making both groups similar in terms of distribution of risk factors, regardless of whether risk factors are known or unknown (like confounding variable)
Experimental vs. Non-Experimental Studies
Experimental: Investigator randomly ASSIGNS exposure, like RCTs Non-experimental: OBSERVATIONAL, does not assign exposure like case-control, cohort, ecological, or cross-sectional studies
What are the three types of exposure?
Exposure can be preventative, harmful, or have no effect on developing the disease/health outcome in the exposed population.
Colliders in DAGs
A factor that has arrowheads pointing to it from two other factors
Retrospective vs Prospective Studies
From the investigator's perspective: Retrospective: data formed in the past - EX: geographic area, workers in the radiation industry, potential exposure to hazardous substances Prospective: data collection starts now, goes into the future
How do you interpret risk ratio values? And what does the value tell us about the type of exposure?
How to Interpret Values: <1: Indicates that the risk in the exposed group is less than the risk in the unexposed or less exposed reference group (so exposure is PREVENTATIVE) =1: Indicates that there is no difference in risk or rates between exposed and unexposed groups. >1: Indicates that the risk in the exposed group is greater than the risk in the unexposed group (so exposure is HARMFUL) OVERALL: The farther away the risk ratio or rate ratio is from 1, the greater the effect of the exposure on the study group (both beneficial/preventative and bad/harmful)
How does prevalence affect predictive values, specifically PPV and NPV?
If a disease has a low prevalence and the test being used to assess disease in individuals is not 100% sensitive or 100% specific, false positives may overwhelm the positive test results!
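To see why false positives can overwhelm true positives at low prevalence, here is a small sketch that holds sensitivity and specificity fixed at an assumed 95% each and varies prevalence. The values and the function name `predictive_values` are illustrative assumptions.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test (assumed 95% sensitive, 95% specific) at three prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these assumed inputs, PPV falls sharply as prevalence drops while NPV rises, even though the test itself has not changed.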
How can confounding impact our observations?
If present, it can cause an over or under estimate of the observed association between exposure and health outcome - and even change the apparent direction of an effect. It can also obscure an association, or even suggest an association exists when there is no association!
What happens if we want to use greater/higher/more or less/lower to ratios like risk ratio?
If you add words like greater, higher, or more (when ratio >1.0), you have to subtract 1 from your measure of effect. If you add words like less or lower (when ratio <1.0), you have to subtract the ratio measure from 1. If you just say as likely, you don't need to modify the ratio at all!
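The wording rule above can be sketched as a small helper. The name `describe_ratio` is hypothetical, and the example ratios are only illustrations.

```python
def describe_ratio(ratio):
    """Turn a ratio measure into 'higher'/'lower' wording as a percentage."""
    if ratio > 1:
        # "greater/higher/more": subtract 1 from the ratio.
        return f"{(ratio - 1) * 100:.0f}% higher"
    if ratio < 1:
        # "less/lower": subtract the ratio from 1.
        return f"{(1 - ratio) * 100:.0f}% lower"
    return "the same"


for ratio in (1.92, 0.75, 1.0):
    # "times as likely" wording uses the ratio unmodified.
    print(f"ratio {ratio}: {ratio} times as likely, or {describe_ratio(ratio)}")
```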
Non-differential misclassification of exposure status
In a case control study: Exposure status is equally misclassified among cases and controls. In a cohort study: Exposure status is equally misclassified among persons who develop and persons who do not develop the health outcome Note: - if dichotomous exposure, will cause a bias of the risk ratio, odds ratio, or rate ratio towards the null - if 3 or more categories, intermediate exposure groups may be biased away from the null, but overall exposure response trend will be biased towards the null
Non-differential misclassification of health outcome status
In a case control study: health outcome is equally misclassified among cases and controls In a cohort study: A study subject who develops the health outcome is equally misclassified among exposed and unexposed cohorts. NOTE: - typically causes bias towards the null! ALSO: - if errors in detecting the presence of the health outcome are equal between exposed and unexposed subjects (sensitivity is less than 100%), but no errors are made in the classification of health outcome status (specificity is 100%), the risk ratio and rate ratio in a cohort study will not be biased, but RD will be biased towards the null. BUT - if no errors are made in detecting the presence of the health outcome (100% sensitivity) but equal errors are made among the exposed and unexposed in the classification of health outcome (specificity less than 100%), the risk ratio/rate ratio/risk difference will be biased towards the null
What kind of modeling do we use when we have binary outcomes?
In these cases, we use logistic regression - has binary outcomes and beta coefficients that can be used to estimate the odds ratio (which can approximate a risk or rate ratio) when certain assumptions are met (like the rare disease assumption) Interpretation of the coefficient B1: increase in log odds of outcome y per unit increase in x1, adjusted for all other variables in the model
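Because B1 is on the log-odds scale, exponentiating it gives the odds ratio per unit increase in x1. A minimal sketch, assuming a made-up coefficient value; `odds_ratio_from_coefficient` is a hypothetical helper name, and no model is actually fitted here.

```python
import math


def odds_ratio_from_coefficient(beta, unit_change=1.0):
    """Convert a logistic regression coefficient (log odds) to an odds ratio."""
    return math.exp(beta * unit_change)


# Hypothetical adjusted coefficient for x1 from a fitted model.
beta_1 = 0.693

print(f"OR per 1-unit increase in x1: {odds_ratio_from_coefficient(beta_1):.2f}")
print(f"OR per 5-unit increase in x1: {odds_ratio_from_coefficient(beta_1, 5):.2f}")
```

A coefficient of 0 corresponds to an odds ratio of 1, the null value.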
Mediator
Indirect path connecting an exposure and a disease, however, exposure must
What is a false positive?
Individual who is incorrectly diagnosed as a case when, in fact, they do not have the disease. 100% - % specificity = % false positives
What is a false negative?
Individual who is incorrectly diagnosed as a non-case, when in fact the person does have the disease. 100% - % sensitivity = % false negatives
Internal vs. External Validity Questions
Internal Validity: Does X really cause Y? -> Threats: confounding, selection bias, and measurement/misclassification errors External Validity: Do the results of my study apply to other populations?
How do we interpret the odds ratio in a case-control study?
Interpretation? >1.0: Positive association, or increased odds of developing the health outcome in the exposed group <1.0: Negative association, or reduced odds of developing the health outcome in the exposed group (protective, like vaccinations) =1.0: Odds of disease is the same for exposed and unexposed Example: If the odds ratio was 4.0, this means that individuals who consumed a high-fat diet have four times the odds of colon cancer compared with individuals who do not consume a high-fat diet. As such, this exposure was harmful to the individual.
What does selection bias do to validity and the measures of association?
It can affect internal validity and lead to a distortion in the measure of association. If bias is towards the null, the measure of association will underestimate the true effect. If bias is away from the null, measures of association will overestimate the true effect.
How does increasing the number of controls we enroll for each case impact the statistical power of the study?
It increases the statistical power of the study (i.e., the probability that you will find an association if one does exist)
How can we avoid bias given our understanding of these threats?
Keep study subjects in the original randomized group, even if they were LTF, switched to the other group, or were non-compliant
What does a highly sensitive test mean?
Large percent of people who have disease are classified correctly as having the disease.
What are the different types of models?
Linear, logistic, cox, and poisson
Cohort studies
Longitudinal studies where an exposed and unexposed group (or less exposed group) are followed forward in time to find the incidence of the outcome of interest -> tldr: make sure the entire group is disease free to start We start by measuring exposure at baseline, and then checking exposure at multiple points of time OR development of disease at multiple points of time. Note: cohort studies don't have to measure just one outcome of interest (like death), we can also observe free of disease, behavior change, injury, improvement, and disease simultaneously
What is intersectionality?
Lopez: examining race, gender, class, ethnicity together - for interrogating inequalities across a variety of social outcomes, including education, health, employment, housing, and developing contextualized solutions that advance social justice
How do we address selection bias in the design phase?
Make sure to use the same criteria for selecting cases and controls, taking into account diagnostic and referral practices. In cohort studies: work to obtain high participation rates and use a variety of methods to track subjects NOTE: not always possible to correct sampling or selection during analysis, so design a study that will avoid these issues as much as possible
Rate Ratio - definition? how to put into words? how to interpret values?
Measure that compares the rate of those individuals exposed to the rate of those individuals who are unexposed. How do we interpret it? >1.0: Positive association, or increased rate of developing the health outcome in the exposed group <1.0: Negative association, or reduced rate of developing the health outcome in the exposed group (protective, like vaccinations) =1.0: No association In words? - Those who had a high lead exposure at baseline had 1.92 times the rate of an IQ decrement over 10 years of follow-up compared with those who were unexposed to high lead. - Those who texted while driving had five times the rate of traffic accidents compared with those who did not text while driving. (note: no units) Formula: Rate ratio = (rate exposed or a/total person-time exposed)/(rate unexposed or c/total person-time unexposed) *remember rate = Incident Cases/Total Person-time at-risk
Sensitivity vs. Specificity
Measures that assess the validity of diagnostic and screening tests! SPECIFICITY: how well the test is detecting non-diseased individuals as truly not having the disease, or the % of those w/o disease that get negative results in normal words: "number of people without disease who test negative divided by those without disease" formula: true negatives/(true negatives + false positives) OR (truly and test unexposed/ everyone that is truly unexposed) SENSITIVITY: how well the test detects disease in all who truly have disease, or the % of diseased individuals who have positive test results in normal words: "number of people with disease who test positive divided by the number of those who have the disease" formula: true positives/(true positive + false negatives) OR (test AND truly exposed/everyone that is truly exposed)
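The two formulas above can be sketched with a made-up validation study: 100 people who truly have the disease and 900 who truly do not. All counts and function names are illustrative assumptions.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly diseased people who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly non-diseased people who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical results against a gold standard.
true_pos, false_neg = 90, 10      # 100 people with disease
true_neg, false_pos = 855, 45     # 900 people without disease

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

print(f"sensitivity {sens:.0%}, so {1 - sens:.0%} false negatives among the diseased")
print(f"specificity {spec:.0%}, so {1 - spec:.0%} false positives among the non-diseased")
```

The complements printed here correspond to the "100% - % sensitivity" and "100% - % specificity" shortcuts used elsewhere in this set.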
Risk Ratios
Measures: Compares the risk of those individuals exposed to the risk of those individuals who are not exposed, where the values of the risk ratio can be from 0 to infinity. Generally are measures of the strength of the association between exposure and the outcome. Formula: Risk ratio = risk exposed / risk unexposed or Risk ratio = [a or exposed with disease/(a+b or total exposed)]/[c or unexposed with disease/(c+d or total unexposed)] How do we interpret it? >1.0: Positive association, or increased risk of developing the health outcome in the exposed group <1.0: Negative association, or reduced risk of developing the health outcome in the exposed group (protective, like vaccinations) =1.0: No association In words? - Those who texted while driving (exposure) were approximately eight times (magnitude) as likely to have a traffic accident (outcome) as those who did not text while driving (not exposed) over a one-year period (time). note: we don't include units in this, and stick with "as likely" for now *remember risk = incident cases/at-risk population at start of study Note: when a or c is a very small number, we can approximate a/(a+b) by a/b and c/(c+d) by c/d
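A short sketch of the risk ratio formula and of the small-numerator approximation noted above, using an invented 2x2 table in which the outcome is rare. The cell labels a, b, c, d follow the card; the counts and function names are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """[a / (a + b)] / [c / (c + d)] from a 2x2 table."""
    return (a / (a + b)) / (c / (c + d))


def approximate_ratio(a, b, c, d):
    """Approximation for small a and c: (a / b) / (c / d)."""
    return (a / b) / (c / d)


# Hypothetical cohort: 8 of 1,000 exposed and 1 of 1,000 unexposed develop the outcome.
a, b = 8, 992
c, d = 1, 999

rr = risk_ratio(a, b, c, d)
approx = approximate_ratio(a, b, c, d)

print(f"risk ratio: {rr:.2f}")
print(f"approximation using a/b and c/d: {approx:.2f}")
```

The two values stay close only because the outcome is rare in both groups; with a common outcome they drift apart.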
Confidence Interval or Measures of Uncertainty
Measures: The extent of potential variation in a point estimate (the mean value, risk ratio, rate ratio, or odds ratio). Variation can occur when our estimate is based on some sample of the population, rather than the entire population as a whole. Also provides information about how precise the estimate is, the bigger the CI the less precise. Interpretation: - If 100 samples were taken and the 95% CI computed for each sample, we would expect approximately 95 of the 100 intervals would contain the true population mean. - If repeated samples were taken and the 95% CI computed for each sample, approximately 95% of the intervals would contain the population mean. In words: The 95% CI of (lower bound, upper bound) does not contain the null value and is statistically significant. Benefits over P-value: Contains information on both the size of the sample and variability of the sample How do sample sizes impact CIs? - The larger the sample size, the narrower the confidence interval (more precise) - The smaller the sample size, the wider the confidence interval (less precise) What is borderline significance? - Borderline significance if one bound of 95% CI is very close (must be slightly below) the null value - data is only considered statistically significant if it doesn't include the null value of 1 at all - Ratio measure 95% CI (0.97, 2.3)
Risk difference or Attributable Risk
Measures: The absolute difference in risk between the two groups indicating how much excess risk is due to exposure of interest. How to interpret: If + -> excess risk is due to the exposure If - -> exposure of interest has a protective effect against the outcome (ex: vaccinations). If 0 -> exposure has no association In words? - Among those who texted while driving (exposure), the risk of traffic accidents (outcome) was 7.99 cases per 100 persons (risk magnitude) higher than those who did not text while driving (not exposed) over a one year period (time). note: we include units here Benefits of using this over risk ratio: - provides the absolute difference in risk, which the risk ratio doesn't tell us. Ex: A risk ratio of 2 can imply a doubling of either a very large or a very small risk, while a risk difference can actually tell us the magnitude of this change. General formula: Risk exposed (a/(a + b) or total exposed) - Risk unexposed (c/(c + d) or total unexposed) OTHER FORMULAS BUT IGNORE FOR NOW: Formula for the total population: (risk in total population - risk in unexposed)/risk in total population * 100 Formula for exposed: (risk among exposed - risk among unexposed)/risk in exposed * 100
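The point that one risk ratio can hide very different absolute effects can be sketched with two invented populations that share a risk ratio of 2. The counts and function names are assumptions for illustration.

```python
def risk_difference(a, b, c, d):
    """a / (a + b) - c / (c + d) from a 2x2 table."""
    return a / (a + b) - c / (c + d)


def risk_ratio(a, b, c, d):
    """[a / (a + b)] / [c / (c + d)] from a 2x2 table."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical tables: (a, b, c, d) with 1,000 exposed and 1,000 unexposed each.
scenarios = {
    "common outcome": (400, 600, 200, 800),
    "rare outcome": (4, 996, 2, 998),
}

for label, cells in scenarios.items():
    rr = risk_ratio(*cells)
    rd = risk_difference(*cells)
    print(f"{label}: risk ratio {rr:.1f}, risk difference {rd * 100:.1f} cases per 100 persons")
```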
What is the difference between nested case-control study vs. regular case-control study?
A nested case-control study is conducted within an existing cohort (where the source population is already defined), while the source population is not defined in advance for regular case-control studies.
Are participants in RCTs representative, and if not how does this impact our study?
No! This means that there may still be selection bias, and generalizability is limited.
What are the two main components of DAGs?
Nodes: which represent specific variables or factors of interest in the relationship between your exposure and outcome, and Lines: with a single arrowhead between nodes, which represent relationships between these factors.
What is the mathematical difference between exposure odds and disease odds?
Nothing! We get the same value!
When are cross-sectional studies used?
OVERALL: used to evaluate the proportion of a population with disease or with a risk factor, useful for planning or administering preventative or healthcare services, for surveillance programs, and for conducting surveys and polls Can also be useful for studying the association of exposure with disease for chronic diseases lacking information on time of onset - but interpretation requires caution regarding the time association between disease and exposure (chicken vs. egg) MORE SPECIFIC TYPES: Descriptive studies: can help evaluate the proportion of a population that has the disease or is at risk of developing the disease - ex: prevalence of asthma in children or prevalence of elevated blood lead in toddlers. Can also help look at prevalence across specific segments (age, sex, race, SES). Helpful to plan or administer preventative or health care services, surveillance programs, and surveys and polls. Ex: decennial census or National Health and Nutrition Examination Survey (NHANES) Both descriptive and analytical cross-sectional studies are useful for establishing preliminary evidence for causal relationships!
Exposure Odds in Case Control Study
Odds of being exposed in cases (exposed cases/unexposed cases) DIVIDED BY Odds of being exposed in controls (exposed controls/unexposed controls) -> (exposed cases * unexposed controls)/(exposed controls * unexposed cases)
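A brief sketch showing that the exposure odds ratio, the disease odds ratio, and the cross-product all give the same number. The case-control counts below are invented, and the function names are hypothetical.

```python
import math


def exposure_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)


def disease_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds of disease in the exposed divided by odds of disease in the unexposed."""
    return (exp_cases / exp_controls) / (unexp_cases / unexp_controls)


def cross_product(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """(exposed cases * unexposed controls) / (exposed controls * unexposed cases)."""
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)


# Hypothetical study: 40 exposed and 60 unexposed cases,
# 20 exposed and 80 unexposed controls.
counts = (40, 60, 20, 80)

print(f"exposure odds ratio: {exposure_odds_ratio(*counts):.2f}")
print(f"disease odds ratio:  {disease_odds_ratio(*counts):.2f}")
print(f"cross-product:       {cross_product(*counts):.2f}")
```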
Two sided vs. one sided tests
One sided: Used when we have a reasonable basis to assume that exposure is likely to be associated with the health outcome in only one direction from the null value - EX: Children of smoking mothers will only have a higher incidence of asthma Two Sided: Used when we have no basis for predicting in which direction from the null value exposure is likely to be associated with the health outcome (aka we don't know if exposure is beneficial or harmful) - EX: Children of smoking mothers will have either a higher or lower incidence of asthma than other children
How do we assess confounding in logistic regression?
Percent Change = | crude OR - adjusted OR | divided by crude OR * 100 If there's 10% or more of a difference, suggests potential confounding
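The percent change rule can be sketched as follows. The crude and adjusted odds ratios are invented, the 10% cutoff is the one stated on the card, and the function names are hypothetical.

```python
def percent_change(crude_or, adjusted_or):
    """|crude OR - adjusted OR| / crude OR * 100."""
    return abs(crude_or - adjusted_or) / crude_or * 100


def suggests_confounding(crude_or, adjusted_or, threshold=10.0):
    """Flag potential confounding when the change is at or above the threshold."""
    return percent_change(crude_or, adjusted_or) >= threshold


# Hypothetical (crude OR, adjusted OR) pairs.
for crude, adjusted in [(2.5, 2.0), (2.0, 1.9)]:
    change = percent_change(crude, adjusted)
    flag = "potential confounding" if suggests_confounding(crude, adjusted) else "little evidence of confounding"
    print(f"crude {crude}, adjusted {adjusted}: {change:.0f}% change, {flag}")
```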
Cases
Persons who experience the outcome of interest
What are types of bias we can avoid through blinding?
Placebo bias: The phenomenon when participants report a favorable response when no treatment, only a placebo (sham treatment that appears identical to the real treatment), is administered Post-randomization confounding bias: - definition: subject's awareness of intervention motivates them to be more cooperative or otherwise change their behavior - potentially correlating with other risk factors for the intended effect and undermining the balance achieved by randomization - ex: if individuals participating in a clinical trial of the effectiveness of a weight-loss drug are aware they are receiving the weight-loss drug, they may be more likely to comply with the study diet Selection bias: - group differences in loss to follow-up -> symptoms of disease or side effects of treatment may influence rates of loss to follow-up in subjects that are aware of the treatment Information bias: - controlled in double-blinded studies, but subjects who are aware of status may report their symptoms or side effects differently. similarly, staff or statisticians may evaluate subjects differently if they know the treatment status.
What are the types of RCTs?
Placebo-controlled randomized trials: RCTs where the control group is untreated - EX: comparing the effect of vit E supplement in one group of schizophrenia patients against the effects of placebo on a sep group of untreated schizophrenia patients Active-controlled randomized trials: RCTs where the control group undergoes a gold-standard regimen against which the new regimen will be assessed. - EX: comparing diabetic patients with implanted insulin pumps against diabetic patients who receive multiple insulin injections (the control group)
Treatment Crossover
Planned crossover: - when group A (treated w/ experimental drug) and group B (treated w/ standard drug) are switched to the other treatment at the midpoint of the trial, producing carryover effects (when the effects of the first drug carry into the second half of the study during the administration of the other drug) and diminished interest (when subjects have a diminished interest or lack of compliance as a result of the change) Unplanned crossovers: - when a clinician decides to switch a study member from the control to the treatment group or from the treatment group to the control group. this negates the benefit of randomization and introduces bias if the switching is related to the risk of the outcome
Prevalence in Cross-Section Studies
Prevalence = existing number of people with health outcome (both those diagnosed X number of years ago all the way to current time period)/number of people in the study population
Prevalence Difference
Prevalence exposed (diseased and exposed/total exposed) - Prevalence unexposed (diseased and unexposed/total unexposed)
What are prevalent-case vs. incident-case case control studies?
Prevalent-case: studies that look at existing cases of health outcome during observation - gives POR (but can be influenced by incidence rate + survival + migration) which is not representative of RR Incident cases: studies that look at individuals who develop the health outcome, common when the study is looking at cause of disease bc more important to look at factors that lead to development of disease rather than duration (seen in prevalence)
Why do we use randomized control trials?
Provide the most direct evidence of causality, and help eliminate bias
What are measures of disease occurrence for cohort studies?
Rate and risk
What are the reasons for having both ratio and difference measures?
Ratio: expresses the strength of association with no units Differences: express the absolute excess of a health outcome attributable to exposure
What measures of association do we have for ecological studies?
Ratios: Prevalence, Prevalence odds, risks, and rates Differences: Prevalence, risk, and rates
What is relative risk?
Relative risk is any ratio measure of effect that approximates risk - this is not precise and is typically avoided.
What is survival bias in cross-sectional studies?
Remember that under steady conditions, the prevalence of disease is influenced both by incidence/risk and the duration of the disease (or survival with disease). Because of this, people who survive longer with a disease have a higher probability of being counted in the numerator of a prevalence proportion and short-term survivors will be less likely to be counted as a case. Additionally, if exposure influences survival time, we will not expect the POR or PR to provide a valid estimate - so these measures are susceptible to survival bias. Even if incidence remains constant, changes to the disease treatment like improvement (so people are cured) or increased lethality (meaning there's a higher case fatality rate) will impact the prevalence. We can try to minimize this by collecting information on the exposures that preceded the first symptoms of a chronic disease!
Sources of Information Bias
Respondent/participant: reporting bias/social desirability bias - sensitive topics may be less reported, socially desirable characteristics may be over-reported, cultural differences, and poorly worded questions Recall bias: those with disease may remember/recall information differently than control group Tool: poorly designed questionnaire Researcher: bias in interviewing, lack of equal probing for exposure history between cases + controls, lack of equal measurement of disease status between exposed and unexposed, bias in abstracting records
How can we control for confounders?
Restriction: - We can restrict the study population to only those who are unexposed to one or more confounding variables. - Ideal when there are strong confounders. - Efficient, convenient, inexpensive, and straightforward method of controlling. - Not practical when it shrinks the pool of available study participants too much Matching: - We can constrain the control groups (for case-control studies) or the unexposed groups (in cohort studies) such that the distribution of confounding variables within the groups is similar to the case/exposed groups! - Consider cost, precision, feasibility, and flexibility - Matched variables cannot be assessed for confounding Randomization: - Ideal method for controlling confounding bc we can control both known and unknown confounders - cannot be used for observational study designs (cross sectional, cohort, case control, ecological) - Assumes that groups will have equal distribution of confounders, and requires a sufficiently large sample size in each group
What is the ecologic fallacy?
Results when we conclude that because an association exists between an exposure and health outcome at a group level, the same association occurs at the individual level. Causes: - This is because we don't actually know the link between the exposure and the health outcome in the individuals in each group. - We don't know the number of diseased individuals who were exposed or not exposed in the high exposure group or in the low exposure group, nor do we know the cumulative exposure of cases and non-cases! Because there is heterogeneity in the lifetime air pollution of individuals in the groups, we cannot use the average exposure to describe the distribution of exposures among individuals in the population. Example 1: Air pollution protects against lung disease deaths - because air pollution is higher in LA than in Denver yet mortality from lung disease is lower in LA than in Denver. Example 2 (look at the baltimore vs. tampa example in the eric reading)
What are measures of association for cohort studies?
Risk difference, risk ratio, and rate ratio
Selection bias vs. sampling bias
Sampling bias - issue of external validity, produces a valid measure of association that may not be generalizable to other populations Selection bias - issue of internal validity, produces an invalid measure of association
What is true at baseline in case-control studies?
Selection of cases and controls is based on health outcome or disease status AND exposure status is unknown
Why would we measure confounders simultaneously?
Simultaneous control of two or more variables can give different (and potentially more interesting) results from those obtained by controlling for each variable separately. This is because by simultaneously assessing confounders, we can better emulate the natural environment where the relationship occurred.
What are the downsides of stratifying the results on more than one variable?
It splits the sample into significantly smaller strata and limits generalizability!
What are ecologic studies?
Studies in which the units of observation are a group, not separate individuals, for one or more study variables. Aschengrau Key Features: - "A classic ecologic study examines the occurrence of disease in relation to a factor described on a population level." Ex: We cannot measure exposure + risk at an individual level, rather we know these only at a group level! The health outcome of interest may also only be known at the group level, like overall mortality rates from chronic lung disease in the same cities with measured levels of air pollution. NOTE: We can use these studies to generate association hypotheses between exposure and outcome, but we cannot claim causation.
What are case reports/case series?
Study design that includes only cases, no control groups, common in medical literature, but we cannot make conclusions about exposure outcome relationship. Can be used to generate hypotheses Will typically be referred to as case series in the title or the methods!
Total Person-time at-risk
Sum total of time all individuals remain in the study without developing the outcome of interest. Time without developing the outcome of interest = time where the individuals are still at risk of developing the outcome Note: as it can be difficult to determine the exact time when a person becomes a case, investigators commonly will use the midpoint of the interval between being disease-free and becoming a case as the "onset of case"
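A minimal sketch of summing person-time with the midpoint convention described above. Each participant is recorded as (years at last disease-free visit, years at first visit as a case, or None if never a case); this record layout, the data, and the function name are all assumptions made for illustration.

```python
def person_time_at_risk(participants):
    """Sum time at risk, using the interval midpoint as onset for cases."""
    total = 0.0
    for last_disease_free, first_seen_as_case in participants:
        if first_seen_as_case is None:
            # Never became a case: at risk for the whole follow-up.
            total += last_disease_free
        else:
            # Became a case: assume onset at the midpoint of the interval.
            total += (last_disease_free + first_seen_as_case) / 2
    return total


# Hypothetical follow-up data in years since baseline.
participants = [(5.0, None), (5.0, None), (2.0, 3.0), (3.0, 4.0)]

total_py = person_time_at_risk(participants)
cases = sum(1 for _, case_time in participants if case_time is not None)

print(f"{cases} cases over {total_py:.1f} person-years")
print(f"rate: {cases / total_py:.3f} cases per person-year")
```

Cases stop contributing person-time at their assumed onset, which is how they leave the denominator once they enter the numerator.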
Synergistic vs. Antagonistic
Synergistic: two exposures act together to have a greater effect on the outcome Antagonistic: exposures together produce a lesser effect than the effect of either of the exposures on the outcome
Information bias
Systematic error -> threatens internal validity Distortion in the measure of association between exposure and outcome caused by the lack of accurate measurements of key study variables (exposure, health outcome, or confounders) Can occur during measuring, interviewing, recalling (reporting), or abstracting of the data NOT THE SAME AS MEASUREMENT ERROR: which is random, systematic, and focuses on either the exposure or outcome, not both! Results in misclassification of exposure or disease in study participants, bias towards or away from the null value, or overestimation or underestimation of the true measure of association EFFECTS: 1. suggests an association when there is none 2. obscures an association 3. leads to overestimation or underestimation of the true effect 4. changes the direction of an observed effect
How does the size of the randomized groups impact our ability to remove risks?
The larger the randomized groups, the greater the probability of equal baseline risks.
What are the denominators in a case-control study?
The number of exposed and unexposed in the case group (or control group), NOT THE SOURCE POPULATION
Point prevalence
The proportion (often given as a percentage) of people in a given population who have a given disorder at a particular point in time.
Incidence
The proportion of the at-risk population who develop the health condition over a specified amount of time. Formula: new cases / total at-risk study population at baseline
effect measure modification
The strength of the association between an exposure and a disease differs according to the level of another variable! ERIC: when estimates of an exposure-health outcome relationship stratified by a third variable are sufficiently different from one another (risk ratio grp 1 = 4 vs. risk ratio grp 2 = 0.02), this suggests that two different exposure-health outcome relationships are occurring! NOT A SYSTEMATIC ERROR, doesn't compete with the exposure of interest and instead modifies the outcome in specific subpopulations. Examples: 1. The finding that a reduction in the regional public transportation services (the exposure) affects individuals with little to no access to a car much more than those individuals with access to a car. 2. Asbestos is associated with lung cancer; however, whether an individual smokes or doesn't smoke modifies the impact of asbestos exposure on the lung cancer outcome! EMM can identify subpopulations that are especially susceptible or vulnerable to the exposure of interest.
What are measures of frequency?
These are measures that characterize the occurrence of health outcomes, disease, or death in a population. Typically descriptive in nature, and explain how likely one is to develop a health outcome in a population. RISK, RATE, AND PREVALENCE
When/why is it a good idea to use more than one control group in case-control studies?
This is when researchers suspect that one control group will have certain deficiencies that another does not; by comparing cases to more than one control group, we can get a better idea of the true magnitude of the exposure's effect. Additionally, it helps ensure that the distribution of external risk factors is similar between cases and controls
Recall or reporting bias
Type of information bias. Caused bc of diff. in accuracy of recall between cases + non-cases, or bc of differential reporting of a health outcome between exposed and unexposed. Leads to differential misclassification! Can occur in both case-control and cross-sectional studies. EX: - cases have a greater incentive bc of their health issues to recall past exposures - exposed people in a cohort study might be more concerned about their exposure, and over-report or more accurately report the occurrence of symptoms or the health outcome
Interviewer bias
Type of information bias due to: 1) lack of equal probing for exposure history between cases and controls (exposure suspicion bias) 2) lack of equal measurement of health outcome status between exposed and unexposed (diagnostic suspicion bias)
Prevalence Ratio (PR)
Used in cross sectional studies, especially when the outcome occurs over a short period of time; similar to the risk ratio in cohort studies. PR = (a/(a+b))/(c/(c+d)) or ((exposed & diseased)/total exposed)/((not exposed & diseased)/total not exposed). Denominators in both ratios are fixed populations. Interpretation: A prevalence ratio of 3.0 can be interpreted to mean that people who are not physically active are three times as likely as those who are physically active to have CHD. You can use either PR or POR when the prevalence of the disease is low, ie 10% or less in exposed and unexposed populations.
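A quick sketch of the PR formula in Python, using the a/b/c/d cell layout from the card. The counts are hypothetical (not from any study), with "not physically active" treated as the exposed group.

```python
def prevalence_ratio(a, b, c, d):
    """PR = (a / (a + b)) / (c / (c + d)) for a 2x2 table."""
    prevalence_exposed = a / (a + b)
    prevalence_unexposed = c / (c + d)
    return prevalence_exposed / prevalence_unexposed


# Hypothetical cross-sectional counts:
# a = exposed with CHD, b = exposed without CHD,
# c = unexposed with CHD, d = unexposed without CHD
pr = prevalence_ratio(a=30, b=70, c=10, d=90)
print(f"PR = {pr:.2f}")  # prints "PR = 3.00"
```

Here 30% of the exposed and 10% of the unexposed have CHD, so the exposed group is three times as likely to have CHD.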
Prevalence Odds Ratio (POR)
Used in cross sectional studies, especially when the period for being at risk of developing the outcome extends over a considerable time (aka months to years); similar to the odds ratio: POR = ad/bc or ((exposed & diseased) * (not exposed & not diseased))/((exposed & not diseased) * (not exposed & diseased)). You can use either PR or POR when the prevalence of the disease is low, ie 10% or less in exposed and unexposed populations. If greater than 10%, more appropriate to use the prevalence ratio. Also should use the POR when the onset of disease is difficult to determine!
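To see why the 10% guideline matters, here is a small Python comparison of PR and POR on two hypothetical tables, one with low prevalence and one with high prevalence. All counts are made up for illustration.

```python
def prevalence_ratio(a, b, c, d):
    """PR = (a / (a + b)) / (c / (c + d))."""
    return (a / (a + b)) / (c / (c + d))


def prevalence_odds_ratio(a, b, c, d):
    """POR = (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical tables (a, b, c, d as in the 2x2 layout)
rare = dict(a=3, b=97, c=1, d=99)      # prevalence 3% vs 1%
common = dict(a=60, b=40, c=20, d=80)  # prevalence 60% vs 20%

pr_rare, por_rare = prevalence_ratio(**rare), prevalence_odds_ratio(**rare)
pr_common, por_common = prevalence_ratio(**common), prevalence_odds_ratio(**common)

print(f"rare:   PR = {pr_rare:.2f}, POR = {por_rare:.2f}")
print(f"common: PR = {pr_common:.2f}, POR = {por_common:.2f}")
```

With the rare outcome the two measures nearly agree (about 3.0 vs 3.06), while with the common outcome the POR (6.0) is twice the PR (3.0).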
What do we want to use when we have multiple variables to control/multiple confounders?
We use mathematical models! They overcome limitations of stratified analysis, allow for adjustment of multiple confounders, + precision may be better!
What is our goal while selecting controls?
We want to ensure that controls represent the source population from which the cases arose!
Stratification diagram
What three measures of association are needed? A: The overall, crude measure of association of the exposure-health outcome B1: The measure of exposure-health outcome association, among all study participants who have history of the confounding variable (C+) B2: The measure of exposure-health outcome association among all study participants who do not have history of confounding variable (C-) OPTIONS: 1. B1 & B2 are similar but substantially larger or smaller than A: confounding is present 2. B1 & B2 are similar but only slightly larger or smaller than A: potential confounding is present 3. B1 & B2 are on opposite sides of A both with some space from A: effect measure modification 4. B1 & B2 are both bigger or both smaller than A but are not similar in magnitude: effect measure modification and confounding may be present
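The four options above can be sketched as a rough decision rule in Python. The card does not say how big "similar" or "substantially" is, so the 10% cutoffs below are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def classify_stratified(crude, b1, b2, similar_tol=0.10, change_tol=0.10):
    """Compare the crude estimate (A) with stratum estimates (B1, B2).

    similar_tol and change_tol are ASSUMED cutoffs (10%), not values
    given in the notes.
    """
    strata_similar = abs(b1 - b2) / max(b1, b2) <= similar_tol
    if strata_similar:
        pooled = (b1 + b2) / 2
        relative_change = abs(crude - pooled) / crude
        if relative_change > change_tol:
            return "confounding is present"
        return "potential confounding is present"
    # Strata differ: check whether they fall on opposite sides of A
    if (b1 - crude) * (b2 - crude) < 0:
        return "effect measure modification"
    return "effect measure modification and confounding may be present"


# Hypothetical risk ratios (A, B1, B2)
print(classify_stratified(2.0, 1.0, 1.05))  # confounding is present
print(classify_stratified(2.0, 1.9, 1.95))  # potential confounding is present
print(classify_stratified(2.0, 1.0, 4.0))   # effect measure modification
print(classify_stratified(2.0, 3.0, 6.0))   # EMM and confounding may be present
```

In practice this is a judgment call rather than a hard threshold; the sketch only mirrors the logic of the four options.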
Odds Ratio
When do we use it? It replaces the risk ratio or the rate ratio in case-control studies, where the underlying population at risk for developing the health outcome or disease cannot be determined because the individuals are selected as either diseased or non-diseased, or as having the health outcome or not having the health outcome. When can it approximate risk or rate ratio? Instances where the prevalence of the health outcome is less than 10% (like rare diseases) and specified sampling techniques are used, otherwise it will overestimate the risk or rate ratio. Interpretation? >1.0: Positive association, or increased odds of the health outcome in the exposed group <1.0: Negative association, or reduced odds of the health outcome in the exposed group (protective, like vaccinations) In words? The odds of having HPV among children who received childhood vaccines were 0.88 times the odds among children who did not receive vaccines. Note - if trying to interpret as "more likely" for any ratio, subtract one (ex: if the odds ratio were 1.88, we would say the odds were 0.88 times, or 88%, higher) Note: we can find the odds ratio for the group without the exposure by taking (1/odds ratio) -> The odds of getting HPV among children who were not vaccinated were 1.14 (1/0.88) times the odds among children who were vaccinated. Formula: (odds of being exposed in cases, or a/c) / (odds of being exposed in controls, or b/d), which rearranges to (a*d)/(b*c) Note: we do not have odds differences!
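A short Python check that the two forms of the formula agree, (a/c)/(b/d) and the cross-product (a*d)/(b*c), plus the 1/OR trick for flipping the reference group. The case-control counts are hypothetical.

```python
def exposure_odds_ratio(a, b, c, d):
    """(odds of exposure in cases) / (odds of exposure in controls)."""
    odds_exposed_cases = a / c
    odds_exposed_controls = b / d
    return odds_exposed_cases / odds_exposed_controls


def cross_product_odds_ratio(a, b, c, d):
    """Rearranged form: (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical counts: a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls
a, b, c, d = 20, 40, 30, 30

or_ratio_form = exposure_odds_ratio(a, b, c, d)
or_cross_form = cross_product_odds_ratio(a, b, c, d)
or_unexposed = 1 / or_cross_form  # odds ratio with the unexposed as index group

print(f"OR = {or_cross_form:.2f}, 1/OR = {or_unexposed:.2f}")  # OR = 0.50, 1/OR = 2.00
```

An OR of 0.50 reads as "the odds of the outcome in the exposed were half the odds in the unexposed", and 1/OR = 2.00 is the same comparison from the unexposed side.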
What is antecedent-consequent bias in cross-sectional studies?
When it cannot be determined that the exposure preceded the disease, since both are ascertained at the same time (unlike cohort studies or clinical trials) - chicken vs. egg
Calculating adjusted summary estimates
When no EMM is present (???)
What is blinding? What are the three types of blinding?
When participants in an experimental study, and sometimes also research team members, do not know whether a participant is in the active intervention group or the control group. Single-blinded: subjects are not aware of treatment status Double-blinded: subjects and investigators all are not aware of treatment status Triple-blinded: subjects, investigators, and independent statisticians all are not aware of treatment status
If a control member got the outcome of interest, would they become a case in our study?
YES
Can covariates be both confounders and modifiers?
YES!
Can you have an intervention w/o randomization and/or comparison? What do you need to still have a "study"?
Yes - at times we can look at a single non-randomized treatment group and compare its outcomes before and after the intervention.
What does non-parametric mean when it comes to DAGs?
Your DAG will be the same regardless of whether you're calculating a risk difference, a rate ratio, etc.
Population Based Control Selection
best way to ensure that the distribution of exposures among controls is representative of exposure levels in the source population sources of info - DOT lists, gov. aid rosters, tax lists, voting lists, directories, ads, etc. advantages: confidence that controls and cases come from the same population disadvantages: 1. recall bias - may remember exposures differently than cases 2. pop based controls have less incentive to participate 3. pop based controls may need compensation - ie more expensive 4. may miss certain groups of the population through random digit dialing (like those w/o phones)
What is confounding? Does it cause overestimates or underestimates of observed association?
Bias that can result when the exposure-disease relationship is mixed with the effect of extraneous factors (i.e. confounders). Can cause either an overestimate or underestimate of the association between exposure and the health outcome of interest.
Non-differential misclassification
non-differential exposure misclassification: occurs if there is equal misclassification of exposure between subjects that have or do not have the health outcome OR non-differential disease misclassification: occurs if there is equal misclassification of the health outcome between exposed and unexposed subjects NOTE: typically produces bias towards the null, however less bad bc it just underestimates the effect
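A small Python illustration of the "bias towards the null" note. It applies the same exposure sensitivity and specificity to diseased and non-diseased subjects (that equality is what makes it non-differential) and compares the true and observed risk ratios. The counts and the 0.8/0.9 values are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """RR = (a / (a + b)) / (c / (c + d))."""
    return (a / (a + b)) / (c / (c + d))


def misclassify_exposure(a, b, c, d, sensitivity, specificity):
    """Expected observed cells when exposure is misclassified equally
    among diseased (a, c) and non-diseased (b, d) subjects."""
    obs_a = sensitivity * a + (1 - specificity) * c
    obs_c = (1 - sensitivity) * a + specificity * c
    obs_b = sensitivity * b + (1 - specificity) * d
    obs_d = (1 - sensitivity) * b + specificity * d
    return obs_a, obs_b, obs_c, obs_d


# Hypothetical true table: a, b = exposed (diseased, not); c, d = unexposed
true_counts = (40, 60, 20, 80)
observed = misclassify_exposure(*true_counts, sensitivity=0.8, specificity=0.9)

true_rr = risk_ratio(*true_counts)
observed_rr = risk_ratio(*observed)
print(f"true RR = {true_rr:.2f}, observed RR = {observed_rr:.2f}")
```

The observed RR (about 1.60) sits between the null value of 1.0 and the true RR of 2.0, which is the underestimation the note describes.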
Two-by-Two Tables
generally used to organize the data from a study. Exposed + Diseased = A; Exposed + No Disease = B; Unexposed + Diseased = C; Unexposed + No Disease = D. SO: Diseased = A + C; Not Diseased = B + D; Exposed = A + B; Unexposed = C + D; Total = A + B + C + D. Note: when a or c is a very small number, we can approximate a/(a+b) by a/b, or c/(c+d) by c/d
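The cell and margin definitions above map directly onto a small Python class. This is only a sketch; the class name and the counts are hypothetical, and the last two lines illustrate the rare-outcome approximation from the note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    a: int  # exposed, diseased
    b: int  # exposed, not diseased
    c: int  # unexposed, diseased
    d: int  # unexposed, not diseased

    @property
    def diseased(self):
        return self.a + self.c

    @property
    def not_diseased(self):
        return self.b + self.d

    @property
    def exposed(self):
        return self.a + self.b

    @property
    def unexposed(self):
        return self.c + self.d

    @property
    def total(self):
        return self.a + self.b + self.c + self.d


# Hypothetical table with a rare outcome
table = TwoByTwo(a=2, b=998, c=1, d=999)

risk_exposed = table.a / table.exposed  # a / (a + b)
odds_exposed = table.a / table.b        # a / b, nearly the same when a is small
```

With only 2 cases among 1,000 exposed, the risk and the odds differ by only a few millionths, which is why a/b can stand in for a/(a+b).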
STROBE Technique
https://ares.lib.unc.edu/ares.dll?Action=10&Type=14&Value=14E31E49
Disease Odds Ratio in Case Control Studies
odds of being a case among those exposed (exposed cases/exposed controls) DIVIDED BY odds of being a case among those unexposed (unexposed cases/unexposed controls) -> (exposed cases * unexposed controls)/(exposed controls * unexposed cases)
What are ways to select controls in case-control populations?
population based, hospital based, and convenience controls
What are ways to select cases in case-control populations?
population based: a specific geographic region is chosen as a source population with certain restriction criteria (age, SES, race, etc.) cases can be found via disease registries, surveillance databases, hospital medical records, within an existing cohort study, or through an MCO VS: hospital based: cases are chosen from a hospital with certain restriction criteria
What are directed acyclic graphs?
type of causal diagram commonly used in epidemiology to assess relationships between health-related factors relevant to a study question. directed bc: arrows point in a single direction. acyclic: there should be no loops in the diagram. graph: visual way of representing information
differential misclassification
when misclassification of exposure is not equal between subjects that have or do not have the health outcome OR when misclassification of the health outcome is not equal between exposed and unexposed subjects NOTE: causes bias in the risk ratio, rate ratio, or the odds ratio either towards or away from the null, depends on the proportions of subjects misclassified - not so good news bc could be underestimating or over estimating! REFER TO SLIDE 60 OF WEEK 10 SLIDES FOR AN EXAMPLE IN THE CONTEXT OF A STUDY
How do we calculate adjusted measures when all confounders are assessed simultaneously?
y = a + b1x + b2z2 + ... + bizi where: y: the health outcome in a dichotomous format; x: the exposure; z2 to zi: the confounders. Using this formula, we get additional information from: a: the intercept; b1: the coefficient for the exposure variable; b2 to bi: the coefficients for each confounder that is controlled for in the model
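A hedged Python sketch of how the formula is used. The card does not name the model; because y is dichotomous, this assumes a logistic regression, where the formula gives the log odds of the outcome and exp(b1) is the exposure odds ratio adjusted for the confounders. All coefficient values are hypothetical, not fitted to any data.

```python
import math


def linear_predictor(a, b1, x, confounder_coefs, confounder_values):
    """a + b1*x + b2*z2 + ... + bi*zi (log odds under the ASSUMED logistic model)."""
    adjustment = sum(b * z for b, z in zip(confounder_coefs, confounder_values))
    return a + b1 * x + adjustment


def predicted_probability(log_odds):
    """Convert log odds to a probability with the logistic function."""
    return 1 / (1 + math.exp(-log_odds))


def adjusted_odds_ratio(b1):
    """Exposure odds ratio holding the confounders constant."""
    return math.exp(b1)


# Hypothetical coefficients: intercept, exposure, two confounders
a, b1 = -2.0, 0.7
confounder_coefs = [0.3, 0.1]

log_odds = linear_predictor(a, b1, x=1, confounder_coefs=confounder_coefs,
                            confounder_values=[1, 0])
probability = predicted_probability(log_odds)
adjusted_or = adjusted_odds_ratio(b1)
print(f"log odds = {log_odds:.2f}, p = {probability:.3f}, adjusted OR = {adjusted_or:.2f}")
```

In a real analysis the coefficients would come from fitting the model to study data with statistical software; this sketch only shows how the pieces of the formula combine once they are known.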