Epidemiology - Quick Review 3
screening
"the presumptive identification of an unrecognized disease or defect by the application of tests, examinations or other procedures that can be applied rapidly" clinician must target screening in preclinical phase of disease
sensitivity
% of people with the disease who tested positive for the disease = TP/(TP+FN) = TP/all people with the disease. Sensitivity (%) + FN rate (%) = 100%
specificity
% of people without the disease who tested negative for the disease = TN/(FP+TN) = TN/all people without the disease. Specificity (%) + FP rate (%) = 100%
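The following is a minimal Python sketch (not from the review itself) computing these test characteristics from a hypothetical 2x2 table; the function name and the counts are invented for illustration.

```python
def test_characteristics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV as proportions."""
    sensitivity = tp / (tp + fn)   # TP / all people with the disease
    specificity = tn / (tn + fp)   # TN / all people without the disease
    ppv = tp / (tp + fp)           # TP / all people with a positive test
    npv = tn / (tn + fn)           # TN / all people with a negative test
    return sensitivity, specificity, ppv, npv

# Hypothetical counts:
print(test_characteristics(tp=90, fp=15, fn=10, tn=885))
# -> (0.9, 0.983..., 0.857..., 0.988...)
```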
shape of distribution
- If symmetric then mean = median - If left skewed then mean < median (long tail to left) - If right skewed then mean > median (long tail to right)
Indicators of valid results:
- Independent blind comparison with reference standard - Appropriate sample selection - Clear method description to allow for replication - Evaluation of whether results influence decision to perform reference standard
Causation
- association does not prove causation - non-causal explanations (including study biases like measurement error, selection bias or confounding) may cause a spurious association and threaten internal validity - if the study has internal validity, we can then consider whether causal inference can be made. Bradford Hill criteria: aspects of the risk factor-disease association that must be examined to determine causation: 1. Temporality (the only NECESSARY criterion) 2. Strength of association 3. Dose-response relationship (biologic gradient) 4. Consistency 5. Biologic plausibility
cross sectional studies
- cannot easily establish causality - can estimate the prevalence of a disease - cannot estimate the incidence of a disease - they are subject to confounding - they measure exposure and disease at one point in time
cohort: crude vs. adjusted effects
- cohort studies are observational, so exposures are not randomized - because exposures are not randomized, there are baseline differences in the two groups being compared that may influence the outcome (confounding) - the crude estimate may be "confounded" (ie risk of disease may seem higher due to confounding) - must adjust for confounding in regression model
How to prevent selection bias
- design study so that participation is NOT impacted by the exposure or the outcome - attempt to recruit all cases within the source population - ensure controls and cases are selected from the same target population (population-based controls are ideal) - minimize non-response and refusals - minimize loss to f/u (in cohort studies and RCTs)
Retrospective cohort study
- exposure status is determined from information previously recorded - incidence of disease (or outcome) is determined from the time of exposure up to the start of the study - the outcome of interest has already occurred by the time the study starts and is ascertained when the study begins
normality assumption of t test
- if the underlying distribution is normal, it is okay to use a t-test when you estimate the sample standard deviation - if the underlying distribution is not normal, it is only okay to use a t-test if the sample is large enough and not too skewed - the t test is quite robust to violations of the normality assumption
Prospective cohort study
- information on the exposure status of the cohort members is determined at the start of the study - identify new cases of disease (or outcome of interest) from the start of the study moving forward - the outcome of interest has not occurred when the study starts
descriptive statistics for numerical variables
- mean and std (if normally distributed) - median and range (if skewed) - mode. Express with box plots or histograms; must take into account the shape of the distribution
prerequisites for successful screening
cannot screen all diseases, so we choose those that are important: -high prevalence -high death or disability. Early identification of disease must reduce death and/or disability: -allow prompt and effective treatment -prolong survival
case-control study
- observational study in which subjects are selected on the basis of outcome status - people with the outcome of interest are cases - people without the outcome of interest are controls - prior exposures are determined in cases and controls - retrospective direction of inquiry regarding risk factors - must identify cases and select controls (ideally, controls should be a sample of the study base from which the cases emerged. controls must be selected from people who could be cases)
Measures of association and study design for prospective studies (cohort or RCT)
- risk ratio - rate ratio (need person-year info) - HR (from Cox proportional hazards model, need person-year information) - OR (unlikely to use because RR is better)
non-parametric test
- t tests require the outcome variable to be approximately normally distributed - non-parametric tests are based on RANKS instead of means and standard deviations - non-parametric tests are less powerful than parametric tests if the population is close to normal, but more powerful if the distribution is skewed. Examples (non-parametric analog ~ parametric test): - Wilcoxon signed rank test ~ one-sample and matched-pair t test - Wilcoxon rank-sum (Mann-Whitney) test ~ two-sample t test - Kruskal-Wallis test ~ ANOVA
properties of t distribution
- the mean of the distribution is equal to 0 - the variance is equal to v/(v-2), where v is the degrees of freedom (v > 2) - the variance is always greater than 1 but is close to 1 when there are many degrees of freedom; with infinite df, the t distribution is the same as the standard normal distribution
proportions that are commonly called "rates"
-case-fatality rate -mortality rate -attack rate
purposes of surveillance
-characterize disease patterns and trends -detect disease outbreaks -develop clues about possible risk factors -identify cases for further investigation -identify high-risk population groups to target intervention -monitor impact of prevention and control programs -project health needs
why quantify disease?
-characterize disease patterns and trends (rising level of obesity in US) -detect epidemics (e.g., the SARS epidemic) -identify cases for research (especially useful for rare diseases/cancers) -evaluate prevention and control programs -project health needs
purpose of diagnostic tests
-establish a diagnosis in symptomatic patients -screen for disease in asymptomatic patients -provide prognostic information in patients with established disease -confirm that a person is free from a disease ****The purpose of a diagnostic test is really to move the estimated probability of the presence of a disease toward either end of the probability scale. All approaches to gathering clinical information can be considered diagnostic tests (ie history, physical exam)****
ambispective cohort study
-evaluate past records and collect data from groups that are followed into the future
how to determine efficacy of screening program?
-mortality is preferred endpoint -best determined by randomized trial -mortality must be reduced in order to recommend screening
Fisher's exact
-non-parametric test to use in place of chi-square with a 2x2 table when there are expected counts <5 -more powerful than chi-square but harder to compute -requires two categorical variables with two categories each
case counts
-useful for investigating disease outbreaks (an epidemic that occurs suddenly and within a confined geographic area) -epidemic curve: number of cases (y-axis) plotted against time of onset of disease (x-axis)
key sources of data for surveillance
1. National vital statistics from NCHS (mortality data) 2. CDC's congenital malformations registry 3. NCI's Surveillance, Epidemiology, and End Results (SEER) program 4. International Agency for Research on Cancer (IARC) 5. Centers for Medicare & Medicaid Services (CMS) hospital discharge data 6. National population health surveys from the NCHS: NHANES, BRFSS
3 ways to quantify disease occurrence
1. case counts (numbers) 2. proportions (ratio: # of cases/some variant of ppl at risk) 3. rates (ratio: # of cases in a given time/people at risk and time) ~ velocity of the disease
3 ways to quantify disease
1. counts 2. proportions - point and period prevalence - cumulative incidence 3. rates - incidence rates
3 main sources of epidemic
1. point source transmission 2. person-to-person (propagated) transmission 3. continuous source transmission
Selection Bias
A form of sampling bias due to systematic differences between those who are selected for a study (or agree to participate) and those who are not selected (or refuse to participate). Occurs when different people will have different probabilities of being in the study depending on their exposure or outcome. Selection bias can arise from: - procedures used to select subjects - factors that influence study participation - factors that influence participant attrition
In a retirement community of 2000 men and women, 600 are found to have speech-frequency hearing loss at initial screening with audiometry, and 154 new cases of hearing loss are found at subsequent screening one year later. A. What is the estimated prevalence of hearing loss at initial screening? B. What is the approximate 1-year risk of developing hearing loss? C. What is the incidence rate? D. What is the estimated annual prevalence at the end of 1 year-f/u?
A. Prevalence = 600/2000 = 0.30 B. Approximate 1-year risk (cumulative incidence) = 154/(2000-600) ≈ 0.11 C. Incidence rate = 154/(1400 person-years) ≈ 0.11 cases per person-year (assuming 1 year of follow-up for the 1,400 at risk) D. Period prevalence = (600+154)/2000 ≈ 0.38
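As a quick check, the arithmetic can be run in Python; this assumes 1 year of follow-up for the 1,400 residents initially free of hearing loss and no loss to follow-up.

```python
initial_cases, population, new_cases = 600, 2000, 154
at_risk = population - initial_cases                           # 1400 residents at risk

prevalence = initial_cases / population                        # A. 0.30
cumulative_incidence = new_cases / at_risk                     # B. ~0.11 (1-year risk)
incidence_rate = new_cases / (at_risk * 1.0)                   # C. ~0.11 cases per person-year
period_prevalence = (initial_cases + new_cases) / population   # D. ~0.38
print(prevalence, cumulative_incidence, incidence_rate, period_prevalence)
```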
Cohort studies
Advantages: - can determine incidence in exposed and unexposed groups - can assess temporal associations - minimizes bias in exposure and subject selection - good to evaluate rare exposures Disadvantages: - can take years and be costly - loss to followup may introduce bias in the outcome - not good for rare diseases
case-control studies: advantages vs. disadvantages
Advantages: - relatively inexpensive - can use small sample size - great for rare diseases - can evaluate many exposures Disadvantages: - very susceptible to bias (accurate exposure info is hard to obtain; outcome and exposure are known when the study starts) - can't determine risks in exposed and unexposed - not good for temporal sequence - not good for rare exposures
t test assuming equal variance
df = (n1-1) + (n2-1)
Type I and II errors
Alpha - generally ranges from 0.01 to 0.1 - use low alpha if important to avoid type I (FP) error (ie testing efficacy of a potentially dangerous medication) - if alpha=0.05, then there is a 5% chance of incorrectly rejecting the null Beta - generally ranges between 0.05 and 0.2 -use low beta if important to avoid type II (FN) error (ie provide evidence to reassure the public that living near a toxic dump is safe) -if beta=0.10, there is a 10% chance of missing an association of a given magnitude, or a 90% chance of finding an association of that size - reduce errors by increasing sample size
Outcomes of hypothesis testing
Alpha (type I error) - FP: incorrectly reject null Beta (type II error) - FN: incorrectly fail to reject null Power = 1-beta
Selection bias in cross-sectional studies
Can arise due to sampling: - survivor bias: occurs when survivors of a disease are sampled instead of all cases of a disease. The bias stems from the issue that survivors may have less aggressive forms of disease and different exposures than the people who died - volunteer bias / membership bias: people who join groups tend to be systematically different than people who do not join groups - non-random sampling schemes (e.g., snowball sampling, convenience sampling) Can arise due to non-participation: - non-response bias: those who choose not to respond may be systematically different than those who do (this is especially problematic in survey research)
Selection bias in case-control studies
Cases are already more motivated to participate than controls because they have the disease of interest - if EXPOSURE status ALSO affects the likelihood of being in the study (or vice versa), then selection bias occurs. Berkson's bias arises when you use hospital controls: hospital controls may have a lower probability of exposure than the target population (P' < P), therefore the observed OR is an overestimate of the true association
Clinical and statistical significance
Clinical significance - determination is subjective. Depends on the magnitude of effect and unit of increase. Statistical significance - p-value (HT) - confidence interval (significant if it doesn't include the null)
Chi-square
Dichotomous/categorical independent and outcome variables. Perform to determine if there is a relationship between two categorical variables measured on the same subjects. ie null: low birth weight (y/n) and smoking status of mother are independent (not associated). Compute the expected value for all cells. Example: expected count for (smoker and LBW) = P(smoker) * P(LBW) * Total. Idea: how likely is it that we would observe a chi-square value this large or larger if there is no association between smoking and LBW? df = (# columns - 1) * (# rows - 1); for a 2x2 table, df = 1
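A short sketch of the expected-count idea, using made-up smoking/LBW counts; scipy's chi2_contingency returns the expected cell counts computed exactly as described above (row total × column total / grand total).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = smoking status, columns = LBW status
observed = np.array([[30, 70],     # smokers:     [LBW, not LBW]
                     [20, 180]])   # non-smokers: [LBW, not LBW]

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(dof)        # (2-1)*(2-1) = 1 for a 2x2 table
print(expected)   # expected counts under independence
print(chi2, p)    # test statistic and p-value
```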
Non-differential misclassification
Equal misclassification in the groups being compared: there are equivalent degrees of misclassification among exposed and unexposed (or among cases and controls). Both cases and controls under-report, so the OR is biased toward the null (underestimation of the true effect; a conservative bias). Example: in a case-control study of alcohol consumption and risk of hepatocellular carcinoma, both cases and controls underreport exposure to alcohol
Analyzing results of RCTs (ITT)
Everyone who is randomized is analyzed (ITT). ITT is a more conservative approach to analyzing results, but it better reflects how people actually adhere to drug usage in the real world. ITT preserves the balance of measured and unmeasured confounders. ITT analysis results in bias toward the null
Cutpoints
For a diagnostic test that measures a continuous variable we must choose a cutoff point above which we consider the disease to be present, and below which we consider the disease to be absent. When you choose the cutoff point there is a trade-off: if you improve sensitivity, the specificity will suffer, and if you improve specificity, the sensitivity will suffer. Maximizing both sensitivity and specificity will result in the inclusion of some FP and exclusion of some FN. Where you choose the cutpoint depends on the clinical context. If it is imperative you identify everyone with the disease (no FN), choose a highly sensitive test. If it is imperative you don't misdiagnose, choose a highly specific test. Cutpoints A, B, C: -A = high sensitivity, low specificity (no FN but high FP) -B = maximum combined sensitivity and specificity (includes some FP but excludes FN) -C = high specificity, low sensitivity (no FP but high FN)
Hazard Ratio
HR is similar to rate ratio (relative risk). HR equals a weighted relative risk over the duration of a study. Analysis by ITT will pull the HR closer to the null.
Phases of RCTs
I. Unblinded, uncontrolled studies in a few volunteers to test safety, find dose II. Relatively small randomized, controlled, blinded trials: test tolerability, surrogate outcomes III. Relatively large, randomized, controlled, blinded trials to test effect of therapy on clinical outcomes IV. Large trials or observational studies after drug is approved by FDA to assess rate of SAEs and other uses
estimating cumulative incidence from incidence rate
If the incidence rate is low (IR × time < 10%), then cumulative incidence ≈ IR × time; if the incidence rate is high, then cumulative incidence = 1 - e^(-IR × time)
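A minimal sketch of the two approximations, with an assumed incidence rate and follow-up time (values are illustrative).

```python
import math

def cumulative_incidence(ir, time):
    """Exponential formula; reduces to IR*time when IR*time is small."""
    return 1 - math.exp(-ir * time)

ir, t = 0.002, 5                    # 2 cases per 1,000 person-years, over 5 years
print(ir * t)                       # linear approximation: 0.0100
print(cumulative_incidence(ir, t))  # exponential formula:  0.00995...
```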
SnNout
If test is highly sensitive and NEGATIVE, rule out disease
SpPin
If test is highly specific and POSITIVE, rule in disease.
P value
If the null hypothesis is true, what is the probability that the difference between two groups will be at least as large as that actually observed? The P value is the probability of obtaining an effect as large as or larger than the observed effect, assuming the null hypothesis is true. - provides a measure of evidence against the null - does not provide info on the magnitude of effect P value ranges from 0 to 1 - P ~ 0 means the association observed is unlikely due to chance alone - P ~ 1 suggests there is no difference between the groups other than that due to chance variation - the p-value cutoff is arbitrary, therefore it's important to report the exact p-value If comparing two studies, take note of the sample size in each. - the p value decreases with increasing sample size, so a much larger N in one study will make it appear as if the observed association is more significant - if the sample sizes are approximately the same for two studies being compared, the p-value is related to the magnitude of the observed associations, so a stronger observed association (RR further from 1) will generate a smaller P value
Lead-time bias
Lead-time bias is an increase in survival as measured from detection of disease to death, without lengthening life. Patients identified by screening are diagnosed earlier therefore their "survival time" starts before patients diagnosed once the disease progresses enough to show clinical symptoms
measures of central tendency
Mean - the average/balancing point - affected by outliers (use if data is normally distributed) - should not be used with ordinal data - describes numerical data that is symmetrically distributed Median - exact middle value - not affected by outliers (use if data is skewed) - describes ordinal OR numerical data if skewed Mode - value that occurs most frequently - describes bimodal distributions
numerical variables
Measurements that can be quantified as numbers. Continuous: uninterrupted numbers for which any value is possible - weight, BP, cholesterol levels, age, salary Discrete: integers; only some numbers are possible - number of children, number of cavities, MCAT test scores
measures of variation / spread
Measures of variation give information on the spread or variability of the data values. - range (Xmax - Xmin) - percentiles/quartiles - IQR (Q3 - Q1) - variance = (sum of (Xi - mean)^2)/(n-1); std = square root of the variance
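A quick illustration with made-up data, showing each spread measure; note how the outlier inflates the range and standard deviation but not the IQR.

```python
import numpy as np

x = np.array([2, 3, 3, 4, 5, 5, 6, 7, 8, 20])   # made-up data with an outlier at 20
data_range = x.max() - x.min()
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
variance = x.var(ddof=1)   # divides by n-1, as in the formula above
std = x.std(ddof=1)
print(data_range, iqr, variance, std)
```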
Hypothesis testing
Null hypothesis: there is no association between the independent and dependent/outcome variables (rate of outcome in exposed group = rate of outcome in unexposed group) RR = 1 Alternative hypothesis: RR does not equal one. At alpha of 0.05, if we get a p value < 0.05, we can be 95% confident that the observed value is not due to chance alone. We reject the null. If p > 0.05 then we fail to reject the null.
Measures of association and study design for retrospective studies (case-control)
OR (approximates relative risk under rare disease assumption. if rare disease assumption is not met OR > RR)
Odds Ratio (case-control study)
OR is a good approximation of the RR as long as the probability of the outcome in the unexposed group is less than 10%. Odds = P(A)/(1 - P(A)). OR = odds of exposure among cases / odds of exposure among controls. As with RR: OR = 1 indicates no association between exposure and outcome; OR < 1 indicates exposure is a protective factor; OR > 1 indicates exposure is a risk factor. When the rare disease assumption is NOT met, bias is introduced into the OR (OR > RR)
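Sketch of the OR calculation from a hypothetical case-control 2x2 table (counts invented for illustration).

```python
# Exposure counts by outcome status (made-up numbers)
a, b = 40, 60    # cases:    exposed, unexposed
c, d = 20, 80    # controls: exposed, unexposed

odds_cases    = a / b                        # odds of exposure among cases
odds_controls = c / d                        # odds of exposure among controls
odds_ratio    = odds_cases / odds_controls   # equivalently (a*d)/(b*c)
print(odds_ratio)                            # ~2.67 -> exposure looks like a risk factor
```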
Epidemiological study designs
Observational study: the population is observed without any interference by the investigator Experimental study: the investigator tries to control the environment in which the hypothesis is tested
Misclassification bias
Occurs when you classify people as having the wrong outcome, or the wrong exposure because there are systematic problems with the way you are measuring/getting your information 2 types: differential (worse) and non-differential
Effect of overestimation of a RR or OR
Overestimation of the magnitude of a RR or OR for a risk factor biases away from the null (RR' > RR). Overestimation of the magnitude of a RR or OR for a protective factor biases away from the null (RR' < RR)
Recall bias
Participants are asked to report on past exposures after the disease outcome has already occurred. Problem occurs in case-control and cross-sectional studies. Example: Case control study to test the association between maternal second-hand smoke exposure in pregnancy and infant birth defects. - mothers of infants with birth defects are more likely to recall second-hand smoke exposure Bias OR away from null! Less conservative
Precision vs Accuracy
Precision: - The absence of random error in a conclusion or measurement. - Reproducibility of results Accuracy: - The correctness of a study's conclusions. - A measure of how accurate or close to the truth the results are - Results reflect the TRUE CAUSAL effect in the source population
How likely is it a disease is present or absent? Predictive values.
Predictive values are measures of clinical utility. Sometimes referred to as posterior probability because it is determined AFTER knowing a test result. PPV is the proportion of people who tested positive who actually have the disease. PPV tells you the likelihood a person has the disease if their test was positive. PPV = TP/(TP+FP) = TP/all ppl with positive test result NPV is the proportion of people who tested negative who do not have the disease NPV = TN/(TN+FN) = TN/all ppl with negative test result
measures of disease frequency
Prevalence - # cases / total pop - % with disease at one point in time - no units Cumulative incidence - # new cases / pop at risk - % who develop disease over given period of time - no units Incidence rate - # new cases / (# persons at risk * time observed) - number / person-years or number of persons at risk per year - includes a measure of time
ROC curve
ROC (receiver operating characteristic) curve summarizes the relationship between sensitivity and specificity. Signal (TP% ~ sensitivity) is plotted on the y-axis. Noise (FP% = 1 - specificity) is plotted on the x-axis. An excellent diagnostic test has an area under the ROC curve that approaches 1. Signal-to-noise ratio = sensitivity/(1 - specificity) = likelihood ratio +
Randomized Clinical Trials
Randomized, controlled clinical trial is the gold standard for evaluating usefulness of a treatment Advantages: - Experimental design eliminates many sources of bias (randomization reduces confounding; blinding reduces misclassification of exposure and outcome) - can determine risks - good for temporal sequence - can be used for rare or common exposures Disadvantages: - expensive - loss to f/u - not good for rare outcomes - can't randomize harmful exposures
Relative Risk (RCTs and Cohort Studies)
Relative risk = incidence of disease in the treated (exposed) group / incidence of disease in the control (unexposed) group RR = 1 indicates no association between exposure and outcome (null hypothesis) RR < 1 indicates exposure is a protective factor RR > 1 indicates exposure is a risk factor [Accompanying image: rate ratio]
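A minimal sketch of the RR calculation with made-up cohort counts.

```python
exposed_cases, exposed_total     = 30, 300   # hypothetical exposed group
unexposed_cases, unexposed_total = 10, 300   # hypothetical unexposed group

risk_exposed   = exposed_cases / exposed_total       # 0.10
risk_unexposed = unexposed_cases / unexposed_total   # ~0.033
relative_risk  = risk_exposed / risk_unexposed       # 3.0 -> exposure is a risk factor
print(relative_risk)
```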
Which measures of variability to use
STD - use when mean is used (and numeric data is symmetrically distributed) Percentiles and IQR - use when median is used (ordinal or skewed numeric data) - can use when mean is used if objective is to compare individual observations with a set of norms IQR - use to describe central 50% of a distribution regardless of shape Range - use with numerical data to emphasize extreme values
Bias
SYSTEMATIC errors by the investigators in sampling, collecting or interpreting data that threaten the internal validity of the study 3 main types: selection, misclassification and confounding
Accuracy of diagnostic tests
Sensitivity and specificity describe the validity and accuracy of the diagnostic test relative to the gold standard.
LR+
Signal-to-noise ratio: sensitivity/(1 - specificity) = likelihood ratio + (LR+) - LR+ > 10.0 indicates a great diagnostic test (rule in disease) - LR < 0.1 rules out disease - LRs > 10 or < 0.1 generate large and conclusive changes from pretest to posttest probability - LRs 5-10 or 0.1-0.2 generate moderate shifts - LRs 2-5 or 0.2-0.5 generate small changes - LRs 0.5-2 do not alter probability in any important way Can use the LR+ to compute the predictive value: compute the pretest odds, then pretest odds x LR+ = posttest odds; convert the posttest odds to PPV (posterior probability)
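Sketch of the pretest-odds to posttest-probability calculation described above; the sensitivity, specificity, and pretest probability are assumed values.

```python
sens, spec = 0.90, 0.95
lr_pos = sens / (1 - spec)                   # LR+ = 18 (large, conclusive shift)

pretest_prob = 0.10                          # assumed prior probability of disease
pretest_odds = pretest_prob / (1 - pretest_prob)
posttest_odds = pretest_odds * lr_pos
posttest_prob = posttest_odds / (1 + posttest_odds)   # = PPV at this pretest probability
print(lr_pos, posttest_prob)                 # 18.0, ~0.67
```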
Surveillance
Surveillance detects the occurrence of health-related events or exposures in a target population. The goal is to identify changes in disease distribution in order to prevent or control these diseases within the population
A study was conducted in children to determine the accuracy of a rapid antigen-detection test (RADT) for diagnosing group A streptococcus (GAS) pharyngitis compared to the throat culture with a blood agar plate (considered the reference standard). Both tests (throat culture and RADT) were administered to 1843 children, 3-18 years of age, in community pediatric offices. Thirty percent of the children had a positive throat culture for GAS and among these 385 had a positive RADT. Among the children who had a negative throat culture for GAS, 28 had a positive RADT. A. What is the sensitivity of RADT? B. What is the specificity of RADT?
TP = 385 TP+FN = all with disease = 0.3*1843 TN+FP = all without disease = 0.7*1843 TN = (0.7*1843) - 28 A. sensitivity = TP/(TP+FN) B. specificity = TN/(TN+FP)
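The same arithmetic in code (the only assumption is treating 30% of 1,843 children as the exact number of culture-positive children).

```python
n = 1843
with_disease    = 0.30 * n          # culture-positive (reference standard)
without_disease = 0.70 * n          # culture-negative

tp = 385
fn = with_disease - tp
fp = 28
tn = without_disease - fp

sensitivity = tp / with_disease     # ~0.70
specificity = tn / without_disease  # ~0.98
print(sensitivity, specificity)
```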
Target population vs external population
Target population: the population we want our results to directly affect. - within the target population, we find an actual population from which we can reasonably sample in order to get our study population External population: a larger sea of people to whom we may or may not want our study to apply
number needed to treat
The NNT is the number of patients who need to be treated in order to prevent one additional bad outcome. NNT = 1/ARR (absolute risk reduction). NNT is the gold standard of reporting. For the ARR, if the CI includes 0 then the result is NOT statistically significant.
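A minimal sketch of NNT = 1/ARR with assumed event risks (numbers are illustrative, not from the notes).

```python
risk_control   = 0.10   # assumed event risk in the control group
risk_treatment = 0.06   # assumed event risk in the treated group

arr = risk_control - risk_treatment   # absolute risk reduction = 0.04
nnt = 1 / arr                         # treat 25 patients to prevent 1 additional event
print(arr, nnt)
```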
Key features of phase III RCTs
The following features add strength (internal validity) to an RCT: - prospective design (ensures temporality, which is required to determine causality) - intervention / treatment - randomization (controls for confounding) - placebo-controlled - double-blind (controls bias in outcome ascertainment)
Gold standard vs. other diagnostic test
The reference or gold standard definitively informs the presence or absence of disease. Other diagnostic tests have benefits but they're not as accurate. Throat culture (gold standard) vs. rapid strep test Gold-standard tests are typically: -expensive -invasive -not readily available -time-consuming We typically use other diagnostic tests instead of the gold standard because they tend to be: -inexpensive -safe and painless -reliable -quick and simple
categorical variables
Two or more groups/categories being measured. Nominal: descriptive names (no natural order) - Examples: marital status, presence of disease, blood type - binary data is a categorical variable with 2 groups (yes/no) Ordinal: "ordered" data; values with an order; often numeric values but intervals between consecutive values are not equally spaced - degrees of pain (0-10) - Rankin or Likert scales - TNM stages
Effect of underestimation of a RR or OR
Underestimation of a RR or OR for a risk factor biases toward the null (RR' < RR). Underestimation of a RR or OR for a protective factor biases toward the null (RR' > RR)
Differential misclassification bias
Unequal misclassification in the groups being compared. Worse than non-differential because one group is favored over the other. The observed effect could be an overestimate or underestimate of the true effect but can't predict. The amount of misclassification depends on whether one is exposed or unexposed to the risk factor, or whether one has/doesn't have the disease outcome. Differential misclassification of exposure: - recall bias, interviewer bias - occurs mainly in case-control and cross sectional studies Differential misclassification of outcome: - observer bias, respondent bias - occurs mainly in cohort and RCTs Example: In a case-control study of SSRIs and congenital birth defects, cases are more likely to report SSRI exposure (recall bias)
T test
Use to assess the association between a continuous variable (outcome) and a binary variable (independent) - to use it, the outcome must be approximately normally distributed - use to evaluate whether the means of two groups are statistically different from each other - for the t test statistic, the numerator is always the signal (the difference you hope to detect) and the denominator is a measure of the variability ***when looking at the differences between scores for two groups, we have to judge the difference between their means relative to the spread or variability of their scores.
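An illustrative two-sample t test on made-up data; scipy's ttest_ind assumes equal variances by default, and equal_var=False switches to Welch's test (Satterthwaite df), as noted in the later cards.

```python
from scipy import stats

group_a = [5.1, 4.9, 5.6, 5.8, 5.2, 4.7, 5.4]   # made-up continuous outcome, group A
group_b = [4.3, 4.6, 4.1, 4.8, 4.4, 4.2, 4.5]   # made-up continuous outcome, group B

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(t_stat, p_value)   # signal (difference in means) relative to variability
```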
Interpreting confidence intervals
Width of CI - narrow CI implies high precision - wide CI implies poor precision (usually due to small sample size and therefore high variability) Notice whether the interval contains a value that implies no change/no effect/no association - CI for a ratio ( OR, RR, HR): not statistically significant if CI includes 1 - CI for a difference between two means: not statistically significant if CI includes 0
prevalence
amount of disease already present in a population. best used to measure chronic diseases (ie diabetes) -point prevalence is proportion of disease in a population at a point in time -period prevalence is proportion of disease in a population during a period of time
point source transmission
an epidemic in which all cases are infected at the same time, usually from a single source or exposure. ie all people infected at a picnic ate the same food
continuous source transmission
an epidemic in which the causal agent (ie polluted drinking water, spoiled food) infects people as they come into contact with it over an extended period of time. (as in the case of cholera discovered by John Snow) ie. multi-state outbreak of Listeriosis linked to Cantaloupes from a farm in CO
person-to-person transmission (propagated epidemic)
an epidemic in which the causal agent is transmitted from person to person, allowing the epidemic to propagate or spread. (ie influenza)
incidence rate
how fast new occurrences of disease arise. best used to measure acute, short duration diseases and/or chronic diseases in large populations over longer times
estimating prevalence from incidence rate
if a disease is rare (very low prevalence) then prevalence is approximately equal to the incidence rate times disease duration this is because for rare diseases, the rate of incidence will approximately equal the rate at which people either die or are cured. Example: Coronary Heart Disease is decreasing in prevalence over time. Why? Mortality rate of ppl with CHD is going down due to improved treatments so disease duration has increased. Incidence rate is decreasing because we are effectively preventing CHD by modifying health behaviors (known risk factors). Prevalence is approx equal to incidence rate * disease duration (for rare diseases)
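A quick numeric sketch of the rare-disease approximation (numbers assumed).

```python
incidence_rate = 0.001    # assumed: 1 new case per 1,000 person-years
duration_years = 5        # assumed average disease duration

prevalence = incidence_rate * duration_years
print(prevalence)         # ~0.005, i.e., about 0.5% of the population
```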
validity
internal validity - how accurately the study results reflect the truth in the target population external validity ("generalizability") - how generalizable the study results are to an external population *differences between the study sample and the actual population impact statistical inference **differences between the study sample and the target population introduce bias, which impacts the internal validity of the study (and undermines the study findings) ***differences between the study sample and the external population impact generalizability, which hurts the external validity of the study
descriptive statistics for categorical variables
number (N), frequency (%) express by contingency tables or bar charts
Predictive value and prevalence
prevalence is considered prior probability. -as prevalence increases: PPV goes up and NPV goes down -as prevalence decreases: PPV goes down and NPV goes up -prevalence is proportional to PPV -prevalence is inversely proportional to NPV -sensitivity and specificity do not change with prevalence Example: consider breast cancer. prevalence without palpable mass is lower than prevalence with palpable mass therefore with palpable mass (greater prevalence) PPV increases and NPV decreases. Note sensitivity and specificity for a given diagnostic test remain constant regardless of changes in prevalence.
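A sketch showing PPV and NPV shifting with prevalence while sensitivity and specificity stay fixed (all values assumed).

```python
def predictive_values(sens, spec, prevalence):
    """Bayes' theorem form of PPV and NPV."""
    ppv = (sens * prevalence) / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = (spec * (1 - prevalence)) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

for prev in (0.01, 0.10, 0.50):   # e.g., no palpable mass vs. palpable mass
    print(prev, predictive_values(0.90, 0.90, prev))
# PPV rises and NPV falls as prevalence increases
```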
Selection bias in cohort studies
self-selection bias - occurs if people who choose to participate in the study are systematically different from those who decline to participate **disease status rarely affects participation in a cohort because the disease is not yet known. Healthy worker effect differential loss to f/u: if the exposure or the outcome makes study subjects less likely to continue participating in the study, then results may be distorted - consider a cohort study on depression: if people become depressed and stop participating in the study, then the number of individuals observed to be depressed will be far less than the number of individuals truly depressed. if a'<a then RR'>RR
crude mortality rates
special type of incidence rate
components and types of surveillance
surveillance consists of: -continuous data collection -data analysis -timely dissemination of info -use of data for purposes of investigation or disease control types of surveillance include: -laboratory-based -death certificates -physician notification (reporting system) -hospital discharge summaries -pharmacy records -active surveillance
cumulative incidence (risk)
the likelihood (risk) that an individual will develop a disease. commonly used to measure acute diseases and chronic diseases
age-adjusted mortality rates
to calculate age-adjusted mortality rates: 1. calculate the age-specific rates of death in the study population > age-specific rate of study population = (# incident cases in age stratum)/(# of study population in age stratum * time) 2. calculate the expected number of cases in each age stratum using the number of people from the standard population > expected # of cases = age-specific rate of study population * # of people from standard population in age stratum 3. sum the total expected # of cases direct age-adjusted rate = total expected number of cases / total size of the standard population
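A direct age-adjustment sketch following the three steps above, with made-up strata, counts, and standard population.

```python
study_cases  = {"<65": 20, "65+": 80}            # deaths in the study population
study_py     = {"<65": 10_000, "65+": 4_000}     # person-years in the study population
standard_pop = {"<65": 70_000, "65+": 30_000}    # standard population sizes

expected_total = 0.0
for stratum in study_cases:
    age_specific_rate = study_cases[stratum] / study_py[stratum]   # step 1
    expected_total += age_specific_rate * standard_pop[stratum]    # step 2

adjusted_rate = expected_total / sum(standard_pop.values())        # step 3 (per person-year)
print(adjusted_rate)   # ~0.0074
```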
paired t test
to find a difference between pre and post measurements on the same individual.
t test assuming unequal variance
use the Satterthwaite approximation to find the df; this df is usually much smaller than the df from the equation that assumes equal variances