Epidemiology Midterm
Temporal Relationship
Most important criterion for determining if a relationship is causal (major category; must always be met): the exposure must precede disease development with adequate elapsed time
*Latency period: time from initial exposure to the agent to onset of disease
*Incubation period: the equivalent for exposure to infectious agents
RCT, case-control, and cohort studies all have temporal relationships -> cohort most sensitive/likely to get it wrong
*Prospective studies have temporality built in
Case Fatality Rate
Number of individuals dying during a specified period of time after disease onset or diagnosis / number of individuals with the specified disease during this time interval. How likely you are to die given that you have a certain disease; a measure of how severe the disease is.
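To make the arithmetic concrete, a minimal Python sketch (the function name is my own; the 18/20 numbers echo the mortality-rate card later in these notes):

```python
def case_fatality_rate(deaths, cases):
    """Deaths from the disease during a period / people with the disease in that period."""
    return deaths / cases

# e.g., 18 deaths among 20 people with the disease -> 0.9, i.e., 90%
cfr = case_fatality_rate(18, 20)
```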
Attack Rates
Number of people who develop the disease / total number of people at risk. Usually exposure-specific: number of people with the exposure who develop the outcome / total number of people with that exposure. E.g., number of people who got sick after eating the cake / total number of people who ate the cake.
Box and Whisker Plot
Upper hinge = Q3 (the median of the data above the median)
Median = Q2
Lower hinge = Q1 (the median of the data below the median)
H-spread = interquartile range = Q3 - Q1; contains 50% of the observations
Upper fence = upper hinge (Q3) + (1.5 × H-spread)
Lower fence = lower hinge (Q1) - (1.5 × H-spread)
Fences are not drawn; they are guidelines for outliers
"Whiskers" are lines drawn to the smallest and largest observations within the fences
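A sketch of the hinge method described above in Python (data are hypothetical; the hinge convention here takes the median of each half, excluding the overall median for odd n):

```python
from statistics import median

def five_number_summary(data):
    """Hinges and fences as on the card: Q1/Q3 are medians of the lower/upper halves."""
    xs = sorted(data)
    n = len(xs)
    q2 = median(xs)
    lower = xs[: n // 2]          # values below the median
    upper = xs[(n + 1) // 2 :]    # values above the median
    q1, q3 = median(lower), median(upper)
    h_spread = q3 - q1            # IQR: contains the middle 50% of observations
    lower_fence = q1 - 1.5 * h_spread
    upper_fence = q3 + 1.5 * h_spread
    return q1, q2, q3, lower_fence, upper_fence

# e.g., for 1..9: Q1 = 2.5, Q2 = 5, Q3 = 7.5, fences at -5 and 15
summary = five_number_summary(range(1, 10))
```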
Low Incidence and High Prevalence High Incidence and Low Prevalence
-A chronic, incurable disease, such as diabetes, can have a low incidence but high prevalence, because the disease is not very fatal but cannot be completely cured either
-An acute, curable disease, such as the common cold, can have a high incidence but low prevalence, because many people get a cold each year but it lasts for a short time
Prevalence ~ incidence × duration of disease
-Higher incidence results in higher prevalence
-Longer duration results in higher prevalence
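The prevalence ≈ incidence × duration relationship with illustrative, made-up numbers, in Python:

```python
def approx_prevalence(incidence_per_year, avg_duration_years):
    """Steady-state approximation: prevalence ~ incidence x average disease duration."""
    return incidence_per_year * avg_duration_years

# Hypothetical chronic disease: low incidence, long duration -> higher prevalence
chronic = approx_prevalence(0.005, 20)   # ~0.10
# Hypothetical acute disease: high incidence (~2 colds/year), ~1-week duration -> low prevalence
acute = approx_prevalence(2.0, 0.02)     # ~0.04
```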
Compliance
Compliance is the willingness of the participants to carry out the procedures according to the established protocols (adherence)
Drop-outs are the participants who do not adhere to the experimental regimen
-Usually a result of side effects, but there's also a fair number of people who just aren't functional
Drop-ins are the participants who do not adhere to the control regimen
-e.g., people who were assigned to placebo who somehow get their hands on the drug
Numerical Variables
Counts
-Numbers represented by whole numbers
• e.g., number of births, number of relapses
Intervals
-Equal distances between values, but the relation to the zero point is arbitrary
• e.g., IQ test scores of 100, 110, 120..., some pain scales
Ratios
-Equal intervals between values, and the relationships between numbers and the zero point are meaningful
• e.g., weight, pulse rate
Transmission of Agents
Direct contact (person-to-person)
-Skin, saliva, sexual contact, sneezing or coughing
-Polio, hepatitis, HIV, influenza
Indirect contact
-Via a vector: an organism that carries disease-causing microorganisms, such as a mosquito or deer tick
-Via environmental factors: dust particles, air, food, water
-Fomites: inanimate objects that can carry disease-causing microorganisms
• e.g., toothbrush, cutting board, toys...
• common cold, conjunctivitis, croup, E. coli infection, influenza, lice, rotavirus diarrhea, strep
Selection of Subjects: Population
General population
-Whole population in an area
-A representative sample
Special population
-Select group
-Occupation group / professional group
Exposure groups
-Persons having exposure to some physical, chemical, or biological agent
-e.g., X-ray exposure in radiologists
Herd Immunity
Herd immunity is the resistance of a group to attack from a disease to which a large portion of members are immune, reducing the chance that an infected individual comes in contact with a susceptible one
Requirements:
-The disease is restricted to a single host species within which transmission occurs (for example, smallpox in humans; no other reservoir)
-There is direct transmission from one member of the host species to another (direct contact only)
-Infections must induce solid immunity (also from immunization)
Herd immunity level differs for various diseases
-It is estimated that 94% of the population must be immune before measles can be controlled
-For smallpox, it was around 84%
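The disease-specific thresholds above can be approximated from the basic reproduction number R0 (a quantity not defined in these notes) via the standard relation threshold = 1 − 1/R0. A sketch with rough textbook R0 values, which are my assumptions rather than figures from the card:

```python
def herd_immunity_threshold(r0):
    """Approximate fraction of the population that must be immune: 1 - 1/R0."""
    return 1 - 1 / r0

# Assumed R0 values for illustration (not from the card):
measles = herd_immunity_threshold(15)    # ~0.93, in line with the ~94% quoted above
smallpox = herd_immunity_threshold(6)    # ~0.83, in line with the ~84% quoted above
```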
Hawthorne Effect
The term comes from experiments at the Hawthorne Works factory near Chicago on whether lighting intensity affected productivity; Henry A. Landsberger coined "Hawthorne effect" in 1958 when re-analyzing those studies
Any change in light (switch from bright to dim or dim to bright) increased productivity
...because workers knew they were being watched
In epidemiology, when people know they are being observed, their behavior or reporting may change
Selection of Comparison Group: Cohort Study
Internal comparison
-Only one cohort involved in the study
-Sub-classified, and internal comparison done
External comparison
-More than one cohort in the study for the purpose of comparison
-e.g., cohort of radiologists vs. gynecologists
Comparison with general population rates
-If no comparison group is available, we can compare the study cohort with the general population
-e.g., uranium miners vs. the general population
Survival Curves
nOften used in follow-up studies to display the proportion of one or more groups still alive at different time periods. nSimilar to the axes of the cumulative frequency plots nX-axis shows time intervals nY-axis shows percentages, from 0% to 100%, still alive. -Don't have to measure death, just time to event -Does nicotine patch therapy help pregnant smokers quit during pregnancy?
Cumulative Frequency Plots
Plots the cumulative frequency rather than the actual frequency distribution of a variable
Useful for identifying medians, quartiles, and other percentiles
X-axis records class intervals
Y-axis shows the cumulative frequency either on an absolute scale (e.g., number of cases) or, more commonly, as percentages from 0% to 100%
-Example question: Does this new type of Computed Tomography (CCTA) reduce length of hospital stay compared to standard of care to rule out heart attacks in ER patients admitted with chest pain?
Cohort Study Follow Up
To obtain data about the outcome to be determined (morbidity or death):
-Mailed questionnaires, telephone calls, interviews
-Periodic medical examination
-Reviewing records
-Surveillance systems or death records
Follow-up is the most critical part of the study
-Some loss to follow-up is inevitable due to death, change of address, migration, or change of occupation
-Loss to follow-up is one important drawback of cohort studies
Placebo
A medical treatment (therapy, chemical, pill) which is administered as if it were a therapy, but which has no therapeutic value Placebo Effect nA response (good or bad) to a placebo, due to expectations of possibly taking an active drug nBecause a person expects a pill to do something, their body's own chemistry may cause effects similar to what the active drug might have caused Common Examples: Depression, pain, sleep disorder
A representative sample of residents were telephoned and asked how much they exercise each week and whether they currently have (have ever been diagnosed with) heart disease. (b) Occurrence of cancer between March 1991 and July 2002 was compared between 50,000 troops who served in the first Gulf War (which ended February 1991) and 50,000 troops who served elsewhere during the same period. (c) Persons diagnosed with new-onset Lyme disease were asked how often they walk through woods, use insect repellant, wear short sleeves and pants, etc. Patients without Lyme disease from the same physician's practice were asked the same questions, and the responses in the two groups were compared. (d) Subjects were children enrolled in a health maintenance organization. At 2 months, each child was randomly given one of two types of a new vaccine against rotavirus infection. Parents were called by a nurse two weeks later and asked whether the children had experienced any of a list of side-effects.
A representative sample of residents were telephoned and asked how much they exercise each week and whether they currently have (have ever been diagnosed with) heart disease. Cross-sectional study - both exposure (exercise) and outcome (heart disease) are ascertained at the same time. (b) Occurrence of cancer between March 1991 and July 2002 was compared between 50,000 troops who served in the first Gulf War (which ended February 1991) and 50,000 troops who served elsewhere during the same period. Cohort study - participants are grouped by exposure (location of military engagement) and followed forward in time for subsequent outcome (cancer development). Knowledge/documentation of the exposure predates the outcome assessment. (c) Persons diagnosed with new-onset Lyme disease were asked how often they walk through woods, use insect repellant, wear short sleeves and pants, etc. Patients without Lyme disease from the same physician's practice were asked the same questions, and the responses in the two groups were compared. Case-control study - participants are grouped by outcome (Lyme disease status). Individuals with and without the outcome were asked about their exposures in the past (several behaviors that increase tick exposure). (d) Subjects were children enrolled in a health maintenance organization. At 2 months, each child was randomly given one of two types of a new vaccine against rotavirus infection. Parents were called by a nurse two weeks later and asked whether the children had experienced any of a list of side-effects. Randomized controlled trial (experimental study) - patients are randomized to an exposure (type of rotavirus vaccine) and followed forward in time for outcomes (side effects). Important: exposure is allocated by investigators - not chosen by participants
Cross-sectional Study
A study in which a representative cross section of the population is tested or surveyed at one specific time.
*Not allocating the exposure; we ask about exposure and outcome at the same time (outcome ascertained at the same time as exposure)
*Usually given like a survey
*Looks for PREVALENCE (what's currently going on right now); doesn't look at risk/incidence
*Can't be used to establish a temporal relationship (don't know which one came first)
Advantages:
-Fast/inexpensive; no waiting (short-term)
-Fewer resources needed, less statistical analysis, less complex design
-No loss to follow-up
-Associations can be studied
-Provides relationships between attributes of disease and characteristics of various groups
*Data good for planning health services for prevalent cases
*Can compare either exposed to non-exposed OR those with the outcome to those without it
Disadvantages:
-Prevalence measurements only (no incidence, no risk)
-Cannot determine directionality (because measured at the same time) and, thus, causality; no causal evidence (we don't know if A caused B or even if A came before B)
-Exposures and outcomes must be reasonably common (what if you are interested in a rare outcome?)
*Represents only those who are studied; may not be representative
*Healthy person effect: people who are ill may not volunteer
*Start with a defined population, gather data, and have 4 possible outcomes:
1. Exposed with outcome (a)
2. Exposed, no outcome (b)
3. Not exposed, has outcome (c)
4. Not exposed, no outcome (d)
The 2×2 table can be split two different ways:
*Does prevalence of the outcome/disease differ with exposure status? Calculate the prevalence of the outcome: a/(a+b) vs. c/(c+d)
*Does the prevalence of the exposure differ with outcome status? Calculate the prevalence of the exposure: a/(a+c) vs. b/(b+d)
*VA Cooperative study on prevalence of Hep C (you have Hep C; now, have you been in combat?)
Examples: National Survey studies (population based survey, collect information on health and other stuff, National Health Interview Survey, China's national prevalence survey...) *Basically just descriptive studies *Cross-sectional studies with specific exposures or disease outcomes ( National study on views on sexual violence)
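The two ways of splitting the 2×2 table can be sketched in Python with hypothetical counts (a, b, c, d follow the card's layout: a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without):

```python
def prevalence_of_outcome(a, b, c, d):
    """Does prevalence of the outcome differ by exposure status? a/(a+b) vs. c/(c+d)."""
    return a / (a + b), c / (c + d)

def prevalence_of_exposure(a, b, c, d):
    """Does prevalence of the exposure differ by outcome status? a/(a+c) vs. b/(b+d)."""
    return a / (a + c), b / (b + d)

# Hypothetical survey counts: outcome prevalence 30% in the exposed vs. 10% in the unexposed
exposed_prev, unexposed_prev = prevalence_of_outcome(30, 70, 10, 90)
```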
Comparing Mortality rate and Case Fatality Rate
Assume a population of 100,000 people at midyear
-20 are sick with disease "X"
-In one year, 18 die from disease "X"
The mortality rate in that year from disease "X" = 18/100,000 = 0.00018 (or 0.018%)
Case fatality rate from "X" = 18/20 = 90%
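The card's worked example in Python, highlighting the different denominators:

```python
population = 100_000   # everyone at midyear
sick = 20              # people with disease "X"
deaths = 18            # deaths from "X" during the year

mortality_rate = deaths / population   # 0.00018 (0.018%): denominator is the whole population
case_fatality_rate = deaths / sick     # 0.90 (90%): denominator is only those with the disease
```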
Procedure of Cohort Study
Identify your exposed group and non-exposed group
In each group, look at who develops disease and who doesn't
-Exposed and develops disease (a)
-Exposed and does not (b)
-Not exposed and develops disease (c)
-Not exposed and no disease (d)
Find a/(a+b) = e; find c/(c+d) = f; e/f = risk ratio
Also take a / person-years of follow-up and c / person-years of follow-up (it's a ratio, so it should be about the same)
If your risk ratio is 1.6, there is a 60% increase in heart disease in smokers compared to non-smokers
-Is it significant? The lower bound of the 95% C.I. has to be greater than 1
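The a/b/c/d bookkeeping above as a small Python sketch (counts are hypothetical, chosen to reproduce the card's RR of 1.6):

```python
def risk_ratio(a, b, c, d):
    """a = exposed w/ disease, b = exposed w/o, c = unexposed w/ disease, d = unexposed w/o."""
    e = a / (a + b)   # risk in the exposed ("e" in the notes)
    f = c / (c + d)   # risk in the unexposed ("f" in the notes)
    return e / f

# 16/100 exposed vs. 10/100 unexposed developed disease -> RR 1.6 (60% higher risk)
rr = risk_ratio(16, 84, 10, 90)
```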
Empirical rule
If the distribution approximates a bell-shaped, normal distribution:
-about 68% of data within 1 std. dev. of the mean
-about 95% of data within 2 std. dev. of the mean
-about 99.7% (all or nearly all) of data within 3 std. dev. of the mean
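These percentages can be checked against the standard normal CDF using Python's standard library:

```python
from statistics import NormalDist

def frac_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    nd = NormalDist()  # standard normal: mean 0, std. dev. 1
    return nd.cdf(k) - nd.cdf(-k)

# frac_within(1) ~ 0.683, frac_within(2) ~ 0.954, frac_within(3) ~ 0.997
```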
Selection Bias
Due to inappropriate selection of cases/controls (non-responders, self-selection)
In case-control studies: those with the riskiest behavior may not respond or enter the study
In cohort studies: loss to follow-up
Self-selection: don't use family members in a study; they are likely to have similar exposures
Confounding
Due to inherent differences between study groups other than the exposure of interest
*A third factor "mixed in", affecting the distribution of disease and exposure among study groups
*Independently associated with both the exposure and the outcome
*Must be a risk factor for the outcome and must be associated with the exposure (positively or negatively)
*Can be adjusted for in analysis and prevented by good study design
Is it a confounder? Take the 2×2 table apart and calculate the odds ratios separately; if either OR = 1, there is no relationship
Controlling for confounding:
-Study design: randomization, restriction, matching on major confounders (everything else we have to adjust for)
-Analysis: stratification, multivariable analysis
Masking and Blinding
-Masking or blinding is used to increase the objectivity of the persons dealing with the randomized study (to prevent prejudice)
-Subjects who can be masked/blinded:
1. Study participants
2. Data collectors/outcome assessors
3. Caregivers/investigators
4. Data analysts
Level of masking/blinding:
-Non-blinded (open)
-Single, Double, Triple
Lemeshow Table
Number of patients per group estimated via Lemeshow table or online tool
Example: 90% power (beta = 0.1) to detect a treatment effect at a significance level of alpha = 0.05
X-axis: proportion developing the outcome in the lower of the two groups
Y-axis: difference in the proportion with the outcome
Let's say the outcome is positive: in the placebo group, 10% would get better, and I want at least 30% in the treatment group to get better; that's a 20% difference (0.2 on the Y-axis) against the 10% baseline, at 90% power
Did the number of people enrolled match the number required?
"Our study was underpowered" is NOT a valid excuse to interpret findings that don't reach statistical significance as clinically important
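As a code-based alternative to reading the table, the textbook normal-approximation formula for comparing two proportions can be sketched in Python. This is the standard formula behind such tables, not the Lemeshow table itself, so treat the exact output as approximate:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for detecting p1 vs. p2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_b = NormalDist().inv_cdf(power)           # desired power
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Card's example: 10% outcome with placebo vs. 30% with drug, alpha 0.05, power 0.90
n = n_per_group(0.10, 0.30)   # roughly 82 per group under this approximation
```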
State the measure of disease occurrence (incidence proportion, cumulative incidence, incidence rate, point prevalence, or period prevalence) that is given or can be calculated from the information provided for each study. (a) Investigators performed a one-time survey of 500 residents of California and asked whether respondents currently had a "common cold". Eighty residents responded "yes". (b) Investigators performed a one-time survey of 500 residents of California and asked whether respondents have ever had shingles. One quarter of the residents responded "yes". (c) Among 600 initially HIV-negative men, 15 men acquired HIV infection during follow-up. The mean follow-up time for the 600 men was 6.5 years. 85 men were lost to follow-up. (d) The probability that a female resident of New Hanover County, NC experienced hot flashes during the year 2015 was 18%. (e) Based on student health center records, 1 in 53 UNCW students developed infectious mononucleosis (commonly referred to as Mono) within the 2015/2016 academic year.
(a) Investigators performed a one-time survey of 500 residents of California and asked whether respondents currently had a "common cold". Eighty residents responded "yes". Point prevalence (b) Investigators performed a one-time survey of 500 residents of California and asked whether respondents have ever had shingles. One quarter of the residents responded "yes". Cumulative incidence, aka lifetime incidence (c) Among 600 initially HIV-negative men, 15 men acquired HIV infection during follow-up. The mean follow-up time for the 600 men was 6.5 years. 85 men were lost to follow-up. Incidence rate (d) The probability that a female resident of New Hanover County, NC experienced hot flashes during the year 2015 was 18%. Period prevalence (e) Based on student health center records, 1 in 53 UNCW students developed infectious mononucleosis (commonly referred to as Mono) within the 2015/2016 academic year. Incidence proportion
(Sample) Size matters and number of patients in RCTs is estimated based on:
-An estimate of the outcome frequency in one of the groups
-The baseline effect, against which we measure our intervention
-The meaningful difference in outcome rates
-A clinically significant difference that is worth detecting
-Level of statistical significance (α), usually set at 0.05
-Our willingness to allow for a false positive (spurious association)
-The desired power (1 − β), usually set at 0.9 or 0.8
-Our ability to find an effect, should it indeed exist
Failing to independently analyze quality of evidence
(Different study designs have different quality; media needs to address this)
*Question-mark journalism: you can take anything and put a question mark behind it (no real story)
*Is the study even done in humans (or was it done in mice)?
*Phase 1 and phase 2 trials really mean very little; they just say the drug isn't lethal
*If a report leans on a phase 1 or 2 study, it is failing to analyze quality
*For every 5,000 compounds in pre-clinical trials, 5 make it to human trials
*Phase 1: figure out pharmacology and how high a dose we can give (not really a big deal); there is no outcome in a phase 1 trial except that we can apply a dose that doesn't kill you. A phase 1 trial doesn't look at efficacy or effectiveness
*Phase 2: the stage of drug testing where drugs go to die; give to VERY few people and hope the drug does something; no placebo/control group, no randomization, no blinding
*Only 8% of new drugs in phase 1 studies ever get approved
If something is really a breakthrough, it should move through phase 3 (where we get placebo, controls, randomization, and blinding)
Pearson's Correlation Coefficient (r) A study reports a Pearson's r of 0.5. What percent of the variability in the outcome can be explained by the exposure variable?
*Measure of linear correlation (dependence) between two variables X and Y; ranges from -1 to +1
+1: total positive correlation (higher independent variable, higher dependent variable)
0: no correlation
-1: total negative correlation (the higher the independent variable, the lower the dependent variable)
*r applies to NUMERICAL, CONTINUOUS variables
r^2:
-How much of the trend is captured
-What proportion of your outcome variable can be explained by the exposure variable
-Tells you the magnitude of the relationship
For example, if r^2 = 0.22, then 22% of the variability in the dependent variable can be explained by variation in the exposure variable; 78% is not explained by it
A study reports a Pearson's r of 0.5. What percent of the variability in the outcome can be explained by the exposure variable? 0.5^2 = 0.25, i.e., 25%
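The worked answer in two lines of Python:

```python
r = 0.5
percent_explained = (r ** 2) * 100   # r^2 = 0.25, so 25% of the variability is explained
```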
Recall Bias
*Most common type of bias in case-control studies
Information on past exposures depends on memory of events from both cases and controls (often inadequate or limited)
If you are healthy, you aren't really giving your life choices a lot of thought
Cases are more likely to remember everything bad more vividly
Occurs when recall is better among cases than controls because of the presence of the disease (a false association may be found)
If you just gave birth to a healthy baby, you probably aren't thinking about that small infection you had while pregnant, and you are less likely to recall it
-Can make it look like there is a relationship when there isn't
Try to phrase questions in a way that makes both cases and controls THINK about the answer
Why is randomization and use of control (placebo) ethical?
-Justification of no treatment, placebo, or SOC control
-Clinical equipoise: genuine uncertainty within the professional community as to which treatment arm is superior
Justification of randomization: the uncertainty principle
-Physicians who are convinced that one treatment is better than the other cannot ethically choose at random how to allocate treatment.
Phases in Testing of New Drugs (Won't be on the exam)
-Phase I studies (pharmacologic studies):
-Test a new treatment in a small group of 20-80 people (frequently healthy volunteers) to evaluate its safety
-Can we give the drug without hurting the person?
-Determine levels of toxicity, metabolism, pharmacological effect (what does this drug actually do, and is it what it's supposed to), and safe dosage range
-Identify side effects
-What is the highest dose I can give without people getting sick?
-No randomization / open-label trial (not blinded)
-Phase II studies (efficacy studies) (efficacy means the effect in the people who actually take the drug)
-The drug or treatment is given to a larger group of people (100-300) to test efficacy and further evaluate its safety
-Generally no control groups, not randomized, not blinded
-Best-case scenario: what is the drug going to do, and does it do good?
-How big is the effect? If I give the drug to 100 people, how many benefit? Lets us power the phase III study accurately
-Most drugs "die" in phase II
-Phase III studies (effectiveness studies) (effectiveness uses intent-to-treat analysis, still counting people who don't take the drug) (randomized controlled trial)
-The treatment is given to a large group of people (1,000-3,000, half placebo) in a randomized controlled trial to confirm its effectiveness, compare it to commonly used treatments, and monitor side effects
-By the end of phase III, only about 1,500 people have taken the drug
Phase IV studies (post-marketing clinical trials)
-Observational studies; not actually a trial
-The treatment is monitored to gather more information on risks, benefits, and optimal use
-How do health outcomes of people who took the drug compare to those who didn't?
Quality of Data Sources (In descending order)
-Randomized clinical trials (most causal)
-Cohort studies
-Case-control studies
-Time-series studies (descriptive)
-Case-series studies (descriptive)
Elements of Cohort Study
-Selection of study subjects
-Obtaining data on exposure
-Selection of comparison group
-Follow-up
-Analysis
Both cohorts are free of the disease at the outset (because both have to be at risk)
-Both groups should be equally susceptible to the disease
-Both groups should be comparable in all factors except the variable under investigation
-Diagnostic and eligibility criteria for the disease should be defined well in advance
Treatment A vs. Treatment B
-Superiority trials(A is better than B) -Equivalence trials (A is just as good as B) -Non-inferiority trials (A is no worse than B by a pre-specified amount)
Public Health "Work flow" Using quantitative methods to address public health problems
1. Address the public health problem
-Generate a hypothesis (has to be based on existing data): based on scientific rationale, on observations or anecdotal evidence, or on results of prior studies
-There is not always scientific evidence already around to support something; you can initiate a study just because people are talking about something
2. Conduct a study
-Descriptive study: quantifies the extent of a disease and monitors specific diseases over time, by geographical location or population subgroup (go through records)
-Observational studies: investigate the association between an exposure and a disease outcome (how big of a deal is this?)
• Rely on "natural" allocation of individuals to exposed or non-exposed groups
-Experimental studies: also investigate the association between an exposure, often a therapeutic treatment, and a disease outcome
• Individuals are "intentionally" placed into treatment groups by the investigators (clinical trials)
3. Collect the data
-Numerical facts, measurements, or observations obtained from an investigation to answer a specific question
4. Assess the strength of evidence for/against a hypothesis
-Inferential statistical methods provide a confirmatory data analysis
-Generalize conclusions from data on part of a group (sample) to the whole group (population)
-Assess the strength of the evidence, make comparisons, make predictions, ask more questions, suggest future research
5. Recommend interventions or preventive programs
-The study results will support or refute the hypothesis, or sometimes fall into a grey area of "unsure"
-The study results appear in a peer-reviewed publication and/or are disseminated to the public by other means
-As a consequence, the policy or action can range from developing specific regulatory programs to general personal behavioral changes
Elements of Meta-Analysis
1. Formulate the review question: determines inclusion and exclusion criteria, population of interest, specific exposure or intervention, outcome of interest, which study designs are appropriate...
2. Write the protocol
3. Search for and include/exclude primary studies: literature search (computerized bibliographic databases; review articles (just an opinion); lots of potential for bias); also search "gray" literature: conference proceedings, dissertations, books, experts, granting agencies (preliminary data gathered to try to get funding), trial registries, industry
4. Assess/analyze study quality: assign a numerical composite score
*Jadad: converts the quality of a paper into a numerical score (Jadad for RCTs: randomization (up to 2 points), double blinding (up to 2 points), withdrawals (1 point for description of withdrawals and drop-outs))
5. Extract and analyze data
*Inclusion and exclusion criteria are defined at the beginning, during the design stage (factors determining inclusion: study design, population characteristics, type of treatment or exposure, outcome measures, and quality of the study (did they actually calculate a risk ratio, adjust for confounders, etc.); all pre-specified)
*Fixed-effect vs. random-effects models
6. Interpret results and write the report
Not all studies in the composite will count equally: there will be strict inclusion or exclusion criteria, but not all included studies will carry the same weight
Lessons from Snow
1. Successful epidemiological investigations start with rational, testable hypotheses
-Based on a review of what was known about cholera, Snow formed a hypothesis: "Sewage in drinking water causes cholera"
-The prevailing "miasma" theory was neither rational nor testable
2. Ecological studies are "quick and dirty" and can generate testable hypotheses
-Data were easy to ascertain; Snow's ecologic studies supported his hypothesis and prompted him to perform an individual-level analysis
-"Long shot" hypotheses can quickly be addressed
3. A well-defined study population is key for valid and generalizable results
-Snow identified a study area and population that provided an excellent exposure contrast (high vs. no exposure) and minimized confounding by SES factors
-Districts with intermingled supplies of high and low contamination served homes in a virtually random fashion: a natural randomization experiment
4. Association NEVER implies causation
-Farr's data on elevation were correct: Londoners who lived at lower elevation indeed developed cholera at higher frequency
-In fact, changing elevation would have been an effective public health measure
-Contaminated drinking water was a strong confounder of the elevation-cholera association
5. Epidemiological studies serve to inform public policy, with the ultimate goal of disease prevention (and control)
-As a physician and fellow Londoner, Snow wanted to institute public health measures that would prevent further outbreaks
-Epidemiologic investigations have disease prevention (or control) as a direct or indirect goal
6. Public health priorities in the U.S. have shifted from preventing and controlling infectious diseases to chronic diseases
Bradford Hill Criteria for Causal Associations (1965)
1. Temporal relationship (major category)
2. Biologic plausibility (major category)
3. Replication of the findings (can findings be replicated in different populations/using various study designs? do multiple studies agree?)
4. Consistency with other knowledge (in vitro and animal studies; other study types like ecological and cross-sectional; other data like sales data, time trends...) (major category)
5. Extent to which alternate explanations have been considered (adjustment for confounding; randomization in RCTs; confounders measured and adjusted for in observational studies) (major category) (CONFOUNDERS)
6. Dose-response relationship (risk increases with increasing exposure, but absence of this does not preclude causality) (there's usually some threshold above which a further increase in dose will not increase the effect) (a J-curve doesn't strengthen the argument, but doesn't hurt it either) (minor category)
7. Strength of the association (measures of association: relative risk and odds ratio) (a stronger association is more likely to be causal, but a weak association can still be; consistency among findings is more important than magnitude) (minor category)
8. Effect of removing the exposure / cessation effects (presence supports a causal association, but absence does not preclude it) (after quitting smoking, risk of lung cancer drops) (what happened to the people who started exercising but then stopped?) (minor category)
9. Specificity of the association (removed in a 1990s modification) (said that one exposure is specific to one disease, but this isn't true: a disease may be caused by several exposures, and an exposure may cause several diseases)
3 Principles to observe in case-control studies (Case-control studies are really susceptible to bias; it is up to the investigator to eliminate/reduce it)
1. The study base principle:
-Controls must be representative of the study base (controls must represent the prevalence of the exposure in the study base)
-Leads to selection bias if not followed
-Goal: sample controls from the study base in which the cases arose in an unbiased fashion
-Controls serve as a proxy for the complete study base (should be representative of exposure prevalence)
-Key issue: unbiased sampling (solve this by randomly sampling controls from the study base)
-If you recruited controls at a fitness studio, you would probably find overall healthier people, and that's not representative of all people
2. The deconfounding principle: take care of all of your confounders
-Make sure results are due to the exposure, not confounders
-Cases and controls can be restricted to, or matched on, major confounders, and the analysis adjusted for additional ones
-Bias occurs when we neglect to adjust for major confounders
3. The comparable accuracy principle: exposures must be measured with the same accuracy in both cases and controls
-Violations lead to information bias (using proxy responders (such as friends or family members) to ascertain exposures for dead cases while using living controls is poor practice, because proxy responders tend to overestimate good health practices and downplay bad ones)
-A common cause of spurious associations (we found an association, but there actually wasn't one)
If the principles of the study base, deconfounding, and comparable accuracy are followed, then any effect detected in a study should (hopefully!) not be due to:
-Differences in the ways cases and controls are selected from the same base (selection bias)
-Distortion of the true effect by unmeasured confounders (confounding bias)
-Differences in the accuracy of the information from cases and controls (information bias)
Beer Consumption and Premature Mortality in North Carolina: An ecologic study Methods: We conducted a cross-sectional ecologic study using tax data on the sales of beer and mortality data for each of the 100 counties in North Carolina in 2014. Results: There was a statistically significant correlation (r = 0.24) of beer consumption and premature death (compared to state average) among the counties of North Carolina. 1. Did counties with higher beer sales have higher or lower rates of premature death in this study? 2. Name two major limitations of interpreting aggregate-level data in ecologic studies using this example and discuss how these limitations affect our ability to interpret the findings 3. Formulate an interpretation of these findings that falls prey to the ecologic fallacy.
1. Higher: the correlation of r = 0.24 indicates that counties with higher beer sales also had higher rates of premature death 2. Aggregate-level data doesn't tell you anything about the individuals (you don't know whether the people who bought the beer actually drank it). Also, this data did not adjust for any confounders, and the study didn't consider any other type of alcohol consumption 3. As long as you say that counties with high beer sales have high rates of premature death you are fine, but if you say that drinking more beer increases your chance of dying early, this falls prey to the ecologic fallacy
A study was conducted in men aged 40-70 in order to determine whether exercising for 2 or more hours per week decreases the likelihood of heart attack. The cases were 1,000 men who had recently had a heart attack; of these, 236 reported that they had regularly exercised for two or more hours per week prior to their heart attack. 1,000 controls were also selected for the study; of these, 379 reported that they exercised regularly. Calculate the magnitude of association between regular exercise and heart attack. What does your calculation suggest?
Exposure odds among cases: 236 / (1000 - 236) = 236/764 = 0.31 Exposure odds among controls: 379 / (1000 - 379) = 379/621 = 0.61 Odds ratio = 0.31 / 0.61 ≈ 0.5 Men who exercise regularly have about 0.5 times the odds of having a heart attack compared to men who don't exercise, suggesting that regular exercise is protective.
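The arithmetic above can be checked with a short script; the 2x2 counts come from the question, while the function name and layout are my own sketch:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a / b) / (c / d)

# Exercise (>= 2 h/week) and heart attack, counts from the question
a, b = 236, 1000 - 236   # cases: exercised, did not exercise
c, d = 379, 1000 - 379   # controls: exercised, did not exercise
print(round(odds_ratio(a, b, c, d), 2))  # -> 0.51
```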
In a study that investigated the association between television viewing and smoking uptake in teenagers, the following statements appeared in the publication of the study results: "We found that every additional hour of television viewing was associated with a 98% increase in risk of smoking uptake (RR=1.98, 95% CI 1.02 - 1.15)." "A similar association between television viewing and the onset of alcohol use has been reported..." "In this study, the median follow-up period was 2 years." "The association was substantial, with youth who watched >6 hours per day being 5.87 times as likely to initiate smoking as youth who watched 0 to 1 hours per day." "...Television provides adolescents with role models, including movie and television stars and athletes, who portray smoking as a personally and socially rewarding behavior." "Social or behavioral interventions that reduced the time watching television were associated with significantly lower rates of smoking uptake." "These results were additionally adjusted for age, sex, race and physical activity." Which of the criteria for causal associations is being directly addressed in each of these statements?
A) Dose response B) Consistency and replications C) Temporality (did exposure come before outcome)(did death come after exposure) D) Strength of association E) Plausibility F) Cessation Effects (what happened to the people who started exercising, but then stopped) G) Consistency and replications H) Alternative Explanations
Study Power
Ability to detect an effect, should one actually exist A false positive (spurious association) is worse than a false negative If a P value or significance finding is tiny, you probably have a very large sample The number of people you can enroll is set by your power (if you specify 80% power, that determines the sample size you need) Observation: Treatment has effect (reject H0): -Truth: treatment has effect: good choice! -Truth: treatment has no effect: Type I error (false positive, finding something that is not there, much worse than Type II) Observation: Treatment has no effect (fail to reject H0): -Truth: treatment has effect: Type II error (false negative) -Truth: treatment has no effect: good choice Type I = α = 0.05 Type II = β = 0.2 or 0.1 Power = 1 - β
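The power = 1 - β relationship can be illustrated with a normal-approximation power calculation; the function, the effect size, and the sample size below are hypothetical choices of mine, not from the card:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power_two_means(delta, sigma, n_per_group):
    """Approximate power of a two-sided two-sample z-test
    (alpha = 0.05) to detect a true mean difference `delta`
    given a common standard deviation `sigma`."""
    se = sigma * sqrt(2 / n_per_group)          # SE of the difference
    return norm_cdf(delta / se - 1.96)          # 1.96 = z for alpha 0.05

# e.g., detecting a 5-unit difference with SD 10 and 64 per group
print(round(power_two_means(5, 10, 64), 2))  # -> about 0.8
```

This also shows why sample size is set by the desired power: you solve for the n that pushes this value up to 0.8 or 0.9.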
The comparable accuracy principle
Accuracy of exposure measurement must be the same in cases and controls -Example: in a study of the effect of smoking on lung cancer, it would not be appropriate to measure smoking with urine cotinine levels in the cases and with questionnaires in the controls Bias caused by differential errors in measurement between cases and controls should be eliminated (use the same measurement tools the same way) Blind interviewers / abstractors
For each variable, fill in the last column of the table with a description of how this variable was measured in table 1. In other words, what type of random variable is displayed? Age and Weight: Race: Education: Prior Hospitalization: Post-treatment Mortality:
Age and Weight: Numerical, continuous (age at time of diagnosis) Race: Categorical, nominal Education (or grouped age ranges, e.g., 30s-40s, 50s-60s): Categorical, ordinal Prior Hospitalization: Numerical, discrete Post-treatment Mortality (or pre- vs post-menopausal women): Categorical, binary
A case-control study compares prior exposures between _______________ of the study base. In a case-control study investigating the association between a cigarette smoking and diabetes, who is considered a case?
All cases and some controls Diabetics who either do or do not smoke
Which of the following risk factors will likely never be studied in a randomized controlled trial, meaning that observational data from cohort and case-control studies is all we will ever have to make our health decisions? Cigarette smoking Seatbelt use Condom use Recreational drug use
All of the above
Randomized Controlled Trial
An experimental study in which researchers randomly assign individuals to either an experimental or a control group and expose the experimental group to the manipulated variable of interest. -Gold standard for causal inference *Randomized controlled trials are the only trials where we can conclude the exposure and outcome are causally related; observational studies can't determine causality The investigator controls the predictor variable (intervention or treatment) Major advantage over observational studies is the ability to demonstrate causality Randomization controls unmeasured confounding Very expensive - only for mature research questions -Intention-to-treat analysis -"As we randomize, so we analyze" -Include all persons assigned to intervention and control groups (including those who did not take the intervention or dropped out) -Randomization distributes all confounders (measured and unmeasured) equally between the groups -Throwing out data points breaks randomization -The trial would then be no better than an observational study
Bias
Any systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of an exposure's effect on the risk of disease *Statistics can't fix bias in the design *Can be avoided *Bias has magnitude and direction (towards or away from the null hypothesis) The direction of bias is away from the null if more cases are considered to be exposed, or if more exposed are considered to have the health outcome A bias away from the null means the data indicate a stronger association than actually exists in real life Towards the null (towards no difference) means the value is close to the null value of the effect measure; for example, the value would be close to 1 if you're using the OR or RR If the bias is non-differential (both groups have it), it is less damaging Differential vs non-differential Non-differential: *Proportion of misclassification between groups is equal *10% of smokers tell you they are non-smokers, and 10% of non-smokers tell you they are smokers *Increases the similarity between the two groups *Bias is "toward the null," i.e., the true effect is diluted *Harder time finding the difference, but not that big of a deal *If both groups only remembered 50% of infections, that is really bad recall, but it doesn't matter much for the study because both groups are the same Differential: Proportion of misclassification is different between the 2 groups -> artificially decreases the similarity of the two groups (you are going to see differences that don't exist (bad)) -Bias is "away from the null" (under- or over-estimates) Did you find an association? Yes -> Direction of bias: away from null -> Spurious conclusion (Type I error) No -> Direction of bias: toward null -> Might have missed a real effect (Type II error / false negative)
Arithmetic VS Semi-log graphs
Arithmetic graph: LOOKS AT MAGNITUDE -Both axes use arithmetic scales -Illustrates the absolute magnitude of the change over time Semi-logarithmic graph: -The x-axis is arithmetic scale, the y-axis logarithmic. The slope of the line indicates the rate of increase or decrease. -A horizontal line indicates no change. -A straight line indicates a constant rate (not amount) of change. -Parallel lines show identical rates of change.
Hierarchy of Study Types
Descriptive: Trends used to generate hypotheses about novel associations -Surveillance data / case reports Analytic: Attempt to establish a causal link between an exposure/risk factor and an outcome -Greater than, leads to, compared with, more likely than, associated with, related to... Split into observational (cross-sectional, case-control, cohort) and experimental (randomized controlled trials)
Chance Findings
Caused by random error Result of imprecise measurements Errors from chance will cancel each other out in the long run Large sample sizes can minimize random error P value of 0.05: a 5% probability that the observed association arose by chance alone *Chance is a random error; bias is a systematic error
Information Bias
Controls were not representative of the study base *Includes recall bias, interviewer bias, misclassification bias The assumption is that the red line is the amount of coffee consumed by a normal population BUT actually, the cases were representative of the normal population (green line) The controls had an unusually low level of coffee drinking Cases drank just as much coffee as normal, so the study actually proved nothing
Immediate Cause of death VS Underlying Cause of death
Captures the immediate cause of death (very few options; really only the heart stops or breathing stops) and the underlying cause of death (source: car crash) -The underlying cause of death is the disease or injury that initiated the set of events leading to death
The deconfounding Principle: 3 ways to eliminate confounders
Confounding effects by a factor can be dealt with by eliminating variability in that factor via: 1. Restriction of cases and controls to presence or absence of the risk factor (gender) 2. Matching cases and controls on the confounder (age) 3. Adjusting the final analysis for the effects of possible confounders (statistical adjustment on the computer) Men can get breast cancer, but it's REALLY unlikely (1%), so you won't find a study mixing and matching females and males with breast cancer (restriction) If you are only looking at a high-risk population, you would do this as well (young women are more likely to go to tanning salons: restrict to that population when testing for skin cancer) Matching: You can't match on exact smoking status (exactly how many cigarettes per day); that makes it difficult, so you wouldn't match on these, you would just adjust -Age and race are easy to match on; for most other things, just adjust When done appropriately, matching and adjusting are both equally correct; one is not better than the other Confounders: Independently associated with both exposure and outcome Tanning beds and skin cancer in young women: Restrict to young women (age and sex) Match on race (have to self-identify to one race) Adjust for occupation (working outside a lot vs inside affects how likely you are to use a tanning bed) and how much time you spend outside
Media Dissemination: Most Common Flaws
Conveyed a certainty that doesn't exist: 1. Exaggerated effect size 2. Used causal language to describe observational studies: In observational studies we always have residual confounding; we talk about associations, not causation *Example: language like "raises breast cancer risk" in an observational study implies causation All news reports on observational studies should carry a disclaimer: observational study, so no definitive conclusion can be drawn about cause and effect 3. Failed to explain limitations of surrogate markers/endpoints (intermediate endpoints are oversold and easy to study) (maybe you don't want to wait until someone dies; maybe you can just look at tumor shrinkage as an indicator of results to come) (if you want to know whether a drug will save your life, you can't look at intermediate endpoints like BP, cholesterol, or weight loss) 4. Single-source stories with no independent perspective (reporter didn't actually read the study) (based on anecdotal evidence) (sometimes relying solely or largely on news releases, public relations announcements, or anecdotes) 5. Failed to independently analyze the quality of evidence (different study designs have different quality; the media needs to address this)
Steps in the Paradigm of Public Health
Define the problem -How big is it, how bad is it? -Measure its magnitude -Understand the key determinants Develop intervention/prevention strategies Set policy/priorities -Implement and evaluate -Did it work? -We are good at implementing new programs, bad at admitting when they don't work
Mortality Rate
Total number of deaths from all causes in one year / number of persons in the population at midyear The denominator becomes persons per year (person-years), and the mortality rate can be considered a "rate" (a rate deals with time) and not a "proportion" -Even though the number of persons at midyear is used in the calculation
Analysis (Relative Risk or Attributable Risk)
Direct comparison of incidence proportions (or rates) among exposed and non-exposed groups using a Risk Ratio (RR) -Attributable Risk (how much of the risk of developing an outcome can be attributed to a risk factor) -What proportion of the disease among the exposed is due to this particular risk factor AR = (Incidence in exposed - Incidence in non-exposed) / Incidence in exposed Practice: Smokers with cancer: A = 70 Smokers with no cancer: B = 6,930 Non-smokers with cancer: C = 3 Non-smokers without cancer: D = 2,997 Totals: Smokers: 7,000 Non-smokers: 3,000 Cancer incidence in smokers = 70/7,000 = 10 per 1,000 Cancer incidence in non-smokers = 3/3,000 = 1 per 1,000 RR = 10 / 1 = 10 Thus, cancer is 10 times more common among smokers than non-smokers. AR = (10 - 1) / 10 = 0.9 or 90% Thus, 90% of the cases of cancer among smokers are attributable to their habit of smoking.
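The practice numbers above can be reproduced with two small functions (the function names are mine; the counts are from the card):

```python
def relative_risk(i_exp, i_unexp):
    """Risk ratio: incidence in exposed / incidence in unexposed."""
    return i_exp / i_unexp

def attributable_risk_pct(i_exp, i_unexp):
    """Attributable risk percent among the exposed:
    (I_exposed - I_unexposed) / I_exposed * 100."""
    return (i_exp - i_unexp) / i_exp * 100

# Incidences per 1,000 from the smoking example
i_smokers = 70 / 7000 * 1000      # 10 per 1,000
i_nonsmokers = 3 / 3000 * 1000    # 1 per 1,000
print(round(relative_risk(i_smokers, i_nonsmokers), 1))       # -> 10.0
print(round(attributable_risk_pct(i_smokers, i_nonsmokers), 1))  # -> 90.0
```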
Adjustment Procedures for Rates
Direct method of adjustment: Observed stratum-specific rates are applied to a common age-structured population / common large reference population -Generally fictitious (2 populations may be combined) -Often the last census population is used Indirect method: Stratum-specific rates of an existing reference population are used to calculate the number of expected deaths in the observed population Calculation of adjusted (standardized) rates allows comparison of event rates between populations when there are differences in characteristics between the populations that may influence the event Summary of the direct method of adjustment: -Adjusted rates are index measures, the magnitude of which has no intrinsic value -Adjusted rates are useful for comparison purposes only -The choice of the reference population is important -It should not be abnormal or unnatural -Adjustment (standardization) is not a substitute for the examination of age-specific rates in the populations of interest
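The direct method can be sketched in a few lines; the stratum-specific rates and the standard population below are hypothetical numbers of mine, not from the lecture:

```python
def direct_adjusted_rate(stratum_rates, std_pop):
    """Direct standardization: apply the observed stratum-specific
    rates to a common reference (standard) population, then divide
    the expected events by the total standard population."""
    expected = sum(r * n for r, n in zip(stratum_rates, std_pop))
    return expected / sum(std_pop)

# Hypothetical age-specific mortality rates (per person per year)
# for young / middle / old strata, and a hypothetical standard population
rates_town_a = [0.001, 0.005, 0.02]
std_population = [50_000, 30_000, 20_000]
print(direct_adjusted_rate(rates_town_a, std_population))  # -> 0.006 (6 per 1,000)
```

Because both towns' rates would be applied to the same standard population, the resulting adjusted rates are comparable even when the towns' age structures differ.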
Variance and Standard Deviation
Dispersion measured relative to the scatter of the values about their mean s^2 = Σ(xi - mean)^2 / (n - 1) Standard deviation (s) -Square root of the variance -Mean and standard deviation are best for symmetric distributions without outliers -Median and range are useful for skewed distributions or data with outliers
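The formula can be written out directly (the function name and the example data are mine):

```python
from math import sqrt

def sample_variance(xs):
    """s^2 = sum((x - mean)^2) / (n - 1), the sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)
print(round(s2, 2), round(sqrt(s2), 2))  # variance, then standard deviation
```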
Registrar General William Farr
Dismissed Snow's findings Considered one of the founders of medical statistics At the General Register Office he set up a system for routinely recording the causes of death, broken down by occupational exposure On cholera, he subscribed to the prevailing theory that it was caused by miasma ("bad air") instead of water; Farr believed that elevation, not drinking water, was to blame
Crude Rates
Easy to calculate, not easy to compare -If a population can be stratified (subdivided into groups), appropriate comparisons must be made between stratum-specific rates such as: -Age-specific rates -Age/gender/race-specific rates -Information on the stratum-specific rates of a reference population allows for calculating stratum-adjusted rates (e.g., age-adjusted rates) = expected deaths in reference population / total reference population A crude rate (overall rate) is a weighted average of stratum-specific rates (the weights are the population totals of the strata) -Comparison of crude rates between two populations involves differences in both: -stratum-specific rates, and -population composition (distribution of characteristics) Comparison of crude rates is often confounded by these differences and not appropriate -Crude rates don't take into account that populations differ in how many people are in each stratum (age group) The Crude Mortality Rate (CMR) is a weighted average of the age-specific mortality rates (it doesn't take age into account, so we need stratum-specific rates to compare) *Crude rate: real numbers, can't compare *Stratum-specific: real numbers, can compare (but you are only comparing small pieces of the population) *Stratum-adjusted: not based on real numbers, but can compare (taking other things into account) (looking at comparisons, not real numbers)
In a cross-sectional study, the smoking prevalence was compared between lung cancer cases and people free of lung cancer. Strangely, it was found that lung cancer patients were more likely to be non-smokers at the time of the study than the cancer-free controls. However, the cancer patients had an unusually high proportion of ex-smokers. Assuming that this wasn't a chance finding, which model best explains why the lung cancer patients were more likely to be non-smokers than their cancer-free counterparts?
Effect-Cause
Endemic, epidemic, pandemic
Endemic: The habitual presence of a disease within a given geographic area Epidemic: The occurrence in a community or region of a group of illnesses of similar nature, clearly in excess of normal expectancy Outbreak: An epidemic confined to a more localized area Pandemic: A worldwide epidemic
Public Health Investigations
Epidemiology (asks the question) and biostatistics (analyzes the data that epidemiologists collect) are the basic sciences of public health -Public health investigations use quantitative methods, which combine these two disciplines -Public health is about understanding disease development and uncovering the etiology (reasons for disease), progression (what is happening over time), and treatment of diseases in the population -Information is collected to investigate a question -The methods and tools of biostatistics are used to analyze the data to aid decision making
Epidemiology
Epidemiology is the study of the distribution and determinants of health, disease, or injury in human populations and the application of this study to the control of health problems Descriptive epidemiology defines public health problems and sets priorities for developing drugs/interventions.
Confounders in Cohort Studies
Factors that are independently associated with both the exposure and the outcome - may be differentially distributed in exposed and unexposed groups -Adjusting for these covariates (confounders) in our analysis will eliminate their effect on the study results -If crude and adjusted measures are different, the confounding effect was strong -If crude and adjusted measures are similar, the confounding effect was weak *If the odds ratio is 1 for either 2x2 split from the big chart, then it is not a true confounder (the association doesn't have to be positive, it can be negative too, just as long as it's not 1) -Only if a confounder can be an alternate explanation is it a strong confounder (family history of breast cancer is not a strong confounder when comparing food and breast cancer, because family history doesn't impact the food you eat)
Fixed effect vs Random effect
Fixed: Assumes that studies are homogeneous; the size of the treatment effect is the same across all studies; the measure of association does not differ by study characteristics; all studies are estimating the same underlying association *Fixed effect assumes that the true effect of treatment is the same for every study (assumes that every study measured the same exposure/outcome relationship, just did it slightly differently) -Contribution to the summary estimate depends largely on study size -Bigger study = better (larger chance of finding the true effect, so larger studies get a bigger slice) -Rewards size Random: Assumes that the true effect estimates for each study vary, so there is heterogeneity among studies -Contribution to the summary estimate depends on study size and agreement with other studies -Still says bigger is better, but also rewards being close to the final consensus *The effect really does vary between studies (heterogeneity). Studies are estimating different associations, which increases the variance of the summary measure, making it more difficult to obtain significant results; when heterogeneity is large, it may be inappropriate to calculate an overall summary effect size.
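The "bigger study = bigger slice" weighting of the fixed-effect model is usually implemented as inverse-variance pooling; the function name and the three study estimates below are hypothetical:

```python
def fixed_effect_summary(estimates, variances):
    """Inverse-variance fixed-effect pooling: each study's weight
    is 1/variance, so larger (more precise) studies contribute
    more to the summary estimate."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

# Hypothetical log odds ratios and their variances from three studies
log_ors = [0.30, 0.10, 0.25]
variances = [0.05, 0.02, 0.10]
print(round(fixed_effect_summary(log_ors, variances), 3))  # -> 0.169
```

A random-effects model would add a between-study variance term to each study's variance, which flattens the weights and widens the summary confidence interval.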
95% Confidence Interval
Formal definition: A range of values computed from the sample data which, were the study repeated multiple times, would contain the unknown parameter 95% of the time. Definition for the rest of us: based on our data, 95% of the time the truth is in this range. Point estimate: 90 kg 95% CI: 66 - 114 kg Width of the confidence interval (CI): -A narrow CI implies high precision -A wide CI implies poor precision (usually due to inadequate sample size) Does the interval contain a value that implies no change, no effect, or no association? -CI for a difference between two means: does the interval include 0 (zero)? -CI for a ratio (e.g., OR, RR): does the interval include 1? RR < 1 = decreased risk RR > 1 = increased risk RR = 1 = no association The confidence interval for the comparison of 25+ months of breastfeeding to never breastfeeding is 0.2 - 1.1, and thus includes the null hypothesis of no effect (OR = 1). This finding is therefore not statistically significant. However, that does not mean the finding is meaningless. Women who breastfed for more than 25 months had a 50% reduction in breast cancer risk by age 50, compared to women who never breastfed, based on the odds ratio of 0.5. It did not reach statistical significance because of small numbers: there were only 10 breast cancer cases and 45 controls that breastfed for such a long period of time.
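A 95% CI for an odds ratio is commonly computed on the log scale (an approach assumed here, not spelled out in the card); the 2x2 counts below are hypothetical, chosen so the output lands near the breastfeeding example's OR of 0.5:

```python
from math import log, exp, sqrt

def or_with_ci(a, b, c, d):
    """OR and 95% CI from a 2x2 table using the log method:
    CI = exp(ln(OR) +/- 1.96 * sqrt(1/a + 1/b + 1/c + 1/d))."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = exp(log(or_) - 1.96 * se)
    hi = exp(log(or_) + 1.96 * se)
    return or_, lo, hi

# Hypothetical counts: 10 exposed cases, 40 unexposed cases,
# 45 exposed controls, 90 unexposed controls
or_, lo, hi = or_with_ci(10, 40, 45, 90)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # -> 0.5 0.23 1.09
```

The interval crosses 1, so this protective-looking OR of 0.5 would not be statistically significant, just as the card describes.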
Frequency
Frequency: Count or number of observations within an interval or group Cumulative frequency: Count within the current interval and all preceding intervals Relative frequency: Count within an interval divided by the total number of observations Cumulative relative frequency: Count within the current interval and all preceding intervals divided by the total number of observations
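All four definitions can be sketched in one small table-building function (the name `frequency_table` is mine):

```python
def frequency_table(data):
    """Return rows of (value, frequency, cumulative frequency,
    relative frequency, cumulative relative frequency)."""
    n = len(data)
    rows, cum = [], 0
    for v in sorted(set(data)):
        freq = data.count(v)           # frequency
        cum += freq                    # cumulative frequency
        rows.append((v, freq, cum, freq / n, cum / n))
    return rows

for row in frequency_table([1, 1, 2, 2, 2, 3, 4, 4, 5, 5]):
    print(row)
```

The last row's cumulative relative frequency is always 1.0, a quick sanity check on any frequency table.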
Hazard Ratio (HR) for "time-to event" (survival) analyses
HR is NOT based on survival probability at the end point -HR is based on the survival curve (rate over time) -Often shown as a Kaplan-Meier plot (x-axis = time, y-axis = survival %) -Captures the proportional hazard Does not need to measure death: progression-free survival = time to disease progression (tumor growth) Little vertical tick marks can represent when people dropped out A hazard ratio (HR) of 0.594 means that if you were on the red drug, you had about a 40% reduction in dying compared to the blue drug HRs are also not exclusive to experimental studies Oscars example: Survival of winners and nominees of Academy Awards for screenwriting Risk of dying: people who won the award vs those who were just nominated HR = 1.37, 95% CI (1.1 - 1.7) Interpretation: Oscar winners in screenwriting have a 37% increased risk of death compared with Oscar nominees. You are more likely to die if you win (37% more likely than those who were only nominated). This is statistically significant.
Evidence Based Medicine
Highest standard of medicine Defined as the integration of 1. Individual clinical expertise (medical schools) 2. Best available external clinical evidence from systematic research (what is our best standard of knowledge and how do we translate this to patient care) 3. Patient's values and expectations With the explosion of medical literature, review articles are important for decision-making and staying informed
Selection of controls in case control study
Hospital controls: -Similar high-quality information, convenient, but have characteristics or diseases that led to hospitalization -If you are looking at bone cancer, don't use controls that came to the hospital with bone fractures Dead controls: -If cases are dead, information on past exposures will be given by surrogates, such as a spouse or children -Dead controls share the same limitation Sibling, best friend, or neighbor controls may share similar characteristics (too similar?) Population controls: Random-digit dialing is often used (hard these days) Controls are generally volunteers or people close to someone with the disease, but they might be way too similar (not representative)
The epidemiologic triad of a disease
Human disease results from the interaction of: -The host (intrinsic): age, gender, ethnicity, religion, customs, occupation, heredity, marital status, family history, previous diseases -The agent (e.g., a bacterium, or the plasmodium parasite): biological (bacteria, virus), chemical (poison, alcohol, smoke), physical (auto, radiation, fire), nutritional (lack, excess) -The environment (extrinsic) (e.g., contaminated water supply): temperature, humidity, altitude, crowding, neighborhood, water, milk, food, radiation, air pollution In the middle of the triangle is the VECTOR -Vectors are mediators of indirect disease transmission -Mosquitoes in the transmission of malaria -Transfer the plasmodium parasite between humans
How does the study design of a case-control study differ from cohort study?
In case-control studies subjects are selected and grouped based on their disease status, but in cohort studies subjects are selected and grouped based on exposure status.
Random Allocation
In a randomized controlled trial, treatment should be randomly allocated -Random allocation means that all participants have the same probability of assignment to a group -Not determined by investigators, clinicians, or participants -Not predictable based on a pattern Consider these (non-random) patterns: -Date of birth (odd = treatment, even = control) -Day of enrollment (Mon = treatment, Tue = control, ...) -Alternating (1st person = treatment, 2nd = control, ...)
Units of Analysis
Individual-level analysis: *Measurements for each individual in the study Completely ecologic analysis (most common): -All variables (exposure, outcome, covariates) are ecologic, so the unit of analysis is the group -All things measured at the group level Partially ecologic analysis (semi-ecologic): -Combines data collected at individual and ecologic levels -One variable can't be measured at the individual level, but the rest can
External VS Internal Validity
Internal Validity: Are the study results valid (New treatment vs Current treatment) External Validity: Can we generalize the results to the general population? (generalizability)
Ecological Fallacy
Interpreting a relationship observed for groups as if it applied at the individual level *Analogous to Simpson's paradox (a statistical phenomenon): a trend that appears in different groups of data disappears or reverses when these groups are combined
John Snow (1813-1858)
London-based British physician and father of modern epidemiology -Lived during the London cholera epidemics: -1831-1832: 23,000 deaths -1848-1849: 53,000 deaths -1853-1854: 10,000 deaths Correctly hypothesized causation and transmission (hypothesis: sewage-contaminated drinking water is the causal agent for the cholera epidemic) Based on clinical expertise and epidemiologic data, without knowledge of modern germ theory -Snow tested this hypothesis by first conducting an "ecologic" study of London districts using: -routine surveillance data on cholera cases -population data at the district level -information on water companies serving the districts -available data on property values by district Ecologic studies: No information on exposure status of individuals Only summary data available - easily obtained Useful for initially identifying strong associations Poor evidence of causation Snow knew that: the East and South districts were served by water supplies obtained from known polluted parts of the Thames River; the rest of London received drinking water relatively uncontaminated by sewage Limitations of Snow's ecologic study: Uncertainties about districts' water supplies; some districts were served by more than one company (but then in 1852, one of the companies moved to a less polluted part of the river...) Districts vary by socioeconomic (SES) factors We don't know whether the cholera cases actually drank contaminated water Snow was worried about: 1. Variables that could provide an alternate explanation for his observations, e.g., SES (today: confounders) 2. Uncertainty about whether the cases actually drank from the respective water supply (today: misclassification) So, he used shoe-leather epidemiology (walking door to door): visiting the home of each case to determine the source of water to the residence, with confirmation by water bill receipt or by water sample.
Next-of-kin recall was found to be inadequate Snow chose a district that was supplied by both companies - giving him a defined study population that differed in exposure Conclusion: South Londoners supplied with water by Southwark & Vauxhall were 8.5 times more likely to die from cholera than those supplied by Lambeth.
Cohort Studies
Longitudinal: A study that goes forward in time -Prospective studies / forward-looking studies / incidence studies -All terms mean the same thing -Generally uses person-time -Allows participants to switch groups Cohort studies start with people free of disease, who must be at risk (you don't have the disease, but could develop it) -Always start by assessing exposure at "baseline" -You have to make sure they don't already have the disease -OR figure out to what degree they do -Create comparison groups that differ by exposure -Assess outcome (disease) status at "follow-up" When do you do a cohort study? -When there is good initial evidence of a connection between exposure and outcome -When the exposure is RARE (go somewhere with an enriched population to find the rare exposure) -Good to use when the time between exposure and outcome is short -Use when the effects of a risk factor on multiple outcomes need to be investigated -When cross-sectional or case-control studies are not feasible (do you remember how many servings of carrots you have eaten in the last year?) -When ample funds are available (these are very expensive)
Meta-Analysis
Meta-analyses: Take all papers (do a systematic search of all studies on a subject) and summarize: a summary of all the published studies on a specific topic (a meta-analysis is part of a systematic review) *A systematic review can be performed without a meta-analysis, but a meta-analysis cannot be conducted without a systematic review *A type of systematic review that combines the results of independent but similar studies to obtain an overall estimate *Pre-specified criteria: here are the sources I'm going to search, I'm only going to consider a certain study type, look for certain terms, etc. You can't dismiss a study because you disagree with its findings Go through 100 papers, pick the best ones, and then summarize the highest-quality ones *Cochrane Review *Systematic reviews: Collect ALL available evidence according to clearly stated criteria, minimizing biases and errors (both meta-analyses and systematic reviews are high-quality sources of evidence) *I will search the literature and then summarize only the studies that fit a certain pre-specified category (search for all papers) Individual patient data (IPD) meta-analyses: Re-crunching the raw data; very, very rare (more rare than meta-analysis) *Most reviews are not systematic; they are traditional or narrative *Narrative reviews: usually written by an expert, generally qualitative, can be very biased and broadly focused, based on subjective and informal criteria; inclusion and exclusion criteria are not specified
Ecologic Studies (Aggregate Study)
Not looking at how the exposure/outcome relationship is measured; we are looking at the level at which it's measured *Looks at collections (aggregate level = groups of people) *Comparison of groups or populations of people rather than individuals *Useful for generating hypotheses *Can't adjust for confounders due to lack of data *Prone to ecologic fallacy: when we take aggregate-level data and interpret it for individuals *Missing data can be an issue *Low cost and convenient *Some things can't be measured on an individual level (population density) *Simple analyses and presentation *Helpful for generating new hypotheses for further research Levels of Measurement: *Means or proportions in groups (school districts) *Environmental measures: air pollution, stuff in water *Global measures: population density: attributes of groups/places for which there is no individual analog -On an aggregate level, you don't get neat 2x2 outcomes -Continuous scale for exposures and outcomes -Linear correlation: as one goes up, the other follows -Use Pearson's "r" as the correlation coefficient
Investigators enrolled 2,100 women in a study and followed them annually for four years to determine the incidence rate of heart disease. After one year, none of the responding women had a new diagnosis of heart disease, but 100 could not be reached during follow-up. After two years, one had a new diagnosis of heart disease, and another 99 could not be reached during follow-up. After three years, another seven had new diagnoses of heart disease, and 793 could not be reached during follow-up. After four years, another 8 had new diagnoses of heart disease, and 392 could not be reached during follow-up, leaving 700 women who were followed for four years and remained disease free. (To simplify your calculations, a woman who develops heart disease in a given year should not contribute a person-year of observation during this year*). Question 2: Calculate the incidence rate of heart disease during the 4-year follow-up among this cohort of women.
Numerator = number of new cases of heart disease = 0 + 1 + 7 + 8 = 16 Denominator = person-years of observation (subtracting losses to follow-up and women who developed disease): Year 1: 2100 - 100 = 2000 Year 2: 2100 - 100 - 99 = 1901; 1901 - 1 = 1900 Year 3: 2100 - 100 - 99 - 793 = 1108; 1108 - 1 - 7 = 1100 Year 4: 2100 - 100 - 99 - 793 - 392 = 716; 716 - 1 - 7 - 8 = 700 Total = 2000 + 1900 + 1100 + 700 = 5,700 person-years Person-time rate = number of new cases of disease during the specified period / time each person was observed, totaled for all persons = 16 / 5,700 = 0.0028 cases per person-year = 2.8 cases per 1,000 person-years *Note: In the calculation above, you were told not to attribute observation time to women who develop heart disease during their year of diagnosis. In reality we would know exactly when within the calendar year the cases occurred and would add the appropriate fraction of a year to the denominator; e.g., if a person got sick on December 1, we would add 11 months (January through November), or about 0.92 person-years, of disease-free observation time to our denominator.
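The person-time arithmetic above can be sketched in Python. The numbers are taken directly from the worked example; the helper name `person_years` is just for illustration:

```python
# Person-time incidence rate for the 4-year heart disease cohort.
# Per the problem's simplification, a woman contributes no person-time
# in the year she is diagnosed or lost to follow-up.
def person_years(at_risk_start, lost, new_cases):
    """Disease-free person-years contributed during one year of follow-up."""
    return at_risk_start - lost - new_cases

# (women at risk at start of year, lost to follow-up, new cases)
years = [(2100, 100, 0), (2000, 99, 1), (1900, 793, 7), (1100, 392, 8)]

total_cases = sum(c for _, _, c in years)                    # 16
total_py = sum(person_years(n, l, c) for n, l, c in years)   # 5,700
rate_per_1000 = 1000 * total_cases / total_py
print(rate_per_1000)  # ≈ 2.8 cases per 1,000 person-years
```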
Odds ratio (for case-control studies) A case-control study was conducted to evaluate the relationship between artificial sweeteners and diabetes. 3,000 cases and 3,000 controls were enrolled in the study. Among the cases, 1,293 had used artificial sweeteners in the past, while among the controls, only 855 had used sweeteners. Compute the appropriate measure of association, and round off the answer to one decimal place.
OR = (a/c)/(b/d) = ad/bc OR = (1293 × 2145) / (855 × 1707) = 1.9 OR = Prob(exposure in cases) / [1 - prob(exposure in cases)] divided by Prob(exposure in controls) / [1 - prob(exposure in controls)] Suppose a study looking at the association between smoking and bladder cancer found an odds ratio = 2.4. This means smokers have 2.4 times the odds of developing bladder cancer compared to non-smokers; because bladder cancer is rare, this can be interpreted as approximately 2.4 times the risk.
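A quick check of the sweetener example in Python. The 2x2 cell labels follow the OR = ad/bc convention above; `odds_ratio` is an illustrative helper, not a library function:

```python
def odds_ratio(a, b, c, d):
    """OR = ad/bc for a 2x2 table:
    a = exposed cases      b = exposed controls
    c = unexposed cases    d = unexposed controls
    """
    return (a * d) / (b * c)

# 3,000 cases (1,293 used sweeteners) vs. 3,000 controls (855 used them)
or_est = odds_ratio(1293, 855, 3000 - 1293, 3000 - 855)
print(round(or_est, 1))  # 1.9
```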
Exaggerating Effect Size
OVERSELLING OF RELATIVE EFFECTS WITHOUT ACCURATELY DESCRIBING THE FULL MAGNITUDE OF THE EFFECT: Makes the effect look bigger and more pronounced than it really is We deal with relative risk: looking at the population Absolute VS Relative Risk: *Relative risk reduction - makes the effect size seem larger (we are interested in populations, not individuals) *Absolute risk reduction - makes the effect size seem smaller (these numbers need to be mentioned too) When someone looks at a news article, they are thinking about their individual risk; they want absolute risk Giving relative risk without absolute risk will always exaggerate the effect size *The "of what?" is the absolute risk: when you're only told the relative risk - 50% - you must ask, 50% of what? *It could be a risk reduction from 90 in 100 down to 45 in 100...so 45 benefit. *Or it could be from 2 in 100 to 1 in 100....so only 1 benefits.
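The two scenarios can be worked through numerically; both show a 50% relative risk reduction but very different absolute benefit (function names are illustrative):

```python
def relative_risk_reduction(risk_control, risk_treated):
    """RRR = (control risk - treated risk) / control risk."""
    return (risk_control - risk_treated) / risk_control

def absolute_risk_reduction(risk_control, risk_treated):
    """ARR = control risk - treated risk."""
    return risk_control - risk_treated

# Scenario 1: 90 in 100 down to 45 in 100
rrr_big = relative_risk_reduction(0.90, 0.45)    # ≈ 0.5, i.e. "50%"
arr_big = absolute_risk_reduction(0.90, 0.45)    # ≈ 0.45 -> 45 per 100 benefit

# Scenario 2: 2 in 100 down to 1 in 100
rrr_small = relative_risk_reduction(0.02, 0.01)  # also ≈ 0.5, i.e. "50%"
arr_small = absolute_risk_reduction(0.02, 0.01)  # ≈ 0.01 -> only 1 per 100 benefits
```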
P (probability) Value
P value is very dependent on sample size (A P value is also affected by sample size and the magnitude of effect. Generally, the larger the sample size, the more likely a study will find a significant relationship if one exists. As the sample size increases, the impact of random error is reduced.) -Probability of obtaining an effect as large as or more extreme than the observed effect, assuming the null hypothesis is true -Measures the strength of the evidence against the null hypothesis -Smaller p-values indicate stronger evidence against the null hypothesis -p-values of <0.05 are often accepted as "statistically significant" in the medical literature, but this is an arbitrary cut-off. P value of 0.08 (a little evidence against the null, but not much) -With RR of 3, 8 out of 100 such trials would show a 3-fold or higher benefit of Red Bull consumption just by chance. Not statistically significant. 0.7: Very weak evidence against the null hypothesis...very likely a chance finding (you were right) -With RR of 1.2, 70 out of 100 such trials would show a 20% benefit or more extreme just by chance...very likely a chance finding 0.007: Very strong evidence against the null hypothesis...very unlikely to be a chance finding => reject the H0 -With RR of 1.8, only 7 out of 1000 such trials would show an 80% benefit or more extreme just by chance...very unlikely to be a chance finding P value related to sample size: If I study a huge number of people, anything comes out as significant; even the smallest, most meaningless things can become statistically significant Also dependent on effect size: how big was the effect RR values: 3.0 and 1.2 Significant = less than .05 Small study with big finding = more meaningful finding If I asked 100 people who are 5 ft tall about their weight, how many would be expected to answer with a number between 66 and 114? Not 95, because this doesn't apply to everyone.
It only tells you things about your data, not generalizable 4 things you can do with a confidence interval that you can't with a P value More than .05: not significant Larger studies measure more precisely If your confidence interval includes one, then it's not significant -P-values are related to the effect size (the magnitude of the observed association) -p-values give no indication about the clinical importance of the observed association -Therefore, most epidemiologic studies report effect size and confidence intervals...
Case Control Study (Case-Referent)
Previous exposure, comparing A type of epidemiologic study where a group of individuals with the disease, referred to as cases, are compared to individuals without the disease, referred to as controls *Obtaining histories and other information from a group of people with a particular disease or condition and from a group without the disease to determine the relative frequency of a past exposure under study Case-Control Studies cannot directly measure incidence or prevalence Past: Retrospective (start with people who do or don't have the disease, ask them about exposures) Take every single person who is sick: cases (you don't know exposure) Your controls are going to be a representative/fair sample -50% exposed Cases: Seniors accepted into Stanford Medical School Controls: Seniors who applied, but weren't accepted -Exposure: self-reported regular Red Bull consumption during undergraduate time -Potential Confounders: Age, sex, undergraduate institution, SES -Work backwards -Sample chosen on the basis of outcome -Find (cases), plus comparison group (controls) Advantages: -Rare outcome/Long latency period -Inexpensive and efficient: may be the only feasible option -Multiple Risk factors (Predictors) -Establishes association Disadvantages: Causality still difficult to establish -Selection bias -Recall bias: due to retrospective sampling -Cannot tell about incidence or prevalence *If you want a direct risk measurement, you can't do a case-control study It lets you estimate or infer risk, but you can't measure it directly Efficient way to measure exposure -Defined study base -Very common study design Use if you want to know if two factors are related Once you have the study base defined, you find every single case of disease Controls: fair sample of the underlying study base -Want cases and controls to represent a fair sample of the study base -Toughest part is defining the study base and selecting controls who represent the base (even though you won't reach out to most of them) Frequency of
exposure: way more smokers among cancer patients If the frequency of the exposure is higher among cases (with disease) than controls (healthy), then the incidence rate of this disease will be higher among the exposed than the non-exposed. Cases: Need all cases in the population (not studying the whole iceberg) Controls: Should be representative of all persons without the disease in the population Most common because they are inexpensive and quick Usually conducted before a cohort study or experimental study to identify the etiology of disease Good for a master's student (can be as quick as 6 months) Can investigate multiple exposures, but you can only study one outcome because you have to define the disease -Useful when the real exposure is not known Preferred if the disease is rare; you can go and find those cases *A cohort study of a rare disease would need a huge number of exposed to get enough cases Retrospective cohort study: You would have to start out with people who had written down their exposure in the past, and if the outcome is rare, you still wouldn't be studying rare cases efficiently Case-control is more efficient In a case-control study, people have to be able to recall exposure; these studies are great if there is a long time between exposure and outcome because you can bridge the gap if people remember the exposure
What's the purpose of random treatment allocation?
Primary purpose -Prevent bias in allocating subjects to treatment groups (avoid predictability) -Secondary purpose -Potential confounders are equally distributed at baseline -Not always! (especially if number is small) -Both measured and, more importantly, unmeasured variables
Types of study base
Primary study base: -The base is a well-defined cohort -The cases are subjects within the base who develop disease -i.e. participants in the Nurses' Health Study (nested CCS) -Everyone in a certain health plan, HMO -Know everyone in the study base Secondary study base: Who would be the appropriate control? -Cases are defined before the study base is identified -The study base then is defined as the source of the cases; controls are people who would have been recognized as cases if they had developed the disease
Matching
Process of selecting controls in a case-control study so that the controls are similar to the cases with regard to certain key characteristics (age, sex, and race) *Removes those matched characteristics as confounders Matching can be performed at the individual or group level -Individual matching (matched pairs) OR -Group matching (frequency matching) -One case matched to more than one control (more accurate) -The more controls you match to a case = more accurate (but after four or five it doesn't really make a difference/you don't get any more power) Can't explore possible associations of disease with any variable on which the cases and controls have been matched (if you want to study the effects of race, cases and controls should not be matched on race)
Prevalence
Proportion of a population found to have a condition (new and existing cases) Prevalence means the number of people alive who have the disease. Prevalence per 1000 = 1000 × # of cases of disease present in the population at a specified time / # of persons in the population at that specified time Two Types: Point Prevalence, Period Prevalence -"Do you currently have asthma?" -Point prevalence -"Have you had asthma during the last 5 years?" -Period prevalence -"Have you ever had asthma?" -Cumulative or life-time incidence
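As a minimal sketch of the formula above (the counts here are made up for illustration):

```python
def prevalence_per_1000(existing_cases, population):
    """Point prevalence per 1,000: all current (new AND existing) cases
    at a specified time, divided by the population at that time."""
    return 1000 * existing_cases / population

# Hypothetical: 150 people in a town of 50,000 currently have asthma
print(prevalence_per_1000(150, 50_000))  # 3.0 per 1,000
```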
Types of Cohort Studies
Prospective: I start today, and move forward -In 20 years, I'm near retirement -Concurrent cohort study or longitudinal -Investigator -Starts the study with the identification of the population and the exposure status -Follows them (over time) for the development of disease -Takes a long time to complete the study (as long as the length of the study) -Stand at the beginning of the study -I start in 2018, in 2038 I get data Retrospective: -Non-concurrent cohort or historical cohort study -Still grouping by exposure -Stand at the end of the study -I look at data from 1998 -Investigator -Uses existing data collected in the past to identify the population and the exposure status -Determines at present the status of the outcome (disease). -Investigator spends a relatively short time to: -Assemble the study population from past data -Determine disease status at the present time (no future follow-up) Combined (retrospective plus prospective follow-up): -Investigator uses existing data from the past to: -Identify the population and the exposure status -Compares incidence between exposed and unexposed at the present time -Investigator furthermore: -Will spend additional time following them into the future for the development of disease -A lot quicker, because the research has already been done *With cohort studies, people don't switch groups; they stop when they change and then you just take into account how many years they gave *Sometimes, when you determine a link to a past exposure causing a disease, you first do a retrospective analysis, then you do a prospective study to follow up with these people to see if more develop risks in the future.
Relative Risk (Risk Ratio)
RR = [a/(a+b)] / [c/(c+d)] a = outcome on drug X b = no outcome on drug X c = outcome on placebo d = no outcome on placebo RR = risk of outcome on Drug X (usually the outcome is death) / risk of outcome on placebo If RR = 0.2, then there is an 80% reduction in the risk of dying with drug X Risk ratio of 1 means no difference (same with or without the drug)
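The same 2x2 layout in code, with hypothetical counts chosen so the RR comes out to the 0.2 example above:

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]
    a = outcome on drug X     b = no outcome on drug X
    c = outcome on placebo    d = no outcome on placebo
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical trial: 10/100 die on drug X vs. 50/100 on placebo
rr = relative_risk(10, 90, 50, 50)
print(rr)  # 0.2 -> an 80% reduction in the risk of dying with drug X
```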
Assuming your study population consisted of only 50 heart disease patients and 50 controls. Doubling the number of participants in your next study would likely reduce the risk of a spurious finding due to: While all 50 heart disease patients agreed to be in your study, half the people asked to serve as controls declined. Assuming all of the 50 people who declined use marijuana, which bias has occurred and in which direction were the study results biased?
Random Chance Non-responder bias, away from the null hypothesis (increases probability of spurious finding)
Non-Compliance
Randomized: Split into NSAID (refuse or cannot tolerate NSAID) and Placebo (require NSAID or take it on their own) -Primary: intention to treat -Analyze according to original allocation -Net effect of non-compliance will reduce the observed differences = underestimating the effectiveness of the intervention Dealing with Non-Compliance: -Monitor compliance -Observe treatment directly -Count pills -Conduct blood or urine tests -Use of "run-in" period -Physicians' Health Study -33,223 were asked to take both aspirin and placebo for 18 weeks -Those with good compliance (22,071) were later randomized to aspirin or placebo
Recall is always imperfect. When it is non-differential it leads to a bias ______ the null hypothesis. When it is differential, it biases the results ______ the null hypothesis.
Recall is always imperfect. When it is non-differential it leads to a bias toward the null hypothesis. When it is differential, it biases the results away from the null hypothesis. Differential can result in "spurious findings" => type I errors
Rate Ratio (Risk Ratio)(RR)(Hazard Ratios)
Risk of outcome in the exposed / Risk of outcome in the unexposed Numerator: risk of getting sick with the exposure (attack rate) = # of people who got sick after eating the cake / total number of people who ate the cake Divided by Denominator: risk of getting sick without the exposure = # of people who got sick without eating the cake / total number of people who did not eat the cake If the odds ratio is 0.6: Women who breastfed for 13-24 months had a 40% reduction in breast cancer risk by age 50, compared to women who never breastfed, based on the odds ratio of 0.6. While 40% is our best estimate of this effect, the confidence interval tells us that this benefit may be as little as 10% or as high as 60% (95% CI 0.4 - 0.9). Either way it is statistically significant. Appropriate measure of association for a case-control study: odds of tanning in skin cancer cases / odds of tanning in those with no skin cancer If tanning causes skin cancer, the odds will be higher in cases than controls Recognize that odds ratios can only be interpreted as relative risks when the outcome is rare. Risk and odds are pretty much the same thing for rare outcomes Whatever outcome you study, if it occurs in less than 10% of the population, you can take the odds ratio and interpret it as a risk ratio Anything larger than 10%, the deviation is too big ANYTHING LESS THAN 10%, THE DEVIATION IS SO SMALL THAT YOU CAN TAKE YOUR ODDS RATIO AND INTERPRET IT AS A RISK RATIO Risk estimate: they calculated an odds ratio, but they are interpreting it as a risk ratio (it's an estimate, close enough) > 1 means the exposure is a risk factor. = 1 means the exposure is not associated with the disease. < 1 means the exposure is protective. If your odds ratio is greater than 1, then your exposure is a risk factor (increases risk)
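The "OR ≈ RR when the outcome is rare" rule can be checked numerically with two made-up 2x2 tables:

```python
def risk_ratio(a, b, c, d):
    """[a/(a+b)] / [c/(c+d)]: a,b = exposed row; c,d = unexposed row."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """(a/b) / (c/d) = ad/bc."""
    return (a * d) / (b * c)

# Rare outcome (~1% of each group diseased): OR ≈ RR
rare = (20, 1980, 10, 1990)
print(risk_ratio(*rare), odds_ratio(*rare))      # 2.0 vs. ≈ 2.01

# Common outcome (40-60% diseased): OR overstates the RR
common = (600, 400, 400, 600)
print(risk_ratio(*common), odds_ratio(*common))  # 1.5 vs. 2.25
```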
Risks VS Odds
Risk: a/(a+b) - a out of the total *One in 8 women will develop breast cancer in her lifetime. Taking the same slice out, the numerator is the same, but the denominator no longer contains the slice ("something to something," denoting that the other part does not contain a) -Denominator is everything but the numerator Odds: a/b Odds of a perfect March Madness basketball bracket are 128,000,000,000 to 1. If an event takes place with probability p, then the odds in favor of the event are p/(1-p) to 1 -If p = 2/3, then the odds of the event are (2/3)/(1/3) = 2 to 1
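Converting between the two is a one-liner each way (helper names are illustrative):

```python
def odds_from_risk(p):
    """Odds in favor of an event with probability p are p/(1-p) to 1."""
    return p / (1 - p)

def risk_from_odds(odds):
    """Inverse conversion: risk = odds/(1+odds)."""
    return odds / (1 + odds)

print(odds_from_risk(2 / 3))  # ≈ 2.0 -> "2 to 1", matching the example
print(odds_from_risk(1 / 8))  # ≈ 0.143 -> the 1-in-8 lifetime risk is "1 to 7" odds
print(risk_from_odds(2))      # ≈ 0.667 -> back to a 2/3 risk
```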
Single exposure, multiple exposure, or continuous exposure
Single exposure -Group of individuals exposed to a common vehicle (food, water, air, etc.) -The exposure was one time (for example, the food was served only once) -Typical characteristics: Explosive (abrupt) increase in the number of diseased individuals, and then the number declines gradually over time Food-borne outbreak on a cruise ship Graph goes up sharply and then declines gradually
Selection of Cases (Case control study)
Source of cases: -Hospital, physician's clinic, registries, community (wherever there is a record of cases) Incident or prevalent cases? -Incident cases (would have to be recently diagnosed): could take a long time to accrue, and patients may die before being recruited into the study -If the disease is rare, it would take a really long time -If you have enough cases, this is preferable -Prevalent cases: larger pool of cases, but survival bias, change in exposures Homogeneous criteria for case definition Certainty of diagnosis Survival bias: People with the most extreme cases have already died Incidence: Who DEVELOPS the disease during a given time Prevalence: Everyone who has it at a given time
Statistics / Biostatistics
Statistics is the science and art of dealing with variation in data in order to obtain reliable results -Biostatistics is the application of statistics to the biological sciences, health, and medicine -Methods differ with each study design, but the interpretation of the numbers stays the same -Examples: -Computing age-adjusted cancer rates to determine trends over time and locality -Calculating the risk of developing brain tumors following cell phone use after adjusting for possible interfering variables -Quantifying the relationship between use of COX-2 inhibitors and clinical outcomes
How would you investigate whether chocolate consumption is associated with winning a Nobel Prize? • Please describe the study design, study population, how you would measure exposure and outcome and what confounders you would ascertain to control for in the design or analysis of your study.
The best way to address whether chocolate consumption is associated with the chances of winning a Nobel Prize is a case-control study. This observational study design can investigate associations between rare outcomes and common exposures. Nobel Prize winners would be the cases. Controls would come from the general population and would be matched by age, gender, and country of origin. Exposure would be chocolate consumption (such as 3 times per week). Confounders are numerous and include factors related to socioeconomic status, education, other dietary variables, parents' education, etc.
P-Percentile
The pth percentile P is the value that is greater than or equal to p percent of the observations Common percentiles are: -25th= first quartile = (n+1)/4th observation -50th= second quartile = (n+1)/2th observation -The Median (P50) is the value that separates the lower 50% from the upper 50% of the observations. -75th= third quartile = 3 (n+1)/4th observation Slides 21-24 on Lecture 5 for examples
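The (n+1) rule above can be implemented directly; a small sketch that interpolates when (n+1)·p/100 falls between two observations (the data values are made up for illustration):

```python
def percentile(sorted_data, p):
    """pth percentile via the (n+1) rule from the notes.
    Interpolates linearly when (n+1)*p/100 is not a whole number."""
    n = len(sorted_data)
    pos = (n + 1) * p / 100          # 1-based position of the percentile
    lo = int(pos)
    frac = pos - lo
    if lo < 1:                       # below the first observation
        return sorted_data[0]
    if lo >= n:                      # at or beyond the last observation
        return sorted_data[-1]
    return sorted_data[lo - 1] + frac * (sorted_data[lo] - sorted_data[lo - 1])

data = [2, 4, 7, 8, 10, 12, 15]      # n = 7, already sorted
# Q1: (7+1)/4 = 2nd obs -> 4;  median: 4th obs -> 8;  Q3: 6th obs -> 12
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```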
What does the r=0.791 and p<0.0001 in table 1 indicate?
The r value tells you that a positive correlation exists between the two variables (as one increases, the other increases as well) and that a linear correlation is a relatively good fit for the data (a value of 0 would indicate no association and a value of 1 perfect linearity). Based on p<0.0001, this association reaches statistical significance, allowing us to reject the null hypothesis that these two variables are unrelated.
The steps in the epidemiologic approach to study a problem of disease etiology are:
The steps in the epidemiologic approach to study a problem of disease etiology are: -Initial observation to confirm the outbreak -Define the disease -Describe the disease by time, place, and person -Look for person-time-place interactions -Create a hypothesis for possible etiologic factors -Conduct analytic studies -Summarize the findings -Recommend and communicate the interventions or preventative programs
Squamous ampheritis is a uniformly lethal disease and currently only one drug, oliperab, is approved for treatment of patients with this disease. Even with this available treatment, the 5-year survival for this disease is a mere 30%. Your start-up pharmaceutical company has spent literally all its resources developing and clinically testing a new treatment modality for squamous ampheritis called valiperab. You oversaw a small, randomized controlled trial in which patients were randomized to valiperab (treatment group; 100 patients) versus oliperab (control group; 100 patients). A placebo control was not deemed ethical in this situation because these are sick patients that should at least receive standard of care therapy (oliperab). After the two-year trial, you found that fewer patients on the valiperab group had died compared to the oliperab group (25 vs. 36 patients). You calculate a risk ratio for death of 0.7 (p=0.062). Question 5A: Write a statement about both the statistical significance and clinical significance of your finding, justifying your answer using numbers supplied above. Question 5B. It has come to your attention that 7 patients who died in the valiperab group actually discontinued the drug shortly after the trial opened, because they were unable to maintain an erection while on valiperab. Should these patients be removed from the analysis? Why or why not?
This trial did not reach statistical significance, because the p-value is 0.062 and therefore greater than 0.05. That means we don't have enough statistical evidence against the null hypothesis that valiperab is no better than oliperab. However, the drug clearly showed some clinical value, being at least as useful for treatment of squamous ampheritis as the existing drug. In fact, fewer people actually died in the valiperab arm, so this is a very useful drug for squamous ampheritis patients (causing a 30% decrease in one's risk of dying, compared to the existing drug oliperab). A larger trial is needed to demonstrate the efficacy with statistical significance. These patients cannot be removed from the analysis, because a clinical trial always analyzes its results in an intention-to-treat analysis, which is based on the treatment allocation during randomization. If these people were removed from the analysis, the ratio of males to females would be off between the groups (as all of them were male), so the two groups would no longer have random dispersion of measured and unmeasured confounders. Analyzing the results without these 7 patients would be a per-protocol analysis, in which the power of randomization is lost. The study would essentially be an observational study, which does not provide enough evidence of causation to approve a drug in the US. We only consider randomized controlled trials as evidence for causation.
Researchers carried out a meta-analysis of studies on the association between the use of menopausal hormones and colon cancer in women. Twenty independent estimates of the association between ever use of menopausal hormones and colon cancer led to a summary RR of 0.85 (0.73, 0.99), using a random effects model. The estimated RRs were lower among current or recent users (RR 0.73) as compared with short-term users (RR 0.88). Evaluate whether the following statement regarding this study is TRUE or FALSE. If the result from a single study had been reported on more than one occasion, only one set of results from this study would be included in the meta-analysis. Data from cross-sectional studies were likely excluded from this meta-analysis. The I2 test for heterogeneity would have a p-value of less than 0.05 All 20 studies gave similar estimates of the relative risk.
True True True False
Variables
Variable: A characteristic taking on different values Random variable: A variable taking on different possible values as a result of chance factors Qualitative or Categorical: Implies attribute or quality -Split into binary (2 categories: user/non-user), nominal (more than 2 categories, no order: ethnicity/marital status), and ordinal (more than 2 categories, ordered: satisfaction, stages, survival) -Summarized as counts, proportions, or %: risk relates to an ordered, categorical measure Quantitative or Numerical -Implies amount or quantity -Split into discrete (countable) and continuous (not countable: age/BMI) -Often summarized as mean +/- SD; risk relates to a continuous measure Higher-level variables can always be expressed at a lower level, but the reverse is not true. e.g. Body Mass Index (BMI) is typically measured as an interval-level, continuous variable such as 27.4 BMI can be collapsed into ordinal categories such as: >30: Obese 25-29.9: Overweight 18.5-24.9: Normal weight <18.5: Underweight Or it can be organized into binary categories such as: Overweight / Not overweight
When is it desirable to conduct a case-control study?
When the disease has a long latency period *Retrospective (in the past) analysis *We can jump back several years, which we can't do with something in the future *Time between exposure and outcome is long, provided that participants can remember *Considers the past *Most appropriate when the OUTCOME is rare (because it's already happened) *You need an exposure that is fairly common *You can only study one outcome at a time (you can only study one disease at a time, and you either have the disease or not) *You can study multiple exposures *Does not allow us to calculate incidence or prevalence
Scale Line Graph
Y-axis represents frequency. X-axis represents time.
Years of Potential Life Lost (YPLL)
Years of potential life lost measures the impact of mortality on society -It is calculated by summing the years that individuals would have lived had they experienced normal life expectancy -Age 75 is used in the calculation (age standard) -For example, a person who died at age 30 from heart disease will contribute 75-30=45 YPLL -YPLL is weighted more by premature deaths, while mortality is weighted by the larger number of deaths in older people
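A minimal sketch of the calculation, using the age-75 standard from the notes (`ypll` is an illustrative name):

```python
def ypll(ages_at_death, standard=75):
    """Years of potential life lost: sum of (standard - age) over all
    deaths before the standard age; deaths at or after it contribute 0."""
    return sum(standard - age for age in ages_at_death if age < standard)

# The person from the note who dies at 30 contributes 75 - 30 = 45 YPLL;
# a death at age 80 contributes nothing.
print(ypll([30, 80]))  # 45
```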
Cohort Study
Ascertains the relationship between exposure and outcome A type of epidemiologic study where a group of exposed individuals (individuals who have been exposed to the potential risk factor) and a group of non-exposed individuals are followed over time to determine the incidence of disease -Prospective cohort study of the UNC incoming class during the 2017/2018 academic year -Exposures: All entering students surveyed regarding beverage consumption and a variety of other potential covariates -Follow-up: Survey updated annually to record changes in Red Bull consumption -Outcomes: GPA, admission into Stanford for graduate/professional school -Participants are grouped by their exposure status - whether or not they have a suspected risk/protective factor -Participants are followed forward in time (aka prospective, longitudinal) to determine if one or more new outcomes occur -Subjects cannot have the outcome variable on entry (must be at risk) -The rates of incidence among the exposed and unexposed groups are determined and compared. Advantages: -Know that the predictor variable was present before the outcome occurred (some evidence of causality) -Directly measures incidence of a disease outcome -Can study multiple outcomes of a single exposure Disadvantages: -Expensive and inefficient for studying rare outcomes -Often need a long follow-up period or a very large population -Loss to follow-up can affect validity of findings Start with people who are exposed, compare them to people who aren't, and then you wait..... and follow them......
Incidence
Number of new cases of a disease occurring in an at-risk population during a defined time interval -Example: In 2008, the incidence of chlamydia infection in 15-19 year old females was 3800 per 100,000 in North Carolina, compared to 3200 per 100,000 in the whole US population iNCidence = New Cases Incidence Proportion: Number of NEW cases of a disease occurring in the population during a specified period of time / Number of persons at risk of developing the disease during that time period -A measure of risk during a time period -Example: number of people diagnosed with diabetes in 2014, divided by the number of people who did not have diabetes on Jan 1, 2014. Cumulative incidence, aka "life-time risk" -Same as incidence proportion but the time period is a lifetime -Example: number of people who have ever had asthma divided by the total number of people who were asked the question about ever having asthma in 2014. Incidence Rates: -Numerator is the same as that of incidence proportions -Denominator is the observed time at risk of the event -It is not just the number of people at risk -In a study of tuberculosis, an individual who was followed for 5 years will contribute 5 person-years of follow-up to the denominator, while another individual with 3 years of follow-up will contribute 3 years to the denominator -Incidence rate = 5 per 10,000 person-years Person-time allows for continuous enrollment and loss to follow-up
P-Values VS Point Estimate and Confidence Intervals
p-values answer the question... "Is there a statistically significant difference between the two groups?" Point estimates answer the question... "What is the magnitude of this difference?" Confidence intervals answer the question... "How precisely did this study estimate this difference, and is the difference statistically significant?" -If a 95% CI includes the null effect, the p-value is >0.05 (and we would fail to reject the null hypothesis) -If the 95% CI excludes the null effect, the p-value is <0.05 (and we would reject the null hypothesis) With RR = 3 and 95% CI [0.77, 13.7], the point estimate suggests a 3-fold benefit of Red Bull drinking. Wide interval: an effect somewhere between a 23% harm and a 13.7-fold benefit. With RR = 1.8 and CI [1.14, 2.83], the point estimate suggests an 80% benefit of Red Bull drinking. Narrow interval: a benefit somewhere between 14% and 2.8-fold. No indication of harm.
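The CI/significance rule for ratio measures (null value = 1) can be expressed as a tiny check; the two CI pairs are the Red Bull examples from the note:

```python
def significant_from_ci(lower, upper, null_value=1.0):
    """A ratio measure (RR or OR) is statistically significant at the
    0.05 level iff its 95% CI excludes the null value of 1."""
    return not (lower <= null_value <= upper)

print(significant_from_ci(0.77, 13.7))   # False: CI includes 1, so p > 0.05
print(significant_from_ci(1.14, 2.83))   # True:  CI excludes 1, so p < 0.05
```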
Null Hypothesis
The hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error (treatment or chance) -There is no association between the independent and dependent/outcome variables -Formal basis for hypothesis testing
