HPA 311: Health Informatics and Health Communications
Cross-Sectional Survey
An observational study conducted at a single point in time. Advantages: cost-efficient, easy to implement, ethical. Disadvantages: no temporal data, non-response bias.
Which study has the advantage that it is cost and time efficient for rare outcomes?
Case Control Study
Categorical
Categorical variables have more than two possible responses that are not ordered (e.g., color: yellow, purple, green).
Logistic Regression Analysis
Step 1: Determine significance. A 95% confidence interval for an odds ratio is significant if 1 is not in the interval (think of it as a range); a p-value is significant if it is <.05. Step 2: Determine the reference group. Step 3: Interpret the odds ratio/parameter (i.e., how big the odds/slope of the line is).
Standard Deviation
The standard deviation of a measure is the square root of its variance, and is another description of how values are spread around the mean value of that measure.
Uses of Source Data
Alert to new disease or resistant disease. Alert to potential spread beyond initial areas.
When you are comparing two categorical variables to each other, you use the:
Chi-square statistic
When a variable can contain any value between a theoretical minimum and maximum, the variable type is:
Continuous
Continuous
Continuous variables assume in theory any values between a theoretical minimum and maximum.
When we want to measure how two different variables are associated with, or change with each other, we can examine their:
Correlation
Self Reporting
Data self-reported by individuals who experience symptoms after receiving a drug or vaccine. Uses: May help identify unrecognized or unusual events. Advantages/Disadvantages: Useful when unusual events closely follow initial use of a drug or vaccine. Tends to be incomplete; difficult to evaluate meaning because of the selective process of reporting.
Source Data
Detailed report of an individual patient or organization
Dichotomous
Dichotomous variables have 2 possible responses
Examples of Biostatistical Applications Include
Estimating the extent of cardiovascular disease in the city of Boston, in Massachusetts; estimating the rate at which people develop cardiovascular disease; determining the risk factors for cardiovascular disease; assessing the effectiveness of a new drug.
Disability-Adjusted Life Years (DALYs)
Examines the impacts that specific diseases and risk factors have on populations, and provides an overall measure of population health status. The lowest rates of disability-adjusted life years lost are observed in North America (Canada), Australia, New Zealand, Japan, South Korea, and Western European countries.
Randomized Control Trial
Experimental study where patients are randomized to receive one of several comparison treatments. Advantages: gold standard from a statistical point of view, minimizes bias and confounding. Disadvantages: expensive, requires extensive monitoring, inclusion criteria can limit generalizability.
Prevalence is the number of new cases and incidence is the number of existing cases.
False
True or false: Incidence is the number of existing cases, and prevalence is the number of new cases.
False
True or false: We use logistic regression for continuous outcomes
False
True or false: We use multiple linear regression for dichotomous outcomes
False
True or false: You should trust every online source for health informatics.
False
Standard Notation in Statistics
Greek letters: full population. Roman letters: sample. Sample statistics (Roman letters): mean = x̄, standard deviation = s, variance = s². Population parameters (Greek letters): mean = μ, standard deviation = σ, variance = σ².
Health Communications
How we perceive, combine, and use health informatics to make decisions.
Health Informatics
Includes the methods for collecting, compiling, and presenting data.
Health-Adjusted Life Expectancy (HALE)
Incorporates measurements of the quality of health through: mobility (walking without assistance); cognition (mental function); self-care (activities of daily living); pain (regular pain that limits function); mood (alteration in mood that limits function); and sensory organ function (impairment in vision or hearing that impairs function).
Multivariable Methods
Several independent variables jointly predict one outcome: Independent Variable 1 -> Outcome; Independent Variable 2 -> Outcome; Independent Variable 3 -> Outcome.
Which population health measure is the health standard describing the rate of death in the first year of life:
Infant Mortality Rate
Two standards for summarizing the health status of populations, both of which can be compared across populations
Infant mortality rate; life expectancy
Variance
The quantification of the amount of variability: how far individual values are spread around the mean value of a measurement.
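The variance and standard deviation cards can be tied together with a short calculation. This is a minimal sketch, assuming the sample (n - 1) form of the variance; the function names and the blood pressure readings are made up for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_std_dev(values):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical systolic blood pressure readings (mmHg)
readings = [120, 130, 125, 135, 140]
variance = sample_variance(readings)
std_dev = sample_std_dev(readings)
print(variance, std_dev)
```

Dividing by n instead of n - 1 would give the population variance (σ²) rather than the sample variance (s²).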
Which population health measure is the health standard describing the probability of dying at each year of life:
Life Expectancy
Some Issues and Limitations to Biostatistics are that we:
Must clearly define the research question; must choose an appropriate study design (i.e., the way in which data are collected); must select a sufficiently large, representative sample; must carefully collect and summarize data; must quantify uncertainty; must appropriately account for relationships among characteristics; must limit inferences to the appropriate population.
Case Report
Observational studies. A case report provides a detailed report of specific features of a case. Case series are systematic reviews of common features of a small number of cases. Advantage: cost-efficient. Disadvantages: no comparison group, no specific research question.
Prospective Cohort Study
Observational study involving a group (cohort) of individuals who meet inclusion criteria, followed prospectively in time for risk factor and outcome information. Advantages: can assess temporal relationships. Disadvantages: needs large numbers for rare outcomes, confounding.
Case-Control Study
Observational study involving individuals with (cases) and without (controls) the outcome of interest. Advantages: cost and time efficient for rare outcomes. Disadvantages: needs careful selection of cases and controls, bias.
Confidence Intervals
An interval computed from your data that gives lower and upper bounds, or limits. The standard confidence interval is 95%. A confidence interval (for a ratio such as an odds ratio) is significant if the value 1 is not found in the interval.
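To see the "is 1 in the interval?" rule at work, here is a rough sketch that computes an odds ratio and an approximate 95% confidence interval from a 2x2 table. It assumes the common log-based (Woolf) approximation, which the notes do not name; the function name and the counts are hypothetical.

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with an approximate 95% CI (log-based approximation)."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical case-control counts
odds_ratio, lower, upper = odds_ratio_ci(30, 70, 15, 85)
# Significant when the interval does not contain 1
significant = not (lower <= 1 <= upper)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2), significant)
```

With these made-up counts the whole interval sits above 1, so the association would be called statistically significant.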
Dependent Variables
Outcome variables
Quality Standards for Health Information on the Internet
Overall Site Quality: Is the purpose of the site clear? Is the site easy to navigate? Are the site's sponsors clearly identified? Are advertising and sales separated from health information? Authors: Are the authors of the information clearly identified? Do the authors have health credentials? Is contact information provided? Information: Does the site get its information from reliable sources? Is the information useful and easy to understand? Is it easy to tell the difference between fact and opinion? Relevance: Are there answers to your specific questions? Timeliness: Can you tell when the information was written? Is it current? Links: Do the internal links work? Are there links to related sites for more information? Privacy: Is your privacy protected? Can you search for information without providing information about yourself?
P-Values
A p-value <.05 indicates statistically significant findings. P-values >.05 are generally not described in analyses, as they are not statistically significant. <.05 = 95% certain that the observed value did not happen by chance; <.01 = 99% certain that the observed value did not happen by chance; <.001 = 99.9% certain that the observed value did not happen by chance.
Independent Variable
Predictor variable
Which study involves a group of individuals who meet inclusion criteria followed prospectively in time for risk factor and outcome information?
Prospective Cohort Study
What is the gold standard for study design?
Randomized Control Trial
Infant Mortality Rate
Rate of death in the first year of life. Used as a primary measure of child health. From 2000 to 2013, the non-Hispanic black infant mortality rate was the highest, nearly twice that of non-Hispanic whites and Hispanics. It also appears to show sharper declines over this period, which is good, but there is still a long way to go to reduce the disparity in infant mortality rates across these populations.
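As a quick illustration, the infant mortality rate is conventionally reported as deaths under age 1 per 1,000 live births. The per-1,000 convention is not stated in the notes above, and the function name and counts below are hypothetical.

```python
def infant_mortality_rate(infant_deaths, live_births, per=1000):
    """Deaths in the first year of life per `per` live births."""
    return per * infant_deaths / live_births

# Hypothetical counts for one year in one population
rate = infant_mortality_rate(infant_deaths=58, live_births=10000)
print(rate)
```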
Which source of population health data is useful to alert to new diseases or potential spreads of disease beyond initial areas?
Single case or small series
Sources of Population Health Data
Single case or small series; statistics ("vital statistics") and reportable diseases; surveys (sampling); self-reporting; sentinel monitoring; syndromic surveillance.
ANOVA
Sometimes we want to compare mean values across three or more groups. For this, we would use ANOVA. An F-test is used to statistically compare the means across the groups. Your F-statistic will give you a p-value. If the p-value is <.05, then you can conclude that not all the group means are equal to each other, and you will need to investigate further to conclude which groups are statistically different from each other.
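The F-statistic behind a one-way ANOVA can be sketched by hand as the ratio of between-group to within-group variability. This is only an illustration with made-up data and a hypothetical function name; turning F into a p-value requires the F distribution, which a statistics package would normally supply.

```python
def anova_f_statistic(groups):
    """One-way ANOVA F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of individual values around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Three hypothetical groups
f_stat = anova_f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat)
```

A large F means the group means differ by more than the spread within groups would suggest.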
Z-Score
A standardized statistic that tells us how many standard deviations a raw score (an observation we have made) lies above or below the population mean.
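In symbols, z = (x - μ) / σ. A minimal sketch, with a hypothetical function name and made-up values:

```python
def z_score(raw_score, population_mean, population_sd):
    """Number of standard deviations the raw score lies from the population mean."""
    return (raw_score - population_mean) / population_sd

# Hypothetical: observation of 130 in a population with mean 120 and SD 5
z = z_score(130, 120, 5)
print(z)
```

A positive z means the observation is above the mean; a negative z means it is below.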
Which source of population health data is useful to draw conclusions about overall population and subgroups from representative samples?
Surveys
T-Test (One-Way, Two-Way and Paired)
T-tests are used when you have continuous variables and have mean values that you want to compare. One-way (one-sample) t-tests compare your group mean to a target value; two-way (two-sample) t-tests compare two group means to each other. Your t-statistic will give you a p-value. If the p-value is <.05, then the two means are statistically different from each other. If the data belong together, or are paired (for example, comparing values before and after an intervention), then you want to use a paired t-test.
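A rough sketch of how the one-sample and paired t-statistics are computed. A paired t-test is treated here as a one-sample t-test on the before/after differences against a target of 0. Function names and data are hypothetical, and the p-value step (which needs the t distribution) is left to a statistics package.

```python
import math
import statistics

def one_sample_t(values, target):
    """t = (sample mean - target) / (sample SD / sqrt(n))."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - target) / standard_error

def paired_t(before, after):
    """Paired t-statistic: one-sample t on the differences, tested against 0."""
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0)

# Hypothetical blood pressure before and after an intervention
before = [140, 150, 145, 160]
after = [135, 140, 140, 150]
t_stat = paired_t(before, after)
print(round(t_stat, 3))
```

The negative t here reflects that the "after" values are lower than the "before" values.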
Incidence
Tells us the likelihood of developing disease among persons free of disease who are at risk of developing disease. New cases.
Prevalence
Tells us the proportion of participants with disease at a particular point in time. Existing number of cases.
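The difference between the two measures is mostly in the numerator and denominator. A minimal sketch with made-up counts and hypothetical function names; incidence is computed here as a simple proportion over a follow-up period (cumulative incidence).

```python
def prevalence(existing_cases, population):
    """Proportion of the population with disease at a point in time."""
    return existing_cases / population

def cumulative_incidence(new_cases, persons_at_risk):
    """Proportion of disease-free, at-risk persons who develop disease during follow-up."""
    return new_cases / persons_at_risk

# Hypothetical: 1,000 people, 50 already have the disease at baseline
population = 1000
existing = 50
at_risk = population - existing          # only disease-free people can become new cases
new_cases = 19                           # develop disease during follow-up

print(prevalence(existing, population))            # existing cases
print(cumulative_incidence(new_cases, at_risk))    # new cases
```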
Biostatistics
The application of statistical principles to medical, public health, and biological applications. Includes collecting, summarizing, and interpreting data, and making inferences that appropriately account for uncertainty.
Life Expectancy
Derived from the probability of dying at each year of life. Used to measure the overall health of the population. Highest life expectancy: Canada, Australia, New Zealand, most of Europe, Japan, South Korea, and Chile.
True or False: Health communications are how we perceive, combine and use health informatics to make decisions.
True
True or False: Health informatics includes the methods for collecting, compiling and presenting data.
True
True or false: A confidence interval is significant if the value of 1 is not found in the interval.
True
True or false: A confounder variable is related to the risk factor and also to the outcome.
True
True or false: In simple linear regression with a single independent variable, the coefficient tells you how much the dependent variable is expected to increase (if the coefficient is positive) or decrease (if the coefficient is negative) when that independent variable increases by one.
True
Advantages/Disadvantages of Source Data
Useful for dramatic, unusual, and new conditions. Requires alert clinicians and rapid ability to disseminate information.
Sentinel Monitoring
Uses: Draws on state or federal surveillance data; monitors patterns of outcomes to alert surveillance bodies to changes in the prevalence or incidence of conditions. Advantages/Disadvantages: Can be used for "real time" monitoring. Requires considerable knowledge of patterns of disease and use of services to develop.
Syndromic Surveillance
Uses: May be able to detect unexpected and subtle changes, such as bioterrorism or a new epidemic producing common symptoms. Advantages/Disadvantages: May be used for early warning even when no disease is diagnosed. Does not provide a diagnosis and may have false positives.
Surveys/Sampling
Uses: Drawing conclusions about the overall population and subgroups from representative samples. Advantages/Disadvantages: Well-conducted surveys allow inferences to be drawn about larger populations. Frequent delays in reporting data.
Vital Statistics and Reportable Disease
Uses: Required by law, with penalties sometimes imposed for noncompliance. Births and deaths are key to defining leading causes of disease. Reportable diseases may be helpful in identifying changes over time. Advantages/Disadvantages: Vital statistics are very complete because of social and financial consequences. Reportable disease data often rely on institutional reporting rather than individual clinicians. Frequent delays in reporting data.
Which source of population health data includes birth, death, marriage, divorce and reporting of key communicable and specially-selected non-communicable diseases?
Vital statistics
Correlations
When we want to measure how two different variables are associated with, or change with each other, we can examine their correlation. If your correlation or r value is positive, then that means that when one variable increases, the other variable also increases. If your correlation or r value is negative, then that means that when one variable increases, the other variable decreases.
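The sign of r can be checked with a small calculation. This sketch assumes the Pearson correlation coefficient, which the notes do not name explicitly; the function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their individual spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 5]     # tends to increase with x
y_falling = [5, 4, 5, 4, 2]    # tends to decrease as x increases

r_positive = pearson_r(x, y_rising)
r_negative = pearson_r(x, y_falling)
print(round(r_positive, 3), round(r_negative, 3))
```

The first r comes out positive and the second negative, matching the two cases described above.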
Chi-Square Test
When you are comparing two categorical variables to each other, you use the chi-square (χ²) statistic. Each chi-square statistic comes with a p-value that describes whether the two categorical variables are statistically associated with each other. If the p-value is <.05, then the two variables are statistically related (the groups differ from each other).
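A hand-rolled sketch of the chi-square statistic for a table of counts, comparing observed counts with the counts expected if the two variables were unrelated. The p-value helper only covers a 2x2 table (1 degree of freedom), where the chi-square tail can be written with the standard library's `math.erfc`; larger tables would need a statistics package. Function names and counts are hypothetical.

```python
import math

def chi_square_statistic(table):
    """Sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def p_value_one_df(chi2):
    """Upper-tail p-value for chi-square with 1 degree of freedom (2x2 tables only)."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical 2x2 table: rows = group, columns = outcome yes/no
table = [[30, 70],
         [15, 85]]
chi2 = chi_square_statistic(table)
p = p_value_one_df(chi2)
print(round(chi2, 3), round(p, 4), p < 0.05)
```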
Multiple Linear Regression
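Y = b + m1X1 + m2X2 + ... + mkXk, where each m is the slope for one independent variable (X1 through Xk) and b is the intercept.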
Linear Regression
Y = mX + b, where m = slope and b = intercept.
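The slope and intercept can be estimated from data with ordinary least squares. This is a minimal sketch with a hypothetical function name and made-up data, not a substitute for a statistics package (it gives no p-values or confidence intervals).

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope (m) and intercept (b) in Y = mX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lie exactly on Y = 2X + 1
slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(slope, intercept)
```

The slope is the expected change in Y when X increases by one; the intercept is the expected Y when X is 0.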
Which statistic is a standard that tells us how many standard deviations above or below the population mean a raw score, or observation we have made, is from the mean?
Z-Score
Which significance value below describes that you are 99.9% certain that your observed value did not happen by chance alone?
p-value<.001