Biostatistics
Random variables - two types
1) Discrete variables (e.g. dichotomous, categorical) 2) Continuous variables
Choosing an appropriate statistical test - factors
1) Type of data (nominal, ordinal, or continuous) 2) Distribution of data (e.g. normal) 3) Number of groups 4) Study design (e.g. parallel, crossover) 5) Presence of confounding variables 6) One-tailed vs. two-tailed test (two-tailed tests are more common in the medical literature) 7) Parametric vs. nonparametric tests
Parametric tests - analysis of variance (ANOVA)
A more generalized version of the t-test that can apply to more than two groups 1) One-way ANOVA 2) Two-way ANOVA 3) Repeated-measures ANOVA Several more complex factorial ANOVAs can be used
Regression
A statistical technique related to correlation There are many different types; for simple linear regression, one continuous outcome (dependent) variable and one continuous independent (explanatory) variable Two main purposes of regression: 1) Development of a prediction model (equation) 2) Assessment of the accuracy of prediction (or accuracy of the equation)
Hypothesis testing - perform the experiment and estimate the test statistic
A test statistic is calculated from the observed data in the study and compared with the critical value Depending on this test statistic's value, H0 is not rejected (often called fail to reject) or rejected In general, the test statistic and critical value are not presented in the literature; instead, p-values are generally reported and compared with a priori α values to assess statistical significance p-value: the probability of obtaining a test statistic as extreme as, or more extreme than, the one actually obtained, assuming H0 is true Because computers are used in these tests, this step is often transparent; the p-value estimated in the statistical test is compared with the a priori α (usually 0.05), and the decision is made
Random variables - definition
A variable with observed values that may be considered outcomes of an experiment and whose values cannot be anticipated with certainty before the experiment is conducted
Normal distribution - how do we assess?
A visual check of a distribution can help determine whether it is normally distributed (whether it appears symmetric and bell shaped) Need the data to perform these checks (see the sketch below): 1) Frequency distribution and histograms (visually look at the data; you should do this anyway) 2) Median and mean will be about equal for normally distributed data (most practical and easiest to use) 3) Formal test: Kolmogorov-Smirnov test 4) More challenging to evaluate when we do not have access to the data (when we are reading an article), because most articles do not present all data or both the mean and median Mean/SD define a normal distribution; this is termed parametric because these parameters define the normal distribution; ergo, it is meaningless to report the mean/SD if the data are not normally distributed
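A minimal Python sketch of these checks with hypothetical data (scipy is an assumption; the notes name no software). Note that feeding the sample's own mean/SD to the Kolmogorov-Smirnov test is the Lilliefors variant, so the p-value is only approximate:

```python
import numpy as np
from scipy import stats

# hypothetical sample drawn from a normal distribution
data = np.random.default_rng(1).normal(loc=100, scale=15, size=200)

print(np.mean(data), np.median(data))  # roughly equal if normally distributed

# Kolmogorov-Smirnov test against a normal with the sample's own mean/SD
ks_stat, p = stats.kstest(data, 'norm', args=(data.mean(), data.std(ddof=1)))
print(p)  # p > 0.05: no strong evidence against normality
```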
ANOVA - two-way ANOVA
Additional factor (e.g. age) added to one-way ANOVA young groups: group 1 < compared to > group 2 < compared to > group 3 old groups: group 1 < compared to > group 2 < compared to > group 3
Decision errors - statistical significance vs. clinical significance
As stated earlier, the size of the p-value is not necessarily related to the clinical importance of the result; smaller values mean only that chance is less likely to explain observed differences Statistically significant does not necessarily mean clinically significant Lack of statistical significance does not mean that results are not clinically important (there might be a huge difference between 2 therapies, but too much variability in the data prevents detection of statistical significance) When considering nonsignificant findings, consider sample size, estimated power, the difference the study was powered to detect, and observed variability Remember, if you find a statistically significant difference, you have appropriate power by definition, regardless of sample size
CIs - why are 95% CIs most often reported?
Assume a baseline birth weight in a group n = 51 with a mean ± SD of 1.18 ± 0.4 kg 95% CI is about equal to the mean ± 1.96 × SEM (or mean ± 2 × SEM); in reality, it depends on the distribution being used and is a bit more complicated What is the 95% CI? It is (1.07-1.29), meaning there is 95% certainty that the true mean of the entire population studied is between 1.07 and 1.29 kg What is the 90% CI? The 90% CI is calculated to be (1.09-1.27); of note, the 95% CI will always be wider than the 90% CI for any given sample; therefore, the wider the CI, the more likely it is to encompass the true population mean
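A quick check of these numbers in Python with scipy (an assumed tool); this uses the t distribution with df = n − 1, which is the "a bit more complicated" reality behind the mean ± 1.96 × SEM shortcut:

```python
from math import sqrt
from scipy import stats

mean, sd, n = 1.18, 0.4, 51      # birth-weight summary from the card
sem = sd / sqrt(n)               # standard error of the mean

# t-based intervals; shape parameter (df) passed positionally
ci95 = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)
ci90 = stats.t.interval(0.90, n - 1, loc=mean, scale=sem)
print(ci95)  # ~ (1.07, 1.29)
print(ci90)  # ~ (1.09, 1.27) -- always narrower than the 95% CI
```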
Population distributions - discrete distributions
Binomial distribution Poisson distribution
Discrete variables
Can take only a limited number of values within a given range Nominal: classified into groups in an unordered manner and with no indication of relative severity (e.g. male or female sex, dead or alive, disease presence or absence, race, marital status); these data are often expressed as a frequency or proportion Ordinal: ranked in a specific order but with no consistent level of magnitude of difference between ranks (e.g. NYHA functional class describes the functional status of patients with heart failure, and subjects are classified in increasing order of symptoms: I, II, III, or IV; Likert-type scales) Common error: measure of central tendency; in most cases, means and standard deviations (SDs) should not be reported with ordinal data
Nonparametric tests - nominal data
Chi-square (χ2) test: compares expected and observed proportions or percentages between two or more groups; test of independence; test of goodness of fit Fisher exact test: specialized version of the chi-square test for small groups (cells) containing <5 predicted observations McNemar test: compares paired samples Mantel-Haenszel test: controls for the influence of confounders
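An illustrative scipy sketch on a hypothetical 2×2 table, showing both tests and the expected-count check that drives the choice between them:

```python
from scipy import stats

# hypothetical 2x2 table: rows = treatment groups, columns = event yes/no
table = [[12, 38],
         [ 5, 45]]

chi2, p, dof, expected = stats.chi2_contingency(table)
print(p, expected)  # if any expected cell count is < 5, prefer the Fisher exact test

odds_ratio, p_fisher = stats.fisher_exact(table)
print(p_fisher)
```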
Confidence intervals (CIs)
Commonly reported as a way to estimate a population parameter In the medical literature, 95% CIs are the most commonly reported CIs (analogous to a p-value of 0.05) In repeated samples, 95% of all CIs include true population value (i.e. the likelihood or confidence or probability that the population value is contained within the interval) In some cases, 90% or 99% CIs are reported The differences between the SD, SEM, and CIs should be noted when interpreting the literature because they are often used interchangeably; although it is common for CIs to be confused with SDs, the information each provides is quite different and must be assessed correctly CIs can also be used for any sample estimate; estimates derived from categorical data such as risk, risk differences, and risk ratios are often presented with the CI
Estimating the survival function - log-rank test
Compare the survival distributions between two or more groups This test precludes an analysis of the effects of several variables or the magnitude of difference between groups or the CI (see below for Cox proportional hazards model) H0: no difference in survival between the two populations Log-rank test uses several assumptions: 1) Random sampling and subjects chosen independently 2) Consistent criteria for entry or end point 3) Baseline survival rate does not change as time progresses 4) Censored subjects have the same average survival time as uncensored subjects
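A minimal sketch using the third-party lifelines package (an assumption, as is the data; 1 = event occurred, 0 = censored):

```python
from lifelines.statistics import logrank_test

# hypothetical survival times in months for two groups
months_a, events_a = [5, 8, 12, 14, 20], [1, 1, 0, 1, 1]
months_b, events_b = [9, 13, 16, 21, 24], [1, 0, 1, 1, 0]

result = logrank_test(months_a, months_b,
                      event_observed_A=events_a, event_observed_B=events_b)
print(result.p_value)  # H0: no difference in survival between the two populations
```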
Student t-tests - paired test
Compares the mean difference of paired or matched samples This is a related samples test (crossover study) group 1 measurement 1 < compared to > group 1 measurement 2 Example: comparing the mean/SD of LDL levels of all the people in a room 6 months ago to that of those people today
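The LDL example as a hedged scipy sketch (values hypothetical); the same subjects are measured twice, so the paired test is used:

```python
from scipy import stats

# hypothetical LDL values (mg/dL) for the same 5 people, 6 months apart
ldl_before = [130, 142, 118, 155, 126]
ldl_after  = [121, 135, 117, 140, 122]

t_stat, p_value = stats.ttest_rel(ldl_before, ldl_after)  # paired (related samples) test
print(p_value)
```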
Student t-tests - one-sample test
Compares the mean of the study sample with the population mean group 1 < compared to > known population mean Example: comparing the mean/SD of LDL levels of a room of people to that of the average LDL levels of the US
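The same example as a scipy sketch (LDL values and the US mean are hypothetical placeholders):

```python
from scipy import stats

ldl_room = [130, 142, 118, 155, 126, 138]  # hypothetical LDL values (mg/dL)
us_mean_ldl = 111                          # hypothetical known population mean

t_stat, p_value = stats.ttest_1samp(ldl_room, popmean=us_mean_ldl)
print(p_value)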
ANOVA - one-way ANOVA
Compares the means of three or more groups in a study Also known as single-factor ANOVA This is an independent samples test group 1 < compared to > group 2 < compared to > group 3 Can have an infinite number of groups
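An illustrative scipy sketch with three hypothetical independent groups; additional groups are simply added as further arguments:

```python
from scipy import stats

group1 = [130, 142, 118, 155, 126]   # hypothetical continuous data
group2 = [121, 135, 117, 140, 150]
group3 = [110, 128, 133, 119, 125]

f_stat, p_value = stats.f_oneway(group1, group2, group3)  # one-way (single-factor) ANOVA
print(p_value)  # if significant, follow with a post-hoc test to see which groups differ
```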
Student t-tests - two-sample, independent samples, or unpaired test
Compares the means of two independent samples This is an independent samples test (parallel study) group 1 < compared to > group 2 Example: comparing the mean/SD of LDL levels of all the men to that of all of the women in a room Equal variance tests (see the sketch below): 1) Rule for variances: if the ratio of the larger variance to the smaller variance is greater than 2, we generally conclude the variances are different 2) Formal test for differences in variances: F test 3) Adjustments can be made for cases of unequal variance Unequal variance: a correction (e.g. the Welch t-test) is employed to account for unequal variances
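A hedged scipy sketch of the men-vs-women LDL example (hypothetical values), wiring the variance-ratio rule of thumb to the choice of the pooled vs. Welch test:

```python
import numpy as np
from scipy import stats

men   = [130, 142, 118, 155, 126]   # hypothetical LDL values (mg/dL)
women = [121, 135, 117, 140, 150]

v1, v2 = np.var(men, ddof=1), np.var(women, ddof=1)
ratio = max(v1, v2) / min(v1, v2)   # rule of thumb: ratio > 2 suggests unequal variances

# equal_var=False applies the Welch correction for unequal variances
t_stat, p_value = stats.ttest_ind(men, women, equal_var=(ratio <= 2))
print(p_value)
```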
Types of statistics - inferential statistics
Conclusions or generalizations made about a population (large group) from the study of a sample of that population Choosing and evaluating statistical methods depend, in part, on the type of data used An educated statement about an unknown population is commonly referred to in statistics as an inference Statistical inference can be made by estimation (confidence interval) or hypothesis testing (e.g. null hypothesis)
Correlation vs. regression - introduction
Correlation examines the strength of the association between two variables; it does not necessarily assume that one variable is useful in predicting the other Regression examines the ability of one or more variables to predict another variable
Measures of data spread or variability - range
Difference between the smallest and largest value in a data set; does not give a tremendous amount of information by itself Easy to compute (simple subtraction) Size of range is very sensitive to outliers Often reported as the actual values rather than the difference between the two extreme values
Descriptive statistics - visual methods of describing data
Frequency distribution Histogram Scatterplot
Regression - accuracy of prediction
How well the independent variable predicts the dependent variable Regression analysis determines the extent of variability in the dependent variable that can be explained by the independent variable Coefficient of determination (r2) is the measure describing this relationship; values of r2 can range from 0 to 1 An r2 of 0.80 could be interpreted as saying that 80% of the variability in Y is explained by the variability in X This does not provide a mechanistic understanding of the relationship between X and Y but rather a description of how clearly such a model (linear or otherwise) describes the relationship between the two variables Like the interpretation of r, the interpretation of r2 depends on the scientific arena (e.g. clinical research, basic research, social science research) to which it is applied
CIs instead of standard hypothesis testing
Hypothesis testing and calculation of p-values tell us (ideally) whether there is or is not a statistically significant difference between groups, but they do not tell us anything about the magnitude of the difference CIs help us determine the importance of a finding or findings, which we can apply to a situation CIs give us an idea of the magnitude of the difference between groups and the statistical significance CIs are a "range" of data, together with a point estimate of the difference Wide CIs: many results are possible, either larger or smaller than the point estimate provided by the study; all values contained in the CI are statistically plausible If the estimate is the difference between two continuous variables, a CI that includes zero (no difference between two variables) can be interpreted as not statistically significant (a p-value of ≥0.05); there is no need to show both the 95% CI and the p-value The interpretation of CIs for odds ratios and relative risks is somewhat different; in that case, a value of 1 indicates no difference in risk, and if the CI includes 1, there is no statistical difference (see the discussions of case-control and cohort in other sections for how to interpret CIs for odds ratios and relative risks)
Continuous variables
Sometimes called measuring variables Continuous variables can take on any value within a given range Interval scaled: data are ranked in a specific order with a consistent change in magnitude between units; the zero point is arbitrary (e.g. degrees Fahrenheit). Ratio scaled: exactly like interval but with an absolute zero (e.g. the kelvin scale, heart rate, blood pressure, time, distance)
Regression - prediction model
Making predictions of the dependent variable from the independent variable y = mx + b or dependent variable = (slope)(independent variable) + intercept Regression is useful in constructing predictive models; the literature is full of examples of predictions; the process involves developing a formula for a regression line that best fits the observed data
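A minimal scipy sketch of fitting and using such a line (hypothetical x/y values); linregress returns the slope (m), intercept (b), and correlation coefficient:

```python
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical independent variable
y = [2.1, 3.9, 6.2, 8.0, 9.8]   # hypothetical dependent variable

fit = stats.linregress(x, y)
print(fit.slope, fit.intercept)          # m and b in y = mx + b
print(fit.rvalue ** 2)                   # coefficient of determination, r^2

predicted = fit.slope * 6.0 + fit.intercept   # prediction at a new x value
print(predicted)
```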
Descriptive statistics - numerical methods of describing data: measures of central tendency
Mean Median Mode
Measures of data spread or variability - standard deviation
Measure of the variability around the mean; most common measure used to describe the spread of data Square root of the variance (average squared difference of each observation from the mean), so the SD is reported in the original units (nonsquared); alternatively, variance = SD^2 Appropriately applied only to continuous data that are normally or near-normally distributed or that can be transformed to be normally distributed By the empirical rule, about 68% of the sample values are found within ±1 SD, about 95% within ±2 SD, and about 99.7% within ±3 SD The coefficient of variation relates the mean and the SD (SD/mean × 100%) Standard deviation should be presented with the mean
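A small numpy sketch of these quantities on hypothetical data (note ddof=1 for the sample SD, which divides by n − 1):

```python
import numpy as np

data = np.array([4.1, 4.8, 5.0, 5.3, 5.9])   # hypothetical continuous data

mean = data.mean()
sd = data.std(ddof=1)      # sample SD, in the original (nonsquared) units
variance = sd ** 2         # variance = SD squared
cv = sd / mean * 100       # coefficient of variation, %
print(mean, sd, variance, cv)
```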
Measures of central tendency - median
Midpoint of the values when placed in order from highest to lowest Half of the observations are above and below When there are an even number of observations, it is the mean of the two middle values Also called 50th percentile Can be used for ordinal or continuous data (especially good for skewed populations) Insensitive to outliers
Population distributions - normal (Gaussian) distribution
Most common model for population distributions Symmetric or bell-shaped frequency distribution When measuring a random variable in a large-enough sample of any population, some values will occur more often than will others Probability: the likelihood that any one event will occur given all the possible outcomes
Measures of central tendency - mode
Most common value that occurs in a distribution Can be used for nominal, ordinal, or continuous data Sometimes, there may be more than one mode (e.g. bimodal, trimodal) Does not help describe meaningful distributions with a large range of values, each of which occurs infrequently Example: CYP2D6 poor and extensive metabolizers are two separate and distinct modes that occur within the enzyme system
Estimating the survival function - Cox proportional hazards model
Most popular method to evaluate the impact of covariates; reported (graphically) like Kaplan-Meier Investigates several variables at a time Actual method of construction and calculation is complex Compares survival in two or more groups after other variables are adjusted for Allows calculation of a hazard ratio (and CI)
Regression - types of analysis
Multiple linear regression: one continuous dependent variable and two or more continuous independent variables Simple logistic regression: one categorical response variable and one continuous or categorical explanatory variable Multiple logistic regression: one categorical response variable and two or more continuous or categorical explanatory variables Nonlinear regression: variables are not linearly related (or cannot be transformed into a linear relationship); this is where our pharmacokinetic equations come from Polynomial regression: any number of response and continuous variables with a curvilinear relationship (e.g. cubed, squared) Pick the simplest model that fits the data
Descriptive statistics - numerical methods of describing data: measures of data spread or variability
Standard deviation Range Percentiles
Spearman rank correlation
Nonparametric test that quantifies the strength of an association between two variables but does not assume a normal distribution of continuous data Can be used for ordinal data or nonnormally distributed continuous data
Hypothesis testing - null and alternative hypothesis
Null hypothesis (H0): example: no difference between groups being compared (treatment A = treatment B) Alternative hypothesis (Ha): example: opposite of null hypothesis; states that there is a difference (treatment A ≠ treatment B) The structure or the manner in which the hypothesis is written dictates which statistical test is used; two-sample t-test: H0: mean 1 = mean 2 Used to assist in determining whether any observed differences between groups can be explained by chance Tests for statistical significance (hypothesis testing) determine whether the data are consistent with H0 (no difference) The results of the hypothesis testing will indicate whether enough evidence exists for H0 to be rejected If H0 is rejected = statistically significant difference between groups (unlikely attributable to chance) If H0 is not rejected = no statistically significant difference between groups (any "apparent" differences may be attributable to chance); note that we are not concluding that the treatments are equal
Normal distribution - estimation and sampling variability
One method that can be used to make an inference about a population parameter Separate samples (even of the same size) from a single population will give slightly different estimates The distribution of means from these separate random samples approximates a normal distribution; the mean of this "distribution of means" = the unknown population mean, μ; the SD of the means is estimated by the standard error of the mean (SEM), which conceptually represents the variability of the distribution of means; as in any normal distribution, 95% of the sample means lie within ±2 SEM of the population mean The distribution of means from these random samples is about normal regardless of the underlying population distribution (central limit theorem); you will get slightly different mean and SD values each time you repeat this experiment (see the simulation sketch below) The SEM is estimated for a single sample by dividing the SD by the square root of the sample size (n): SEM = SD/sqrt(n) The SEM quantifies uncertainty in the estimate of the mean, not variability in the sample; this is important for hypothesis testing and 95% CI estimation Why is all of this information about the difference between the SEM and SD worth knowing? 1) Calculation of CIs (95% CI is about mean ± 2 times the SEM) 2) Hypothesis testing 3) Deception (e.g., makes results look less "variable," especially when used in graphic format); remember, mean/SD is the most appropriate way to describe variability in normally distributed data
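A brief numpy simulation of the central limit theorem (population and sizes are arbitrary assumptions): the population is deliberately non-normal, yet the SD of the repeated sample means matches SD/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # deliberately non-normal

n = 50
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print(np.std(sample_means))           # empirical SD of the distribution of means
print(population.std() / np.sqrt(n))  # theoretical SEM = SD/sqrt(n); nearly identical
```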
Parametric vs. nonparametric tests
Parametric tests assume: 1) Data being investigated have an underlying distribution that is normal or close to normal or, more correctly, randomly drawn from a parent population with a normal distribution 2) Data measured are continuous data, measured on either an interval or a ratio scale 3) Parametric tests assume that the data being investigated have variances (or SD) that are homogeneous between the groups investigated; this is often called homoscedasticity Nonparametric tests are used when data are not normally distributed or do not meet other criteria for parametric tests (e.g. discrete data); have less power than parametric tests (assuming parametric tests are being used appropriately) thus are not preferred for normal, continuous data
Measures of central tendency vs. measures of variability
Presenting data using only measures of central tendency can be misleading without some idea of data spread; studies that report only medians or means without their accompanying measures of data spread should be closely scrutinized Continuous data: present mean and standard deviation Ordinal data: present median and range/percentile/IQR
Non-directional - difference
Question: are the means different? Hypothesis: H0: Mean1 = Mean2 HA: Mean1 ≠ Mean2 OR H0: Mean1 - Mean2 = 0 HA: Mean1 - Mean2 ≠ 0 Method: traditional 2-sided t-test; confidence intervals
Non-directional - equivalence
Question: are the means practically equivalent? Hypothesis: H0: |Mean1 - Mean2| ≥ Δ HA: |Mean1 - Mean2| < Δ Method: two 1-sided t-test (TOST) procedures; confidence intervals
Directional - superiority
Question: is mean 1 > mean 2? (or some other similarly worded question) Hypothesis: H0: Mean1 ≤ Mean2 HA: Mean1 > Mean2 or H0: Mean1 - Mean2 ≤ 0 HA: Mean1 - Mean2 > 0 Method: traditional 1-sided t-test; confidence intervals
Directional - noninferiority
Question: is mean 1 no more than a certain amount lower than mean 2? Hypothesis: H0: Mean2 - Mean1 ≥ Δ HA: Mean2 - Mean1 < Δ (Δ = noninferiority margin) Method: confidence intervals
Decision errors - statistical power analysis and sample size calculation
Related to above discussion of power and sample size Sample size estimates should be performed in all studies a priori Necessary components for estimating appropriate sample size: 1) Acceptable type II error rate (usually 0.10-0.20) 2) Observed difference in predicted study outcomes that is clinically significant 3) The expected variability in number 2 4) Acceptable type I error rate (usually 0.05) 5) Statistical test that will be used for primary end point
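A sketch of such an a priori calculation using statsmodels (an assumed tool; the effect size here is Cohen's d, i.e. the clinically significant difference from item 2 divided by the expected SD from item 3):

```python
from statsmodels.stats.power import TTestIndPower

# assumptions: d = 0.5, alpha = 0.05 (type I), power = 0.8 (type II error = 0.20)
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative='two-sided')
print(n_per_group)  # ~64 subjects per group for a two-sample t-test
```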
Parametric tests - student t-tests
Several different types: 1) One-sample test 2) Two-sample, independent samples, or unpaired test 3) Paired test Common error: use of multiple t-tests with more than two groups (should use one of the ANOVA tests)
Nonparametric tests - tests for related or paired samples
Sign test or Wilcoxon signed rank test: compares 2 matched or paired samples (related to a paired t-test) Friedman ANOVA by ranks: compares ≥3 matched or paired groups
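Illustrative scipy calls on hypothetical paired data:

```python
from scipy import stats

before = [130, 142, 118, 155, 126]   # hypothetical paired measurements
after  = [121, 135, 117, 140, 122]

w_stat, p = stats.wilcoxon(before, after)   # 2 matched/paired samples
print(p)

# >=3 matched measurements on the same subjects
t1, t2, t3 = [5, 6, 7, 8], [6, 7, 8, 9], [4, 5, 6, 7]
f_stat, p3 = stats.friedmanchisquare(t1, t2, t3)
print(p3)
```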
Parametric tests - post-hoc tests
Since you cannot use multiple t-tests to determine which groups actually differ, post-hoc tests are used to determine this Maintains appropriate α-error rate Conducted if ANOVA is statistically significant Examples: 1) Tukey HSD (Honestly Significant Difference) 2) Bonferroni 3) Scheffé 4) Newman-Keuls
Survival analysis - basics
Studies the time between entry into a study and some event (e.g. death, myocardial infarction, or another dichotomous [yes/no] outcome) Censoring makes survival methods unique; it accounts for subjects who leave the study for reasons other than the event (e.g. lost to follow-up, end of study period) Also accounts for the fact that all subjects do not enter the study at the same time Standard methods of statistical analysis such as t-tests and linear or logistic regression may not be appropriately applied to survival data because of censoring
Measures of central tendency - arithmetic mean (i.e. average)
Sum of all values divided by the total number of values Should generally be used only for continuous (i.e. interval and ratio scaled) and normally distributed data Very sensitive to outliers and tends toward the tail that has the outliers (see the example below) Most commonly used and most understood measure of central tendency Related measure: geometric mean
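A tiny numpy example (arbitrary values) showing the mean's outlier sensitivity next to the median's robustness:

```python
import numpy as np

values = [4, 5, 5, 6, 7]
with_outlier = values + [40]   # one extreme value added

print(np.mean(values), np.median(values))              # 5.4, 5.0
print(np.mean(with_outlier), np.median(with_outlier))  # mean jumps to ~11.2; median barely moves (5.5)
```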
Pearson correlation - pearls
The closer the magnitude of r is to 1 (either + or −), the more highly correlated the two variables; the weaker the relationship between the two variables, the closer r is to 0 There is no agreed-on or consistent interpretation of the value of the correlation coefficient; it depends on the environment of the investigation (laboratory vs. clinical experiment) Pay more attention to the magnitude of the correlation than to the p-value, because the p-value is heavily influenced by sample size Crucial to the proper use of correlation analysis is interpretation of the graphic representation of the two variables; before using correlation analysis, it is essential to generate a scatterplot of the two variables to visually examine the relationship
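A minimal sketch of both steps with scipy and matplotlib (hypothetical data): inspect the scatterplot first, then compute r and its p-value:

```python
import matplotlib.pyplot as plt
from scipy import stats

x = [1, 2, 3, 4, 5, 6]              # hypothetical variable 1
y = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0]  # hypothetical variable 2

plt.scatter(x, y)   # always examine the relationship visually first
plt.show()

r, p = stats.pearsonr(x, y)
print(r, p)         # weigh the magnitude of r more heavily than the p-value
```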
Regression - example
The following data are taken from a study evaluating enoxaparin use; the authors were interested in predicting patient response (measured as antifactor Xa concentrations) from the enoxaparin dose in the 75 subjects who were studied The authors performed regression analysis and reported the following: slope = 0.227; y-intercept = 0.097; p <0.05; r2 = 0.31 What are the necessary assumptions to use regression analysis?: normally distributed data Provide an interpretation of the coefficient of determination: 31% of the variability in antifactor Xa concentrations is predicted from the enoxaparin dose (i.e. many other factors affect antifactor Xa concentrations) Predict antifactor Xa concentrations at enoxaparin doses of 2 and 3.75 mg/kg (worked below) What does the p <0.05 value indicate? That the slope is statistically significantly different from zero (i.e. dose predicts antifactor Xa concentration better than chance)
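The predictions follow directly from y = mx + b with the reported coefficients; a short worked sketch:

```python
slope, intercept = 0.227, 0.097   # coefficients reported in the card

for dose in (2, 3.75):
    print(dose, slope * dose + intercept)
# 2 mg/kg    -> 0.227*2    + 0.097 = 0.551
# 3.75 mg/kg -> 0.227*3.75 + 0.097 = 0.948 (extrapolate cautiously if outside the studied dose range)
```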
Measures of data spread or variability - percentiles
The point (value) in a distribution in which a value is larger than some percentage of the other values in the sample; can be calculated by ranking all data in a data set The 75th percentile lies at a point at which 75% of the other values are smaller Does not assume the population has a normal distribution (or any other distribution) The interquartile range (IQR) is an example of the use of percentiles to describe the middle 50% values; the IQR encompasses the 25th-75th percentile Often used in place of a range IQR + median is appropriate for looking at ordinal data
Decision errors - power (1-β)
The probability of making a correct decision when H0 is false; the ability to detect differences between groups if one actually exists Typically 0.8-0.9 Dependent on the following factors: 1) Predetermined α (typically 0.05) 2) Sample size (smaller sample sizes have lower powers) 3) The size of the difference between the outcomes you want to detect, called the effect size; often not known before the experiment is conducted, so to estimate the power of your test, you will have to specify how large a change is worth detecting (more power is needed to detect a smaller difference) 4) The variability of the outcomes that are being measured (as variability within data set increases, power decreases) 5) Items 3 and 4 are generally determined from previous data or the literature Power is decreased by (in addition to the above criteria): 1) Poor study design 2) Incorrect statistical tests (e.g. use of nonparametric tests when parametric tests are appropriate)
Decision errors - type II error
The probability of making this error is called β Concluding that no difference exists when one truly does (not rejecting H0 when it should be rejected) It has become a convention to set β to between 0.20 and 0.10; ergo, we are more willing to make a type II error than a type I error
Decision errors - type I error
The probability of making this error is defined as the significance level α Concluding that a difference exists when one truly does not (rejecting H0 when it should not be) Convention is to set the α to 0.05, effectively meaning that, 1 in 20 times, a type I error will occur when the H0 is rejected; so, 5.0% of the time, a researcher will conclude that there is a statistically significant difference when one does not actually exist. The calculated chance that a type I error has occurred is called the p-value. The p-value tells us the likelihood of obtaining a given (or a more extreme) test result if the H0 is true; when the α level is set a priori, H0 is rejected when p is less than α; in other words, the p-value tells us the probability of being wrong when we conclude that a true difference exists (false positive) A lower p-value does not mean the result is more important or more meaningful but only that it is statistically significant and not likely to be attributable to chance
Pearson correlation - basics
The strength of the relationship between two variables that are normally distributed, ratio or interval scaled, and linearly related is measured with a correlation coefficient (parametric) Often referred to as the degree of association between the two variables Does not necessarily imply that one variable is dependent on the other (regression analysis will do that) Pearson correlation (r) ranges from −1 to +1 and can take any value in between: -1: perfect negative linear relationship 0: no linear relationship +1: perfect positive linear relationship Hypothesis testing is performed to determine whether the correlation coefficient is different from zero; this test is highly influenced by sample size
Hypothesis testing - types
These are situations in which two groups are being compared There are numerous other examples of situations these procedures could be applied to
Nonparametric tests
These tests may also be used for ordinal data (unlike parametric tests) or continuous data that do not meet the other assumptions of the parametric tests (normal distribution and homogeneous variances/SDs)
ANOVA - repeated-measures ANOVA
This is a related samples test Extension of the paired t-test group 1 measurement 1 < compared to > group 1 measurement 2 < compared to > group 1 measurement 3
Hypothesis testing - rejecting H0
To determine what is sufficient evidence to reject H0, set the a priori significance level (α) and generate the decision rule Developed after the research question has been stated in hypothesis form Used to determine the level of acceptable error caused by a false positive (also known as level of significance) Convention: a priori α is usually 0.05 Critical value is calculated, capturing how extreme the sample data must be to reject H0
Regression - two statistical tests for simple linear regression
To test the hypothesis that the y-intercept differs from zero To test the hypothesis that the slope of the line is different from zero
Types of statistics - descriptive statistics
Used to summarize and describe data that are collected or generated in research studies; this is done both visually and numerically
Estimating the survival function - Kaplan-Meier method
Uses survival times (or censored survival times) to estimate the proportion of people who would survive a given length of time under the same circumstances Allows the production of a table ("life table") and a graph ("survival curve") We can visually evaluate the curves, but we need a test to evaluate them formally (e.g. log-rank test)
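A hedged sketch using the third-party lifelines package (data and package choice are assumptions):

```python
from lifelines import KaplanMeierFitter

# hypothetical survival times (months) and event indicators (1 = event, 0 = censored)
durations = [5, 8, 12, 12, 15, 20, 22, 30]
events    = [1, 0, 1, 1, 0, 1, 0, 1]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events)
print(kmf.survival_function_)   # the "life table" of survival estimates
kmf.plot_survival_function()    # the "survival curve"
```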
Choosing the most appropriate statistical test
Which is the appropriate statistical test to determine baseline differences in: 1) Sex distribution? chi-square (nominal and discrete) 2) LDL-C? two-sample t-test (assume normally distributed if means and SD are close between two groups) 3) Percentage of smokers and nonsmokers? chi-square (nominal and discrete) Which is the appropriate statistical test to determine: 1) The effect of rosuvastatin on LDL-C? paired test (same person, measurement 1 and measurement 2) 2) The primary end point (3 month change in LDL)? two-sample (comparing the delta of 2 different groups)
Nonparametric tests - tests for independent samples
Wilcoxon rank sum, Mann-Whitney U test, or Wilcoxon-Mann-Whitney Test: compares 2 independent samples (related to a t-test) Kruskal-Wallis one-way ANOVA by ranks: compares ≥3 independent groups (related to one-way ANOVA); post hoc testing can then be done with the Wilcoxon rank sum/Mann-Whitney U-test/Wilcoxon-Mann-Whitney Test
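Illustrative scipy calls on hypothetical nonnormally distributed data:

```python
from scipy import stats

g1 = [12, 15, 19, 24]   # hypothetical skewed data, 3 independent groups
g2 = [14, 18, 22, 30]
g3 = [9, 11, 16, 20]

u_stat, p = stats.mannwhitneyu(g1, g2, alternative='two-sided')  # 2 independent samples
print(p)

h_stat, p3 = stats.kruskal(g1, g2, g3)   # >=3 independent groups
print(p3)
```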
Normal distribution - landmarks for continuous, normally distributed data
These landmarks describe the standard normal distribution: μ (population mean) is equal to 0 σ (population SD) is equal to 1 x̄ and s represent the sample mean and SD