lecture 4
what type of graph does this describe? •Top of box at third quartile •Bottom of box at first quartile •Box shows the middle 50% of data; a.k.a. interquartile range •Median/midpoint depicted as a bar in the box middle •Whiskers extend up to 1.5 × the interquartile range above the 75th and below the 25th percentiles •Values outside the whiskers are outliers
Box Plot (A.k.a. Box and Whisker Plot)
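The box plot numbers described on this card can be computed directly. This is a minimal sketch, assuming the "inclusive" quartile convention from Python's `statistics` module (other conventions give slightly different quartiles); the function name `box_plot_summary` and the data are made up for illustration.

```python
import statistics

def box_plot_summary(values):
    """Return the quartiles, whisker limits, and outliers for a box plot."""
    # Quartile convention is an assumption; textbooks and software differ.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # height of the box (middle 50% of data)
    lower_limit = q1 - 1.5 * iqr       # lowest value a whisker may reach
    upper_limit = q3 + 1.5 * iqr       # highest value a whisker may reach
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower_limit": lower_limit, "upper_limit": upper_limit,
            "outliers": outliers}

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

With this made-up data set, the value 30 falls above the upper whisker limit and is flagged as an outlier.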
what type of test is the only valid way of predicting a time-dependent outcome?
Cox Proportional Hazards Regression
Ad hoc analysis
Focuses on answering specific, one-off questions or addressing unforeseen problems that arise
what type of graph does this describe? •Line graphs •Created by connecting the midpoints of histogram columns
Frequency polygons
the preferred method for showing survival (Provides exact estimates of survival each time a patient dies)
Kaplan-Meier Curve
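The "exact estimate each time a patient dies" idea can be sketched as a product-limit calculation. This is an illustrative sketch only: the function name `kaplan_meier` and the follow-up data are hypothetical, and `died=False` marks a censored patient.

```python
def kaplan_meier(observations):
    """observations: list of (time, died) pairs; died=False means censored.

    Returns a list of (time, survival_probability), one entry per death time.
    """
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        deaths = sum(1 for time, died in observations if time == t and died)
        leaving = sum(1 for time, _ in observations if time == t)
        if deaths:
            # The estimate only changes at times when a death occurs.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) for five patients.
curve = kaplan_meier([(2, True), (3, False), (4, True), (5, True), (6, False)])
print(curve)
```

Censored patients contribute to the number at risk until they leave the study, but they never cause a step down in the curve.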
does correlation imply causation?
NO
One-way ANOVA vs two-way ANOVA
One-way ANOVA uses one independent variable; two-way ANOVA uses two independent variables
what type of sampling is this? •Each member of the population has the same probability of being selected •Random number table/pseudo-random computer number generator •Downside: can over-sample from one homogeneous stratum
Simple Random Sampling
intention to treat
aims to preserve the original randomization and to avoid potential bias due to exclusion of patients •Includes all patients enrolled in the study in the analysis •Patients who are enrolled are included in the analysis for the group they were randomized to even if they drop out/die
what does ANOVA analyze
analysis of variance •Looks for a difference between the means of the groups •If there is a difference, then need to make comparisons among pairs/combinations of groups
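As a rough illustration of how ANOVA looks for a difference between group means, the sketch below computes the one-way F statistic (between-group variance divided by within-group variance). The function name and the three small groups are made up; turning F into a p-value would need an F distribution table or a statistics library.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA on a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_statistic = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_statistic)
```

A large F suggests that at least one group mean differs; pairwise comparisons are then needed to find which.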
Censoring
handling survival data for patients whose event has not been observed by the end of follow-up (e.g., patients who are still living)
research on two independent groups if nominal data (counts/frequencies)
chi-square
independent variable: nominal (qualitative) dependent variable: nominal which test is best?
chi-squared
what type of sampling is this? Create a sample by randomly selecting whole subgroups (clusters) and combining them
cluster sampling
Descriptive statistics
collect and summarize data about a population
what type of sampling is this? •Availability sampling •Non-random •Commonly used in clinical studies •Geographical proximity •Availability at a given time •Randomly assign the members of the sample to treatment groups
convenience sampling
research on two independent groups if small sample size, small expected frequencies
fisher's exact test
what type of graph does this describe? •Bars •X axis - measure of interest •Y axis - number of observations •Numbers or percentages •Area of each bar proportional to the number of observations
histogram
Pearson coefficient (r)
how close the data is to the line of best fit
Coefficient of determination (R^2)
how closely the predicted values match the observed values (strength of a model)
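A small worked example may help connect r and R^2. The sketch below computes Pearson's r by hand and squares it, relying on the fact that R^2 equals r squared when the model is a straight line with a single predictor. The function name and data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = r ** 2   # valid as R^2 only for simple (one-predictor) linear regression
print(round(r, 3), round(r_squared, 3))
```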
independent variable: nominal dependent variable: numerical (censored, e.g., survival time) which test is best?
kaplan-meier
regression
linear relationship and outcome prediction
correlation
linear relationship between variables •Correlation coefficient (Pearson's) •Ranges from -1 to +1
correlation 0
no linear relationship
independent variable: nominal with >2 groups dependent variable: numerical which test is best?
one-way ANOVA
correlation -1
perfect negative linear relationship
correlation +1
perfect positive linear relationship
independent variable: numerical dependent variable: numerical which test is best?
regression
what type of graph does this describe? •A.k.a. Bivariate Plot •Show relationship between two numerical characteristics •X - independent variable (cause) •Y - dependent variable (effect)
scatter plots
what type of plot does this describe? •Helps to show tally of observations •First step in creating a frequency table •Data observations are divided into subdivisions - classes/intervals •Stem is the first digit of each observation class •Leaf is the second digit of each observation
stem and leaf plot
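The stem/leaf split on this card can be sketched in a few lines. This assumes two-digit whole numbers, so the tens digit is the stem and the ones digit is the leaf; the function name and values are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit integers by stem (tens digit) with leaves (ones digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf([24, 12, 30, 15, 21, 24])
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Counting the leaves on each stem gives the class frequencies needed for a frequency table.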
what type of sampling is this? •Population can be divided into homogeneous strata •E.g., male/female gender •College grade levels •Prevents over-sampling from one of the strata •Sample equal numbers from each stratum •E.g., sample 100 college students, 25 from each of the four undergraduate grade levels •Use simple random sampling within each stratum
stratified sampling
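The college example on this card (100 students, 25 from each grade level) could be sketched as follows. The roster, the function name `stratified_sample`, and the fixed seed are assumptions made so the example is reproducible.

```python
import random

def stratified_sample(strata, per_stratum, seed=0):
    """Draw a simple random sample of equal size from every stratum."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical roster: 200 students in each of four grade levels.
levels = ["freshman", "sophomore", "junior", "senior"]
strata = {level: [f"{level}-{i}" for i in range(200)] for level in levels}

sample = stratified_sample(strata, per_stratum=25)
print({level: len(chosen) for level, chosen in sample.items()})
```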
what type of sampling is this? •Start with a random sample from the population •Choose a regular predetermined interval (every 10th person)
systematic sampling
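A minimal sketch of the "every 10th person" idea, assuming the random element is a random starting position within the first interval. The function name and the numbered population are illustrative.

```python
import random

def systematic_sample(population, interval, seed=0):
    """Pick a random start, then take every `interval`-th member after it."""
    rng = random.Random(seed)          # fixed seed only for repeatability
    start = rng.randrange(interval)    # random position within the first interval
    return population[start::interval]

people = list(range(100))              # stand-in for an ordered list of 100 people
chosen = systematic_sample(people, interval=10)
print(chosen)
```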
independent variable: binary nominal dependent variable: numerical which test is best?
t test
research on two independent groups if outcome is measured numerically (assumes normal distribution & equal SD)
t test
research on one group - questions about means
t test - if normal distributions
Inferential statistics
take a sample of the population; make inferences about the entire population
research on one group if non-normal distributions
use a transformation (linear or non-linear, e.g., logs) or a nonparametric test
Multiple regression
used with two or more independent variables
research on two independent groups if skewed distributions or different standard deviations
wilcoxon rank sum
research on one group - questions about proportions
z distribution - if normal distribution
Propensity Scoring
•Alternative to multiple regression and analysis of covariance •Used to control for a group of confounding variables •In experimental studies, the probability of being exposed (or unexposed) is 0.5 because assignment is random
t test
•Answer research questions about one group of subjects measured on one or two occasions •Very commonly used statistical test in medical research •Answer questions about means •Critical value is calculated using the α and sample size (degrees of freedom) •Test statistic is calculated using the sample mean and standard deviation, as well as the population mean and the number of observations in the sample
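The test statistic described on this card can be written out as t = (sample mean - population mean) / (s / sqrt(n)), with n - 1 degrees of freedom. Below is a small sketch with made-up measurements; the critical value would still have to be looked up in a t table for the chosen α.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)               # sample SD (n - 1 in the denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - population_mean) / standard_error, n - 1

# Hypothetical measurements compared against a population mean of 5.0.
t_statistic, df = one_sample_t([5.1, 4.9, 5.6, 5.8, 6.0], population_mean=5.0)
print(round(t_statistic, 3), df)
```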
types of graphs/plots to use for nominal (qualitative) data - counts/frequency of occurrences
•Contingency Table •Bar Chart •Pie Chart •Pictograph
Linear Regression
•Correlation describes a relationship, and regression describes both a relationship and predicts an outcome •Regression involves predicting the value of one characteristic from knowledge of another
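To illustrate "predicting the value of one characteristic from knowledge of another", here is a least-squares sketch for a straight line with one predictor. The function name and data points are invented for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
predicted = intercept + slope * 6      # predicted outcome for a new x of 6
print(slope, intercept, predicted)
```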
frequency tables
•Display the frequency of observations (E.g., number of patients, percentages) •Created after data is divided for stem and leaf plot •Data divided into classes as in stem and leaf
Pre-specified Analyses
•More "respected" •Identify good prior reasons for anticipating that the proportional effects of treatment might be very different in different circumstances •Pre-specify a particular subgroup analysis in the study protocol •Include a prediction of the direction of the proposed interaction
number needed to harm
•Number of people who need to be treated to cause harm to one more person •1/ARI •ARI is the difference in the incidence of harm in people treated and the incidence of harm in the people not treated.
number needed to treat
•Number of people who need to be treated to prevent one event •1/ARR •If small ARR, larger number of people need to be treated to prevent one case
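The 1/ARR and 1/ARI formulas can be checked with a short calculation. The trial counts below are hypothetical, and the function names are illustrative; `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def number_needed_to_treat(control_events, control_n, treated_events, treated_n):
    """NNT = 1 / ARR, where ARR = control event rate - treated event rate."""
    arr = Fraction(control_events, control_n) - Fraction(treated_events, treated_n)
    return 1 / arr

def number_needed_to_harm(treated_harms, treated_n, untreated_harms, untreated_n):
    """NNH = 1 / ARI, where ARI = harm rate if treated - harm rate if untreated."""
    ari = Fraction(treated_harms, treated_n) - Fraction(untreated_harms, untreated_n)
    return 1 / ari

# Hypothetical trial: events in 20/100 controls vs 15/100 treated patients.
nnt = number_needed_to_treat(20, 100, 15, 100)
# Hypothetical harms: 6/100 treated vs 2/100 untreated patients.
nnh = number_needed_to_harm(6, 100, 2, 100)
print(nnt, nnh)
```

Here the ARR is 0.05, so 20 people must be treated to prevent one event; halving the ARR would double the NNT.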
Comparing Two Frequency Distributions
•Percentage polygon •Like a frequency polygon •But converted to percentages
Chi Square
•Qualitative (nominal) variables •Tests for independence between the variables •Compares the observed frequencies in each group to the expected frequencies •Avoid when expected frequency is 2 or less
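The observed-versus-expected comparison can be sketched for a contingency table as follows. Each expected frequency is (row total × column total) / grand total, and the statistic sums (observed - expected)^2 / expected over all cells. The table values and function name are made up.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, expected frequencies) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            expected_row.append(e)
            statistic += (observed - e) ** 2 / e
        expected.append(expected_row)
    return statistic, expected

statistic, expected = chi_square_statistic([[10, 20], [30, 40]])
print(round(statistic, 4), expected)
```

Inspecting `expected` before trusting the statistic is a simple way to spot cells that are too small for the chi-square test.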
Fisher's Exact Test
•Replacement for Chi Square for nominal variables when sample sizes are small and expected frequencies are 2 or less, using a 2 x 2 table •Determines the probability of the observed frequencies
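"The probability of the observed frequencies" in a 2 x 2 table with fixed margins follows the hypergeometric distribution. The sketch below computes that probability and a two-sided p-value under one common convention (summing every table that is no more probable than the observed one); other conventions exist, and the function names and counts are illustrative.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of the 2 x 2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value: sum of tables no more probable than the observed one."""
    observed = table_probability(a, b, c, d)
    row1, row2, col1 = a + b, c + d, a + c
    p_value = 0.0
    # x is the top-left cell; the margins fix the other three cells.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= observed * (1 + 1e-9):   # small tolerance for float comparison
            p_value += p
    return p_value

p_observed = table_probability(3, 1, 1, 3)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_observed, 4), round(p_value, 4))
```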
Post hoc - "after this"
•Statistical analyses that were specified after the data were seen •Used to uncover specific differences between three or more group means when an analysis of variance (ANOVA) test is significant •Sometimes called data dredging •Statistical associations that it finds are often spurious •Motivated by a desire to produce positive results or see a project as successful
types of graphs/plots to use for numerical (quantitative) data - numbers
•Stem and Leaf Plots •Histogram •Box Plots •Frequency Polygon
Hypothesis Testing Steps
•Step 1: State the Hypotheses - null hypothesis (which is what we test) and the research hypothesis (which is what we expect) •Step 2: Find the Critical Values: α (how much of the area under the curve composes our rejection region) and the directionality of the test •Step 3: Compute the Test Statistic - collect data and calculate our test statistic •Different tests are used for different data, and are calculated in different ways •Step 4: Make the Decision - compare the test statistic to the critical value •Decide whether to reject/fail to reject the null hypothesis
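The four steps can be traced in code. This sketch assumes a two-sided z test with a known population standard deviation, chosen only because its critical value is available in Python's standard library; the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, population_mean, population_sd, n, alpha=0.05):
    # Step 1: H0 says the true mean equals population_mean; H1 says it differs.
    # Step 2: critical value for a two-sided test at the chosen alpha.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: test statistic from the collected data.
    z = (sample_mean - population_mean) / (population_sd / math.sqrt(n))
    # Step 4: compare the test statistic to the critical value.
    decision = "reject" if abs(z) > critical else "fail to reject"
    return z, critical, decision

z, critical, decision = two_sided_z_test(103, 100, 15, n=100)
print(round(z, 2), round(critical, 2), decision)
```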
Subgroup analysis
•Treatment effect on the outcome of interest differs according to the presence/absence of a baseline/demographic factor •Statistical analysis of the influence of a baseline factor on the effect •Often used in clinical trials •Goal is to learn how to use the treatment most effectively
Cox Proportional Hazards Regression
•Used in studies that look at the impact of multiple variables on survival •Numerical and nominal variables •Independent variables can vary with time •Results used to determine the hazard ratio (interpreted like a relative risk) associated with each variable
per protocol
•aims to identify a treatment effect which would occur under optimal conditions •Includes only those patients who completed the treatment originally allocated •If done alone, it leads to bias
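The difference between the per-protocol and intention-to-treat populations comes down to which patients are kept in the analysis. The sketch below uses four invented patients in one arm; the field names (`assigned`, `completed`, `event`) and helper functions are hypothetical.

```python
# Hypothetical patients randomized to the drug arm; two did not complete treatment.
patients = [
    {"assigned": "drug", "completed": True, "event": False},
    {"assigned": "drug", "completed": True, "event": False},
    {"assigned": "drug", "completed": False, "event": True},
    {"assigned": "drug", "completed": False, "event": True},
]

def intention_to_treat(records, arm):
    """Everyone randomized to the arm, including dropouts."""
    return [p for p in records if p["assigned"] == arm]

def per_protocol(records, arm):
    """Only patients who completed the treatment they were allocated."""
    return [p for p in records if p["assigned"] == arm and p["completed"]]

def event_rate(records):
    return sum(p["event"] for p in records) / len(records)

itt_rate = event_rate(intention_to_treat(patients, "drug"))
pp_rate = event_rate(per_protocol(patients, "drug"))
print(itt_rate, pp_rate)
```

In this made-up data every event happened in a patient who stopped treatment, so the per-protocol rate looks far better than the intention-to-treat rate, which is the kind of bias the card warns about.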