Exam I - Buchanan
what is the Ho and Ha for a case-control study?
Ho : OR = 1 (odds are equal in both groups) Ha: OR < or > 1 (odds are not equal in both groups)
what is the Ho and Ha for a kaplan-meier model
Ho : S1(t) = S2(t) survival probabilities are equal in the 2 groups Ha: S1(t) ≠ S2(t) survival probabilities are NOT equal in the 2 groups
describe the null and alternative hypothesis of an r test (pearson's test)
Ho : r = 0 (no linear association, scatterplot looks like snow) Ha : r > 0 OR r < 0 (positive or negative linear association present)
Interpret the following regression results: (select all) B: 2.8 R^2: 0.05 p value: 0.07 a. B suggests that x is associated w/ y b. B is not statistically significantly different from zero c. model does not fit well d. all of the above
b. B is not statistically significantly different from zero --> p value > 0.05, fail to reject null of B = 0 c. model does not fit well --> small R^2, only 5% of variance in y can be explained by x
Which type of variable can have a range of values? a. discrete b. continuous c. categorical
b. continuous (73.4 kg)
What is the dependent variable in a logistic model? select all a. odds b. log (odds) c. logit
b. log (odds) c. logit
interpret this: age is measured as a continuous variable and OR of 1.05 (exposure is activity level) a. those aged >42 are 1.05 x as likely as those <42 to be inactive b. odds of not being active increase by 5% for every year increase in age c. odds of not being active increase by 8.1 times as people get 1 year older
b. odds of not being active increase by 5% for every year increase in age
What type of test would be appropriate? is BP different in patients w/ and w/o DM? a. chi square b. t test c. spearman rho d. wilcoxon
b. t test (continuous)
what is the benefit of randomization
balances differences between patients that could affect risk or outcome of study
in a cox proportional hazard, what is the dependent variable?
hazard rate
good fit of the regression line is indicated by _____
high R^2
what does the cox proportional hazard investigate?
relationship between hazard rate and independent variables
match the statistical measure with the type of study a. relative risk b. odds ratio randomized controlled trials cohort studies case-control studies
relative risk is for randomized controlled trials & cohort studies; odds ratio is for case-control studies
how is cumulative incidence (risk) represented
1-S(t) 1-survival
how to interpret an aOR
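aOR < 1: (1 - aOR) x 100 = XX% lower odds; aOR > 1: (aOR - 1) x 100 = XX% higher odds of experiencing the outcome w/ the exposure, after adjusting for the other variables in the model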
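A minimal Python sketch of this interpretation, assuming the usual convention that the percent change in odds is (aOR - 1) x 100. The function name and example values are illustrative only, not from the course.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an (adjusted) odds ratio.

    Positive result: higher odds; negative result: lower odds.
    """
    return (odds_ratio - 1.0) * 100.0


# Hypothetical aOR values for illustration
higher = percent_change_in_odds(1.05)  # roughly +5, i.e. ~5% higher odds
lower = percent_change_in_odds(0.80)   # roughly -20, i.e. ~20% lower odds
```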
T/F correlation is able to measure strength of a NONlinear relationship
F, this is why we need to check linear relationship w/ a scatterplot
T/F want a small p value for hosmer lemeshow test to show no evidence of lack of fit
F, want a large P value (want the Ho)
what is the equation for relative risk?
(a/(a+b)) / (c/(c+d)) where a: exposed w/ outcome; b: exposed w/o outcome; c: unexposed w/ outcome; d: unexposed w/o outcome. i.e. incidence of outcome w/ exposure / incidence of outcome w/o exposure
what r value is for a perfect + correlation? what r value is for a perfect - correlation?
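The formula can be sketched in Python as below. The 2x2 counts are made up for illustration; the cell labels a, b, c, d follow the card above.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table.

    a: exposed w/ outcome, b: exposed w/o outcome,
    c: unexposed w/ outcome, d: unexposed w/o outcome.
    """
    risk_exposed = a / (a + b)      # incidence of outcome w/ exposure
    risk_unexposed = c / (c + d)    # incidence of outcome w/o exposure
    return risk_exposed / risk_unexposed


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome
rr = relative_risk(20, 80, 10, 90)
```

With these counts the exposed group has twice the risk of the unexposed group.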
+1.0 -1.0
what is the range for log(odds)
- infinity to + infinity beneficial
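A small Python sketch, assuming log(odds) means the natural log of p / (1 - p), shows how probabilities near 0 and 1 map to large negative and large positive values. The function name is illustrative.

```python
import math


def logit(p):
    """log(odds) for a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


near_zero = logit(0.001)  # large negative value
middle = logit(0.5)       # odds of 1, so log(odds) is 0
near_one = logit(0.999)   # large positive value
```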
T/F a survival analysis could be appropriate for a randomized clinical trial or a case-control study design
F, randomized clinical trial or cohort study design
what is the equation for the line of regression
y = a + Bx a: y-intercept B: slope (change y / change x)
an R^2 of 0.003 is what percent
0.3%
what are the 5 assumptions of a logistic regression
1. binary outcome (y/n data) 2. independent observations 3. independent variables not correlated (can check this) 4. independent variables and log(odds) are linear 5. large enough sample size
name the 5 assumptions for a pearson's test
1. continuous variables only 2. both variables approx normal 3. no outliers 4. linear relationship 5. independent observations
What 3 points should be checked in a regression?
1. p value 2. sign & units of B 3. R^2 value --> tells if model fits (if model isn't a good fit, then a large B means nothing)
to get the R^2 into a percent, multiply by ____
100
a multiple regression analysis predicts one dependent variable (y) from how many independent variables (x, predictors)?
2 or more; each can be interval/ratio OR qualitative
in general, what % variance found by R^2 is considered a good fit?
>/= 50%
give examples of scenarios for censorship
ANYONE whose event is not observed by the end of their follow-up, regardless of reason, is censored: withdrawals; loss to follow up; not experiencing event during time of study
T/F correlation = causation
F!!!!!
T/F censoring does not reduce the # of patients who contribute to the curve
F, censoring DOES reduce the # of patients who contribute to the curve
T/F survival function and hazard function are not related
F, they are related
T/F cox proportional hazard model for survival analysis has no limitations
F, has limitations; need to include adjusted survival curve
T/F linear functions fit the data for dichotomous outcomes
F, hence why we need to use logistic regression
T/F pearson's correlation coefficient (r) works well w/ outliers
F, instead use spearman's correlation coefficient
T/F logistic regression is a linear function
F, it's non-linear (S-shaped for probability; only linear on the log(odds) scale)
T/F pearson's correlation can be used to detect a quadratic association between two continuous variables
F
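A quick Python check of this, using made-up data where y is exactly x squared on a symmetric range. The helper name is illustrative, and the r formula is the standard one for Pearson's r.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]       # perfect quadratic (U-shaped) relationship
r_quadratic = pearson_r(x, y)  # r is 0 even though y is fully determined by x
```

This is why the scatterplot has to be checked before trusting r.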
what happens to R^2 when more variables keep getting added? why is this bad? how is this fixed?
R^2 keeps increasing (NEVER will decrease); model with more variables will always seem to fit better, even if the better fit is not real; fixed w/ adjusted R^2
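A sketch of the correction in Python, assuming the common textbook formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. The numbers are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k predictors (requires n > k + 1)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Same raw R^2 of 0.50 with n = 30, but different numbers of predictors
adj_one_predictor = adjusted_r_squared(0.50, n=30, k=1)
adj_ten_predictors = adjusted_r_squared(0.50, n=30, k=10)
```

The same raw R^2 is penalized more heavily when it took ten predictors to reach it.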
how is survival function represented
S(t) survival
T/F a % range is an example of interval/ratio scale
T
T/F controlling for confounders in a logistic regression will require a multiple logistic regression
T
T/F hazard ratio has built in selection bias
T
T/F in a multiple regression, each additional variable is tested while the others are held constant
T
T/F log(odds) = B
T (strictly, B = log(OR): the change in log(odds) per 1-unit increase in x)
T/F probability is a finite number, while odds is an infinite number
T, probability is bounded (0 to 1), while odds have no upper limit (0 to infinity)
T/F the value of r is an absolute value
T
T/F you can only perform a linear regression for continuous data
T
T/F B can either equal increase in probability of outcome or disease
T
T/F the logit creates properties of a linear regression w/o actually being for a linear regression model
T beneficial
T/F probability of an outcome for a logistic regression should be performed
T --> the model estimates the probability of the outcome (0 to 1, can be read as a percent); that probability is modeled on the log(odds) (logit) scale
T/F categorical data is a subtype of discrete data. why?
T, categorical data is observations w/ limited values like counts (4 legs)
T/F for each additional variable in the multiple regression, there is another B
T, each variable tested gets its own B
T/F shape of a scatter plot is important in checking assumptions for spearman's correlation
T, if association is non-monotonic (ex: U-shaped), then r will be near 0 and a true association could be missed
T/F the logit allows for prediction of odds of disease or odds ratio for exposure and disease
T, use slope beneficial
at what point of the study can survival estimates be unreliable & why
at the end of the study when a lg # of subjects have been censored
What are the 3 general methods to making experimental conditions equal between study groups?
randomization; placebo (could be double dummy); blinding
what does a tick in a kaplan-meier curve indicate
a censored subject
what type of measure (outcome) is studied to determine a difference in each of the following? a. difference in means b. difference in event rates c. difference in survival function
a. continuous b. dichotomous c. continuous w/ dichotomous (risk difference)
match each predictor w/ type of test: a. r b. R^2 c. adjusted R^2
a. r = pearson or spearman simple correlation b. R^2 = simple determination (explainable variance) c. adjusted R^2 = R^2 accounting for number of predictors (x) in the model
assign a level of correlation to each of the value ranges for r: a. 0-0.25 b. 0.25-0.75 c. 0.75-0.99 d. 1.0
a. weak b. moderate c. high d. perfect
what type of OR controls for confounding in a logistic regression?
adjusted OR (aOR)
Which do I look at for a multiple linear regression, R^2 or adjusted R^2?
adjusted R^2
What is the type of data for each of these variables? age smoking status weight medication adherence
age --> continuous smoking status --> binary or ordinal weight --> continuous medication adherence --> binary
how to interpret R^2
amount of variance in y that can be explained by x
if y is a measure of risk for heart disease & x is # cigarettes smoked/day, interpret y-intercept (alpha) a. predicted value of y when x = 0 b. risk of heart disease for non-smokers when x = 0 c. a & b d. none of the above
c. x = 0 represents non-smokers
Can I use pearson's or spearman's if the scatterplot shows a "U" or "A" shape?
can't use either
the p value for a linear regression checks to see if alpha or beta is statistically significantly different from zero?
checking if B is statistically significantly different
describe the B in a logistic regression model
compares a change in log(odds) for every 1 unit increase in x
in interval data, the distance between two values is _______ & _________
constant & meaningful ex: Celsius scale
homoscedasticity means?
constant variance
the two subtypes of numeric measurements are ______ & _______
continuous & discrete
before calculating r for spearman's (rank order) correlation, what do you have to do w/ the data?
convert (continuous) data to rankings
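As an illustration, Spearman's coefficient can be sketched in Python as Pearson's r computed on the ranks. The helper names and data are made up, and ties are given their average rank, which is a common convention but an assumption here.

```python
import math


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r on the ranked data."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]          # monotonic but not linear
rho = spearman_rho(x, y)       # perfect monotonic association
r = pearson_r(x, y)            # high, but below 1 because the trend is curved
```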
r = rho = pearson's correlation coefficient = population _______
correlation
hazard ratios are for what model
cox proportional hazards (similar to odds ratio of multivariable logistic regression)
does a curve step down when a patient is censored or when a patient dies (or reaches the outcome)?
curve only steps down from patient death (or outcome) NOT censor
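A minimal Kaplan-Meier sketch in Python makes this visible: S(t) is multiplied by (1 - events / number at risk) only at event times, while censored subjects just leave the at-risk count. The data are hypothetical, and subjects censored at the same time as an event are treated as still at risk for that event (a common convention, assumed here).

```python
def kaplan_meier(times, events):
    """Return [(time, S(t))] at each time where at least one event occurs.

    times: follow-up time per subject; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        deaths = sum(same_time)
        if deaths > 0:
            # The curve steps down only when an event happens
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set
        at_risk -= len(same_time)
        i += len(same_time)
    return curve


times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]   # subjects at t=3 (one of two) and t=8 are censored
curve = kaplan_meier(times, events)

# Cumulative incidence (risk) is 1 - S(t)
cumulative_incidence = [(t, 1.0 - s) for t, s in curve]
```

There is no step at t = 8 because the only subject leaving then is censored.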
prediction of an outcome can be achieved w/ correlation or regression?
regression
are weight in 1971 & age in 1971 related? This question can be answered w/: a. t test b. pearson's c. spearman's d. linear regression
d. linear regression independent = age dependent = weight
give examples of events that could be studied in a survival analysis
death; injury; onset of illness; time to recovery; change in LDL; change in SBP
the outcome is the independent or dependent variable?
dependent
binary is synonymous to _____
dichotomous
in a logistic regression, the independent (x) variable can be (3)
dichotomous categorical continuous
in a logistic regression, the dependent (y, outcome) variable must be
dichotomous (binary)
how do you calculate OR from a B value?
e^B
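In Python this is a one-line call to math.exp. The sketch below also shows the OR for a k-unit increase in x, exp(k x B); the coefficient is a hypothetical value picked so that the 1-unit OR is 1.05.

```python
import math


def odds_ratio_from_b(b, units=1.0):
    """Odds ratio for a `units`-sized increase in x, given logistic coefficient b."""
    return math.exp(b * units)


b_age = math.log(1.05)                        # hypothetical B for age (per year)
or_1yr = odds_ratio_from_b(b_age)             # OR per 1-year increase
or_10yr = odds_ratio_from_b(b_age, units=10)  # OR per 10-year increase, 1.05 ** 10
```

Note that the 10-year OR is 1.05 multiplied by itself ten times (about 1.63), not 1.05 x 10.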
how is the adjusted hazard ratio calculated in a cox proportional hazard model?
e^coefficient
explain the proportional hazards assumption
effect of a risk factor is constant over time; if not proportional, stratify or include an interaction with time
what are the three possible goals of a survival analysis
estimate time to event for a group of individuals; compare time to event between 2+ groups; assess relationship of variables or covariates to time to event
binary data is mutually _________
exclusive
interpret OR >1 for determining if exposure is a risk factor for the disease
exposure increases disease risk (exposure is a risk factor)
interpret OR = 1 for determining if exposure is a risk factor for the disease
exposure is not a risk factor
interpret OR <1 for determining if exposure is a risk factor for the disease
exposure reduces disease risk (exposure is protective)
diagnosis & treatment can be guided by correlation or regression?
regression (estimating dose-response relationship)
what test checks the overall fit of a logistic regression?
hosmer lemeshow test
what does the slope of the survival curve tell
how fast survival changes
how to interpret B?
how much y changes when x changes by 1
what is an indicator in the shape of two survival curves that we would fail to reject Ho (aside from the p value)?
if the two survival curves overlap each other
how to interpret B in a multiple logistic regression
increase in log odds for a one unit increase in exposure of interest with all other exposures held constant
as censorship increases, the fraction of people experiencing death or an event ______ (increases or decreases)
increases / becomes larger
the predictor/exposure is the independent or dependent variable?
independent
how is an odds ratio (logistic regression) interpreted?
individuals w/ the exposure (x) have XX times OR XX% higher/lower the odds or chances of getting the disease compared to individuals without the exposure
what is the hazard rate
instantaneous incidence rate
what type of data is used for assessing a relationship between two continuous variables?
interval/ratio
how does regression build on correlation?
regression tells how to draw a straight line of best fit in a scatterplot in order to describe a relationship between X & Y
"stair-step" curve is kaplan meier or cox proportional hazard?
kaplan meier accounts for censoring
what are the 2 methods for survival analysis
kaplan meier cox proportional hazards
If the groups' curves in a kaplan-meier cross (hazards not proportional), does the log-rank test gain or lose power?
loses power
What benefit does converting a probability to odds result in?
larger range of possible values for dependent variable (expands the range of outcomes)
what are the 4 assumptions for a linear regression?
linearity (of regression line); homoscedasticity (constant std.dev, checked by looking @ resid plot); normal distribution (of residuals); independence (of observations)
would linear or logistic regression be used to evaluate mortality?
logistic
what regression model estimates propensity scores
logistic regression
what regression model is used for a dichotomous outcome
logistic regression
describe 3 ways subjects become censored in survival analysis
loss to follow up; drop out (withdrawal); study ends before person achieves event (death or outcome). censored subjects are counted as alive or outcome-free up to the time they are censored
the best line to fit the data __________ distance between observed and predicted data
minimizes ie. small residuals (actual y - predicted y)
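A least-squares sketch in Python, using made-up data, shows the fitted a and B, the residuals (actual y - predicted y), and R^2 computed as 1 - SS_residual / SS_total. Variable names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # actual y - predicted y

mean_y = sum(y) / len(y)
ss_res = sum(r ** 2 for r in residuals)       # what the line fails to explain
ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total variance in y
r_squared = 1.0 - ss_res / ss_tot
```

For these data the slope is 0.6 and R^2 is 0.6, i.e. 60% of the variance in y is explained by x.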
describe monotonic vs non-monotonic relationship
monotonic: data moves in one direction (EITHER up or down) non-monotonic: data moves in multiple directions ( up AND down)
sign of r denotes _______; value of r denotes ________
nature; strength
can a cox proportional hazard be used if hazards cross each other?
no
can a multiple linear regression infer cause?
no
can a t test be used to assess association between two continuous variables?
no
can a t test be performed for nominal data?
no; can do a z-test or chi square
can the distance between responses for ordinal data be quantitatively measured?
no ex: scale of strongly disagree to strongly agree
interpret an r of 0 (or very close to)
no correlation between the two variables
what would a scatter plot look like for an r of 0?
no pattern in the scatter points
what does an r value of 1 mean?
no scatter around the trend line
does hazard have an upper bound and is it a probability?
no to both; h(t) >/= 0
does an odds ratio demonstrate a difference in rate?
no, demonstrates a difference in odds of an outcome
can race and sex variables directly be used in a pearsons correlation test
no, they are nominal measurements
race and ethnicity are examples of what kind of data variables?
nominal (categorical)
what is the difference between nominal and ordinal data?
nominal data is unordered descriptions, while ordinal observations can be ordered (least happy to most happy)
what does the wald test assess
to see if B (the parameter) is significant
describe a disease odds ratio
odds of being a case of disease among exposed individuals divided by odds of being a case of disease among non-exposed individuals (a/b)/(c/d)
describe an exposure odds ratio
odds of being exposed among the cases divided by odds of being exposed among the controls (a/c)/(b/d)
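Both forms reduce to the cross-product (a x d)/(b x c), which a short Python sketch can confirm. The counts are hypothetical; a, b, c, d are the usual 2x2 cells (a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases).

```python
def disease_odds_ratio(a, b, c, d):
    """Odds of disease among exposed / odds of disease among unexposed."""
    return (a / b) / (c / d)


def exposure_odds_ratio(a, b, c, d):
    """Odds of exposure among cases / odds of exposure among controls."""
    return (a / c) / (b / d)


# Hypothetical 2x2 counts
a, b, c, d = 20, 80, 10, 90
disease_or = disease_odds_ratio(a, b, c, d)
exposure_or = exposure_odds_ratio(a, b, c, d)
cross_product = (a * d) / (b * c)
```

All three values agree, which is why a case-control study (which samples on disease status) can still estimate the disease odds ratio.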
interpret OR = 1 for comparison between cases and controls
odds of exposure are equal among cases and controls
interpret OR >1 for comparison between cases and controls
odds of exposure for cases are greater than odds of exposure for controls
interpret OR <1 for comparison between cases and controls
odds of exposure for cases are less than odds of exposure for controls
explain an R^2 value of 0.008
only 0.8% (less than 1%) of variability in y can be explained by x; poor model fit
What type of data is this? no HS degree HS degree some college, no degree associates degree bachelors degree higher than bachelors degree
ordinal
the two subtypes of categorical measurements are _______ & _________
ordinal & nominal
what are the 3 assumptions for a spearman's correlation?
ordinal, interval, or ratio scale variables; two variables measured on all study participants; monotonic relationship between the two variables
what is the coefficient of a cox proportional hazard model?
parameter estimate
how are influential observations checked for a logistic regression?
pearson's
What two tests can be performed to assess association between numerical variables?
pearson's spearman's
how is unreliable survival estimates d/t censorship fixed?
peto test (as opposed to log-rank) --> weights survival time earlier in the curve more heavily ex: high mortality --> use peto
what does a scatter plot of the variables show for assumption checks?
possible outliers; correlation ( + or -); linear relationship (or lack of)
how to calculate odds from a probability?
probability / (1 - probability)
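A short Python sketch of the conversion and its inverse (probability = odds / (1 + odds)). Function names and the example probability are illustrative.

```python
def odds_from_probability(p):
    """Odds for a probability p with 0 <= p < 1."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Probability for non-negative odds."""
    return odds / (1.0 + odds)


odds = odds_from_probability(0.75)      # 0.75 / 0.25: odds of 3 to 1
p_back = probability_from_odds(odds)    # converts back to 0.75
```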
what is the hazard function
probability that if you survive to a specified time, you will succumb to the event in the next instant
propensity scores are matched in patients to balance baseline characteristics between study groups, similar result to what?
randomization
what type of bias does loss to follow up introduce?
selection bias
what is the benefit to using a multiple regression?
simultaneously considers influence of multiple explanatory variables on a response variable y AND adjusts out influence of confounders/other variables
spearman's correlation is used when data is strongly _______ or not normally ________ (ordinal)
skewed; distributed
What 2 parts of a survival curve need to be evaluated?
slope; shape of curve
pain intensity and narcotic dose correlation would be performed w/ pearson's or spearman's?
spearman's
what does the p value of <0.05 mean based on an alpha of 0.05
statistically significant difference between the two survival curves
how do we estimate curves for specific groups in the kaplan meier method
stratification
what type of longitudinal data does survival analysis refer to?
survival or time to event
what is compared in the kaplan meier survival analysis
survival probabilities of 2 groups ex: treated vs untreated patients
how would a high p value for a cox proportional hazard be interpreted?
test parameter does not significantly impact survival
what is r
the correlation coefficient
in the kaplan meier curve, there is only a downward trend if what happens?
the event occurs (death or other target outcome)
what does the shape of a survival curve tell
the pattern of survival varying over time
define time to event
time from entry into a study until subject has a specific outcome
in an SPSS table, does the B come from the unstandardized B column or the standardized coefficients beta column?
unstandardized B column (predictor's row = slope B; constant row = intercept a)
how do we adjust for covariates in the kaplan meier method
use inverse probability weights
interpret this result: wt71 & wt82 have pearson's r of 0.876 & p<0.00001
weight in 1971 and 1982 are highly correlated, and this correlation is statistically significant
what should be the only difference in groups achieved through randomization
what treatment (exposure) each group gets
what does controlling for variables help determine?
which independent variables are truly related to the dependent variable
if B is 0, then?
y does not change as x changes (no linear relationship; the line is flat)
Give some examples of dichotomous outcomes
y/n pancreatic cancer y/n worse sx y/n appropriate tx y/n survival after surgery
can adjustments for covariates & estimation curves be made in the kaplan meier method?
yes
can outcomes between groups be directly compared if there is randomization?
yes
can hazard ratio change over time?
yes, but not necessarily reported over time
ratio data has an absolute _____ point
zero ex: Kelvin scale, weight.