OEHS Applied Occupational Biostatistics Quiz review 2
1. What correlation statistic is generally considered a strong correlation? a. 0.8 b. 0.1<|r|<0.3 c. P<0.05 d. 0.5<|r| e. 0.3<|r|<0.5
0.5<|r|
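A minimal sketch of how the correlation coefficient behind this card could be computed and labeled. The function names are hypothetical, and the labels "moderate", "weak", and "negligible" are an assumed convention built on the cut-offs in the answer choices; the card itself only states that |r| > 0.5 is strong.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label |r| with the cut-offs from the answer choices (labels are assumed)."""
    size = abs(r)
    if size > 0.5:
        return "strong"
    if size > 0.3:
        return "moderate"
    if size > 0.1:
        return "weak"
    return "negligible"

# Made-up data for illustration only.
r = pearson_r([1, 2, 3], [2, 4, 6])
print(r, strength(r))
```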
27. If the Risk Difference comparing A+ vs A- among B- is 0.5 and if RR comparing B+ vs B- among A- is 0.4, what would you expect to find if there was difference effect modification? a. Unable to determine b. 0.2 c. 0.9 d. 0.1
0.9
30. How do you define effect modification? a. A third factor which has a relationship between the exposure and the outcome and therefore may mask or inflate the true relationship between the exposure and the outcome. b. A variable which effectively modifies outcomes independently from exposures. c. A third variable which has a meaningful impact on the causal pathway between the exposure and the outcome d. A third variable which lies in the causal pathway and is a direct result of the exposure and leads to the outcome
A third variable which has a meaningful impact on the causal pathway between the exposure and the outcome
20. A researcher has three independent samples from coal mines across the US. Each sample was exposed to coal dust to varying degrees due to differences in training and safety requirements by their organizations. To compare the mean overall exposures between samples, they decide to use... a. ANOVA b. Cronbach's Alpha c. T-test d. Regression
ANOVA
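As a rough illustration of what a one-way ANOVA computes for three independent samples, here is a hand-rolled F statistic. The exposure values and the function name are made up for the sketch; a real analysis would use a statistics package and also report a p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical coal dust exposures for three mines.
mine_a = [1, 2, 3]
mine_b = [2, 3, 4]
mine_c = [3, 4, 5]
print(one_way_anova_f([mine_a, mine_b, mine_c]))
```

The F statistic is the ratio of between-group to within-group variability, which is why a large F suggests that at least one group mean differs.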
Logistic regression can be used for... a. To estimate adjusted prevalence rates. b. All three c. Explore how well a set of characteristics predicts a categorical outcome. d. To estimate the effect of a treatment
All three
Which of the following commands would I use if I wanted to run a logistic regression in SAS? a. PROC PROBIT b. PROC LOGISTIC c. All three would work d. PROC CATMOD
All three would work
25. When using regression modeling to assess for effect modification, what is the most important term in the regression model? a. B3*exposure*covariate b. B1*exposure c. Everything is important d. B2*covariate e. Intercept
B3*exposure*covariate
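A small sketch of why the B3 interaction term carries the effect modification: with it in the model, the effect of exposure is B1 + B3*covariate, so it changes with the covariate. The coefficient values and function names below are hypothetical.

```python
def predicted_outcome(exposure, covariate, b0, b1, b2, b3):
    """Linear predictor with an exposure-by-covariate interaction term."""
    return b0 + b1 * exposure + b2 * covariate + b3 * exposure * covariate

def exposure_effect(covariate, b1, b3):
    """Change in the prediction per 1-unit increase in exposure at a given covariate value."""
    return b1 + b3 * covariate

# Hypothetical coefficients: the exposure effect differs between covariate = 0 and covariate = 1.
print(exposure_effect(0, b1=2.0, b3=0.5))
print(exposure_effect(1, b1=2.0, b3=0.5))
```

If B3 were 0, both calls would return the same value, which corresponds to no effect modification on this scale.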
Which type of graph would be best for demonstrating both median and quartiles? a. Venn Diagram b. Box Plot c. Pie chart d. Histogram e. Line graph
Box Plot
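A box plot is drawn from the median and quartiles. The sketch below computes those values with Python's standard library; the data are made up, and the "inclusive" quantile method is one of several conventions, so other software may give slightly different quartiles.

```python
import statistics

def box_plot_summary(data):
    """Return (minimum, Q1, median, Q3, maximum), the values a box plot displays."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```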
14. How do you determine how well your model explains your dependent variable? a. Calculate your Sum of Squares for the Regression (SSR) b. Calculate your Sum of Squares for the Error (SSE) c. Look at your Standard Error of the Regression d. Look at your p-value e. Look at your R-squared value
Calculate your Sum of Squares for the Regression (SSR)
33. Choose which case the sentence about moderation describes: The difference between the group means for the categorical independent variable differ depending on the group membership on the moderator variable. a. Continuous IV and Continuous Moderator b. Continuous IV and Categorical Moderator c. Categorical IV and Categorical Moderator d. Categorical IV and Continuous Moderator
Categorical IV and Categorical Moderator
The mean of a sample is a. computed by summing all the data values and dividing the sum by n-1 b. computed by summing all the data values and dividing the sum by the number of items c. Always equal to the mean of the population d. Always equal to the mean of the population
Computed by summing all the data values and dividing the sum by the number of items
4. Linear regression allows for comparison between what types of variables a. Binary or Dichotomous b. A mixture of all types c. Categorical d. Continuous e. Repeated measures
Continuous
29. Choose which Case the sentence about moderation describes: The slope of the relationship between the independent and dependent variable differs across the groups represented by the categorical moderator variable. a. Continuous IV and Continuous Moderator b. Continuous IV and Categorical Moderator c. Categorical IV and Categorical Moderator d. Categorical IV and Continuous Moderator
Continuous IV and Categorical Moderator
31. Choose which case the sentence about moderation describes: The slope of the relationship between the independent and dependent variable varies (i.e. increases or decreases) according to the level of the moderator variable. a. Continuous IV and Continuous Moderator b. Continuous IV and Categorical Moderator c. Categorical IV and Categorical Moderator d. Categorical IV and Continuous Moderator
Continuous IV and Continuous Moderator
17. An ANOVA will tell you that at least two groups are different from each other and which groups are different a. True b. False
False
6. All are true about multicollinearity except? a. A condition in which at least 2 independent variables are highly linearly correlated with the dependent variable. b. Multicollinearity is a problem because you are trying to explain the outcome with 2 variables that are very similar c. Generally, an independent variable that has a correlation of more than 0.2 with any other independent variable is considered collinear and should not be included in the same model. d. A way to test for multicollinearity would be to calculate correlation coefficients for all pairs of predictor variables.
Generally, an independent variable that has a correlation of more than 0.2 with any other independent variable is considered collinear and should not be included in the same model.
13. How do you decide if a factor is a confounder in multivariate regression? a. If it changes the relationship between the exposure and the outcome by 10% or more b. By adding it into the model c. By understanding the biological pathway for the relationship between the exposure and the outcome d. If the confounder is statistically significantly related to both the outcome and the exposure.
If it changes the relationship between the exposure and the outcome by 10% or more
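The 10% rule can be sketched as a change-in-estimate check between the crude and adjusted effect estimates. The function names and example estimates are hypothetical, and dividing by the crude estimate is an assumption; some texts divide by the adjusted estimate instead.

```python
def percent_change(crude, adjusted):
    """Percent change in the exposure estimate after adding the candidate confounder."""
    return abs(crude - adjusted) / abs(crude) * 100

def looks_like_confounder(crude, adjusted, threshold=10.0):
    """Apply the 10% change-in-estimate rule of thumb."""
    return percent_change(crude, adjusted) >= threshold

# Hypothetical estimates: 2.0 before adjustment, 1.7 after adjustment.
print(percent_change(2.0, 1.7), looks_like_confounder(2.0, 1.7))
```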
10. In simple linear regression, the p-value for the overall model is testing what? a. If the slope of the line is different from 0 b. If the y-intercept, beta 0, is meaningfully different from 0 c. Evaluation of the 95% confidence interval around the line is significantly different than random error d. How much variation in Y is explained by the model. e. If there is an improved statistical ability to know what the value of Y is.
If the slope of the line is different from 0
If B1 > 0 for the relationship between diabetes mellitus and current low back pain, how do you interpret that metric? a. Increase in the log odds of having current low back pain if you have diabetes mellitus b. There is a statistically significant relationship between current low back pain and a diagnosis of diabetes mellitus. c. The slope of the per unit relationship is positive d. There is an increase in current low back pain if there is an increase in diabetes mellitus e. The linear relationship between current low back pain and a diagnosis of diabetes mellitus is meaningfully increasing.
Increase in the log odds of having current low back pain if you have diabetes mellitus.
12. What is the best description for a least squares regression? a. It is the process of multiplying each of the error terms by each other and then simply adding them up to give you a value b. The process of measuring the difference between the horizontal value of each pair of data in an independent linear fashion to provide the equation for a line which best fits the provided data. c. It is the maximization of the linear relationship between two independent variables d. It is the minimized squared difference between all observed and predicted values of the dependent variable to identify the best fitting line.
It is the minimized squared difference between all observed and predicted values of the dependent variable to identify the best fitting line.
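To make the "minimized squared difference" idea concrete, here is a minimal sketch of the closed-form least squares line for one predictor, with made-up data and hypothetical function names. Any other line through the same data should have a larger sum of squared errors.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_error(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values of y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = least_squares_line(xs, ys)
print(b0, b1, sum_squared_error(xs, ys, b0, b1))
```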
5. Which is true about the coefficient of determination? a. It is calculated by taking the square root of the average prediction error b. It is the proportion of variance in the outcome explained by the model c. When this value is closer to 1, the model fit is poor d. The nonparametric version is the Spearman's rank-order correlation
It is the proportion of variance in the outcome explained by the model
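A short sketch of the coefficient of determination as a proportion of variance, computed as 1 - SSE/SST. The observed and predicted values are made up, and the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of variance in the outcome explained by the model: 1 - SSE / SST."""
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)                  # total variation
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))    # unexplained variation
    return 1 - sse / sst

# Values closer to 1 indicate that the model explains more of the variation.
print(r_squared([2, 4, 5, 8], [1.9, 3.8, 5.7, 7.6]))
```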
18. The nonparametric test that is analogous to the one-way ANOVA is: a. Kruskal-Wallis test b. Wilcoxon Signed-Rank test c. None of the above d. Friedman test e. Mann-Whitney U test
Kruskal-Wallis test
According to traditional tests of mediation, which of the following conditions must be met for mediation? i. the independent variable significantly influences the dependent variable ii. The independent variable significantly influences the mediator iii. A previously significant relation between the independent variable and dependent variable is no longer significant a. All conditions must be met b. i. and ii. c. i. and iii. d. ii and iii. ??
MAYBE: All conditions must be met.
26. Effect modification is also referred to as... a. Augmentation b. Moderation c. Suppression d. Abstinence
Moderation
3. What is a good indication that assumptions of linear regression are met? a. Observing a random scatter of points on a residuals scatter plot b. Observing a linear pattern of points on a residuals scatter plot c. Observing a linear pattern of points on a residuals scatter plot d. All of these are good indications
Observing a random scatter of points on a residuals scatter plot
24. We want to test whether the mean length of a certain type of court case is more than 80 days by using 20 randomly chosen cases. The only variable in the data set, time, is assumed to be normally distributed. What test is utilized above? a. Student t-test b. Matched t-test c. One sample t-test d. Two sample t-test
One sample t-test
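A minimal sketch of the one-sample t statistic, t = (sample mean - hypothesized mean) / (s / sqrt(n)). The case lengths below are invented and use only 5 cases to keep the arithmetic readable; the card's scenario has 20.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for testing the mean against mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical case lengths in days, tested against 80 days.
print(one_sample_t([82, 85, 79, 90, 84], 80))
```

For the one-sided question "more than 80 days", the t statistic would then be compared with the upper tail of the t distribution on n - 1 degrees of freedom.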
15. Including too many predictor variables in a regression model is an example of what? a. Outliers b. Multicollinearity c. Overfitting d. Residual confounding
Overfitting
8. What is the primary difference between Spearman and Pearson correlation tests? a. Pearson should be used only for 2 continuous variables, while Spearman can be used for ordinal or continuous variables. b. Spearman correlation only provides a measure of the strength of the relationship (p-value), while Spearman correlation provides both strength and direction of the relationship. c. There is no functional difference between the two. d. Pearson provides a more precise measure of correlation but is computationally demanding, while Spearman provides an estimate.
Pearson should be used only for 2 continuous variables, while Spearman can be used for ordinal or continuous variables.
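One way to see the difference is that Spearman's coefficient is Pearson's coefficient applied to ranks, which is why it only needs ordered data. The sketch below uses hypothetical function names and made-up data with a monotonic but non-linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but curved
print(pearson_r(x, y), spearman_rho(x, y))
```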
28. Ideally, we want moderators to be: a. Related to both the predictor and the outcome b. Related to neither the predictor nor the outcome c. Related to the outcome but not the predictor d. Related to the predictor but not the outcome
Related to neither the predictor nor the outcome
19. What does a large F-statistic indicate? a. Something is significant among all the variables b. Which samples show significance c. There is no significance d. There is no difference in variance among the samples
Something is significant among all the variables
21. What is the difference between a t-test and an ANOVA a. T-test uses categorical data and ANOVA uses continuous b. T-Test can only compare two samples, and ANOVA can compare two or more samples c. T-Test tests for association and ANOVA tests for change d. They are not different
T-Test can only compare two samples, and ANOVA can compare two or more samples
Which of the following is not an analytic procedure used to test for mediation? a. Regression b. T-test c. Sobel test d. Bootstrapping approaches
T-test
7. If two variables are highly correlated, what do you know? a. That there are no other variables responsible for the relationship. b. That changes in one variable are accompanied by the predictable changes in the other. c. That high values on one variable lead to high values on the other variable. d. That they always go together.
That changes in one variable are accompanied by the predictable changes in the other.
Which of the following is false about logistic regression? a. The dependent variable is continuous b. Predictor variables can be continuous c. The dependent variable is categorical d. Predictor variables can be categorical
The dependent variable is continuous
Which of the following is false about logistic regression? a. Predictor variables can be categorical b. predictor variables can be continuous c. the dependent variable is categorical d. the dependent variable is continuous
The dependent variable is continuous
What does it mean if the residual path c is not zero? a. There is moderation b. There are multiple mediating factors c. There is a single dominant mediator d. There is no significance
There are multiple mediating factors
16. What does the F-statistic in the table below mean? (F is 1.26) a. There is not a significant difference between the sample means. b. There is a significant difference between sample means. c. There is not a significant difference between population means. d. There is a significant difference between population means.
There is not a significant difference between the sample means.
11. How do you interpret the 95% Confidence Interval around beta1 if it is (-0.77, 1.22)? a. There is not a statistically significant relationship. b. This can only be interpreted if combined with the log odds and pseudo R-squared c. The exposure is statistically significantly protective for having the outcome. d. There is a non-linear relationship between the two variables. e. The p-value for this analysis is p<0.05
There is not a statistically significant relationship.
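The reasoning on this card is simply whether the interval contains the null value of 0 for a regression coefficient. A tiny sketch, with a hypothetical function name:

```python
def ci_excludes_null(lower, upper, null_value=0.0):
    """True when the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)

# The interval from the card spans 0, so the relationship is not statistically significant.
print(ci_excludes_null(-0.77, 1.22))
```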
23. Your friend said they are using "between subjects ANOVA." You ask, "Do you mean one-factor or one-way analysis of variance?" They respond... a. No idea b. Pretty sure between subjects ANOVA is different from those c. I just do what my stats professor says will work. d. They are all the same
They are all the same
32. Effect modification is a threat to what? a. To understanding b. To interpretation c. To validity d. To maximal clinical practice
To interpretation
9. What should you do if you suspect the relationship between your dependent and independent (aka your outcome and predictor) data is not linear? a. Adjust your R-squared value downward because it explains less variation. b. Generate a new regression equation c. Transform your data d. Plot the residual terms to identify the failure e. Exclude points which are not on the line.
Transform your data
34. This occurs when the effects of being exposed to two independent factors cannot explain the overall compounding effect. a. What is statistical interaction? b. What is interstate commerce? c. What is interdependence? d. What is augmented moderation?
What is interdependence?
22. ANOVA is used when... a. You want to test for association between two variables b. You want to describe the variables in a model c. You want to test for change over time d. You want to compare variability within and between groups
You want to compare variability within and between groups
We are told that in an experiment, there are five possible random outcomes. Which is true? a. If the outcomes are equally likely then the trials are independent b. If after 20 trials, one outcome has not been observed then the probability that it will occur in the next trial is increased. c. If after 20 trials, one outcome has not been observed then the probability that it will occur in the next trial is increased. d. After 20 trials, each of the five possible outcomes will have occurred 4 times e. If after 20 trials, one outcome has not been observed then the probability that it will occur in the next trial is unchanged.
a. If the outcomes are equally likely then the trials are independent
Step 5 for testing nested models states that the sub model that has no significant difference from the full model and has the fewest X variables will be the final model. What does this mean? a. It means I'm looking for the simplest model that explains the most variance possible. b. It means the full model is probably the best model because it is complex c. It means that it's possible to discover several models that are slightly better, but we'll always stick with the full model.
a. It means I'm looking for the simplest model that explains the most variance possible.
Which of the following is a non-parametric estimate of the survival function? a. Kaplan Meier b. Spearman Correlation c. Mann-Whitney U d. Wilcoxon Rank Sum
a. Kaplan Meier
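A minimal sketch of the Kaplan-Meier product-limit estimate, assuming right-censored data where each subject has a follow-up time and a flag saying whether the event was observed. The data and function name are made up for illustration.

```python
def kaplan_meier(times, events):
    """Return [(event time, estimated survival)] for right-censored data.

    events[i] is True if the event was observed at times[i], False if censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, observed in zip(times, events) if observed})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, observed in zip(times, events) if ti == t and observed)
        # Multiply by the conditional probability of surviving past time t.
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up times; False marks a censored subject.
times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
print(kaplan_meier(times, events))
```

No distribution is assumed for the survival times, which is what makes the estimate non-parametric.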
What type of censoring would be used when an individual takes a COVID-19 antibody test, which tells them they have had the disease but not when? a. Left Censoring b. Right Censoring c. Interval Censoring d. Type 1 Censoring
a. Left Censoring
In structural equation modeling, a form of multivariate linear regression, fit statistics are compared between nested models to see which model represents the data best. For multiple logistic regression we use the... a. Likelihood Ratio Test b. Chi-square Goodness of Fit Test c. Turkey Test d. RMSEA
a. Likelihood Ratio Test
Which of the following statements is true about Nested Models? a. Nested models look at the association between outcome, exposure, and other variables that may influence the relationship b. There will be only 1 full model and only 1 reduced model c. Nested models only look at the relationship between the outcome variable and the exposure variable. d. The model with the most variables is the best model
a. Nested models look at the association between outcome, exposure, and other variables that may influence the relationship
What is Survival Analysis? a. Statistical methods for analyzing longitudinal (time-to-event) data on the occurrence of events b. A modern way to analyze dichotomous data to figure out odds of an event c. A computationally intensive method to calculate risk d. The preferred statistical method to identify true risk factors
a. Statistical methods for analyzing longitudinal (time-to-event) data on the occurrence of events
Why do we use a logit transformation for logistic regression? a. To convert the relationship to a linear relationship. b. To calculate the odds ratio for the relationship between dependent and independent variables. c. To convert the log odds to a simple odds ratio for analytical response. d. To transform data to make it fit the assumption of normality.
a. To convert the relationship to a linear relationship.
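A short sketch of the logit transformation and its inverse. The logit maps a probability between 0 and 1 onto the whole real line (the log odds), which is the scale on which logistic regression fits a linear relationship. Function names are hypothetical.

```python
import math

def logit(p):
    """Log odds of a probability p, where 0 < p < 1."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Convert log odds back to a probability."""
    return 1 / (1 + math.exp(-z))

for p in (0.1, 0.5, 0.8):
    print(p, logit(p), inverse_logit(logit(p)))
```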
Multiple Logistic regression means you have a. more than one independent variable predicting a dichotomous outcome b. more than one dependent variable predicted by one independent variable c. performed more than one logistic regression d. probably spent too much time learning stats
a. more than one independent variable predicting a dichotomous outcome
What is the point of multivariate analysis? a. To be able to use complex mathematical modeling to improve the pseudo R-squared value b. A technique that takes into account a number of variables simultaneously c. To change the ability to interpret raw data. d. An iterative process to calculate a best estimate.
b. A technique that takes into account a number of variables simultaneously
There are two types of factor analysis. Which would you most likely use with an existing, previously validated measure? a. Varimax Rotation b. Confirmatory c. Principal Axis Factoring d. Exploratory
b. Confirmatory
A researcher is evaluating the relationships between mental health and opioid use. They measure mental health using a series of 35 questions assessing symptoms of depression, anxiety, social support, job satisfaction, sleep, and family history of mental illness. They want to see if these 35 questions are somehow representative of more common underlying symptom constellations which cannot be directly observed as part of this study. What analytical method would be best to identify these underlying groups? a. Cronbach's Alpha b. Factor Analysis c. Kappa d. TOST Test e. Correlation
b. Factor Analysis
What is multicollinearity? a. Multicollinearity is an uncommon statistical issue. b. Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. c. Multicollinearity is when stuff is too related and you might need to correlate the errors.
b. Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related.
What is rater reliability? a. The validity of the raters b. The consistency between raters c. The legitimacy of the raters d. The inconsistency between raters
b. The consistency between raters
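As one illustration of measuring consistency between two raters, here is a sketch of Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The ratings and function name are made up.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each rated the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of four items.
print(cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))
```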
Nested models can be used for which of the following analyses? a. Multiple Logistic Regression b. They can be used for all c. Likelihood ratio test d. Multiple linear regression
b. They can be used for all
What are nested models? a. Two or more separate models with nearly the same variables, except different dependent variables b. Two or more separate models with nearly the same variables, except for one variable. c. Two or more separate models with very few similar variables d. Models that prefer birds
b. Two or more separate models with nearly the same variables, except for one variable.
Which of the following is not a reason we use survival analysis? a. Increased power b. Uses less information c. Look at common outcomes d. These are all reasons why we use survival analysis
b. Uses less information
Factor analysis is... a. all about explaining variance between measures b. a statistical data reduction technique c. my idea of a good time d. an exploratory approach
b. a statistical data reduction technique
You are taking a multiple choice test for which you have mastered 80% of the material. Assume this means that you have a 0.8 chance of knowing the answer to a random test question, and that if you don't know the answer to a question, you randomly guess among the four answer choices. Each question is independent of the others and there are no multi-part questions. What is your expected score? a. 80% b. 75% c. 85% d. 83.3% e. 77.8%
c. 85%. P(random question is correct) = 0.8 + (0.25)(0.2) = 0.85. Because each question is independent, the expected score on the whole test is also 85%.
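The calculation can be checked with a short sketch: the formula gives the per-question probability, and a simulation (with an arbitrary seed and number of questions, both assumptions) should land close to it.

```python
import random

def expected_score(p_known, n_choices):
    """P(correct) = P(know it) + P(don't know it) * P(guess correctly)."""
    return p_known + (1 - p_known) / n_choices

def simulate_score(p_known, n_choices, n_questions, seed=0):
    """Fraction of simulated questions answered correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_questions):
        if rng.random() < p_known:
            correct += 1          # the answer is known
        elif rng.randrange(n_choices) == 0:
            correct += 1          # a random guess happens to be right
    return correct / n_questions

print(expected_score(0.8, 4))
print(simulate_score(0.8, 4, 100_000))
```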
A Social Security number (SSN) is an example of which type of data? a. Ordinal b. Binary/dichotomous c. Nominal d. Discrete e. Continuous
c. Nominal
Which type of censoring is the most common? a. Interval censoring b. Type 1 censoring c. Right censoring d. Left censoring
c. Right censoring
If you made it to the last quiz, but others did not, what sort of analysis would be appropriate to understand what may have happened to those no longer in the class? a. Regression analysis b. Multivariate analysis c. Survival analysis d. Statistical analysis
c. Survival analysis
A researcher is developing a new test for use in measuring exposure to psychosocial stressors at work. In an effort to test the reliability of the test, the researcher gave the test to the same sample of people twice. What form of reliability are they assessing? a. Factor analysis b. Parallel-forms c. Test-retest d. Internal consistency
c. Test-retest
According to Dr. Allen, building a statistical model should always be based upon clinical and theoretical knowledge, expectations, and experience. Why is he so passionate about this? a. The purely clinical approach ignores the numbers b. Theory should be the starting point of ALL research questions and eventual research. c. The purely statistical approach is too easily susceptible to problems inherent to the sample collected d. He's obsessed with stats.
c. The purely statistical approach is too easily susceptible to problems inherent to the sample collected
What is the purpose of nested models? a. To compare the means of the models b. To determine what kind of data is needed for the model c. To choose the best model for the data d. To determine the significance of the models.
c. To choose the best model for the data
Why do we use a correlation table in step 1 for nested models? a. To identify the research question b. To determine which analyses to use c. To identify possible confounding relationships among the variables d. To see what kind of data is in the model
c. To identify possible confounding relationships among the variables
You are comparing two separate models with nearly the same variables, except for 1 variable (a nested model comparison). You calculate the LR Stat using the difference of the -2log likelihood between the two models. That statistic gives you a p-value of 0.389. Which model do you use for your next step? a. Use the larger model b. It doesn't matter because there is a significant difference between the two. c. Use the smaller model d. It doesn't matter because there isn't a significant difference between the two.
c. Use the smaller model
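A rough sketch of the likelihood ratio test for two nested models that differ by one variable (1 degree of freedom). The -2 log likelihood values are invented so that the p-value comes out near the card's 0.389; the function names and the 0.05 cut-off are assumptions. For 1 degree of freedom the chi-square tail probability can be written with the standard library's `math.erfc`.

```python
import math

def likelihood_ratio_test_1df(neg2ll_reduced, neg2ll_full):
    """Return (LR statistic, p-value) for nested models differing by one parameter."""
    lr_stat = neg2ll_reduced - neg2ll_full
    # Chi-square upper tail with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr_stat / 2))
    return lr_stat, p_value

def choose_model(p_value, alpha=0.05):
    """Keep the smaller model unless the larger one fits significantly better."""
    return "larger" if p_value < alpha else "smaller"

# Hypothetical -2 log likelihoods for the reduced and full models.
lr_stat, p_value = likelihood_ratio_test_1df(250.742, 250.0)
print(lr_stat, p_value, choose_model(p_value))
```

A non-significant p-value means the extra variable does not improve the fit enough to justify keeping it, so the smaller model moves forward.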
Which of the following is NOT a measure of rater reliability? a. Kappa b. Cronbach's Alpha c. ICC d. All of them are measures of rater reliability
d. All of them are measures of rater reliability
What type of outcome can you use for survival analysis? a. Continuous b. Continuous or discrete c. Any type d. Binary or dichotomous e. any type of categorical data
d. Binary or dichotomous
Which of the following is not a key component in censoring? a. Clear timeline on when study starts and ends b. Clear definition of event c. Clear understanding of whether the event is able to happen more than once d. Clear understanding of why individuals are dropping out
d. Clear understanding of why individuals are dropping out
Which of these statements about B is correct? a. If B=-1.7 then there is no linear relationship between the log odds and x. b. If B=-1.7 then there is an expected increase in the log odds of y for every 1-unit increase of x. c. If B=1.7 then there is an expected decrease in the log odds of y for every 1-unit increase of x. d. If B=1.7 then there is an expected increase in the log odds of y for every 1-unit increase of x.
d. If B=1.7 then there is an expected increase in the log odds of y for every 1-unit increase of x.
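Because B is on the log odds scale, exponentiating it gives the odds ratio for a 1-unit increase in x. A minimal sketch, with a hypothetical function name:

```python
import math

def odds_ratio(beta):
    """Odds ratio for a 1-unit increase in x, given a logistic regression coefficient."""
    return math.exp(beta)

# A positive B gives an odds ratio above 1; a negative B gives one below 1.
print(odds_ratio(1.7), odds_ratio(-1.7), odds_ratio(0.0))
```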
What type of graph is best for showing the relative contributions of a categorical variable? a. Histogram b. Venn Diagram c. Box Plot d. Pie Chart e. Scatter plot
d. Pie Chart
2. What is the one solution that will address all three practical issues of 1) multicollinearity, 2) inadequacy of expected frequencies and power or 3) low ratio of cases to variables? a. Use the Likelihood Ratio Test b. Delete predictors c. Maximize the estimated log odds d. Calculate the estimates before using the logit transformation e. Delete observations
Delete predictors
From a logistic regression perspective, rather than explaining variance, what is logistic regression trying to predict? a. probability of a dichotomous outcome b. the beta weight of the predictor on the outcome c. variance of the dichotomous outcome explained by the predictor variables d. whatever you tell it to
probability of a dichotomous outcome