Stats Ch 16
The idea that patterns of extreme scores will balance out if sampling continues indefinitely or trends are looked at over the long run is known as: a. attrition. b. regression to the mean. c. history of the mean. d. correlation.
b. regression to the mean.
Multiple regression differs from simple linear regression because it: a. repeats a linear regression several times, which can improve the results by averaging. b. uses more than one independent variable to make predictions. c. uses higher order polynomials to make predictions. d. employs the mathematical framework of calculus.
b. uses more than one independent variable to make predictions.
Penny is figuring the regression line for some data but needs help in first figuring the predicted value of Y. She knows that the slope is 3 and the intercept is 4. What is the predicted Y value for an X score of 7? a. 17 b. 19 c. 25 d. 31
c. 25
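The arithmetic behind this answer can be checked with a minimal sketch of the simple linear regression formula Ŷ = a + b(X). The function name predict_y is only an illustrative choice, not something from the text.

```python
def predict_y(intercept, slope, x):
    """Simple linear regression prediction: Y-hat = a + b(X)."""
    return intercept + slope * x

# Penny's values: intercept a = 4, slope b = 3, X = 7
predicted = predict_y(intercept=4, slope=3, x=7)
print(predicted)  # 25
```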
If a person's score on a(n) _____ variable is known, the person's score on the _____ variable can be predicted using simple linear regression. a. dependent; independent b. independent; dependent c. scale; nominal d. nominal; z score
b. independent; dependent
According to the text, a good statistician examines the data points before proceeding and questions causality after the statistical analysis. What would that statistician be doing during each of these two phases of a regression analysis? a. Before the test, examine for linearity, and after the test, consider confounding variables that might help to understand cause. b. Before the test, examine for errors in the data, and after the test, conjecture about possible causes using the ABC model. c. Before the test, check for outliers, and after the test, run an experiment to reveal causal relations. d. Before the test, create appealing visual displays of data, and after the test, create theories about causation.
a. Before the test, examine for linearity, and after the test, consider confounding variables that might help to understand cause.
Good physical fitness has been correlated with many medical benefits, particularly in relation to blood pressure. A regression equation predicting an individual's blood pressure based on physical fitness level results in a negatively sloped line of best fit. What does this statement mean? a. People who are physically fit are predicted to have low blood pressure, while people who have low fitness levels are predicted to have high blood pressure. b. People who have low fitness levels have low blood pressure, while people who are highly active probably have high blood pressure. c. There is no correlation between blood pressure and physical fitness. d. Low fitness levels cause high blood pressure; high fitness levels cause low blood pressure.
a. People who are physically fit are predicted to have low blood pressure, while people who have low fitness levels are predicted to have high blood pressure.
Correlation involves relating variables, whereas regression involves prediction. a. True b. False
a. True
It is impossible for the regression line to do a poorer job than the mean of predicting the dependent variable. a. True b. False
a. True
It is inappropriate to use structural equation modeling if there is not an a priori theoretical idea regarding the pattern of relations among the variables being measured. a. True b. False
a. True
Like correlation, regression cannot prove causal direction of a relation between two variables. a. True b. False
a. True
The intercept is the predicted value for Y when X is equal to 0. a. True b. False
a. True
The predicted z score on the dependent variable will always be closer to its mean than the z score for the independent variable. a. True b. False
a. True
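One way to see why: with z scores, the predicted z score on the dependent variable is the correlation coefficient times the z score on the independent variable. Because r falls between -1 and 1, the prediction is pulled toward the mean (a z score of 0) unless the correlation is perfect. The sketch below uses made-up values of r and a hypothetical function name, predict_z.

```python
def predict_z(r, z_x):
    """Predicted z score on Y from a z score on X: z_Y-hat = r * z_X."""
    return r * z_x

z_x = 2.0  # hypothetical z score on the independent variable
for r in (0.9, 0.5, -0.5):  # hypothetical correlations
    z_hat = predict_z(r, z_x)
    # With |r| < 1, the predicted z score is closer to 0 than z_x is.
    print(r, z_hat, abs(z_hat) < abs(z_x))
```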
The simple linear regression equation uses the following formula: Ŷ = a + b(X). a. True b. False
a. True
The slope is the amount that Y is predicted to increase for an increase of 1 in X. a. True b. False
a. True
The standard error of estimate is a measure of how accurately we predict using the regression equation or line of best fit. a. True b. False
a. True
Dr. Garoule is trying to determine which of his patients has the highest likelihood of depression. He calculates a linear regression equation with the scores on an anxiety measure, which are positively correlated with scores on a scale measuring depression. Dr. Garoule converts patient D's anxiety score to a z score and predicts the z score for the depression scale to be -0.35. Is patient D's raw score for depression above or below the mean and why? a. below the mean; the z score is negative b. above the mean; the z score is negative c. below the mean; the z score is positive d. above the mean; the z score is positive
a. below the mean; the z score is negative
Structural equation modeling graphs depict a _____ among several variables, demonstrating how all of the variables combine to create a _____. a. network of relations; statistical model b. causal chain; correlation c. box plot; theory d. scatterplot; theoretical model
a. network of relations; statistical model
If the points on a scatterplot are all close to the regression line: a. the standard error of the estimate is small. b. r is a positive number. c. r is close to 0. d. the standard error of the estimate is large.
a. the standard error of the estimate is small.
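A rough sketch of this idea, using invented data: the same X values are paired once with Y values that hug a line and once with Y values that scatter widely. The formula here divides the squared errors by N - 2, which is one common convention and an assumption on my part; some texts divide by N instead. The helper names are illustrative only.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def standard_error_of_estimate(xs, ys):
    """Typical distance of the data points from the fitted line (N - 2 version)."""
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

# Hypothetical data sets sharing the same X values
xs = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0]       # points close to the line
scattered = [4.0, 1.5, 8.5, 5.0, 12.0, 9.5]   # points far from the line

see_tight = standard_error_of_estimate(xs, tight)
see_scattered = standard_error_of_estimate(xs, scattered)
print(see_tight < see_scattered)  # True
```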
Correlation coefficient and proportionate reduction in error are inversely related; that is, as the correlation coefficient increases, proportionate reduction in error decreases. a. True b. False
b. False
In hierarchical regression, a computer determines the order in which variables are included in the equation. a. True b. False
b. False
Regression capitalizes on correlation by using what is known about the relation between two variables to make predictions beyond those variables. a. True b. False
b. False
Regression is typically used when analyzing the results of an experiment. a. True b. False
b. False
Simple linear regression is a statistical technique that includes two or more predictor variables in a prediction equation. a. True b. False
b. False
The intercept is the amount that Y is predicted to increase for an increase of 1 in X. a. True b. False
b. False
The intercept is the predicted value for X when Y is equal to 0. a. True b. False
b. False
The intercept is the predicted value for Y when X is equal to 1. a. True b. False
b. False
The predicted z score on the dependent variable will always be closer to the z score for the independent variable than its mean. a. True b. False
b. False
The simple linear regression equation uses the following formula: Ŷ = a(X) + b. a. True b. False
b. False
The slope is the amount that X is predicted to increase for an increase of 1 in Y. a. True b. False
b. False
The slope is the predicted value for Y when X is equal to 0. a. True b. False
b. False
When conducting statistical analyses using multiple regression, there is only one choice for how to conduct that analysis. a. True b. False
b. False
Desmond thinks his new tutoring methods are highly effective compared to commercially available methods. He selects the worst students in his statistics class and tries his new tutoring strategy. Which statement describes a threat to the validity of his hypothesis even if the students do very well after the tutoring sessions? a. Instrumentation errors will skew the results. b. Regression to the mean is likely to occur. c. Confirmation bias is likely to occur. d. The testing sequence is a confounding factor.
b. Regression to the mean is likely to occur.
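A small simulation can illustrate the threat. It is only a sketch under invented assumptions: each score is a stable ability plus random luck, nobody receives any tutoring, and all of the numbers (class size, means, standard deviations) are made up.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical class: each observed score = stable ability + random luck.
abilities = [rng.gauss(70, 10) for _ in range(1000)]
test1 = [a + rng.gauss(0, 10) for a in abilities]
test2 = [a + rng.gauss(0, 10) for a in abilities]  # no tutoring at all

# Select the 100 lowest scorers on the first test, as Desmond does.
worst = sorted(range(1000), key=lambda i: test1[i])[:100]

mean_first = sum(test1[i] for i in worst) / len(worst)
mean_retest = sum(test2[i] for i in worst) / len(worst)
class_mean = sum(test1) / len(test1)

# The selected group's retest mean moves toward the class mean by itself.
print(round(mean_first, 1), round(mean_retest, 1), round(class_mean, 1))
```

Because the lowest scorers improve on the retest even without any intervention, improvement after tutoring cannot be credited to the tutoring alone.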
If shyness is negatively correlated with the number of friendships a person has, which statement regarding the line of best fit, or regression line, would be true? a. The line will start in the lower left corner of the graph and end in the upper right corner. b. The line will start in the upper left corner of the graph and end in the lower right corner. c. Because the correlation is negative, a regression line cannot be drawn. d. The y intercept will be negative.
b. The line will start in the upper left corner of the graph and end in the lower right corner.
What is the key difference between stepwise and hierarchical multiple regression? a. Theory helps to determine the order that variables are entered in stepwise but not in hierarchical multiple regression. b. With stepwise multiple regression a computer determines the order in which independent variables are included in an equation, whereas in hierarchical multiple regression the researcher uses theory to determine the order of inclusion. c. Hierarchical multiple regression serves as an exploratory analysis, without pre-existing beliefs about variables. d. Stepwise multiple regression can handle more variables than a hierarchical analysis.
b. With stepwise multiple regression a computer determines the order in which independent variables are included in an equation, whereas in hierarchical multiple regression the researcher uses theory to determine the order of inclusion.
Body weight can be predicted from the number of calories an individual consumes because the two variables are positively correlated. On the line of best fit for this regression, the data points are clustered close together. Shyness can also be predicted from the number of friendships a person has, but the data points are more scattered around the line of best fit, showing a general negative correlation. Which has the higher predictive power and why? a. calories consumed and body weight, because it is a positive correlation b. calories consumed and body weight, because the variance is lower c. shyness and number of friendships, because it is a negative correlation d. shyness and number of friendships, because the variance is lower
b. calories consumed and body weight, because the variance is lower
In the social sciences, there are numerous variables that can be discussed and considered as important phenomena, but they cannot be observed directly. These are called _____ variables, and _____ variables, which can be observed and measured, are used to assess the intangible variables. a. intangible; tangible b. latent; manifest c. tangible; intangible d. manifest; latent
b. latent; manifest
Regression is a type of statistical analysis that is most useful for: a. calculating z scores. b. predicting behavior. c. determining standard deviations. d. finding the direction and strength of a relation between two variables.
b. predicting behavior.
Dr. Kim thinks his regression equation is very accurate, but he wonders if perhaps the mean is just as good at predicting scores as the regression equation. He knows that the correlation coefficient (r) for the two variables is quite high, -0.82. Should Dr. Kim use the mean or the regression equation to predict scores? a. regression equation; the proportionate reduction in error (r²) is low b. regression equation; the proportionate reduction in error (r²) is high c. mean; the proportionate reduction in error (r²) is low d. mean; the proportionate reduction in error (r²) is high
b. regression equation; the proportionate reduction in error (r²) is high
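As a quick check, squaring Dr. Kim's correlation gives the proportionate reduction in error. The sign of r drops out when it is squared, so a strong negative correlation predicts just as well as a strong positive one.

```python
r = -0.82               # correlation coefficient from the question
r_squared = r ** 2      # proportionate reduction in error

print(round(r_squared, 4))  # 0.6724
```

A value of about 0.67 means the regression equation removes roughly two-thirds of the error that would be made by predicting the mean for everyone.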
A multiple regression analysis revealed the following equation relating the time (in hours) it takes to complete a puzzle based on the number and size of pieces: Ŷ = 1.5 + 0.014(X number of pieces) - 1.2(Y size of pieces). If a puzzle has 500 pieces, with a size value of 0.5 inches, how long will it take to complete? a. 6.3 hours b. 7.1 hours c. 7.9 hours d. 9.3 hours
c. 7.9 hours
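The calculation can be sketched by plugging both predictors into the equation from the question. The function and parameter names below are illustrative only.

```python
def predict_puzzle_hours(number_of_pieces, size_of_pieces):
    """Multiple regression prediction with two independent variables."""
    return 1.5 + 0.014 * number_of_pieces - 1.2 * size_of_pieces

# 1.5 + 0.014(500) - 1.2(0.5) = 1.5 + 7.0 - 0.6
hours = predict_puzzle_hours(number_of_pieces=500, size_of_pieces=0.5)
print(round(hours, 1))  # 7.9
```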
_____ is a useful statistical analysis for predicting behavior, and _____ is a useful technique for finding the direction and strength of a relation between two variables. a. Psychometrics; correlation b. Correlation; regression c. Regression; correlation d. Psychometrics; regression
c. Regression; correlation
The standardized regression coefficient, which is equal to the Pearson correlation coefficient in a simple linear regression, is also called: a. alpha. b. standardized deviation prediction. c. beta weight. d. slope.
c. beta weight.
The statistic that describes the variability of a set of data points to the line of best fit in a linear regression is the standard: a. deviation. b. deviation of the estimate. c. error of the estimate. d. error.
c. error of the estimate.
Predicting an individual's IQ score from two variables, for example, socioeconomic status and education level, would involve the use of: a. bivariate regression. b. simple linear regression. c. multiple regression. d. nonlinear correlation.
c. multiple regression.
You want to predict your score on the statistics final exam using your grade point average for the semester. Which statistical technique is best for this type of analysis? a. bar graph b. correlation c. simple linear regression d. standardized z scores
c. simple linear regression
For exploratory analysis when there is no predictive theory in place, _____ is a common way to analyze data using equations with more than one independent variable. a. stepwise simple regression b. hierarchical multiple regression c. stepwise multiple regression d. nonlinear correlation
c. stepwise multiple regression
When a regression equation includes just one independent variable, the value of the standardized regression coefficient is: a. equal to the slope of the regression equation. b. the inverse of the correlation coefficient. c. the same as the Pearson correlation coefficient. d. equal to the y-intercept of the regression equation.
c. the same as the Pearson correlation coefficient.
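This can be verified numerically. The sketch below standardizes the slope by multiplying it by the ratio of the standard deviations of X and Y, then compares the result with the Pearson correlation coefficient. The data are invented and the helper name slope_and_r is hypothetical.

```python
import math
import statistics

def slope_and_r(xs, ys):
    """Return the least-squares slope and the Pearson correlation coefficient."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical data: one independent variable, one dependent variable
xs = [2, 4, 5, 7, 9, 10]
ys = [50, 58, 55, 66, 72, 70]

slope, r = slope_and_r(xs, ys)

# Standardized regression coefficient: slope rescaled by the two standard deviations
beta = slope * statistics.stdev(xs) / statistics.stdev(ys)

print(math.isclose(beta, r))  # True
```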
There is an extremely high negative correlation between altitude and the percentage of oxygen in the air. Is it correct to say that high altitudes cause low amounts of oxygen in the air based on a linear regression equation and the Pearson correlation coefficient? a. Yes, because the regression analysis reveals a strong correlation. b. Yes, because the correlation is negative. c. No, because the correlation is negative. d. No, because regression analysis does not imply causation.
d. No, because regression analysis does not imply causation.
In looking at a graph of data, there seems to be a curved pattern, possibly because of the influence of a third variable. Should simple linear regression be used? a. Yes; the data are linear. b. Yes; the data are nonlinear. c. No; the data are linear. d. No; the data are nonlinear.
d. No; the data are nonlinear.
_____ refers to the accuracy of a prediction based on the regression equation or the amount of error that is eliminated compared to predictions based on the mean of the dependent variable. a. Predictive validity b. Orthogonal regression coefficient c. Reliability d. Proportionate reduction in error
d. Proportionate reduction in error
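A sketch of the computation, using invented data: the error from predicting the mean for everyone (total sum of squares) is compared with the error left over after using the regression line. In simple linear regression the result matches r squared, and it cannot be negative because the least-squares line never does worse than the mean. The function names are illustrative only.

```python
import math

def proportionate_reduction_in_error(xs, ys):
    """(SS_total - SS_error) / SS_total for a least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Error when the mean of Y is used as the prediction for everyone
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Error when the regression line is used instead
    ss_error = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return (ss_total - ss_error) / ss_total

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam grade
hours = [0, 1, 2, 3, 4, 5]
grades = [62, 70, 68, 77, 83, 85]

pre = proportionate_reduction_in_error(hours, grades)
r = pearson_r(hours, grades)
print(math.isclose(pre, r ** 2))  # True
```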
Assume a positive correlation is found between the number of hours students spend studying for an exam and their grade on the exam. If the regression equation for these data is calculated and the y intercept is 65, what conclusion can be drawn? a. The standard error of the estimate is low. b. The regression line crosses the x-axis at a score of 65. c. The slope of the regression line is 65. d. When students do not study at all, we would predict a score of 65 on the exam.
d. When students do not study at all, we would predict a score of 65 on the exam.
When drawing a line of best fit, it is "best" to use _____ point(s) of _____ values. a. 1; low b. 2; high and medium c. 3; low d. at least 2; low and high
d. at least 2; low and high
In the equation for a line in statistics, the _____ is the predicted amount of increase for Y when X is increased by 1, and the _____ is the predicted value for Y where the line crosses the y-axis (X = 0). a. intercept; slope b. intercept; standard error c. slope; standard error d. slope; intercept
d. slope; intercept
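Both definitions can be read straight off a prediction function. In this sketch the intercept of 65 echoes the studying example in this set, while the slope of 3 is a made-up value chosen only for illustration.

```python
INTERCEPT = 65  # predicted exam score with 0 hours of studying
SLOPE = 3       # hypothetical: predicted points gained per extra hour

def predict_score(hours_studied):
    """Y-hat = a + b(X)."""
    return INTERCEPT + SLOPE * hours_studied

# Intercept: the predicted Y when X = 0
print(predict_score(0))                     # 65
# Slope: the change in predicted Y when X increases by 1
print(predict_score(5) - predict_score(4))  # 3
```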
