Chapter 7 - Advanced Regression Analysis
For 0 < β1 < 1, the log-log regression model implies a positive relationship between x and E(y); as x increases, E(y) increases at a slower rate. This may be appropriate in the food expenditure example where we expect food expenditure to react positively to changes in income, with the impact diminishing at higher income levels. If β1 < 0, what is the relationship between x and E(y)? As x increases, E(y) decreases at a faster rate As x increases, E(y) decreases at a slower rate As x increases, E(y) increases at a slower rate Implies a positive and increasing relationship between x and y
As x increases, E(y) decreases at a slower rate
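A quick numerical check of this answer, as a minimal sketch: the coefficients below are hypothetical (any β1 < 0 gives the same shape), and the error-term correction is ignored because it only rescales the curve.

```python
import math

def expected_y(x, b0, b1):
    """Shape of E(y) implied by ln(y) = b0 + b1*ln(x), ignoring the error-term correction."""
    return math.exp(b0 + b1 * math.log(x))

# Hypothetical coefficients with b1 < 0 (not taken from the chapter).
b0, b1 = 0.0, -0.5

values = [expected_y(x, b0, b1) for x in (1, 2, 3, 4)]
# Drop in E(y) for each one-unit increase in x.
drops = [values[i] - values[i + 1] for i in range(len(values) - 1)]

print(values)
print(drops)
```

Each drop is positive (E(y) decreases) and smaller than the one before it (the decrease slows down).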
Another commonly used transformation that captures nonlinearities is based on the natural logarithm. Which of the following variables are commonly log-transformed? Select all that apply! Scores House prices Income Age
House prices Income
Important risk factors for high blood pressure reported by the National Institute of Health include weight and ethnicity. High blood pressure is common in adults who are overweight and are African American. A public policy researcher in Atlanta surveyed 150 adult men about 5′10″ in height and in the 55-60 age group. Data were collected on their systolic pressure, weight (in pounds), and race (Black = 1 for African American, 0 otherwise). The resulting regression equation, which includes the interaction between weight and race, is: Systolic = 70.8312 + 0.4362Weight + 30.2482Black − 0.1118(Weight × Black). The interaction variable is negative and statistically significant at the 5% level. Interpret what a negative interaction implies in this example: Implies that weight does not interact with race Implies that non black men carry their weight better in terms of the systolic pressure than their black counterparts. Implies that systolic pressure is the same regardless of race Implies that black men carry their weight better in terms of the systolic pressure than their non black counterparts.
Implies that black men carry their weight better in terms of the systolic pressure than their non black counterparts.
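The reasoning behind this answer can be sketched by computing the partial effect of weight for each group. The coefficients are copied from the estimated equation in the question; the function names are illustrative only.

```python
# Coefficients copied from the estimated equation in the question.
B0, B_WEIGHT, B_BLACK, B_INTERACTION = 70.8312, 0.4362, 30.2482, -0.1118

def predicted_systolic(weight, black):
    """Predicted systolic pressure for a given weight (pounds) and Black dummy (0 or 1)."""
    return B0 + B_WEIGHT * weight + B_BLACK * black + B_INTERACTION * weight * black

def weight_slope(black):
    """Change in predicted systolic pressure per extra pound, by group."""
    return B_WEIGHT + B_INTERACTION * black

for black in (0, 1):
    print("Black =", black, "slope per pound =", round(weight_slope(black), 4))
```

The slope for Black = 1 is 0.4362 − 0.1118 = 0.3244, which is smaller than the 0.4362 slope for Black = 0, so each additional pound is associated with a smaller rise in predicted systolic pressure for black men.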
What is the term in regression models when a predictor variable has a different partial effect on the outcome depending on the values of another predictor variable? Predictive analytics Interaction effect Least squares analysis Partial effect
Interaction effect
The logistic regression model cannot be estimated with standard ordinary least squares (OLS) procedures. Instead, we rely on which method? No known method identified Logistics least squares Maximum likelihood estimation (MLE) Predicted probabilities
Maximum likelihood estimation (MLE)
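To make the idea of maximum likelihood concrete, here is a minimal sketch that climbs the logistic log-likelihood with plain gradient ascent. The data are made up, and gradient ascent is a stand-in chosen for brevity; statistical software typically uses faster numerical routines.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of a logistic model with one predictor."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, step=0.01, iterations=20000):
    """Gradient ascent on the log-likelihood (illustrative, not production code)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += step * sum(errors) / n
        b1 += step * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Made-up binary data; the 0s and 1s overlap so the estimates stay finite.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(b0_hat, b1_hat, log_likelihood(b0_hat, b1_hat, xs, ys))
```

The fitted coefficients give a higher log-likelihood than the starting values (0, 0), which is the sense in which MLE picks the estimates that make the observed data most likely.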
When an estimated model begins to describe the quirks of the data rather than the real relationships between variables, this is called: Overfitting Cross-validation Validation Metrics
Overfitting
Which nonlinear regression model is appropriate when the slope, capturing the influence of x on y, changes in magnitude as well as sign? Exponential Regression Model Quadratic regression model Logarithm regression model Log-Log Regression Model
Quadratic regression model
There are numerous applications where the relationship between the predictor variable and the response variable cannot be represented by a straight line and, therefore, must be captured by an appropriate curve. What are some simple transformations of the variables for nonlinear relationships? Select all that apply! Dummy variables Squares Goodness of fit Natural logarithms
Squares Natural logarithms
The exponential regression model is specified as ln(y) = β0 + β1x + ε. What does β1 × 100 measure? The approximate change in E(y) when x increases by 100 units The approximate percentage change in E(y) when x increases by 100 units The approximate percentage change in E(y) when x increases by one unit The approximate change in E(y) when x increases by one unit
The approximate percentage change in E(y) when x increases by one unit
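The word "approximate" matters here. A short sketch, using a hypothetical slope of 0.05, compares the β1 × 100 approximation with the exact percentage change implied by the model, (exp(β1) − 1) × 100.

```python
import math

def approx_pct_change(b1):
    """Approximate percentage change in E(y) per one-unit increase in x."""
    return b1 * 100

def exact_pct_change(b1):
    """Exact percentage change implied by ln(y) = b0 + b1*x."""
    return (math.exp(b1) - 1) * 100

b1 = 0.05  # hypothetical slope, not taken from the chapter
print(approx_pct_change(b1), exact_pct_change(b1))
```

For small slopes the two numbers are close; the gap widens as the magnitude of β1 grows.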
In regression models, we use both numerical and dummy (categorical) variables as predictor variables. What are the three types of interaction variables discussed in the chapter? Select all that apply! The interaction of two numerical variables. The interaction of two dummy variables with a numerical variable The interaction of two dummy variables The interaction of a dummy variable with a numerical variable
The interaction of two numerical variables. The interaction of two dummy variables The interaction of a dummy variable with a numerical variable
What is the linear regression model applied to a binary response variable called? The Binary response model The linear probability regression model The response and regress model The logistic regression model
The linear probability regression model
A useful method to interpret the estimated coefficient is to highlight the changing impact of x on p. For instance, given x = 10, we compute the predicted probability as 0.4256. For x = 11, the predicted probability is p̂ = 0.4700. Therefore, as x increases by one unit from 10 to 11, the predicted probability changes. Which of the following is true? Select all that apply! The predicted probability increases by 0.0444 if x increases from 20 to 21 The predicted probability changes by 0.0444 but it could increase or decrease The predicted probability increases by 0.0444 The increase in p̂ will not be the same if x increases from 20 to 21
The predicted probability increases by 0.0444 The increase in pˆ will not be the same if x increases from 20 to 21
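The second part of the answer can be checked numerically. The coefficients below (b0 = −2.10, b1 = 0.18) are an assumption: they are not given in the question, but they reproduce the quoted probabilities of 0.4256 and 0.4700 when rounded to four decimals.

```python
import math

# Assumed coefficients; chosen because they reproduce the probabilities in the question.
B0, B1 = -2.10, 0.18

def predicted_probability(x):
    """Logistic predicted probability exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    z = B0 + B1 * x
    return math.exp(z) / (1 + math.exp(z))

change_10_to_11 = predicted_probability(11) - predicted_probability(10)
change_20_to_21 = predicted_probability(21) - predicted_probability(20)

print(round(predicted_probability(10), 4), round(predicted_probability(11), 4))
print(change_10_to_11, change_20_to_21)
```

Under these assumed coefficients the one-unit change from 20 to 21 is smaller than the change from 10 to 11, because the logistic curve flattens as p̂ approaches 1.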
In a quadratic regression model y = β0 + β1x + β2x² + ε, the coefficient β2 determines the shape of the relationship between x and y. Which of the following is true? Select all that apply! The relationship between x and y is U-shaped when (β2 < 0) The relationship between x and y is U-shaped when (β2 > 0) The relationship between x and y is an inverted U-shaped when (β2 > 0) The relationship between x and y is an inverted U-shaped when (β2 < 0)
The relationship between x and y is U-shaped when (β2 > 0) The relationship between x and y is an inverted U-shaped when (β2 < 0)
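A small sketch of why the sign of β2 sets the shape: the slope of the quadratic model is β1 + 2β2x, which changes sign at x = −β1/(2β2). The coefficient values are hypothetical and β2 is assumed to be nonzero.

```python
def slope_at(x, b1, b2):
    """Slope of y = b0 + b1*x + b2*x**2 at a given x."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """Value of x where the slope changes sign (requires b2 != 0)."""
    return -b1 / (2 * b2)

def shape(b2):
    """Shape of the curve implied by the sign of b2 (b2 assumed nonzero)."""
    return "U-shaped" if b2 > 0 else "inverted U-shaped"

# Hypothetical coefficients with b2 < 0.
b1, b2 = 4.0, -0.5
print(shape(b2), turning_point(b1, b2), slope_at(2, b1, b2), slope_at(6, b1, b2))
```

With these values the slope is positive before x = 4 and negative after it, which is the inverted U shape.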
The accuracy rate is calculated as the number of correct predictions divided by the '_______________' number of observations.
Total
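As a minimal sketch of the calculation, the functions below classify predicted probabilities and compute the accuracy rate. The 0.5 cutoff is a common convention assumed here, and the data are made up.

```python
def classify(probabilities, cutoff=0.5):
    """Turn predicted probabilities into 0/1 predictions (cutoff is an assumption)."""
    return [1 if p >= cutoff else 0 for p in probabilities]

def accuracy_rate(actual, predicted):
    """Number of correct predictions divided by the total number of observations."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Made-up outcomes and predicted probabilities.
actual = [1, 0, 1, 1]
predicted = classify([0.8, 0.3, 0.4, 0.9])
print(accuracy_rate(actual, predicted))
```

Three of the four observations are classified correctly, so the accuracy rate is 3/4.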
In the holdout method, the sample data set is partitioned into two independent and mutually exclusive data sets—the training set and the validation set. True False
True
Place the steps of the holdout method in the proper order: We partition the sample data into two parts, labeled training set and validation set. We use the training set to estimate competing models. We use the estimates from the training set to predict the response variable in the validation set. We calculate the accuracy rate for each competing model. The preferred model will have the smallest RMSE (or the largest accuracy rate).
1. We partition the sample data into two parts, labeled training set and validation set. 2. We use the training set to estimate competing models. 3. We use the estimates from the training set to predict the response variable in the validation set. 4. We calculate the accuracy rate for each competing model. The preferred model will have the smallest RMSE (or the largest accuracy rate).
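The four steps can be sketched in a few lines. Everything here is illustrative: the data are made up, the two competing models (linear and logarithmic) are my choice, and the split takes every third observation so the result is reproducible, whereas in practice the partition is usually random.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up data: a logarithmic curve plus a small alternating disturbance.
xs = list(range(1, 21))
ys = [2 + 3 * math.log(x) + 0.1 * (-1) ** x for x in xs]

# Step 1: partition into training and validation sets (deterministic split, an assumption).
valid_idx = [i for i in range(len(xs)) if i % 3 == 0]
train_idx = [i for i in range(len(xs)) if i % 3 != 0]
x_train, y_train = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
x_valid, y_valid = [xs[i] for i in valid_idx], [ys[i] for i in valid_idx]

# Step 2: estimate the competing models on the training set.
lin_b0, lin_b1 = fit_line(x_train, y_train)
log_b0, log_b1 = fit_line([math.log(x) for x in x_train], y_train)

# Step 3: predict the response variable in the validation set.
lin_pred = [lin_b0 + lin_b1 * x for x in x_valid]
log_pred = [log_b0 + log_b1 * math.log(x) for x in x_valid]

# Step 4: compare validation RMSE; the smaller value is preferred.
rmse_linear, rmse_log = rmse(y_valid, lin_pred), rmse(y_valid, log_pred)
preferred = "logarithmic" if rmse_log < rmse_linear else "linear"
print(rmse_linear, rmse_log, preferred)
```

Because the made-up data follow a logarithmic curve, the logarithmic model has the smaller validation RMSE here.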
It is common to assess the performance of linear probability and logistic regression models on the basis of the '________________________' rates defined as the percentage of correctly classified observations
accuracy
Predictions with the exponential regression model are made by ŷ = exp(b0 + b1x + se²/2). Which of the following is true? Select all that apply! b0 and b1 are the standard errors of the estimates se² is the standard error of the estimate b0 and b1 are the coefficient estimates se is the standard error of the estimate
b0 and b1 are the coefficient estimates se is the standard error of the estimate
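A minimal sketch of the prediction formula, with hypothetical values for b0, b1, and se, shows the effect of the se²/2 term compared with simply exponentiating the fitted value of ln(y).

```python
import math

def predict_exponential(x, b0, b1, se):
    """Prediction for y from ln(y) = b0 + b1*x, including the se**2 / 2 correction."""
    return math.exp(b0 + b1 * x + se ** 2 / 2)

def predict_naive(x, b0, b1):
    """Exponentiated fitted value without the correction, for comparison."""
    return math.exp(b0 + b1 * x)

# Hypothetical estimates, not taken from the chapter.
b0, b1, se = 1.0, 0.5, 0.2
print(predict_exponential(2, b0, b1, se), predict_naive(2, b0, b1))
```

The corrected prediction is always the larger of the two, since exp(se²/2) exceeds 1 whenever se is positive.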
In the log-log regression model, both the response variable and the predictor variable are transformed into natural logs. We can write this model as ln(y) = β0 + β1ln(x) + ε. For 0 < β1 < 1, the log-log regression model implies a positive relationship between x and E(y); as x increases, E(y) increases at a '_________________' rate.
slower
At a University of California campus, data were collected on the starting salary of business graduates (Salary in $1,000s) along with their cumulative GPA, whether they have an MIS concentration (MIS = 1 if yes, 0 otherwise), and whether they have a statistics minor (Statistics = 1 if yes, 0 otherwise). Use the estimated equation Salary = 44.0073 + 6.6227GPA + 6.6071MIS + 6.7309Statistics. What is the additional salary a graduate would earn with an MIS degree? $6,7309 $6,6227 Can not determine $6,607
$6,607
Using the estimated equation Salary = 44.0073 + 6.6227GPA + 6.6071MIS + 6.7309Statistics. For a graduate with a GPA of 3.5, compute the predicted salary (in $1,000s) for a business graduate with neither an MIS concentration nor a Statistics minor. $73,918 $73,794 $67,187 $80,525
$67,187
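Both salary answers follow directly from the estimated equation. The sketch below plugs in the values; the coefficients are copied from the question, and salaries are in $1,000s.

```python
def predicted_salary(gpa, mis, statistics):
    """Predicted starting salary in $1,000s from the estimated equation in the question."""
    return 44.0073 + 6.6227 * gpa + 6.6071 * mis + 6.7309 * statistics

# GPA of 3.5 with neither an MIS concentration nor a statistics minor.
baseline = predicted_salary(3.5, 0, 0)

# Extra salary associated with an MIS concentration, holding GPA and Statistics fixed.
mis_premium = predicted_salary(3.5, 1, 0) - baseline

print(baseline, mis_premium)
```

The baseline works out to 44.0073 + 6.6227 × 3.5 = 67.18675, or about $67,187, and the MIS premium equals the MIS coefficient, 6.6071, or about $6,607.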
Important risk factors for high blood pressure reported by the National Institute of Health include weight and ethnicity. High blood pressure is common in adults who are overweight and are African American. A public policy researcher in Atlanta surveyed 150 adult men about 5′10″ in height and in the 55-60 age group. Data were collected on their systolic pressure, weight (in pounds), and race (Black = 1 for African American, 0 otherwise). The resulting regression equation is: Systolic = 80.2085 + 0.3901Weight + 6.9082Black. What is the expected Systolic blood pressure for a 170 pound black male? 150 157 153.44 146.52
153.44
An educational researcher is trying to analyze the determinants of the applicant pool for the specialized Master of Science in Accounting (MSA) program. Two important determinants are the marketing expense of the business school and the percentage of the MSA alumni who were employed within three months after graduation. For a given marketing expense of $80,000, predict the number of applications received if Marketing equals 80 and Employed equals 50 using the equation Predicted Applicants = −49.5490 + 0.3550Marketing + 1.0149Employed. 50 30 80 49.54
30
An educational researcher is trying to analyze the determinants of the applicant pool for the specialized Master of Science in Accounting (MSA) program. Two important determinants are the marketing expense of the business school and the percentage of the MSA alumni who were employed within three months after graduation. Using the equation Predicted Applicants = −49.5490 + 0.3550Marketing + 2.0Employed, answer the following question. If the number employed increased by 30, how many more applicants would there have been? 50 60 Cannot answer without knowing how many total were employed 30
60
In the semi-log regression model not all variables are transformed into logs. The semi-log model that transforms only the response variable is often called: Logarithmic regression model Log-log regression model Exponential regression model Quadratic regression model
Exponential regression model
Note that the greater the k, the lesser will be the reliability of the k-fold method and the greater will be its computational cost. True False
False
In the semi-log regression model not all variables are transformed into logs. A semi-log model that transforms only the predictor variable is often called: Quadratic regression model Exponential regression model Logarithmic regression model Log-log regression model
Logarithmic regression model
Which of the following is true of cross-validation? Select all that apply! The k-fold cross-validation method is a cross validation method Sometimes the data are partitioned into an optional third set called a training data set The sample is partitioned into a training set and a validation set to assess how well the estimated model predicts with unseen data The holdout method is a cross validation method
The k-fold cross-validation method is a cross validation method The sample is partitioned into a training set and a validation set to assess how well the estimated model predicts with unseen data The holdout method is a cross validation method
Which of the following describes the k-fold cross-validation method? Select all that apply! The choice of the model will be sensitive to how the data are partitioned. The k-fold method is less sensitive to data partitioning than the holdout method The sample data set is partitioned into two independent and mutually exclusive data sets, the training set and the validation set. The sample data are partitioned into k subsets, where one of the k subsets is used as the validation set
The k-fold method is less sensitive to data partitioning than the holdout method The sample data are partitioned into k subsets, where one of the k subsets is used as the validation set
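A minimal sketch of the k-fold idea: each of the k subsets serves once as the validation set while the model is estimated on the remaining observations, and the validation RMSEs are averaged. The data are made up, the model is a simple linear regression, and assigning observation i to fold i mod k is a simplification (folds are usually formed at random).

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

def k_fold_indices(n, k):
    """Assign observation i to fold i % k (a simple deterministic partition)."""
    return [list(range(fold, n, k)) for fold in range(k)]

def k_fold_rmse(xs, ys, k):
    """Average validation RMSE over the k folds."""
    fold_rmses = []
    for valid_idx in k_fold_indices(len(xs), k):
        held_out = set(valid_idx)
        x_train = [x for i, x in enumerate(xs) if i not in held_out]
        y_train = [y for i, y in enumerate(ys) if i not in held_out]
        b0, b1 = fit_line(x_train, y_train)
        squared_errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in valid_idx]
        fold_rmses.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return sum(fold_rmses) / k

# Made-up, exactly linear data, so the cross-validated RMSE should be essentially zero.
xs = list(range(1, 13))
ys = [1 + 2 * x for x in xs]
average_rmse = k_fold_rmse(xs, ys, k=4)
print(average_rmse)
```

Because every observation is used for validation exactly once, the averaged result depends less on any single split than the holdout method does.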
R² measures the percentage of sample variations of the response variable explained by the model. Which of the following are true when comparing linear and log-transformed regression models? Select all that apply! Using R² we can compare the percentage of explained variations of y with that of ln(y). We need to compute the percentage of explained variations of y We cannot compare the percentage of explained variations of y with that of ln(y). The normal R² will help to explain the percentage variation of y
We cannot compare the percentage of explained variations of y with that of ln(y). We need to compute the percentage of explained variations of y
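One common way to put a log-transformed model on the same footing as a linear model is to convert its fitted values back to predictions of y and take the squared correlation between y and those predictions. The sketch below assumes that approach; the data, the fitted ln(y) values, and se are all hypothetical.

```python
import math

def correlation(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def r_squared_in_y(y, y_hat):
    """Share of the variation in y explained, as the squared correlation of y and y_hat."""
    return correlation(y, y_hat) ** 2

# Hypothetical observations and fitted values of ln(y) from some log-transformed model.
y = [10.0, 12.0, 15.0, 19.0, 26.0]
ln_y_hat = [2.30, 2.50, 2.70, 2.95, 3.25]
se = 0.05  # hypothetical standard error of the estimate

# Convert fitted ln(y) values into predictions of y before computing R-squared.
y_hat = [math.exp(v + se ** 2 / 2) for v in ln_y_hat]
r2 = r_squared_in_y(y, y_hat)
print(r2)
```

The resulting value is expressed in terms of y, so it can be compared with the R² of a model whose response variable is y itself.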
The natural logarithm converts changes in a variable into '_____________________' changes
percentage