Business Analytics Ch. 5

The R² value ranges between _____ in the regression model.

0 and 1

Arrange the steps to detect multicollinearity for categorical variables in the correct order of occurrence in the context of a regression model. (Place the first step at the top.)

1. Add a new dummy-coded categorical variable.
2. Analyze the estimated regression model with the dummy variables included and with the dummy variables removed.
3. Assume there is multicollinearity if a major change in the estimated regression coefficients or in the p-value for the independent variables exists.

In the context of the model independent variable selection, arrange the steps involved in the feature selection process in the correct order of occurrence. (Place the first step at the top.)

1. Consider all features in a dataset.
2. Select a subset of features in a dataset.
3. Apply additional selection criteria.
4. Refine the subset of features in a dataset for improved performance.

Arrange the steps involved in using the metric Root Mean Squared Error (RMSE) in the correct order of occurrence in the context of predictive regression performance. (Place the first step at the top.)

1. Squaring the residuals
2. Averaging the squared residuals
3. Taking the square root of the result
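
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `rmse` and the sample numbers are assumptions made for the example.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error, computed step by step (illustrative helper)."""
    # Residual = observed value minus predicted value.
    residuals = [a - p for a, p in zip(actual, predicted)]
    # Step 1: square each residual.
    squared = [r ** 2 for r in residuals]
    # Step 2: average the squared residuals.
    mean_squared = sum(squared) / len(squared)
    # Step 3: take the square root of the result.
    return math.sqrt(mean_squared)

# Made-up data: residuals are 1, 0, and -2.
error = rmse([3, 5, 7], [2, 5, 9])
print(round(error, 4))
```

Because of the final square root, the result is expressed in the same units as the dependent variable.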

Match the R² values (in the left column) with their implications (in the right column) in the context of the regression model.

Closer to 1 - The regression model is considered a good predictor of the dependent variable.
Closer to 0 - The regression model is not a good predictor of the dependent variable.
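
As a rough sketch, R² can be computed as one minus the ratio of unexplained variation to total variation. The helper name `r_squared` and the data are assumptions for illustration; the 0-to-1 range holds for a least-squares fit with an intercept evaluated on the data it was fitted to.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total (illustrative)."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the model.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the dependent variable around its mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [1, 2, 3, 4]
print(r_squared(actual, [1, 2, 3, 4]))          # perfect predictions -> 1.0
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean -> 0.0
```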

Match the types of the common approach to regression analysis (in the left column) with their features (in the right column).

Descriptive modeling - It involves fitting a regression model to find the relationship between dependent and independent variables.
Explanatory modeling - It focuses on finding causal inferences based on theory-driven hypotheses.

Match the types of regression models (in the left column) with their performance measures (in the right column).

Descriptive or explanatory - Coefficients, goodness of fit, significance, and overall model fit (R²)
Predictive - Validation dataset metrics such as Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE)

Match the types of regression models (in the left column) with their descriptions (in the right column).

Descriptive or explanatory - It is used to imply causation and association between dependent and independent variables.
Predictive - It is used to anticipate a new observation.

Match the types of regression models (in the left column) with their features based on datasets (in the right column).

Descriptive or explanatory - The model is built using the entire dataset.
Predictive - The model is built by dividing the dataset into training and validation datasets.

Match the types of regression models (in the left column) with their focus (in the right column).

Descriptive or explanatory - To interpret the coefficients and the strength of the relationships between dependent and independent variables
Predictive - To anticipate new data records

_____ involves creating a dichotomous value from a categorical value in the context of modeling categorical variables.

Dummy coding
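
A minimal sketch of dummy coding in plain Python follows. The function name `dummy_code`, the `is_` column prefix, and the choice to drop the first category as a baseline are assumptions made for this example, not something the card prescribes.

```python
def dummy_code(values):
    """Turn a categorical column into 0/1 indicator columns (illustrative)."""
    categories = sorted(set(values))
    # Assumption: the first category (alphabetically) is the baseline and
    # gets no column of its own; it is the row where every indicator is 0.
    kept = categories[1:]
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

# Hypothetical promotion channel recorded for three customers.
rows = dummy_code(["email", "none", "email"])
print(rows)  # [{'is_none': 0}, {'is_none': 1}, {'is_none': 0}]
```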

True or false: High multicollinearity makes the independent variable coefficient estimates (weights) in a regression model stable.

False

Identify the true statements about overfitting in the context of the model independent variable selection. (Check all that apply.)

It happens when there is a notable difference between accuracy using training data and accuracy using validation data. It is avoided by building a simple (or parsimonious) regression model.

Which of the following are true statements about hold-out model validation? (Check all that apply.)

It has an optional test sample that can be used to verify the model on a third dataset. It involves randomly selecting about two-thirds of the data to build the regression model.
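
One way to picture the hold-out split is the sketch below, which shuffles the records and keeps about two-thirds for training. The function name `hold_out_split`, the fixed seed, and the use of plain lists are assumptions for illustration; the optional test sample is left out to keep the example short.

```python
import random

def hold_out_split(records, train_fraction=2 / 3, seed=42):
    """Randomly split records into training and validation samples (illustrative)."""
    shuffled = list(records)
    # A seeded generator makes the random selection repeatable.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

training, validation = hold_out_split(range(9))
print(len(training), len(validation))  # 6 3
```

The model would be built on `training` only, and its predictive performance checked on `validation`.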

Which of the following are true statements about N-fold cross model validation? (Check all that apply.)

It involves dividing a dataset into mutually exclusive samples. It involves building a model using a training dataset and validating using a single selected validation set.
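
The idea can be sketched as follows: the record indices are divided into mutually exclusive folds, and each round uses one fold for validation and the rest for training. The function name `n_fold_indices` is hypothetical, and the records are assigned to folds without shuffling, which is a simplification.

```python
def n_fold_indices(n_records, n_folds):
    """Return (training, validation) index lists for each fold (illustrative)."""
    indices = list(range(n_records))
    # Every n_folds-th index goes to the same fold, so folds never overlap.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, validation in enumerate(folds):
        # All remaining folds together form the training set for this round.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits

for training, validation in n_fold_indices(10, 5):
    print("validate on", validation, "train on", training)
```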

Identify the true statements about the test data in the context of the regression model. (Check all that apply.)

It is not used to decide the model to be recommended or to improve an algorithm. It is also known as holdout data.

Identify a true statement about N-fold cross model validation in the context of the regression model.

It is popularly used in advanced analytics techniques.

Identify the true statements about the Ordinary least squares (OLS) method in the context of simple linear regression. (Check all that apply.)

It minimizes the differences between the observed and predicted values of a dependent variable. It is effective in determining the best fit for the set of data.
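
For a single predictor, the least-squares line has a closed-form solution, sketched below. Strictly speaking, OLS minimizes the sum of squared differences between observed and predicted values. The function name `fit_simple_ols` and the data are assumptions for the example.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (illustrative)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data lying exactly on y = 1 + 2x.
intercept, slope = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```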

Identify a true statement about hold-out model validation in the regression model.

It uses a validation sample to evaluate the predictive performance of a regression model.

Identify the true statements about Root Mean Squared Error (RMSE) in the context of predictive regression performance. (Check all that apply.)

Lower RMSE values signify shorter distances from the actual data point to the regression line. It uses the same units as a dependent (target) variable.

Match the metrics that are used to understand the amount of error that exists between different models in linear regression (in the left column) with their descriptions (in the right column).

Mean Absolute Error (MAE) - It measures the absolute difference between the predicted and actual values of the model.
Mean Absolute Percentage Error (MAPE) - It is the percentage absolute difference a prediction is, on average, from the actual target.
Root Mean Squared Error (RMSE) - It indicates how different the residuals are from zero.
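
A small sketch of MAE and MAPE on made-up numbers is shown here (RMSE is left out to keep it short). The function names and data are assumptions; this version of MAPE also assumes that no actual value is zero.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average absolute difference (illustrative)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (assumes no actual value is 0)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100, 200, 400]
predicted = [110, 180, 400]
print(mae(actual, predicted))             # 10.0, in the units of the target
print(round(mape(actual, predicted), 2))  # average error as a percentage of the actual value
```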

___ ___ is used to determine whether two or more independent variables are good predictors of the single dependent variable.

Multiple regression

In simple linear regression, the most common procedure to estimate the regression line is adopting the _____.

Ordinary least squares method

___ happens when sample characteristics are included in the regression model that cannot be generalized to new data in the context of the model independent variable selection.

Overfitting

___ ___ captures the strength of a relationship between a single numerical dependent or target variable, and one or more (numerical or categorical) predictor variables.

Regression modeling

___ linear regression is used when the focus is limited to a single, numeric dependent variable and a single independent variable.

Simple

Regression modeling can predict customer purchase spending based on the email promotion and income level. In this context, match the types of variables (in the left column) with their descriptions (in the right column).

The dependent variable (Y) - It is the variable that is being predicted.
The independent variables (X) - They are the variables that are used to make the prediction.

Match the types of regression models (in the left column) with their features (in the right column).

The explanatory model - It aims at identifying the regression line that has the best fit to learn about the relationships in a dataset.
The predictive model - It aims at anticipating new individual records that the model has never seen before.

What is the goal of the regression line in the context of linear regression?

To minimize the distances between actual points and the regression line

Match the types of datasets (in the left column) with their descriptions (in the right column).

Training data - It is the data that is used to build a regression model.
Validation data - It is the data that is used to assess the developed regression model.
Test data - It gives a final estimate of the performance of the regression model.

True or false: The feature selection process is often repetitive when looking for the best combination of variables in the context of the model independent variable selection.

True

A relationship between the independent and dependent variables is represented by a straight line that best fits the data in ___ ___.

linear regression

In the context of the model independent variable selection, ___ is a situation where the predictor variables are highly correlated with each other.

multicollinearity

A marketing analyst seeks to determine whether several independent variables can improve the prediction of the single dependent variable sales. In this situation, the analyst most likely uses _____.

multiple linear regression

Dummy coding makes categorical variables dichotomous using only _____.

ones and zeroes

In the context of the metrics used to understand the amount of error that exists between different models in linear regression, the difference between the observed and predicted value of the dependent variable is represented by _____.

residuals

In the regression model, when detecting multicollinearity for numerical variables, one should _____.

run a correlation matrix of the constructs
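
A correlation matrix for numerical predictors can be sketched with the Pearson correlation, as below. The function names, the variable names (`income`, `promo_spend`, `age`), and the data are hypothetical; in practice a statistics package would normally produce this table.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two numeric columns (illustrative)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name to values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up predictors: promo_spend is an exact multiple of income.
data = {"income": [1, 2, 3, 4], "promo_spend": [2, 4, 6, 8], "age": [4, 1, 3, 2]}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Pairs of predictors whose correlation is close to 1 or -1, such as `income` and `promo_spend` here, are candidates for multicollinearity.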

In analyzing the effect of the size of a commercial space in square feet (x) on its sale price (y), a marketing analyst proposes sales price as the target variable and the space's square feet as the predictor variable. This scenario most likely exemplifies the use of _____.

simple linear regression

