Quiz 5,6,7

Classification involves predicting the value of a continuous variable. T/F

False

Decision trees can only be used for classification problems. T/F

False

In clustering, the goal is to maximize the distance within clusters and minimize the distance between clusters. T/F

False

K-Means Clustering differs from Hierarchical Clustering in that the Hierarchical technique requires the number of clusters to be specified a priori, while K-Means involves a visual inspection of a dendrogram to determine the number of distinct clusters that exist in the data. T/F

False
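
For illustration, here is a minimal sketch, assuming scikit-learn and SciPy, of the correct contrast: K-Means requires the number of clusters a priori, while hierarchical clustering builds a dendrogram that can be inspected visually.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans

# two well-separated groups of toy data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# K-Means: the number of clusters is specified up front
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])

# Hierarchical clustering: build the full linkage, then inspect a dendrogram
Z = linkage(X, method="ward")
dendrogram(Z, no_labels=True)
plt.title("Dendrogram (hierarchical clustering)")
plt.show()
```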

The primary difference between supervised and unsupervised learning is that there is no response variable to predict or estimate in an unsupervised learning setting. T/F

True

The validation set approach is a simple way to estimate the test error associated with fitting a particular statistical learning method on a set of observations. The process of selecting observations for the training and test/validation sets MUST be random. T/F

True
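
A minimal sketch of the validation set approach, assuming scikit-learn (the dataset and model here are only placeholders): observations are assigned at random to training and validation sets, and the held-out set gives the test-error estimate.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# shuffle=True (the default) makes the split random
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("Estimated test accuracy:", model.score(X_val, y_val))
```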

Though its performance is often inferior to many contemporary machine learning algorithms, a key benefit of logistic regression is that it is easier to explain than output from more sophisticated models. T/F

True

Which of the following is true of unsupervised learning (select all that apply)? A) Often performed as part of an exploratory data analysis (EDA) B) Often much more challenging than supervised learning C) There is no way to check our work because we don't know the true answer — the problem is unsupervised D) Tends to be more subjective, and there is no simple goal for the analysis, such as prediction of a response E) All of the above are true of unsupervised learning.

E) All of the above are true of unsupervised learning.

Which of the following are scenarios where classification techniques should be applied (select all that apply)? A) Predicting which customers are likely to respond to a promotion B) Forecasting profit margins for next quarter C) Calculating the probability of each employee being promoted within the next year D) Estimating sales for a new product line E) All of the above warrant classification techniques.

A and C

A confusion matrix is a classification tool used to evaluate which of the following (select all that apply)? A) True Positive Rates (Sensitivity) B) True Negative Rates (Specificity) C) False Positive Rates D) False Negative Rates E) Overall Accuracy Rates F) A and B Only

A, B, C, D, and E
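
A small worked example (the counts are made up) showing how each of these rates is read off a 2x2 confusion matrix:

```python
import numpy as np

# rows = actual class, columns = predicted class (negative, positive)
cm = np.array([[85, 15],    # TN = 85, FP = 15
               [10, 90]])   # FN = 10, TP = 90
tn, fp, fn, tp = cm.ravel()

sensitivity = tp / (tp + fn)        # true positive rate: 0.900
specificity = tn / (tn + fp)        # true negative rate: 0.850
fpr = fp / (fp + tn)                # false positive rate: 0.150
fnr = fn / (fn + tp)                # false negative rate: 0.100
accuracy = (tp + tn) / cm.sum()     # overall accuracy:    0.875
print(sensitivity, specificity, fpr, fnr, accuracy)
```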

Which of the following are examples of an interaction effect (select all that apply)? A) Travel frequency has a weak effect on employees' attrition risk, except when employees work overtime. B) Medication A has a positive effect on stress, but when combined with Medication B, the effect can be lethal. C) The degree to which promotions increase sales depends on the strength of the economy, in that when the economy is stronger, the effect of promotions on sales is augmented. D) GPA over the last two years of undergraduate studies as well as grit are two significant predictors of success for graduate school. E) A great boss can attenuate the negative effects of increased work demands on employee burnout.

A, B, C, and E
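
A minimal sketch of how an interaction effect is modeled, using hypothetical simulated attrition data: the product term travel * overtime lets the effect of travel frequency depend on whether the employee works overtime.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
travel = rng.integers(0, 10, n)       # trips per month
overtime = rng.integers(0, 2, n)      # 0 = no overtime, 1 = overtime
# simulated attrition risk: travel matters mostly when overtime == 1
risk = (0.2 + 0.01 * travel + 0.10 * overtime
        + 0.05 * travel * overtime + rng.normal(0, 0.05, n))

X = sm.add_constant(np.column_stack([travel, overtime, travel * overtime]))
print(sm.OLS(risk, X).fit().params)   # the last coefficient is the interaction
```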

Which of the following are true statements regarding k-fold cross-validation (select all that apply)? A) k-fold CV involves randomly separating observations into groups (or folds). B) k represents the number of groups into which the observations are partitioned. C) k − 1 folds are used for model training; the remaining fold is used to evaluate model performance. D) k is generally set to the number of observations in the dataset. E) k-fold CV is a more rigorous method of evaluating model performance as compared with the validation set approach.

A, B, C, and E
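
A minimal 5-fold cross-validation sketch, assuming scikit-learn (the dataset and model are placeholders): each fold in turn is held out for evaluation while the other k − 1 folds are used for training.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # k = 5 random folds
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print(scores, scores.mean())
```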

Which of the following are business applications for which unsupervised learning techniques would be appropriate (select all that apply)? A) Netflix recommendations: determining which movies to recommend based on customers' viewing history and movies that others with similar viewing history have watched B) Customer churn: calculating the probability of each customer churning. C) Amazon recommendations: recommending products based on purchasing habits of others with similar purchases D) Market segmentation: identifying subgroups of people who might be more receptive to a particular form of advertising, or more likely to purchase a particular product E) Search engine optimization: choosing what search results to display to a particular individual based on the click histories of other individuals with similar search patterns

A, C, D, and E

Consider the following regression equation: y_i = β0 + β1·x_i + β2·x_i² + β3·x_i³ + ε_i, where ε_i is the error term. Which of the following statements are true concerning this function (select all that apply)? A) This is a quadratic regression equation. B) This function would likely be appropriate to model the relationship between two variables when there is a parabolic shape about the data. C) It would be a wise modeling move to add additional predictors with x raised to the 4th and 5th powers, as we could achieve a more flexible fit. D) A and B only E) All of the above are true statements.

B) This function would likely be appropriate to model the relationship between two variables when there is a parabolic shape about the data.
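
A minimal sketch of fitting a cubic regression of this form, assuming scikit-learn and simulated data: PolynomialFeatures with degree=3 adds the squared and cubed terms before an ordinary linear fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, (200, 1))
y = 1 + 2 * x - 0.5 * x**2 + 0.3 * x**3 + rng.normal(0, 1, (200, 1))

model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                      LinearRegression())
model.fit(x, y.ravel())
print(model.named_steps["linearregression"].coef_)   # estimates of β1, β2, β3
```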

In a classification problem, the ability to explain how the model arrived at its prediction is of less importance relative to a regression problem. A) True B) False C) The business objective drives this decision in both classification and regression problems.

C) The business objective drives this decision in both classification and regression problems.

Which of the following are non-linear transformations that can be applied to predictors when there are non-linear associations in the data (select all that apply)? A) Higher-degree polynomial functions B) Square root transformation C) Logarithmic transformation D) All of the above are simple ways to transform predictors to deal with non-linear associations.

D) All of the above are simple ways to transform predictors to deal with non-linear associations.
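
For illustration, the three transformations above applied to a hypothetical right-skewed predictor (simulated sales figures):

```python
import numpy as np

rng = np.random.default_rng(3)
sales = rng.lognormal(mean=8, sigma=1, size=1000)      # strongly right-skewed predictor

x_sqrt = np.sqrt(sales)                                # square root transformation
x_log = np.log(sales)                                  # logarithmic transformation
x_poly = np.column_stack([sales, sales**2, sales**3])  # higher-degree polynomial terms
print(x_sqrt[:3], x_log[:3], x_poly.shape)
```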

PCA does not require predictors to be scaled. Predictors with significantly greater variation (e.g., sales ranging from $10 to $1,000,000 compared with day of the week ranging from 1 to 7) will not influence the principal components that emerge from this analysis. T/F

False
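
A minimal sketch, assuming scikit-learn, of why the statement is false: on unscaled data the high-variance column dominates the first principal component, and standardizing removes that artifact.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
sales = rng.uniform(10, 1_000_000, 500)     # huge variance
day_of_week = rng.integers(1, 8, 500)       # tiny variance
X = np.column_stack([sales, day_of_week])

print(PCA(n_components=1).fit(X).components_)         # roughly [1, 0]: sales dominates
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_scaled).components_)  # both variables now load
```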

The coefficients in logistic and linear regression output are interpreted the same way. T/F

False
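
A minimal sketch of the difference, assuming scikit-learn and simulated data: a linear coefficient is the change in the mean response per unit of x, while a logistic coefficient is the change in the log-odds, so exp(coef) is an odds ratio.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
x = rng.normal(size=(500, 1))
y_cont = 2.0 * x.ravel() + rng.normal(size=500)                                 # continuous response
y_bin = (rng.uniform(size=500) < 1 / (1 + np.exp(-2 * x.ravel()))).astype(int)  # binary response

lin = LinearRegression().fit(x, y_cont)
log = LogisticRegression().fit(x, y_bin)
print("linear coef (change in mean y):", lin.coef_[0])
print("logistic coef (change in log-odds):", log.coef_[0][0],
      "-> odds ratio:", np.exp(log.coef_[0][0]))
```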

When training a model, the goal is to achieve the highest accuracy possible on the training set. T/F

False

A biplot is a tool for PCA that illustrates principal component loadings for each variable in a data set (i.e., which variables account for the most variation in the data). T/F

True

A decision tree is a simple procedure that identifies variables that provide optimal separation of classes (node purity) by splitting the data on their values. T/F

True
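
A minimal sketch, assuming scikit-learn and its built-in iris data, of a tree splitting on predictor values and printing its fitted rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
print(export_text(tree, feature_names=data.feature_names))   # the splits as simple rules
```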

A scree plot is a tool for PCA that aids in deciding how many principal components to use (i.e., choosing the smallest number of principal components that are required in order to explain a sizable amount of the variation in the data). T/F

True
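
A minimal scree-plot sketch, assuming scikit-learn and matplotlib (the dataset is a placeholder): plot the proportion of variance explained by each component and look for where the curve flattens.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
pca = PCA().fit(StandardScaler().fit_transform(X))

plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
         pca.explained_variance_ratio_, marker="o")
plt.xlabel("Principal component")
plt.ylabel("Proportion of variance explained")
plt.title("Scree plot")
plt.show()
```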

Clustering is a broad class of methods for discovering unknown subgroups in data. T/F

True

Cross-validation techniques help achieve an optimal bias-variance tradeoff. T/F

True

K-Nearest Neighbors is a simple procedure that predicts the class of an observation by assigning the majority class for a set of observations with the most similar characteristics (i.e., those with the closest predictor values). T/F

True
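
A minimal sketch, assuming scikit-learn: each test observation is assigned the majority class among its k = 5 nearest neighbors in predictor space.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```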

One of the primary advantages of using decision trees is that it is very easy to explain how the model arrived at its prediction for a given observation. T/F

True

Polynomial regression extends the linear model by adding additional predictors, which are obtained by raising each of the original predictors to a power. In this way, models are more flexible and can fit curvilinear relationships. T/F

True

Principal Components Analysis (PCA) is a technique for determining which set of predictors captures the most information in the data, without needing to examine relationships between the predictors and a response variable. T/F

True

Random forests provide a lift in predictive power over decision trees. Random forests involve building many decision trees (a forest of trees) on different random subsets of both observations and predictors. The performance is then averaged over all the trees to protect against overfitting. T/F

True
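
A minimal comparison sketch, assuming scikit-learn (the dataset is a placeholder): many trees grown on bootstrap samples, a random subset of predictors considered at each split, and the trees' predictions aggregated.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)

print("single tree CV accuracy:  ", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```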

The Gini index is referred to as a measure of node purity. This is used to evaluate variable importance in classification problems. A small value indicates that a node contains predominantly observations from a single class; therefore, a predictor that results in a relatively large Gini decrease indicates that when the data are split on the respective predictor, there is considerable class separation. T/F

True
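
A small worked example of the Gini index as a node-purity measure (two-class case): a node dominated by one class scores near 0, while a 50/50 node scores the maximum of 0.5.

```python
def gini(class_proportions):
    """Gini index: sum over classes of p * (1 - p)."""
    return sum(p * (1 - p) for p in class_proportions)

print(gini([0.95, 0.05]))  # nearly pure node  -> 0.095
print(gini([0.50, 0.50]))  # maximally impure  -> 0.5
```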

