BADM 211: Test 2
Interpreting logistic regression coefficients involves ________ by _________.
multiplying; e^β (the exponentiated coefficient). The coefficients are on the natural-log scale, so the relationship between the predictors and the odds is multiplicative.
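A minimal sketch of this interpretation in plain Python; the coefficient value b1 = 0.40 is made up for illustration:

```python
import math

b1 = 0.40  # hypothetical fitted logistic regression coefficient for one predictor

# A one-unit increase in the predictor multiplies the odds by e^b1.
odds_multiplier = math.exp(b1)
print(round(odds_multiplier, 2))  # 1.49 -> the odds increase by about 49%
```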
Principal components analysis relies on the notion that information is __________ across multiple variables.
overlapping / similar (i.e., the variables are highly correlated)
Principal components analysis is limited to _________ variables.
numerical
Linear regression is one modeling technique that isn't considered machine learning because _______________.
of the normal equation — the coefficient values can be computed directly in closed form, so there is no learning from the data; we have an equation we can simply plug numbers into
The naive benchmark is the average of ___________ in the _____________ set.
outcome values; training
If we have two classes, YES and NO, and if the YES class is more important (such as fraud), then sensitivity is the ability of the classifier to accurately identify the _____________.
YES members (the percentage of actual YES records that are correctly classified)
Which of the following is NOT an example of classification?
prediction of someone's income
The predict_proba function in a logistic regression model generates the __________.
probability of class membership (2 classes: class 0 and class 1)
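A toy illustration (not sklearn itself) of what predict_proba returns for one record, assuming a hypothetical linear score z; this also shows why the logistic response stays between 0 and 1:

```python
import math

def logistic_response(z):
    """Logistic (sigmoid) function: maps any real-valued z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_row(z):
    """Mimic predict_proba for one record: [P(class 0), P(class 1)]."""
    p1 = logistic_response(z)
    return [1.0 - p1, p1]

print(predict_proba_row(0.0))  # [0.5, 0.5]
```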
Given the information provided about a response variable (see pix), the naive benchmark would be computed as the _________ error of using _________ as a fixed value for Price.
regression; the mean (ȳ). For a continuous numeric outcome, the naive benchmark predicts the mean of y for every record and adds up the resulting errors.
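Sketch of the naive benchmark for a numeric outcome, using made-up Price values:

```python
prices = [100.0, 150.0, 200.0, 250.0]  # hypothetical training-set outcome values

# Naive benchmark: predict the mean of y for every record, then accumulate error.
y_bar = sum(prices) / len(prices)
errors = [y - y_bar for y in prices]

print(y_bar)    # 175.0
print(errors)   # [-75.0, -25.0, 25.0, 75.0]
```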
For the highest performance in linear regression modeling, the relationship between the predictors and the outcome should be ____________.
linear
In the terms of logistic regression, the logit can be modeled as a _______ function of the _______.
linear; predictors
Mathematically, ridge regression assumes that ______ of the predictors ____________.
none; get removed. Ridge regression shrinks coefficients (makes them small) but can never take one exactly to zero, so it will ONLY do regularization (prevents over-fitting of the training data) and will NOT do feature selection.
Mathilde from Marketing has requested that you determine which features to retain in a model that is designed to explain the relationship between revenue and products. To that end, you will use the lasso for feature selection but then you must remodel because of ______________.
the shrunken coefficient values — the lasso shrinks the coefficients (and zeroes out the ones it doesn't think are valuable), so after using it to select features you refit an unpenalized model on the retained predictors to get coefficients suitable for explanation
Principal components analysis is particularly useful when predictors are __________ and __________.
measured on the same scale; highly correlated
The first step in creating a gain or lift chart is to ___________.
sort the set of records in descending order (high to low) by propensity. Propensity: the probability of class membership (the probability that a record belongs to each of the classes) when the outcome variable is categorical, e.g., the propensity to default.
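That first step can be sketched with made-up propensities:

```python
# Toy records: (propensity of belonging to class 1, actual class)
records = [(0.30, 0), (0.91, 1), (0.55, 1), (0.12, 0), (0.78, 0)]

# Step 1 of a gains/lift chart: sort records high-to-low by propensity.
ranked = sorted(records, key=lambda r: r[0], reverse=True)
print([p for p, _ in ranked])  # [0.91, 0.78, 0.55, 0.3, 0.12]
```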
Mathematically, the lasso assumes that ________ of the predictors ____________.
some; get removed. The lasso can take a coefficient all the way to zero, meaning that predictor is no longer part of the model, so it does feature selection (predictors that shouldn't be in the model are selected out) as well as regularization (a penalty added to the error prevents over-fitting of the training data).
The logistic response function generates values from _________ to __________.
0 to 1
Victoria from Operations has asked you to build a machine learning model to predict the mean-time-before-failure for an industrial robot. She has provided you with a dataset that contains 20 predictors, but you can choose the number of observations. Given that the model will utilize 10-fold cross-validation, you'll need a minimum of ____________ samples for minimum predictive accuracy.
10 × 20 = 200 samples required in each training fold; with 10-fold cross-validation the training fold is 90% of the data, so 0.9x = 200 → x ≈ 222.2, rounded up to 223 samples.
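The arithmetic, spelled out (10 samples per predictor is the rule of thumb assumed here):

```python
import math

n_predictors = 20
required_in_training = 10 * n_predictors  # rule of thumb: 10 samples per predictor -> 200
train_fraction = 0.9                      # each 10-fold CV training split uses 90% of the data

min_samples = math.ceil(required_in_training / train_fraction)
print(min_samples)  # 223
```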
Interpreting these confusion matrices (see pix), there are ___________ false negatives.
26
In k-fold cross-validation, the model will be fit _____ times before computing an average error measure.
5 (the model is fit once per fold; here k = 5)
The principal components generated by a PCA model are ___________.
linearly uncorrelated variables
The lasso, also called _________ regularization, adds a penalty component to the RSS equation equivalent to the sum of the coefficients __________.
L1; absolute values (penalty ∝ ∑|βⱼ|)
Ridge regression, also called __________ regularization, adds a penalty component to the RSS equation equivalent to the sum of the coefficients _____________.
L2; squared (penalty ∝ ∑βⱼ²)
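Both penalty terms can be computed directly; a sketch with a hypothetical coefficient vector (the tuning parameter λ is omitted for clarity):

```python
coefs = [0.5, -1.2, 0.0, 2.0]  # made-up fitted coefficients (intercept excluded)

l1_penalty = sum(abs(b) for b in coefs)  # lasso: sum of absolute values
l2_penalty = sum(b ** 2 for b in coefs)  # ridge: sum of squares

print(round(l1_penalty, 2), round(l2_penalty, 2))  # 3.7 5.69
```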
The conditional probability that event A will occur given that event B has already occurred may be written as _____________.
P(A|B)
Data mining adds _____________ to data visualization and exploratory analyses.
machine learning models
Of regression performance measures, ____________ is signed thus giving an indication of average over- or under- predicting the response variable.
mean error
The three most effective basic plots are ________________.
bar charts, line graphs, scatter plots
Regression modeling involves estimating the coefficients and _____________.
choosing which predictors to include and in what form
Oversampling rare events is a method used to address ____________ situations.
class imbalance
The predict function in a logistic regression model generates the _________.
class membership (generates a column of 0s and 1s)
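A toy version of the difference between predict and predict_proba, assuming the default 0.5 cutoff:

```python
def predict_row(p_class1, cutoff=0.5):
    """Mimic predict for one record: class 1 when P(class 1) >= cutoff, else class 0."""
    return 1 if p_class1 >= cutoff else 0

propensities = [0.20, 0.50, 0.90]
print([predict_row(p) for p in propensities])  # [0, 1, 1]
```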
Interpreting the PCA component analysis (see pix), one can determine that the first principal component is dominated by the _______________.
fiber
Which of the following is an example of prediction?
forecasting sales
Dummy coding categorical variables can greatly _____________ of the dataset.
inflate the dimension
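A sketch of why: one categorical column with k levels expands into k 0/1 columns (pd.get_dummies behaves similarly; this is plain Python with made-up data):

```python
colors = ["red", "green", "blue", "green", "red"]  # one categorical column
levels = sorted(set(colors))                       # ['blue', 'green', 'red']

# One column becomes len(levels) dummy columns of 0s and 1s.
dummies = [[1 if c == lvl else 0 for lvl in levels] for c in colors]

print(len(levels))  # 3 columns where there used to be 1
print(dummies[0])   # [0, 0, 1]  (first record is 'red')
```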
Using the same dataset, a good explanatory model and a good predictive model may use different ________________.
input variables/ forms of input variables
If a categorical variable is ordinal, it may be coded as ______________.
integer values
In a logistic regression model using the Python LogisticRegression algorithm, setting penalty = l1 and C = 1e42 selects _________ and then __________.
lasso (L1 regularization); effectively no penalty — since C = 1/λ, setting C to a massive number makes λ nearly zero, turning the regularization off
Machine learning refers to algorithms that _____________.
learn directly from the data
Normalizing data is usually accomplished through one of two ways: 1) compute the z-score of the variable, and 2) ______________________.
rescale the variable to a uniform range
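Both normalization approaches, sketched in plain Python with made-up data:

```python
def z_scores(xs):
    """Standardize: subtract the mean, divide by the (population) standard deviation."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

def min_max(xs):
    """Rescale to the uniform range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

data = [2.0, 4.0, 6.0, 8.0]
print([round(v, 3) for v in min_max(data)])  # [0.0, 0.333, 0.667, 1.0]
```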
Look closely at the picture. Given the information provided about this residual distribution, you should choose __________ as the regression error performance measure.
RMSE (root mean squared error) — the residuals are NOT normally distributed (left-skewed)
Regularization models are also known as ___________ models.
shrinkage
If the classes are well-separated, then a __________ will exhibit good performance.
small dataset
Referring to the logistic model description (see pix), one can see that __________ prior to modeling because of the _________.
the data should be standardized or normalized; penalty (elasticnet) — a coefficient penalty requires the predictors to be on comparable scales
If we have two classes, YES and NO, and if the YES class is more important (such as fraud), then the false omission rate is the proportion of ____________ predictions that are wrong.
NO — records we predicted to be NO that were actually YES (we falsely omitted them)
In regression, prediction error is computed as ____________.
the difference between the actual and predicted outcome values: eᵢ = yᵢ − ŷᵢ
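A worked example with made-up values, also showing the signed mean error mentioned earlier:

```python
actual    = [10.0, 12.0, 15.0]
predicted = [11.0, 11.0, 13.0]

errors = [y - yhat for y, yhat in zip(actual, predicted)]  # e_i = y_i - yhat_i
mean_error = sum(errors) / len(errors)                     # signed: + means under-prediction

print(errors)                # [-1.0, 1.0, 2.0]
print(round(mean_error, 3))  # 0.667 -> on average the model under-predicts
```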
Explanatory linear regression modeling should be presented with __________ for the best calculation of coefficients.
the entire dataset
Both ridge regression and the lasso shrink all coefficients except ____________.
the intercept
The naive rule for classification involves classifying all records as members of ___________.
the majority class (the most prevalent class)
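Sketch of the naive rule with a made-up set of actual classes:

```python
from collections import Counter

actual = ["NO", "NO", "YES", "NO", "YES", "NO"]

# Naive rule: classify every record as the most prevalent class.
majority = Counter(actual).most_common(1)[0][0]
predictions = [majority] * len(actual)

print(majority)  # NO
```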
Of the regression assumptions, the most important for explanatory modeling is that ___________ follows a ___________ distribution.
the noise ∊ (or equivalently, Y); normal
A linear regression equation is 'solved' by minimizing the _____________.
the sum of squared deviations between the actual outcome values (yᵢ) and their predicted values from the model (ŷᵢ): ∑(yᵢ − ŷᵢ)²
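For simple (one-predictor) linear regression this minimization has a closed-form solution — the same fact behind the normal-equation point earlier. A sketch with a made-up dataset that lies exactly on y = 2x:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # lies exactly on y = 2x

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least-squares estimates minimizing sum((y_i - yhat_i)^2):
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

print(b0, b1)  # 0.0 2.0
```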