Statistical Learning Study

You have a bag of marbles with 64 red marbles and 36 blue marbles. What is the value of the Gini index for that bag?

(0.64)(1-0.64)+(0.36)(1-0.36)
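The expression above can be evaluated numerically. The following is a minimal sketch; the helper name `gini_index` is introduced here for illustration and is not part of the original card.

```python
def gini_index(counts):
    """Gini index sum_k p_k * (1 - p_k) for a list of class counts."""
    total = sum(counts)
    proportions = [c / total for c in counts]
    return sum(p * (1 - p) for p in proportions)


marbles = [64, 36]  # red, blue
print(round(gini_index(marbles), 4))  # 0.4608
```

A bag containing a single colour gives a Gini index of 0, which is why the index is used as a measure of node purity.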

Quadratic Discriminant Analysis (QDA) offers an alternative approach to LDA that makes many of the same assumptions, except that QDA assumes that

each class has its own covariance matrix.

You are fitting a linear model to data assumed to have Gaussian errors. The model has up to p = 5 predictors and n = 100 observations. Which of the following is most likely true of the relationship between Cp and AIC in terms of using the statistic to select a number of predictors to include?

Cp will select the same model as AIC.
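One way to see why the two criteria agree is to write them side by side. The formulas below follow one common textbook convention for least squares with Gaussian errors (additive constants omitted); here d is the number of predictors in the candidate model and \hat{\sigma}^2 is an estimate of the error variance. Other references scale AIC differently, so treat this as a sketch under that convention.

```latex
C_p = \frac{1}{n}\left(\mathrm{RSS} + 2 d \hat{\sigma}^2\right),
\qquad
\mathrm{AIC} = \frac{1}{n \hat{\sigma}^2}\left(\mathrm{RSS} + 2 d \hat{\sigma}^2\right)
\quad\Longrightarrow\quad
\mathrm{AIC} = \frac{C_p}{\hat{\sigma}^2}
```

Because \hat{\sigma}^2 is the same positive constant for every candidate model, the two statistics are proportional and rank the models identically.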

A fitted model with more predictors will necessarily have a lower Training Set Error than a model with fewer predictors.

False

A model with a high Cp coefficient is preferable.

False

If one feature (compared to all others) is a very strong predictor of the class label of the output variable, then all of the trees in a random forest will have this feature as the root node.

False

R^2 is a good measure of model adequacy for a logistic regression model.

False

Suppose you are given a dataset of cellular images from patients with and without cancer. If you are required to train a classifier that predicts the probability that the patient has cancer, you would prefer to use Decision trees over logistic regression.

False

The bootstrap method involves sampling without replacement.

False
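The bootstrap resamples the observed data with replacement, so the same observation can appear more than once in a resample. The sketch below uses only the Python standard library; the function names and the toy data are hypothetical and chosen for illustration.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) observations from data WITH replacement."""
    return rng.choices(data, k=len(data))


def bootstrap_se_of_mean(data, n_resamples=2000, seed=0):
    """Estimate the standard error of the sample mean by resampling."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = bootstrap_sample(data, rng)
        means.append(sum(sample) / len(sample))
    grand_mean = sum(means) / len(means)
    variance = sum((m - grand_mean) ** 2 for m in means) / (len(means) - 1)
    return variance ** 0.5


data = [2.3, 4.1, 3.8, 5.0, 2.9, 4.4, 3.3, 4.7]  # made-up observations
print(bootstrap_se_of_mean(data))
```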

We perform best subset, forward stepwise, and backward stepwise selection on a single data set. For each approach, we obtain p+1 models, containing 0, 1, 2, ..., p predictors. The predictors in the k-variable model identified by best subset are a subset of the predictors in the (k+1) variable model identified by best subset selection.

False

Logistic regression is a _____ used to model a binary categorical outcome using numerical and categorical predictors.

Generalized linear model

The LASSO, relative to least squares is:

Less flexible, and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Logistic regression assumes a:

Linear relationship between continuous predictor variables and the logit of the outcome variable.

We want to predict gender based on annual income and weekly working hours. The training set consists of annual income and weekly working hours for 900 men and 800 women. Which method should one prefer?

Logistic regression

The logistic regression coefficients are usually estimated using the _____

Maximum likelihood estimation
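To make the idea concrete, here is a minimal sketch that maximizes the Bernoulli log-likelihood for a single predictor. It uses plain gradient ascent for simplicity, which is an assumption of this example; statistical software typically uses a Newton-type method such as iteratively reweighted least squares. The data are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the model logit(p) = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.01, n_steps=20000):
    """Gradient ascent on the log-likelihood (illustrative, not optimized)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_steps):
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1


xs = [0, 1, 2, 3, 4, 5, 6, 7]   # hypothetical predictor values
ys = [0, 0, 1, 0, 1, 0, 1, 1]   # hypothetical binary outcomes (not separable)
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, log_likelihood(b0, b1, xs, ys))
```

The fitted coefficients give a higher log-likelihood than the starting point (0, 0), which is the sense in which they are "maximum likelihood" estimates.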

You have a bag of marbles with 64 red marbles and 36 blue marbles. What is the value of the entropy for that bag?

-0.64*log(0.64) - 0.36*log(0.36)
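The entropy is the negative sum of p*log(p) over both colours. The card does not say which logarithm base is intended, so the sketch below (with a hypothetical helper `entropy`) reports the value for both the natural log and base 2.

```python
import math


def entropy(counts, base=math.e):
    """Entropy -sum_k p_k * log(p_k) for a list of class counts."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # a class with zero count contributes nothing
            p = c / total
            result -= p * math.log(p, base)
    return result


marbles = [64, 36]  # red, blue
print(round(entropy(marbles), 4))           # natural log (nats)
print(round(entropy(marbles, base=2), 4))   # base 2 (bits)
```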

A logistic regression model was used to assess the association between cardiovascular disease (CVD) and obesity. P is defined to be the probability that a person has CVD, and obesity was coded as 0 = non-obese, 1 = obese, resulting in the model: ln(P/(1-P)) = -2 + 0.7(obesity). What is the difference in log odds of CVD for persons who are obese as compared to not obese?

0.7
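The coefficient can be checked by plugging both obesity codes into the fitted model. The sketch below uses the coefficients from the card; the function names are introduced for illustration only.

```python
import math

INTERCEPT = -2.0  # log odds of CVD when obesity = 0
SLOPE = 0.7       # change in log odds when obesity goes from 0 to 1


def log_odds(obesity):
    return INTERCEPT + SLOPE * obesity


def probability(obesity):
    """Invert the logit to recover P(CVD)."""
    return 1.0 / (1.0 + math.exp(-log_odds(obesity)))


log_odds_difference = log_odds(1) - log_odds(0)
odds_ratio = math.exp(log_odds_difference)

print(round(log_odds_difference, 4))  # 0.7
print(round(odds_ratio, 4))           # about 2.01
print(round(probability(0), 4), round(probability(1), 4))
```

Exponentiating the coefficient converts the difference in log odds into an odds ratio of roughly 2, meaning the odds of CVD are about twice as high for obese persons under this model.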

You are trying to fit a model and are given p=30 predictor variables to choose from. Ultimately, you want your model to be interpretable, so you decide to use Best Subset Selection. How many different models will you end up considering? (you can leave the answer expressed in terms of p)

2^30 models
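The count comes from choosing, for each subset size k, which k of the p predictors to include. A short sketch (the function name is hypothetical) confirms that these counts add up to 2^p:

```python
from math import comb


def best_subset_model_count(p):
    """Number of models best subset selection considers: sum over k of C(p, k)."""
    return sum(comb(p, k) for k in range(p + 1))


print(best_subset_model_count(30))  # 1073741824
print(best_subset_model_count(30) == 2 ** 30)  # True
```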

Give an example of a supervised learning problem.

A supervised learning problem is one where we have an output to rely on to find the best predictors. One example can be trying to find out the best predictors for crime in a state. You can use historical data from the past couple of years to find which predictors turn out to be the most reliable to predict future crime rates from one year to the next. Then you can use that information to make a prediction on crime rates for future years.

Which of the following can be used to evaluate the performance of logistic regression model?

AIC

Bagging algorithms attach weights to a set of N weak learners. They re-weight the learners and convert them into strong ones. Boosting algorithms draw N sample distributions (usually with replacement) from an original data set for learners to train on.

False

Give an example of an unsupervised problem.

An unsupervised learning problem is a problem where we do not have an output to rely on for our predictors. What we can do, for example, is cluster analysis, where we identify groups or trends in the data set. One example can be trying to see if there are certain temperatures that allow for better transmission of the cold virus.

Bagging = _____ _____

Bootstrap Aggregating

Which of the following gives the differences between logistic regression and LDA?

If the classes are well separated, the parameter estimates for logistic regression can be unstable. If the sample size is small and the distribution of the features is approximately normal in each class, linear discriminant analysis is more stable than logistic regression.

We want to predict gender based on height and weight. The training set consists of heights and weights for 80 men and 60 women. Which method should one prefer?

LDA

_____ is a phenomenon where a model closely matches the training data such that it captures too much of the noise or error in the data. This results in a model that fits the training data very well, but does not make good predictions under test or in general.

Overfitting

What are the differences between Random Forest (RF) and Boosted Regression Trees (BRT) algorithms?

RF builds multiple independent trees, while BRT builds multiple dependent trees that take into account the fit of the previous tree. RF grows trees in parallel, while BRT is sequential. RF uses the bagging method to select random subsets, and BRT uses the boosting method.

ROC stands for

Receiver Operating Characteristic

How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary least squares regression?

Ridge has larger bias, smaller variance.
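A special case makes the trade-off explicit. Assume, purely for illustration, an orthonormal design (X^T X = I), a true coefficient \beta_j, error variance \sigma^2, and the ridge objective \|y - X\beta\|^2 + \lambda\|\beta\|^2. Under those assumptions the ridge estimate is a rescaled least squares estimate:

```latex
\hat{\beta}^{\text{ridge}}_j = \frac{\hat{\beta}^{\text{OLS}}_j}{1 + \lambda},
\qquad
\operatorname{Bias}\bigl(\hat{\beta}^{\text{ridge}}_j\bigr) = -\frac{\lambda}{1 + \lambda}\,\beta_j,
\qquad
\operatorname{Var}\bigl(\hat{\beta}^{\text{ridge}}_j\bigr) = \frac{\sigma^2}{(1 + \lambda)^2}
```

For \lambda = 0 this reduces to least squares (zero bias, variance \sigma^2); for \lambda > 0 the bias grows in magnitude while the variance shrinks.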

The standard error (SE) of an estimator reflects how it varies under repeated sampling. For simple linear regression:

$\mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

The ROC curve is obtained by plotting...

Sensitivity vs. (1-Specificity)
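Each point on the curve corresponds to one classification threshold. The sketch below computes those points from predicted scores and true labels; the scores and labels are made up, and the code assumes both classes are present.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # x = false positive rate = 1 - specificity, y = sensitivity
        points.append((fp / negatives, tp / positives))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical predicted probabilities
labels = [1, 1, 0, 1, 0, 0]              # hypothetical true classes
for fpr, tpr in roc_points(scores, labels):
    print(round(fpr, 3), round(tpr, 3))
```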

Given a matrix X, the expression UΣV^T denotes the _____ of X

Singular Value Decomposition

Decision trees such as regression or classification trees are known to _____

Suffer from high variance. Stratify or segment the predictor space into a number of simple regions.

A good strategy is to grow a very large tree $T_0$, and then prune it back in order to obtain a subtree. Cost complexity pruning (also known as weakest link pruning) is used to do this. We consider a sequence of trees indexed by a nonnegative tuning parameter $\alpha$. For each value of $\alpha$ there corresponds a subtree $T \subset T_0$ such that $\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \hat{y}_{R_m})^2 + \alpha |T|$ is as small as possible. Here $|T|$ indicates the number of terminal nodes of the tree $T$, $R_m$ is the rectangle (i.e. the subset of predictor space) corresponding to the $m$-th terminal node, and $\hat{y}_{R_m}$ is the mean of the training observations in $R_m$. Imagine that you are doing cost complexity pruning as defined above. You fit two trees to the same data: $T_1$ is fit at $\alpha = 1$ and $T_2$ is fit at $\alpha = 2$. Which of the following is true?

T1 will have at least as many nodes as T2.
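The effect of the penalty can be illustrated with a toy calculation. The sketch below does not grow or prune a real tree; it assumes a hypothetical list of candidate subtrees, each summarized by its number of terminal nodes and its training RSS, and picks the one that minimizes RSS + α|T|.

```python
# (number of terminal nodes, training RSS) for hypothetical candidate subtrees
CANDIDATES = [(1, 100.0), (2, 60.0), (3, 44.0), (5, 41.0), (8, 39.5)]


def best_subtree(candidates, alpha):
    """Return the candidate minimizing RSS + alpha * (number of terminal nodes)."""
    return min(candidates, key=lambda t: t[1] + alpha * t[0])


for alpha in (0, 1, 2):
    leaves, rss = best_subtree(CANDIDATES, alpha)
    print(alpha, leaves, rss)
```

With these made-up numbers, α = 1 selects the 5-leaf subtree and α = 2 selects the 3-leaf subtree: a larger α charges more per terminal node, so the selected tree never gets bigger as α increases.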

You are doing a simulation in order to compare the effect of using Cross-Validation or a Validation set. For each iteration of the simulation, you generate new data and then use both Cross-Validation and a Validation set in order to determine the optimal number of predictors. Which of the following is most likely?

The validation set method will result in a higher variance in the selected optimal number of predictors.

In binary logistic regression:

The dependent variable consists of two categories.

What do residuals represent?

The difference between the actual Y values and the predicted Y values.

In a simple linear regression model, y=B0 + B1x, what does the B1 represent?

The estimated change in the average value of y per unit change in x.

In order to perform Boosting, we need to select some parameters. List 3 of those parameters

The number of trees, the number of splits in those trees, and...

Which of the following can be a stopping rule in fitting a Classification Tree?

The tree is stopped when all groups are relatively homogeneous. The tree is stopped when a predefined maximum number of splits is reached.

Which one of the following is the main reason for pruning a Decision tree?

To avoid overfitting the training set.

Some of the advantages of decision tree models are:

Trees closely mirror the human decision-making process. Trees are easy to explain. Trees don't require dummy variables to model qualitative variables. Trees can be displayed graphically and are easily interpreted by non-experts.

Adjusted R^2 aims to penalize models that include unnecessary variables

True

In simple linear regression, the square of the correlation between X and Y (that is, r^2) and the fraction of variance explained (that is, R^2) match

True
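The equality can be verified numerically. The sketch below fits a simple linear regression by least squares on made-up data and compares R^2 = 1 - RSS/TSS with the squared sample correlation; all function names are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(xs, ys):
    """Fraction of variance explained: 1 - RSS / TSS."""
    b0, b1 = fit_simple_linear(xs, ys)
    y_bar = sum(ys) / len(ys)
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # squared residuals
    tss = sum((y - y_bar) ** 2 for y in ys)
    return 1 - rss / tss


def squared_correlation(xs, ys):
    """Square of the sample correlation between x and y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


xs = [1, 2, 3, 4, 5]               # hypothetical predictor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical responses
print(r_squared(xs, ys), squared_correlation(xs, ys))
```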

Is logistic regression a supervised machine learning algorithm?

True

The link function of linear regression is the identity function (i.e. y=y), whereas the logit is the link function for logistic regression.

True

We perform best subset, forward stepwise, and backward stepwise selection on a single data set. For each approach, we obtain p+1 models, containing 0, 1, 2, ..., p predictors. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1) variable model identified by forward stepwise selection.

True

When using LASSO, normalizing your input features influences the predictions.

True

False Negative (FN) rate is also known as _____

Type II Error

Which of the following is NOT a benefit of the sparsity imposed by the Lasso?

Using the Lasso penalty helps to decrease the bias of the fits.

While doing a homework assignment, you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one. Which of the following is most likely true?

Using the Quadratic Model will decrease the Bias of your model.

Given an ROC curve, we can use the _____ as an assessment of the predictive ability of the model.

area under the curve (AUC)
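The AUC has a useful interpretation: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half). The sketch below computes it directly from that definition on made-up scores; it assumes both classes are present.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical predicted probabilities
labels = [1, 1, 0, 1, 0, 0]              # hypothetical true classes
print(round(auc(scores, labels), 4))  # 0.8889
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.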

If we want to build a logistic regression model in R, we can use the function:

glm() with the option 'family = binomial'

Tree/Rule based classification algorithms generate _____ rule to perform the classification

if-then

In K-Nearest Neighbors, the choice of K can have a drastic effect on the resulting classifier. Too low of a K yields a classifier that...

is too flexible, has too high a variance, and has low bias.

(in statistical learning) LASSO stands for

least absolute shrinkage and selection operator

Predicting how many points a student can get in a competitive exam based on hours of study can be solved using _____ regression model.

linear

Whether a student will pass or fail in the competitive exam based on hours of study can be solved using _____ regression model

logistic

For 0 < p < 1, ln(p/(1-p)) is called the _____

logit function
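A short sketch of the logit and its inverse, using only the standard library. The function names are chosen for illustration; note that the logit is undefined at p = 0 and p = 1.

```python
import math


def logit(p):
    """Log odds ln(p / (1 - p)), defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a log odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


print(logit(0.5))                           # 0.0
print(round(inverse_logit(logit(0.8)), 6))  # 0.8
```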

In simple linear regression, the least squares approach chooses ^B0 and ^B1 to _____

minimize the RSS

For a classification tree, predictions are made based on the notion that each observation belongs to the _____ of the training observations in the region to which the observation belongs.

most commonly occurring class.
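In code, the prediction for a region is a majority vote over the training labels that fall in it. A minimal sketch with a hypothetical helper and made-up labels:

```python
from collections import Counter


def predict_region_class(training_labels_in_region):
    """Return the most commonly occurring class among the region's training labels."""
    return Counter(training_labels_in_region).most_common(1)[0][0]


region_labels = ["yes", "no", "yes", "yes", "no"]  # hypothetical terminal node
print(predict_region_class(region_labels))  # yes
```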

A frequent problem in estimating logistic regression models is a failure of the likelihood maximization algorithm to converge. In most cases this failure is a consequence of data patterns known as _____

complete or quasi-complete separation

To present the results of a logistic regression model, it is often helpful to use graphs of _____

predicted probabilities

Ridge Regression

reduces variance at the expense of higher bias.

The _____ R package can be used with the caret package to train tree-based models.

rpart

When creating a logistic regression model in addition to the accuracy of the classifier, it is also important to check the values of _____

the log-odds

_____ is one example of a non-parametric method.

thin-plate spline

LASSO can be interpreted as least squares linear regression where

weights are regularized with the l1 norm.
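Written out, the lasso estimate solves the least squares problem with an added penalty on the sum of absolute coefficient values. In the notation below, which is assumed here for illustration, \lambda \ge 0 is the tuning parameter and the intercept \beta_0 is left unpenalized:

```latex
\hat{\beta}^{\text{lasso}}
= \arg\min_{\beta_0,\, \beta}\;
\sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
+ \lambda \sum_{j=1}^{p} |\beta_j|
```

Setting \lambda = 0 recovers ordinary least squares, while larger values of \lambda shrink coefficients toward zero and can set some of them exactly to zero.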

