ISLR-Ch2

If we are interested in prediction only, will a less flexible or a more flexible method give us accurate results?

Not necessarily the most flexible one. A less flexible method will often give more accurate predictions, because a highly flexible method has the potential for overfitting.

Explain how train error and test error rate behave in a classification setting

As in the regression setting, the training error rate consistently declines as the flexibility increases. However, the test error exhibits a characteristic U-shape, declining at first before increasing again when the method becomes excessively flexible and overfits.

Describe what happens to bias, variance, and test MSE as flexibility increases

As we increase the flexibility of a class of methods, the bias tends to initially decrease faster than the variance increases. Consequently, the expected test MSE declines. However, at some point increasing flexibility has little impact on the bias but starts to significantly increase the variance. When this happens the test MSE increases.

What is the relationship between flexibility and test MSE

As with the training MSE, the test MSE initially declines as the level of flexibility increases. However, at some point the test MSE levels off and then starts to increase again.

Advantages of Parametric Method?

Assuming a parametric form for f simplifies the problem of estimating f because it is generally much easier to estimate a set of parameters, such as β0, β1, ..., βp in the linear model, than it is to fit an entirely arbitrary function f.

Explain what a Bayes classifier does

The Bayes classifier simply assigns a test observation with predictor vector x0 to the class j for which Pr(Y = j | X = x0) is largest. This is a conditional probability: it is the probability that Y = j, given the observed predictor vector x0.

What is an indicator variable

I(yi ≠ ŷi) is an indicator variable that equals 1 if yi ≠ ŷi and 0 if yi = ŷi. If I(yi ≠ ŷi) = 0 then the ith observation was classified correctly by our classification method; otherwise it was misclassified.

Give a situation where we might prefer a restrictive method over a flexible method.

If we are mainly interested in inference, then restrictive models are much more interpretable. Flexible models lead to such complicated estimates of f that it is difficult to understand how any individual predictor is associated with the response.

What is a qualitative/categorical variable?

Qualitative variables take on values in one of K different classes, or categories. Examples of qualitative variables include a person's gender (male or female), the brand of product purchased (brand A, B, or C), whether a person defaults on a debt (yes or no), or a cancer diagnosis (Acute Myelogenous Leukemia, Acute Lymphoblastic Leukemia, or No Leukemia).

What is unsupervised learning?

Unsupervised learning describes the somewhat more challenging situation in which for every observation i = 1, ..., n, we observe a vector of measurements xi but no associated response yi.

What is prediction? How can we predict Y?

In many situations, a set of inputs X are readily available, but the output Y cannot be easily obtained. In this setting, since the error term averages to zero, we can predict Y using Ŷ = f̂(X).

Why is f-hat treated as a black box?

In the prediction setting, f̂ is often treated as a black box, in the sense that one is not typically concerned with the exact form of f̂, provided that it yields accurate predictions for Y.

Which predictors are associated with the response?

It is often the case that only a small fraction of the available predictors are substantially associated with Y . Identifying the few important predictors among a large set of possible variables can be extremely useful, depending on the application.

Rank in order of flexibility:

From least to most flexible: lasso, linear regression, generalized additive models, and finally the group of bagging, boosting, and support vector machines.

More flexibility results in ____ bias

Lower bias

What is Test MSE

Test MSE is the MSE calculated using test data. We are interested in the accuracy of the predictions that we obtain when we apply our method to previously unseen test data.

What is train MSE

MSE calculated using the training data

What is the mean squared error?

The MSE is the measure of quality of fit used most commonly in the regression setting. It measures the extent to which the predicted response value for a given observation is close to the true response value for that observation.
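
A minimal sketch of the computation, assuming the usual definition MSE = (1/n) Σ (yi − f̂(xi))². The function name and the numbers below are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average squared gap between truth and prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Hypothetical responses and two sets of predictions (illustrative values only).
y = [3.0, 5.0, 7.0]
close_predictions = [3.0, 5.0, 8.0]
poor_predictions = [0.0, 5.0, 12.0]

print(mse(y, close_predictions))  # small: predictions are near the true responses
print(mse(y, poor_predictions))   # large: some predictions differ substantially
```

The same function gives the training MSE when applied to the training observations and the test MSE when applied to held-out observations.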

Why is test MSE large when we overfit the data ?

When we overfit the training data, the test MSE will be very large because the supposed patterns that the method found in the training data simply don't exist in the test data.

relationship between Y and X = (X1,X2,...,Xp)

Y = f(X) + ε.

systematic information that X provides about Y

f represents the systematic information that X provides about Y .

What is f-hat?

f̂ represents our estimate for f.

What is reducible error?

f̂ will not be a perfect estimate for f, and this inaccuracy will introduce some error. This error is reducible because we can potentially improve the accuracy of f̂ by using the most appropriate statistical learning technique to estimate f.

What must be true about variance and bias if we want to minimize the MSE

In order to minimize the expected test error, we need to select a statistical learning method that simultaneously achieves low variance and low bias.

What provides an upper bound on the accuracy of our prediction for Y?

The irreducible error will always provide an upper bound on the accuracy of our prediction for Y.

Is logistic regression a classification or regression problem?

Logistic regression is typically used with a qualitative response. As such it is often used as a classification method. But since it estimates class probabilities, it can also be thought of as a regression method.

Bayes Decision Boundary

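In a two-class problem, the Bayes decision boundary is the set of points where the conditional probability is exactly 50%. The Bayes classifier's prediction is determined by the Bayes decision boundary: an observation is assigned to a class according to the side of the boundary on which it falls.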

Trade-off between linear and non linear models for inference

Linear models allow for relatively simple and interpretable inference, but may not yield as accurate predictions as some other approaches. In contrast, some highly non-linear approaches can potentially provide quite accurate predictions for Y, but this comes at the expense of a less interpretable model for which inference is more challenging.

In a two-class problem, explain how the Bayes classifier works

When there are only two possible response values, say class 1 or class 2, the Bayes classifier corresponds to predicting class 1 if Pr(Y = 1 | X = x0) > 0.5, and class 2 otherwise.
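
The rule can be sketched in a few lines, under the strong assumption that the true conditional probability is known. The logistic curve in `prob_class_1` is a hypothetical stand-in chosen for illustration; for real data this function is unknown, which is why the Bayes classifier cannot be computed in practice.

```python
import math


def prob_class_1(x):
    """Hypothetical 'true' Pr(Y = 1 | X = x); assumed here, never known for real data."""
    return 1.0 / (1.0 + math.exp(-(x - 2.0)))


def bayes_classify(x):
    """Predict class 1 if Pr(Y = 1 | X = x) > 0.5, and class 2 otherwise."""
    return 1 if prob_class_1(x) > 0.5 else 2


for x0 in (0.0, 1.9, 2.1, 5.0):
    print(x0, round(prob_class_1(x0), 3), bayes_classify(x0))
```

In this sketch the Bayes decision boundary is the single point x = 2, where the assumed probability equals exactly 0.5.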

What is a classification problem?

Problems involving a qualitative response are often referred to as classification problems.

What is ε?

ε is a random error term, which is independent of X and has mean zero.

A good classifier is one for which the error rate is smallest. True or False

True.

What does the accuracy of Ŷ depend on?

It depends on two quantities: 1. the reducible error and 2. the irreducible error.
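
For reference, the split into the two quantities can be written as follows, treating both f̂ and X as fixed so that the only randomness comes from ε. This is the standard decomposition, typeset here as an aid rather than copied from the flashcard.

```latex
\begin{aligned}
E\bigl(Y - \hat{Y}\bigr)^2
  &= E\bigl[f(X) + \varepsilon - \hat{f}(X)\bigr]^2 \\
  &= \underbrace{\bigl[f(X) - \hat{f}(X)\bigr]^2}_{\text{reducible}}
   + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}
\end{aligned}
```

The cross term vanishes because ε has mean zero and is independent of X.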

True or false: linear regression is more interpretable than lasso

False

What is Supervised learning ?

For each observation of the predictor measurement(s) xi, i = 1,...,n there is an associated response measurement yi. We wish to fit a model that relates the response to the predictors, with the aim of accurately predicting the response for future observations (prediction) or better understanding the relationship between the response and the predictors (inference).

Describe how Knn works

Given a positive integer K and a test observation x0, the KNN classifier first identifies the K points in the training data that are closest to x0, represented by N0. It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j: Pr(Y = j | X = x0) = (1/K) Σ_{i ∈ N0} I(yi = j). Finally, KNN applies Bayes rule and classifies the test observation x0 to the class with the largest estimated probability.
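
The three steps can be sketched as below. This is a bare-bones illustration that assumes Euclidean distance and breaks ties by whichever class was encountered first; the function name and the toy data are made up.

```python
from collections import Counter


def knn_predict(train_x, train_y, x0, k):
    """Return (predicted class, estimated class probabilities) for test point x0."""
    # Squared Euclidean distance from x0 to every training point.
    distances = [
        (sum((a - b) ** 2 for a, b in zip(x, x0)), label)
        for x, label in zip(train_x, train_y)
    ]
    # N0: the k training points closest to x0.
    distances.sort(key=lambda pair: pair[0])
    neighbours = distances[:k]
    # Estimated Pr(Y = j | X = x0): fraction of neighbours whose class is j.
    counts = Counter(label for _, label in neighbours)
    probs = {label: count / k for label, count in counts.items()}
    # Classify to the class with the largest estimated probability.
    return max(probs, key=probs.get), probs


# Toy training data (illustrative values only).
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]
train_y = ["blue", "blue", "blue", "orange", "orange", "orange"]

print(knn_predict(train_x, train_y, (3.5, 3.5), k=3))
print(knn_predict(train_x, train_y, (3.5, 3.5), k=1))
```

With K = 1 every training point is its own nearest neighbour, so the training error rate is zero as long as no two training points coincide.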

What is the relationship between flexibility and training MSE

Greater flexibility = lower training MSE

More flexibility leads to ____ variance

Higher variance

Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?

Historically, most methods for estimating f have taken a linear form. In some situations, such an assumption is reasonable or even desirable. But often the true relationship is more complicated, in which case a linear model may not provide an accurate representation of the relationship between the input and output variables.

What are non parametric methods?

Non-parametric methods do not make explicit assumptions about the functional form of f. Instead they seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly.

The expected test MSE can never lie below Var(ε), the irreducible error from (2.3). True or false?

True. Variance is inherently a nonnegative quantity, and squared bias is also nonnegative. Hence, the expected test MSE can never lie below Var(ε), the irreducible error from (2.3).
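
The argument rests on the bias-variance decomposition of the expected test MSE at a test point x0, which can be written as follows. This is the standard form, added here for reference.

```latex
E\bigl(y_0 - \hat{f}(x_0)\bigr)^2
  = \operatorname{Var}\bigl(\hat{f}(x_0)\bigr)
  + \bigl[\operatorname{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2
  + \operatorname{Var}(\varepsilon)
```

Since the first two terms are each at least zero, the left-hand side is at least Var(ε).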

What is the bias of a statistical method?

Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model.

What are quantitative variables?

Quantitative variables take on numerical values. Examples include a person's age, height, or income, the value of a house, and the price of a stock

What is the relationship between the response and each predictor?

Some predictors may have a positive relationship with Y , in the sense that increasing the predictor is associated with increasing values of Y . Other predictors may have the opposite relationship. Depending on the complexity of f, the relationship between the response and a given predictor may also depend on the values of the other predictors.

Why is it easy to compute train MSE but hard to get the test MSE

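The training MSE can be computed directly from the observations used to fit the model. The test MSE requires previously unseen observations, and often no such test data are available.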

When is MSE small? When is it large?

The MSE will be small if the predicted responses are very close to the true responses, and will be large if for some of the observations, the predicted and true responses differ substantially.

What is degrees of freedom?

The degrees of freedom is a quantity that summarizes the flexibility of a curve.

What is cluster analysis?

The goal of cluster analysis is to ascertain, on the basis of x1, ..., xn, whether the observations fall into relatively distinct groups.

Other names for input variables

The inputs go by different names, such as predictors, independent variables, features, or sometimes just variables

How do we measure accuracy in a classification setting ?

The most common approach for quantifying the accuracy of our estimate f̂ is the training error rate, the proportion of mistakes that are made if we apply our estimate f̂ to the training observations: (1/n) Σ_{i=1}^{n} I(yi ≠ ŷi).
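
As a small sketch, the error rate is the average of the indicator I(yi ≠ ŷi) over the observations. The labels below are invented for illustration.

```python
def error_rate(y_true, y_pred):
    """Proportion of observations whose predicted class differs from the true class."""
    # Indicator variable: 1 for a misclassified observation, 0 for a correct one.
    indicators = [1 if y != y_hat else 0 for y, y_hat in zip(y_true, y_pred)]
    return sum(indicators) / len(indicators)


# Hypothetical true and predicted class labels (illustrative values only).
labels = ["yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "yes", "yes"]

print(error_rate(labels, predicted))  # 2 of the 5 observations are misclassified
```

Applied to training observations this gives the training error rate; applied to held-out observations it gives the test error rate.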

What is the most common approach to fitting the linear model?

The most common approach to fitting the linear model is referred to as (ordinary) least squares.
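
A minimal sketch of least squares for the single-predictor case, Y ≈ β0 + β1 X, using the closed-form estimates. The function name and data are assumptions for illustration; with several predictors one would normally rely on a linear algebra library instead.

```python
def fit_least_squares(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sample covariance of x and y divided by sample variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Made-up data lying roughly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

beta0, beta1 = fit_least_squares(x, y)
print(round(beta0, 2), round(beta1, 2))
```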

Disadvantages of parametric models?

The potential disadvantage of a parametric approach is that the model we choose will usually not match the true unknown form of f. If the chosen model is too far from the true f, then our estimate will be poor.

Why is the irreducible error larger than zero?

The quantity ε may contain unmeasured variables that are useful in predicting Y: since we don't measure them, f cannot use them for its prediction. The quantity ε may also contain unmeasurable variation.

What is Bias and Variance Tradeoff?

The relationship between bias, variance, and test set MSE given in Equation 2.7 and displayed in Figure 2.12 is referred to as the bias-variance trade-off. Good test set performance of a statistical learning method requires low variance as well as low squared bias. This is referred to as a trade-off because it is easy to obtain a method with extremely low bias but high variance (for instance, by drawing a curve that passes through every single training observation) or a method with very low variance but high bias (by fitting a horizontal line to the data). The challenge lies in finding a method for which both the variance and the squared bias are low.

Why Estimate f?

There are two main reasons that we may wish to estimate f: prediction and inference

What is overfitting?

More complex, flexible models can lead to a phenomenon known as overfitting the data, which essentially means they follow the errors, or noise, too closely.

What is training data?

The observations used to fit a model are called the training data because we use them to train, or teach, our method how to estimate f.

The Bayes classifier produces the lowest possible test error rate, called the Bayes error rate. True or False

True

True or false: lasso is less flexible than linear regression

True

With K = 1, the KNN training error rate is 0, but the test error rate may be quite high. True or false

True

What is the fundamental problem with choosing a method that minimizes the training MSE?

There is no guarantee that the method with the lowest training MSE will also have the lowest test MSE. Roughly speaking, the problem is that many statistical methods specifically estimate coefficients so as to minimize the training set MSE. For these methods, the training set MSE can be quite small, but the test MSE is often much larger.

Is cluster analysis supervised or unsupervised?

Unsupervised

What is variance of a statistical method?

Variance refers to the amount by which f̂ would change if we estimated it using a different training data set. Since the training data are used to fit the statistical learning method, different training data sets will result in a different f̂. But ideally the estimate for f should not vary too much between training sets. However, if a method has high variance, then small changes in the training data can result in large changes in f̂.
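
A small simulation can make this concrete. The sketch below repeatedly draws new training sets from an assumed model Y = sin(2πX) + ε and records the prediction at one fixed point x0 for two methods: a very flexible one (1-nearest-neighbour) and a very inflexible one (predict the training mean everywhere). The true function, the noise level, and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def true_f(x):
    """Assumed true regression function for this simulation."""
    return math.sin(2 * math.pi * x)


def draw_training_set(n, rng, noise_sd=0.5):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def predict_1nn(xs, ys, x0):
    """Flexible method: copy the response of the closest training point."""
    nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[nearest]


def predict_mean(xs, ys, x0):
    """Inflexible method: ignore x0 and predict the average training response."""
    return sum(ys) / len(ys)


rng = random.Random(1)
x0 = 0.25
preds_1nn, preds_mean = [], []
for _ in range(300):
    xs, ys = draw_training_set(30, rng)
    preds_1nn.append(predict_1nn(xs, ys, x0))
    preds_mean.append(predict_mean(xs, ys, x0))

# Spread of the prediction at x0 across training sets: the variance of each method.
var_1nn = statistics.pvariance(preds_1nn)
var_mean = statistics.pvariance(preds_mean)
print(round(var_1nn, 3), round(var_mean, 3))
```

The flexible method's prediction at x0 moves around far more from one training set to the next, which is what high variance means here.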

What is inference?

We are often interested in understanding the way that Y is affected as X1,...,Xp change. In this situation we wish to estimate f, but our goal is not necessarily to make predictions for Y . We instead want to understand the relationship between X and Y , or more specifically, to understand how Y changes as a function of X1, . . . , Xp.

How to address problems of parametric methods?

We can try to address the problem of a model that does not match the true f by choosing flexible models that can fit many different possible functional forms for f.

What is a regression problem?

We tend to refer to problems with a quantitative response as regression problems

The choice of K has a drastic effect on the KNN classifier obtained.

When K = 1, the decision boundary is overly flexible and finds patterns in the data that don't correspond to the Bayes decision boundary. This corresponds to a classifier that has low bias but very high variance. As K grows, the method becomes less flexible and produces a decision boundary that is close to linear. This corresponds to a low-variance but high-bias classifier

What is overfitting?

When a given method yields a small training MSE but a large test MSE, we are said to be overfitting the data. This happens because our statistical learning procedure is working too hard to find patterns in the training data, and may be picking up some patterns that are just caused by random chance rather than by true properties of the unknown function f.

What is Y-hat?

Ŷ represents the resulting prediction for Y.

What is parametric method of statistical learning?

A two-step model-based approach: 1. First, we make an assumption about the functional form, or shape, of f. For example, one very simple assumption is that f is linear in X. 2. After a model has been selected, we need a procedure that uses the training data to fit or train the model. In the case of the linear model (2.4), we need to estimate the parameters β0, β1, ..., βp.

Mention what happens to train MSE and test MSE as the flexibility of model increases

As the flexibility of the statistical learning method increases, we observe a monotone decrease in the training MSE and a U-shape in the test MSE. As model flexibility increases, training MSE will decrease, but the test MSE may not.
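
One way to see this numerically is sketched below, using KNN regression as a stand-in for "a method with adjustable flexibility" (smaller K means more flexible). The true function sin(2πx), the noise level, and the sample sizes are assumptions for illustration, and the exact numbers depend on the random seed.

```python
import math
import random


def true_f(x):
    """Assumed true regression function for this simulation."""
    return math.sin(2 * math.pi * x)


def simulate(n, rng, noise_sd=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_regress(train_x, train_y, x0, k):
    """Predict at x0 by averaging the responses of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k


def knn_mse(eval_x, eval_y, train_x, train_y, k):
    """MSE of the KNN fit (built on the training data) on the evaluation data."""
    errors = [
        (y - knn_regress(train_x, train_y, x, k)) ** 2
        for x, y in zip(eval_x, eval_y)
    ]
    return sum(errors) / len(errors)


rng = random.Random(0)
train_x, train_y = simulate(50, rng)
test_x, test_y = simulate(200, rng)

# Flexibility increases as K decreases.
for k in (50, 20, 5, 1):
    train_mse = knn_mse(train_x, train_y, train_x, train_y, k)
    test_mse = knn_mse(test_x, test_y, train_x, train_y, k)
    print(k, round(train_mse, 3), round(test_mse, 3))
```

At K = 1 each training point is predicted by itself, so the training MSE is exactly zero, while the test MSE stays well above zero.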

What is irreducible error?

The irreducible error is the error introduced by ε. It is irreducible because no matter how well we estimate f, we cannot reduce it.

What is an advantage of non parametric method?

By avoiding the assumption of a particular functional form for f, non-parametric methods have the potential to accurately fit a wider range of possible shapes for f.

What is cross-validation?

Cross-validation is a method for estimating test MSE using the training data.
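
A minimal sketch of one common variant, k-fold cross-validation: the training data are split into k folds, each fold is held out in turn, the model is fit on the remaining folds, and the squared errors on the held-out observations are averaged. The helper names, the two toy models, and the data are assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def predict_line(model, x):
    b0, b1 = model
    return b0 + b1 * x


def fit_mean(xs, ys):
    """Baseline model: ignore x and remember the average response."""
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cv_mse(xs, ys, k, fit, predict):
    """Estimate test MSE by averaging squared errors over held-out folds."""
    squared_errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend((ys[i] - predict(model, xs[i])) ** 2 for i in held_out)
    return sum(squared_errors) / len(squared_errors)


# Made-up data: a straight line plus a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

print(round(cv_mse(xs, ys, 5, fit_line, predict_line), 3))
print(round(cv_mse(xs, ys, 5, fit_mean, predict_mean), 3))
```

Every observation is held out exactly once, so each squared error comes from a model that never saw that observation during fitting.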

computing the Bayes classifier is impossible for real data. Why?

For real data, we do not know the conditional distribution of Y given X.

What is a disadvantage of non parametric method?

Since non-parametric methods do not reduce the problem of estimating f to a small number of parameters, a very large number of observations (far more than is typically needed for a parametric approach) is required in order to obtain an accurate estimate for f.

What is statistical learning?

Statistical learning refers to a set of approaches for estimating f, where f represents the systematic information that X provides about Y.

What is semi supervised learning?

Suppose that we have a set of n observations. For m of the observations, where m < n, we have both predictor measurements and a response measurement. For the remaining n − m observations, we have predictor measurements but no response measurement. Such a scenario can arise if the predictors can be measured relatively cheaply but the corresponding responses are much more expensive to collect. We refer to this setting as a semi-supervised learning problem.

Which is smaller? Test MSE or Train MSE?

The training MSE. We almost always expect the training MSE to be smaller than the test MSE because most statistical learning methods either directly or indirectly seek to minimize the training MSE.

