Actex Ch 1-3 - PA

in terms of bias and variance, how does stepwise selection improve predictive power of a glm?

compared to the full model with all potentially useful predictors, the reduced model has a lower degree of complexity. the reduced complexity lowers flexibility, which reduces variance but increases the squared bias. provided that the drop in variance outweighs the rise in squared bias, the prediction performance of the glm will improve.

what is the regularization parameter? what happens as the regularization parameter increases?

controls the amount of regularization/shrinkage in the model. as lambda increases, complexity decreases, and therefore bias increases and variance decreases.

as training error increases, model flexibility ______

decreases

what shape does test error make as model flexibility increases?

u-shape

what is the elastic net

a more general regularized regression whose penalty is (1-alpha)*ridge + (alpha)*lasso (mnemonic: lasso = last term). alpha=0 simplifies to ridge; alpha=1 simplifies to lasso.

3 common data issues for numeric variables

1. highly correlated predictors 2. skewness 3. whether or not they should be converted to a factor

between BIC and AIC, which one tends to favor models with fewer predictors?

BIC. the penalty term for the number of predictors gets multiplied by 2 for AIC and by ln(n) for BIC. ln(n) exceeds 2 once n is at least 8, so BIC penalizes extra predictors more heavily.
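
The comparison above can be sketched in a few lines of Python. This is only an illustration: the function names and the example fit (loglikelihood of -120, 5 parameters, 200 observations) are made up, and the formulas are the standard textbook definitions.

```python
import math

def aic(loglik: float, n_params: int) -> float:
    """AIC = -2 * loglikelihood + 2 * (number of parameters)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 * loglikelihood + ln(n) * (number of parameters)."""
    return -2.0 * loglik + math.log(n_obs) * n_params

# Hypothetical fit: loglikelihood -120 with 5 parameters on 200 observations.
print(aic(-120.0, 5))       # 250.0
print(bic(-120.0, 5, 200))  # larger than the AIC because ln(200) > 2
```

Because ln(200) is a bit over 5, each extra parameter costs more than twice as much under BIC as under AIC in this example.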

What is stratified sampling? Why use it?

dividing the population into subgroups (strata), then selecting a sample from each of these groups. we use it to produce training and test sets that are representative w.r.t. the target variable. a pro is that every group is properly represented.
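
A minimal sketch of a stratified train/test split, assuming a categorical target and using only the Python standard library. The function name `stratified_split`, the 70/30 split, and the toy labels are illustrative assumptions, not something taken from the study material.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=42):
    """Return (train_idx, test_idx), sampling train_frac of each label group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    train_idx, test_idx = [], []
    for idx in groups.values():
        rng.shuffle(idx)                      # random order within the stratum
        cut = round(train_frac * len(idx))    # same fraction from every stratum
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy target: 20% "yes", 80% "no".
labels = ["yes"] * 20 + ["no"] * 80
train, test = stratified_split(labels)
print(sum(labels[i] == "yes" for i in train) / len(train))  # 0.2
print(sum(labels[i] == "yes" for i in test) / len(test))    # 0.2
```

Both sets keep the 20% share of "yes" found in the full data, which is the sense in which the split is representative of the target.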

Pros and cons of stepwise selection

Pros: much less computationally intensive than best subset selection (only requires us to fit 1+(p(p+1))/2 models rather than 2^p).
Cons: it does not consider all possible combinations of features and therefore may not choose the best combination. we can only add or drop one feature at a time, not multiple.
ONLY EVER DROP ONE VARIABLE AT A TIME, because interactions may make one variable significant once another is dropped.
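
To see how large the computational saving is, the two model counts can be compared directly. The function names below are hypothetical; the formulas are the 1+(p(p+1))/2 count quoted above and the 2^p count for best subset selection.

```python
def n_models_stepwise(p: int) -> int:
    """One starting model, then p + (p-1) + ... + 1 candidate fits."""
    return 1 + p * (p + 1) // 2

def n_models_best_subset(p: int) -> int:
    """Every predictor is either in or out of the model."""
    return 2 ** p

for p in (5, 10, 20):
    print(p, n_models_stepwise(p), n_models_best_subset(p))
# 5 16 32
# 10 56 1024
# 20 211 1048576
```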

What are the three loss functions? Describe them.

square loss: (y - y hat)^2, most common for numeric target vars.
absolute loss: |y - y hat|.
zero-one loss: 0 if the predicted class is correct and 1 otherwise, most common for categorical target vars.
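
The three loss functions can be written out as a short sketch. The helper `mean_loss` and the toy values are assumptions added for illustration only.

```python
def square_loss(y, y_hat):
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    return abs(y - y_hat)

def zero_one_loss(y, y_hat):
    return 0 if y == y_hat else 1

def mean_loss(loss, ys, y_hats):
    """Average a per-observation loss over paired actual/predicted values."""
    return sum(loss(y, y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)

# Toy numeric target: errors of 1, 0, and -4.
actual, predicted = [3.0, 5.0, 10.0], [2.0, 5.0, 14.0]
print(mean_loss(square_loss, actual, predicted))    # mean squared error, 17/3
print(mean_loss(absolute_loss, actual, predicted))  # mean absolute error, 5/3

# Toy categorical target: one of three classes is misclassified.
print(mean_loss(zero_one_loss, ["a", "b", "a"], ["a", "a", "a"]))  # error rate, 1/3
```

Note how the single error of 4 dominates the square loss far more than the absolute loss.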

What is faceting? when is it useful to use?

a way to split data into distinct groups, with each group's observations displayed in a separate plot. the plots are placed side by side on the same scale. useful in boxplots to detect interactions for both numeric and categorical target variables. also useful in bar charts for categorical target variables and predictors.

what is the bias-variance trade-off?

as a model's flexibility increases, variance increases while bias decreases. the goal is to find the level of flexibility that minimizes the test mse.
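
The trade-off is usually stated through the standard decomposition of the expected test error at a new point. The notation here (x_0 for the new predictor values, Y_0 for its target, epsilon for the noise) is assumed for illustration and does not come from the cards.

```latex
\mathbb{E}\big[(Y_0 - \hat{f}(x_0))^2\big]
  = \operatorname{Var}\big(\hat{f}(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
```

The last term is irreducible, so only the variance and the squared bias respond to the choice of flexibility.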

best visual displays for categorical v categorical target var and predictor?

bar charts: stacked, dodged, or filled. these can tell us about any sizeable differences in class proportions among different factor levels.

are bar charts best for numeric target variables or categorical target variables? why?

best for categorical target variables. these can tell us which levels are most common and if there are any sparse levels.

best visual displays for numeric target variables?

histograms and boxplots. histograms can tell us about right skew; boxplots can tell us about unusual values.

as model flexibility increases, what happens to the bias? variance? interpretability?

bias decreases, variance increases, interpretability decreases

when should categorical variables that have numeric data type be converted to a factor?

convert if:
- var has a small number of distinct values (ex: qtr of year)
- var values are merely numeric labels (no sense of numeric order)
- var has a complex relationship w target variable (factor conversion gives the model more flexibility to capture the relationship)
do not convert if:
- var has a large number of distinct values (ex: hour of day; could cause overfitting and high dimension)
- var values have a sense of numeric order that is useful in predicting the target variable
- var has a simple monotonic rel w target
- future obs will have new variable values (ex: cal yr)

risks of a very flexible model

f likely suffers from overfitting

difference between flexibility and interpretability?

flexibility is the model's ability to follow the data closely. interpretability is the model's ability to be understood.

what are the graphical displays to detect interactions for a numeric target variable?

for a numeric and a cat predictor: use a scatterplot colored by the cat predictor
for two categorical predictors: use a boxplot of the target split by one predictor and faceted by the other predictor
for two numeric predictors: bin the predictors or use a decision tree

graphical displays to detect interactions for categorical target variables?

for a numeric and a cat predictor: use boxplots split by the target var and faceted by the cat predictor
for two cat predictors: use a bar chart for one predictor, filled by the target and faceted by the other predictor

forward v backward selection pros/cons

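forward selection is good when there is a large number of variables: it starts from the intercept-only model and adds one variable at a time. backward selection starts from the full model and drops one variable at a time.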

define signal

general relationships between the target variable and predictors. this is applicable to both training and test data.

as model flexibility increases, variance _____________

increases. this is because the model matches the training observations more closely and is therefore more sensitive to the training data. a small change in the training obs can cause massive changes in f hat.

difference between interaction and correlation

interaction concerns a 3-way relationship bw 1 target and 2 predictors. correlation concerns the relationship bw two numeric predictors.

what are three common forms of penalty terms in regularization?

lasso, ridge regression, elastic net
lasso: sum of abs value of Bj; some coeffs may be exactly zero
ridge: sum of Bj^2; none reduced to exactly zero
elastic net: mix of both; some coeffs may be exactly zero
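
The three penalty terms can be sketched as small Python functions. The mixing rule follows the (1-alpha)*ridge + (alpha)*lasso form used in this set. Software packages may scale the ridge part differently, so treat the exact scaling as an assumption. The function names and coefficient values are made up, and the intercept is assumed to be excluded from the penalty.

```python
def lasso_penalty(betas):
    """Sum of absolute values of the slope coefficients."""
    return sum(abs(b) for b in betas)

def ridge_penalty(betas):
    """Sum of squared slope coefficients."""
    return sum(b * b for b in betas)

def elastic_net_penalty(betas, alpha):
    """Mix of the two: alpha=0 gives ridge, alpha=1 gives lasso."""
    return (1 - alpha) * ridge_penalty(betas) + alpha * lasso_penalty(betas)

betas = [0.5, -2.0, 0.0]  # hypothetical slope estimates
print(lasso_penalty(betas))             # 2.5
print(ridge_penalty(betas))             # 4.25
print(elastic_net_penalty(betas, 0.5))  # 3.375
```

In the fitted model, the chosen penalty is multiplied by lambda and added to the training loss being minimized.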

when lambda is large enough, some coefficients become exactly zero for lasso reg or ridge reg?

lasso reg

what does it mean if the model underfits the data with regards to training error, bias, variance, test error?

the model is too simple to capture the signal in the data. training error is high, bias is high, variance is low, and test error is high. in this region, making the model more flexible makes bias drop faster than variance increases, so test error decreases.

if a model has small bias, are the predictions more accurate or less accurate on average?

more accurate

why is it a concern to have right skewed data? what are possible solutions for right skewness?

note that right-skewness is associated with outliers. it is a concern because the model fit will be disproportionately influenced by extreme values in the right tail (the sum of squares can be heavily influenced by those observations), and the extreme values also distort visualizations. possible solutions: log transformation (positive values only), square root transformation (non-negative values).
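
A small sketch of how a log transformation pulls in a right tail. The data values are made up, and `skewness` is a plain moment-based skewness coefficient written for this illustration.

```python
import math

def skewness(xs):
    """Moment-based skewness: third central moment over the sd cubed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 3, 5, 8, 13, 100]       # made-up values with one extreme right-tail observation
logged = [math.log(x) for x in raw]  # log requires strictly positive values

print(round(skewness(raw), 2), round(skewness(logged), 2))
```

The skewness of the logged values is noticeably smaller than that of the raw values, because the log compresses large values much more than small ones.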

what is regularization?

process that reduces overfitting by shrinking the size of the coefficient estimates, especially those of non-predictive features. works by optimizing the training loglikelihood adjusted by a penalty term that reflects the size of the coefficients. strikes a balance between goodness of fit and model complexity.

pros/cons of correlation

pro: tells us about the linear relationship between numeric variables. con: can only tell us about linear relationships; may not capture other types of relationships.

what are the pros/cons of a larger training set?

pros: training is more robust. cons: evaluation on the test set is less reliable. alternative: split based on a time variable, with the more recent years forming the test set.

disadvantage of parametric methods vs non-parametric methods?

recall that parametric methods specify a functional form for f. parametric methods risk choosing a form for f that is far from the truth. nonparametric methods need an abundance of observations.

define interaction

relationship between a predictor and the target variable depends on the value/level of another predictor
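
In a linear model, this definition corresponds to a product term. The symbols below are assumed for illustration (two numeric predictors X_1 and X_2 with coefficients beta):

```latex
\mathbb{E}[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2
\quad\Longrightarrow\quad
\frac{\partial\, \mathbb{E}[Y]}{\partial X_1} = \beta_1 + \beta_3 X_2
```

The effect of X_1 on the target changes with the value of X_2 whenever beta_3 is nonzero, which is exactly the dependence described above.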

best visual displays for numeric v numeric target and predictor in bivariate exploration?

scatterplots. these can tell us about the relationship between the target and the predictor. we can also vary the plot by a third, categorical variable by changing the color/shape of the obs, which helps reveal an interaction.

why are sparse levels an issue for categorical predictors? any solution(s)?

sparse levels reduce the robustness of models and may cause overfitting. the solution is to combine sparse levels with more populous levels where the target var behaves similarly.

best visual display(s) for numeric target var and categorical predictor?

split boxplots or histograms. split boxplots can show us how the median and spread of the target vary across the categorical levels; summary statistics such as the mean and std dev can also be recalculated by level. histograms can be adapted to show the split by level.

what are cons of sampling?

subject to respondent bias; can suffer from a low response rate

what is bias?

the part of the test error caused by the model not being flexible enough to capture the underlying signal. it reflects the avg closeness bw f and f hat, aka the component of model error due to f hat not being complex enough.

One problem with RSS and R^2

these two metrics are goodness-of-fit measures of a linear model with no explicit regard to its complexity or prediction performance. on the training data, as predictors are added, rss never increases and r^2 never decreases, so they always favor the most complex model.
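
The link between the two metrics is the standard definition of R^2, written here with assumed notation (y_i for the observed targets, y hat_i for the fitted values, y bar for the sample mean):

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\qquad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```

TSS does not depend on the model, so any predictor that lowers the training RSS raises R^2 by construction.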

effects of lambda and bias-variance trade-off

when lambda=0, the regularization penalty vanishes and the coefficient estimates are identical to the ols estimates.
as lambda increases, the effect of regularization becomes more severe. flexibility of the model drops, variance drops, bias increases. prediction accuracy improves as long as the drop in variance outweighs the rise in squared bias.
as lambda approaches infinity, the regularization penalty dominates. all slope coeffs go to zero and the linear model becomes the intercept-only model.
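
The three regimes can be seen in a minimal sketch for ridge regression with a single predictor, where the penalized slope has a closed form. This assumes the objective sum((y - b0 - b1*x)^2) + lambda*b1^2 with an unpenalized intercept. The function name and the data are made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - b0 - b1*x)^2) + lam * b1^2 (intercept unpenalized)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / (sxx + lam)  # lam = 0 recovers the OLS slope sxy / sxx

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
for lam in (0.0, 1.0, 10.0, 1000.0):
    print(lam, ridge_slope(xs, ys, lam))
```

The slope starts at the OLS estimate when lambda is 0 and shrinks steadily toward zero as lambda grows, leaving only the intercept in the limit.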

what is target leakage? how do we detect it?

when predictors in a model "leak" information about the target variable that would not be available when the model is deployed in practice. this is an issue because these variables cannot serve as predictors in practice and would lead to artificially good model performance if mistakenly included. to detect it, we need to understand what each variable means: leaky variables are typically observed at the same time as or after the target variable. study the data dictionary carefully when provided on the exam.

what does it mean if the model overfits the data with regards to training error, bias, variance, test error?

when the model becomes too complex, the drop in bias is outweighed by the increase in variance. test error eventually increases. this displays the u-shape behavior.

