ISDS 471 Exam 1

What do we usually observe if the model is overfitted?

Fitting training data very well but fitting validation data very badly

Interpretation of the beta coefficients

If the beta coefficient for a variable such as age is -150, then each one-unit increase in age decreases the predicted selling price by 150, holding the other predictors constant

Stepwise

Forward selection, but at each step non-significant predictors may also be dropped, as in backward elimination. Drawbacks: R^2 is biased high, regression coefficient estimates are biased high in absolute value, confidence intervals are too narrow, and the standard errors of the coefficient estimates are biased low

How many dummy variables can we create for a categorical variable with m levels?

m

What is homoscedasticity?

The variance of the residuals is constant; check with a scatter plot of residuals over fitted y. All points have roughly the same spread around the line

What is overfitting?

When we use the same data to build the model and assess its performance; also when we use validation data to choose parameters, or assess multiple models on the same validation data

Which of the following is (are) assumption(s) for MLR? - Normality - Linearity - Homoscedasticity - All of the above

all of the above

Which of the following is unsupervised learning? - Data visualization - Data reduction - Cluster analysis - All of the above

all of the above

Why do we want to evaluate supervised learning tasks? - Compare models - Select the tuning parameter - Learn the prediction or classification accuracy - All of the above

all of the above

What is AUC?

Area under the ROC curve, the most common metric. 1 = perfect discrimination between classes; .5 = no better than the naive rule

Which of the following is not supervised learning? - Association rule - Classification - prediction

association rule

AE

average error; gives an idea of systematic over- or under-prediction

For a data set with 200 observations and 200 variables, which of the following variable selection method do you think is inappropriate? -Backward -Best subset -Both

both

How do we check the assumption of normality? - QQ plot - Histogram - Both

both

What type of data is a bar chart used for?

categorical

What can we use a heatmap for?

Check the missing-value pattern with conditional formatting; check collinearity

What is the naive rule?

Classify all records as belonging to the most prevalent class; used as a benchmark

How do we check the missing value pattern in excel?

conditional formatting --> new rule

What type of data is a histogram used for?

continuous

How do we check collinearity?

Correlation in the Data Analysis ToolPak

Which of the following is not a measure of prediction error? -error rate -average error -MAE -RMSE

error rate

What is ER?

Error rate; accuracy = 1 - error rate

FNR

False negative rate = (# true C1 classified as C0) / (# true C1); the % of C1 incorrectly classified

FPR

False positive rate = (# true C0 classified as C1) / (# true C0); the % of C0 incorrectly classified
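
Taken together, the rates above (plus the error rate) can be computed from the a/b/c/d confusion-matrix counts these cards use. A minimal sketch; the function name is made up:

```python
# Classification rates from the a/b/c/d confusion-matrix layout in these
# cards (a = true C1 predicted C1, b = true C1 predicted C0,
# c = true C0 predicted C1, d = true C0 predicted C0).
def rates(a, b, c, d):
    sensitivity = a / (a + b)   # % of C1 correctly classified
    specificity = d / (c + d)   # % of C0 correctly classified
    fpr = c / (c + d)           # % of C0 incorrectly classified as C1
    fnr = b / (a + b)           # % of C1 incorrectly classified as C0
    error_rate = (b + c) / (a + b + c + d)
    return sensitivity, specificity, fpr, fnr, error_rate

# Counts from the cutoff-.5 example later in this set: a=4, b=1, c=1, d=4
print(rates(4, 1, 1, 4))  # (0.8, 0.8, 0.2, 0.2, 0.2)
```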

What are the 4 variable selection methods for MLR

forward, backward, stepwise, best subset (exhaustive search)

What is collinearity?

High correlation among independent variables. Use the pairwise correlation matrix and delete variables when their correlation is above .8 or below -.8
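
The .8 rule can be sketched with a pairwise correlation matrix in numpy; the column names and data below are made up for illustration:

```python
import numpy as np

# Flag pairs of predictors whose pairwise correlation is above .8 or
# below -.8, the rule of thumb from the card above.
def collinear_pairs(X, names, threshold=0.8):
    corr = np.corrcoef(X, rowvar=False)   # predictors are in columns
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return pairs

x1 = np.arange(10.0)
x2 = 2 * x1 + 0.1                                      # nearly a copy of x1
x3 = np.array([3.0, 1, 4, 1, 5, 9, 2, 6, 5, 3])        # unrelated values
X = np.column_stack([x1, x2, x3])
print(collinear_pairs(X, ["age", "age_months", "income"]))
```

Only the ("age", "age_months") pair is flagged, since the third column is weakly correlated with the others.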

MAD/MAE

mean absolute error (deviation), gives an idea of the magnitude of errors

MAPE

mean absolute percentage error

In an MLR model 1 with 12 predictors, you have adjusted R2 = 0.710 and Mallow's Cp = 13.09. In the other MLR model 2 with 13 predictors, you have adjusted R2 = 0.712 and Mallow's Cp = 11.35. Which model is better? -Model 1 -Model 2

Model 1: its Mallow's Cp (13.09) is closest to its number of coefficients (12 predictors + 1 intercept = 13), while model 2's Cp (11.35) is far from its 14

If data are partitioned into three parts, what is validation data not used for?

Model evaluation (that is what the test data is for)

What assumptions do we need in MLR?

Normality, linearity, independence, homoscedasticity

Why do we combine categories when there are too many for a categorical variable? - Avoid categories with too few observations - Reduce the number of dummy variables - Other two are correct

other two are correct

How many dummy variables do we need? How many dummy variables can we create?

We need p - 1; we can create p

What is ROC?

Receiver operating characteristic curve: plots sensitivity on the y-axis and 1 - specificity on the x-axis (positive relationship). Curves closer to the top left = better performers. The comparison curve is the diagonal, which reflects the performance of the naive rule

RMSE

root-mean-squared-error: the square root of the average of (yhat - y)^2
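
The prediction-error measures defined across these cards (AE, MAE/MAD, MAPE, SSE, RMSE) can be computed together. A minimal sketch; the numbers are made up:

```python
import math

# Prediction-error measures for actual values y and predictions yhat.
def prediction_errors(y, yhat):
    e = [yh - yi for yh, yi in zip(yhat, y)]      # errors yhat - y
    n = len(e)
    return {
        "AE":   sum(e) / n,                                      # average error
        "MAE":  sum(abs(ei) for ei in e) / n,                    # mean absolute error
        "MAPE": sum(abs(ei / yi) for ei, yi in zip(e, y)) / n * 100,
        "SSE":  sum(ei ** 2 for ei in e),                        # total squared error
        "RMSE": math.sqrt(sum(ei ** 2 for ei in e) / n),
    }

print(prediction_errors([10, 20, 30, 40], [12, 18, 33, 39]))
```

Note AE keeps the signs (so over- and under-predictions cancel, revealing systematic bias), while MAE and RMSE do not.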

What is linearity?

The relationship between y and each continuous x is linear; check with a scatterplot of y over each continuous x

If the cutoff increases, what happens to sensitivity, specificity, FPR, and FNR?

Sensitivity decreases; specificity increases; FPR decreases; FNR increases

Confusion matrix
           Predicted 1   Predicted 0
Actual 1        a             b
Actual 0        c             d

Summarizes correct and incorrect classifications from a dataset: a and d are the numbers of correct classifications; b = number of class 1 records incorrectly predicted as 0; c = number of class 0 records incorrectly predicted as 1
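
A confusion matrix like the one above can be built from actual classes and predicted probabilities at a chosen cutoff. A sketch using the distribution-chart data that appears later in this set (records with probability above the cutoff are classified as C1):

```python
# Build the a/b/c/d confusion-matrix counts at a given cutoff.
def confusion_matrix(actual, prob, cutoff):
    pred = [1 if p > cutoff else 0 for p in prob]
    a = sum(1 for y, yh in zip(actual, pred) if y == 1 and yh == 1)
    b = sum(1 for y, yh in zip(actual, pred) if y == 1 and yh == 0)
    c = sum(1 for y, yh in zip(actual, pred) if y == 0 and yh == 1)
    d = sum(1 for y, yh in zip(actual, pred) if y == 0 and yh == 0)
    return a, b, c, d

actual = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
prob = [.93, .91, .76, .69, .51, .44, .31, .25, .17, .04]
print(confusion_matrix(actual, prob, 0.5))  # (4, 1, 1, 4)
```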

Which dummy variable is usually considered as reference category?

the last category

If given a regression output, which variable should be deleted?

the one with a high p-value, usually higher than .05

What type of data is a line chart used for?

time series

SSE

total sum of squared errors

Which data set do you conduct the model diagnosis on?

training data

What is independence?

The residuals (error terms) are independent of each other; check with a scatterplot of residuals over fitted y. Violated when points in a small chunk are grouped together above or below the regression line

How many observations can XLminer handle for training data?

10,000 observations

What is the misclassification error rate?
           Predicted 1   Predicted 2   Predicted 3
Actual 1       10             0             2
Actual 2        3             9             3
Actual 3        1             1            11

10/40
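
The answer can be checked in code: the misclassification error rate is the sum of the off-diagonal counts over the total. A sketch using the 3-class matrix above:

```python
# Misclassification error rate from a square confusion matrix:
# (total - correct) / total, where correct = diagonal sum.
def error_rate(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total

m = [[10, 0, 2],
     [3, 9, 3],
     [1, 1, 11]]
print(error_rate(m))  # 10/40 = 0.25
```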

What are partitioned data used for?

2 parts: training (trains the model) and validation (evaluates the model). 3 parts: training (trains the model), validation (compares and selects models/parameters), and test (evaluates the final model)

What is the sensitivity when the cutoff is .6?
Actual   Prob(Y=1)
1        .93
1        .91
1        .76
0        .69
1        .51
1        .44
0        .31
0        .25
0        .17
0        .04

3/5. Sensitivity = # of true C1 correctly classified as C1 / # of true C1. Everything above .6 is classified as C1, which correctly classifies three 1's, but there are five 1's in total
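
The same count can be verified in code (a sketch; records with probability above the cutoff are classified as C1):

```python
# Sensitivity at cutoff .6 for the distribution chart above.
actual = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
prob = [.93, .91, .76, .69, .51, .44, .31, .25, .17, .04]

cutoff = 0.6
true_c1 = sum(actual)
correctly_c1 = sum(1 for y, p in zip(actual, prob) if y == 1 and p > cutoff)
sensitivity = correctly_c1 / true_c1
print(sensitivity)  # 3/5 = 0.6
```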

Example of the naive rule: a sample has 70% C1's and 30% C0's. What is the error rate for the naive rule?

30% (.3). We classify all observations as C1, so all the C0's are misclassified

The following 5 questions are based on the distribution chart below.
Actual   Prob(Y=1)
1        .93
1        .91
1        .76
0        .69
1        .51
1        .44
0        .31
0        .25
0        .17
0        .04
Using a cutoff of .5, what is the value of a?
           Predicted 1   Predicted 0
Actual 1        a             b
Actual 0        c             d

4

What is considered a high correlation?

> .8 or < -.8

Outliers

> Q3 + 1.5 * IQR or < Q1 - 1.5 * IQR
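
A sketch of the fence rule. Note that quartile conventions differ across tools; here Q1 and Q3 come from Python's `statistics.quantiles` with the inclusive method, which should match Excel's QUARTILE.INC:

```python
from statistics import quantiles

# Flag values outside the IQR fences: > Q3 + 1.5*IQR or < Q1 - 1.5*IQR.
def iqr_outliers(data):
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

print(iqr_outliers([2, 3, 4, 5, 6, 7, 50]))  # [50]
```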

Variance explained by the model

Adjusted R^2; select the model with the highest value. If two models have similar values, select the one whose Mallow's Cp is closest to its number of coefficients (# variables + 1)
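
Both parts of the rule can be sketched: the usual adjusted R^2 formula, and picking the model whose Cp is closest to its coefficient count. The adjusted R^2 numbers are made up; the Cp numbers are from the two-model card earlier in this set:

```python
# Adjusted R^2 from plain R^2, with n observations and p predictors.
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Tie-break rule: pick the model whose Mallow's Cp is closest to its
# number of coefficients (p predictors + 1 intercept).
def better_by_cp(models):
    return min(models, key=lambda m: abs(m["cp"] - (m["p"] + 1)))

print(round(adjusted_r2(0.75, 100, 12), 4))  # 0.7155
print(better_by_cp([{"name": "model 1", "p": 12, "cp": 13.09},
                    {"name": "model 2", "p": 13, "cp": 11.35}])["name"])
```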

Best subset (exhaustive)

All possible subsets of predictors are assessed, judged by adjusted R^2. Drawbacks: computationally intensive; not feasible when there are more than about 23 predictors

Examples of supervised learning tasks

Classification (predict a categorical target/outcome variable, often binary); prediction (predict a numerical target/outcome variable)

How do we check the distribution of numeric variables in excel

Either plot box plot or histogram

True or false: FNR will increase if the cutoff increases from .2 to .8.
Actual   Prob(Y=1)
1        .93
1        .91
1        .76
0        .69
1        .51
1        .44
0        .31
0        .25
0        .17
0        .04

True. FNR = # of true C1 classified as C0 / # of true C1. With cutoff .2, FNR = 0/5 = 0; with cutoff .8, FNR = 3/5
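
Checking both cutoffs in code (a sketch over the chart data above):

```python
# FNR = (# true C1 classified as C0) / (# true C1) at a given cutoff;
# records with probability above the cutoff are classified as C1.
actual = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
prob = [.93, .91, .76, .69, .51, .44, .31, .25, .17, .04]

def fnr(cutoff):
    missed = sum(1 for y, p in zip(actual, prob) if y == 1 and p <= cutoff)
    return missed / sum(actual)

print(fnr(0.2), fnr(0.8))  # 0.0 0.6 -> FNR rises as the cutoff rises
```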

Specificity

(# true C0 classified as C0) / (# true C0); the % of C0 correctly classified

Sensitivity

(# true C1 classified as C1) / (# true C1); the % of C1 correctly classified

Which of the following is not a partial search algorithm? -Best subset selection -forward selection -backward selection -stepwise regression

-best subset selection

Which of the following variable selection method is best? -forward selection -backward selection -Stepwise regression -cannot be determined

-cannot be determined

What is the FPR when the cutoff is .7?
Actual   Prob(Y=1)
1        .93
1        .91
1        .76
0        .69
1        .51
1        .44
0        .31
0        .25
0        .17
0        .04

0/5. Everything above .7 is classified as C1. FPR = # of true C0 classified as C1 (none here) / # of true C0 (five, since all five 0's fall below .7)

Using .5 as a cutoff, what is the value of c?

1. Everything above .5 is classified as 1, and c asks how many actual 0's are predicted as 1; there is one actual 0 (.69) above .5

What is normality?

Assumes the residuals are normally distributed; check with a histogram of residuals or a QQ plot. Taking a slice of the data, it should be dense in the middle and sparse at the sides: normally distributed around the regression line

What is supervised learning?

Predict single target or outcome variable, often binary

Measures to evaluate prediction (how well does the model predict new data? Not how well it fits the data it was trained with)

RMSE, MAD/MAE, MAPE, AE, SSE

Measures to evaluate classification

ROC, AUC, ER, Sensitivity, Specificity, FPR, FNR, Confusion matrix

How to determine which variables are important?

Variables with small p-values are important; if a p-value is greater than .05, the variable is not significant

Backward

Start with all predictors and eliminate the least useful one by one; stop when all remaining predictors make statistically significant contributions. Drawbacks: computing the initial model with all predictors can be time-consuming and unstable; not good when n < p

Forward selection

Start with no predictors and add them one by one, beginning with the one giving the largest R^2; stop when an addition is no longer statistically significant. Drawback: may miss pairs or groups of predictors that perform well together but poorly as single predictors
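
A toy sketch of forward selection with numpy. A minimum R^2 gain stands in for the significance test the course uses, and the data are synthetic:

```python
import numpy as np

# R^2 of an OLS fit of y on the given predictor columns (with intercept).
def r_squared(X, y):
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Forward selection: repeatedly add the predictor that raises R^2 the
# most; stop when the gain falls below min_gain.
def forward_select(X, y, min_gain=1e-6):
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = [(r_squared(X[:, selected + [j]], y), j) for j in remaining]
        r2, j = max(gains)
        if r2 - best_r2 < min_gain:
            break                       # addition no longer worthwhile
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return selected

t = np.arange(20.0)
X = np.column_stack([t, np.sin(t), t ** 2])
y = 2 * t + 3 * t ** 2 + 1              # depends only on columns 0 and 2
print(sorted(forward_select(X, y)))  # [0, 2]
```

Because the irrelevant sin(t) column adds essentially no R^2 once columns 0 and 2 are in, the loop stops after two additions.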

