STA 363 Final

multiple linear regression phrasing conclusions

(response) increases on average by (b1) units for every one-unit increase in X1 when other variables are held constant

True or False? Consider two models for the same data. Model 1 has AIC = -32.9, and model 2 has AIC = -28.8. Model 2 is the better fitting model to the data

false

True or False? Cross-validation is a method used to determine what variables are significant in a statistical model

false

True or False? In ANCOVA models, we typically start by fitting a no-interaction model and then simplify the model if warranted

false

True or False? In a multiple regression model given by Y = β0 + β1X1 + β2X2 + β3X3 + ε, β1 can be correctly interpreted as the mean change in Y given a one-unit increase in X1

false

True or False? In a one-way ANOVA, the F-test tests the null hypothesis H0: μ1 = μ2 = ... = μk = 0

false

True or False? Multicollinearity is a situation in a multiple regression where some of the predictors are related to the response variable

false

Repeated measures ANOVA assumptions

independence (between subjects), constant error variance, and normality; these cannot be checked with autoplot

Simple linear regression assumptions

independence (no plot), constant error variance (check scale-location), Normality (check QQ plot), and linearity (check residuals vs fitted)

two-way ANOVA assumptions

independence (no plot), constant error variance (check scale-location), and Normality (check QQ plot)

Blocked ANOVA assumptions

independence (no plot), constant error variance (check scale-location), and normality (check QQ plot)

one-way ANOVA assumptions

independence (no plot), constant error variance (check scale-location), and normality (check QQ plot)

multiple linear regression assumptions

independence (no plot), constant error variance (check scale-location), normality (check QQ plot), and linearity (check residuals vs fitted)

t-test assumptions

independence, normality, and equal variance (unless we are using the unequal-variances test)

two-way ANOVA EDA

interaction plot (line plot) Use x-axis for one factor, and separate lines for the other

categorical predictors... understand coding of dummy variables

interpretations of model coefficients are based on the reference category: all categories are compared to the reference. Since the reference category is represented by all dummy variables equal to 0, the intercept represents the group that is in the reference category for all categorical predictors

Poisson Regression

is appropriate in some applications when the variable of interest is a count.

multiple linear regression usage

multiple linear regression is used for similar purposes as simple linear regression, but it is used when we have multiple predictors of interest

A confidence interval for the mean response in a regression model is...

never wider than the corresponding prediction interval for the response
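
A rough sketch of why this holds in simple linear regression: the standard error behind the prediction interval carries an extra "1 +" term under the square root for the new observation's own error, so it can never be smaller than the standard error for the mean response. The function name and the data are made up for illustration.

```python
import math

def interval_standard_errors(x, y, x0):
    """Return (se_mean, se_pred) at x0 for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(core)      # used for the confidence interval
    se_pred = s * math.sqrt(1 + core)  # used for the prediction interval
    return se_mean, se_pred

# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
se_mean, se_pred = interval_standard_errors(x, y, x0=3.5)
print(se_mean <= se_pred)
```

Both intervals use the same t multiplier, so comparing the standard errors is enough to compare the widths.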

Repeated measures ANOVA EDA

profile plots (line plots grouped by subject, so each line represents one subject)

Model coefficients... other coefficients (when exponentiated)

represent odds ratios. Must remember that effects are multiplicative (e.g., a two-unit increase in a predictor multiplies the odds of the response by (e^beta)^2)
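
A small sketch of the multiplicative effect, assuming a made-up logistic regression coefficient of 0.4; the helper name `odds_multiplier` is hypothetical.

```python
import math

def odds_multiplier(beta, delta=1.0):
    """Factor by which the odds are multiplied when the predictor rises by delta units."""
    return math.exp(beta * delta)

beta = 0.4  # hypothetical coefficient on the log-odds scale
one_unit = odds_multiplier(beta)            # the odds ratio, e^beta
two_units = odds_multiplier(beta, delta=2)  # equals (e^beta)^2
print(one_unit, two_units)
```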

Unusual observations... outliers

standardized residuals beyond +/- 3 (absolute value greater than 3)

Unusual observations... what plot do we look at?

residuals vs. leverage plot. Residuals: the difference between the observed value of the dependent variable and the predicted value. Leverage: a measure of how far away the independent variable values of an observation are from those of the other observations

Two variables are said to interact if...

the effect that one of them has on the response depends on the value of the other

True or False? Logistic regression is a modeling tool for binary response variables. However, you can use either quantitative or qualitative predictors in a logistic regression model

true

True or False? Poisson regression is a type of generalized linear model useful for data where the response Variable Y is a count

true

True or False? R^2 is a poor means by which to compare the quality of fit of two models because R^2 will never decrease when predictors are added to the model

true

when are reduced f-tests used?

when we want to compare our full model to a "reduced" model, which has fewer predictors. ONLY FOR LINEAR MODELS. For GLM, use likelihood ratio test instead.

one-way ANOVA usage

when we want to compare the means of a response at two or more factor levels of one factor of interest

Paired t-test usage

when we want to compare the means of two samples, and there is a natural pairing between measurements (same subjects measured at different times)

Blocked ANOVA null hypothesis

α1 = α2 = ... = αk = 0 OR μ1 = μ2 = ... = μk

one-way ANOVA null hypothesis

α1 = α2 = ... = αk = 0 OR μ1 = μ2 = ... = μk

one-way ANOVA model form

Yij = μ + αi + εij, where μ is the overall mean, αi is the effect of group i, and εij is the error
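
A minimal sketch of estimating the terms of this model, assuming balanced groups and a sum-to-zero constraint on the group effects (software such as R may instead report effects relative to a reference group). The data and the function name are invented for illustration.

```python
def one_way_effects(groups):
    """Estimate the overall mean and group effects for balanced groups."""
    group_means = {name: sum(v) / len(v) for name, v in groups.items()}
    # For balanced data the mean of the group means equals the grand mean.
    mu = sum(group_means.values()) / len(group_means)
    alpha = {name: m - mu for name, m in group_means.items()}
    return mu, alpha

# Hypothetical balanced data: three groups of three observations
groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}
mu, alpha = one_way_effects(groups)
print(mu, alpha)
```

Under the null hypothesis every estimated effect would be close to 0, which is the same as saying all group means are equal.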

Blocked ANOVA model form

Yij = μ + αi + βj + εij, where μ is the overall mean, αi is the effect of group i, βj is the effect of block j, and εij is the error

t-test model form

Yij = μ + αi + εij, for i = 1, 2, where μ is the overall mean, αi is the effect of group i, and εij is the error (special case of ANOVA)

two-way ANOVA model form

Yijk = μ + αi + βj + (αβ)ij + εijk, where μ is the overall mean, αi is the effect for factor 1, βj is the effect for factor 2, (αβ)ij is the interaction, and εijk is the error

R^2

higher is better

one-way ANOVA follow up procedures

if the ANOVA F-test is significant for the factor of interest, use Tukey or Dunnett multiple comparisons

Simple linear regression model form

Y = β0 + β1X1 + ε be sure to define Y and X1

Simple linear regression null hypothesis

t-test: β1 = 0 which means predictor has no effect on response variable

t-test usage

When we want to compare the means of two independent samples

Simple linear regression phrasing conclusion

(response) increases on average by (b1) units for every one-unit increase in X1

When to use any of the different models (a) ANOVA (b) Linear Regression (c) Generalized Linear Models

(a) One-way, Two-way, Blocked, Repeated Measures (b) simple and multiple (c) logistic regression, poisson regression

t-test EDA

Box plot

Simple linear regression follow-up procedures

Box-Cox for potential power transformations if the relationship appears to be non-linear or the error variance is non-constant

ANCOVA... fitted models

When the dummy variable equals 0, the fitted model is just beta_0 + beta_1*X. When the dummy variable equals 1, the fitted model is (beta_0 + beta_2) + (beta_1 + beta_3)*X, where X is the quantitative covariate
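
The two fitted lines can be read off the coefficients as sketched here. The coefficient values are invented; beta_2 is taken to be the dummy-variable coefficient and beta_3 the interaction coefficient, matching the expressions on this card.

```python
def ancova_lines(b0, b1, b2, b3):
    """Return (intercept, slope) for the reference group and for the dummy = 1 group."""
    reference = (b0, b1)         # dummy variable equals 0
    other = (b0 + b2, b1 + b3)   # dummy variable equals 1
    return reference, other

# Hypothetical coefficient estimates
reference, other = ancova_lines(b0=10.0, b1=2.0, b2=3.0, b3=-0.5)
print(reference, other)
```

With no interaction term (beta_3 = 0) the two lines share a slope and differ only in their intercepts.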

two-way ANOVA null hypothesis

For factor 1: α1 = α2 = ... = αa = 0. For factor 2: β1 = β2 = ... = βb = 0 (the two factors may have different numbers of levels)

Cross-validation... what does the number of folds control?

How many groups we create from the data for testing sets
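
A minimal sketch of how the number of folds partitions the data, using a hypothetical helper `k_fold_indices` and no shuffling (real cross-validation routines usually shuffle the rows first).

```python
def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k folds of nearly equal size (no shuffling)."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

folds = k_fold_indices(n=10, k=5)
for test_fold in folds:
    # Each fold takes one turn as the test set; the rest form the training set.
    train = [i for fold in folds if fold is not test_fold for i in fold]
    print("test:", test_fold, "train size:", len(train))
```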

Blocked ANOVA follow up procedures

If the ANOVA F-test is significant for the factor of interest, use Tukey or Dunnett multiple comparisons; NO multiple comparisons for the blocking factor

Box-Cox

If the confidence interval for the optimal λ includes 1, then no transformation is needed. If it does not include 1, then a transformation is appropriate. The peak represents the optimal power transformation (for example, λ = 2 is squaring and λ = 0.5 is the square root)
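
For reference, a sketch of the standard Box-Cox transformation for a given λ, using the usual definition (y^λ - 1)/λ with the log as the λ = 0 case. The function name and the sample values are illustrative only.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of strictly positive values y for power parameter lam."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

y = [1.0, 4.0, 9.0]
print(box_cox(y, 1))    # λ = 1: values only shifted by 1, shape unchanged
print(box_cox(y, 0.5))  # λ = 0.5: based on the square root
print(box_cox(y, 0))    # λ = 0: log transform
```

With λ = 1 the values are only shifted by a constant, which is why an interval containing 1 suggests no transformation is needed.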

Use deviance to describe variability

If the model is a good fit, null deviance should be large compared to residual deviance. Null deviance is basically total variation; residual deviance is basically error variation. We want the error to be relatively small in a "good" model
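
One common way to summarize this comparison is the proportion of deviance explained, 1 - (residual deviance / null deviance). The sketch below uses made-up deviance values rather than output from any fitted model.

```python
def deviance_explained(null_deviance, residual_deviance):
    """Proportion of the null deviance accounted for by the model."""
    return 1 - residual_deviance / null_deviance

# Hypothetical values as they might appear in GLM summary output
print(deviance_explained(null_deviance=120.0, residual_deviance=30.0))
```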

two-way ANOVA follow up procedures

If there is a significant interaction, use Tukey or Dunnett multiple comparisons for each factor at each level of the other factor; otherwise, treat it the same as one-way ANOVA

Paired t-test EDA

Look at profile of how each observation changes over time

Logistic Regression... relationships between p, odds, and log odds

Odds = p/(1-p) = P(Success)/P(Failure); Log Odds = log(odds)
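
These conversions can be sketched in a few lines. The function names are illustrative, and the inverse (log odds back to probability) is the standard logistic function.

```python
import math

def odds(p):
    """Odds of success for probability p (0 < p < 1)."""
    return p / (1 - p)

def log_odds(p):
    """Log odds (logit) of probability p."""
    return math.log(odds(p))

def prob_from_log_odds(z):
    """Inverse of log_odds: convert log odds back to a probability."""
    return 1 / (1 + math.exp(-z))

p = 0.8
print(odds(p), log_odds(p), prob_from_log_odds(log_odds(p)))
```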

Interpret model output from the chosen model

Same as any other linear model output at this point. Review: F-test, t-test, coefficients

Paired t-test model form

Same as t-test (with slightly more complicated error structure)

How to address multicollinearity

Scaling predictors means we standardize them by centering and scaling - every predictor is represented by Z-scores instead
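
A minimal sketch of converting one predictor to Z-scores, assuming the sample standard deviation (n - 1 in the denominator); the data values are invented.

```python
import math

def standardize(values):
    """Center values at their mean and scale by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

z = standardize([2.0, 4.0, 6.0, 8.0])
print(z)
```

After standardizing, the predictor has mean 0 and standard deviation 1.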

t-test phrasing conclusion

There is a significant difference in the mean (response) between (group 1) and (group 2)

Main concepts behind model validation

Training data: fit model (compute model coefficients). Test data: evaluate model (compute RMSE)
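
A small sketch of that workflow with a simple linear regression, using invented training and test data and hypothetical helper names.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def rmse(x, y, b0, b1):
    """Root mean squared error of the fitted line on the given data."""
    return math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(x))

# Hypothetical split: fit on the training rows, evaluate on the held-out rows
x_train, y_train = [1, 2, 3, 4], [3, 5, 7, 9]
x_test, y_test = [5, 6], [11.5, 12.5]

b0, b1 = fit_line(x_train, y_train)
test_rmse = rmse(x_test, y_test, b0, b1)
print(b0, b1, test_rmse)
```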

Multicollinearity

VIFs (a VIF > 10 indicates a multicollinearity issue)
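
As a sketch, the VIF for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. With only two predictors, R_j^2 is their squared correlation, which is what this example computes on made-up data.

```python
def squared_correlation(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab ** 2 / (saa * sbb)

def vif(r_squared):
    """Variance inflation factor given R^2 of a predictor on the others."""
    return 1 / (1 - r_squared)

# Hypothetical predictors
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
vif_value = vif(squared_correlation(x1, x2))
print(vif_value)
```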

one-way ANOVA EDA

box plot

categorical predictors... understand how to interpret linear model coefficients for categorical predictors

comparing groups, e.g., the average difference in the response variable between males and females, or the average difference between ages 21-30 and ages 11-20

Choose models based on cross-validation output

check RMSE values

Best Subsets method

checks every combination of predictors. Step-wise selection only checks some of the models

Main limitation for best subsets?

computation is slow
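
A quick sketch of why: with p candidate predictors there are 2^p possible models, so the count doubles with every predictor added. The helper below only enumerates the subsets; it does not fit any models, and the predictor names are placeholders.

```python
from itertools import combinations

def all_subsets(predictors):
    """List every subset of the predictors, including the empty (intercept-only) set."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets

subsets = all_subsets(["x1", "x2", "x3"])
print(len(subsets))                                # 2^3 models
print(len(all_subsets([f"x{i}" for i in range(10)])))  # 2^10 models
```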

Benefits of model validation?

eliminates the bias that comes from using the same data for both fitting and for evaluation

categorical predictors... understand coding of dummy variables (Predictors with 3+ factor levels)

have to choose a "reference" category, and set up k-1 dummy variables, where k is the number of factor levels
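
A minimal sketch of this coding scheme in plain Python. The function name and factor levels are invented, and statistical software normally builds these columns automatically.

```python
def dummy_code(values, reference):
    """Return the non-reference levels and one 0/1 row per observation (k-1 columns)."""
    levels = sorted(set(values))
    non_ref = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in non_ref] for v in values]
    return non_ref, rows

columns, rows = dummy_code(["low", "mid", "high", "low"], reference="low")
print(columns)  # one dummy column per non-reference level
print(rows)     # the reference category is the row of all zeros
```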

categorical predictors... understand coding of dummy variables (Binary Predictors)

just one dummy variable coded

Poisson Regression... model form

log(λ) = beta_0 + beta_1*X + ...
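
Because the model is linear on the log scale, the expected count is recovered by exponentiating, and a one-unit increase in X multiplies the expected count by e^beta_1. A sketch with invented coefficients:

```python
import math

def expected_count(b0, b1, x):
    """Expected count λ from a Poisson regression with one predictor."""
    return math.exp(b0 + b1 * x)

# Hypothetical coefficients
b0, b1 = 0.5, 0.2
ratio = expected_count(b0, b1, 3) / expected_count(b0, b1, 2)
print(ratio, math.exp(b1))
```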

Logistic Regression... model form

logit(p) = beta_0 + beta_1*X + ...

Unusual Observations... high-leverage

look for natural gaps in the leverage (x-axis) - can also compute a threshold

Reduced f-tests... how to interpret the output

look for the f-stat, degrees of freedom, and the p-value in the output

BIC

lower is better; a criterion for model selection among a finite set of models

AIC

lower is better; an estimator of out-of-sample prediction error
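
A short sketch using the standard definitions AIC = 2k - 2 log(L) and BIC = k log(n) - 2 log(L), where k is the number of estimated parameters, n is the sample size, and L is the maximized likelihood. The model comparison reuses the AIC values from the true/false card earlier in this set; the function names are illustrative.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion from a maximized log-likelihood."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion; penalizes parameters more as n grows."""
    return k * math.log(n) - 2 * log_lik

# AIC values from the true/false card above: the lower value is preferred
aic_values = {"model 1": -32.9, "model 2": -28.8}
best = min(aic_values, key=aic_values.get)
print(best)
```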

ANCOVA... understand when you can and cannot interpret main effects (like two-way ANOVA)

main effects are the coefficients for the non-interaction terms. If there is a significant interaction, cannot interpret main effects.

Paired t-test follow up procedure

none

t-test follow-up procedures

none

Repeated measures ANOVA model form

same as one-way or two-way ANOVA, depending on context (with a slightly more complicated error structure)

Simple linear regression EDA

scatterplot

multiple linear regression EDA

scatterplot matrix

Reduced f-tests... variables are being tested based on code

should be straightforward: whatever variables are left out of the reduced model. We are NOT testing the variables present in both models

Blocked ANOVA EDA

should include block factor in the plot. For one-way, use linetypes or color, or boxplot with facets over blocks. Interaction plot for two-way blocked design, can facet over the blocks

Interpret model output from stepwise selection output

shows each iteration with AIC values as well as which variables were removed or added at each step

two-way ANOVA phrasing conclusions

similar to one-way ANOVA, but may be more complicated if we have significant interactions

Step-wise selection (forward)

start with the empty model; MUST SPECIFY SCOPE

Step-wise selection (backward)

start with full model

Model coefficients... Exponentiated intercept

the odds of (success) when all predictors are equal to 0. This may include dummy variables, so you must know which factor level is the reference category

full model f-test is a special case where...

the reduced model is an intercept-only model; notation in R: response ~ 1

one-way ANOVA phrasing conclusions

there is a significant difference in the mean (response) between (factor level) and (other factor level)

Blocked ANOVA phrasing conclusion

there is a significant difference in the mean (response) between (factor level) and (other factor level), adjusting for (block factor)

Paired t-test phrasing conclusion

there is a significant difference in the mean (response) between (group 1) and (group 2)

Violations of the linearity assumption in a regression model may be addressed by...

transforming one or more of the predictor variables

multiple linear regression follow up procedures

trying different sets of predictors or transformations to make the model a better fit (checking adjusted R squared)

Paired t-test Assumptions

independence between pairs and normality of the paired differences

unusual observations... what can be done about them

verify that they are legitimate data entries. If so, they should not be removed. Can use a dummy variable to represent a single observation. Can fit the model both ways and see if the results are different

ANCOVA... test for significant interactions

we can use either the t-test in the model output or do a reduced f-test to test this. Leave interaction out of model if p-value is not less than 0.05 (prefer simpler model)

two-way ANOVA usage

when the model has two factors of interest

Blocked ANOVA usage

when there is a confounding factor that either has a known effect or we are not interested in its effect

Simple linear regression usage

when we are trying to estimate the relationship between one predictor and a response

Repeated measures ANOVA usage

when we have a within-subjects factor, or multiple measurements for each experimental unit

t-test null hypothesis

μ1 = μ2

Paired t-test Null hypothesis

μ1 = μ2 OR the true mean difference D = 0

