Introduction to multiple linear regression

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

What are the two ways in which multi-collinearility may be defined?

- A strong correlation among predictors - A weak correlation of one predictor with multiple predictors

Assumptions of multiple linear regression RE the dependent and independent variables

- Dependent = continuous and qualitative - Independent = qualitative or categorical

In multiple linear regression, residuals must me... (?) (3)

- Independent - Random - Normally distributed

If there was no colinearility between predictors, what would this mean, and what would the results of a multiple regression be equal to?

- That each predictor has unique variability only - Running multiple simple linear regressions for each of the predictors would output the same information as a multiple linear regression

Rather than a linear linear regression produces what?

A 3D plane

The equation for multiple regression is what?

An extension of the equation of a straight line (i.e. of the equation of the simple linear regression) by adding another term

Why is multiple linear regression used in the first place if we get colinearility?

As often factors are related

In multiple linear regression, why shouldn't there be a high degree of colinearility?

As this makes you unable to assess the contribution of each predictor

The size of the sample considered in a multiple linear regression must...(?)

Be at least x10 the number of predictors that have been tested in total (even if the final model considers less of these predictors)

In general, multicolinerality makes it difficult to...(?)

Determine the effects of a predictor

Why does R-squared become limited if more predictors keep getting added to the model?

Even if the predictor doesn't count for much unique variability, it adds another degree of freedom to the model, therefore although the F-ratio stays relatively the same, the significance of this ratio will decrease

Outline what multiple linear regression does

Fits data to a model that define y as a function of 2 or more variables

Even though the VIF for a factor may be less than 10, why may there still be colinearility?

If different predictors of the model have an average VIF of around 10 - this would still indicate overlap by different factors accounting for variability

What value/s of TF are problematic i.e. indicate a potential degree of collinearility?

Less than 0.1

What value/s of VIF are problematic i.e. indicate a potential degree of collinearility?

Less than 1 OR greater than 10

By taking into account more than one predictor variable in a multiple regression compared to a simple linear regression, what does this allow you to do?

More reliably judge what the effect will be

Performance of a multiple linear regression is a good alternative to what?

Performance of separate correlations

What becomes limited each time a predictor is added to a model?

R-squared

Comparing two models, in which the first only considers height as a variable, and the second that considers not only height but also weight and sex, what does an improvement of the first model by the second suggest?

That height is an unstable coefficient

What do we mean if we say that a VIF or TF value is problematic?

That it indicates a potential degree of colinearility

In multiple linear regression, the relationship should be linear with non-zero variance. What is meant by the latter point?

That there is some variability within the model, otherwise there would be no need for the model

In a venn diagram that represents the variability in the dataset, what do the smaller circles that lie within the largest circle represent?

The amount of variability accounted for by a particular predictor

Homoscedasicity is an assumption of multiple linear regression. Define it

The assumption that variance around the regression line is the same for all values of the predictor variable

What does it mean when we say that the standard errors of the B coefficient increase as colinearility increases?

The b coefficient becomes unstable, thus causes us to become less confident about those effects

Define what is meant by 'multicolllinearity'

The phenomenon in which one predictor variable in a multiple linear regression model can be linearly predicted from others with a substantial degree of accuracy

What increases as colinearility increases?

The standard errors of the B coefficient increases, i.e. the coefficient becomes unstable, causing us to become less confident about those effects

In a venn diagram that represents the variability in the dataset, what does the largest circle represent (it may enclose smaller circles)?

The total amount of variability in the dataset

In addition to an increase in the standard errors of the b and limiting the R-squared value with addition of predictors to a model, what else becomes a problem?

There is a lack of clarity in the contribution of each individual predictor

If there is likely to be a degree of colinearility within a model, what can you do?

There is very little that statistics can do to help this, however, if possible, it can be sensible to combine factors that are similar to a single factor e.g. height and weight combined into BMI

What is TV?

Tolerance factor - the reciprocal of the VIF (variance inflation factor)

In a venn diagram that represents the variability in the dataset, what does overlap of the smaller circles present?

Variability explained by these factors combined

What does VIF stand for?

Variance inflation factor

Comparing two models, in which the first only considers height as a variable, and the second that considers not only height but also weight and sex. There is an improvement of the first model by the second, suggesting that height is an unstable coefficient. We could remove height as a predictor (ensuring that we note this in the final reporting) and then re-run the model. However, what argument might be posed against this?

We cannot be sure that height does have an effect at all

Comparing two models, in which the first only considers height as a variable, and the second that considers not only height but also weight and sex. There is an improvement of the first model by the second, suggesting that height is an unstable coefficient. Should we exclude height from the data set?

We could do, and if we chose to do this, we must report that height is not a significant predictor and re-run the model


Ensembles d'études connexes

compromise of 1850, daniel webster & john c. calhoun

View Set

ACCT 201 B - Connect chp. 5 reading

View Set

intermolecular forces and phase change

View Set

Onychomycosis (or tinea unguium)

View Set

Mao's rise to power' and the 'Chinese civil war 1911 - 1949

View Set