Correlation & Regression

what is a Point-biserial correlation?

- between 1 continuous variable and 1 binary (dichotomous) variable [a binary variable is necessarily nominal] - uses the same calculation as the Pearson correlation coefficient (in SPSS it is run as a Pearson correlation) - the continuous variable must therefore satisfy the criteria for parametric analysis
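
To illustrate that the point-biserial correlation is simply the Pearson calculation applied to a binary variable coded 0/1, here is a minimal Python sketch. The function names and the data are hypothetical, chosen for illustration; the textbook point-biserial formula used is (M1 - M0) / s_n * sqrt(p * q), where s_n is the population SD of the continuous variable and p, q are the group proportions.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(values, groups):
    """Point-biserial r: (M1 - M0) / s_n * sqrt(p * q), with groups coded 0/1.

    s_n is the population SD (divide by n) of the continuous variable.
    """
    n = len(values)
    g1 = [v for v, g in zip(values, groups) if g == 1]
    g0 = [v for v, g in zip(values, groups) if g == 0]
    mean = sum(values) / n
    s_n = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    p, q = len(g1) / n, len(g0) / n
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) / s_n * math.sqrt(p * q)


# Hypothetical data: a continuous score and a binary group membership.
scores = [5.1, 6.0, 4.8, 7.2, 6.9, 7.5]
group = [0, 0, 0, 1, 1, 1]
print(point_biserial(scores, group), pearson_r(scores, group))  # the two agree
```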

how is 'best fit' line estimated?

- it is usually the least squares line, which minimizes the sum of the squared residuals - i.e. for each datum: - determine the vertical distance between the data point and the line (the residual) - then square each distance and sum the squares (another 'sum of squares') - the slope and intercept are chosen to minimize this sum
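
The least squares procedure above has a standard closed-form solution: slope = Sxy / Sxx (sum of cross-products of deviations over the sum of squared x deviations) and intercept = mean(y) - slope * mean(x). A minimal Python sketch with hypothetical data, showing that nudging the slope away from the least squares value increases the sum of squared residuals:

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Vertical distance of each point from the line, squared and summed."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
# Moving the slope away from the least squares value makes the sum larger.
worse = sum_squared_residuals(xs, ys, a, b + 0.1)
print(a, b, best, worse)
```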

what is a beta value when dealing with regression?

Beta value (β), the standardized regression coefficient: measures the degree to which the predictor variable affects the dependent variable - beta is expressed in SD units - if β = 2.5: a 1 SD change in the predictor corresponds to a 2.5 SD change in the dependent variable
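
A minimal sketch of how a standardized beta can be obtained, assuming the usual definition: the raw (unstandardized) slope rescaled by SD(x) / SD(y). The data and function names are hypothetical. Because beta is in SD units, rescaling the predictor (e.g. changing its units) leaves it unchanged, and in a simple regression it equals Pearson's r.

```python
import math


def sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def raw_slope(xs, ys):
    """Unstandardized least squares slope b (units of y per unit of x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def standardized_beta(xs, ys):
    """beta = b * SD(x) / SD(y): change in y, in SDs, per 1 SD change in x."""
    return raw_slope(xs, ys) * sd(xs) / sd(ys)


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
beta = standardized_beta(xs, ys)
print(beta)
```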

How might one easily gauge the linearity of the regression?

By plotting the residuals: - residuals are the differences between each datum and the regression line - if the residual plot is relatively 'flat' (the points scatter randomly around zero with no systematic curve or trend), the relationship is linear - studies formulating estimates from a regression should include a plot of residuals and a test of linearity
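
A minimal Python sketch of the residual check, using hypothetical curved data (y = x squared) and a crude text "plot" in place of a real chart. A straight line fitted to these points still tracks them fairly closely, but the residuals are positive at both ends and negative in the middle, a U shape rather than a flat band, which signals non-linearity.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(xs, ys):
    """Observed minus fitted value for each datum."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical curved data: y = x squared.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
# Text "plot": one star per unit of residual size, sign shown explicitly.
for x, r in zip(xs, res):
    print(f"x={x}  residual={r:+.1f}  " + "*" * int(abs(r)))
```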

what is a correlation coefficient? what assumptions must be made?

Correlation coefficient: quantifies the extent to which one variable changes with respect to another variable - Pearson product-moment correlation: for continuous data - other types of data (ordinal, nominal) require a different type of correlation coefficient (e.g. Spearman, biserial, etc.) The derivation of the Pearson correlation coefficient and that of basic (simple linear) regression are equivalent, but correlation and regression serve different purposes. Assumptions: the data must meet the same parametric criteria listed for t-tests and ANOVA (normality, homoscedasticity, etc.) - correlation is especially sensitive to outliers

why should correlations and regressions contain CIs? what is r squared?

Correlations and regressions should include confidence interval (CI) lines. - CI lines appear curved because they form the envelope of all plausible straight regression lines; uncertainty is smallest near the mean of the data and grows toward the ends - if the CI lines are far apart, the study may not have included sufficient subjects or observations. Coefficient of determination (r squared): the proportion of variance shared by both variables.
[Figure: typical method for reporting a correlation]
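
The cards do not say how the CI is computed. One standard approach for a correlation, swapped in here for illustration, is the Fisher z-transformation: z = atanh(r) is approximately normal with standard error 1 / sqrt(n - 3). A minimal sketch (the r value and sample sizes are hypothetical) showing that the same r gives a much wider interval with few observations:

```python
import math


def pearson_r_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r via the Fisher z-transformation.

    z = atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Same hypothetical r, two sample sizes: the interval narrows as n grows.
small = pearson_r_ci(0.5, 30)
large = pearson_r_ci(0.5, 300)
print(small, large)
```

Note the interval is not symmetric around r, because the back-transformation with tanh compresses values near +/-1.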

what is simple linear regression?

In a simple linear regression, there is 1 dependent and 1 independent variable - the regression calculation permits estimation of the dependent variable when the independent variable is known, e.g. estimating drug dosage based on weight. Regression analysis generates a formula that describes the relationship between the variables (y = a + bx, where a is the intercept and b the slope) - the variable that is predicted (dependent) is also termed the 'outcome' variable or regressand; the independent variable is termed the 'predictor' or 'regressor'. This formula can be used to predict values of the dependent variable (y) if x is known.

what is interpolation? what is extrapolation?

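Interpolation: prediction from a regression within the observed range of the independent variable. Extrapolation: prediction from a regression outside the observed range - generally not appropriate, because the relationship has not been verified beyond the observed data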
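
A minimal Python sketch that makes a prediction from a fitted line and labels it as interpolation or extrapolation. The function name and the fitted line (dose in mg = 10 + 2 * weight in kg, estimated from patients weighing 50 to 100 kg) are hypothetical.

```python
def predict(x, intercept, slope, x_min, x_max):
    """Predict y from a fitted line and label the prediction.

    Inside the observed x range -> 'interpolation'; outside -> 'extrapolation'.
    """
    y_hat = intercept + slope * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind


# Hypothetical fitted line: dose (mg) = 10 + 2 * weight (kg),
# estimated from patients weighing 50-100 kg.
print(predict(70, 10.0, 2.0, 50, 100))   # within the observed range
print(predict(140, 10.0, 2.0, 50, 100))  # outside it: should not be trusted
```

The formula happily returns a number for 140 kg; the label is the only warning that the linear relationship was never checked there.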

what is a Kendall rank correlation (tau)? When is this used?

Kendall rank correlation (tau, τ) - can be used as an alternative to the Spearman rank correlation - generally very similar to the Spearman coefficient - possibly preferable to Spearman when sample sizes are very small and/or there are many tied ranks
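
Kendall's tau is based on counting concordant pairs (ordered the same way on both variables) and discordant pairs (ordered oppositely). A minimal Python sketch of the tau-b variant, which adjusts the denominator for tied pairs; the choice of tau-b and the example data are assumptions for illustration:

```python
import math


def kendall_tau_b(xs, ys):
    """Kendall's tau-b: (concordant - discordant) adjusted for tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from the counts
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1  # pair ordered the same way on both variables
            else:
                discordant += 1  # pair ordered oppositely
    denom = math.sqrt((concordant + discordant + tied_x_only)
                      * (concordant + discordant + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


# Hypothetical ranks: one swapped pair out of six.
print(kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4]))
```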

how is correlation calculated?

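Like variance, SD and ANOVA, the correlation is built from sums of squares (squared deviations from the mean): Pearson's r is the sum of the cross-products of the x and y deviations divided by the square root of the product of their separate sums of squares. The deviations of each datum from the line of best fit (residuals) enter through r squared, which equals 1 - (residual sum of squares / total sum of squares).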
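
A minimal Python sketch of both routes, on hypothetical data: r computed from sums of squares and cross-products, and r squared recovered from the residuals of the least squares line. The two values of r squared agree.

```python
import math


def pearson_r(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), built from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_from_residuals(xs, ys):
    """1 - SS_residual / SS_total, using the least squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(r ** 2, r_squared_from_residuals(xs, ys))  # both about 0.6
```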

what do linear regressions in clinical analysis assume?

Linear regressions in clinical papers often assume that the relationship between the variables is linear; if this is not true, the entire analysis may be meaningless. The results of regression analyses should not be 'extrapolated' beyond the ranges specified in the study, or to other groups or populations.

what are types of non-parametric correlations?

Non-parametric correlations - The most common non-parametric correlations that you are likely to encounter are: - Kendall rank correlation - Spearman correlation - Point-biserial correlation (note that the point-biserial is calculated like a Pearson r and assumes the continuous variable meets parametric criteria). When a relationship is monotonic (consistent in general direction) but not strictly linear, a non-parametric correlation may be more appropriate, and a rank correlation will yield a higher coefficient than Pearson's r.
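
A minimal Python sketch of the point about monotonic data: for hypothetical values following y = x cubed (always increasing, but curved), Pearson's r is below 1 while the rank-based Spearman coefficient is exactly 1. The Spearman shortcut formula used here, 1 - 6 * sum(d^2) / (n * (n^2 - 1)), is valid only when there are no ties.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank 1..n in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_no_ties(xs, ys):
    """Spearman rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical monotonic but curved relationship: y = x cubed.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
print(pearson_r(xs, ys), spearman_no_ties(xs, ys))
```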

how can a Pearson's correlation coefficient be interpreted?

Pearson product-moment correlation coefficient (r): - varies from -1 to +1 (usually reported to 2 decimal places) - +1 implies a perfect direct relationship between the variables: the 2 variables increase together and, when plotted, all points lie on an ascending straight line (the slope of that line need not be 1) - -1 implies a perfect inverse relationship: all points lie on a descending straight line - a coefficient of 0 indicates there is no linear relationship between the variables - it may be helpful to describe correlations as 'direct' or 'inverse' rather than +/-, because the absolute value is then reported with 'direct' or 'inverse' - the strength of a correlation is judged on various scales (e.g. 'Cohen's standard') that vary by discipline - in general, meaningful correlations require realistic amounts of variability in the data - strong correlations in clinical research do not usually exceed .75
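
A minimal Python sketch showing that r measures how closely points follow a straight line, not the steepness of that line: hypothetical lines with slopes 3 and 0.5 both give r = +1, and a descending line gives r = -1.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4]
steep = [3 * x + 2 for x in xs]      # slope 3, ascending
shallow = [0.5 * x for x in xs]      # slope 0.5, ascending
inverse = [-2 * x + 10 for x in xs]  # slope -2, descending
for label, ys in [("steep", steep), ("shallow", shallow), ("inverse", inverse)]:
    print(label, round(pearson_r(xs, ys), 6))
```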

how is a regression different than a correlation?

Regressions and correlations use the same statistical approach for somewhat different purposes. - A correlation is a simple measure of association between variables. - A regression uses the relationship between variables to better understand how one variable influences another. In fact, if one understands this relationship sufficiently, regression analysis permits prediction of the change in one variable with respect to another - i.e. a change in the independent variable may be used to predict changes in a dependent variable (e.g. drug dosage and response (effect), or how cognition changes with aging, where age is used to predict changes in memory)
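
A minimal Python sketch of one concrete difference, using hypothetical dose/response data: correlation is symmetric (swapping the variables gives the same r), whereas regression is directional (the slope for predicting response from dose differs from the slope for predicting dose from response).

```python
import math


def slope(xs, ys):
    """Least squares slope for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Symmetric measure of association: swapping xs and ys gives the same r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: dose (x) and response (y).
dose = [1, 2, 3, 4, 5]
response = [2, 4, 5, 4, 5]
print(pearson_r(dose, response), pearson_r(response, dose))  # identical
print(slope(dose, response), slope(response, dose))          # different
```

The two slopes are linked through r: their product equals r squared.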

should non-linearly related data be labeled as correlated?

Sometimes the relationship between variables is not linear; rather, it varies in some other systematic way (e.g. age and memory: memory is poorer in neonates and the elderly than in the 'average' person). Such relationships may be curvilinear or require more complex math for adequate characterization. - technically, non-linear relationships should NOT be labeled as correlations. Because an r calculation would not be productive for non-linear data, it is often useful to graph the data (via spreadsheets or statistical software) before attempting a correlation or regression analysis.
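
A minimal Python sketch of why r is unproductive for non-linear data: for hypothetical U-shaped data (y = x squared over a symmetric range), y is completely determined by x, yet Pearson's r is 0 because the descending and ascending halves cancel out. Only a graph would reveal the relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical U-shaped relationship: y is fully determined by x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))
```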

what is a Spearman rank correlation?

Spearman rank correlation (rS, or the Greek letter ρ, pronounced 'rho') - for either ordinal variables or continuous data that are not normally distributed - variables may be ordinal, interval or ratio; the data must be ranked - quantifies a 'monotonic' relationship: a directional but not necessarily linear relationship between variables (direct or inverse, but not strictly linear) - the coefficient is a measure of both strength and direction
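
A minimal Python sketch of the ranking step, assuming the common convention that tied values share the average of the positions they occupy; Spearman's ρ is then computed as the Pearson correlation of the ranks. The function names and the dose/pain data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the (average) ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical ordinal data with ties: dose level vs. pain score (1-5).
dose = [1, 1, 2, 3, 3, 4]
pain = [5, 4, 4, 3, 2, 1]
print(average_ranks(dose), spearman_rho(dose, pain))
```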

Coefficient of Determination (R squared), what is it? coefficient of nondetermination, what is it?

The correlation coefficient (r) and the standardized regression coefficient (β) are equivalent in a simple linear regression; for various reasons, r is not a satisfactory predictor in a multiple regression. Coefficient of Determination (R squared): the proportion of variance in the dependent variable that is predictable from the independent variable - in linear regression it is the square of the correlation, ranging from 0 to 1 - if R squared = 0, the dependent variable cannot be predicted from the independent variable - if R squared = 0.1, 10% of the variance in Y is predictable from X, etc. - if r = 0.7, r squared = 0.49 (about half of the variability in Y is attributable to X), implying that about half of the variability (0.51) is attributable to something else - this 'something else' is termed the coefficient of nondetermination (1 - r squared)

Which statements are accurate?
a) Systolic and diastolic blood pressures were assumed to be linearly related to each of apnoea-hypopnoea index, age, and neck circumference
b) It can be concluded that the effect of the apnoea-hypopnoea index was independently associated with systolic and diastolic blood pressure in patients with apnoea not taking antihypertensive drugs
c) The results of the analyses can be extrapolated outside the observed range of values for apnoea-hypopnoea index, age, and neck circumference
d) The results of the analyses can be generalised to all patients referred to the sleep clinic with suspected sleep apnoea

a) true b) true c) false d) false

describe the following types of regression? -multiple linear regression -ordinal regression -logistic regression -multinomial regression

multiple linear regression: 1 dependent variable + 2 or more independent variables - e.g. How do weight and age together affect response to a drug?
ordinal regression: 1 ordinal dependent variable + 1 or more independent variables - e.g. How do therapeutic dosage and gender modify the patient's pain?
logistic regression: 1 dichotomous dependent variable + 1 or more independent variables - e.g. How do gender and blood pressure affect mortality?
multinomial regression: 1 nominal dependent variable + 1 or more independent variables - e.g. Does the presence of the gene affect cancer risk?
Multiple regressions involve 1 dependent variable; technically, multivariate regressions may involve multiple dependent variables.
Be wary of multiple regressions with many variables: - prone to 'over-fitting' (the model is too complex for the data), so findings are unlikely to be generalizable - error is reduced by increasing sample size - false positives (from multiple comparisons) - simple models are usually preferable - each estimate requires >12 observations; some designs may require many more
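
A minimal Python sketch of a multiple linear regression with two predictors (in the spirit of "weight and age together"), solving the least squares normal equations on mean-centered variables by Cramer's rule. The function name and data are hypothetical; the outcome is generated without noise from y = 1 + 2*x1 + 3*x2, so the fit should recover exactly those coefficients. Real data would contain noise, and with many predictors a general solver (and the over-fitting cautions above) would apply.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Solve s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y.
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical noiseless data generated from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(fit_two_predictors(x1, x2, y))
```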
