Quantitative - 3 - Correlation, Linear Regression

Intercept, α or alpha

You have the regression line, the line cutting through the scatter plot. The point where it crosses the y-axis is called the _____________

Whole Unitary changes. e.g. 1,2,3

β / beta / slope. The number of units y changes per step in x, where the steps are not x = 1.5 or 1.7 but ___________

h0 h1

β / beta / slope = 0 _____________. β / beta / slope ≠ 0 _____________.

The regression line

The ____________ _____ should literally 'fit' the data better. It will minimize the residuals.

null hypothesis.

When the slope / β / beta = 0 there is no association between the variables, and that will be your ______________

Mean squares

A measure of average variability.

r squared value

Linear Regression: Example: Years of education vs Earnings. Output window: Correlation results. P-value. and the _____________

Earnings. dependent/ response / output / Y

Linear Regression: Example: Years of education vs Earnings. SPSS Select: Analyze Select: Regression Select: Linear The dependent variable here is _____________ because you are trying to predict earnings with education! The variable being predicted is the ________________ variable!

Percentage of Variance

Linear Regression: What we are looking here is the variance! If you just square the correlation co-efficient, that will give you automatically the _______________ accounted for by knowing the value of the other variable.,

expected mean

Linear regression captures the ______________ of y / response / output / dependent variable at a particular value of x and imposes any relationships.

r= -1

Pearson's correlation. The symbol for a perfect negative linear relationship is ____

r=1

Pearson's correlation. The symbol for a perfect positive linear relationship is ____

r=0

Pearson's correlation. We will be dealing almost exclusively with samples. If there is no relationship between X and Y r = ____

Weak Relationship. Moderate Relationship. Strong Relationship.

Pearson's correlation. Perfect linear relationships do not exist. There are some rules of thumb (for the absolute value of r): r = 0 - 0.3 ______________ r = 0.3 - 0.7 _____________ r = 0.7 - 0.9 _____________

Correlation. Yet independent of the scale, temperature vs energy usage

Refers to any of a broad class of statistical relationships involving dependence.

Dependence

Refers to any statistical relationship between two random variables or two sets of data.

cancel each other out

Regression line. It is squared because they would otherwise _________________ when added up.

Independent variable

Scatter plot. Customarily plotted along the horizontal axis / x axis

dependent variable

Scatter plot. Customarily plotted along the vertical axis / y axis

Predictive

Series of correlations. When you have a manageable list then you get __________ power.

β / beta / slope

So you start with x = 0, y = value. Now what is going to happen when x = 1 or x = 2? By how much is y changing for every change in x? The intercept / α / alpha, the point where the line touches y (response / output / dependent variable), is observed, so the only computation we have to do is the ____________

βi

Standardized regression coefficient. Indicates the strength of relationship between a given predictor, i, of many and an outcome in a standardized form. It is the change in the outcome associated with a one standard deviation change in the predictor.
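
As a hedged illustration for the single-predictor case only (the card describes one predictor among many), the sketch below standardizes a slope by multiplying it by sd(x) / sd(y). The data and all function names are hypothetical. With one predictor, the standardized slope works out to Pearson's r.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def slope(xs, ys):
    # Unstandardized least-squares slope b.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b = slope(xs, ys)
# Change in y (in SD units) per one-SD change in x.
beta_std = b * sample_sd(xs) / sample_sd(ys)
r = pearson_r(xs, ys)
```

Here `beta_std` and `r` agree up to rounding, which is one way to see why the standardized coefficient is read as a strength of relationship.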

explanatory variable / independent

The Simple Linear Regression Model. Simple because it has only one _______________ as opposed to multiple regression which has more than one.

independent/ explanatory / X / predictor

Heteroscedasticity (a violation of the homoscedasticity assumption). The variance of the residuals in your analysis is not constant across your _____________ variable

cancel out

The average is the point where all of the differences below the mean will ___________ all of the differences above the mean.

homoscedasticity vs heteroscedasticity

Assumption of linear regression

foundation

Intercept / α / alpha. It forms the _______________ of the regression equation, meaning that we start with the intercept and then we build from there.

intercept. It is where it starts!

Linear regression formula: y = a + bx. y is the response / output / dependent variable. x is the predictor / explanatory / independent variable. a is the _______________

predictor / explanatory / independent

Linear regression. For every x score: ___________________: a range of y scores are associated

significant p-value

Linear regression. You start off with two continuous variables and you do a correlation study, you get a significant correlation. With a _________ _______ you are then able to interpret the strength, the magnitude and the direction of the correlation co-efficient.

Correlation does not imply causation!

Mathematical correlation does not imply a causal relationship in real life. This dictum should not be taken to mean that correlations cannot indicate the potential existence of causal relations.

Regression line.

Mean and regression line both summarize the central tendency. What lies above lies below! The mean is the average of a single variable. The _______________ is the average of two variables. (A two-dimensional average. This is what it's all about!) When you have two variables, the plain average is not going to work because the average works for a single variable.

Single variable

Measures of central tendency, variability, and spread summarize a _________________ by providing important information about its distribution.

Correlation Co-efficient

Measures the strength of a relationship between two variables having a metric / interval / ratio scale, and it ranges from negative 1 to positive 1

least error possible

Over two variables, the regression line is where you will make the least error. What you are trying to do here is not to maximize the hits or the successes but to achieve the ___________________

Statistics

Pearson and linear regression. Interpretations and procedures are going to be different because we are looking at different _______________ using the same variables.

outliers

Regression line. The whole point is that you are making the least error possible, not only on one score but also on the _____________

least

Regression line. You will be off, but you will be off in the ___________ possible way

default

SPSS. Analyze. Select: Correlate. Select: Bivariate option. Move the variables that you want to correlate into the variable box. The Pearson correlation coefficient is the _________

P-value

SPSS output window: under the Pearson correlation number, one finds the ______________, labeled Sig. (2-tailed)

input / explanatory = x axis. outcome / predictive = y axis

SPSS. The first thing to do is a scatter plot. Graph menu. Select: Chart builder. Select: Scatter/Dot plot. Drag the variables onto the axes. Explanatory variable _______________ Predictive variable ___________

Linear correlation

The correlation coefficient A value of 0 implies that there is no _______________________ between the variables.

Residual or error

The difference between the observed value and the value predicted by the best-fit line is called the __________________

SSE, sum of squares error

The goal of linear regression is to create a linear model that minimizes the _____________
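
A minimal sketch of that goal, assuming the usual closed-form least-squares solution for one predictor. The data and the names `fit_line` and `sse` are hypothetical. It fits the line and then checks that a slightly shifted line has a larger sum of squared errors.

```python
def fit_line(xs, ys):
    # Closed-form least-squares estimates for y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(xs, ys, intercept, slope):
    # Sum of squared residuals (observed minus predicted).
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
best_sse = sse(xs, ys, intercept, slope)

# Any other line, for example one with a lower intercept, does worse.
other_sse = sse(xs, ys, intercept - 0.2, slope)
```

The comparison line is arbitrary. The point is only that moving away from the fitted intercept or slope cannot reduce the SSE.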

zero

The intercept / α / alpha is the value of y when x equals __________

slope

The intercept is the value of y when x equals zero. From there on you move to β / beta, or the ______________ of the line.

regression analysis, including the analysis of variance.

The possible existence of heteroscedasticity is a major concern in the application of ______________________ because the presence of heteroscedasticity can invalidate statistical tests of significance.

Coefficient of Determination

The proportion of variance in one variable explained by a second variable. It is Pearson's correlation coefficient squared.
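
To make the "proportion of variance" reading concrete, here is a small sketch under the assumption of a simple (one-predictor) least-squares fit. It computes r squared two ways: by squaring Pearson's r, and as 1 - SSE / SST. The data and variable names are hypothetical.

```python
import math

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
sst = sum((y - mean_y) ** 2 for y in ys)  # total variability in y

# Route 1: square Pearson's r.
r = sxy / math.sqrt(sxx * sst)
r_squared = r ** 2

# Route 2: share of total variability not left in the residuals.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained_share = 1 - sse / sst
```

Both routes give the same number for a one-predictor linear fit, which is why squaring r is described as the variance accounted for.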

Minimum

The regression line is the line where the sum of squared errors from the data points to the line is at a ______________

Sum of squares

The residuals are squared and then added together to generate ________________ residuals / error a.k.a. SSE

Population. Sample

The symbol for Pearson's correlation is "ρ" when it is measured in the _____________ "r" when it is measured in a _____________

Predicted value

The value of an outcome variable based on specific values of the predictor variable or variables being placed into a statistical model.

independent/ explanatory / X

The variable that is the PREDICTOR, not predicted, is the __________________

Correlations

This is the tool that gives psychologists the power to make predictions!

zero

Three data points: 160, 170, 180. Mean = 170. Differences from the mean = positive / negative 10. If you take the sign into consideration, your computation would end up with a sum of _______ for the differences from the mean.
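
The arithmetic on this card can be checked with a few lines. This is only a sketch using the card's three values; the variable names are made up. It also shows why deviations are squared before being summed.

```python
values = [160, 170, 180]
mean = sum(values) / len(values)

# Signed differences from the mean: negatives and positives cancel.
deviations = [v - mean for v in values]
signed_total = sum(deviations)

# Squaring removes the sign, so the total no longer cancels to zero.
squared_total = sum(d ** 2 for d in deviations)
```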

Bi-variate analysis

Two variables and certain aspects of the relationship between variables. To know and predict a value for the dependent variable if we know a case's value of the independent variable e.g. correlation. Can be helpful in testing simple hypotheses of association and relationship.

Mean

Uni-variate. With only one variable, and no other information, the best prediction for the next measurement is the _________ of the sample itself. If it is two variables the best way to predict is to use the Regression line.

Auto-correlation

When the residuals of two observations in a regression model are correlated.

homoscedasticity

You assume __________________ and you also assume that the sample of y scores is normally distributed at every x

You are able to predict!

You pile up demographic variables, and you have other variables that have been shown to be correlated with the variable you are interested in. You add them all up and all of a sudden you are accounting for a larger percentage of the variance of the other variable.

β / beta / slope. Intercept / α / alpha.

_________ is the expected increase or decrease in the dependent variable *Y* as a function of an increase or decrease in the independent variable *X*. _________ is the expected value of Y, i.e. the dependent variable, when X, the independent variable, is zero

Co-variance

__________ has no upper or lower bound and its size is dependent on the scale of the variables.

Co-variance. Correlation.

___________ provides the direction, while the ___________ provides the direction and strength of the linear relationship between two variables.

Correlation

___________ is always between -1 and +1, and it is independent of the SCALE of the variables themselves. It allows us to compare variables that are measured in different ways, e.g. temperature in degrees Celsius, energy usage in kilowatts
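
A hedged sketch of the scale point, using made-up temperature and energy readings. Converting Celsius to Fahrenheit rescales the covariance by 9/5 but leaves the correlation unchanged. The function names and data are hypothetical.

```python
import math

def covariance(xs, ys):
    # Sample covariance (divides by n - 1).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    # Covariance divided by the product of the standard deviations.
    sd_x = math.sqrt(covariance(xs, xs))
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical readings, for illustration only.
temps_c = [10, 15, 20, 25, 30]
energy = [30, 26, 25, 20, 16]
temps_f = [c * 9 / 5 + 32 for c in temps_c]

cov_c = covariance(temps_c, energy)
cov_f = covariance(temps_f, energy)   # changes with the unit
r_c = pearson_r(temps_c, energy)
r_f = pearson_r(temps_f, energy)      # does not change with the unit
```

The sign of both measures is the same (negative here), but only the correlation can be compared across differently scaled variables.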

Goodness of fit

Summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals. Usually based on how well the data predicted by the model correspond to the data that were actually collected.

Linear regression

A development of the correlation technique

Cross-validation

Assessing the accuracy of a model across different samples

Quadratic. Exponential. Polynomial

Data sometimes follow a non-linear relationship.

Experimental research

A causal relationship is when one variable causes a change in another variable. These types of relationships are investigated only by ____________________

Strong positive relationship

A correlation close to 1 indicates a __________________

strong negative relationship

A correlation close to negative 1 indicates a ________________

no / weak relationship

A correlation close to zero indicates __________________

scatter plot

A good way to portray a bi-variate relationship is with a ______________. We can learn much more by displaying the bi-variate data in a graphical form that maintains the PAIRING. Correlation. What pattern does it exhibit? Before going crazy computing correlations look at a _________________ of your data.

negative 1 and positive 1! (the number with two stars *)

In SPSS output window, the first thing we have to do is look at the Pearson correlation number. REMEMBER it's always a number between:

Co-variance

A measure of the 'average' relationship between two variables. Provides the DIRECTION, positive, negative, of the linear relationship between two variables. One of a family of statistical measures used to analyze the linear relationship between two variables. How they change or don't change together. The concept here is the direction or the sign i.e. -ve or +ve . We are not interested in the strength!

Cross-product deviations

A measure of the 'total' relationship between two variables. It is the deviation of one variable from its mean multiplied by the other variable's deviation from its mean.

Deleted residual

A measure of the influence of a particular case of data. It is the difference between the adjusted predicted value for a case and the original observed value for that case.

Co-variance ratio - CVR

A measure of whether a case influences the variance of the parameters in a regression model.

cancel each other out!

A single variable. If you add up everyone who stands above the mean and add up everyone who stands below the mean, then the differences are going to __________________ that is why you have an average.

Biserial correlation

A standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous.

Pearson's correlation coefficient

A standardized measure of the strength of relationship between two variables.

F-ratio

A test statistic with a known probability distribution *the F-distribution*. It is used to test the overall fit of the model in simple regression and multiple regression, and to test for overall differences between group means in experiments.
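
For simple regression, one common way to form the F-ratio is mean squares for the model divided by mean squares for the residuals. The sketch below assumes that setup, with 1 and n - 2 degrees of freedom. The data and names are hypothetical, and no p-value is computed.

```python
# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Partition the variability in y.
sst = sum((y - mean_y) ** 2 for y in ys)                              # total
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # residual
ssr = sst - sse                                                       # model

# Mean squares: sums of squares divided by their degrees of freedom.
ms_model = ssr / 1
ms_residual = sse / (n - 2)

f_ratio = ms_model / ms_residual
```

A large F-ratio means the model explains much more variability per degree of freedom than is left over in the residuals.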

Predictor variable

A variable that is used to try to predict values of another variable known as an outcome variable.

Outcome variable

A variable whose values we are trying to predict from one or more predictor variables.

Pearson Correlation. 2 continuous variables

A very basic bi-variate analysis. One method of estimating the relationship between two variables that are scored on an interval or ratio level, e.g. age vs income

Confounding variable

Because correlation can arise from the presence of a ___________ _________ rather than from direct causation, it is often said that "Correlation does not imply causation".

X, Y

Bi-variate analysis is one of the simplest forms of the quantitative statistical analysis. It involves the analysis of two variables, often denoted as _____ & _____ for the purpose of determining the empirical relationship between them.

Pearson Correlation. Linear Regression

Cannot infer causality! Really and truly these techniques are very associated. The graphs are going to be the same.

Negative co-variance

Co-variance. Indicates a decreasing relationship. If one goes up the other goes down.

Positive co-variance

Co-variance. Indicates a direct or increasing linear relationship. If one goes up the other goes up.

Spurious correlation.

Correlation is not causation! Two completely unrelated factors may have a mathematical correlation but NO sensible correlation in real life, e.g. dog barks vs. moon phase.

Statistically significant

Correlation strength does not necessarily mean the correlation is __________________

Linear relationships

Correlation. Only applicable to ___________________. NOT curved patterns. Example: temperature goes up, then goes down, then goes up again across the seasons, so you end up with a curved pattern.

Specification error

In a regression model where A is regressed on B but C is actually the true causal factor for A, this misleading choice of independent variable B instead of C, is called

Homoscedasticity. Heteroscedasticity.

In regression analysis, ____________ means a situation in which the variance of the dependent/response/output variable is the SAME for all the data. In regression analysis, ____________ means a situation in which the variance of the dependent/response/output variable VARIES across the data
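
One rough way to see the difference is to fit a line and compare the average squared residual in the lower half of x with that in the upper half. This is only an informal sketch, not a formal test; the data are constructed by hand and every name is hypothetical.

```python
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def spread_ratio(xs, ys):
    # Mean squared residual for the upper half of x divided by the
    # mean squared residual for the lower half. Assumes xs is sorted.
    intercept, slope = fit_line(xs, ys)
    res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    half = len(res) // 2
    low = sum(r ** 2 for r in res[:half]) / half
    high = sum(r ** 2 for r in res[half:]) / (len(res) - half)
    return high / low

xs = list(range(1, 11))
signs = [(-1) ** x for x in xs]  # alternating -1, +1 "noise" direction

# Same-sized noise everywhere: roughly constant spread.
y_constant = [2 * x + s * 1.0 for x, s in zip(xs, signs)]
# Noise that grows with x: spread fans out.
y_growing = [2 * x + s * 0.3 * x for x, s in zip(xs, signs)]

ratio_constant = spread_ratio(xs, y_constant)
ratio_growing = spread_ratio(xs, y_growing)
```

A ratio near 1 is what constant variance looks like in this crude check, while a ratio well above 1 suggests the spread grows with x.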

Pearson correlation coefficient

lower case "r". We use this symbol to indicate the sample co-efficient. r is called the _________________.

strength, magnitude

Example: If there is a true correlation between drinking Red Bull and cognitive performance, and I know the ________ and ________ of the correlation, I am going to be able to say that after drinking 4 Red Bulls you should be getting something like a B or an A.

Stress = independent/ predictor / explanatory Colds = dependent/ response / output

Example: X axis: stress. Y axis: colds. Which is the dependent and independent variable? X axis: Stress = independent/ predictor / explanatory. Y axis: Colds = dependent/ response / output. Compute Y=a+bx and you can predict the amount of colds for each stress level.
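
The stress and colds example can be sketched as follows. The scores are invented for illustration, and the names `a`, `b`, and `predict_colds` are hypothetical. The intercept `a` is the expected number of colds at stress = 0, and `b` is the expected change in colds per one-unit change in stress.

```python
# Hypothetical stress scores (x) and number of colds (y).
stress = [0, 2, 4, 6, 8]
colds = [1, 2, 2, 4, 4]

n = len(stress)
mean_x = sum(stress) / n
mean_y = sum(colds) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stress, colds))
sxx = sum((x - mean_x) ** 2 for x in stress)

b = sxy / sxx            # slope
a = mean_y - b * mean_x  # intercept

def predict_colds(stress_level):
    # Y = a + bX
    return a + b * stress_level

expected_at_zero = predict_colds(0)
expected_at_five = predict_colds(5)
```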

Homoscedasticity

Facilitates analysis because most methods are based on the assumption of equal variance. e.g. linear regression or ANOVA

variables themselves

For the Co-variance you are going to get a number that is related to the scale or the measure of the _________________

italicized r. Italicized r squared.

For the Pearson correlation co-efficient the statistical notation is an _____________. For linear regression it is an ________________.

Co-variance. Correlation. Linear regression.

How do variables behave as a pair? These are all very closely related!

Intercept / α / alpha.

If someone had stress, *independent variable*, x = 0, how many colds * dependent variable*, y = ? would we EXPECT them to experience?

"response variable"

If the independent variable is referred to as an "explanatory variable" then the term _________________ is preferred for the dependent variable.

Reject the null hypothesis i.e. H0

If we produce an r value whose associated p-value is less than 0.05, then we will be able to _______________
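
SPSS reports a p-value based on a t distribution. As a stand-in that needs only the standard library, the sketch below uses an exact permutation test instead: it reorders y in every possible way and counts how often the reordered |r| is at least as large as the observed |r|. The data, names, and the 0.05 cut-off as coded here are illustrative assumptions.

```python
import math
from itertools import permutations

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def exact_permutation_p(xs, ys):
    # Two-tailed p-value under H0 (no association): the share of all
    # orderings of ys whose |r| is at least the observed |r|.
    observed = abs(pearson_r(xs, ys))
    total = 0
    extreme = 0
    for shuffled in permutations(ys):
        total += 1
        # Small tolerance so ties are not lost to rounding.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical data, for illustration only (6 points, 720 orderings).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]

r = pearson_r(xs, ys)
p_value = exact_permutation_p(xs, ys)
reject_h0 = p_value < 0.05
```

Enumerating every ordering is only practical for very small samples. For realistic sample sizes the t-based p-value from the statistics package is the usual route.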

