Quantitative - 3 - Correlation, Linear Regression
Intercept, α or alpha
You have the regression line, the line cutting through the scatter plot. The point where it crosses the y axis is called the _____________
Whole Unitary changes. e.g. 1,2,3
β / beta / slope. The change in y is counted per step in x, where the steps are not x = 1.5 or 1.7 but ___________
H0. H1.
β / beta / slope = 0 _____________. β / beta / slope ≠ 0 _____________.
The regression line
The ____________ _____ should literally 'fit' the data better than any other straight line. It will minimize the squared residuals.
null hypothesis.
When the slope (β / beta) = 0 there is no association between the variables, and that will be your ______________
Mean squares
A measure of average variability.
r squared value
Linear Regression. Example: Years of education vs earnings. Output window: correlation results, the p-value, and the _____________
Earnings. dependent/ response / output / Y
Linear Regression. Example: Years of education vs earnings. SPSS. Select: Analyze. Select: Regression. Select: Linear. The dependent variable here is _____________ because you are trying to predict earnings from education! The variable being predicted is the ________________ variable!
Percentage of Variance
Linear Regression: What we are looking at here is the variance! If you just square the correlation co-efficient, that will automatically give you the _______________ accounted for by knowing the value of the other variable.
expected mean
Linear regression captures the ______________ of y / response / output / dependent at a particular value of x and imposes a linear form on the relationship.
r = -1
Pearson's correlation. The value of r for a perfect negative linear relationship is ____
r = 1
Pearson's correlation. The value of r for a perfect positive linear relationship is ____
r = 0
Pearson's correlation. We will be dealing almost exclusively with samples. If there is no linear relationship between X and Y, r = ____
Weak relationship. Moderate relationship. Strong relationship.
Pearson's correlation. Perfect linear relationships do not exist in practice. There are some rules of thumb: r = 0 to 0.3 ______________; r = 0.3 to 0.7 _____________; r = 0.7 to 0.9 _____________
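As a rough illustration, the rule-of-thumb bands on this card can be written as a small Python function. The name `describe_strength`, the handling of the exact boundary values (0.3 and 0.7), and the labelling of values above 0.9 as "strong" are assumptions made for this sketch, not part of the rule itself.

```python
def describe_strength(r):
    """Label a correlation using the rule-of-thumb bands from the card."""
    size = abs(r)  # the sign only gives direction, so judge strength on |r|
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"


labels = [describe_strength(r) for r in (0.1, -0.5, -0.8)]
```

The sign is dropped before comparing, so r = -0.8 and r = 0.8 receive the same label.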
Correlation. Yet independent of the scale, temperature vs energy usage
Refers to any of a broad class of statistical relationships involving dependence.
Dependence
Refers to any statistical relationship between two random variables or two sets of data.
cancel each other out
Regression line. The residuals are squared because they would otherwise _________________ when added up.
Independent variable
Scatter plot. Customarily plotted along the horizontal axis / x axis
dependent variable
Scatter plot. Customarily plotted along the vertical axis / y axis
Predictive
Series of correlations. When you have a manageable list then you get __________ power.
β / beta / slope
So you start with x = 0, y = value. Now what is going to happen when x = 1 or x = 2? By how much is y changing for every change in x? The intercept / α / alpha, the point where the line touches y (response / output / dependent variable), is observed, so the only computation we have to do is the ____________
βi
Standardized regression coefficient. Indicates the strength of relationship between a given predictor, i, of many and an outcome in a standardized form. It is the change in the outcome associated with a one standard deviation change in the predictor.
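A minimal sketch of the idea on this card, assuming a single predictor and made-up scores: the ordinary slope b is rescaled by the two standard deviations to give the standardized coefficient. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Made-up scores, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(xs), mean(ys)

# Unstandardized slope b: change in y per one-unit change in x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)

# Standardized slope: change in y (in SDs) per one-SD change in x
beta_std = b * stdev(xs) / stdev(ys)
```

With only one predictor, the standardized slope works out to the same number as Pearson's r for the two variables.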
explanatory variable / independent
The Simple Linear Regression Model. Simple because it has only one _______________ as opposed to multiple regression which has more than one.
independent/ explanatory / X / predictor
Heteroscedasticity (a violation of the homoscedasticity assumption). The variance of the residuals in your analysis is not constant across your _____________ variable
cancel out
The average is the point where all of the differences below the mean will ___________ all of the differences above the mean.
Homoscedasticity vs heteroscedasticity
Assumption of linear regression
foundation
Intercept / α / alpha. It forms the _______________ of the regression equation, meaning that we start with the intercept and then we build from there.
intercept. It is where it starts!
Linear regression formula: y = a + bx. y is the response / output / dependent. x is the predictor / explanatory / independent. a is the _______________
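The formula y = a + bx can be sketched in Python as follows. This is a minimal least-squares fit under stated assumptions: the data are made up, the variable names borrow the stress and colds example used elsewhere in this deck, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: cross-product deviations divided by squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the line passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


stress = [1, 2, 3, 4, 5]   # predictor / x (made-up values)
colds = [2, 4, 5, 4, 5]    # response / y (made-up values)

a, b = fit_line(stress, colds)
predicted_at_zero = a + b * 0   # equals the intercept a
predicted_at_six = a + b * 6    # prediction for a new stress level
```

For this data the intercept is about 2.2 and the slope about 0.6, so each extra unit of stress adds about 0.6 expected colds.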
predictor / explanatory / independent
Linear regression. For every x score: ___________________: a range of y scores are associated
significant p-value
Linear regression. You start off with two continuous variables and you do a correlation study, you get a significant correlation. With a _________ _______ you are then able to interpret the strength, the magnitude and the direction of the correlation co-efficient.
Correlation does not imply causation!
A mathematical correlation does not imply a causal relationship in real life. This dictum should not be taken to mean that correlations cannot indicate the potential existence of causal relations.
Regression line.
Mean and regression line both summarize the central tendency. What lies above lies below! The mean is the average of a single variable. The _______________ is the average of two variables. (A two-dimensional average. This is what it's all about!) When you have two variables the plain average is not going to work because the average works for a single variable.
Single variable
Measures of central tendency, variability, and spread summarize a _________________ by providing important information about its distribution.
Correlation Co-efficient
Measures the strength of a relationship between two variables having a metric / interval / ratio scale, and it ranges from negative 1 to positive 1.
least error possible
Over two variables the regression line is where you will make the least error. What you are trying to do here is not to maximize the hits or the successes but to make the ___________________
Statistics
Pearson and Linear regression. Interpretations and procedures are going to be different because we are looking at different _______________ using the same variables.
outliers
Regression line. The whole point is that you are making the least error possible not only on one score but also on the _____________
least
Regression line. You will be off, but you will be off in the ___________ possible way
default
SPSS. Select: Analyze. Select: Correlate. Select: Bivariate. Move the variables that you want to correlate into the Variables box. The Pearson correlation coefficient is the _________
P-value
SPSS output window: under the Pearson correlation number, one finds the ______________, labelled Sig. (2-tailed).
input / explanatory = x axis. outcome / predicted = y axis.
SPSS. The first thing to do is a scatter plot. Graphs menu. Select: Chart Builder. Select: Scatter/Dot. Drag the variables onto the axes. Explanatory variable _______________ Predicted variable ___________
Linear correlation
The correlation coefficient. A value of 0 implies that there is no _______________________ between the variables.
Residual or error
The difference between the best-fit line and the observed value is called the __________________
SSE, sum of squares error
The goal of linear regression is to create a linear model that minimizes the _____________
zero
The intercept / α / alpha is the value of y when x equals __________
slope
The intercept is the value of y when x equals zero, then from there on you move to β / beta or the ______________ of the line.
regression analysis, including the analysis of variance.
The possible existence of heteroscedasticity is a major concern in the application of ______________________ because the presence of heteroscedasticity can invalidate statistical tests of significance.
Coefficient of Determination
The proportion of variance in one variable explained by a second variable. It is Pearson's correlation coefficient squared.
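A short sketch of this card on made-up data: r is computed from the deviations, squared, and then compared with the same proportion obtained from the regression side as 1 - SSE / SST. All variable names are illustrative.

```python
from math import sqrt

# Made-up scores, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total sum of squares of y (SST)

r = sxy / sqrt(sxx * syy)   # Pearson's r
r_squared = r ** 2          # proportion of variance explained

# Same quantity from the regression side: 1 - SSE / SST
b = sxy / sxx
a = mean_y - b * mean_x
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - sse / syy
```

Both routes give about 0.6 here, i.e. roughly 60% of the variance in y is accounted for by x in this toy data.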
Minimum
The regression line is the line where the sum of squared errors from the data points to the line is at a ______________
Sum of squares
The residuals are squared and then added together to generate ________________ residuals / error a.k.a. SSE
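The squaring and summing described here can be sketched as below. The data are made up, and the values a = 2.2 and b = 0.6 are the least-squares line for that data; the "nudged" line is a hypothetical alternative included only for comparison.

```python
# Made-up scores, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sse(a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# a = 2.2, b = 0.6 is the least-squares line for this data
residuals = [y - (2.2 + 0.6 * x) for x, y in zip(xs, ys)]

raw_sum = sum(residuals)   # signed residuals cancel out (about 0)
best = sse(2.2, 0.6)       # about 2.4
nudged = sse(2.2, 0.7)     # a slightly different line gives a larger SSE
```

The signed residuals add up to roughly zero, which is why they are squared first; the squared total is smallest for the regression line.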
Population. Sample
The symbol for Pearson's correlation is "ρ" when it is measured in the _____________ and "r" when it is measured in a _____________
Predicted value
The value of an outcome variable based on specific values of the predictor variable or variables being placed into a statistical model.
independent/ explanatory / X
The variable that is the PREDICTOR, not predicted, is the __________________
Correlations
This is the tool that gives psychologists the power to make predictions!
zero
Three data points: 160, 170, 180. Mean = 170. Differences from the mean = plus or minus 10 (and 0). If you take the sign into consideration, your computation would end up with _______ as the sum of the differences from the mean.
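The arithmetic on this card, written out as a small Python check. The variable names are illustrative.

```python
values = [160, 170, 180]
mean = sum(values) / len(values)                  # 170.0

deviations = [v - mean for v in values]           # -10, 0 and +10
signed_total = sum(deviations)                    # the signs cancel out
squared_total = sum(d ** 2 for d in deviations)   # squaring removes the signs
```

The signed total is zero, while the squared total (200 here) keeps the information about spread.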
Bi-variate analysis
Two variables and certain aspects of the relationship between variables. To know and predict a value for the dependent variable if we know a case's value of the independent variable e.g. correlation. Can be helpful in testing simple hypotheses of association and relationship.
Mean
Uni-variate. With only one variable, and no other information, the best prediction for the next measurement is the _________ of the sample itself. If it is two variables the best way to predict is to use the Regression line.
Auto-correlation
When the residuals of two observations in a regression model are correlated.
homoscedasticity
You assume __________________ and you also assume that the y scores at every x are normally distributed.
You are able to predict!
You pile up demographic variables and other variables that have been proven to be correlated with the variable you are interested in. You add them all up and all of a sudden you are accounting for a larger percentage of the variance in that variable.
β / beta / slope. Intercept / α / alpha.
_________ is the expected increase or decrease in the dependent variable *Y* as a function of an increase or decrease in the independent variable *X*. _________ is the expected value of Y (the dependent variable) when X (the independent variable) is zero.
Co-variance
__________ has no upper or lower bound and its size is dependent on the scale of the variables.
Co-variance. Correlation.
___________ provides the direction, while the ___________ provides the direction and strength of the linear relationship between two variables.
Correlation
___________ is always between -1 and +1, and it is independent of the SCALE of the variables themselves. It allows us to compare variables that are measured in different ways, e.g. temperature in degrees Celsius and energy usage in kilowatts.
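A small sketch of the scale point, using the temperature and energy example from the card with made-up numbers. Converting Celsius to Fahrenheit changes the covariance but leaves the correlation alone. The helper names `covariance` and `correlation` are chosen for this illustration.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: average cross-product deviation (n - 1 divisor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson's r: covariance divided by the product of the two SDs."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


temp_c = [10, 15, 20, 25, 30]               # degrees Celsius (made up)
energy_kw = [30, 26, 25, 21, 18]            # kilowatts (made up)
temp_f = [c * 9 / 5 + 32 for c in temp_c]   # same temperatures in Fahrenheit

cov_c = covariance(temp_c, energy_kw)
cov_f = covariance(temp_f, energy_kw)   # 1.8 times cov_c: depends on the scale
r_c = correlation(temp_c, energy_kw)
r_f = correlation(temp_f, energy_kw)    # same as r_c: independent of the scale
```

Both covariances are negative, so they agree on direction, but only the correlation gives a strength that can be compared across units.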
Goodness of fit
Summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals. Usually based on how well the data predicted by the model correspond to the data that were actually collected.
Linear regression
A development of the correlation technique
Cross-validation
Assessing the accuracy of a model across different samples
Quadratic. Exponential. Polynomial
Data sometimes follow a non-linear relationship.
Experimental research
A causal relationship is when one variable causes a change in another variable. These types of relationships are investigated only by ____________________
Strong positive relationship
A correlation close to 1 indicates a __________________
strong negative relationship
A correlation close to negative 1 indicates a ________________
no / weak relationship
A correlation close to zero indicates __________________
scatter plot
A good way to portray a bi-variate relationship is with a ______________. We can learn much more by displaying the bi-variate data in a graphical form that maintains the PAIRING. Correlation. What pattern does it exhibit? Before going crazy computing correlations look at a _________________ of your data.
negative 1 and positive 1! (the number with two stars *)
In SPSS output window, the first thing we have to do is look at the Pearson correlation number. REMEMBER it's always a number between:
Co-variance
A measure of the 'average' relationship between two variables. Provides the DIRECTION, positive, negative, of the linear relationship between two variables. One of a family of statistical measures used to analyze the linear relationship between two variables. How they change or don't change together. The concept here is the direction or the sign i.e. -ve or +ve . We are not interested in the strength!
Cross-product deviations
A measure of the 'total' relationship between two variables. It is the deviation of one variable from its mean multiplied by the other variable's deviation from its mean.
Deleted residual
A measure of the influence of a particular case of data. It is the difference between the adjusted predicted value for a case and the original observed value for that case.
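One way to picture a deleted residual, as a sketch on made-up data: refit the line without the case, predict that case from the refitted line (the adjusted predicted value), and compare with what was observed. The helper `fit_line` and the sign convention (observed minus predicted) are assumptions of this example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b * mean_x, b


# Made-up scores, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
i = 4  # index of the case being examined (here, the last one)

# Ordinary residual: the case is included when the line is fitted
a_all, b_all = fit_line(xs, ys)
ordinary_residual = ys[i] - (a_all + b_all * xs[i])

# Deleted residual: the case is left out when the line is fitted
a_del, b_del = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
adjusted_predicted = a_del + b_del * xs[i]
deleted_residual = ys[i] - adjusted_predicted
```

Here the deleted residual (about -0.5) is larger in size than the ordinary residual (about -0.2), which suggests the case pulls the fitted line towards itself.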
Co-variance ratio - CVR
A measure of whether a case influences the variance of the parameters in a regression model.
cancel each other out!
A single variable. If you add up everyone who stands above the mean and add up everyone who stands below the mean, then the differences are going to __________________ that is why you have an average.
Biserial correlation
A standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous.
Pearson's correlation coefficient
A standardized measure of the strength of relationship between two variables.
F-ratio
A test statistic with a known probability distribution *the F-distribution*. It is used to test the overall fit of the model in simple regression and multiple regression, and to test for overall differences between group means in experiments.
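For simple regression, the F-ratio can be sketched as the model mean square divided by the residual mean square. The data below are made up and the variable names are illustrative; turning F into a p-value would additionally need the F-distribution, which this sketch does not cover.

```python
# Made-up scores, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line y = a + b*x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x
predicted = [a + b * x for x in xs]

# Sums of squares: what the model explains vs. what is left over
ss_model = sum((p - mean_y) ** 2 for p in predicted)
ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted))

# Mean squares: each sum of squares divided by its degrees of freedom
ms_model = ss_model / 1        # one predictor
ms_resid = ss_resid / (n - 2)  # n cases minus two estimated parameters

f_ratio = ms_model / ms_resid
```

For this toy data F comes out at about 4.5 on 1 and 3 degrees of freedom.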
Predictor variable
A variable that is used to try to predict values of another variable known as an outcome variable.
Outcome variable
A variable whose values we are trying to predict from one or more predictor variables.
Pearson Correlation. 2 continuous variables
A very basic bi-variate analysis. One method of estimating the relationship between two variables that are scored on an interval or ratio level, e.g. age vs income.
Confounding variable
Because correlation can arise from the presence of a ___________ _________ rather than from direct causation, it is often said that "Correlation does not imply causation".
X, Y
Bi-variate analysis is one of the simplest forms of the quantitative statistical analysis. It involves the analysis of two variables, often denoted as _____ & _____ for the purpose of determining the empirical relationship between them.
Pearson Correlation. Linear Regression
Cannot infer causality! Really and truly these techniques are very closely related. The graphs are going to be the same.
Negative co-variance
Co-variance. Indicates a decreasing relationship. If one goes up the other goes down.
Positive co-variance
Co-variance. Indicates a direct or increasing linear relationship. If one goes up the other goes up.
Spurious correlation.
Correlation is not causation! Two completely unrelated factors may have a mathematical correlation but NO sensible connection in real life, e.g. dog barks vs. moon phase.
Statistically significant
Correlation strength does not necessarily mean the correlation is __________________
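A sketch of why strength and significance are separate questions: the test statistic for r also depends on the sample size n. The function name `t_statistic` is illustrative; the formula used is the standard t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing whether a sample correlation r differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


small_sample = t_statistic(0.5, 5)     # r = 0.5 with 5 cases
large_sample = t_statistic(0.5, 102)   # the same r = 0.5 with 102 cases
```

The same moderate r gives a t of about 1.0 with 5 cases and about 5.8 with 102 cases, and a larger t corresponds to a smaller p-value.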
Linear relationships
Correlation. Only applicable to ___________________, NOT curved patterns. Example: temperature goes up, then goes down, then goes up again across the seasons, so you end up with a curved pattern.
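A small made-up example of the point on this card: y is completely determined by x, but along a curve, and Pearson's r still comes out as zero. The helper name `pearson_r` is chosen for this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from cross-product and squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # a perfect curved (quadratic) relationship

r = pearson_r(xs, ys)       # the linear measure misses the pattern
```

This is one reason to look at a scatter plot before computing any correlations.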
Specification error
In a regression model where A is regressed on B but C is actually the true causal factor for A, this misleading choice of independent variable B instead of C is called a _____________
Homoscedasticity. Heteroscedasticity.
In regression analysis, ____________ means a situation in which the variance of the dependent/response/output variable is the SAME for all the data. In regression analysis, ____________ means a situation in which the variance of the dependent/response/output variable VARIES across the data.
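An informal way to see the difference, sketched on made-up data whose errors grow with x: fit the line, then compare the average squared residual in the lower half of x with that in the upper half. This is only an eyeball-style check under the stated assumptions, not a formal test, and all names are illustrative.

```python
xs = [1, 2, 3, 4, 5, 6, 7, 8]
# Made-up errors whose size grows with x (a heteroscedastic pattern)
errors = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
ys = [2 + x + e for x, e in zip(xs, errors)]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line y = a + b*x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]


def mean_square(values):
    """Average of the squared values."""
    return sum(v ** 2 for v in values) / len(values)


low_half = mean_square(residuals[: n // 2])    # spread at small x
high_half = mean_square(residuals[n // 2:])    # spread at large x
spread_ratio = high_half / low_half
```

Under homoscedasticity the two halves would show a similar spread; here the upper half is several times larger.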
Pearson correlation coefficient
lower case "r". We use this symbol to indicate the sample co-efficient. r is called the _________________.
strength, magnitude
Example: If there is a true correlation between drinking Red Bull and cognitive performance, and I know the ________ and ________ of the correlation, I am going to be able to say that drinking 4 Red Bulls should get you something like a B or an A.
Stress = independent/ predictor / explanatory Colds = dependent/ response / output
Example: X axis: stress. Y axis: colds. Which is the dependent and independent variable? X axis: Stress = independent/ predictor / explanatory. Y axis: Colds = dependent/ response / output. Compute Y=a+bx and you can predict the amount of colds for each stress level.
Homoscedasticity
Facilitates analysis because most methods are based on the assumption of equal variance. e.g. linear regression or ANOVA
variables themselves
For the Co-variance you are going to get a number that is related to the scale or the measure of the _________________
italicized r. Italicized r squared.
For the Pearson correlation co-efficient the statistical notation is an _____________. For linear regression it is an ________________.
Co-variance. Correlation. Linear regression.
How do variables behave as a pair? These are all very closely related!
Intercept / α / alpha.
If someone had stress (independent variable) x = 0, how many colds (dependent variable) y = ? would we EXPECT them to experience?
"response variable"
If the independent variable is referred to as an "explanatory variable" then the term _________________ is preferred for the dependent variable.
Reject the null hypothesis i.e. H0
If we produce an r value whose p-value is less than 0.05 then we will be able to _______________