What is prediction? What is a regression equation?
Maybe we can use an association
(correlation) to make a prediction.
Psychologists use multiple regressions for prediction
Because what is being studied is complex, and it is unlikely that a single variable will predict an outcome.
Want to use our existing data to solve for the values of a and b that will produce the best fitting linear function
Defined in terms of errors of prediction (Y - Y' deviations)
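One common way to make "best fitting" concrete is least squares: choose a and b so that the sum of squared errors of prediction (Y - Y') is as small as possible. The notes do not name the criterion, so least squares is an assumption here, and the data and the function name fit_line are made up for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for Y' = a + bX, chosen to minimise the sum of squared (Y - Y') errors."""
    mx, my = mean(xs), mean(ys)
    # Sum of cross-products of deviations, and sum of squared deviations on X
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (mean of X, mean of Y)
    return a, b

# Made-up sample in which both X and Y are available
xs = [1, 2, 3, 4, 5]
ys = [5, 6, 10, 11, 13]

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]                 # Y' for each case
errors = [y - yp for y, yp in zip(ys, predicted)]   # Y - Y' for each case
print(f"a = {a:.2f}, b = {b:.2f}")
```

With a least-squares line the errors of prediction cancel out, so they sum to zero (up to rounding).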
In order to actually do the calculations, we need a sample in which both pieces of information (X and Y) are available
Derive associations for prediction in a new sample
slope
How much the line goes up for every unit you go across
How do we describe the slope?
In this case the slope is the "b" in the "bX" part of the equation.
Prediction only happens when we have a temporal asymmetry.
So we can take for example a child's score on memory and language at one time point and predict what their score would be at another time point in the future.
General case for formula of a straight line: Y = a + bX.
In z-score form the line goes through zero (a = 0), so the formula is more like Y = bX.
Prediction equation in raw score form
Want to use our existing data to solve for the values of a and b that will produce the best fitting linear function
Equation of a straight line:
Y = a + bX
Can convert Zy' to Y' using:
Y' = My + (Zy' × sy), where My is the mean of Y and sy is the standard deviation of Y
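As a quick worked sketch of this conversion, assuming made-up values for the mean and standard deviation of Y (the function name z_to_raw is hypothetical):

```python
def z_to_raw(z_pred, mean_y, sd_y):
    """Convert a predicted z score on Y back to a raw score: Y' = My + Zy' * sy."""
    return mean_y + z_pred * sd_y

# Hypothetical test with mean 50 and standard deviation 10;
# a predicted z score of 0.5 is half a standard deviation above the mean.
y_pred = z_to_raw(0.5, mean_y=50, sd_y=10)
print(y_pred)  # 55.0
```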
If the correlation equals one
Zy' = Zx
Z score describes where
an individual case falls in a distribution.
We are trying to work out Y
based on X
when we plot the data it's always centered around zero
because when we are talking about z scores the mean is always zero.
If the dependent data are missing on the second test and we know the score on the same variable from the first test,
can we make a better prediction than if we used the convention that Y' equals the mean?
Y = dependent
criterion variable
residuals
deviations
Includes the two extreme cases, r = 0 and r = 1. In between it gives a prediction
for Y that is based on X but depends on the size of r (strength of the correlation)
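A small sketch of the two extreme cases and one in-between case, using the z-score prediction rule from these notes (Zy' = r × Zx). The z score of 2.0 and the name predict_zy are made up for illustration.

```python
def predict_zy(r, zx):
    """Predicted z score on Y from the z score on X: Zy' = r * Zx."""
    return r * zx

zx = 2.0  # hypothetical case: two standard deviations above the mean on X

predictions = {r: predict_zy(r, zx) for r in (0.0, 0.5, 1.0)}
for r, zy in predictions.items():
    print(f"r = {r}: Zy' = {zy}")
# r = 0.0 -> 0.0 (X tells us nothing, so we predict the mean of Y)
# r = 0.5 -> 1.0 (prediction is pulled halfway back toward the mean)
# r = 1.0 -> 2.0 (same z score as on X)
```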
Predict scores on variable Y
from scores on variable X
the regression line when we are talking about z scores always
goes through zero and has a slope of r
"Best" prediction is class mean Y'
is M(y) from the sample
Why is the mean of Y the best predictor for Y?
It's the most conservative predictor. We are less likely to make a giant error.
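One way to see this, assuming we measure error as the sum of squared deviations (a criterion the notes do not state explicitly): guessing the mean for everyone gives a smaller total error than guessing any other single value. The scores below are made up.

```python
from statistics import mean

def sum_squared_error(guess, ys):
    """Total squared error if we predict the same value (guess) for every case."""
    return sum((y - guess) ** 2 for y in ys)

ys = [4, 6, 7, 9, 14]   # made-up scores on Y
m = mean(ys)

# Compare the mean against a few nearby single-value guesses
candidates = [m - 2, m - 1, m, m + 1, m + 2]
errors = {g: sum_squared_error(g, ys) for g in candidates}
best = min(errors, key=errors.get)
print(best == m)  # True
```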
Simple linear regression is
just a simple straight line through the data.
we work out the missing data point to
keep our sample size the same.
Positive correlation
a low score on X goes with a low score on Y (and a high score with a high score)
If the correlation equals zero
Zx drops out (Zy' = 0 × Zx = 0), so we predict the mean
slope is the change in Y
for each unit the line goes across on X.
Y depends
on X
If rXY is neither 0 nor 1, it provides some information to help us predict Y, but the relationship isn't
perfect, so prediction needs to be conservative
Notation: Y' or Ŷ =
predicted score on Y
As r decreases, the slope decreases, until finally (if r = 0),
prediction line falls along the X axis
X = independent
predictor variable
the regression line when we are talking about z scores always has a slope of
r (correlation coefficient)
ZY' (predicted Z value of Y scores) =
r Zx (correlation between X and Y multiplied by the z score on X)
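The z-score prediction can be sketched end to end as follows. The data are made up, the helper names are hypothetical, and the population standard deviation is used for the z scores (an assumption; a course may use the sample version instead).

```python
from statistics import mean, pstdev

def z_scores(values):
    """z score of each value: distance from the mean in standard deviation units."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Correlation as the average product of paired z scores."""
    return mean([zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))])

# Made-up sample in which both X and Y are known
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)

# Predict Y for a case with X = 5: convert X to Zx, then Zy' = r * Zx
zx_new = (5 - mean(xs)) / pstdev(xs)
zy_pred = r * zx_new
print(f"r = {r:.2f}, Zx = {zx_new:.2f}, Zy' = {zy_pred:.2f}")
```

Because r is below 1 here, the predicted z score on Y is closer to zero than the z score on X.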
b =
slope
X variable is
something that has happened in the past
Simple regression is just one X variable
that predicts the Y variable
Y intercept the point at which
the line intercepts the Y axis.
Y'
the predicted score of Y
The amount by which our prediction differs from zero will depend on
the strength of r
In order to maximise the prediction it's important to know about
the strength of the relationship between X and Y.
we need to convert missing data
to a z score
Y = a + bX: in this equation the "a" is referred
to as a constant; it is the value of Y where X equals zero.
Y = 3 + 2X in this equation the "3" refers
to the Y intercept
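A tiny sketch of this example equation, showing the intercept (the value at X = 0) and the slope (the change in Y for each one-unit step in X). The function name predict is hypothetical.

```python
def predict(x, a=3, b=2):
    """Straight line Y = a + bX with intercept a = 3 and slope b = 2."""
    return a + b * x

at_zero = predict(0)              # value of Y where X equals zero (the intercept)
step = predict(5) - predict(4)    # change in Y for one unit across on X (the slope)
print(at_zero, step)  # 3 2
```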
Multiple regression:
use more than one X variable to predict a Y variable in the future.
if we have no correlation between our two tests
we just use the most conservative predictor (the mean)
if we are going to work out missing data relative to the mean and correlation score from first sample to second sample
we need to think about the Z score. Where the particular datum falls relative to other data in a distribution.
Y variable is
what might happen in the future
In the current sample we are just working out the existing associations in order to work out
what would be expected if one of the variables was missing in the new sample.
the mean is a method that is often used
when we have a missing data point in the sample.
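A minimal sketch of filling a missing data point with the mean, assuming missing scores are marked as None (the scores are made up):

```python
from statistics import mean

scores = [12, 15, None, 9, 14]   # None marks the missing data point

# Mean of the scores we actually observed
observed = [s for s in scores if s is not None]
fill = mean(observed)

# Replace the missing point with the mean so the sample size stays the same
imputed = [fill if s is None else s for s in scores]
print(imputed)  # [12, 15, 12.5, 9, 14]
```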