STA215: Data and Correlation
categorical variable
a variable with qualitative data
Which symbol represents slope in a statistical model?
b
What kind of variables can the response and explanatory variables be?
both quantitative, both categorical, or one of each
Fill in the blank: The closer 𝑟² is to 100%, the
closer the data points are to the regression line.
Which of the following makes no distinction as to which variable is 𝑋 and which is 𝑌?
correlation
Why study bivariate data?
to focus on whether a relationship exists between the explanatory variable and the response variable
Explanatory Variable
may or may not explain or influence changes in the response variable; denoted X; also called the independent variable
correlation
measures direction and strength of linear association between x and y
Response variable
measures the outcome on each individual; denoted Y; also called the dependent variable
regression
models the linear relationship between x and y and uses the model to predict the value of y for a specific value of x
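if r is negative
more negative areas: the negative products of 𝑧-scores (rectangle areas) outweigh the positive ones
if r is positive
more positive areas: the positive products of 𝑧-scores (rectangle areas) outweigh the negative ones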
If we change the unit of measure for 𝑋 from centimeters to feet, will the value of 𝑟 change?
no
If we change 𝑋 from diameter to volume and we change 𝑌 from volume to diameter, will the value of 𝑟 change?
no
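A minimal sketch of the two answers above. The diameter and volume values are made up for illustration, and the helper name `corr` is hypothetical; only the behavior of 𝑟 under rescaling and swapping is the point.

```python
import numpy as np

# Hypothetical measurements: diameter in centimeters, volume in arbitrary units.
diameter_cm = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
volume = np.array([2.1, 4.8, 9.5, 15.2, 24.0, 33.1])

def corr(x, y):
    """Pearson correlation r between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

r_original = corr(diameter_cm, volume)

# Change the unit of X: 1 foot = 30.48 cm.
diameter_ft = diameter_cm / 30.48
r_rescaled = corr(diameter_ft, volume)

# Swap the roles of X and Y.
r_swapped = corr(volume, diameter_cm)

print(r_original, r_rescaled, r_swapped)  # all three values agree
```

Because 𝑟 is built from standardized values, rescaling a variable or swapping the two variables leaves it unchanged.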
bivariate data is
quantitative data that has two variables; often represented using a scatterplot
Which of the following requires an important distinction between 𝑋 and 𝑌?
regression
quantitative variable
variable with numerical data
Fill in the blank: 𝑟² tells us the percentage of the ___ that is explained by the least-squares regression line
variation in 𝑦
With 𝑟=0.941, how should we describe strength?
very strong
When should you model with a straight line?
when the relationship between 𝑋 and 𝑌 is linear and 𝑋 and 𝑌 are quantitative
Suppose the correlation between number of beers consumed and blood alcohol content is 𝑟=0.9. What percentage of the variation in blood alcohol content can be explained by number of beers consumed?
81%
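The arithmetic behind this answer is just squaring 𝑟 and converting to a percentage. A short sketch (the variable names are illustrative):

```python
r = 0.9                      # correlation given in the question
r_squared = r ** 2           # fraction of variation in y explained by x
percent_explained = round(r_squared * 100, 1)

print(f"{percent_explained}% of the variation is explained")
```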
Which of the following values for 𝑟² indicates a perfect fit (i.e., all data points are on the regression line)?
100%
Suppose two different explanatory variables have a linear relationship with a response variable, 𝑦. The regression line using Variable #1 has an 𝑟² of 47% and the regression line using Variable #2 has an 𝑟² of 89%. Which explanatory variable explains the most variation in 𝑦?
2
What is a Statistical Model?
An equation that fits the pattern between a response variable and possible explanatory variables, accounting for deviations from the model. Simplest case: one quantitative response variable and one quantitative explanatory variable.
True or False: If 𝑟² is really close to 100%, then there is a lot of unexplained variation.
False
True or false: Multiplying the 𝑧‑score for 𝑋 by the 𝑧‑score for 𝑌 always gives positive products.
False
If there is no relationship between 𝑋 and 𝑌 and 𝑟²=0, what shape should we expect the data points in the scatterplot to resemble?
a hamburger parallel to the 𝑥-axis
What does correlation give us?
a measure of direction and strength of the linear relationship between 𝑥 and 𝑦
How can we compare the strength of linear relationships more precisely than using words such as weak and strong?
use a numerical measure, the correlation 𝑟, to quantify the strength
What does regression give us?
a model of the relationship between 𝑥 and 𝑦
Why is plotting the data so important before computing 𝑟?
To check whether the relationship is linear or non‑linear.
True or false: Because the value for 𝑟 is negative, we can say that the direction of the relationship is negative
True
True or false: The closer the data points are to the line, the closer 𝑟 is to either -1.0 or +1.0.
True
strength of relationship on scatterplot
determined by how closely the points follow a clear form; strong: points lie close to the form; weak: points are scattered far from it
What does each dot in the scatterplot represent?
each (𝑋,𝑌) pair
Which variable may explain changes in the outcome?
explanatory variable
True or False: If the prediction errors (i.e., residuals) are large, then 𝑟² is close to 100%.
false
True or false: 𝑟 is resistant to outliers.
false
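A small sketch of why 𝑟 is not resistant: adding a single outlying point to an almost perfectly linear data set pulls the correlation down sharply. The data below are invented for illustration.

```python
import numpy as np

# Hypothetical data lying very close to a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1])
r_clean = np.corrcoef(x, y)[0, 1]

# Add one outlier far below the pattern of the other points.
x_out = np.append(x, 9.0)
y_out = np.append(y, 0.0)
r_with_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(r_clean, r_with_outlier)  # the second value is much smaller
```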
True or False: If correlation (𝑟) is negative, then slope could be positive or negative—we cannot predict which.
false; If correlation, 𝑟, is negative, then slope will always be negative.
True or False: Knowing only the value of slope, you can determine the value of correlation.
false; The sign of the slope tells us only the sign of 𝑟. The size of the slope depends on the units of 𝑥 and 𝑦, so it tells us nothing about the magnitude of 𝑟.
True or False: 𝑋 and 𝑌 can be interchanged in both correlation and linear regression.
false; 𝑋 and 𝑌 can be interchanged in correlation, but not in the formula for a regression line 𝑦̂=𝑎+𝑏𝑥.
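To illustrate, here is a sketch that fits the least-squares line both ways on invented data. The helper `least_squares_line` is a hypothetical name, written from the standard least-squares formulas rather than taken from the course.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

# Hypothetical data with a strong but imperfect linear pattern.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 3.9, 4.2, 7.8, 8.1, 11.0])

a_yx, b_yx = least_squares_line(x, y)  # predict y from x
a_xy, b_xy = least_squares_line(y, x)  # predict x from y

r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

print(r_xy, r_yx)          # identical: correlation ignores which variable is X
print(b_xy, 1 / b_yx)      # different: the second line is not the first one inverted
```

The two slopes multiply to 𝑟², so the lines coincide only when the fit is perfect.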
True or False: 𝑟² measures the fraction of 𝑦 values that are exactly predicted by the 𝑥 values.
false; 𝑟² is a measure of the fraction of variation in the 𝑦's that is explained by 𝑥. It does not tell us the fraction of 𝑦 values that are exactly predicted, as most are not, even when 𝑟² is close to 100%.
Which variable is the outcome variable?
response variable
Why is it hard to distinguish between the explanatory and response variables?
sometimes they cannot be designated
What does r give?
strength (and, through its sign, direction) of the linear relationship
What does the symbol 𝑦̂ represent?
the predicted 𝑦 value
What does total variation in the 𝑦's measure?
the variability of the 𝑦's about their mean 𝑦̄
purpose for r
to measure the size of the joint variation in x and y for each point (each product gives the area of a rectangle)
Why do we compute deviations for x and y?
to measure the variation in x's and y's
True or False: Both correlation and linear regression require a straight line relationship between 𝑋 and 𝑌.
true
True or False: If correlation (𝑟) is positive, then slope is always positive.
true
True or False: If correlation (𝑟) is zero, then slope is always zero.
true
True or False: If 𝑟² is really close to 100%, then the sum of squared residuals is very small.
true
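A sketch of the relationship behind this card, assuming the usual decomposition 𝑟² = 1 − (sum of squared residuals) / (total variation in 𝑦). The data are invented and the variable names are illustrative.

```python
import numpy as np

# Hypothetical data close to a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.3, 4.1, 5.8, 8.4, 9.9, 12.1, 13.8])

# np.polyfit returns coefficients from highest degree down: slope, then intercept.
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

sse = np.sum((y - y_hat) ** 2)      # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)   # total variation of y about its mean
r_squared = 1 - sse / sst

r = np.corrcoef(x, y)[0, 1]
print(r_squared, r ** 2)  # the two values agree
```

When the residuals are small relative to the total variation, the ratio is near zero and 𝑟² is near 100%.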
True or False: Slope and correlation (𝑟) always have the same sign.
true
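One way to see why the signs must match is the standard identity slope = 𝑟 · (s_y / s_x), where s_x and s_y are the sample standard deviations (always positive). The sketch below checks that identity on invented, downward-trending data.

```python
import numpy as np

# Hypothetical data with a negative linear trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([11.8, 10.1, 8.3, 5.9, 4.2, 1.7])

r = np.corrcoef(x, y)[0, 1]
s_x = x.std(ddof=1)  # sample standard deviation of x
s_y = y.std(ddof=1)  # sample standard deviation of y

b_from_r = r * s_y / s_x          # slope computed from r
b_fit = np.polyfit(x, y, 1)[0]    # slope from a least-squares fit

print(b_from_r, b_fit)  # the two slopes agree, and both share the sign of r
```

The same identity shows that 𝑟 = 0 forces a slope of zero, and that the slope alone cannot reveal the size of 𝑟 without the two standard deviations.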
True or False: The regression line always passes through the point (𝑥̄,𝑦̄).
true
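A quick numerical check of this property on invented data: plugging the mean of 𝑥 into the fitted line returns the mean of 𝑦.

```python
import numpy as np

# Hypothetical data.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 12.0])
y = np.array([3.1, 4.2, 6.8, 7.9, 11.5, 13.0])

# Least-squares fit: slope b, then intercept a.
b, a = np.polyfit(x, y, 1)

# Predicted y at the mean of x.
y_hat_at_mean = a + b * x.mean()

print(y_hat_at_mean, y.mean())  # the two values agree
```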
True or False: 𝑟² is a measure of how successfully the regression line explains the variation in 𝑦.
true
True or false: A value of 𝑟=-1.5 has to be an error.
true
True or false: The formula for computing 𝑟 includes 𝑧‑scores for both 𝑋 and 𝑌.
true
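A sketch of the 𝑧-score form of the formula, assuming the common textbook version in which the products of 𝑧-scores are summed and divided by n − 1 (the course's exact notation may differ). The data are invented.

```python
import numpy as np

# Hypothetical data with a positive but imperfect linear trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, 4.0, 3.5, 7.0, 6.0])

# Standardize each variable using the sample standard deviation.
z_x = (x - x.mean()) / x.std(ddof=1)
z_y = (y - y.mean()) / y.std(ddof=1)

# One product per (x, y) pair; its sign shows which quadrant the point falls in.
products = z_x * z_y
r_from_z = products.sum() / (len(x) - 1)

r_numpy = np.corrcoef(x, y)[0, 1]
print(products)           # some products are negative even though r is positive
print(r_from_z, r_numpy)  # the two values agree
```

Individual products can be negative even when 𝑟 is positive; 𝑟 reflects their overall balance.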
True or false: The sign on 𝑟 always denotes the direction of the relationship.
true
True or false: Using color and symbols, we can clearly see the three linear relationships for types of hotdogs displayed in the scatterplot.
true
what is bivariate data?
two measurements (two variables) on each individual in a study - study relationship between variables
What type of data is graphed with a scatterplot?
two quantitative variables measured on each individual