STAT 2300
For a relationship that is perfectly strong and positive, will the correlation coefficient be closer to 0 or 1?
1
Simpson's paradox
A condition where the percentages reverse when a third (lurking) variable is ignored; in other words, a condition leading to misinterpretation of the direction of association between two variables caused by ignoring a third variable that is associated with both of the reported variables.
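The reversal described above can be seen in a small sketch. The counts below are hypothetical, invented only for illustration, and the names `counts`, `within`, and `overall` are not from the course material. Treatment A has the higher success rate within each severity group, yet treatment B looks better once severity (the lurking variable) is ignored.

```python
# Hypothetical (successes, trials) counts, invented purely for illustration.
counts = {
    "A": {"mild": (45, 50), "severe": (150, 250)},
    "B": {"mild": (200, 250), "severe": (25, 50)},
}

def overall_rate(groups):
    """Success rate after pooling all severity groups together."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    return total_successes / total_trials

# Success rate within each severity group (third variable taken into account).
within = {
    treatment: {group: s / n for group, (s, n) in groups.items()}
    for treatment, groups in counts.items()
}

# Success rate with the severity groups pooled (third variable ignored).
overall = {treatment: overall_rate(groups) for treatment, groups in counts.items()}

for treatment in counts:
    print(treatment, within[treatment], f"overall = {overall[treatment]:.2f}")
```

In this made-up table, A treats mostly severe cases and B mostly mild ones, which is why pooling the groups flips the comparison.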
sampling distribution
A distribution of a statistic; a list of all the possible values of a statistic together with the frequency (or probability) of each value.
slope
A measure of the average change in the response variable for every one unit increase in the explanatory or independent variable.
Statistic
A number that can be computed from the sample data without making use of any unknown parameters.
Parameter
A number that describes the population.
Response variable
A variable that measures an outcome of a study (the "after").
Explanatory variable
A variable thought to explain or even cause changes in another variable (the "before").
Random variable
A variable whose value is a numerical outcome of a random phenomenon.
Influential observation
An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation.
influential observation
An observation that substantially alters the values of slope and y-intercept in the regression equation when it is included in the computations.
Law of large numbers
States that the actually observed mean outcome must approach the mean μ of the population as the number of observations increases.
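A minimal simulation sketch of this idea, assuming a fair six-sided die as the random phenomenon (so μ = 3.5). The function name `mean_of_die_rolls` and the seed are illustrative choices, not part of the course material.

```python
import random

POPULATION_MEAN = 3.5  # (1 + 2 + 3 + 4 + 5 + 6) / 6 for a fair die

def mean_of_die_rolls(n_rolls, seed=2300):
    """Observed mean of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

# The observed mean tends to settle near mu as the number of rolls grows.
for n in (10, 1_000, 100_000):
    observed = mean_of_die_rolls(n)
    print(f"n = {n:>7}: observed mean = {observed:.4f}, "
          f"distance from mu = {abs(observed - POPULATION_MEAN):.4f}")
```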
What does the correlation coefficient measure?
The strength and direction of the linear relationship between two quantitative variables
least squares regression line
The line with the smallest sum of squared residuals
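As a rough sketch, the least squares line can be computed from the usual formulas (slope = Sxy / Sxx, intercept = ȳ minus slope times x̄) and compared with a few nearby lines. The data and function names below are made up for illustration.

```python
def fit_least_squares(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)

# Any other line has a larger sum of squared residuals.
nudged = [
    sum_squared_residuals(xs, ys, slope + 0.1, intercept),
    sum_squared_residuals(xs, ys, slope, intercept + 0.1),
    sum_squared_residuals(xs, ys, slope - 0.1, intercept + 0.3),
]
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, SSE = {best:.2f}")
```

For this data the fitted line is approximately ŷ = 2.2 + 0.6x, with a sum of squared residuals of about 2.4.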
Central Limit Theorem
The theorem stating that the sampling distribution of the sample mean, x̄, is approximately normal whenever the sample is large and random.
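A simulation sketch of this statement, assuming a strongly right-skewed population (exponential with mean 1 and standard deviation 1). The sample size, number of samples, seed, and function name are illustrative choices. Even though the population is far from normal, the sample means cluster around 1 with spread close to 1/√n, and roughly 68% of them fall within one standard deviation of their center, as a normal distribution would suggest.

```python
import random
import statistics

def simulate_sample_means(sample_size, n_samples, seed=2300):
    """Draw n_samples random samples from Exponential(1); return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

SAMPLE_SIZE = 50
means = simulate_sample_means(SAMPLE_SIZE, n_samples=5_000)

center = statistics.mean(means)    # should be near the population mean, 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(50)
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"center = {center:.3f}, spread = {spread:.3f}, "
      f"share within one SD = {within_one_sd:.3f}")
```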
residual
The observed y minus the predicted y; denoted y - ŷ
r²
The percentage of total variation in the response variable, Y, that is explained by the regression equation; in other words, the percentage of total variation in the response variable, Y, that is explained by the explanatory variable, X.
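A small sketch of this definition, using made-up data: r² is computed as 1 minus (unexplained variation / total variation) and compared with the square of the correlation coefficient. The variable names are illustrative.

```python
import math

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y

# Least squares line and its residuals (observed y minus predicted y).
slope = sxy / sxx
intercept = mean_y - slope * mean_x
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)      # variation left unexplained by the line

r_squared = 1 - sse / syy                 # share of variation explained
r = sxy / math.sqrt(sxx * syy)            # correlation coefficient

print(f"r^2 from residuals = {r_squared:.3f}, r squared directly = {r ** 2:.3f}")
```

Here both routes give about 0.6, so roughly 60% of the variation in y is explained by the regression on x.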
Probability
The proportion of times the event occurs in many repeated trials of a random phenomenon.
Extrapolation
The use of a regression line for prediction for values of the explanatory variable far outside the range of the data from which the line was calculated.
True or false: The correlation coefficient, r, does not change if the unit of measure for either X or Y is changed.
True
True or false: The correlation between x and y equals the correlation between y and x (i.e., changing the roles of x and y does not change r).
True
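Both properties (no change under a change of units, and no change when x and y swap roles) can be checked numerically. The sketch below uses hypothetical growing-time and yield values, and the helper name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: days from planting to harvest, and yield in kg per hectare.
days = [100, 105, 110, 120, 130]
yield_kg = [3200, 3350, 3300, 3600, 3500]
hours = [24 * d for d in days]          # same times, measured in hours

r_days = pearson_r(days, yield_kg)
r_hours = pearson_r(hours, yield_kg)    # changing units leaves r unchanged
r_swapped = pearson_r(yield_kg, days)   # swapping x and y leaves r unchanged

print(f"{r_days:.4f} {r_hours:.4f} {r_swapped:.4f}")
```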
Which one of the following is a correct statement about the correlation coefficient? a. The correlation between major (like mathematics, accounting, Spanish, etc.) and overall GPA is very high. b. In professional baseball, the correlation between players' batting average and their salary is positive. c. The correlation between the percentage of disposable income required to meet consumer loan payments and the percentage of disposable income required to pay mortgage payments for selected years is 11.8%. (Note: the correlation coefficient has no unit of measure, so reporting it as "11.8%" is incorrect; correlation should be a number between -1 and +1.) d. The correlation between the time from planting to harvesting of a grain called paddy and the yield of paddy in kilograms per hectare is r = 0.27. If we measure time in hours instead of days, then the correlation will be r = (24)(0.27).
b
form of a scatterplot
linear or curved or none
direction of a scatterplot
positive or negative
Which one of the following best describes the computation of the correlation coefficient?
r equals the average of the products of the z-scores for x and y.
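A short sketch of this computation on made-up data. One assumption to flag: the "average" here divides by n - 1 rather than n, matching z-scores built from the sample standard deviation. The function name is illustrative.

```python
import math
import statistics

def correlation_from_z_scores(xs, ys):
    """r as the (n - 1)-divisor average of the products of z-scores."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    z_products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation_from_z_scores(xs, ys)
print(f"r = {r:.4f}")
```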
strength of a scatterplot
strong or weak
True or false: The correlation coefficient computed on bivariate quantitative data is misleading when the relationship between the two variables is non-linear.
True
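A quick illustration with made-up data: y is completely determined by x through y = x², yet the correlation coefficient is 0 because the relationship is curved rather than linear. The helper name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfect but non-linear (U-shaped) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}")  # r is 0 even though y depends entirely on x
```

A scatterplot of these points would show the pattern immediately, which is why plotting the data before relying on r is a sensible habit.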