Stats Exam 5
The accompanying table shows the ages (in years) of 11 children and the numbers of words in their vocabulary. Complete parts (a) through (d) below. (a) Display the data in a scatter plot. (b) Calculate the sample correlation coefficient r. (c) Describe the type of correlation, if any, and interpret the correlation in the context of the data. (d) Use the table of critical values for the Pearson correlation coefficient to make a conclusion about the correlation coefficient. Let α = 0.01.
(a) Choose the scatter plot that matches the data points. (b) r = 0.979 (enter the data into lists and use LinReg). (c) There is a strong positive linear correlation: as age increases, the number of words in a child's vocabulary tends to increase. (d) The critical value is 0.735 (n = 11, α = 0.01). Because |r| = 0.979 > 0.735, there is sufficient evidence at the 1% level of significance to conclude that there is a significant linear correlation between children's ages and the number of words in their vocabulary.
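The calculator's LinReg output gives r directly. As a cross-check, the sketch below computes r from the definition r = Sxy / √(Sxx·Syy) and applies the table rule from part (d). It uses plain Python with no libraries; the function names are illustrative, and the five-point data set is hypothetical because the children's table is not reproduced in these notes. Only r = 0.979 and the critical value 0.735 come from the answer above.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, critical_value):
    """Table method: the correlation is significant when |r| exceeds the critical value."""
    return abs(r) > critical_value

# Hypothetical data, only to exercise the formula.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
# Values from part (d): r = 0.979, critical value 0.735 (n = 11, alpha = 0.01).
print(is_significant(0.979, 0.735))  # True
```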
The table shows the number of goals allowed and the total points earned (2 points for a win and 1 point for an overtime or shootout loss) by 14 ice hockey teams over the course of a season. The equation of the regression line is y = −0.616x + 230.239. Use the data to answer the following questions. (a) Find the coefficient of determination, r², and interpret the result. (b) Find the standard error of the estimate, s_e, and interpret the result.
(a) r² = 0.755. About 75.5% of the variation in points earned can be explained by the relationship between the number of goals allowed and points earned. The remaining 24.5% of the variation is unexplained. (b) s_e = 9.609. The standard error of the estimate of the points earned for a specific number of goals allowed is about 9.609 points.
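A minimal Python sketch of how r² and s_e could be computed from raw data, assuming the least-squares line in the calculator's y = ax + b form. The data are hypothetical (the hockey table is not reproduced here) and the function names are illustrative. s_e divides the sum of squared residuals by n − 2; this is the value the TI-84 reports as s in LinRegTTest output.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y-hat = a*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_bar - a * x_bar

def r_squared_and_se(xs, ys):
    """Return (r^2, s_e) for the least-squares line through the data."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    unexplained = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    total = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1 - unexplained / total                  # equals explained / total for a least-squares line
    se = math.sqrt(unexplained / (len(xs) - 2))   # n - 2 degrees of freedom
    return r2, se

r2, se = r_squared_and_se([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3), round(se, 3))  # 0.6 0.894
```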
A survey was conducted two years ago asking college students their top motivations for using a credit card. To determine whether this distribution has changed, you randomly select 425 college students and ask each one what the top motivation is for using a credit card. Can you conclude that there has been a change in the claimed or expected distribution? Use α = 0.05. Complete parts (a) through (d). (a) State H0 and Ha and identify the claim. Which hypothesis is the claim? (b) Determine the critical value, χ²_0, and the rejection region. (c) Calculate the test statistic. (d) Decide whether to reject the null hypothesis, and interpret the decision.
(a) H0: The distribution of motivations is 29% rewards, 23% low rate, 21% cash back, 7% discounts, and 20% other. Ha: The distribution of motivations differs from the claimed or expected distribution. Ha is the claim. (b) With d.f. = k − 1 = 4 and α = 0.05, the chi-square table gives χ²_0 = 9.488; the rejection region is χ² > 9.488. (c) Use χ² = Σ (O − E)²/E, where E = np for each category. (d) Reject H0. At the 5% significance level, there is enough evidence to conclude that the distribution of motivations differs from the claimed or expected distribution.
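The goodness-of-fit statistic is χ² = Σ (O − E)²/E with E = np for each category. The sketch below computes the expected counts for the claimed distribution with n = 425 and implements the statistic and the right-tailed decision rule. The survey's observed counts are not included in these notes, so no test statistic is computed for this problem; the function names are illustrative.

```python
def expected_counts(n, proportions):
    """Expected frequency for each category: E_i = n * p_i."""
    return [n * p for p in proportions]

def chi_square_gof(observed, proportions):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    expected = expected_counts(sum(observed), proportions)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def reject_h0(statistic, critical_value):
    """Right-tailed test: reject H0 when the statistic exceeds the critical value."""
    return statistic > critical_value

# Claimed distribution: rewards, low rate, cash back, discounts, other.
claimed = [0.29, 0.23, 0.21, 0.07, 0.20]
print([round(e, 2) for e in expected_counts(425, claimed)])  # [123.25, 97.75, 89.25, 29.75, 85.0]
# Part (b): with observed counts in hand, call reject_h0(chi_square_gof(obs, claimed), 9.488).
```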
(a) The coefficient of determination r² is the ratio of which two types of variations? (b) What does r² measure? (c) What does 1 − r² measure?
(a) The coefficient of determination is the ratio of the explained variation to the total variation. (b) r² is the proportion (percent) of the variation in y that is explained by the relationship between x and y. (c) The value 1 − r² is the proportion (percent) of the variation that is unexplained.
The table shows the average weekly wages (in dollars) for state government employees and federal government employees for 8 years. The equation of the regression line is y = 1.531x − 116.332. Complete parts (a) and (b) below. (a) Find the coefficient of determination and interpret the result. How can the coefficient of determination be interpreted? (b) Find the standard error of estimate s_e and interpret the result. How can the standard error of estimate be interpreted?
(a) r² = 0.953 (use LinRegTTest). The coefficient of determination, r², is the fraction of the variation in average weekly wages for federal government employees that can be explained by the variation in average weekly wages for state government employees: about 95.3%. The remaining fraction of the variation, 1 − r² (about 4.7%), is unexplained and is due to other factors or to sampling error. (b) s_e = 25.29. The standard error of estimate of the average weekly wage for federal government employees for a specific average weekly wage for state government employees is about $25.29.
Find the expected frequency, E_i, for the given values of n and p_i: n = 150, p_i = 0.6.
E_i = n · p_i = 150 × 0.6 = 90
What is a residual? Explain when a residual is positive, negative, and zero.
A residual is the difference between the observed y-value of a data point and the predicted y-value on a regression line for the x-coordinate of the data point. A residual is positive when the point is above the line, negative when it is below the line, and zero when the observed y-value equals the predicted y-value.
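A small Python sketch of the residual definition and its sign, assuming a hypothetical regression line ŷ = 2x + 1 chosen so the arithmetic is exact; the function names are illustrative.

```python
def residual(y_observed, x, a, b):
    """Residual = observed y minus the predicted y-hat = a*x + b."""
    return y_observed - (a * x + b)

def position(res):
    """Positive residual: point above the line; negative: below; zero: on it."""
    if res > 0:
        return "above the line"
    if res < 0:
        return "below the line"
    return "on the line"

# Hypothetical line y-hat = 2x + 1 and three hypothetical points (x, y).
for x, y in [(1, 4), (2, 3), (3, 7)]:
    res = residual(y, x, a=2, b=1)
    print(x, y, res, position(res))
# 1 4 1 above the line
# 2 3 -2 below the line
# 3 7 0 on the line
```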
Explain how to determine whether a sample correlation coefficient indicates that the population correlation coefficient is significant.
A table can be used to compare the absolute value of r with a critical value, or a hypothesis test can be performed using a t-test.
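The t-test route can be sketched in Python as follows, reusing r = 0.979 and n = 11 from the vocabulary problem at the top of these notes. The statistic is t = r / √((1 − r²)/(n − 2)) with d.f. = n − 2; the critical value ±3.250 (two-tailed, α = 0.01, d.f. = 9) is taken from a standard t-table, and the function name is illustrative.

```python
import math

def correlation_t(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t = correlation_t(0.979, 11)
print(round(t, 2))     # 14.41
print(abs(t) > 3.250)  # True: reject H0, so the correlation is significant
```

Both methods lead to the same conclusion as the table comparison in the vocabulary problem.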
Describe the hypotheses for a two-way ANOVA test.
A two-way ANOVA test has three null hypotheses, one for each main effect and one for the interaction effect.
Use the value of the linear correlation coefficient to calculate the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation? r=-0.076
r² = (−0.076)² ≈ 0.006. About 0.6% of the variation can be explained by the regression line. About 99.4% (100% − 0.6%) of the variation is unexplained and is due to other factors or to sampling error.
Use the value of the linear correlation coefficient to calculate the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation? r=-0.243
r² = (−0.243)² ≈ 0.059. About 5.9% of the variation can be explained by the regression line. About 94.1% (100% − 5.9%) of the variation is unexplained and is due to other factors or to sampling error.
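Both answers follow the same two steps: square r to get r², then subtract from 1 for the unexplained part. A minimal Python sketch (the function name is illustrative):

```python
def variation_split(r):
    """Return (explained %, unexplained %) about the regression line implied by r."""
    r2 = r ** 2  # coefficient of determination
    return round(100 * r2, 1), round(100 * (1 - r2), 1)

for r in (-0.076, -0.243):
    print(r, variation_split(r))
# -0.076 (0.6, 99.4)
# -0.243 (5.9, 94.1)
```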
List five properties of the F-distribution.
(1) The F-distribution is a family of curves, each of which is determined by two types of degrees of freedom: the degrees of freedom corresponding to the variance in the numerator, denoted d.f.N, and the degrees of freedom corresponding to the variance in the denominator, denoted d.f.D. (2) The F-distribution is positively skewed and therefore not symmetric. (3) The total area under each F-distribution curve is equal to 1. (4) All values of F are greater than or equal to 0. (5) For all F-distributions, the mean value of F is approximately equal to 1.
Value of home and life span are two variables that have been shown to have positive correlation but no cause-and-effect relationship. Describe at least one possible reason for the correlation.
(1) Exercise tends to increase life spans; people who live within walking distance of amenities tend to walk more than those who do not; and homes within walking distance of amenities tend to be more valuable than homes that are not. (2) Greater wealth allows people to afford more valuable homes and to spend more money on health care, and greater health care spending generally enables people to live longer.
Explain how to find the expected frequency for a cell in a contingency table.
Find the sum of the row and the sum of the column in which the cell is located. Find the product of these sums. Divide the product by the sample size.
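The rule above, written as a short Python sketch. The 2 × 3 table of counts is hypothetical and only exercises the arithmetic; the function name is illustrative.

```python
def expected_table(observed):
    """E[r][c] = (sum of row r) * (sum of column c) / sample size."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)  # sample size
    return [[rs * cs / n for cs in col_sums] for rs in row_sums]

# Hypothetical 2 x 3 table of observed counts.
observed = [[20, 30, 50],
            [30, 20, 50]]
print(expected_table(observed))  # [[25.0, 25.0, 50.0], [25.0, 25.0, 50.0]]
```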
Explain how to determine the values of d.f.N and d.f.D when performing a two-sample F-test.
d.f.N represents the degrees of freedom of the numerator, and d.f.D represents the degrees of freedom of the denominator. d.f.N = n_1 − 1 and d.f.D = n_2 − 1, where n_1 and n_2 are the sizes of the samples whose variances appear in the numerator and denominator, respectively.
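A sketch of the F-test setup in Python, assuming the common textbook convention that the sample with the larger variance goes in the numerator (so F ≥ 1). statistics.variance is the standard-library sample variance (divisor n − 1); the sample values and the function name are hypothetical.

```python
import statistics

def f_test_setup(sample_1, sample_2):
    """Return (F, d.f.N, d.f.D) with the larger sample variance in the numerator."""
    var_1 = statistics.variance(sample_1)  # sample variance, divisor n - 1
    var_2 = statistics.variance(sample_2)
    if var_1 < var_2:
        # Swap so that s1^2 >= s2^2.
        sample_1, sample_2 = sample_2, sample_1
        var_1, var_2 = var_2, var_1
    return var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1

F, df_n, df_d = f_test_setup([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(round(F, 3), df_n, df_d)  # 2.667 3 4
```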
State the null and alternative hypotheses for a one-way ANOVA test.
H0: All population means are equal. Ha: At least one population mean is different from the others.
Explain how the chi-square independence test and the chi-square goodness-of-fit test are similar. How are they different? For each characteristic, determine if it applies to the chi-square independence test, the chi-square goodness-of-fit test, or both.
Has d.f. = (r − 1)(c − 1): chi-square independence test
Requires that the data be obtained from a random sample: both
Has d.f. = k − 1: chi-square goodness-of-fit test
Uses E_i = n · p_i to find the expected frequency: chi-square goodness-of-fit test
Uses E_r,c = (Sum of row r) · (Sum of column c) / (Sample size) to find the expected frequency: chi-square independence test
Requires that each expected frequency is at least 5: both
Used to test whether two variables are independent: chi-square independence test
Tests whether a frequency distribution fits an expected distribution: chi-square goodness-of-fit test
Tests a claim about data that are in categories: both
Find the equation of the regression line for the given data. Then construct a scatter plot of the data and draw the regression line. (The pair of variables has a significant correlation.) Then use the regression equation to predict the value of y for each of the given x-values, if meaningful. The table below shows the heights (in feet) and the number of stories of six notable buildings in a city.
Enter the data into lists on the calculator and run LinReg(ax+b); substitute the resulting a and b into y = ax + b to get the regression equation. To draw the scatter plot, press ZOOM and choose ZoomStat. (a) x = 498: not meaningful. (b) through (d): substitute each x-value (648, 345, 732) into ŷ = ax + b. A prediction is not meaningful if the x-value is outside the range of the data.
The accompanying data are the number of wins and the earned run averages (mean number of earned runs allowed per nine innings pitched) for eight baseball pitchers in a recent season. Find the equation of the regression line. Then construct a scatter plot of the data and draw the regression line. Then use the regression equation to predict the value of y for each of the given x-values, if meaningful. If the x-value is not meaningful to predict the value of y, explain why not. (a) x=5 wins (b) x=10 wins (c) x=21 wins (d) x=15 wins
Enter the data into lists on the calculator and run LinReg(ax+b); substitute the resulting a and b into y = ax + b to get the regression equation. To draw the scatter plot, press ZOOM and choose ZoomStat. For (a) x = 5 wins, (b) x = 10 wins, (c) x = 21 wins, and (d) x = 15 wins, substitute each x-value into ŷ = ax + b. A prediction is not meaningful if the x-value is outside the range of the data, because the linear relationship has not been observed there.
Given a set of data and a corresponding regression line, describe all values of x that provide meaningful predictions for y.
Prediction values are meaningful only for x-values in (or close to) the range of the original data.
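The calculator steps above (LinReg(ax+b), then substitute x) and the range rule can be sketched in Python. This version treats only x-values inside the observed range as meaningful, which is a stricter reading of "in (or close to)"; the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line in the calculator's LinReg(ax+b) form: returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_bar - a * x_bar

def predict(xs, a, b, x):
    """Return y-hat = a*x + b, or None if x lies outside the observed x-range."""
    if not min(xs) <= x <= max(xs):
        return None  # extrapolation: treated as not meaningful
    return a * x + b

# Hypothetical data lying exactly on y = 2x + 1.
xs, ys = [1, 2, 3, 4, 5], [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(a, b)                    # 2.0 1.0
print(predict(xs, a, b, 2.5))  # 6.0
print(predict(xs, a, b, 9))    # None
```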
Explain how to find the critical value for an F-test.
Specify the level of significance, α. Determine the degrees of freedom for the numerator, d.f.N, and denominator, d.f.D. Find the critical value of F using technology or the F-distribution table.
Explain how to predict y-values using the equation of a regression line.
Substitute a value of x into the equation of the regression line and calculate ŷ, the predicted value of y.
Describe the difference between the variance between samples MSB and the variance within samples MSW.
The MSB measures the differences related to the treatment given to each sample. The MSW measures the differences related to entries within the same sample.
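A minimal Python sketch of how MS_B and MS_W are computed in a one-way ANOVA, with F = MS_B / MS_W. The three groups are hypothetical, chosen so the arithmetic is exact; the function name is illustrative.

```python
def one_way_anova(groups):
    """Return (MS_B, MS_W, F) for a one-way ANOVA."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]
    # Between samples: spread of the sample means around the grand mean (treatment effect).
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within samples: spread of entries around their own sample mean.
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_b = ss_b / (k - 1)        # d.f.N = k - 1
    ms_w = ss_w / (total_n - k)  # d.f.D = N - k
    return ms_b, ms_w, ms_b / ms_w

print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 1.0, 27.0)
```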
In order to predict y-values using the equation of a regression line, what must be true about the correlation coefficient of the variables?
The correlation between variables must be significant.
Two variables have a positive linear correlation. Does the dependent variable increase or decrease as the independent variable increases?
The dependent variable increases.
Describe the explained variation about a regression line in words and in symbols.
The explained variation is the sum of the squares of the differences between the predicted y-values and the mean of the y-values of the ordered pairs. In symbols: Σ(ŷ_i − ȳ)².
Identify the explanatory variable and the response variable. A teacher wants to determine if the teaching method used by her students can be used to predict the students' test scores.
The explanatory variable is the teaching method. The response variable is the students' test scores.
What does it mean to say "correlation does not imply causation"?
The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables.
The logarithmic equation is a nonlinear regression equation of the form y = a + b ln x. The accompanying data are the shoe sizes and heights (in inches) of 14 men. Graphs of the regression line and the logarithmic equation are also provided. Which equation is a better model for the data? Explain.
The logarithmic equation is a better model for the data because the graph of the logarithmic equation fits the data better than the regression line.
The ages (in years) of 10 men and their systolic blood pressures (in millimeters of mercury) are shown in the attached data table with a sample correlation coefficient r of 0.897. Remove the data entry for the man who is 51 years old and has a systolic blood pressure of 201 millimeters of mercury from the data set and find the new correlation coefficient. Describe how this affects the correlation coefficient r. Use technology.
The correlation gets stronger: r increases from 0.897 to 0.961.
What conditions are necessary to use the chi-square goodness-of-fit test?
The observed frequencies must be obtained randomly and each expected frequency must be greater than or equal to 5.
A linear equation is an equation of the form y = ax + b, and a power equation is an equation of the form y = ax^b. The linear equation and power equation for the accompanying data are provided below. Determine which equation is a better model for the data. Explain your reasoning.
The power equation is a better model for the data because the graph of the power equation fits the data better than the graph of the linear equation.
Describe the range of values for the correlation coefficient.
The range of values for the correlation coefficient is −1 to 1, inclusive.
List the three conditions that must be met in order to use a two-sample F-test.
The samples must be randomly selected, independent, and each population must have a normal distribution.
Two variables have a positive linear correlation. Is the slope of the regression line for the variables positive or negative?
The slope is positive. As the independent variable increases, the dependent variable also tends to increase.
Describe the total variation about a regression line in words and symbols.
The total variation is the sum of the squares of the differences between the y-values of each ordered pair and the mean of the y-values of the ordered pairs, or Σ(y_i − ȳ)².
Describe the unexplained variation about a regression line in words and in symbols.
The unexplained variation is the sum of the squares of the differences between the observed y-values and the predicted y-values. In symbols: Σ(y_i − ŷ_i)².
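A short Python sketch checking the three variation definitions above on hypothetical data: for a least-squares line, total variation = explained variation + unexplained variation, and r² = explained / total. The function name and data are illustrative.

```python
def variations(xs, ys):
    """Return (total, explained, unexplained) variation about the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    a = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - a * x_bar
    y_hat = [a * x + b for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)                    # sum (y_i - y_bar)^2
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # sum (y_hat_i - y_bar)^2
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # sum (y_i - y_hat_i)^2
    return total, explained, unexplained

total, explained, unexplained = variations([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(total, 3), round(explained, 3), round(unexplained, 3))  # 6.0 3.6 2.4
print(round(explained / total, 3))  # r^2 = 0.6
```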
What conditions are necessary in order to use a one-way ANOVA test?
There must be at least 3 samples; the samples must be independent of each other; each population must have the same variance; and the samples must be randomly selected from a normal, or approximately normal, population.
Determine whether the statement is true or false. If it is false, rewrite it as a true statement. If the test statistic for the chi-square independence test is large, you will, in most cases, reject the null hypothesis.
True
Explain what it means for two variables to have a bivariate normal distribution.
Two variables have a bivariate normal distribution when for any fixed values of x the corresponding values of y are normally distributed, and for any fixed values of y the corresponding values of x are normally distributed.
Give examples of two variables that have a perfect positive linear correlation and two variables that have a perfect negative linear correlation.
Two variables that have perfect positive linear correlation are the price per gallon of gasoline and the total cost of gasoline. Two variables that have perfect negative linear correlation are the distance from a door and the height of a wheelchair ramp.
What is the coefficient of determination for two variables that have perfect positive linear correlation or perfect negative linear correlation? Interpret your answer.
Two variables that have perfect positive or perfect negative linear correlation have a correlation coefficient of 1 or −1, respectively. In either case the coefficient of determination is 1, which means 100% of the variation in the response variable is explained by the variation in the explanatory variable.
Discuss the difference between r and ρ.
r represents the sample correlation coefficient. ρ represents the population correlation coefficient.
The table shows the amounts of crude oil (in thousands of barrels per day) produced by a certain country and the amounts of crude oil (in thousands of barrels per day) imported by the same country for seven years. The equation of the regression line is y = −1.284x + 16,762.57. Complete parts (a) and (b) below. (a) Find the coefficient of determination and interpret the result. How can the coefficient of determination be interpreted? (b) Find the standard error of estimate s_e and interpret the result. How can the standard error of estimate be interpreted?
(a) r² = 0.804. The coefficient of determination, r², is the fraction of the variation in the amount of imported crude oil that can be explained by the variation in the amount of produced crude oil: about 80.4%. The remaining fraction, 1 − r² (about 19.6%), is unexplained and is due to other factors or to sampling error. (b) s_e = 196.820 thousand barrels per day. The standard error of estimate of the amount of imported crude oil for a specific amount of produced crude oil is about 196.820 thousand barrels per day.
The table shows the numbers of new-vehicle sales (in thousands) in the United States for Company A and Company B for 10 years. The equation of the regression line is y = 0.981x + 1,241.77. Complete parts (a) and (b) below. (a) Find the coefficient of determination and interpret the result. How can the coefficient of determination be interpreted? (b) Find the standard error of estimate s_e and interpret the result. How can the standard error of estimate be interpreted?
(a) r² = 0.840. The coefficient of determination, r², is the fraction of the variation in new-vehicle sales for Company B that can be explained by the variation in new-vehicle sales for Company A: about 84.0%. The remaining fraction of the variation, 1 − r² (about 16.0%), is unexplained and is due to other factors or to sampling error.
Which value of r indicates a stronger correlation: r = 0.762 or r = −0.887? Explain your reasoning.
r = −0.887 represents a stronger correlation because |−0.887| > |0.762|; the strength of a linear correlation depends on the absolute value of r, not on its sign.
Which value of r indicates a stronger correlation: r=0.791 or r=−0.922? Explain your reasoning.
r = −0.922 represents a stronger correlation because |−0.922| > |0.791|.
The money raised and spent (both in millions of dollars) by all congressional campaigns for 8 recent 2-year periods are shown in the table. The equation of the regression line is y = 0.958x + 10.243. Find the standard error of estimate s_e and interpret the result.
s_e = 26.849 (use LinRegTTest and read s). The standard error of estimate of the money spent for a specific amount of money raised is about $26.849 million.
Find the equation of the regression line for the given data. Then construct a scatter plot of the data and draw the regression line. The table shows the shoe size and heights (in.) for 6 men.
Enter the data into lists, run LinReg(ax+b), and substitute the resulting a and b into ŷ = ax + b.
The number of initial public offerings of stock issued in a 10-year period and the total proceeds of these offerings (in millions) are shown in the table. The equation of the regression line is y = 48.317x + 18,431.41. Complete parts (a) and (b). (a) Find the coefficient of determination and interpret the result. How can the coefficient of determination be interpreted? (b) Find the standard error of estimate s_e and interpret the result. How can the standard error of estimate be interpreted?
(a) r² = 0.298 (read r² from LinReg). The coefficient of determination is the fraction of the variation in proceeds that can be explained by the variation in issues: about 29.8%. The remaining fraction of the variation is unexplained and is due to other factors or to sampling error. (b) s_e = 16,467.7 (use LinRegTTest and read s). The standard error of estimate of the proceeds for a specific number of issues is about $16,467.7 million.
The table shows the total square footage (in billions) of retailing space at shopping centers and their sales (in billions of dollars) for 10 years. The equation of the regression line is y = 514.678x − 1677.574. Complete parts (a) and (b). (a) Find the coefficient of determination and interpret the result. How can the coefficient of determination be interpreted? (b) Find the standard error of estimate s_e and interpret the result. How can the standard error of estimate be interpreted?
(a) r² = 0.972. The coefficient of determination is the fraction of the variation in sales that can be explained by the variation in total square footage: about 97.2%. The remaining fraction of the variation is unexplained and is due to other factors or to sampling error. (b) s_e = 38.724. The standard error of estimate of the sales for a specific total square footage is about $38.724 billion.