Statistics Chapter 4 Homework

4.2 If the linear correlation between two variables is negative, what can be said about the slope of the regression line? Choose the correct answer below. A. More information is needed B. Positive C. Negative

C. Negative Note: If the linear correlation between two variables is negative, the slope of the regression line will also be negative.
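The reason the signs must agree follows from the slope formula used in the 4.2 example on head circumference. A sample standard deviation is positive whenever the data are not all identical (assumed here), so the slope inherits the sign of r:

```latex
b_1 = r \cdot \frac{s_y}{s_x}, \qquad s_x > 0,\; s_y > 0
\quad\Longrightarrow\quad
\operatorname{sign}(b_1) = \operatorname{sign}(r)
```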

4.2 What is a residual? What does it mean when a residual is positive? Choose the correct answer below. A. A residual is the difference between an observed value of the response variable y and the value of the corresponding explanatory variable x. If it is positive, then the response variable is greater than the explanatory variable. B. A residual is the difference between an observed value of the response variable y and the predicted value of y. If it is positive, then the observed value is greater than the predicted value. C. A residual is the difference between an observed value of the response variable y and the average value of the response variable. If it is positive, then the response variable is greater than the mean. D. A residual is a data point that does not fit the pattern of the rest of the data. If it is positive, then the data point should still be included in the data set.

B. A residual is the difference between an observed value of the response variable y and the predicted value of y. If it is positive, then the observed value is greater than the predicted value. Note: A residual is the observed value minus the predicted value, $y - \hat{y}$, so a positive residual means the observed value is greater than the predicted value.

4.3 A scatter diagram is given with one of the points drawn in blue. In addition, two least-squares regression lines are drawn. The line drawn in red is the least-squares regression line with the point in blue excluded. The line drawn in blue is the least-squares regression line with the point in blue included. On the basis of these graphs, do you think the point in blue is influential? The scatter diagram has a horizontal axis labeled Explanatory with tick marks at 5 and 15 and a vertical axis labeled Response with tick marks at 10, 15, and 20. A series of plotted points closely follows a red line that rises from left to right through (5, 10) and (19, 20). A blue point is plotted at (14, 13). A blue line rises from left to right through (5, 10) and (19, 19). All coordinates are approximate. Does the point in blue seem to be influential? A. No, because the observation significantly affects the least-squares regression line's slope and/or y-intercept or the value of the correlation coefficient. B. No, because the observation does not significantly affect the least-squares regression line's slope and/or y-intercept or the value of the correlation coefficient. C. Yes, because the observation does not significantly affect the least-squares regression line's slope and/or y-intercept or the value of the correlation coefficient. D. Yes, because the observation significantly affects the least-squares regression line's slope and/or y-intercept or the value of the correlation coefficient.

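B. No, because the observation does not significantly affect the least-squares regression line's slope and/or y-intercept or the value of the correlation coefficient. Note: The red and blue lines are nearly the same, so including the blue point changes the fitted line very little.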

4.2 example A pediatrician wants to determine the relation that exists between a child's height, x, and head circumference, y. She randomly selects 11 children from her practice, measures their heights and head circumferences, and obtains the accompanying data. Complete parts (a) through (g) below.

(a) Find the least-squares regression line treating height as the explanatory variable and head circumference as the response variable. The equation of the least-squares regression line is $\hat{y} = b_1 x + b_0$, where $b_1 = r \cdot \frac{s_y}{s_x}$ is the slope of the least-squares regression line and $b_0 = \bar{y} - b_1 \bar{x}$ is its y-intercept. While either these formulas or technology can be used to find the least-squares regression line, for this exercise use technology, rounding the slope to three decimal places and the constant to one decimal place: $\hat{y} = 0.081x + 15.1$.
(b) Interpret the slope and y-intercept, if appropriate. First interpret the slope. The slope of a line is $\frac{\text{change in } y}{\text{change in } x}$, so the slope, 0.081, is the average change in head circumference for a 1-inch change in height: for every 1-inch increase in height, head circumference increases by 0.081 inch, on average. Next, interpret the y-intercept. The y-intercept of a line is found by letting x equal 0 and solving for y; here it is 15.1, the predicted head circumference when the height is 0. Since a child's height cannot be 0 inches, it does not make sense to interpret the y-intercept.
(c) Use the regression equation to predict the head circumference of a child who is 24 inches tall. Substitute 24 for x in the least-squares regression line and round to two decimal places: $\hat{y} = 0.081(24) + 15.1 = 17.044 \approx 17.04$. So the predicted head circumference of a child who is 24 inches tall is 17.04 inches.
(d) Compute the residual based on the observed head circumference of the 24-inch-tall child in the table. Is the head circumference of this child above or below the value predicted by the regression model?
The residual is the observed head circumference minus the predicted head circumference of a child who is 24 inches tall. The predicted value is 17.04 inches, as computed in part (c), and the observed value for this child is 17.4 inches, so residual = observed − predicted = 17.4 − 17.04 = 0.36. Because the residual is positive, the observed head circumference is above the value predicted by the regression model. (Recall that the predicted value estimates the average head circumference of children of this height.)
(e) Draw the least-squares regression line on the scatter diagram of the data and label the residual from part (d). Recall that the residual is the vertical distance from the data point to the regression line. Use technology to plot the scatter diagram, the least-squares regression line, and the residual. The scatter diagram has a horizontal axis from 24 to 29 in increments of 0.5 and a vertical axis from 16 to 18 in increments of 0.2. A line rises from left to right through (24, 17) and (29, 17.4). Plotted points are scattered around this line. One plotted point, at (24, 17.4), lies above the line; it is labeled "Residual," and a vertical segment connects it to the line.
(f) Notice that two children are 26 inches tall. One has a head circumference of 17.1 inches; the other has a head circumference of 17.3 inches. How can this be? Height is not the only factor that affects head circumference, so children of the same height can have different head circumferences; the regression line predicts only the average head circumference for a given height.
(g) Would it be reasonable to use the least-squares regression line to predict the head circumference of a child who was 32 inches tall? Why?
To use a least-squares regression line to make a prediction, the value of the explanatory variable must be within the scope of the model, that is, within the range of the observed data. The observed heights run from about 24 to 29 inches, so 32 inches is outside the scope of the model, and it would not be reasonable to use the least-squares regression line to predict the head circumference of a child who was 32 inches tall.
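As a quick check of the arithmetic in parts (c) and (d), here is a minimal Python sketch that hard-codes the rounded line $\hat{y} = 0.081x + 15.1$ from part (a). The function names `predict` and `residual` are illustrative only, and the prediction is rounded to two decimal places as in the text.

```python
# Rounded least-squares line from part (a): y-hat = 0.081x + 15.1
SLOPE = 0.081
INTERCEPT = 15.1


def predict(height_in: float) -> float:
    """Predicted head circumference (inches) for a child of the given height (inches)."""
    return SLOPE * height_in + INTERCEPT


def residual(observed: float, predicted: float) -> float:
    """Residual = observed - predicted; positive means the point lies above the line."""
    return observed - predicted


y_hat = round(predict(24), 2)            # part (c): 0.081(24) + 15.1, rounded to 17.04
res = round(residual(17.4, y_hat), 2)    # part (d): observed head circumference 17.4 in.
position = "above" if res > 0 else "below" if res < 0 else "on"

print(f"predicted = {y_hat}, residual = {res}, observation is {position} the line")
# predicted = 17.04, residual = 0.36, observation is above the line
```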

4.3 Fill in the blank. A _______ is a scatter diagram with the residuals on the vertical axis and the explanatory variable on the horizontal axis.

residual plot Note: A scatter diagram with the residuals on the vertical axis and the explanatory variable on the horizontal axis is called a residual plot.

4.3 Example The time it takes for a planet to complete its orbit around a particular star is called the planet's sidereal year. The sidereal year of a planet is related to the distance of the planet from the star. The accompanying data show the distances of the planets from a particular star and their sidereal years; a table of critical values of the correlation coefficient is also provided. Complete parts (a) through (e) below.

(a) Draw a scatter diagram of the data treating distance from the star as the explanatory variable. A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point; the explanatory variable is plotted on the horizontal axis and the response variable on the vertical axis. The scatter diagram has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 2000 and a vertical axis labeled Sidereal Year from 0 to 250 in increments of 25. The following 9 points are plotted: (50, 0); (50, 0); (100, 0); (150, 0); (500, 10); (900, 25); (1800, 80); (2800, 155); (3700, 240). The points follow a curve that rises from left to right at a slightly increasing rate. All coordinates are approximate.
(b) Determine the correlation between distance and sidereal year. Does this imply a linear relation between distance and sidereal year? The linear correlation coefficient measures the strength and direction of the linear relation between two quantitative variables:
$r = \dfrac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1}$
where $\bar{x}$ and $s_x$ are the sample mean and sample standard deviation of the explanatory variable, $\bar{y}$ and $s_y$ are the sample mean and sample standard deviation of the response variable, and n is the number of individuals in the sample. While either technology or the formula can be used to find the linear correlation coefficient, for this problem use technology, rounding to three decimal places. The correlation between distance and sidereal year is 0.987. The linear correlation coefficient is always between −1 and 1, inclusive.
If r = +1, a perfect positive linear relation exists between the two variables; if r = −1, a perfect negative linear relation exists. The closer r is to +1, the stronger the evidence of positive association between the two variables; the closer r is to −1, the stronger the evidence of negative association. If r is close to 0, little or no evidence exists of a linear relation between the two variables. If the absolute value of the correlation coefficient is greater than the critical value for the given sample size, then a linear relation exists between the two variables. From the table of critical values, the critical value for n = 9 is 0.666. Because the correlation, 0.987, is greater than the critical value, 0.666, it implies a linear relation between distance and sidereal year.
(c) Compute the least-squares regression line. The equation of the least-squares regression line is $\hat{y} = b_1 x + b_0$, where $b_1 = r \cdot \frac{s_y}{s_x}$ is the slope and $b_0 = \bar{y} - b_1 \bar{x}$ is the y-intercept; here $\bar{x}$ and $s_x$ are the sample mean and sample standard deviation of the explanatory variable x, and $\bar{y}$ and $s_y$ are the sample mean and sample standard deviation of the response variable y. While either technology or the formulas can be used to find the least-squares regression line, for this problem use technology, rounding the slope to three decimal places and the intercept to one decimal place: $\hat{y} = 0.0624x - 12.2$.
(d) Plot the residuals against the distance from the star. A residual plot is a scatter diagram with the residuals on the vertical axis and the explanatory variable on the horizontal axis. Recall that a residual is the difference between the observed value of y and the predicted value of y.
$\text{Residual} = \text{observed } y - \text{predicted } y = y - \hat{y}$
While the residual plot can be constructed either with technology or by first computing the residuals by hand, for this problem use technology. The residual plot has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Residual from −23 to 23 in increments of 5. The following 9 points are plotted: (50, 10); (50, 8.5); (100, 7.5); (150, 5); (500, −7); (900, −15.5); (1800, −20); (2800, −9.5); (3700, 21). The points follow the pattern of a parabola that opens upward. All coordinates are approximate.
(e) Do you think the least-squares regression line is a good model? A residual plot is used to judge whether a linear model is appropriate. If a plot of the residuals against the explanatory variable shows a discernible pattern, such as a curve, then the explanatory and response variables may not be linearly related. If the spread of the residuals increases or decreases as the explanatory variable increases, then the linear model's requirement of constant error variance is violated. Recall that outliers are extreme observations, that is, observations that do not fit the overall pattern of the data. Here the residuals follow a U-shaped pattern; their spread does not increase or decrease consistently as the explanatory variable increases; and no residual lies far from the rest, so there are no outliers. Because of the U-shaped pattern, the least-squares regression line is not a good model: the relation between distance and sidereal year is not linear, even though the correlation is high.
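The formulas in parts (b) through (d) can be sketched in Python. The planet data table is not reproduced in this section, so the sketch applies the formulas to a small made-up data set, y = x² for x = 0, …, 4, chosen because it is curved yet strongly correlated. The function names are hypothetical; `statistics.mean` and `statistics.stdev` (the sample standard deviation, dividing by n − 1) are from Python's standard library.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r = sum(z_x * z_y) / (n - 1), using sample means and sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # statistics.stdev divides by n - 1
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)


def least_squares(xs, ys):
    """Slope b1 = r * s_y / s_x and intercept b0 = y_bar - b1 * x_bar."""
    b1 = correlation(xs, ys) * stdev(ys) / stdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b1, b0


def residuals(xs, ys):
    """Residual = observed y - predicted y, for each observation."""
    b1, b0 = least_squares(xs, ys)
    return [y - (b1 * x + b0) for x, y in zip(xs, ys)]


def linear_relation_exists(r, critical_value):
    """Decision rule used in part (b): |r| must exceed the critical value for n."""
    return abs(r) > critical_value


# Part (b) with the values quoted in the text: r = 0.987, critical value 0.666 for n = 9.
print(linear_relation_exists(0.987, 0.666))  # True

# Made-up toy data (NOT the planet data): y = x**2 is curved but strongly correlated.
toy_x = [0, 1, 2, 3, 4]
toy_y = [x ** 2 for x in toy_x]
r_toy = correlation(toy_x, toy_y)
b1_toy, b0_toy = least_squares(toy_x, toy_y)
res_toy = residuals(toy_x, toy_y)
print(round(r_toy, 3), round(b1_toy, 3), round(b0_toy, 3))
print([round(e, 3) for e in res_toy])
```

On the toy data r ≈ 0.959 and the fitted line is ŷ = 4x − 2, yet the residuals are 2, −1, −2, −1, 2: positive at both ends and negative in the middle, the same U-shaped pattern described in part (e). A high correlation alone does not show that a linear model is appropriate.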

4.3 example The following data represent the time between eruptions and the length of eruption for 9 randomly selected geyser eruptions. A scatter plot and a residual plot of the data accompany the exercise. Complete parts (a) through (c) below.

Time, x   Length, y   Time, x   Length, y
12.23     1.91        11.68     1.78
11.75     1.79        12.31     1.91
11.96     1.81        11.55     1.73
12.15     1.82        11.67     1.69
11.29     1.65

(a) What type of relation appears to exist between time between eruptions and length of eruption? (b) Does the residual plot confirm that the relation between time between eruptions and length of eruption is linear? (c) The coefficient of determination is found to be 88.0%. Provide an interpretation of this value.

(a) What type of relation appears to exist between time between eruptions and length of eruption? Two linearly related variables are positively associated if, whenever the value of one variable increases, the value of the other variable tends to increase as well; they are negatively associated if, whenever the value of one variable increases, the value of the other variable tends to decrease. There appears to be a linear relation: as the time between eruptions increases, so does the length of the eruption, so the association is positive.
(b) Does the residual plot confirm that the relation between time between eruptions and length of eruption is linear? Recall that a residual is the difference between the observed value y and the predicted value $\hat{y}$. The difference between the predicted value $\hat{y}$ and the sample mean $\bar{y}$ is the deviation explained by the linear model; the residual $y - \hat{y}$ is the left-over, unexplained deviation. A residual plot is used to analyze this unexplained variation. If the plot of residuals shows a discernible pattern, then some other relationship could explain this variation, which means the trend in the data is not completely explained by a linear model. If the spread of the residuals increases or decreases as the explanatory variable increases, consider splitting the residual plot into two pieces: the value of R² for the piece with little spread will be greater than the value of R² for the piece with a lot of spread, so the linear model is less reliable for some values of the explanatory variable. Use this information to decide whether the residual plot confirms that the relation between time between eruptions and length of eruption is linear.
(c) The coefficient of determination is found to be 88.0%.
Provide an interpretation of this value. The coefficient of determination, R², measures the proportion of total variation in the response variable that is explained by the least-squares regression line. Two formulas for R² are:
$R^2 = \dfrac{\text{explained variation}}{\text{total variation}} = 1 - \dfrac{\text{unexplained variation}}{\text{total variation}}$
A large value of R² (close to 1, or 100%) means that the unexplained variation is a small portion of the total variation; equivalently, the explained variation is a large portion of the total variation. Remember that R² is based on the variation in the response variable, not the explanatory variable. Here, the least-squares regression line explains 88.0% of the variation in length of eruption.
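The two formulas for R² above, and the later note in this chapter that R² equals r², can be checked numerically on the geyser data in the table for this exercise. This is a minimal Python sketch using only the standard library; it computes the slope with the algebraically equivalent form $b_1 = S_{xy}/S_{xx}$ rather than $r \cdot s_y/s_x$. The printed R² should come out close to the 88.0% reported by technology, but that figure is not asserted here.

```python
from math import sqrt
from statistics import mean

# Geyser data from the table: time between eruptions (x) and length of eruption (y)
times = [12.23, 11.75, 11.96, 12.15, 11.29, 11.68, 12.31, 11.55, 11.67]
lengths = [1.91, 1.79, 1.81, 1.82, 1.65, 1.78, 1.91, 1.73, 1.69]

x_bar, y_bar = mean(times), mean(lengths)
s_xx = sum((x - x_bar) ** 2 for x in times)
s_yy = sum((y - y_bar) ** 2 for y in lengths)  # total variation in y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(times, lengths))

b1 = s_xy / s_xx          # same value as r * s_y / s_x
b0 = y_bar - b1 * x_bar
predicted = [b1 * x + b0 for x in times]

total = s_yy
unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(lengths, predicted))
explained = sum((y_hat - y_bar) ** 2 for y_hat in predicted)

r = s_xy / sqrt(s_xx * s_yy)
r2_explained = explained / total          # explained variation / total variation
r2_unexplained = 1 - unexplained / total  # 1 - unexplained variation / total variation

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"R^2 = {r2_explained:.4f} = {r2_unexplained:.4f} = r^2 = {r ** 2:.4f}")
```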

4.4 What is meant by a marginal distribution? What is meant by a conditional distribution? 1. What is meant by a marginal distribution? A. A marginal distribution is a frequency or relative frequency distribution of either the row or column variable in a contingency table. B. A marginal distribution is the relative frequency of each category of one variable, given a specific value of the other variable in a contingency table. C. A marginal distribution is the relative distribution of both row or column variables in the contingency table. D. A marginal distribution is the effect of either row variable or the column variable in the contingency table. 2. What is meant by a conditional distribution? A. A conditional distribution is the relative distribution of both row or column variables in the contingency table. B. A conditional distribution is a frequency or relative frequency distribution of either the row or column variable in a contingency table. C. A conditional distribution is the relative association between two categorical variables in the contingency table. D. A conditional distribution lists the relative frequency of each category of the response variable, given a specific value of the explanatory variable in a contingency table.

1) A. A marginal distribution is a frequency or relative frequency distribution of either the row or column variable in a contingency table. Note: It is called marginal because its values come from the margins (the row totals or column totals) of the contingency table. 2) D. A conditional distribution lists the relative frequency of each category of the response variable, given a specific value of the explanatory variable in a contingency table. Note: A conditional distribution is computed within one value of the explanatory variable; it is not the relative association between two categorical variables in a contingency table.
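To make the two definitions concrete, here is a small Python sketch on a hypothetical 2 × 2 contingency table. The counts and the category names `Group A`, `Group B`, `Yes`, and `No` are made up for illustration and do not come from the text; rows play the role of the explanatory variable and columns the response variable.

```python
# Hypothetical counts, for illustration only (not from the text).
# Rows: explanatory variable; columns: response variable.
table = {
    "Group A": {"Yes": 30, "No": 20},
    "Group B": {"Yes": 10, "No": 40},
}


def marginal_row_distribution(tbl):
    """Relative frequency of each row category, from the row totals in the margin."""
    grand_total = sum(sum(row.values()) for row in tbl.values())
    return {name: sum(row.values()) / grand_total for name, row in tbl.items()}


def marginal_column_distribution(tbl):
    """Relative frequency of each column category, from the column totals in the margin."""
    grand_total = sum(sum(row.values()) for row in tbl.values())
    categories = next(iter(tbl.values())).keys()
    return {c: sum(row[c] for row in tbl.values()) / grand_total for c in categories}


def conditional_distribution(tbl, given):
    """Relative frequency of each response category, given one explanatory category."""
    row = tbl[given]
    row_total = sum(row.values())
    return {c: count / row_total for c, count in row.items()}


print(marginal_column_distribution(table))         # {'Yes': 0.4, 'No': 0.6}
print(conditional_distribution(table, "Group A"))  # {'Yes': 0.6, 'No': 0.4}
print(conditional_distribution(table, "Group B"))  # {'Yes': 0.2, 'No': 0.8}
```

The marginal distribution ignores the explanatory variable, while each conditional distribution is computed within a single row. In this made-up table the two conditional distributions differ (60% "Yes" in Group A versus 20% in Group B), which would suggest an association between the two variables.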

4.3 Fill in the blanks. Total deviation = _______ deviation + _______ deviation Choose the correct answer below. A. Total deviation = unexplained deviation + explained deviation B. Total deviation = positively associated deviation + negatively associated deviation C. Total deviation = predicted deviation + observed deviation D. Total deviation = response deviation + explanatory deviation

A. Total deviation = unexplained deviation + explained deviation. Note: Two variables that are linearly related are positively associated if, whenever the value of one variable increases, the value of the other variable also increases; they are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases. The deviation between the observed value and the mean of the response variable is the total deviation, $y - \bar{y}$. The deviation between the predicted and mean values of the response variable is called the explained deviation, $\hat{y} - \bar{y}$. The deviation between the observed and predicted values of the response variable is called the unexplained deviation, $y - \hat{y}$. Therefore, total deviation equals unexplained deviation plus explained deviation:
$y - \bar{y} = (y - \hat{y}) + (\hat{y} - \bar{y})$
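The identity above holds for every observation by simple algebra. Summed over the data, the squared deviations also satisfy total = unexplained + explained, provided the line is the least-squares line. A minimal Python sketch, using the second geyser sample from the 4.3 exercise later in this chapter (the one with R² = 88.4%), checks both:

```python
from statistics import mean

# Second geyser sample: time between eruptions (x) and length of eruption (y)
xs = [12.17, 11.85, 11.99, 12.13, 11.29, 11.74, 12.24, 11.55, 11.69]
ys = [1.89, 1.79, 1.82, 1.81, 1.63, 1.78, 1.90, 1.69, 1.80]

# Least-squares slope and intercept
x_bar, y_bar = mean(xs), mean(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

ss_total = ss_unexplained = ss_explained = 0.0
for x, y in zip(xs, ys):
    y_hat = b1 * x + b0
    total = y - y_bar           # total deviation
    unexplained = y - y_hat     # unexplained deviation (the residual)
    explained = y_hat - y_bar   # explained deviation
    assert abs(total - (unexplained + explained)) < 1e-12  # holds for every observation
    ss_total += total ** 2
    ss_unexplained += unexplained ** 2
    ss_explained += explained ** 2

# For the least-squares line, the squared deviations also add up.
print(f"{ss_total:.6f} = {ss_unexplained:.6f} + {ss_explained:.6f}")
print(f"R^2 = {ss_explained / ss_total:.3f}")
```

The sum-of-squares version relies on the least-squares fit, for which the cross term $\sum (y - \hat{y})(\hat{y} - \bar{y})$ is zero; for an arbitrary line only the per-observation identity holds.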

4.1 What is the difference between univariate data and bivariate data? Choose the correct answer below. A. In univariate data, there is one mean. In bivariate data, there are two means. B. In univariate data, the data are qualitative. In bivariate data, the data are quantitative. C. In univariate data, there are only positive values and zeros. In bivariate data, there are positive values, negative values, and zeros. D. In univariate data, a single variable is measured on each individual. In bivariate data, two variables are measured on each individual.

D. In univariate data, a single variable is measured on each individual. In bivariate data, two variables are measured on each individual.

4.3 The following data represent the time between eruptions and the length of eruption for 9 randomly selected geyser eruptions. A scatter plot and a residual plot of the data accompany the exercise. Complete parts (a) through (c) below.

Time, x   Length, y   Time, x   Length, y
12.17     1.89        11.74     1.78
11.85     1.79        12.24     1.90
11.99     1.82        11.55     1.69
12.13     1.81        11.69     1.80
11.29     1.63

(a) What type of relation appears to exist between time between eruptions and length of eruption? A. Linear, negative association B. Linear, positive association C. A nonlinear pattern. D. No association.
(b) Does the residual plot confirm that the relation between time between eruptions and length of eruption is linear? A. Yes. The plot of the residuals shows a discernible pattern, implying that the explanatory and response variables are linearly related. B. No. The plot of the residuals shows no discernible pattern, implying that the explanatory and response variables are not linearly related. C. Yes. The plot of the residuals shows no discernible pattern, so a linear model is appropriate. D. No. The plot of the residuals shows that the spread of the residuals is increasing or decreasing, violating the requirements of a linear model.
(c) The coefficient of determination is found to be 88.4%. Provide an interpretation of this value. The least-squares regression line explains ______% of the variation in _________. (Type an integer or a decimal. Do not round.)

(a) B. Linear, positive association (b) C. Yes. The plot of the residuals shows no discernible pattern, so a linear model is appropriate. Note: Recall that a residual is the difference between the observed value y and the predicted value $\hat{y}$. The purpose of a residual plot is to analyze the unexplained variation in a data sample; a trend in that unexplained variation would mean a linear model is not appropriate, and no such trend appears here. (c) The least-squares regression line explains 88.4% of the variation in length of eruption.

4.3 The following data represent the number of chocolate chips per cookie in a random sample of a name brand and a store brand. Complete parts (a) to (c) below.
Name Brand: 29 26 20 23 25 22 30 27 22 21 25 20 33
Store Brand: 23 28 26 27 20 31 24 18 21 16 20 33 15
(a) Draw side-by-side boxplots for each brand of cookie. Label the boxplots "N" for the name brand and "S" for the store brand. Choose the correct answer below. A. Two stacked horizontal boxplots are above a horizontal number line from 10 to 40 in increments of 2. A boxplot labeled "N" consists of a box extending from 22 to 28 with a vertical line segment through the box at 25 and two horizontal line segments extending from the left and right sides of the box to 15 and 38, respectively. A boxplot labeled "S" consists of a box extending from 19 to 28 with a vertical line segment through the box at 23 and two horizontal line segments extending from the left and right sides of the box to 13 and 36, respectively. All values are approximate. B. Two stacked horizontal boxplots are above a horizontal number line from 10 to 40 in increments of 2. A boxplot labeled "N" consists of a box extending from 20 to 33 with a vertical line segment through the box at 28 and two horizontal line segments extending from the left and right sides of the box to 19 and 38, respectively. A boxplot labeled "S" consists of a box extending from 15 to 33 with a vertical line segment through the box at 28 and two horizontal line segments extending from the left and right sides of the box to 11 and 39, respectively. All values are approximate. C. Two stacked horizontal boxplots are above a horizontal number line from 10 to 40 in increments of 2. A boxplot labeled "N" consists of a box extending from 22 to 28 with a vertical line segment through the box at 25 and two horizontal line segments extending from the left and right sides of the box to 20 and 33, respectively. A boxplot labeled "S" consists of a box extending from 19 to 28 with a vertical line segment through the box at 23 and two horizontal line segments extending from the left and right sides of the box to 15 and 33, respectively. All values are approximate.
(b) Does there appear to be a difference in the number of chips per cookie? A. Yes. The store brand appears to have more chips per cookie. B. Yes. The name brand appears to have more chips per cookie. C. No. There appears to be no difference in the number of chips per cookie. D. There is insufficient information to draw a conclusion.
(c) Does one brand have a more consistent number of chips per cookie? A. No. Both brands have roughly the same number of chips per cookie. B. Yes. The name brand has a more consistent number of chips per cookie. C. Yes. The store brand has a more consistent number of chips per cookie. D. There is insufficient information to draw a conclusion.

(a) C. Two stacked horizontal boxplots are above a horizontal number line from 10 to 40 in increments of 2. A boxplot labeled "N" consists of a box extending from 22 to 28 with a vertical line segment through the box at 25 and two horizontal line segments extending from the left and right sides of the box to 20 and 33, respectively. A boxplot labeled "S" consists of a box extending from 19 to 28 with a vertical line segment through the box at 23 and two horizontal line segments extending from the left and right sides of the box to 15 and 33, respectively. All values are approximate. (b) B. Yes. The name brand appears to have more chips per cookie. (c) B. Yes. The name brand has a more consistent number of chips per cookie.
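The boxplots in choice C can be checked by computing each brand's five-number summary. This Python sketch finds Q1 and Q3 as the medians of the lower and upper halves of the sorted data, excluding the overall median when n is odd. That is one common textbook convention, assumed here; software such as Python's `statistics.quantiles` may use a different method and give slightly different quartiles.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(data):
    """(min, Q1, median, Q3, max); Q1 and Q3 are medians of the lower and upper halves,
    leaving out the overall median when n is odd."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return (s[0], median(lower), median(s), median(upper), s[-1])


name_brand = [29, 26, 20, 23, 25, 22, 30, 27, 22, 21, 25, 20, 33]
store_brand = [23, 28, 26, 27, 20, 31, 24, 18, 21, 16, 20, 33, 15]

for label, data in (("N", name_brand), ("S", store_brand)):
    lo, q1, med, q3, hi = five_number_summary(data)
    print(f"{label}: min={lo} Q1={q1} median={med} Q3={q3} max={hi} IQR={q3 - q1}")
# N: min=20 Q1=21.5 median=25 Q3=28.0 max=33 IQR=6.5
# S: min=15 Q1=19.0 median=23 Q3=27.5 max=33 IQR=8.5
```

These values match choice C within its stated approximation: the name brand's box runs from 21.5 to 28 with median 25 and whiskers to 20 and 33, and the store brand's box runs from 19 to 27.5 with median 23 and whiskers to 15 and 33. The name brand's higher median (25 versus 23) and smaller IQR (6.5 versus 8.5) support answers (b) and (c).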

4.3 Match the coefficient of determination to the scatter diagram. The scales on the x-axis and y-axis are the same for each scatter diagram. (a) R² = 0.58, (b) R² = 1, (c) R² = 0.94 I) A scatter diagram with a horizontal axis labeled "Explanatory" and a vertical axis labeled "Response" contains a line that falls from left to right and 14 plotted points, all of which are exactly on the line. II) A scatter diagram with a horizontal axis labeled "Explanatory" and a vertical axis labeled "Response" contains 19 plotted points and a line that rises from left to right. The points generally follow the pattern of the line with average vertical spread that increases from left to right from about one tenth the length of the vertical axis to nine tenths the length of the vertical axis. III) A scatter diagram with a horizontal axis labeled "Explanatory" and a vertical axis labeled "Response" contains a line that falls from left to right and 28 plotted points that generally follow the pattern of the line with average vertical spread of about one sixth the length of the vertical axis. (a) Scatter diagram ______. (b) Scatter diagram ______. (c) Scatter diagram _______.

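(a) Scatter diagram II. (b) Scatter diagram I. (c) Scatter diagram III. Note: The coefficient of determination, R², measures the percentage of total variation in the response variable that is explained by the least-squares regression line. For a least-squares line with one explanatory variable, it equals the square of the linear correlation coefficient. Diagram I, with every point exactly on the line, has R² = 1; diagram III, with a small, even vertical spread, has the large value R² = 0.94; and diagram II, whose vertical spread grows to most of the axis, has the smallest value, R² = 0.58.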

4.1 example Match the linear correlation coefficient to the scatter diagram. The scales on the x- and y-axes are the same for each diagram. (a) r = 0.787 (b) r = 1 (c) r = 0.049 (d) r = −0.946

(a) The linear correlation coefficient r measures the strength and direction of the linear relation between two variables. It is always between −1 and 1, inclusive; a positive r indicates a positive association and a negative r a negative association. If r = +1, a perfect positive linear relation exists between the two variables; if r = −1, a perfect negative linear relation exists. The closer r is to +1, the stronger the evidence of positive association; the closer r is to −1, the stronger the evidence of negative association. If r is close to 0, little or no evidence exists of a linear relation between the two variables. The correlation coefficient r = 0.787 is positive and closer to 1 than to 0, but it is not 1. Thus, there is a fairly strong positive linear relationship between the two variables. When matching each coefficient to its scatter diagram, if two graphs look similar, it may help to examine the other correlation coefficients. The matching graph for r = 0.787 is a scatter diagram with a horizontal axis labeled Explanatory and a vertical axis labeled Response that contains 8 plotted points generally following a straight line that rises from left to right, with average vertical spread of about one half the vertical distance between the highest and lowest points.
(b) The correlation coefficient r = 1 is exactly 1, so there is a perfect positive linear relationship between the two variables. The matching graph contains 7 plotted points that lie exactly on a straight line rising from left to right.
(c) The correlation coefficient r = 0.049 is positive but very close to 0, so there is almost no linear relationship between the two variables. The matching graph contains 11 plotted points that generally rise from left to right with average vertical spread of about three quarters the vertical distance between the highest and lowest points.
(d) The correlation coefficient r = −0.946 is very close to −1, so there is a strong negative linear relationship between the two variables. The matching graph contains 10 plotted points that closely follow a straight line falling from left to right, with average vertical spread of about one third the vertical distance between the highest and lowest points.

4.3 The time it takes for a planet to complete its orbit around a particular star is called the planet's sidereal year. The sidereal year of a planet is related to the distance the planet is from the star. The accompanying data table shows the distances of the planets from a particular star and their sidereal years. A table of critical values of the correlation coefficient is also provided. Complete parts (a) through (e) below. (a) Draw a scatter diagram of the data treating distance from the star as the explanatory variable. Choose the correct graph below. A. A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Sidereal Year from 0 to 250 in increments of 25. The following 9 points are plotted: (50, 0); (50, 0); (100, 0); (150, 0); (500, 10); (900, 30); (1800, 85); (2800, 165); (3700, 245). The points follow the pattern of a curve rising from left to right at a slightly increasing rate. All coordinates are approximate. B. A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Sidereal Year from 0 to 250 in increments of 25. The following 9 points are plotted: (50, 245); (50, 165); (100, 85); (150, 50); (500, 50); (900, 50); (1800, 50); (2800, 50); (3700, 50). The points follow the pattern of a curve falling from left to right at a rapidly decreasing rate. All coordinates are approximate. C. A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Sidereal Year from 0 to 250 in increments of 25. The following 9 points are plotted: (50, 245); (50, 165); (100, 85); (150, 30); (500, 10); (900, 0); (1800, 0); (2800, 0); (3700, 0). The points follow the pattern of a curve falling from left to right at a rapidly decreasing rate. All coordinates are approximate. D. 
A coordinate system has a horizontal axis labeled Sidereal Year from 0 to 250 in increments of 125 and a vertical axis labeled Distance (millions of miles) from 0 to 4000 in increments of 500. The following 9 points are plotted: (0, 50); (0, 50); (0, 100); (0, 150); (10, 500); (30, 900); (85, 1800); (165, 2800); (245, 3700). The points follow the pattern of a curve rising from left to right at a slightly decreasing rate. All coordinates are approximate. ​(b) Determine the correlation between distance and sidereal year. 1. The correlation between distance and sidereal year is _____ ​(Round to three decimal places as​ needed.) 2. Does this imply a linear relation between distance and sidereal​ year? a. No b. Yes ​(c) Compute the​ least-squares regression line. y= ____ x + ( _____) ​(Round the slope to three decimal places as needed. Round the intercept to one decimal place as​ needed.) ​(d) Plot the residuals against the distance from the star. Choose the correct graph below. A. A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Residual from negative 25 to 25 in increments of 5. A dashed horizontal line passes through (0, 0). The following 9 points are plotted: (50, 0); (50, 0); (100, 0); (150, 0); (500, 0); (900, 0); (1800, 0); (2800, 0); (3700, 0). The points follow the pattern of a horizontal line. All coordinates are approximate. B. A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Residual from 0 to 500 in increments of 50. The following 9 points are plotted: (50, 100); (50, 70); (100, 50); (150, 20); (500, 50); (900, 250); (1800, 390); (2800, 20); (3700, 340). The points generally rise from left to right. All coordinates are approximate. C. 
A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Residual from negative 25 to 25 in increments of 5. A dashed horizontal line passes through (0, 0). The following 9 points are plotted: (50, negative 10); (50, negative 8.5); (100, negative 7); (150, negative 5); (500, 7); (900, 16); (1800, 20); (2800, 4.5); (3700, negative 18.5). The points follow the pattern of a parabola that opens downward. All coordinates are approximate. D. A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Residual from negative 25 to 25 in increments of 5. A dashed horizontal line passes through (0, 0). The following 9 points are plotted: (50, 10); (50, 8.5); (100, 7); (150, 5); (500, negative 7); (900, negative 16); (1800, negative 20); (2800, negative 4.5); (3700, 18.5). The points follow the pattern of a parabola that opens upward. All coordinates are approximate. ​(e) Do you think the​ least-squares regression line is a good​ model? Why? A. ​Yes, because the scatterplot appears to be a straight line. B. ​No, because the residuals do not form a pattern. C. ​Yes, because the residuals do not form a pattern. D. ​Yes, because the residuals form a pattern. E. ​No, because the residuals form a pattern.

(a) A. A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Sidereal Year from 0 to 250 in increments of 25. The following 9 points are plotted: (50, 0); (50, 0); (100, 0); (150, 0); (500, 10); (900, 30); (1800, 85); (2800, 165); (3700, 245). The points follow the pattern of a curve rising from left to right at a slightly increasing rate. All coordinates are approximate.
(b) 1. 0.990 2. b. Yes Note: The answer is Yes because the absolute value of the linear correlation coefficient, 0.990, is greater than the critical value for the given sample size.
(c) $\hat{y} = 0.065x + (-12.3)$ Note: The equation of the least-squares regression line is $\hat{y} = b_1 x + b_0$. In this equation, $b_1 = r \cdot \frac{s_y}{s_x}$ is the slope and $b_0 = \bar{y} - b_1 \bar{x}$ is the y-intercept. Here $\bar{x}$ and $s_x$ are the sample mean and sample standard deviation of the explanatory variable x. Likewise, $\bar{y}$ and $s_y$ are the sample mean and sample standard deviation of the response variable y. Use technology or the formula to find the regression line, rounding the slope to three decimal places and the intercept to one decimal place.
(d) D. A coordinate system has a horizontal axis labeled Distance (millions of miles) from 0 to 4000 in increments of 1000 and a vertical axis labeled Residual from negative 25 to 25 in increments of 5. A dashed horizontal line passes through (0, 0). The following 9 points are plotted: (50, 10); (50, 8.5); (100, 7); (150, 5); (500, negative 7); (900, negative 16); (1800, negative 20); (2800, negative 4.5); (3700, 18.5). The points follow the pattern of a parabola that opens upward. All coordinates are approximate.
(e) E. No, because the residuals form a pattern. Note: Although r = 0.990, the residual plot in part (d) follows a U-shaped pattern (a parabola that opens upward). A linear model is therefore not appropriate for these data.
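Part (e) depends on the residual plot, not on r. The minimal Python sketch below illustrates why, using made-up toy data y = x² for x = 1, ..., 9; the planetary data table is not reproduced here. When a straight line is fit to curved data, the residuals are positive at both ends and negative in the middle. This is the same U-shape seen in graph D.

The helper name least_squares is hypothetical. It computes the slope as $S_{xy}/S_{xx}$, which is algebraically equal to $r \cdot s_y/s_x$.

```python
def least_squares(x, y):
    """Return (slope b1, intercept b0) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # same value as r * s_y / s_x
    b0 = y_bar - b1 * x_bar   # line passes through (x_bar, y_bar)
    return b1, b0

# Toy data that curve upward (y = x^2); NOT the planetary data.
x = list(range(1, 10))
y = [xi ** 2 for xi in x]
b1, b0 = least_squares(x, y)

# Residual = observed y minus predicted y.
residuals = [yi - (b1 * xi + b0) for xi, yi in zip(x, y)]

# Sign of each residual from left to right.
print("".join("+" if e > 0 else "-" for e in residuals))  # ++-----++
```

The residuals still sum to (essentially) zero, as they always do for a least-squares line. What reveals the problem is their systematic sign pattern.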

4.2 A data set is given below. (a) Draw a scatter diagram. Comment on the type of relation that appears to exist between x and y. (b) Given that $\bar{x} = 3.5000$, $s_x = 2.5100$, $\bar{y} = 4.0500$, $s_y = 1.7085$, and $r = -0.9538$, determine the least-squares regression line. (c) Graph the least-squares regression line on the scatter diagram drawn in part (a).
x: 0 1 4 4 6 6
y: 5.7 6.2 4.4 3.6 2.0 2.4
(a) Choose the correct graph below. A. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (0,5.8), (1,6.2), (4,4.4), (4,3.6), (6,2), (6,2.4). All vertical coordinates are approximate. B. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (0,1.4), (1,0.8), (4,2.6), (4,3.4), (6,5), (6,4.6). All vertical coordinates are approximate. C. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (0,2), (1,2.4), (4,3.6), (4,4.4), (6,5.8), (6,6.2). All vertical coordinates are approximate. D. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (5.8,0), (6.2,1), (4.4,4), (3.6,4), (2,6), (2.4,6). All horizontal coordinates are approximate. 1. There appears to be a _______ relationship.
(b) $\hat{y} = -0.649x + 6.322$ (Round to three decimal places as needed.)
(c) Choose the correct graph below. A. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (0,1.4), (1,0.8), (4,2.6), (4,3.4), (6,5), (6,4.6). A line, rising from left to right, passes through the points (1,0) and (6, 5.2). All vertical coordinates are approximate. The line passes within 2 vertical units of all plotted points. B. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (0,5.8), (1,6.2), (4,4.4), (4,3.6), (6,2), (6,2.4). A line falling from left to right passes through the points (1,5.6) and (6,2.4). All vertical coordinates are approximate. The line passes within 2 vertical units of all plotted points. C. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (0,5.8), (1,6.2), (4,4.4), (4,3.6), (6,2), (6,2.4). A line falling from left to right passes through the points (1,5.6) and (6,2.4). All vertical coordinates are approximate. The line passes within 1 vertical unit of all plotted points. (Correct answer.) D. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (0,2), (1,2.4), (4,3.6), (4,4.4), (6,5.8), (6,6.2). A line rising from left to right passes through the points (0,2) and (6,6.2). All vertical coordinates are approximate. The line passes within 2 vertical units of all plotted points.

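(a) A. A scatter diagram has a horizontal x-axis labeled from 0 to 6 in increments of 1 and a vertical y-axis labeled from 0 to 7 in increments of 1. The following six points are plotted, listed from left to right: (0,5.8), (1,6.2), (4,4.4), (4,3.6), (6,2), (6,2.4). All vertical coordinates are approximate. 1. There appears to be a linear, negative relationship.
(b) $\hat{y} = -0.649x + 6.322$
(c) C. This is the graph whose line falls from left to right through about (1,5.6) and (6,2.4) and passes within 1 vertical unit of all plotted points.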
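The slope and intercept in part (b) follow directly from the summary statistics. The minimal Python sketch below applies $b_1 = r \cdot s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$ to the values given in the exercise. The function name line_from_summary is hypothetical.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Least-squares slope and intercept from summary statistics."""
    b1 = r * s_y / s_x        # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b1, b0

b1, b0 = line_from_summary(x_bar=3.5000, s_x=2.5100,
                           y_bar=4.0500, s_y=1.7085, r=-0.9538)
print(f"y-hat = {b1:.3f}x + {b0:.3f}")  # y-hat = -0.649x + 6.322
```

The sketch keeps the slope unrounded when it computes the intercept, and rounds only for display. Rounding $b_1$ first can shift $b_0$ in its last decimal place.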

4.1 For the accompanying data set, (a) draw a scatter diagram of the data, (b) compute the correlation coefficient, and (c) determine whether there is a linear relation between x and y. The data set and a table of critical values are provided with the exercise. (a) Draw a scatter diagram of the data. Choose the correct graph below. A. A scatter diagram has a horizontal x-axis labeled from 0 to 10 in increments of 1 and a vertical y-axis labeled from 0 to 10 in increments of 1. The following 5 points are plotted, listed here from left to right: (6, 3); (6, 4); (7, 6); (7, 7); (9, 5). B. A scatter diagram has a horizontal x-axis labeled from 0 to 10 in increments of 1 and a vertical y-axis labeled from 0 to 10 in increments of 1. The following 5 points are plotted, listed here from left to right: (6, 6); (6, 7); (7, 3); (7, 4); (9, 5). C. A scatter diagram has a horizontal x-axis labeled from 0 to 10 in increments of 1 and a vertical y-axis labeled from 0 to 10 in increments of 1. The following 5 points are plotted, listed here from left to right: (6, 6); (6, 7); (7, 3); (7, 9); (9, 5). D. A scatter diagram has a horizontal x-axis labeled from 0 to 10 in increments of 1 and a vertical y-axis labeled from 0 to 10 in increments of 1. The following 5 points are plotted, listed here from left to right: (6, 3); (6, 4); (7, 1); (7, 7); (9, 5). (b) Compute the correlation coefficient. The correlation coefficient is r = _____. (Round to three decimal places as needed.) (c) Determine whether there is a linear relation between x and y. Because the correlation coefficient is _______ and the absolute value of the correlation coefficient, ________, is ______ than the critical value for this data set, _______, _____ linear relation exists between x and y. (Round to three decimal places as needed.)

(a) C. A scatter diagram has a horizontal x-axis labeled from 0 to 10 in increments of 1 and a vertical y-axis labeled from 0 to 10 in increments of 1. The following 5 points are plotted, listed here from left to right: (6, 6); (6, 7); (7, 3); (7, 9); (9, 5). Note: A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.
(b) r = −0.274
(c) Because the correlation coefficient is negative and the absolute value of the correlation coefficient, 0.274, is not greater than the critical value for this data set, 0.878, no linear relation exists between x and y.
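The decision rule in part (c) compares |r| with a critical value. The minimal Python sketch below computes r for the five points of graph C and applies that comparison. The critical value 0.878 is taken from the solution above and is not computed here.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [6, 6, 7, 7, 9]
y = [6, 7, 3, 9, 5]
CRITICAL_N5 = 0.878  # critical value for n = 5, as quoted in the solution

r = pearson_r(x, y)
linear = abs(r) > CRITICAL_N5   # linear relation only if |r| exceeds it
print(round(r, 3), linear)      # -0.274 False
```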

4.1 The linear correlation between violent crime rate and percentage of the population that has a cell phone is −0.918 for years since 1995. Do you believe that increasing the percentage of the population that has a cell phone will decrease the violent crime​ rate? What might be a lurking variable between percentage of the population with a cell phone and violent crime​ rate? (a) Will increasing the percentage of the population that has a cell phone decrease the violent crime​ rate? Choose the best option below. a. Yes b. No (b)What might be a lurking variable between percentage of the population with a cell phone and violent crime​ rate? A. the economy B. the average cell​ phone's effectiveness as a weapon C. overall cell phone signal strength D. the police

(a) b. No Note: These data are observational, and a lurking variable may be related to both the percentage of the population with a cell phone and the violent crime rate. The strong correlation between the two variables therefore does not imply causation. (b) A. the economy Note: In a strong economy, crime rates tend to decrease, and consumers are better able to afford cell phones.

4.1 Determine whether the scatter diagram indicates that a linear relation may exist between the two variables. If the relation is​ linear, determine whether it indicates a positive or negative association between the variables. Use this information to answer the following. A scatter diagram has a horizontal axis labeled "Explanatory" from 0 to 10 in increments of 1 and a vertical axis labeled "Response" from 0 to 400 plus in increments of 100. The following 12 approximate points are plotted, listed here from left to right: (0.6, 380); (1.2, 380); (2, 380); (3, 380); (4, 380); (6, 340); (7, 300); (7, 250); (8.2, 200); (8.2, 180); (9.8, 20); (9.8, 50). The first five points follow the general pattern of a horizontal line and the next 7 points follow the general pattern of a straight line that falls from left to right. 1. Do the two variables have a linear​ relationship? A. The data points do not have a linear relationship because they do not lie mainly in a straight line. B. The data points have a linear relationship because they lie mainly in a straight line. C. The data points have a linear relationship because they do not lie mainly in a straight line. D. The data points do not have a linear relationship because they lie mainly in a straight line. 2. If the relationship is linear do the variables have a positive or negative​ association? A. The variables have a positive association. B. The variables have a negative association. C. The relationship is not linear.

1. A. The data points do not have a linear relationship because they do not lie mainly in a straight line. The first five points are roughly horizontal, while the remaining points fall steeply, so the points do not follow a single straight line. 2. C. The relationship is not linear. Note: Two variables that are linearly related are said to be positively associated if, whenever the value of one variable increases, the value of the other variable also increases. Two variables that are linearly related are said to be negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.

4.1 What does it mean to say that two variables are positively​ associated? Negatively​ associated? 1. What does it mean to say that two variables are positively​ associated? A. There is a linear relationship between the variables. B. There is a linear relationship between the​ variables, and whenever the value of one variable​ increases, the value of the other variable decreases. C. There is a linear relationship between the​ variables, and whenever the value of one variable​ increases, the value of the other variable increases. D. There is a relationship between the variables that is not linear. 2. What does it mean to say that two variables are negatively​ associated? A. There is a linear relationship between the variables. B. There is a linear relationship between the​ variables, and whenever the value of one variable​ increases, the value of the other variable decreases. C. There is a linear relationship between the​ variables, and whenever the value of one variable​ increases, the value of the other variable increases. D. There is a relationship between the variables that is not linear.

1. C. There is a linear relationship between the​ variables, and whenever the value of one variable​ increases, the value of the other variable increases. Note: Two variables that are linearly related are said to be positively associated when​ above-average values of one variable are associated with​ above-average values of the other variable and​ below-average values of one variable are associated with​ below-average values of the other variable. That​ is, two variables are positively associated​ if, whenever the value of one variable​ increases, the value of the other variable also increases. 2. B. There is a linear relationship between the​ variables, and whenever the value of one variable​ increases, the value of the other variable decreases. Note: Two variables that are linearly related are said to be negatively associated when​ above-average values of one variable are associated with​ below-average values of the other variable and​ below-average values of one variable are associated with​ above-average values of the other variable. That​ is, two variables are negatively associated​ if, whenever the value of one variable​ increases, the value of the other variable decreases.

4.2 Explain what each point on the​ least-squares regression line represents. Choose the correct answer below. A. Each point on the​ least-squares regression line represents the​ y-values that would be considered ideal at that corresponding value of x. B. Each point on the​ least-squares regression line represents the predicted​ y-value at the corresponding value of x. C. Each point on the​ least-squares regression line represents the​ y-value of the data set at that corresponding value of x. D. Each point on the​ least-squares regression line represents one of the points in the data set.

B. Each point on the least-squares regression line represents the predicted y-value at the corresponding value of x. Note: The equation of the least-squares regression line is $\hat{y} = b_1 x + b_0$. The notation $\hat{y}$ ("y-hat") indicates a predicted value of y for a given value of x. Thus, the y-value of each point on the least-squares regression line is the predicted y-value at the corresponding value of x, based on the data provided.
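To see that points on the line are predicted values, the minimal Python sketch below uses the line $\hat{y} = -0.649x + 6.322$ obtained for the 4.2 exercise earlier in this set. It evaluates the line at each observed x and reports the residual, which is the observed value minus the predicted value. The helper predict is a hypothetical name.

```python
# Line from the 4.2 exercise earlier in this set: y-hat = -0.649x + 6.322
b1, b0 = -0.649, 6.322

x = [0, 1, 4, 4, 6, 6]
y = [5.7, 6.2, 4.4, 3.6, 2.0, 2.4]

def predict(xi):
    """Point on the least-squares line: the predicted y at xi."""
    return b1 * xi + b0

for xi, yi in zip(x, y):
    y_hat = predict(xi)
    residual = yi - y_hat   # observed minus predicted
    print(f"x={xi}  y={yi}  y-hat={y_hat:.3f}  residual={residual:+.3f}")
```

A positive residual, such as the one at x = 1, means the observed value lies above the line. A negative residual means it lies below.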

4.3 Example Analyze the residual plot below and identify which, if any, of the conditions for an adequate linear model is not met. A residual plot has a horizontal axis labeled Explanatory from less than 5 to 25 plus in increments of 10 and a vertical axis labeled Residuals from less than −20 to 20 plus in increments of 10. A horizontal dashed line intersects the vertical axis at 0. A series of plotted points follows a curve that falls from left to right, passing through the points (5, 22) and (6, 0) to a minimum at (15, −20). The curve then rises, passing through the points (22, 0) and (24, 20). All coordinates are approximate.

Residuals play an important role in determining the adequacy of the linear model. They are analyzed for three purposes:
1. to determine whether a linear model is appropriate to describe the relation between the explanatory and response variables,
2. to determine whether the variance of the residuals is constant, and
3. to check for outliers.

To determine whether a linear model is appropriate, use a residual plot. This is a scatter diagram with the residuals on the vertical axis and the explanatory variable on the horizontal axis. If a plot of the residuals against the explanatory variable shows a discernible pattern, such as a curve, then the explanatory and response variables may not be linearly related. The given residual plot shows a U-shaped pattern.

If the spread of the residuals increases or decreases as the explanatory variable increases, then a strict requirement of the linear model, called constant error variance, is violated. Here the spread of the residuals does not change steadily as the explanatory variable increases.

Recall that outliers are extreme observations. An outlier can also be thought of as an observation that does not fit the overall pattern of the data. In a residual plot, an outlier appears as a residual that lies far from the rest. There are no outliers in this residual plot because no residuals lie far from the rest.

Thus, the condition that is not met is patterned residuals. The U-shaped pattern indicates that the explanatory and response variables may not be linearly related.

4.3 example Match the coefficient of determination to the scatter diagram. The scales on the x-axis and y-axis are the same for each scatter diagram. (a) $R^2 = 0.58$, (b) $R^2 = 0.94$, (c) $R^2 = 0.01$ I) A scatter diagram with a horizontal axis labeled "Explanatory" and a vertical axis labeled "Response" contains a line that falls from left to right and 28 plotted points that generally follow the pattern of the line with average vertical spread of about one sixth the length of the vertical axis. II) A scatter diagram with a horizontal axis labeled "Explanatory" and a vertical axis labeled "Response" contains 19 plotted points and a line that rises from left to right. The points generally follow the pattern of the line with average vertical spread that increases from left to right from about one tenth the length of the vertical axis to nine tenths the length of the vertical axis. III) A scatter diagram with a horizontal axis labeled "Explanatory" and a vertical axis labeled "Response" contains 30 plotted points and a line that falls slightly from left to right. The plotted points generally follow the pattern of the line with average vertical spread of about three fifths the length of the vertical axis.

The coefficient of determination, $R^2$, measures the proportion (percentage) of total variation in the response variable that is explained by the least-squares regression line. For a least-squares line with one explanatory variable, $R^2$ equals the square of the linear correlation coefficient, $R^2 = r^2$. If $R^2 = 1$, there is a perfect linear relation between the two variables. The closer $R^2$ is to 1, the more closely the points cluster around the regression line.

(a) Notice that $R^2 = 0.58$ is less than 0.94 and greater than 0.01. Its scatter diagram should therefore show more scatter about the line than the graph for 0.94 but less than the graph for 0.01. The scatter diagram that matches $R^2 = 0.58$ is graph II.

(b) Notice that $R^2 = 0.94$ is the largest of the three values, so it matches the scatter diagram whose points lie closest to the least-squares regression line. This is graph I.

(c) Since $R^2 = 0.01$ is the smallest of the three values, its scatter diagram will be more spread out about the line than the other two. Graph III matches $R^2 = 0.01$.
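The identity $R^2 = r^2$ can be checked numerically. The minimal Python sketch below computes $R^2$ as 1 minus the ratio of unexplained variation (the sum of squared residuals) to total variation, then compares it with r². It reuses the data set from the 4.2 exercise earlier in this set.

```python
import math

x = [0, 1, 4, 4, 6, 6]
y = [5.7, 6.2, 4.4, 3.6, 2.0, 2.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)   # total variation in y

r = s_xy / math.sqrt(s_xx * s_yy)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Unexplained variation: sum of squared residuals about the line.
sse = sum((b - (b1 * a + b0)) ** 2 for a, b in zip(x, y))
r_squared = 1 - sse / s_yy

print(round(r, 4), round(r_squared, 4), round(r ** 2, 4))
```

Both ways of computing the coefficient of determination give about 0.91. In other words, roughly 91% of the variation in y is explained by the least-squares line for this data set.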

4.3 A scatter diagram is given with one of the points drawn in blue​ (large point). The line drawn in red​ (dashed line) is the​ least-squares regression line with the point in blue​ (the large​ point) excluded. The line drawn in blue​ (solid line) is the​ least-squares regression line with the point in blue​ (large point) included. On the basis of these​ graphs, do you think the point in blue is​ influential? A scatter diagram has a horizontal axis labeled from 0 to 25 in increments of 1 and a vertical axis labeled from 0 to 25 in increments of 1. A series of plotted points closely follow a dashed line that rises from left to right passing through the points (0, 9.3) and (20, 19.4). A solid line rises from left to right and passes through the points (0, 9.8) and (20, 19). All of the plotted points closely follow the solid line. A large blue point is plotted at (19, 17). All coordinates are approximate. Is the point in blue​ (large point)​ influential? a. No​, because the point does not significantly affect the​ least-squares regression line. b. Yes​, because the point does not significantly affect the​ least-squares regression line. c. No​, because the point significantly affects the​ least-squares regression line. d. Yes​, because the point significantly affects the​ least-squares regression line.

a. No​, because the point does not significantly affect the​ least-squares regression line.

4.1 True or​ false: Correlation implies causation. Choose the correct answer below. a. The statement is false. Correlation never implies causation. b. The statement is true. Correlation always implies causation. c. The statement is false. Correlation can only be used to imply causation as a result of a properly designed experiment. d. The statement is false. Correlation can only be used to imply causation as a result of an observational study.

c. The statement is false. Correlation can only be used to imply causation as a result of a properly designed experiment. Note: In general, correlation does not imply causation. If the data used in a study are observational, it cannot be concluded that the two correlated variables have a causal relationship. Correlation can only imply causation as the result of a properly designed experiment.

4.4 In an effort to gauge how the country's population feels about immigration, researchers surveyed adult citizens. One question asked was, "On the whole, do you think immigration is a good thing or a bad thing for this country today?" The results of the survey, by ethnicity, are given in the accompanying table. Complete parts (a) through (f). (f) Is ethnicity associated with opinion regarding immigration? If so, how? Choose the correct answer below. A. No, ethnicity is not associated with opinion regarding immigration. B. Yes, ethnicity is associated with opinion regarding immigration. Hispanics are more likely to feel that immigration is a bad thing for the country and much less likely to feel it is a good thing. C. Yes, ethnicity is associated with opinion regarding immigration. Hispanics are more likely to feel that immigration is a good thing for the country and much less likely to feel it is a bad thing.

(a) through (e): check notebook. Note: A marginal distribution of a variable is a frequency or relative frequency distribution of either the row or column variable in the contingency table. To create a marginal distribution for a variable, calculate the row and column totals for each category of the variable. The row total for each category is found by adding the cell entries across that row, and the row totals give the distribution of the row variable. The column total for each category is found by adding the cell entries down that column, and the column totals give the distribution of the column variable. The relative frequency marginal distribution for the row variable, opinion, is found by dividing the row total for each opinion by the table total. The relative frequency marginal distribution for the column variable, ethnicity, is found by dividing the column total for each ethnicity by the table total.
(f) C. Yes, ethnicity is associated with opinion regarding immigration. Hispanics are more likely to feel that immigration is a good thing for the country and much less likely to feel it is a bad thing. Note: For Hispanics, the relative frequency of citizens who feel that immigration is a good thing for the country is larger than for the other ethnicities. The relative frequency of citizens who feel that immigration is a bad thing is lower than for the other ethnicities. Because the distribution of opinion differs across ethnicities, ethnicity is associated with opinion regarding immigration.
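Association in a contingency table shows up as differences between the conditional distributions of one variable across the categories of the other. The minimal Python sketch below computes a marginal distribution and the conditional distribution of opinion within each group. Its counts are hypothetical and invented for illustration only; the survey's table is not reproduced here, and the group labels are placeholders.

```python
# Hypothetical counts for illustration only (not the survey's data).
# Rows: opinion; columns: group.
opinions = ["Good thing", "Bad thing"]
groups = ["Group 1", "Group 2"]
counts = [
    [60, 90],   # Good thing
    [40, 10],   # Bad thing
]

table_total = sum(sum(row) for row in counts)

# Relative frequency marginal distribution of the row variable (opinion).
row_marginal = [sum(row) / table_total for row in counts]

# Conditional distribution of opinion within each group (column).
col_totals = [sum(row[j] for row in counts) for j in range(len(groups))]
conditional = {
    g: [counts[i][j] / col_totals[j] for i in range(len(opinions))]
    for j, g in enumerate(groups)
}

print(row_marginal)   # [0.75, 0.25]
print(conditional)    # {'Group 1': [0.6, 0.4], 'Group 2': [0.9, 0.1]}
```

In this toy table, Group 2's conditional distribution (0.9, 0.1) differs clearly from Group 1's (0.6, 0.4), so opinion would be associated with group. If the conditional distributions were identical, there would be no association.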

4.3 The _______, $R^2$, measures the proportion of total variation in the response variable that is explained by the least-squares regression line.

coefficient of determination Note: The proportion of variation in the response variable that is explained by the least-squares regression line is called the coefficient of determination, $R^2$.

Analyze the residual plot below and identify which, if any, of the conditions for an adequate linear model is not met. A residual plot has a horizontal axis labeled Explanatory from less than 5 to 25 plus in increments of 10 and a vertical axis labeled Residuals from less than −2 to 2 plus in increments of 1. A horizontal dashed line intersects the vertical axis at 0. A series of plotted points lies generally between horizontal coordinates 5 and 25 and between vertical coordinates −2 and 2. Two more plotted points lie at (17, 3.9) and (19, 4). All coordinates are approximate. Which of the conditions below might indicate that a linear model would not be appropriate? a. None b. Patterned residuals c. Constant error variance d. Outlier

d. Outlier Note: If a plot of the residuals against the explanatory variable shows a discernible pattern, such as a curve, then the explanatory and response variables may not be linearly related. If the spread of the residuals increases or decreases as the explanatory variable increases, then a strict requirement of the linear model, called constant error variance, is violated. Recall that outliers are extreme observations. An outlier can also be thought of as an observation that does not fit the overall pattern of the data. Here the two residuals near 3.9 and 4 lie well above the band from −2 to 2 that contains the rest, so they are outliers.
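One common way to make "far from the rest" concrete is the boxplot rule: flag residuals below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. This is a convention, not something the solution above prescribes. The residuals in the minimal Python sketch below are hypothetical values invented to mimic the plot described above, since the actual data are not given.

```python
import statistics

# Hypothetical residuals shaped like the plot described above:
# most lie between -2 and 2, plus two near 3.9 and 4.
residuals = [-1.9, -0.9, -0.6, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1,
             0.2, 0.3, 0.4, 0.6, 0.9, 1.8, 3.9, 4.0]

# Quartiles of the residuals (statistics.quantiles, default 'exclusive' method).
q1, _, q3 = statistics.quantiles(residuals, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [e for e in residuals if e < lower_fence or e > upper_fence]
print(outliers)  # [3.9, 4.0]
```

Different quartile methods can move the fences slightly. For residuals this clearly separated, however, the same two points are flagged either way.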

4.1 Will the following variables have positive correlation, negative correlation, or no correlation? a. negative correlation b. positive correlation c. no correlation Match each pair of variables below with the appropriate type of correlation. 1. outside temperature and the number of people wearing coats 2. number of doctors and number of administrators at a hospital 3. size of TV in living room and amount for heating bill

negative correlation: outside temperature and the number of people wearing coats; positive correlation: number of doctors and number of administrators at a hospital; no correlation: size of TV in living room and amount for heating bill. Note: Two variables that are linearly related are said to be positively associated when above-average values of one variable are associated with above-average values of the other variable and below-average values of one variable are associated with below-average values of the other variable. That is, two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases. Two variables that are linearly related are said to be negatively associated when above-average values of one variable are associated with below-average values of the other variable and below-average values of one variable are associated with above-average values of the other variable. That is, two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.

4.2 example A data set is given below. (a) Draw a scatter diagram. Comment on the type of relation that appears to exist between x and y. (b) Given that $\bar{x} = 3.8333$, $s_x = 1.9408$, $\bar{y} = 4.0500$, $s_y = 1.2454$, and $r = -0.8895$, determine the least-squares regression line. (c) Graph the least-squares regression line on the scatter diagram drawn in part (a).
x: 1, 2, 4, 5, 5, 6
y: 5.0, 5.7, 4.7, 3.3, 2.9, 2.7
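The summary statistics given in part (b) can be reproduced from the six data points. The sketch below uses Python's standard `statistics` module (`stdev` is the sample standard deviation, dividing by n − 1) and a small hypothetical helper `pearson_r` written out from the definition of the linear correlation coefficient.

```python
from statistics import mean, stdev

x = [1, 2, 4, 5, 5, 6]
y = [5.0, 5.7, 4.7, 3.3, 2.9, 2.7]

def pearson_r(x, y):
    """Linear correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x_bar, s_x = mean(x), stdev(x)
y_bar, s_y = mean(y), stdev(y)
r = pearson_r(x, y)
print(round(x_bar, 4), round(s_x, 4), round(y_bar, 4), round(s_y, 4), round(r, 4))
# 3.8333 1.9408 4.05 1.2454 -0.8895
```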

(a) A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.
[Figure: scatter diagram of the six points (1, 5.0), (2, 5.7), (4, 4.7), (5, 3.3), (5, 2.9), (6, 2.7); x-axis from 0 to 6, y-axis from 0 to 6.]
Two variables that are linearly related are positively associated if, whenever the value of one variable increases, the value of the other variable tends to increase. Similarly, two variables that are linearly related are negatively associated if, whenever the value of one variable increases, the value of the other variable tends to decrease. There appears to be a linear association, and the association appears to be negative.
(b) The equation of the least-squares regression line is $\hat{y} = b_1 x + b_0$, where $b_1 = r \cdot \frac{s_y}{s_x}$ is the slope of the least-squares regression line and $b_0 = \bar{y} - b_1 \bar{x}$ is the y-intercept. Here $\bar{x}$ is the sample mean of x, $s_x$ is the sample standard deviation of x, $\bar{y}$ is the sample mean of y, $s_y$ is the sample standard deviation of y, and r is the linear correlation coefficient. While either the formulas or technology can be used to calculate the least-squares regression line, for the purposes of this exercise, use the formulas. First, calculate $b_1$, since it is required for the calculation of $b_0$, rounding to three decimal places:
$b_1 = r \cdot \frac{s_y}{s_x} = -0.8895 \cdot \frac{1.2454}{1.9408} \approx -0.571$
Use $b_1 = -0.571$ to calculate $b_0$, rounding to three decimal places:
$b_0 = \bar{y} - b_1 \bar{x} = 4.0500 - (-0.571)(3.8333) \approx 6.239$
Therefore, the least-squares regression line is $\hat{y} = -0.571x + 6.239$.
(c) Plot the line $\hat{y} = -0.571x + 6.239$ on the same diagram as the scatter diagram in part (a).
[Figure: the scatter diagram from part (a) with a line falling from left to right through approximately (1, 5.6) and (6, 2.8). The line passes within 1 vertical unit of all plotted points.]
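The arithmetic in part (b), and the claim in part (c) that the line passes within 1 vertical unit of every point, can be checked with a short sketch. The helper name `regression_from_summary` is hypothetical. It rounds $b_1$ to three decimals before computing $b_0$, mirroring the worked solution; fitting the raw data without intermediate rounding gives coefficients of about −0.5708 and 6.2381 instead.

```python
x = [1, 2, 4, 5, 5, 6]
y = [5.0, 5.7, 4.7, 3.3, 2.9, 2.7]

def regression_from_summary(x_bar, s_x, y_bar, s_y, r, places=3):
    """Slope b1 = r * s_y / s_x and intercept b0 = y_bar - b1 * x_bar.

    b1 is rounded before b0 is computed, as in the worked solution."""
    b1 = round(r * s_y / s_x, places)
    b0 = round(y_bar - b1 * x_bar, places)
    return b1, b0

b1, b0 = regression_from_summary(3.8333, 1.9408, 4.0500, 1.2454, -0.8895)
print(f"y-hat = {b1}x + {b0}")  # y-hat = -0.571x + 6.239

# Residuals (observed minus predicted) for the six plotted points.
residuals = [yi - (b1 * xi + b0) for xi, yi in zip(x, y)]
print(max(abs(e) for e in residuals) < 1)  # True
```

The largest residual in absolute value belongs to the point (4, 4.7), which lies about 0.745 above the line.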

4.2 A pediatrician wants to determine the relation that exists between a child's height, x, and head circumference, y. She randomly selects 11 children from her practice, measures their heights and head circumferences, and obtains the accompanying data. Complete parts (a) through (g) below. (a) Find the least-squares regression line treating height as the explanatory variable and head circumference as the response variable. $\hat{y}$ = ___x + ___ (Round the slope to three decimal places and round the constant to one decimal place as needed.)

(a) Find the least-squares regression line treating height as the explanatory variable and head circumference as the response variable. $\hat{y} = 0.170x + 13.1$ (Round the slope to three decimal places and round the constant to one decimal place as needed.)
(b) Interpret the slope and y-intercept, if appropriate. First interpret the slope. A. For a head circumference of 0 inches, the height is predicted to be ___ in. B. For every inch increase in head circumference, the height increases by ___ in., on average. C. For every inch increase in height, the head circumference increases by 0.170 in., on average. (correct) D. For a height of 0 inches, the head circumference is predicted to be ___ in. E. It is not appropriate to interpret the slope. (Round to three decimal places as needed.)
Interpret the y-intercept, if appropriate. A. For every inch increase in head circumference, the height increases by ___ in., on average. B. For a head circumference of 0 inches, the height is predicted to be ___ in. C. For every inch increase in height, the head circumference increases by ___ in., on average. D. For a height of 0 inches, the head circumference is predicted to be ___ in. E. It is not appropriate to interpret the y-intercept. (correct) (Round to one decimal place as needed.)
(c) Use the regression equation to predict the head circumference of a child who is 24.5 inches tall. $\hat{y} = 17.27$ in. (Round to two decimal places as needed.)
(d) Compute the residual based on the observed head circumference of the 24.5-inch-tall child in the table. Is the head circumference of this child above or below the value predicted by the regression model? The residual for this observation is −0.11, meaning that the head circumference of this child is below the value predicted by the regression model. (Round to two decimal places as needed.)
(e) Draw the least-squares regression line on the scatter diagram of the data and label the residual from part (d). Choose the correct graph below. Each option is a scatter diagram with a horizontal axis from 24 to 29 and a vertical axis from 16 to 18, with plotted points scattered around a line. A. A rising line through (24, 17) and (29, 17.9); the point (24.5, 17.1) is labeled "Residual," with a vertical segment from the point to the line. B. A rising line through (24, 17.1) and (29, 18); the point (24.5, 17.1) lies below the line and is labeled "Residual," with a vertical segment from the point to the line. (correct) C. A rising line through (24, 17) and (24.5, 17.1); the point labeled "Residual" lies on the line. D. A horizontal line at approximately 17.5; the point (24.5, 17.1) lies below the line and is labeled "Residual," with a vertical segment from the point to the horizontal line.
(f) Notice that two children are 26.5 inches tall. One has a head circumference of 17.5 inches; the other has a head circumference of 17.7 inches. How can this be? A. There is no logical explanation for this; the two observations in question should have had the same head circumference. B. For children with a height of 26.5 inches, head circumferences vary. (correct) C. The only explanation is that the difference was caused by measurement error. D. The only explanation is that one observation was of a boy and one observation was of a girl.
(g) Would it be reasonable to use the least-squares regression line to predict the head circumference of a child who was 32 inches tall? Why? A. Yes, the calculated model can be used for any child's height. B. No, this height is not possible. C. Yes, this height is possible and within the scope of the model. D. No, this height is outside the scope of the model. (correct) E. More information regarding the child is necessary to make the decision.
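Parts (b), (c), and (g) can be summarized in a small sketch built on the fitted line $\hat{y} = 0.170x + 13.1$: the slope means each extra inch of height adds 0.170 in. to the predicted head circumference, and predictions should not be made outside the range of heights in the sample. The function names are hypothetical, and because the children's data table is not reproduced here, the 24 to 29 inch bounds in the usage example are assumed values, not the sample's actual range. The prediction at 24.5 inches is 17.265, which rounds to the 17.27 reported in part (c); the sketch prints three decimals because binary floating point can make a two-decimal format of a value on the ...5 boundary come out as 17.26.

```python
SLOPE, INTERCEPT = 0.170, 13.1  # from part (a): y-hat = 0.170x + 13.1

def predict_head_circumference(height_in):
    """Predicted head circumference (inches) for a height in inches."""
    return SLOPE * height_in + INTERCEPT

def predict_within_scope(height_in, min_height, max_height):
    """Predict only inside the range of heights observed in the sample.

    min_height and max_height should be taken from the children's data table."""
    if not min_height <= height_in <= max_height:
        raise ValueError(
            f"{height_in} in. is outside the scope of the model "
            f"({min_height} to {max_height} in.)"
        )
    return predict_head_circumference(height_in)

print(f"{predict_head_circumference(24.5):.3f}")  # 17.265

try:
    predict_within_scope(32, 24, 29)  # hypothetical sample range
except ValueError as err:
    print(err)
```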


Ensembles d'études connexes

Key Documents & Speeches in U.S. History

View Set

Ch.5 Cost Approach - Cost Estimating

View Set

MKT421T Wk 5 - Apply: Course Post Assessment

View Set

Unit 6: Overcoming Communication Boundaries

View Set

5.75 Oxidative phosphorylation and the chemiosmotic theory

View Set

Unit 5 Progress Check: MCQ Spanish 5 AP Classroom

View Set

Clinical Level Final Mock Exam II

View Set