Exam 2
Is there a relationship between cigarette tar and CO? Yes, as the amount of tar increases the amount of carbon monoxide also increases.
Construct a scatter diagram using the data table to the right. This data is from a study comparing the amount of tar and carbon monoxide (CO) in cigarettes. Use tar for the horizontal scale and use carbon monoxide (CO) for the vertical scale. Determine whether there appears to be a relationship between cigarette tar and CO
residual
For a pair of sample x- and y-values, the ______________ is the difference between the observed sample value of y and the y-value that is predicted by using the regression equation.
No, because Best Actors and Best Actresses typically appear in different movies, so we should not expect the ages to be correlated.
Should we expect that there would be a correlation?
Data > Bin (use cut points), then Stat > Tables > Frequency
The data represents the body mass index (BMI) values for 20 females. Construct a frequency distribution beginning with a lower class limit of 15.0 and use a class width of 6.0.
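The same binning can be sketched in Python. This is only an illustration: the BMI values below are invented (the card's data table is not reproduced here), and the names `bmi_values`, `LOWER_LIMIT`, and `CLASS_WIDTH` are hypothetical. It assumes values are recorded to one decimal place, so each upper class limit is the next lower limit minus 0.1.

```python
from collections import Counter

# Hypothetical BMI values (invented for illustration; not the card's data set).
bmi_values = [17.7, 29.4, 19.2, 27.5, 33.5, 25.6, 22.1, 44.9, 26.5, 18.3,
              22.7, 22.5, 24.9, 28.9, 26.7, 16.1, 23.8, 30.0, 21.4, 35.2]

LOWER_LIMIT = 15.0   # first lower class limit, from the card
CLASS_WIDTH = 6.0    # class width, from the card

# Index of the class each value falls into (0 = first class).
class_counts = Counter(int((v - LOWER_LIMIT) // CLASS_WIDTH) for v in bmi_values)

frequencies = []
for k in range(max(class_counts) + 1):
    low = LOWER_LIMIT + k * CLASS_WIDTH
    high = low + CLASS_WIDTH - 0.1   # upper class limit for one-decimal data
    frequencies.append(class_counts.get(k, 0))
    print(f"{low:.1f}-{high:.1f}: {class_counts.get(k, 0)}")
```

With these invented values the classes run 15.0-20.9, 21.0-26.9, 27.0-32.9, 33.0-38.9, and 39.0-44.9.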
stem-and-leaf plot (stemplot)
The data represents the heights of eruptions by a geyser. Use the heights to construct a stemplot. Identify the two values that are closest to the middle when the data are sorted in order from lowest to highest.
Are there any outliers? Yes, the volume of 50 oz appears to be an outlier because it is far away from the other volumes
The data table to the right represents the volumes of a generic soda brand. Complete parts (a) and (b) below.
Do the data appear to have a distribution that is approximately normal? No, it is not symmetric.
The table below shows the frequency distribution of the rainfall on 52 consecutive Mondays in a certain city. Use the frequency distribution to construct a histogram. Do the data appear to have a distribution that is approximately normal?
Does the histogram appear to depict data that have a normal distribution? The histogram appears to depict a normal distribution. The frequencies generally increase to a maximum and then decrease, and the histogram is roughly symmetric.
The table below shows the frequency distribution of the weights (in grams) of pre-1964 quarters. Use the frequency distribution to construct a histogram. Does the histogram appear to depict data that have a normal distribution? Why or why not?
A residual is a value of y − ŷ, which is the difference between an observed value of y and a predicted value of y. The regression line has the property that the sum of the squares of the residuals is the lowest possible sum.
a. What is a residual? b. In what sense is the regression line the straight line that "best" fits the points in a scatterplot?
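A small numeric check may make both parts concrete. The sketch below uses invented paired data and hypothetical helper names (`least_squares`, `sum_squared_residuals`); it fits the line with the standard least-squares formulas and then shows that nudging the intercept or slope makes the sum of squared residuals larger.

```python
# Hypothetical paired data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # y - y-hat for each pair

best = sum_squared_residuals(b0, b1, xs, ys)
nudged = sum_squared_residuals(b0 + 0.1, b1 - 0.05, xs, ys)  # any other line
print(best < nudged)
```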
outliers
are sample values that lie very far away from the majority of the other sample values.
(frequency in category / total frequency) x 100
A frequency table of grades has five classes (A, B, C, D, F) with frequencies of 2, 11, 17, 5, and 1 respectively. Using percentages, what are the relative frequencies of the five classes?
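Applying the formula to the frequencies on this card (total 36) can be sketched as follows; the variable names are illustrative.

```python
grades = ["A", "B", "C", "D", "F"]
frequencies = [2, 11, 17, 5, 1]   # from the card

total = sum(frequencies)

# (frequency in category / total frequency) x 100, rounded to two decimals
relative = {g: round(f / total * 100, 2) for g, f in zip(grades, frequencies)}

for grade, pct in relative.items():
    print(f"{grade}: {pct}%")
```

Because each percentage is rounded separately, the five values may not add to exactly 100.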
Shape of the distribution
A histogram aids in analyzing the _______ of the data.
least-squares property
A straight line satisfies the __________________ if the sum of the squares of the residuals is the smallest sum possible.
Compare the pie chart found above to the Pareto chart given on the left. Can you determine which graph is more effective in showing the relative importance of job sources? The Pareto chart is more effective.
A study was conducted to determine how people get jobs. The table below lists data from 400 randomly selected subjects.
normal
A(n) ____ distribution has a "bell" shape.
frequency distribution
A _______ helps us understand the nature of the distribution of a data set.
relative frequency
A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
Correlation
A __________ exists between two variables when the values of one variable are somehow associated with the values of the other variable.
Do cigarette filters appear to be effective? Yes, because the relative frequency of the higher tar classes is greater for non-filtered cigarettes.
Construct one table that includes relative frequencies based on the frequency distributions shown below, then compare the amounts of tar in nonfiltered and filtered cigarettes. Do the cigarette filters appear to be effective? (Hint: The filters reduce the amount of tar ingested by the smoker.)
Add each class frequency to the sum of the frequencies of all preceding classes.
Construct the cumulative frequency distribution for the given data.
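As a rough sketch of the running-total idea, with invented class frequencies (the card's table is not reproduced here):

```python
from itertools import accumulate

# Hypothetical class frequencies, in class order (invented for illustration).
frequencies = [3, 7, 12, 5, 3]

# Each cumulative frequency is the sum of that class and all classes before it.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of data values.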
What is the equation of the regression line? Select the correct choice below and fill in the answer boxes to complete your choice. ŷ = −398 + 135x. What does the symbol ŷ represent? The symbol ŷ represents the predicted value of price.
Different hotels in a certain area are randomly selected, and their ratings and prices were obtained online. Using technology, with x representing the ratings and y representing price, we find that the regression equation has a slope of 135 and a y-intercept of −398. Complete parts (a) and (b) below.
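The equation can be turned into a small prediction function. The slope and intercept come from the card; the function name `predicted_price` and the rating of 4 are assumptions chosen only for illustration.

```python
SLOPE = 135        # b1, from the card
INTERCEPT = -398   # b0, from the card

def predicted_price(rating):
    """Predicted price (y-hat) for a given hotel rating x."""
    return INTERCEPT + SLOPE * rating

# Hypothetical rating, chosen only to show the arithmetic.
print(predicted_price(4))                        # -398 + 135 * 4
print(predicted_price(5) - predicted_price(4))   # change per one-unit rating increase
```

The second line prints the slope, which is the marginal change in predicted price when the rating increases by exactly one unit.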
Yes, because the frequencies start low, proceed to one or two high frequencies, then decrease to a low frequency, and the distribution is approximately symmetric
Does the frequency distribution appear to have a normal distribution? Explain.
Is there sufficient evidence to support the claim that there is a linear correlation between the weights of bears and their chest sizes? Choose the correct answer below and, if necessary, fill in the answer box within your choice. Yes, because the absolute value of the test statistic 0.956 (COR COEFF R) exceeds the critical value. When measuring an anesthetized bear, is it easier to measure chest size than weight? If so, does it appear that a measured chest size can be used to predict the weight? Yes, it is easier to measure a chest size than a weight because measuring weight would require lifting the bear onto the scale. The chest size could be used to predict weight because there is a linear correlation between the two.
Fifty-four wild bears were anesthetized, and then their weights and chest sizes were measured and listed in a data set. Results are shown in the accompanying display. Is there sufficient evidence to support the claim that there is a linear correlation between the weights of bears and their chest sizes? When measuring an anesthetized bear, is it easier to measure chest size than weight? If so, does it appear that a measured chest size can be used to predict the weight? Use a significance level of α=0.05.
The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is 28.4%, which is high, so there is not sufficient evidence to conclude that there is a linear correlation between brain volume and IQ score in males.
For a data set of brain volumes (cm3) and IQ scores of twelve males, the linear correlation coefficient is found and the P-value is 0.284. Write a statement that interprets the P-value and includes a conclusion about linear correlation.
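The decision behind that statement can be sketched as a comparison of the P-value with a significance level. The card does not state a significance level, so α = 0.05 below is an assumption, and the function name is hypothetical.

```python
ALPHA = 0.05   # assumed significance level; not stated on this card

def supports_linear_correlation(p_value, alpha=ALPHA):
    """True when the P-value is small enough to support a linear correlation."""
    return p_value <= alpha

p_value = 0.284   # from the card
decision = supports_linear_correlation(p_value)
print(f"P-value = {p_value:.1%}; sufficient evidence: {decision}")
```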
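The critical values are ±(the table value for n = 10). Because r = 0.118 is between the critical values, there is not sufficient evidence to support the claim of a linear correlation.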
For a data set of brain volumes (cm3) and IQ scores of ten males, the linear correlation coefficient is r=0.118. Use the table available below to find the critical values of r. Based on a comparison of the linear correlation coefficient r and the critical values, what do you conclude about a linear correlation?
regression equation
Given a collection of paired sample data, the ____________________ ŷ = b0 + b1x algebraically describes the relationship between the two variables, x and y.
What is the trend? How does this trend compare to the trend for drive-in movie theaters? There appears to be an upward trend, unlike drive-in movie theaters, which have a downward trend.
Given below are the numbers of indoor movie theaters, listed in order by row for each year. Use the given data to construct a time-series graph. What is the trend? How does this trend compare to the trend for drive-in movie theaters?
Bell-shaped
Heights of adult males are normally distributed. If a large sample of heights of adult males is randomly selected and the heights are illustrated in a histogram, what is the shape of that histogram?
The data have a pattern that is not a straight line.
Identify a characteristic of the data that is ignored by the regression line.
There is an influential point that strongly affects the graph of the regression line.
Identify a characteristic of the data that is ignored by the regression line.
Lower class limits (numbers on the left): 20, 22, 24, 26, 28, 30, 32. Upper class limits (numbers on the right): 21, 23, 25, 27, 29, 31, 33. Class width: 2. Class midpoints: 20.5, 22.5, 24.5, 26.5, 28.5, 30.5, 32.5. Class boundaries: 19.5, 21.5, 23.5, 25.5, 27.5, 29.5, 31.5, 33.5. Number of individuals included in the summary: 88.
Identify the lower class limits, upper class limits, class width, class midpoints, and class boundaries for the given frequency distribution. Also identify the number of individuals included in the summary. Classes and frequencies: 20-21: 31; 22-23: 33; 24-25: 14; 26-27: 2; 28-29: 5; 30-31: 1; 32-33: 2.
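Each of these quantities follows mechanically from the class limits, which the sketch below works through using the table on this card. The variable names are illustrative.

```python
# Class limits and frequencies from the card.
lower_limits = [20, 22, 24, 26, 28, 30, 32]
upper_limits = [21, 23, 25, 27, 29, 31, 33]
frequencies = [31, 33, 14, 2, 5, 1, 2]

# Class width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoint: average of a class's lower and upper limits.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between one class's upper limit
# and the next class's lower limit.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

# Number of individuals: sum of the frequencies.
total = sum(frequencies)

print(class_width, midpoints, boundaries, total)
```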
The outlier will appear as a bar far from all of the other bars with a height that corresponds to a frequency of 1.
If we collect a large sample of blood platelet counts and if our sample includes a single outlier, how will that outlier appear in a histogram?
No. The presence of a linear correlation between two variables does not imply that one of the variables is the cause of the other variable.
If we find that there is a linear correlation between the concentration of carbon dioxide in our atmosphere and the global temperature, does that indicate that changes in the concentration of carbon dioxide cause changes in the global temperature?
No, a graph cannot help to overcome the deficiency. If the sample is a bad sample, there are no graphs or other techniques that can be used to salvage the data.
If we have a large voluntary response sample consisting of weights of subjects who chose to respond to a survey posted on the Internet, can a graph help to overcome the deficiency of having a voluntary response sample?
Among such retractions, does misconduct (fraud, duplication, plagiarism) appear to be a major factor? Yes, misconduct appears to be a major factor because the majority of retractions were due to misconduct.
In a study of retractions in biomedical journals, 494 were due to error, 196 were due to plagiarism, 803 were due to fraud, 304 were due to duplications of publications, and 249 had other causes. Construct a Pareto chart. Among such retractions, does misconduct (fraud, duplication, plagiarism) appear to be a major factor?
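The ordering used in a Pareto chart, and the share of retractions attributed to misconduct, can be checked with a short sketch. The counts are from the card; the dictionary keys and variable names are illustrative.

```python
# Retraction counts from the card.
retractions = {
    "error": 494,
    "plagiarism": 196,
    "fraud": 803,
    "duplication": 304,
    "other": 249,
}

# A Pareto chart orders the bars from highest to lowest frequency.
pareto_order = sorted(retractions.items(), key=lambda item: item[1], reverse=True)

total = sum(retractions.values())
misconduct = sum(retractions[k] for k in ("fraud", "duplication", "plagiarism"))
misconduct_share = misconduct / total * 100

for cause, count in pareto_order:
    print(f"{cause}: {count}")
print(f"misconduct: {misconduct} of {total} ({misconduct_share:.1f}%)")
```

Misconduct accounts for 1303 of the 2046 retractions, which is more than half.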
relative frequency
In a _______ distribution, the frequency of a class is replaced with a proportion or percent.
a nonzero axis
In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as _______.
outlier
In a scatterplot, a(n) ______________ is a point lying far away from the other data points.
The term linear refers to a straight line, and r measures how well a scatterplot fits a straight-line pattern.
In this section we use r to denote the value of the linear correlation coefficient. Why do we refer to this correlation coefficient as being linear?
marginal change
In working with two variables related by a regression equation, the _________________ in a variable is the amount that it changes when the other variable changes by exactly one unit.
influential points
Paired sample data may include one or more ___________, which are points that strongly affect the graph of the regression line.
Data > Bin (use fixed width bins), then Stat > Tables > Frequency
Refer to the accompanying data set and use the 30 screw lengths to construct a frequency distribution. Begin with a lower class limit of 3.470 in., and use a class width of 0.010 in. The screws were labeled as having a length of 3 1/2 in.
No. The data values in each class could take on any value between the class limits, inclusive.
Refer to the table summarizing service times (seconds) of dinners at a fast food restaurant. How many individuals are included in the summary? Is it possible to identify the exact values of all of the original service times?
touch
The bars in a histogram ____.
What impression does the graph create? The graph creates the impression that men have salaries that are more than twice the salaries of women. Does the graph depict the data fairly? No, because the vertical scale does not start at zero.
The graph to the right compares teaching salaries of women and men at private colleges and universities. What impression does the graph create? Does the graph depict the data fairly? If not, construct a graph that depicts the data fairly.
Does the graph distort the data? Why or why not? Yes, because the graph incorrectly uses objects of volume to represent the data.
The graph to the right uses cylinders to represent barrels of oil consumed by two countries. Does the graph distort the data or does it depict the data fairly? Why or why not? If the graph distorts the data, construct a graph that depicts the data fairly.
frequency
The heights of the bars of a histogram correspond to _______ values.
Add up the heights (frequencies) of all the bars.
The histogram to the right represents the weights (in pounds) of members of a certain high-school debate team. How many team members are included in the histogram?
Class width: 120-110= 10
The histogram to the right represents the weights (in pounds) of members of a certain high-school programming team. What is the class width? What are the approximate lower and upper class limits of the first class?
The distribution is normal. The points are reasonably close to a straight line and do not show a systematic pattern that is not a straight-line pattern.
The normal quantile plot shown to the right represents duration times (in seconds) of eruptions of a certain geyser from the accompanying data set. Examine the normal quantile plot and determine whether it depicts sample data from a population with a normal distribution.
With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.
The population of ages at inauguration of all U.S. Presidents who had professions in the military is 62, 46, 68, 64, 57. Why does it not make sense to construct a histogram for this data set?
Does there appear to be a correlation between the president's height and his opponent's height? No, there does not appear to be a correlation because there is no general pattern to the data.
The table provided below shows paired data for the heights of a certain country's presidents and their main opponents in the election campaign. Construct a scatterplot. Does there appear to be a correlation?
linear correlation coefficient r
The ______________ measures the strength of the linear correlation between the paired quantitative x- and y-values in a sample.
No, because while there is no linear correlation, there may be a relationship that is not linear.
Twenty different statistics students are randomly selected. For each of them, their body temperature (°C) is measured and their head circumference (cm) is measured. If it is found that r=0, does that indicate that there is no association between these two variables?
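A constructed example shows how r = 0 can coexist with a strong nonlinear relationship. The data below are invented (y is exactly x squared) and are not related to the body temperature study; `pearson_r` is a hypothetical helper that applies the usual formula for r.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data with a perfect nonlinear relationship: y = x^2.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

Here y is completely determined by x, yet r is zero because the pattern is a parabola and not a straight line.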
Choose the correct answer below. r is a statistic that represents the value of the linear correlation coefficient computed from the paired sample data, and ρ is a parameter that represents the value of the linear correlation coefficient that would be computed by using all of the paired data in the population of all statistics students. Select the correct choice below and fill in the answer box to complete your choice. The value of r is estimated to be 0, because it is likely that there is no correlation between body temperature and head circumference. Choose the correct answer below. The value of r does not change, because r is not affected by converting all values of a variable to a different scale.
Twenty different statistics students are randomly selected. For each of them, their body temperature (°C) is measured and their head circumference (cm) is measured. a. For this sample of paired data, what does r represent, and what does ρ represent? b. Without doing any research or calculations, estimate the value of r. c. Does r change if body temperatures are converted to Fahrenheit degrees?
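Part (c) can be checked numerically. The sketch below uses invented temperatures and head circumferences (not data from the study) and a hypothetical helper `pearson_r`; it computes r once with Celsius values and once after converting them to Fahrenheit.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented sample values, for illustration only.
celsius = [36.4, 36.8, 37.0, 36.6, 37.2, 36.9]
head_cm = [56.1, 57.3, 55.0, 58.2, 56.7, 57.9]

# Convert every temperature to Fahrenheit (a change of scale only).
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

r_celsius = pearson_r(celsius, head_cm)
r_fahrenheit = pearson_r(fahrenheit, head_cm)
print(r_celsius, r_fahrenheit)
```

The two values agree up to floating-point rounding, since converting a variable to a different scale does not affect r.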
Using the linear correlation coefficient found in the previous step, determine whether there is sufficient evidence to support the claim of a linear correlation between the two variables. Choose the correct answer below. There is sufficient evidence to support the claim of a linear correlation between the two variables. Identify the feature of the data that would be missed if part (b) was completed without constructing the scatterplot. Choose the correct answer below. The scatterplot reveals a distinct pattern that is not a straight-line pattern.
Use the given data set to complete parts (a) through (c) below. (Use α=0.05.)
The value of r will always have the same sign as the value of b1.
What is the relationship between the linear correlation coefficient r and the slope b1 of a regression line?
scatterplot
When determining whether there is a correlation between two variables, one should use a ____________ to explore the data visually.
pictographs
When drawings of objects are used to depict data, false impressions can be made. These drawings are called _______.
Use the regression line for predictions only if the data go far beyond the scope of the available sample data.
When making predictions based on regression lines, which of the following is not listed as a consideration?
Variation
Which characteristic of data is a measure of the amount that the data values vary?
The linear correlation coefficient r is robust. That is, a single outlier will not affect the value of r.
Which of the following is NOT a property of the linear correlation coefficient r?
If r>1, then there is a positive linear correlation
Which of the following is NOT a requirement in determining whether there is a linear correlation between two variables?
Correlation does not imply causality
Which of the following is NOT one of the three common errors involving correlation?
If |r|>critical value, we should fail to reject the null hypothesis and conclude that there is not sufficient evidence to support the claim of a linear correlation.
Which of the following is NOT true for a hypothesis test for correlation?
The method for regression analysis line is not robust. It is seriously affected by a small departure from a normal distribution.
Which of the following is not a requirement for regression analysis?
dependent variable
Which of the following is not equivalent to the other three?
We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase.
Which of the following statements about correlation is true?
less than or equal to
is not