STATS Final

A pharmaceutical company conducts an experiment in which a subject takes 100 mg of a substance orally. The researchers measure how many minutes it takes for one quarter of the substance to exit the bloodstream. What kind of variable is the company studying?

Quantitative variable (time, measured in minutes)

A study examining the health risks of smoking measured the cholesterol levels of people who had smoked for at least 25 years and people of similar ages who had smoked for no more than 5 years and then stopped. Create appropriate graphical displays for both groups, and write a brief report comparing their cholesterol levels. The data are given below.

The median cholesterol level for Smokers is 5 points lower than the median for Ex-Smokers, and the mean for Smokers is 5.4 points higher than the mean for Ex-Smokers. The Smokers had a range of 204, while the Ex-Smokers had a range of 201. The Ex-Smokers had a larger standard deviation. The Smokers' distribution has an outlier.

The company's annual report states, "Our survey shows that 82.65% of our employees are 'very happy' working here." Comment on that claim. Use appropriate statistics terminology.

The survey result is a statistic. It estimates the true proportion of all employees who are 'very happy' working there, which is a population parameter.

Traffic checks on a certain section of highway suggest that 85% of drivers are speeding there. Since 0.85 × 0.85 = 0.7225, the multiplication rule might suggest that there is approximately a 72% chance that two vehicles in a row are both speeding. What's wrong with that reasoning?

The multiplication rule applies only to independent events, and the speed of one car is not necessarily independent of the speed of the next car, so the rule does not apply here.

A poll question asked, "In the next 5 years, do you think that things will get better, stay the same, or get worse?" The possible responses were "Better," "No change," "Worse," "Don't Know," and "No Response." What kind of variable is the response?

categorical variable

Suppose you have fit a linear model to some data and now take a look at the residuals. For the possible residuals plot shown, tell whether you would try a re-expression and, if so, why.

re-express to straighten the relationship

The Yale Program on Climate Change Communication surveyed 1263 American adults in March 2015 and asked them about their attitudes on global climate change. A display of the percentages of respondents choosing each of the major alternatives offered is provided. List the errors in this display.

- the percentages do not sum to 100%
- there is no title
- showing the pie chart on a slant violates the area principle

The accompanying table below shows that as the number of oranges on a tree increases, the fruit tends to get smaller. Create a linear model for this relationship and express any concerns you may have.

- the residual plot is curved, so a linear model may not be appropriate

Which matters more about a sample you draw from a population?

- the size of the sample

If you create an online survey, individuals can choose on their own whether to participate in the sample. This causes a form of bias called __________.

- voluntary response

The weights in pounds of a breed of yearling cattle follow the Normal model N(1121, 77). What weight would be considered unusually low for such an animal?

Any weight more than 2 standard deviations below the mean, that is, less than 1121 − 2(77) = 967 pounds, is unusually low. A steer more than 3 standard deviations below the mean, less than 1121 − 3(77) = 890 pounds, would be seen only rarely.
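A minimal Python sketch of the cutoff arithmetic, assuming only the N(1121, 77) model stated in the question; the helper name `cutoff` is illustrative:

```python
# Cutoffs for "unusually low" weights under the Normal model N(1121, 77).
mean, sd = 1121, 77

def cutoff(k):
    """Weight (pounds) that lies k standard deviations below the mean."""
    return mean - k * sd

print(cutoff(2))  # 967: below this is unusually low
print(cutoff(3))  # 890: below this should happen only rarely
```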

A study was conducted on shoe sizes of students, reported in European sizes. For the men, the mean size was 44.16 with a standard deviation of 1.92. To convert European shoe sizes to U.S. sizes for men, use the equation USsize = EuroSize × 0.7999 − 23.3. a) What is the mean men's shoe size for these responses in U.S. units? b) What is the standard deviation in U.S. units?

a) 12.02 b) 1.54
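The conversion is a linear transformation, so the mean is both scaled and shifted while the standard deviation is only scaled (by the absolute value of the multiplier). A short Python sketch of that rule using the numbers above:

```python
# Linear change of units: US = 0.7999 * Euro - 23.3.
a, b = 0.7999, -23.3
euro_mean, euro_sd = 44.16, 1.92

us_mean = a * euro_mean + b  # the mean is scaled and shifted
us_sd = abs(a) * euro_sd     # the shift does not affect spread

print(round(us_mean, 2), round(us_sd, 2))  # 12.02 1.54
```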

Load the accompanying data about a particular car race into your preferred statistics package and answer the questions a through c below. a) What was the average speed of the winner in 2011? b) How many times did Arie Luyendyk win the race in the 1990s? c) How many races took place during the 1930s?

a) 170.265 miles/hr b) 2 times c) 10 races

In a study of streams in the Adirondack Mountains, the following relationship was found between the water's pH and its hardness (measured in grains). Is it appropriate to summarize the strength of association with a correlation?

- The scatterplot is not linear; correlation is not appropriate.

A researcher is working with a model that uses the number of rings in an abalone's shell to predict its age. He finds an observation that he believes has been miscalculated. After deleting this outlier, he redoes the calculation. Does it appear that this outlier was exerting very much influence?

Yes, this observation was influential. After it was removed, the slope of the regression line changed by a large amount.

A researcher studies high schools and finds a strong positive linear association between funding and reading scores.

a. Yes, schools with higher funding generally have better readers. b. a lurking variable

Disk drives have been getting larger. Their capacity is now often given in terabytes (TB), where 1 TB = 1000 gigabytes, or about a trillion bytes. A survey of prices for external disk drives found the data shown to the right. Find and interpret the value of R².

- R² = 98.98%, which means that 98.98% of the variability in the price of these disk drives can be accounted for by a linear model on the capacity of the drives.

Administrators at Texas A&M University were interested in estimating the percentage of students who are on a vegetarian diet. The A&M student body has about 42,000 members. How might the administrators answer their question by applying the three Big Ideas?

The A&M administrators should take a survey. They should sample a part of the student body, selecting respondents with a randomization method. They should be sure to draw a sufficiently large sample.

A company held a blood pressure screening clinic for its employees. The results are summarized in the table below by age group and blood pressure level. d) Select a brief description of the association between age and blood pressure among these employees. e) Does this prove that people's blood pressure levels increase as they age? Explain.

d) The percentage of employees having low blood pressure decreases and the percentage having high blood pressure increases as they age. e) No, because only a controlled experiment can isolate the relationship between age and blood pressure.

For the following description of data, identify the variables and tell whether each should be treated as categorical or quantitative. Ian Walker, a psychologist at the University of Bath, wondered whether drivers treat bicycle riders differently when they wear helmets. He rigged his bicycle with an ultrasonic sensor that could measure how close each car was that passed him. He then rode on alternating days with and without a helmet. Out of 2500 cars passing him, he found that when he wore his helmet, motorists passed 3.35 inches closer to him, on average, than when his head was bare. [NY Times, Dec. 10, 2006]

(Passing distance, in inches; Quantitative), (Helmet or no helmet; Categorical)

Here are several scatterplots. The calculated correlations are 0.777, 0.006, −0.923, and 0.951. Which is which?

- (0.006) = parabola (curved, with no linear association)
- (0.777) = positive linear, widely scattered
- (−0.923) = strong negative linear
- (0.951) = strong positive linear

A town's January high temperatures average 34°F with a standard deviation of 8°, while in July the mean high temperature is 74° and the standard deviation is 9°. In which month is it more unusual to have a day with a high temperature of 56°? Explain.

It is more unusual to have a day with a high temperature of 56° in January. A high temperature of 56° in January is z = (56 − 34)/8 = 2.75 standard deviations above the mean, while a high temperature of 56° in July is only 2.00 standard deviations below the mean, z = (56 − 74)/9 = −2.00.
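The comparison rests on z-scores, z = (x − mean)/SD, with the larger |z| marking the more unusual value. A small Python sketch (the function name `z_score` is hypothetical):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

z_jan = z_score(56, 34, 8)  # 2.75
z_jul = z_score(56, 74, 9)  # -2.0

# The month with the larger |z| makes 56 degrees more unusual.
print("January" if abs(z_jan) > abs(z_jul) else "July")  # January
```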

An article reported on a school district's magnet school programs. Of the 1885 qualified applicants, 592 were black or Hispanic, 295 Asian, and 998 white. Summarize the relative frequency distribution of ethnicity with a sentence or two (in the proper context, of course).

Of the qualified applicants, 31.4% were black or Hispanic, 15.6% were Asian, and 52.9% were white.
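Each relative frequency is a count divided by the 1885 total. A Python sketch reproducing the percentages; the rounded values sum to 99.9% rather than 100% only because of rounding:

```python
# Counts of qualified applicants by ethnicity, from the article.
counts = {"black or Hispanic": 592, "Asian": 295, "white": 998}
total = sum(counts.values())  # 1885 qualified applicants

# Relative frequency = count / total, expressed as a percentage.
percents = {group: 100 * n / total for group, n in counts.items()}
for group, pct in percents.items():
    print(f"{group}: {pct:.1f}%")
```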

People with z-scores of 2.5 or above on a certain aptitude test are sometimes classified as geniuses. If aptitude test scores have a mean of 100 and a standard deviation of 30 points, what is the minimum aptitude test score needed to be considered a genius?

The minimum aptitude test score needed to be considered a genius is 175 points.
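The cutoff comes from solving z = (x − mean)/SD for x, which gives x = mean + z·SD = 100 + 2.5(30). As a Python sketch (helper name illustrative):

```python
def score_from_z(z, mean, sd):
    """Invert z = (x - mean) / sd to recover the raw score x."""
    return mean + z * sd

print(score_from_z(2.5, 100, 30))  # 175.0
```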

A friend says "I flipped five heads in a row! The next one has to be tails!" Explain why this thinking is incorrect.

There is no law of averages for the short run. The first five flips do not affect the sixth flip.

Is there any evidence that an animal's gestation period is related to the animal's lifespan? The scatterplot shows Gestation Period (in days) vs. Life Expectancy (in years) for 18 species of mammals. The highlighted point at the far right represents humans. Complete parts a through e. a) For these data, r = 0.540. This is not a very strong relationship. Do you think the association would be stronger if humans were removed? Explain. b) Is there reasonable justification for removing humans from the data set? Explain.

a. Stronger. Both slope and correlation would increase. b. Yes, restricting the study to nonhuman animals would justify it. c. The association is moderately strong. d. On average, for every year increase in life expectancy, the gestation period increases by about 12.97 days. e. A certain mammal has a life expectancy of about 18 years. Estimate the expected gestation period of this species. 326.3 days

Examine each of the following questions for possible bias. If you think the question is biased, indicate how and propose a better question. a) Should companies that promote teen smoking be liable to help pay for the costs of cancer institutions? b) Given that 18-year-olds are old enough to smoke, is it fair to set the drinking age at 21?

a. The question is biased toward "yes" because of the wording "promote teen smoking." A better question may be "Should companies be responsible to help pay for the costs of cancer institutions?" b. The question is biased toward "no" because of the preamble "18-year-olds are old enough to smoke." A better question may be "Do you think the drinking age should be lowered from 21?"

Prior to graduation, a high school class was surveyed about its plans. The table displays the results for white and minority students (the Minority group included African-American, Asian, Hispanic, and Native American students). Complete parts a) through d).

d) Do you see any important differences in the post-graduation plans of white and minority students? There is more than a five percent difference in at least one of the categories of post-graduation plans for white and minority students. There is evidence of an association between race and post-graduation plans.

When you sample so that every combination of individuals in your population has an equal chance of being chosen, you are taking a __________.

- simple random sample

The following data show the percentage change in population for 49 states and the District of Columbia from the 2000 census to the 2010 census. Using appropriate graphical displays and summary statistics, write a report on the percentage change in population by state.

- the median is 7.85
- the first quartile, Q1, is 4.5
- the third quartile, Q3, is 14.1
- the IQR is 9.6
- the minimum is −0.6 and the maximum is 35.1, so the range is 35.7
- The histogram shows that the distribution of Percent Change is unimodal and skewed right. The states vary from a minimum of −0.6% to 35.1% growth in the decade. The median was 7.85%, and the middle half of the states had growth between 4.5% and 14.1%.

A student wants to determine whether or not a value in her data is an outlier. She has calculated Q1 = 4, median = 5, and Q3 = 10. Where is the upper fence?

- 19
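The value 19 matches the usual rule that the upper fence is Q3 + 1.5 × IQR; the question does not state the multiplier, so 1.5 is an assumption here. The median is not needed. A Python sketch:

```python
def upper_fence(q1, q3, k=1.5):
    """Upper fence Q3 + k * IQR (k = 1.5 is the common convention)."""
    iqr = q3 - q1
    return q3 + k * iqr

print(upper_fence(4, 10))  # 19.0
```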

The boxplot shows the fuel economy ratings for 67 subcompact cars with the same model year. Some summary statistics are also provided. The extreme outlier is an electric car whose electricity usage is equivalent to 112 miles per gallon. If that electric car is removed from the data set, how will the standard deviation be affected? The IQR?

- How will removing the electric car affect the standard deviation? The standard deviation will be much lower. The standard deviation is based on the sum of the squared differences between the data values and the mean, and the electric car's very large deviation contributes heavily to that sum, so removing it drastically lowers the sum.
- How will removing the electric car affect the IQR? The IQR will not change very much, if at all. Removing the electric car can at most shift each quartile to a neighboring data value, which will not have a large impact on the IQR.

Concerned about reports of discolored scales on fish caught downstream from a newly sited chemical plant, scientists set up a field station in a shoreline public park. For one week they asked fishermen there to bring any fish they caught to the field station for a brief inspection. At the end of the week, the scientists said that 40% of the 135 fish that were submitted for inspection displayed the discoloration. From this information, can the researchers estimate what proportion of fish in the river have discolored scales? Explain.

- If discolored fish are not equally likely to be caught as normal fish, or fishermen are more disposed to bring in discolored fish than normal fish, then the sample will be biased and the resulting estimate will be biased.

A government bureau keeps track of the number of adoptions in each region. The accompanying histogram shows the distribution of adoptions in each region. Would you report the standard deviation or the IQR? Explain briefly.

- report the IQR, since the distribution is skewed

You are trying to study the amount of financial aid students at your University receive. You sample 50 students and find out the average size of their financial aid packages. The average of your sample is a __________.

- sample statistic

The table below shows the number of licensed drivers in a state by age and by sex. Complete parts a) through d).

As age increases, the percentage of female drivers increases. d) Do driver's age and sex appear to be independent? Explain. No. There is some association between driver's age and sex.

An auctioneer sold a herd of cattle whose minimum weight was 920 pounds, median was 1180 pounds, standard deviation 80, and IQR 104 pounds. They sold for 30 cents a pound, and the auctioneer took a $20 commission on each animal. Then, for example, a steer weighing 1100 pounds would net the owner 0.30(1100) − 20 = $310. Find the minimum, median, standard deviation, and IQR of the net sale prices.

The minimum price is $256.00. The median price is $334.00. The standard deviation of the prices is $24.00. The IQR of the prices is $31.20.
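Net price = 0.30 × weight − 20 is a linear transformation: location statistics (minimum, median) are scaled and shifted, while spread statistics (standard deviation, IQR) are only scaled, because the $20 shift cancels out of any difference. A Python sketch of that bookkeeping:

```python
# Net price = 0.30 * weight - 20 (30 cents a pound minus a $20 commission).
rate, fee = 0.30, 20

weight_min, weight_median = 920, 1180  # location statistics (pounds)
weight_sd, weight_iqr = 80, 104        # spread statistics (pounds)

# Location measures are scaled and shifted...
price_min = rate * weight_min - fee
price_median = rate * weight_median - fee
# ...but spread measures are only scaled.
price_sd = rate * weight_sd
price_iqr = rate * weight_iqr

print(f"${price_min:.2f} ${price_median:.2f} ${price_sd:.2f} ${price_iqr:.2f}")
```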

You purchased a five-pack of new light bulbs that were recalled because 21% of the lights did not work. What is the probability that at least one of your lights is defective?

The probability that at least one of the light bulbs is defective is 0.692.
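The answer uses the complement rule, P(at least one) = 1 − P(none), and assumes the five bulbs are defective independently of one another (an assumption the question does not state). In Python:

```python
# P(at least one defective) = 1 - P(none defective),
# assuming the bulbs are independent of one another.
p_defective = 0.21
n_bulbs = 5

p_none = (1 - p_defective) ** n_bulbs
p_at_least_one = 1 - p_none
print(round(p_at_least_one, 3))  # 0.692
```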

For the following description of data, identify the W's from what's given, name the variables, specify for each variable whether its use indicates that it should be treated as categorical or quantitative, and, for any quantitative variable, identify the units in which it was measured (or note that they were not provided). In a study appearing in a science journal, a research team reports that plants in southern England are flowering earlier in the spring. Records of the first flowering dates for 384 species over a period of 48 years show that flowering has advanced an average of 12 days per decade, an indication of climate warming, according to the authors. a) who? b) what? c) when? d) where? e) why? f) how? g) the first flowering date variable is- h) the year variable is- i) the flower species variable is-

a) 384 plant species in southern England b) the first flowering dates for 384 plant species in southern England c) cannot be determined d) southern England e) the study was conducted to determine whether plants are flowering earlier in the spring f) the how for this situation cannot be determined from the given information g) quantitative with units of days h) quantitative with units of years i) categorical

Hens usually begin laying eggs when they are about 6 months old. Young hens tend to lay smaller eggs, often weighing less than the desired minimum weight of 54 grams. Complete parts a) through c) below. a) The average weight of the eggs produced by the young hens is 51.7 grams, and only 29% of their eggs exceed the desired minimum weight. If a Normal model is appropriate, what would the standard deviation of the egg weights be? b) By the time these hens have reached the age of one year, the eggs they produce average 66.8 grams, and 93% of them are above the minimum weight. What is the standard deviation for these older hens? c) Are egg sizes more consistent for the younger hens or the older ones? Explain.

a) 4.2 grams b) about 8.7 grams (8.6 grams if z is rounded to −1.48 from a table) c) Egg sizes are more consistent for the younger hens because their standard deviation is lower.
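Each part solves cutoff = mean + z·SD for SD, where z is the standard Normal quantile for the proportion of eggs below 54 grams. A sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); with exact quantiles the older hens' SD comes out near 8.67 g, so a value of 8.6 g reflects rounding z to −1.48 from a table:

```python
from statistics import NormalDist

def sd_from_percentile(mean, cutoff, prop_below):
    """Solve cutoff = mean + z * sd for sd, where z is the standard
    Normal quantile for the proportion of values below the cutoff."""
    z = NormalDist().inv_cdf(prop_below)
    return (cutoff - mean) / z

young = sd_from_percentile(51.7, 54, 1 - 0.29)  # 29% of eggs above 54 g
older = sd_from_percentile(66.8, 54, 1 - 0.93)  # 93% of eggs above 54 g
print(round(young, 1), round(older, 1))  # 4.2 8.7
```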

Sugar is a major ingredient in many breakfast cereals. The histogram displays the sugar content as a percentage of weight for 48 brands of cereal. The boxplot compares sugar content for adult cereals (A) and children's cereals (C). Complete parts a through e. a) What is the range of the sugar contents of these cereals? b) Describe the shape of the distribution. c) What aspect of breakfast cereals might account for this shape? d) Are all children's cereals higher in sugar than adult cereals? e) Which group of cereals varies more in sugar content? Explain.

a) 63% b) bimodal c) Cereals tend to be either very sugary or healthy low-sugar cereals. d) yes e) Although the ranges appear to be comparable for both groups, the IQR is larger for the adult cereals, indicating that there's more variability in the sugar content of the middle 50% of adult cereals.

A livestock cooperative reports that the mean weight of yearling Angus steers is 1126 pounds. Suppose that the weights of all such animals can be described by a Normal model with a standard deviation of 66 pounds. a) What percent of steers weigh over 1050 pounds? b) What percent of steers weigh under 1300 pounds? c) What percent of steers weigh between 900 and 1200 pounds?

a) 87.5% b) 99.6% c) 86.9%
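These percentages are areas under the Normal model. A sketch with Python's `statistics.NormalDist` (Python 3.8+), assuming the N(1126, 66) model from the question:

```python
from statistics import NormalDist

weights = NormalDist(mu=1126, sigma=66)  # yearling Angus steers, pounds

over_1050 = 1 - weights.cdf(1050)
under_1300 = weights.cdf(1300)
between = weights.cdf(1200) - weights.cdf(900)

for label, p in [("over 1050", over_1050), ("under 1300", under_1300),
                 ("900 to 1200", between)]:
    print(f"{label}: {100 * p:.1f}%")
```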

Data from a recent football season reported the number of yards gained by each of the league's 452 receivers. The mean is 276.35 yards, with a standard deviation of 312.81 yards. Complete parts a through c below.

a) According to the Normal model and the 68-95-99.7 Rule, what percent of receivers would be expected to gain more yards than 2 standard deviations above the mean number of yards? 2.5% b) For these data, what does that mean? About 11 receivers (2.5% of 452 ≈ 11.3) should gain more than 276.35 + 2(312.81) ≈ 902 yards. This is fewer than the actual number of receivers who gained this many yards. c) Explain the problem in using a Normal model here. These data are strongly skewed to the right, so a Normal model is not appropriate.
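The arithmetic behind parts a) and b), as a short Python sketch: 95% of a Normal model lies within 2 SDs of the mean, leaving 2.5% in the upper tail, and 2.5% of 452 receivers is about 11:

```python
mean, sd, n_receivers = 276.35, 312.81, 452

threshold = mean + 2 * sd  # 2 SDs above the mean, in yards
# By the 68-95-99.7 Rule, 2.5% of a Normal model lies above mean + 2 SD.
expected = 0.025 * n_receivers
print(round(threshold), round(expected, 1))  # 902 11.3
```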

Here are boxplots of the points scored during the first 10 games of the season for both Alex and Kelly. a) Summarize the similarities and differences in their performance so far. b) The coach can take only one player to the state championship. Which one should she take? Why?

a) Both girls have approximately the same median, but Alex has a larger IQR. b) Either player could be chosen, depending on the coach's preference.

The scatterplot to the right shows that the trend for the interest rate on a 3-month bond changed dramatically after 1980, so two regression models were fit to the relationship between the rate (in %) and the number of years since 1950, one for 1950 to 1980 and one for the data from 1980 to 2007. The accompanying display shows the plots of the interest rate on the 3-month bond from 1950 to 1980 and from 1980 to 2007 and their corresponding regression models. Complete parts a through d below.

a) How does the model for the data between 1980 and 2007 compare to the one for the data between 1950 and 1980? - The two models both fit well, but they have very different slopes. c) Do you trust this newer predicted value? Explain. - No, because extrapolating 70 years beyond the beginning of these data would be dangerous and unlikely to be accurate. d) Would you use either of these models to predict the interest rate in the future? Explain. - It would be best not to predict the value, because extrapolating beyond the x-values that were used to fit the model can be dangerous.

The full series of data giving the median age at first marriage in the United States for men and women shows the following pattern. Answer parts a through c. a) In what way do these data differ from standard time series? b) Describe the patterns you see here. c) Do you expect the patterns seen since 1960 to continue? Explain.

a) They are time series because they report values over time. However, the values are not all equally spaced because the early values are reported only every decade, while later values are annual. b) Age at first marriage declined in the first part of the 20th century, but has been increasing for both men and women since about 1960. Throughout more than a century of data, men have typically been older at first marriage than women. c) The increase in age cannot continue indefinitely. The pattern that men tend to be older than women at first marriage may well continue.

A National Vital Statistics Report provides information on deaths by age, sex, and race. Below is a link to the displays of the distributions of ages at death for White and Black males. Use these displays to complete parts a through c below. a) Describe the overall shapes of these distributions. b) How do the distributions differ? c) Look carefully at the bar definitions. Where do these plots violate the rules for statistical graphs? Select all that apply.

a) Both distributions are skewed to the left and unimodal. b) The center of the distribution for Black males is lower than the center of the distribution for White males. c) The widths of the far left and far right bins differ from the widths of the middle bins, and the vertical axes do not have the same maximum.

Load the accompanying data about the Kentucky Derby into your preferred statistics package and answer the questions a through d below. a) What was the name of the winning horse in 1884? b) When did the length of the race change? c) What was the winning time in 1929? d) Only two horses have run the race in less than 2 minutes. Which horses and in what years?

a) Buchanan b) 1896 c) 2 minutes and 10.8 seconds (130.8 seconds) d) Secretariat in 1973 and Monarchos in 2001

Here are the summary statistics for the weekly payroll of a small company: lowest salary = $250, mean salary = $800, median = $800, range = $1000, IQR = $700, first quartile = $450, standard deviation = $450. a) Do you think the distribution of salaries is symmetric, skewed to the left, or skewed to the right? Explain why. b) Between what two values are the middle 50% of the salaries found?

a) The distribution is symmetric because the mean is equal to the median. b) $450 and $1150
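Since IQR = Q3 − Q1, the upper bound of the middle 50% is Q1 + IQR. In Python:

```python
q1, iqr = 450, 700
q3 = q1 + iqr  # IQR = Q3 - Q1, so Q3 = Q1 + IQR
print(f"Middle 50% of salaries: ${q1} to ${q3}")  # $450 to $1150
```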

The data in the accompanying table are the annual numbers of deaths from floods in the United States for 21 randomly selected years from 1940 through 2017. Find the a) mean, b) median and quartiles, and c) range and IQR.

a) The mean is 95.05. b) The median is 82, Q1 = 53, and Q3 = 121. c) The range is 277, and the IQR is 68.

The histogram to the right shows the distribution of the prices of plain pizza slices (in $) for 308 weeks in a large city. a) Is the mean closer to $5.00, $6.00, or $7.00? Why? b) Is the standard deviation closer to $0.75, $2.50, or $5.00? Explain.

a) the mean is closest to $6.00 because that is the balancing point of the histogram. b) the standard deviation is closest to $0.75 since that is a typical distance from the mean.

During his 20 seasons in the NHL, Wayne Gretzky scored 50% more points than anyone who ever played professional hockey. He accomplished this amazing feat while playing in 280 fewer games than Gordie Howe, the previous record holder. The number of games Gretzky played during each season is provided with an accompanying stem-and-leaf display. Complete parts a through c below. a) Would you use the mean or the median to summarize the center of this distribution? Why? b) Find the median. c) Without actually finding the mean, would you expect it to be lower or higher than the median? Explain.

a) The median should be used to summarize the center of this distribution because the distribution is skewed. b) The median is 79. c) The mean would be lower than the median because the distribution is skewed to the left.

Identify the W's, name the variables, specify for each variable whether its use indicates that it should be treated as categorical or quantitative, and, for any quantitative variable, identify the units in which it was measured. A listing posted by a sandwich restaurant chain's headquarters gives, for each of the sandwiches it sells, the type of meat in the sandwich, the number of calories, and the serving size in ounces. The data might be used to assess the nutritional value of the different sandwiches. a) who? b) what? c) when? d) where? e) why? f) how? g) one variable- h) another variable- i) a third variable-

a) the restaurant's sandwiches b) type of meat, number of calories, serving size c) the when is not specified d) the chain's restaurants e) to assess the nutritional value of the different sandwiches f) a listing posted by the chain's headquarters g) type of meat: categorical, no units h) number of calories: quantitative, units are calories i) serving size: quantitative, units are ounces

Tell what each of the residual plots to the right indicates about the appropriateness of the linear model that was fit to the data.

a) Fanned to the left: the fanned pattern indicates that the linear model is not appropriate. The model's predictive power increases as the values of the explanatory variable increase. b) Parabola: the curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear. c) Smiley face: the curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear.

An internet company conducts a global consumer survey to help multinational companies understand different consumer attitudes throughout the world. Within 30 countries, the researchers interview 1000 people aged 13-65. Their samples are designed so that they get 500 males and 500 females in each country. Complete parts a and b below. a) Are they using a simple random sample? Explain. b) What kind of design do you think they are using?

a. No. It would be nearly impossible to get exactly 500 males and 500 females from every country by random chance. b. A stratified sample, stratified by whether the respondent is male or female.

Answer true or false. If false, explain briefly. a) Some of the residuals from a least squares linear model will be positive and some will be negative. b) Least squares means that some of the squares of the residuals are minimized. c) We write ŷ to denote the predicted values and y to denote the observed values.

a. The statement is true. b. The statement is false. Least squares means the sum of the squared residuals is minimized. c. The statement is true.

Researchers collected data on the annual mortality rate (deaths per 100,000) for males in 20 large towns and the water hardness in terms of the calcium concentration (parts per million, ppm) in the drinking water. a) The display to the right shows the relationship between mortality and calcium concentration for these towns. Describe what you see in this scatterplot, in context. c) Interpret the slope of this line in context. d) Explain the meaning of the y-intercept of the line. e) The largest residual has a value of 85. Explain what this value means. g) Explain the meaning of R-squared in this situation.

a. There is a fairly strong, negative, linear relationship between calcium concentration and mortality rate. Towns with harder water tended to have lower mortality rates. c. For each additional 1 ppm of calcium, the model predicts a decrease of 1.482 deaths per 100,000 in the mortality rate. d. The model predicts that a town with 0 ppm calcium concentration would have a mortality rate of 1824.644 deaths per 100,000. e. The town with the largest residual had 85 more deaths per 100,000 people than the model predicts. g. 73.3% of the variability in the mortality rate can be accounted for by a linear model on calcium concentration.
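A Python sketch of how the fitted line and a residual are computed from the reported intercept (1824.644) and slope (−1.482); the 100 ppm town is hypothetical, used only to illustrate the calculation:

```python
intercept, slope = 1824.644, -1.482  # deaths per 100,000; change per ppm

def predicted_mortality(calcium_ppm):
    """Fitted value from the least squares line."""
    return intercept + slope * calcium_ppm

def residual(observed, calcium_ppm):
    """Observed minus predicted; positive means more deaths than predicted."""
    return observed - predicted_mortality(calcium_ppm)

# Hypothetical town with 100 ppm calcium, for illustration only.
print(round(predicted_mortality(100), 3))  # 1676.444
```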

The scatterplot of the Housing Cost Index versus the Median Family Income for 10 regions of a country is shown on the right. The correlation is 0.69. Complete parts a through f. a) Describe the relationship between the Housing Cost Index and Median Family Income by region. b) If both variables are standardized, what would the correlation coefficient between the standardized variables be? c) If Median Family Income had been measured in thousands of dollars instead of dollars, how would the correlation change? d) Another region of the country has a housing cost index of 572 and a median income of about $40,000. If this region were to be included in the data set, how would that affect the correlation coefficient? e) Do these data provide proof that by raising the median family income in a region, the housing cost index will rise as a result? f) For these data Kendall's tau is 0.55. Does that provide proof that by raising the median income in a state, the Housing Cost Index will rise as a result? Explain what Kendall's tau says and does not say.

a. There is a moderate positive linear association. b. 0.69 c. The correlation coefficient would have the same sign and its magnitude would not change. d. The correlation coefficient would have the same sign and its magnitude would decrease. e. These data do not provide proof, since the value of the correlation coefficient cannot prove any causation. f. Tau says that there is an association between median income and housing costs, but it makes no claims about the form of this association. It has no requirement that the relationship be linear. Here it appears that the plot "thickens" from left to right. That could affect the correlation, but not tau.

Perhaps fans are just more interested in teams that win. Below is a correlation table and scatterplot of data from a subset of NationalLeague teams for the 2016 season. Complete parts a through c below. ​a) Do winning teams generally enjoy greater attendance at their home​ games? Describe the association. ​b) Is attendance more strongly associated with winning or scoring​ runs? Explain. ​c) How strongly is scoring more runs associated with winning more​ games?

a. Yes. There is a moderate, positive association between the number of wins and average home attendance; their correlation is 0.722. b. The correlation coefficient for winning and home attendance, 0.722, is greater than the correlation coefficient for scoring runs and home attendance, 0.513. Thus, attendance is more strongly associated with winning. c. The correlation between runs and wins is 0.454, a positive association that is weaker than either of the correlations with attendance.

Examine each of the following questions for possible bias. If you think the question is biased, indicate how and propose a better question. a) Do you think high school students should be required to wear uniforms? b) Given humanity's great tradition of exploration, do you favor continued funding for space flights?

a. There is no indication of bias. b. The question may be biased toward "yes" because of the phrase "great tradition." A better question would be "Do you favor continued funding for the space program?"

Consider each of the situations below. Do you think the proposed sampling method is appropriate? a) We want to know what percentage of local doctors accept patients without medical insurance. We call the offices of 50 doctors randomly selected from local Yellow Pages ads. b) We want to know what percentage of local businesses anticipate sales to decrease in the upcoming month. We randomly select a page in the Yellow Pages and call every business listed there.

a. This sampling method is not appropriate. It will probably result in undercoverage of doctors who did not buy a Yellow Pages ad. b. This sampling method is not appropriate. A single page will probably contain listings for only one or two types of businesses, resulting in undercoverage.

A least squares regression line was calculated to relate the length (cm) of newborn boys to their weight (kg). The line is weight = −6.16 + 0.1925 length. Explain in words what this model means. Should new parents (who tend to worry) be concerned if their newborn's length and weight don't fit this equation?

a. What does the given model mean? - The predicted weight of a newborn boy is −6.16 kg plus 0.1925 kg for each cm of length; that is, each additional centimeter of length is associated with about 0.1925 kg more weight, on average. b. Should new parents (who tend to worry) be concerned if their newborn's length and weight don't fit this equation? - No, because this is a model fit to data. No particular baby should be expected to fit this model exactly.
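A small sketch of what the fitted line does: it turns a length into a predicted (average) weight, and an individual baby's actual weight differs from that prediction by a residual. The 52 cm, 3.9 kg baby below is a hypothetical example, not data from the source.

```python
def predicted_weight_kg(length_cm):
    """Predicted newborn boy weight from the fitted line weight = -6.16 + 0.1925 * length."""
    return -6.16 + 0.1925 * length_cm


def residual(actual_kg, length_cm):
    """Residual = actual weight - predicted weight."""
    return actual_kg - predicted_weight_kg(length_cm)


# Hypothetical baby: 52 cm long, 3.9 kg (illustrative values, not from the source).
print(round(predicted_weight_kg(52), 3))  # 3.85
print(round(residual(3.9, 52), 3))        # 0.05
```

Almost every real baby will have a nonzero residual; a small one like this is exactly what the model expects, not a warning sign.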

State police want to estimate the percentage of cars with up-to-date registration, insurance, and safety inspection stickers. They set up a roadblock on a randomly selected street to question people, and they usually find problems with about 19% of the cars they stop. a) Identify the population. b) Identify the population parameter of interest. c) Identify the sampling frame. d) Identify the sample. e) Identify the sampling method, including whether or not randomization was employed. f) Identify who (if anyone) was left out of the study. g) Identify any potential sources of bias and any problems in generalizing to the population of interest.

a. Cars. b. The proportion of cars with up-to-date registration, insurance, and safety inspections. c. All cars on that road. d. The cars actually stopped at the roadblock. e. A cluster sample of one location; randomization was used to select the street. f. Local drivers who do not take that road. g. Undercoverage bias is possible: the time of day and the location may not be representative of all cars.

Is there any pattern to the locations of the planets in a distant solar system? The table shows the average distance of each of the planets from the star.

a. The scatterplot of distance versus position number is positive and curved, starting at zero. b. The relationship between position and distance is nonlinear, with a positive direction. c. The relationship is not linear. d. After re-expression, the plot of log(distance) versus position is positive, does not start at zero, and is only slightly curved, in the opposite direction. e. The relationship between position number and log of distance appears to be roughly linear.

For the following description of data, identify the W's, name the variables, specify for each variable whether its use indicates it should be treated as categorical or quantitative, and for any quantitative variable, identify the units in which it was measured (or note that they were not provided). In 1992, a magazine collected data and published an article evaluating dishwashers. It listed 38 models, giving the brand, cost (dollars), size (cu ft), type, estimated annual energy cost (dollars), an overall rating (good, excellent, etc.), and repair history for that brand (percentage requiring repairs over the past 5 years). a.) The W's are Who, What, When, Where, Why, and hoW. Identify the Who. b.) Identify the What. c.) Identify the When. d.) Identify the Where. e.) Identify the Why. f.) Identify the How. g.) Classify each variable as categorical or quantitative: brand, cost, size, type, estimated annual energy cost, overall rating, repair history. h.) Give the units of each quantitative variable: cost, size, estimated annual energy cost, repair history.

a.) 38 models of dishwashers b.) Brand, cost, size, type, estimated annual energy cost, overall rating, repair history c.) The data were recorded in 1992. d.) The information is not provided. e.) To provide information to the magazine's readers f.) The information is not provided. g.) brand - categorical; cost - quantitative; size - quantitative; type - categorical; estimated annual energy cost - quantitative; overall rating - categorical; repair history - quantitative h.) cost - dollars; size - cubic feet; estimated annual energy cost - dollars; repair history - percentage

For the following description of data, identify Who and What were investigated and the Population of interest. A look at 539 participants in a study found that participants who ate three or more candy bars a week experienced waist size increases four times greater than those of people who didn't eat the candy bars. a.) Identify the Who for this study. b.) Identify the What. c.) Identify the population of interest.

a.) All 539 participants in the study b.) waist size change, number of candy bars consumed per week c.) all people

Identify Who and What were investigated and the Population of interest. A study begun in 2011 examines the use of stem cells in treating two forms of hemophilia. Each of the 28 patients entered one of two separate trials in which embryonic stem cells were to be used to treat the condition. a.) Who? b.) What? c.) Population of interest?

a.) the 28 hemophilic patients b.) the effects the treatments have on hemophilia c.) all people with these two forms of hemophilia

Disk drives have been getting larger. Their capacity is now often given in terabytes (TB), where 1 TB = 1000 gigabytes, or about a trillion bytes. A survey of prices for external disk drives found the data shown to the right. For these data, we want to predict Price from Capacity. Complete parts a through i below. b) What does the slope mean in this context? d) What does the intercept mean in this context? Is it meaningful? g) You have found a 20.0 TB drive for $2560. According to the model, does this seem like a good buy? How much more or less would you pay compared with what the model predicts? h) Does the model overestimate or underestimate the price? i) The correlation is very high. Does this mean that the model is accurate? Use the scatterplot shown below.

b. It indicates the additional price, in dollars, for each additional TB of capacity. d. It indicates the price of a hard drive with no capacity. That is meaningless and should not be interpreted. g. No, because the actual price is more than the predicted price. h. It underestimates the price. i. Not necessarily. The model might not be accurate because there is an extreme outlier.

Use the advertised prices for used cars of a particular model in the accompanying table to create a linear model for the relationship between a car's Age and its Price. Complete parts a through g. b) Explain the meaning of the slope of the line. c) Explain the meaning of the y-intercept of the line. e) You have a chance to buy one of two cars. They are about the same age and appear to be in equally good condition. Would you rather buy the one with a positive residual or the one with a negative residual? Explain. g) Would this regression model be useful in establishing a fair price for a 23-year-old car? Explain.

b. The slope indicates that each 1-year increase in Age is associated with a decrease of $811 in the Price of cars of this model, on average. c. The y-intercept means that a new car (Age 0) of this model costs $17,622, on average. e. The car with a negative residual is better, because its actual price is below the predicted price for its age. g. No, because the predicted price is negative (17,622 − 811 × 23 = −1,031 dollars), which does not make sense.
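A short sketch of the fitted line from the answers above, Price = 17,622 − 811 × Age, used to check part g and to illustrate part e. The two 5-year-old cars priced at $14,000 and $13,000 are hypothetical, not from the source table.

```python
INTERCEPT = 17622  # dollars: predicted price at Age 0
SLOPE = -811       # dollars per year of Age


def predicted_price(age_years):
    """Predicted price from the fitted line Price = 17622 - 811 * Age."""
    return INTERCEPT + SLOPE * age_years


def residual(actual_price, age_years):
    """Positive: the car costs more than predicted; negative: a relative bargain."""
    return actual_price - predicted_price(age_years)


print(predicted_price(23))           # -1031: a negative price, so extrapolation fails
print(round(-INTERCEPT / SLOPE, 1))  # 21.7: age at which the line predicts $0

# Hypothetical comparison of two 5-year-old cars (predicted price $13,567).
print(residual(14000, 5))  # 433: priced above the model
print(residual(13000, 5))  # -567: priced below the model, the better buy
```

The line predicts $0 at about 21.7 years, so any age beyond that gives a nonsensical negative price.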

Here are engine size (displacement, in liters) and gas mileage (estimated combined city and highway) for a random sample of 10 model cars. b) Describe the direction, form, and strength of the plot. c) Find the correlation between engine size and miles per gallon. d) What does the plot say about fuel economy?

b. There is a negative, straight, and moderate association. c. r = −0.542 d. Vehicles in the selected group with larger displacement tend to have lower mileage.

c) Suppose business has been good and the company gives each employee a $150 raise. Give the new value of each of the summary statistics. d) Instead, suppose the company gives each employee a 10% raise. Give the new value of each of the summary statistics.

c) After everyone receives a $150 raise, the new values are: lowest salary $400; mean salary $950; median $950; range $1000; IQR $700; first quartile $600; standard deviation $450. Adding a constant shifts the measures of center and position but leaves the measures of spread (range, IQR, standard deviation) unchanged. d) After everyone receives a 10% raise, the new values are: lowest salary $275; mean salary $880; median $880; range $1100; IQR $770; first quartile $495; standard deviation $495. Multiplying every salary by 1.10 multiplies every one of these statistics by 1.10.
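A minimal sketch of the shift-versus-rescale rule. The original summary statistics are not shown in this set, so the values below are back-calculated from the answers (for example, the new lowest salary of $400 minus the $150 raise gives $250); treat them as an assumption.

```python
# Original summary statistics, back-calculated from the answers
# (e.g. new lowest salary $400 - $150 raise = $250).
original = {
    "lowest": 250, "mean": 800, "median": 800, "range": 1000,
    "IQR": 700, "Q1": 450, "SD": 450,
}

# Measures of spread are built from differences between values,
# so adding the same constant to every salary cancels out.
SPREAD = {"range", "IQR", "SD"}


def add_constant(stats, c):
    """Adding c to every salary shifts center/position stats; spread is unchanged."""
    return {k: v if k in SPREAD else v + c for k, v in stats.items()}


def raise_percent(stats, pct):
    """Multiplying every salary by (1 + pct/100) rescales every statistic."""
    return {k: v * (100 + pct) / 100 for k, v in stats.items()}


print(add_constant(original, 150))
print(raise_percent(original, 10))
```

Writing the raise as `(100 + pct) / 100` keeps these particular results exact in floating point.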

The accompanying graph shows the percentage of winners of a horse race that have run slower than a given speed. Note that few have won running less than 33 miles per hour, but about 85% of the winning horses have run less than 37 miles per hour. (A cumulative frequency graph like this is called an "ogive.") Using only the graph, complete parts a through e below.

a) Estimate the median winning speed. The median winning speed is 35.9 mph. b) Estimate the quartiles. The lower quartile is 34.5 mph. The upper quartile is 36.7 mph. c) Estimate the range and the IQR. The range is 6.4 mph. The IQR is 2.2 mph. e) Write a few sentences about the speeds of the horse race winners. The distribution of speeds is skewed to the left. The typical horse race winner has a speed of about 36 mph. The middle 50% of the horse race winners ran between the first quartile, 34.5 mph, and the third quartile, 36.7 mph.

At its website, a polling company publishes results of a new survey each day. Scroll down to the end of the published results and you'll find a statement that includes words like those shown below. "Results are based on telephone interviews with 1,008 national adults, aged 18 and older, conducted on April 2-5, 2007 ... In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls." Complete parts (a) through (c).

a) For this survey, identify the population of interest. - All adults in the nation aged 18 and older. b) The company performs its surveys by phoning numbers generated at random by a computer program. What is the sampling frame? - Everyone with a telephone. c) What problems, if any, would you be concerned about in matching the sampling frame with the population? - Some people do not have telephones, so they cannot be reached.

For high school students, college admissions to the most selective schools are very competitive. College A accepted about 5.8% of its applicants, College B 8%, and College C 12.7%. Susan has applied to all three. She figures that her chances of getting into at least one of the three must be about 26.5%. Complete parts a through c below.

a) How has she arrived at this conclusion? - She has added the probabilities of being accepted at College A, College B, and College C (5.8% + 8% + 12.7% = 26.5%). b) What additional assumption is she making? - To reach her conclusion using the Addition Rule, she is assuming that the events are disjoint. c) Do you agree with her conclusion? - No, because many students get accepted to more than one college, so the events are not disjoint.
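A short sketch contrasting Susan's Addition Rule total with the complement rule. The comparison assumes, purely for illustration, that acceptances at the three colleges are independent; that assumption is not from the source and is not a claim about real admissions. Even under it, the chance of at least one acceptance is lower than the simple sum, because the sum counts the overlap more than once.

```python
# Acceptance rates quoted in the problem.
rates = {"College A": 0.058, "College B": 0.08, "College C": 0.127}

# Susan's reasoning: the Addition Rule, valid only if the events are disjoint.
susan = sum(rates.values())

# For comparison only: IF acceptances were independent (an assumption),
# P(at least one) = 1 - P(rejected by all three).
p_none = 1.0
for p in rates.values():
    p_none *= 1 - p
independent = 1 - p_none

print(f"Addition Rule (disjoint):      {susan:.1%}")
print(f"Complement rule (independent): {independent:.1%}")
```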

The correlation between Education and Income as measured on 100 people is r = 0.70. Explain whether or not each of the following possible conclusions is justified. a) When Education increases, Income increases as well. b) The form of the relationship between Education and Income is straight. c) There are several outliers in the scatterplot of Income vs. Education. d) If we measure Education in months instead of years, the correlation will increase.

a) Is this conclusion justified? Explain. - No, because this cannot be concluded from the correlation alone. There may be a nonlinear relationship or outliers. b) Is this conclusion justified? Explain. - No, because the form of the relationship cannot be determined from the correlation. c) Is this conclusion justified? Explain. - No, because the correlation coefficient does not provide evidence for or against the existence of outliers in the data. d) Is this conclusion justified? Explain. - No, because correlation depends on the z-scores, and they are unaffected by changes in center or scale.
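A small sketch of part b: a high correlation does not show that the form is straight. It fits a least squares line to made-up, perfectly curved data (y = x²; not education or income data) and shows that r is still very high while the residuals follow a clear curved pattern.

```python
def least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / (sxx * syy) ** 0.5


# Made-up, perfectly curved data: y = x**2 (illustration only).
x = list(range(1, 11))
y = [v ** 2 for v in x]

b0, b1, r = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"r = {r:.3f}")  # high, even though the relationship is not straight
print(residuals)       # positive at both ends, negative in the middle: a curve
```

Only a scatterplot (or a residual plot) reveals the form; r alone cannot.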

How long is your arm compared with your hand size? Put your right thumb at your left shoulder bone, stretch your hand open wide, and extend your hand down your arm. Put your thumb at the place where your little finger is, and extend down the arm again. Repeat this a third time. Now your little finger will probably have reached the back of your left hand. If the fourth hand width goes past the end of your middle finger, turn your hand sideways and count finger widths to get there. a) Suppose you repeat your measurement 25 times and average your results. What parameter would this average estimate? What is the population? b) Suppose you now collect arm lengths measured in this way from 7 friends and average these 7 measurements. What is the population now? What parameter would this average estimate? c) Do you think these 7 arm lengths are likely to be representative of the population of arm lengths in your community? In the country? Why or why not?

a) The average would estimate the parameter, which is the true length of your arm (in hand widths); the population is all possible measurements of your arm. b) The population is all possible measurements of your arm and your friends' arms, and the average would estimate the mean of those measurements. c) No. The sample would not be representative of either the community or the country, because a group of friends is probably less diverse than the population.

Shown on the right is a scatterplot of the production budgets (in millions of dollars) vs. the running time (in minutes) for major release movies in 2005. Dramas are plotted as red x's and all other genres are plotted as blue dots. A separate least squares regression line has been fitted to each group. For the following questions, examine the plot. Complete parts a through c below.

a) What are the units for the slopes of these lines? - Millions of dollars per minute. b) In what way are dramas and other movies similar with respect to this relationship? - The two lines are roughly parallel, so both groups have about the same rate of increase in budget per additional minute of running time. c) In what way are dramas different from other genres of movies with respect to this relationship? - On average, dramas cost about $20 million less than other movies with the same running time.

The regression of Price on Size of homes in a certain city had R² = 89.9%. Complete parts a through c below.

a) What is the correlation between Size and Price? - r = √0.899 ≈ 0.948 (taking the positive root, since Price increases with Size). b) What would one predict about the Price of a home 1 standard deviation above average in Size? - Price should be 0.948 standard deviations above the mean in price. c) What would one predict about the Price of a home 2 standard deviations below average in Size? - Price should be 2 × 0.948 = 1.896 standard deviations below the mean in price.
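A minimal sketch of the arithmetic: r is the square root of R² (the positive root, assuming Price increases with Size), and in standardized units the regression prediction is simply the predicted z-score of Price = r × (z-score of Size).

```python
import math

r_squared = 0.899
# Positive root, because Price increases with Size.
r = math.sqrt(r_squared)


def predicted_z_price(z_size):
    """Regression to the mean: predicted z-score of Price = r * z-score of Size."""
    return r * z_size


print(round(r, 3))                      # 0.948
print(round(predicted_z_price(1), 3))   # 0.948
print(round(predicted_z_price(-2), 3))  # -1.896
```

Because |r| < 1, each prediction is closer to the mean (in standard deviations) than the Size value it comes from.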

Students in an introduction to statistics course were asked to describe their politics as "Liberal," "Moderate," or "Conservative." The results are shown in the table. Complete parts a through d below.

Politics        Female   Male   Total
Liberal             38     51      89
Moderate            37     44      81
Conservative         9     20      29
Total               84    115     199

a) What percent of the class is male? 57.8% b) What percent of the class considers themselves to be "Conservative"? 14.6% c) What percent of the males in the class consider themselves to be "Conservative"? 17.4% d) What percent of all students in the class are males who consider themselves to be "Conservative"? 10.1%
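A short sketch that reproduces the four answers from the nine counts given with the question (38, 51, 89, 37, 44, 81, 9, 20, 29). Reading them as female, male, and total counts for Liberal, Moderate, and Conservative is an inference, chosen because it reproduces every answer above. The sketch separates a marginal percentage (a, b), a conditional percentage (c), and a joint percentage (d).

```python
# Counts inferred from the question: politics -> (female, male).
counts = {
    "Liberal": (38, 51),
    "Moderate": (37, 44),
    "Conservative": (9, 20),
}

total = sum(f + m for f, m in counts.values())  # all students
males = sum(m for _, m in counts.values())      # all males
conservative = sum(counts["Conservative"])      # all conservatives
male_conservative = counts["Conservative"][1]   # males who are conservative

pct_male = 100 * males / total                                 # marginal
pct_conservative = 100 * conservative / total                  # marginal
pct_conservative_given_male = 100 * male_conservative / males  # conditional on male
pct_male_and_conservative = 100 * male_conservative / total    # joint

for label, value in [
    ("a) male", pct_male),
    ("b) conservative", pct_conservative),
    ("c) conservative, among males", pct_conservative_given_male),
    ("d) male and conservative", pct_male_and_conservative),
]:
    print(f"{label}: {value:.1f}%")
```

Parts c and d share a numerator (20) but differ in the denominator: c divides by the 115 males, d by all 199 students.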

For each of the following, list the sample space and tell whether you think the outcomes are equally likely. a) Toss 2 coins; record the order of heads and tails. b) A family has 2 children; record the number of boys. c) Flip a coin until you get a head or 3 consecutive tails. d) Roll two dice; record the larger number.

a) Sample space for recording the order of heads and tails when tossing 2 coins, where H represents a head and T a tail: {HH, HT, TH, TT}. Are the outcomes equally likely? - Yes. b) Sample space for the number of boys in the family: {0, 1, 2}. Are the outcomes equally likely? - No; one boy can happen in two ways (boy then girl, or girl then boy). c) Sample space for flipping a coin until you get a head or 3 consecutive tails: {H, TH, TTH, TTT}. Are the outcomes equally likely? - No; H happens half the time, but TTT requires three tails in a row. d) Sample space for the larger number when two dice are rolled: {1, 2, 3, 4, 5, 6}. Are the outcomes equally likely? - No; a larger number of 6 can happen in many more ways than a larger number of 1.
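The answers to b through d can be checked by listing equally likely elementary outcomes and counting. This sketch uses Python's `itertools.product` and `fractions.Fraction`; the assumptions (fair coin, fair dice, each child independently a boy or a girl with probability 1/2) are not stated in the exercise and are labeled in the comments. For part c it enumerates all 3-flip sequences and keeps only the flips up to the stopping point, since later flips do not affect the outcome.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(outcomes):
    """Exact probability of each value, given a list of equally likely elementary outcomes."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return {value: Fraction(k, n) for value, k in sorted(counts.items())}


def stop_point(flips):
    """Observed outcome when flipping stops at the first head, or after TTT."""
    for i, flip in enumerate(flips):
        if flip == "H":
            return "".join(flips[: i + 1])
    return "TTT"


# b) Number of boys among 2 children, ASSUMING each child is independently
#    a boy or a girl with probability 1/2.
boys = distribution([kids.count("B") for kids in product("BG", repeat=2)])

# c) Flip until a head or 3 consecutive tails. Enumerate all 8 equally likely
#    3-flip sequences of a fair coin (ASSUMED fair); flips after stopping do not matter.
coin = distribution([stop_point(seq) for seq in product("HT", repeat=3)])

# d) Larger number showing on two fair dice (36 equally likely ordered pairs).
larger = distribution([max(a, b) for a, b in product(range(1, 7), repeat=2)])

for name, dist in [("boys", boys), ("coin", coin), ("larger die", larger)]:
    print(name + ": " + ", ".join(f"{value}: {p}" for value, p in dist.items()))
```

In part d the probability of a larger number k works out to (2k − 1)/36, rising from 1/36 for k = 1 to 11/36 for k = 6.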

The accompanying regression analysis looks at the relationship between the number of runs scored and the average attendance at home games for 30 baseball teams in a recent year. Complete parts a through d below.

c) Interpret the meaning of the slope of the regression line in this context. - Each additional run scored is associated with an increase of about 56.641 people in average home attendance. d) In general, what would a negative residual mean in this context? - It means the team's average attendance is lower than the model predicts for a team that scores that many runs.

