STATS Final
A pharmaceutical company conducts an experiment in which a subject takes 100 mg of a substance orally. The researchers measure how many minutes it takes for one quarter of the substance to exit the bloodstream. What kind of variable is the company studying?
Quantitative variable
A study examining the health risks of smoking measured the cholesterol levels of people who had smoked for at least 25 years and people of similar ages who had smoked for no more than 5 years and then stopped. Create appropriate graphical displays for both groups, and write a brief report comparing their cholesterol levels. The data are given below.
The median score for Smokers is 5 points lower than the median score for Ex-Smokers, and the mean score for Smokers is 5.4 points higher than the mean score for Ex-Smokers. The Smokers had a range of 204, while the Ex-Smokers had a range of 201. The Ex-Smokers had a larger standard deviation. The Smokers' distribution has an outlier.
The company's annual report states, "Our survey shows that 82.65% of our employees are 'very happy' working here." Comment on that claim. Use appropriate statistics terminology.
The survey result is a statistic. It estimates the true proportion of satisfied workers in the population.
Traffic checks on a certain section of highway suggest that 85% of drivers are speeding there. Since 0.85×0.85=0.7225, the multiplication rule might suggest that there is approximately a 72% chance that two vehicles in a row are both speeding. What's wrong with that reasoning?
There are cases when the speed of one car is not independent of the speed of another car, so the multiplication rule does not apply.
A poll question asked, "In the next 5 years, do you think that things will get better, stay the same, or get worse?" The possible responses were "Better," "No change," "Worse," "Don't Know," and "No Response." What kind of variable is the response?
categorical variable
Suppose you have fit a linear model to some data and now take a look at the residuals. For the possible residuals plot shown, tell whether you would try a re-expression and, if so, why.
re-express to straighten the relationship
The Yale Program on Climate Change Communication surveyed 1263 American adults in March 2015 and asked them about their attitudes on global climate change. A display of the percentages of respondents choosing each of the major alternatives offered is provided. List the errors in this display.
- the percentages do not sum to 100% - there is no title - showing the pie chart on a slant violates the area principle
The accompanying table below shows that as the number of oranges on a tree increases, the fruit tends to get smaller. Create a linear model for this relationship and express any concerns you may have.
- the residuals plot is curved
Which matters more about a sample you draw from a population?
- the size of the sample
If you create an online survey, individuals can choose on their own whether to participate in the sample. This causes a form of bias called __________.
- voluntary response
The weights in pounds of a breed of yearling cattle follows the Normal model N(1121,77). What weight would be considered unusually low for such an animal?
Any weight more than 2 standard deviations below the mean, or less than 967 pounds, is unusually low. One would expect to see a steer 3 standard deviations below the mean, or less than 890 pounds only rarely.
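The cutoffs in this answer come straight from the Normal model N(1121, 77). A minimal sketch of the arithmetic, where the helper name `cutoff_below` is only illustrative:

```python
# Cutoffs for "unusually low" weights under the Normal model N(1121, 77).
mean_weight = 1121  # pounds, from the problem statement
sd_weight = 77      # pounds, from the problem statement


def cutoff_below(mean, sd, k):
    """Return the value that lies k standard deviations below the mean."""
    return mean - k * sd


two_sd_low = cutoff_below(mean_weight, sd_weight, 2)    # 1121 - 2 * 77
three_sd_low = cutoff_below(mean_weight, sd_weight, 3)  # 1121 - 3 * 77
print(two_sd_low, three_sd_low)
```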
A study was conducted on shoe sizes of students, reported in European sizes. For the men, the mean size was 44.16 with a standard deviation of 1.92. To convert European shoe sizes to U.S. sizes for men, use the equation shown below. USsize=EuroSize×0.7999−23.3 a) What is the mean men's shoe size for these responses in U.S. units? b) What is the standard deviation in U.S. units?
a) 12.02 b) 1.54
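Both answers follow from the rules for linear transformations: the mean is multiplied and shifted, while the standard deviation is only multiplied. A short sketch of that calculation, with variable names chosen here for illustration:

```python
# USsize = EuroSize * 0.7999 - 23.3, applied to summary statistics.
slope, intercept = 0.7999, -23.3
euro_mean, euro_sd = 44.16, 1.92  # men's European sizes from the problem

# Shifting and rescaling both affect the mean.
us_mean = slope * euro_mean + intercept

# Adding a constant does not change spread; only the multiplier matters.
us_sd = abs(slope) * euro_sd

print(round(us_mean, 2), round(us_sd, 2))
```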
Load the accompanying data about a particular car race into your preferred statistics package and answer the questions a through c below. a) What was the average speed of the winner in 2011? b) How many times did Arie Luyendyk win the race in the 1990s? c) How many races took place during the 1930s?
a) 170.265 miles/hr b) 2 times c) 10 races
In a study of streams in the Adirondack Mountains, the following relationship was found between the water's pH and its hardness (measured in grains). Is it appropriate to summarize the strength of association with a correlation?
- The scatterplot is not linear; correlation is not appropriate.
A researcher is working with a model that uses the number of rings in an Abalone's shell to predict its age. He finds an observation that he believes has been miscalculated. After deleting this outlier, he redoes the calculation. Does it appear that this outlier was exerting very much influence?
Yes, this observation was influential. After it was removed, the slope of the regression line changed by a large amount.
A researcher studies high schools and finds a strong positive linear association between funding and reading scores.
a. Yes, schools with higher funding have generally better readers. b. a lurking variable
Disk drives have been getting larger. Their capacity is now often given in terabytes (TB) where 1 TB=1000 gigabytes, or about a trillion bytes. A survey of prices for external disk drives found the data shown to the right. Find and interpret the value of R².
- The value of R²=98.98% indicates the percentage of the variability in the price of these disk drives that can be accounted for by a linear model on the capacity of the drives.
Administrators at Texas A&M University were interested in estimating the percentage of students who are on a vegetarian diet. The A&M student body has about 42,000 members. How might the administrators answer their question by applying the three Big Ideas?
The A&M administrators should take a survey. They should sample a part of the student body, selecting respondents with a randomization method. They should be sure to draw a sufficiently large sample.
A company held a blood pressure screening clinic for its employees. The results are summarized in the table below by age group and blood pressure level. d) Select a brief description of the association between age and blood pressure among these employees. e) Does this prove that people's blood pressure levels increase as they age? Explain.
d) The percentage of employees having low blood pressure decreases and the percentage having high blood pressure increases as they age. e) No, because only a controlled experiment can isolate the relationship between age and blood pressure.
For the following description of data, identify the variables and tell whether each should be treated as categorical or quantitative. Ian Walker, a psychologist at the University of Bath, wondered whether drivers treat bicycle riders differently when they wear helmets. He rigged his bicycle with an ultrasonic sensor that could measure how close each car was that passed him. He then rode on alternating days with and without a helmet. Out of 2500 cars passing him, he found that when he wore his helmet, motorists passed 3.35 inches closer to him, on average, than when his head was bare. [NY Times, Dec. 10, 2006]
(Distance; Quantitative), (Helmet; Categorical)
Here are several scatterplots. The calculated correlations are 0.777, 0.006, −0.923, and 0.951. Which is which?
- (0.006) = parabola - (0.777) = positive linear spread out - (-0.923) = negative - (0.951) = positive linear strong correlation
A town's January high temperatures average 34°F with a standard deviation of 8°, while in July the mean high temperature is 74° and the standard deviation is 9°. In which month is it more unusual to have a day with a high temperature of 56°? Explain.
It is more unusual to have a day with a high temperature of 56° in January. A high temperature of 56° in January is 2.75 standard deviations above the mean, while a high temperature of 56° in July is only 2 standard deviations below the mean.
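The comparison rests on z-scores, z = (value − mean) / SD, computed separately for each month. A small sketch, where the function name `z_score` is just a label used here:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


temp = 56  # degrees F, the day in question
z_january = z_score(temp, mean=34, sd=8)
z_july = z_score(temp, mean=74, sd=9)

# The month with the larger |z| is the one where 56 degrees is more unusual.
more_unusual = "January" if abs(z_january) > abs(z_july) else "July"
print(z_january, z_july, more_unusual)
```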
An article reported on a school district's magnet school programs. Of the 1885 qualified applicants, 592 were black or Hispanic, 295 Asian, and 998 white. Summarize the relative frequency distribution of ethnicity with a sentence or two (in the proper context, of course).
Of the qualified applicants, 31.4% were black or Hispanic, 15.6% were Asian, and 52.9% were white.
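The percentages are relative frequencies: each count divided by the 1885 qualified applicants. One way to check them, using a plain dictionary for the counts:

```python
# Counts of qualified applicants from the problem statement.
counts = {"black or Hispanic": 592, "Asian": 295, "white": 998}
total = sum(counts.values())  # should equal the 1885 applicants reported

# Relative frequency of each group, as a percentage rounded to one decimal.
percentages = {group: round(100 * n / total, 1) for group, n in counts.items()}

for group, pct in percentages.items():
    print(f"{group}: {pct}%")
```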
People with z-scores of 2.5 or above on a certain aptitude test are sometimes classified as geniuses. If aptitude test scores have a mean of 100 and a standard deviation of 30 points, what is the minimum aptitude test score needed to be considered a genius?
The minimum aptitude test score needed to be considered a genius is 175 points.
A friend says "I flipped five heads in a row! The next one has to be tails!" Explain why this thinking is incorrect.
There is no law of averages for the short run. The first five flips do not affect the sixth flip.
Is there any evidence that an animal's gestation period is related to the animal's lifespan? The scatterplot shows Gestation Period (in days) vs. Life Expectancy (in years) for 18 species of mammals. The highlighted point at the far right represents humans. Complete parts a through e. a) For these data, r=0.540. This is not a very strong relationship. Do you think the association would be stronger if humans were removed? Explain. b) Is there reasonable justification for removing humans from the data set? Explain.
a. Stronger. Both slope and correlation would increase. b. Yes, restricting the study to nonhuman animals would justify it. c. The association is moderately strong. d. On average, for every year increase in life expectancy, the gestation period increases by about 12.97 days. e. A certain mammal has a life expectancy of about 18 years. Estimate the expected gestation period of this species. 326.3 days
Examine each of the following questions for possible bias. If you think the question is biased, indicate how and propose a better question. a) Should companies that promote teen smoking be liable to help pay for the costs of cancer institutions? b) Given that 18-year-olds are old enough to smoke, is it fair to set the drinking age at 21?
a. The question is biased toward "yes" because of the wording "promote teen smoking." A better question may be "Should companies be responsible to help pay for the costs of cancer institutions?" b. The question is biased toward "no" because of the preamble "18-year-olds are old enough to smoke." A better question may be "Do you think the drinking age should be lowered from 21?"
Prior to graduation, a high school class was surveyed about its plans. The table displays the results for white and minority students (the Minority group included African-American, Asian, Hispanic, and Native American students). Complete parts a) through d).
d) Do you see any important differences in the post-graduation plans of white and minority students? There is more than a five percent difference in at least one of the categories of post-graduation plans for white and minority students. There is evidence of an association between race and post-graduation plans.
When you sample so that every combination of individuals in your population has an equal chance of being chosen you are taking a __________.
- simple random sample
The following data show the percentage change in population for 49 states and the District of Columbia from the 2000 census to the 2010 census. Using appropriate graphical displays and summary statistics, write a report on the percentage change in population by state.
- the median is 7.85 - the first quartile, Q1, is 4.5 - the third quartile, Q3, is 14.1 - the IQR is 9.6 - the minimum is −0.6 and the maximum is 35.1, so the range is 35.7 - The histogram shows that the distribution of Percent Change is unimodal and skewed right. The states vary from a minimum of −0.6% to a maximum of 35.1% growth in the decade. The median was 7.85% and the middle half of the states had growth between 4.5% and 14.1%.
A student wants to determine whether or not a value in her data is an outlier. She has calculated Q1=4, median=5, and Q3=10. Where is the upper fence?
- 19
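The upper fence is Q3 + 1.5 × IQR, and the lower fence is Q1 − 1.5 × IQR. A brief sketch of the calculation for these quartiles:

```python
q1, q3 = 4, 10  # quartiles given in the problem (the median is not needed)

iqr = q3 - q1                  # interquartile range
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as outliers
lower_fence = q1 - 1.5 * iqr   # values below this are flagged as outliers

print(lower_fence, upper_fence)
```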
The boxplot shows the fuel economy ratings for 67 subcompact cars with the same model year. Some summary statistics are also provided. The extreme outlier is an electric car whose electricity usage is equivalent to 112 miles per gallon. If that electric car is removed from the data set, how will the standard deviation be affected? The IQR?
- How will removing the electric car affect the standard deviation? The standard deviation will be much lower. Since the standard deviation is calculated by summing the squared differences between the data values and the mean, removing the electric car will drastically lower this sum. - How will removing the electric car affect the IQR? The IQR will not change very much, if at all. All that removing the electric car can do is possibly change the location of each quartile to be the preceding data value, which will not have a huge impact on the IQR.
Concerned about reports of discolored scales on fish caught downstream from a newly sited chemical plant, scientists set up a field station in a shoreline public park. For one week they asked fishermen there to bring any fish they caught to the field station for a brief inspection. At the end of the week, the scientists said that 40% of the 135 fish that were submitted for inspection displayed the discoloration. From this information, can the researchers estimate what proportion of fish in the river have discolored scales? Explain.
- No. If discolored fish are not as likely to be caught as normal fish, or fishermen are more disposed to bring in discolored fish than normal fish, then the sample will be biased and so will the resulting estimate.
A government bureau keeps track of the number of adoptions in each region. The accompanying histogram shows the distribution of adoptions in each region. Would you report the standard deviation or the IQR? Explain briefly.
- report the IQR, since the distribution is skewed
You are trying to study the amount of financial aid students at your University receive. You sample 50 students and find out the average size of their financial aid packages. The average of your sample is a __________.
- sample statistic
The table below shows the number of licensed drivers in a state by age and by sex. Complete parts a) through d).
As age increases, the percentage of female drivers increases. d) Do driver's age and sex appear to be independent? Explain. No. There is some association between driver's age and sex.
An auctioneer sold a herd of cattle whose minimum weight was 920 pounds, median was 1180 pounds, standard deviation 80, and IQR 104 pounds. They sold for 30 cents a pound, and the auctioneer took a $20 commission on each animal. Then, for example, a steer weighing 1100 pounds would net the owner 0.30(1100)−20=$310. Find the minimum, median, standard deviation, and IQR of the net sale prices.
The minimum price is $256.00. The median price is $334.00. The standard deviation of the prices is $24.00. The IQR of the prices is $31.20.
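The net price is a linear function of weight, price = 0.30 × weight − 20. Measures of position (minimum, median) are both scaled and shifted, while measures of spread (standard deviation, IQR) are only scaled. A sketch of that reasoning, with the helper `net_price` named here for illustration:

```python
PRICE_PER_LB = 0.30  # dollars per pound
COMMISSION = 20.0    # dollars per animal


def net_price(weight_lb):
    """Net sale price for one animal of the given weight."""
    return PRICE_PER_LB * weight_lb - COMMISSION


# Position measures: apply the full transformation.
min_price = net_price(920)
median_price = net_price(1180)

# Spread measures: subtracting the commission has no effect on spread.
sd_price = PRICE_PER_LB * 80
iqr_price = PRICE_PER_LB * 104

print(f"{min_price:.2f} {median_price:.2f} {sd_price:.2f} {iqr_price:.2f}")
```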
You purchased a five-pack of new light bulbs that were recalled because 21% of the lights did not work. What is the probability that at least one of your lights is defective?
The probability that at least one of the light bulbs is defective is 0.692.
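"At least one" is easiest through the complement: 1 − P(none are defective). The sketch below assumes the five bulbs are independent, which the answer implicitly relies on:

```python
p_defective = 0.21  # recall rate from the problem
n_bulbs = 5

# Assuming independence, all five work with probability (1 - p)^5.
p_none_defective = (1 - p_defective) ** n_bulbs
p_at_least_one = 1 - p_none_defective

print(round(p_at_least_one, 3))
```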
For the following description of data, identify the W's from what's given, name the variables, specify for each variable whether its use indicates that it should be treated as categorical or quantitative, and, for any quantitative variable, identify the units in which it was measured (or note that they were not provided). In a study appearing in a science journal, a research team reports that plants in southern England are flowering earlier in the spring. Records of the first flowering dates for 384 species over a period of 48 years show that flowering has advanced an average of 12 days per decade, an indication of climate warming, according to the authors. a) who? b) what? c) when? d) where? e) why? f) how? g) the first flowering date variable is- h) the year variable is- i) the flower species variable is-
a) 384 plant species in southern England b) the first flowering dates for 384 plant species in southern England c) cannot be determined d) southern England e) the study was conducted to determine whether plants are flowering earlier in the spring f) the how for this situation cannot be determined from the given information g) quantitative with units of days h) quantitative with units of years i) categorical
Hens usually begin laying eggs when they are about 6 months old. Young hens tend to lay smaller eggs, often weighing less than the desired minimum weight of 54 grams. Complete parts a) through c) below. a) The average weight of the eggs produced by the young hens is 51.7 grams, and only 29% of their eggs exceed the desired minimum weight. If a Normal model is appropriate, what would the standard deviation of the egg weights be? b) By the time these hens have reached the age of one year, the eggs they produce average 66.8 grams, and 93% of them are above the minimum weight. What is the standard deviation for these older hens? c) Are egg sizes more consistent for the younger hens or the older ones? Explain.
a) 4.2 grams b) 8.6 grams c) The egg sizes are more consistent for the younger hens because their standard deviation is lower.
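The standard deviations come from solving z = (54 − mean) / SD for SD, where z is the Normal quantile matching the share of eggs below 54 grams. The sketch below uses Python's `statistics.NormalDist` for the quantiles. Exact quantiles can differ in the last digit from answers based on a z-table rounded to two decimals, so treat small discrepancies as rounding.

```python
from statistics import NormalDist

MIN_WEIGHT = 54.0  # grams, desired minimum egg weight
standard_normal = NormalDist()  # mean 0, standard deviation 1


def sd_from_cutoff(mean, fraction_above):
    """Solve (MIN_WEIGHT - mean) / sd = z for sd, given the share above the cutoff."""
    z = standard_normal.inv_cdf(1 - fraction_above)  # quantile of the share below
    return (MIN_WEIGHT - mean) / z


sd_young = sd_from_cutoff(mean=51.7, fraction_above=0.29)
sd_old = sd_from_cutoff(mean=66.8, fraction_above=0.93)

print(round(sd_young, 2), round(sd_old, 2))
```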
Sugar is a major ingredient in many breakfast cereals. The histogram displays the sugar content as a percentage of weight for 48 brands of cereal. The boxplot compares sugar content for adult cereals (A) and children's cereals (C). Complete parts a through e. a) What is the range of the sugar contents of these cereals? b) Describe the shape of the distribution. c) What aspect of breakfast cereals might account for this shape? d) Are all children's cereals higher in sugar than adult cereals? e) Which group of cereals varies more in sugar content? Explain.
a) 63% b) bimodal c) Cereals tend to be either very sugary or healthy low-sugar cereals. d) yes e) Although the ranges appear to be comparable for both groups, the IQR is larger for the adult cereals, indicating that there's more variability in the sugar content of the middle 50% of adult cereals.
A livestock cooperative reports that the mean weight of yearling Angus steers is 1126 pounds. Suppose that the weights of all such animals can be described by a Normal model with a standard deviation of 66 pounds. a) What percent of steers weigh over 1050 pounds? b) What percent of steers weigh under 1300 pounds? c) What percent of steers weigh between 900 and 1200 pounds?
a) 87.5% b) 99.6% c) 86.9%
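These percentages are areas under the Normal model N(1126, 66). A minimal sketch using `statistics.NormalDist` from the Python standard library in place of a z-table:

```python
from statistics import NormalDist

steers = NormalDist(mu=1126, sigma=66)  # weights in pounds

# a) Area to the right of 1050 pounds.
pct_over_1050 = 100 * (1 - steers.cdf(1050))

# b) Area to the left of 1300 pounds.
pct_under_1300 = 100 * steers.cdf(1300)

# c) Area between 900 and 1200 pounds.
pct_between = 100 * (steers.cdf(1200) - steers.cdf(900))

print(round(pct_over_1050, 1), round(pct_under_1300, 1), round(pct_between, 1))
```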
Data from a recent football season reported the number of yards gained by each of the league's 452 receivers. The mean is 276.35 yards, with a standard deviation of 312.81 yards. Complete parts a through c below.
a) According to the Normal model and the 68-95-99.7 Rule, what percent of receivers would be expected to gain more yards than 2 standard deviations above the mean number of yards? 2.5% b) For these data, what does that mean? About 10 receivers should gain more than 902 yards. This is fewer than the actual number of receivers who gained this many yards. c) Explain the problem in using a Normal model here. These data are strongly skewed to the right, so a Normal model is not appropriate.
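The 902-yard threshold and the expected count can be checked directly. The sketch below computes the count two ways: with the 2.5% figure from the 68-95-99.7 Rule and with the exact Normal tail area beyond z = 2. The two differ slightly, which may explain why a rounded answer lands at about 10 or 11.

```python
from statistics import NormalDist

n_receivers = 452
mean_yards, sd_yards = 276.35, 312.81

# Two standard deviations above the mean.
threshold = mean_yards + 2 * sd_yards

# Expected count using the 68-95-99.7 Rule (2.5% in the upper tail).
expected_by_rule = 0.025 * n_receivers

# Expected count using the exact Normal tail area beyond z = 2.
exact_tail = 1 - NormalDist().cdf(2)
expected_exact = exact_tail * n_receivers

print(round(threshold), round(expected_by_rule, 1), round(expected_exact, 1))
```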
Here are boxplots of the points scored during the first 10 games of the season for both Alex and Kelly. a) Summarize the similarities and differences in their performance so far. b) The coach can take only one player to the state championship. Which one should she take? Why?
a) Both girls have the same approximate median, but Alex has a larger IQR. b) A and B are both possible, depending on the coach's preference.
The scatterplot to the right shows that the trend for the interest rate on a 3-month bond changed dramatically after 1980, so two regression models were fit to the relationship between the rate (in %) and the number of years since 1950, one for 1950 to 1980 and one for the data from 1980 to 2007. The accompanying display shows the plots of the interest rate on the 3-month bond from 1950 to 1980 and from 1980 to 2007 and their corresponding regression models. Complete parts a through d below.
a) How does the model for the data between 1980 and 2007 compare to the one for the data between 1950 and 1980? - The two models both fit well, but they have very different slopes. c) Do you trust this newer predicted value? Explain. - No, because extrapolating 70 years beyond the beginning of these data would be dangerous and unlikely to be accurate. d) Would you use either of these models to predict the interest rate in the future? Explain. - It would be best not to predict the value because extrapolating beyond the x-values that were used to fit the model can be dangerous.
The full series of data giving the median age at first marriage in the United States for men and women shows the following pattern. Answer parts a through c. a) In what way do these data differ from standard time series? b) Describe the patterns you see here. c) Do you expect the patterns seen since 1960 to continue? Explain.
a) They are time series because they report values over time. However, the values are not all equally spaced because the early values are reported only every decade, while later values are annual. b) Age at first marriage declined in the first part of the 20th century, but has been increasing for both men and women since about 1960. Throughout more than a century of data, men have typically been older at first marriage than women. c) The increase in age cannot continue indefinitely. The pattern that men tend to be older than women at first marriage may well continue.
A National Vital Statistics Report provides information on deaths by age, sex, and race. Below is a link to the displays of the distributions of ages at death for White and Black males. Use these displays to complete parts a through c below. a) Describe the overall shapes of these distributions. b) How do the distributions differ? c) Look carefully at the bar definitions. Where do these plots violate the rules for statistical graphs? Select all that apply.
a) Both distributions are left skewed and unimodal. b) The center of the distribution for Black males is lower than the center of the distribution for White males. c) The widths of the far left and right bins differ from the widths of the middle bins, and the vertical axes do not have the same maximum.
Load the accompanying data about the Kentucky Derby into your preferred statistics package and answer the questions a through d below. a) what was the name of the winning horse in 1884? b) when did the length of the race change? c) what was the winning time in 1929? d) only two horses have run the race in less than 2 minutes. Which horses and in what years?
a) Buchanan b) 1896 c) 2 minutes and 10.8 seconds (130.8 seconds) d) Secretariat in 1973 and Monarchos in 2001
Here are the summary statistics for the weekly payroll of a small company: lowest salary=$250, mean salary=$800, median=$800, range=$1000, IQR=$700, first quartile=$450, standard deviation=$450. a) Do you think the distribution of salaries is symmetric, skewed to the left, or skewed to the right? Explain why. b) Between what two values are the middle 50% of the salaries found?
a) the distribution is symmetric because the mean is equal to the median. b) $450, $1150
The data in the accompanying table are the annual numbers of deaths from floods in the United States for 21 randomly selected years from 1940 through 2017. Find the a) mean, b) median and quartiles, and c) range and IQR.
a) The mean is 95.05. b) The median is 82, Q1 = 53, and Q3 = 121. c) The range is 277 and the IQR is 68.
The histogram to the right shows the distribution of the prices of plain pizza slices (in $) for 308 weeks in a large city. a) Is the mean closer to $5.00, $6.00, or $7.00? Why? b) Is the standard deviation closer to $0.75, $2.50, or $5.00? Explain.
a) the mean is closest to $6.00 because that is the balancing point of the histogram. b) the standard deviation is closest to $0.75 since that is a typical distance from the mean.
During his 20 seasons in the NHL, Wayne Gretzky scored 50% more points than anyone who ever played professional hockey. He accomplished this amazing feat while playing in 280 fewer games than Gordie Howe, the previous record holder. The number of games Gretzky played during each season is provided with an accompanying stem-and-leaf display. Complete parts a through c below. a) Would you use the mean or the median to summarize the center of this distribution? Why? b) Find the median. c) Without actually finding the mean, would you expect it to be lower or higher than the median? Explain.
a) The median should be used to summarize the center of this distribution because the distribution is skewed. b) The median is 79. c) The mean would be lower because the distribution is skewed to the left.
Identify the W's, name the variables, specify for each variable whether its use indicates that it should be treated as categorical or quantitative, and, for any quantitative variable, identify the units in which it was measured. A listing posted by a sandwich restaurant chain's headquarters gives, for each of the sandwiches it sells, the type of meat in the sandwich, the number of calories, and the serving size in ounces. The data might be used to assess the nutritional value of the different sandwiches. a) who? b) what? c) when? d) where? e) why? f) how? g) one variable- h) another variable- i) a third variable-
a) the restaurant's sandwiches b) type of meat, number of calories, serving size c) the when is not specified d) the chain's restaurants e) to assess the nutritional value of the different sandwiches f) report by the chain's headquarters g) type of meat, categorical, has no units h) number of calories, quantitative, its units are calories i) serving size, quantitative, its units are ounces
Tell what each of the residual plots to the right indicates about the appropriateness of the linear model that was fit to the data.
a- fanned to the left, the fanned pattern indicates that the linear model is not appropriate. The model's predicting power increases as the values of the explanatory variable increase. b- parabola, the curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear. c- smiley face, the curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear.
An internet company conducts a global consumer survey to help multinational companies understand different consumer attitudes throughout the world. Within 30 countries, the researchers interview 1000 people aged 13-65. Their samples are designed so that they get 500 males and 500 females in each country. Complete parts a and b below. a) Are they using a simple random sample? Explain. b) What kind of design do you think they are using?
a. No. It would be nearly impossible to get exactly 500 males and 500 females from every country by random chance. b. A stratified sample, stratified by whether the respondent is male or female.
Answer true or false. If false, explain briefly. a) Some of the residuals from a least squares linear model will be positive and some will be negative. b) Least squares means that some of the squares of the residuals are minimized. c) We write ŷ to denote the predicted values and y to denote the observed values.
a. The statement is true. b. The statement is false. Least squares means the sum of the squared residuals is minimized. c. The statement is true.
Researchers collected data on the annual mortality rate (deaths per 100,000) for males in 20 large towns and the water hardness in terms of the calcium concentration (parts per million, ppm) in the drinking water. a) The display to the right shows the relationship between mortality and calcium concentration for these towns. Describe what you see in this scatterplot, in context. c) Interpret the slope of this line in context. d) Explain the meaning of the y-intercept of the line. e) The largest residual has a value of 85. Explain what this value means. g) Explain the meaning of R-squared in this situation.
a. There is a fairly strong, negative, linear relationship between calcium concentration and mortality rate. Towns with harder water tended to have lower mortality rates. c. For each additional point in Calcium (ppm), the model predicts a decrease of 1.482 points in Mortality. d. The model predicts that a town with 0 ppm calcium concentration would have a mortality rate of 1824.644. e. The town had 85 more deaths per 100,000 people than the model predicts. g. 73.3% of the variability in the mortality can be accounted for by a linear model on calcium concentration.
The scatterplot of the Housing Cost Index versus the Median Family Income for 10 regions of a country is shown on the right. The correlation is 0.69. Complete parts a through f. a) Describe the relationship between the Housing Cost Index and Median Family Income by region. Choose the correct answer below. b) If both variables are standardized, what would the correlation coefficient between the standardized variables be? c) If Median Family Income had been measured in thousands of dollars instead of dollars, how would the correlation change? d) Another region of the country has a housing cost index of 572 and a median income of about $40,000. If this region were to be included in the data set, how would that affect the correlation coefficient? e) Do these data provide proof that by raising the median family income in a region, the housing cost index will rise as a result? f) For these data Kendall's tau is 0.55. Does that provide proof that by raising the median income in a state, the Housing Cost Index will rise as a result? Explain what Kendall's tau says and does not say.
a. There is a moderate positive linear association. b. 0.69 c. The correlation coefficient would have the same sign and its magnitude would not change. d. The correlation coefficient would have the same sign and its magnitude would decrease. e. These data do not provide proof, since the value of the correlation coefficient cannot prove any causation. f) Tau says that there is an association between median income and housing costs, but it makes no claims about the form of this association. It has no requirement that the relationship be linear. Here it appears that the plot "thickens" from left to right. That could affect the correlation, but not tau.
Perhaps fans are just more interested in teams that win. Below is a correlation table and scatterplot of data from a subset of National League teams for the 2016 season. Complete parts a through c below. a) Do winning teams generally enjoy greater attendance at their home games? Describe the association. b) Is attendance more strongly associated with winning or scoring runs? Explain. c) How strongly is scoring more runs associated with winning more games?
a. There is a moderate positive relationship between the number of wins and average home attendance. Their correlation of 0.722 is moderate. b. The correlation coefficient for winning and home attendance, 0.722 is greater than the correlation coefficient for scoring runs and home attendance, 0.513. Thus, attendance is more strongly associated with winning. c. The correlation between runs and wins is 0.454.
Examine each of the following questions for possible bias. If you think the question is biased, indicate how and propose a better question. a) Do you think high school students should be required to wear uniforms? b) Given humanity's great tradition of exploration, do you favor continued funding for space flights?
a. There is no indication of bias. b. The question may be biased towards yes because of "great tradition." A better question would be "Do you favor continued funding for the space program?"
Consider each of the situations below. Do you think the proposed sampling method is appropriate? a) We want to know what percentage of local doctors accept patients without medical insurance. We call the offices of 50 doctors randomly selected from local Yellow Pages ads. b) We want to know what percentage of local businesses anticipate sales to decrease in the upcoming month. We randomly select a page in the Yellow Pages and call every business listed there.
a. This sampling method is not appropriate. This method will probably result in undercoverage of those doctors who did not purchase a Yellow Pages ad. b. This sampling method is not appropriate. The sample will probably contain listings for only one or two types of businesses, resulting in undercoverage.
A least squares regression line was calculated to relate the length (cm) of newborn boys to their weight in kg. The line is weight=−6.16+0.1925 length. Explain in words what this model means. Should new parents (who tend to worry) be concerned if their newborn's length and weight don't fit this equation?
a. What does the given model mean? - The weight of a newborn boy can be predicted as −6.16 kg plus 0.1925 kg per cm of length. b. Should new parents (who tend to worry) be concerned if their newborn's length and weight don't fit this equation? - No, because this is a model fit to data. No particular baby should be expected to fit this model exactly.
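As a rough illustration, the model can be applied to a single baby and compared with the actual weight. The coefficients come from the question; the 50 cm length and 3.2 kg weight are hypothetical values chosen only for the example.

```python
def predicted_weight_kg(length_cm):
    """Predicted weight from the fitted line: weight = -6.16 + 0.1925 * length."""
    return -6.16 + 0.1925 * length_cm

# Hypothetical newborn, used only to illustrate the calculation.
length_cm = 50
actual_weight_kg = 3.2

prediction = predicted_weight_kg(length_cm)
# Residual = actual - predicted; a nonzero residual is expected for any one baby.
residual = actual_weight_kg - prediction

print(f"predicted: {prediction:.3f} kg, residual: {residual:.3f} kg")
```

A nonzero residual here is ordinary scatter around the line, which is why a baby who does not fit the equation exactly is no cause for concern.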
State police want to estimate the percentage of cars with up-to-date registration, insurance, and safety inspection stickers. State police set up a roadblock on a randomly selected street to question people. They usually find problems with about 19% of the cars they stop. a) Identify the population. Choose the correct answer below. b) Identify the population parameter of interest. Choose the correct answer below. c) Identify the sampling frame. Choose the correct answer below. d) Identify the sample. Choose the correct answer below. e) Identify the sampling method, including whether or not randomization was employed. Choose the correct answer below. f) Identify who (if anyone) was left out of the study. Choose the correct answer below. g) Identify any potential sources of bias and any problems in generalizing to the population of interest. Choose the correct answer below.
a. cars b. proportion with up-to-date registration, insurance, and safety inspections. c. all cars on that road d. those actually stopped by roadblock e. cluster sample of location f. local drivers that do not take that road g. undercoverage bias is possible. the time of day and location may not be representative of all cars.
Is there any pattern to the locations of the planets in a distant solar system? The table shows the average distance of each of the planets from the star.
a. positive, starts at zero, curved, b. The relationship between position and distance is nonlinear, with a positive direction. c. The relationship is not linear. d. positive, doesn't start at zero, slightly curved the opposite e. The relationship between position number and log of distance appears to be roughly linear.
For the following description of data, identify the W's, name the variables, specify for each variable whether its use indicates it should be treated as categorical or quantitative, and for any quantitative variable, identify the units in which it was measured (or note that they were not provided). In 1992, a magazine collected data and published an article evaluating dishwashers. It listed 38 models, giving the brand, cost (dollars), size (cu ft), type, estimated annual energy cost (dollars), an overall rating (good, excellent, etc.), and repair history for that brand (percentage requiring repairs over the past 5 years). a.) The W's are Who, What, When, Where, Why, and hoW. Identify the Who. Choose the correct answer below. b.) Identify the What. Choose the correct answer below. c.) Identify the When. Select the correct choice below and, if necessary, fill in any answer boxes within your choice. d.) Identify the Where. Choose the correct answer below. e.) Identify the Why. Choose the correct answer below. f.) Identify the how. Choose the correct answer below. g.) brand- cost- size- type- estimated annual energy cost- overall rating- repair history- h.) cost- size- estimated annual energy cost- repair history-
a.) 38 models of dishwashers b.) Brand, cost, size, type, estimated annual energy cost, overall rating, repair history. c.) The data were recorded in 1992 d.) The information is not provided. e.) To provide information to the magazine's readers f.) The information is not provided g.) - categorical - quantitative - quantitative - categorical - quantitative - categorical - quantitative h.) - dollars - cubic feet - dollars - percentages
For the following description of data, identify Who and What were investigated and the Population of interest. A look at 539 participants in a study found that participants who ate three or more candy bars a week experienced waist size increases four times greater than those of people who didn't eat the candy bars. a.) Identify the Who for this study. b.) what ? c.) population of interest?
a.) All 539 participants in the study b.) waist size change, number of candy bars consumed per week c.) all people
Identify Who and What were investigated and the Population of interest. A study begun in 2011 examines the use of stem cells in treating two forms of haemophilia. Each of the 28 patients entered one of two separate trials in which embryonic stem cells were to be used to treat the condition. a.) who? b.) what? c.) population of interest?
a.) the 28 hemophilic patients b.) the effects the treatments have on hemophilia c.) all people with these two forms of hemophilia
Disk drives have been getting larger. Their capacity is now often given in terabytes (TB) where 1 TB=1000 gigabytes, or about a trillion bytes. A survey of prices for external disk drives found the data shown to the right. For this data, we want to predict Price from Capacity. Complete parts a through i below. b) What does the slope mean, in this context? d) What does the intercept mean, in this context? Is it meaningful? g) You have found a 20.0 TB drive for $2560. According to the model, does this seem like a good buy? How much more or less would you pay compared to what the model predicts? h) Does the model overestimate or underestimate the price? i) The correlation is very high. Does this mean that the model is accurate? Use the scatterplot shown below.
b. It indicates the additional price in dollars for each additional TB of capacity. d. It indicates the price for a hard drive with no capacity. It is meaningless and should not be interpreted. g. No. Because the actual price is more than the predicted price. h. It underestimates the price. i. The model might not be accurate because there is an extreme outlier.
Use the advertised prices for used cars of a particular model in the accompanying table to create a linear model for the relationship between a car's Age and its Price. Complete parts a through g. b) Explain the meaning of the slope of the line. Select the correct choice below and fill in the answer box to complete your choice. c) Explain the meaning of the y-intercept of the line. Select the correct choice below and fill in the answer box to complete your choice. e) You have a chance to buy one of two cars. They are about the same age and appear to be in equally good condition. Would you rather buy the one with a positive residual or the one with a negative residual? Explain. g) Would this regression model be useful in establishing a fair price for a 23-year-old car? Explain.
b. The slope indicates that every 1-year increase in Age decreases the Price of cars of this model by $811, on average. c. The y-intercept means that a new car of this model costs $17,622 on average. e. The car with a negative residual is better because its actual price is below the predicted price for its age. g. No, because the predicted price is negative, which does not make sense.
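The answer to part g can be checked by plugging Age = 23 into the line implied by the slope and intercept quoted above. This is a sketch using those rounded coefficients; the function name is illustrative.

```python
def predicted_price(age_years):
    """Predicted price from the fitted line: Price = 17622 - 811 * Age."""
    return 17622 - 811 * age_years

price_new = predicted_price(0)      # the y-intercept: a car of age 0
price_age_23 = predicted_price(23)  # extrapolation far beyond typical ages

print(price_new)     # 17622
print(price_age_23)  # -1031, a negative price, so the model is not useful here
```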
Here are engine size (displacement, in liters) and gas mileage (estimated combined city and highway) for a random sample of 10 model cars. b) Describe the direction, form, and strength of the plot. Choose the correct answer below. c) Find the correlation between horsepower and miles per gallon. d) What does the plot say about fuel economy?
b. There is a negative, straight, and moderate association. c. r = -0.542 d. Vehicles in the selected group with more displacement have lower mileage.
c) Suppose business has been good and the company gives each employee a $150 raise. Tell the new value of each of the summary statistics. d) Instead, suppose the company gives each employee a 10% raise. Tell the new value of each of the summary statistics.
c) After everyone receives a $150 raise, the new lowest salary is $400. After everyone receives a $150 raise, the new mean salary is $950. After everyone receives a $150 raise, the new median is $950. After everyone receives a $150 raise, the new range is $1000. After everyone receives a $150 raise, the new IQR is $700. After everyone receives a $150 raise, the new first quartile is $600. After everyone receives a $150 raise, the new standard deviation is $450. d) After everyone receives a 10% raise, the new lowest salary is $275. After everyone receives a 10% raise, the new mean salary is $880. After everyone receives a 10% raise, the new median is $880. After everyone receives a 10% raise, the new range is $1100. After everyone receives a 10% raise, the new IQR is $770. After everyone receives a 10% raise, the new first quartile is $495. After everyone receives a 10% raise, the new standard deviation is $495.
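The pattern in these answers is that adding a constant shifts measures of position (lowest, mean, median, quartiles) but leaves measures of spread (range, IQR, standard deviation) alone, while multiplying by a constant rescales everything. A small sketch of that rule follows; the original statistics are not shown in the question as given here, so the values below are inferred by subtracting $150 from the part c answers.

```python
# Original statistics, inferred from the part c answers (an assumption).
original = {
    "lowest salary": 250,
    "mean": 800,
    "median": 800,
    "range": 1000,
    "IQR": 700,
    "first quartile": 450,
    "standard deviation": 450,
}

# Measures of position move when a constant is added; measures of spread do not.
POSITION_MEASURES = {"lowest salary", "mean", "median", "first quartile"}

def flat_raise(stats, amount):
    """Add a constant to every salary: only position measures change."""
    return {
        name: value + amount if name in POSITION_MEASURES else value
        for name, value in stats.items()
    }

def percent_raise(stats, percent):
    """Multiply every salary by a constant: every statistic is rescaled."""
    factor = 1 + percent / 100
    return {name: round(value * factor, 2) for name, value in stats.items()}

after_flat = flat_raise(original, 150)
after_percent = percent_raise(original, 10)

for name in original:
    print(f"{name}: +$150 -> {after_flat[name]}, +10% -> {after_percent[name]}")
```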
The accompanying graph shows the percentage of winners of a horse race that have run slower than a given speed. Note that few have won running less than 33 miles per hour, but about 85% of the winning horses have run less than 37 miles per hour. (A cumulative frequency graph like this is called an "ogive.") Using only the graph, complete parts a through e below.
a) Estimate the median winning speed. The median winning speed is 35.9 mph. b) Estimate the quartiles. The lower quartile is 34.5 mph. The upper quartile is 36.7 mph. c) Estimate the range and the IQR. The range is 6.4 mph. The IQR is 2.2 mph. e) Write a few sentences about the speeds of the horse race winners. The distribution of speeds is skewed to the left. The typical horse race winner has a speed of 36 mph. Most of the horse race winners are between first quartile 34.5 mph and third quartile 36.7 mph.
At its website, a polling company publishes results of a new survey each day. Scroll down to the end of the published results and you'll find a statement that includes words as shown below. Results are based on telephone interviews with 1,008 national adults, aged 18 and older, conducted on April 2-5, 2007 ... In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls. Complete parts (a) through (c).
a) For this survey, identify the population of interest. Choose the correct answer below. - everyone in the nation that is 18+ years old b) The company performs its surveys by phoning numbers generated at random by a computer program. What is the sampling frame? - everyone with a telephone c) What problems, if any, would you be concerned about in matching the sampling frame with the population? - some people do not have telephones
For high school students, college admissions to the most selective schools are very competitive. College A accepted about 5.8% of its applicants, College B 8%, and College C 12.7%. Susan has applied to all three. She figures that her chances of getting into at least one of the three must be about 26.5%. Complete parts a through c below.
a) How has she arrived at this conclusion? - She has added the probability of getting accepted at College A with the probability of getting accepted at College B with the probability of getting accepted at College C. b) What additional assumption is she making? - To reach her conclusion using the Addition Rule, she is assuming that the events are disjoint. c) Do you agree with her conclusion? - No, because many students get accepted to more than one college, so the events are not disjoint.
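A short sketch of Susan's arithmetic, alongside one alternative calculation. The alternative assumes the three admission decisions are independent, which the source does not claim and which is probably not realistic either; it is shown only to contrast with the disjoint-events assumption.

```python
acceptance_rates = {"College A": 0.058, "College B": 0.08, "College C": 0.127}

# Susan's approach: the Addition Rule, valid only if the events are disjoint.
naive_sum = sum(acceptance_rates.values())

# Alternative (assumption: decisions are independent, not stated in the source):
# P(at least one acceptance) = 1 - P(rejected by all three).
p_rejected_by_all = 1.0
for rate in acceptance_rates.values():
    p_rejected_by_all *= 1 - rate
p_at_least_one_if_independent = 1 - p_rejected_by_all

print(f"sum of rates: {naive_sum:.3f}")
print(f"at least one, if independent: {p_at_least_one_if_independent:.3f}")
```

Under the independence assumption the result comes out somewhat below 26.5%, because the simple sum counts outcomes with two or more acceptances more than once.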
The correlation between Education and Income as measured on 100 people is r=0.70. Explain whether or not each of the following possible conclusions is justified. a) When Education increases, Income increases as well. b) The form of the relationship between Education and Income is straight. c) There are several outliers in the scatterplot of Income vs. Education. d) If we measure Education in months instead of years, the correlation will increase.
a) Is this conclusion justified? Explain. - No, because this cannot be concluded from the correlation alone. There may be a nonlinear relationship or outliers. b) Is this conclusion justified? Explain. - No, because the form of the relationship cannot be determined from the correlation. c) Is this conclusion justified? Explain. - No, because the correlation coefficient does not provide evidence for or against the existence of outliers in the data. d) Is this conclusion justified? Explain. - No, because correlation depends on the z-scores, and they are unaffected by changes in center or scale.
How long is your arm compared with your hand size? Put your right thumb at your left shoulder bone, stretch your hand open wide, and extend your hand down your arm. Put your thumb at the place where your little finger is, and extend down the arm again. Repeat this a third time. Now your little finger will probably have reached the back of your left hand. If the fourth hand width goes past the end of your middle finger, turn your hand sideways and count finger widths to get there. a) Suppose you repeat your measurement 25 times and average your results. What parameter would this average estimate? What is the population? b) Suppose you now collect arm lengths measured in this way from 7 friends and average these 7 measurements. What is the population now? What parameter would this average estimate? c) Do you think these 7 arm lengths are likely to be representative of the population of arm lengths in your community? In the country? Why or why not?
a) The parameter is the length of your arm and the population is all possible measurements of your arm. b) The population is all possible measurements of your arm and your friends' arms and the average would estimate the mean of the measurements. c) The sample would not be representative of either the community or the country because a group of friends might not be as diverse as the population.
Shown on the right is a scatterplot of the production budgets (in millions of dollars) vs. the running time (in minutes) for major release movies in 2005. Dramas are plotted as red x's and all other genres are plotted as blue dots. A separate least squares regression line has been fitted to each group. For the following questions, examine the plot. Complete parts a through c below.
a) What are the units for the slopes of these lines? - million dollars per min b) In what way are dramas and other movies similar with respect to this relationship? - they have the same rate of increase in budget per increase in runtime. c) In what way are dramas different from other genres of movies with respect to this relationship? - On average dramas cost about $20 million less for the same runtime.
The regression of Price on Size of homes in a certain city had R² = 89.9%. Complete parts a through c below.
a) What is the correlation between Size and Price? 0.948 b) What would one predict about the Price of a home 1 standard deviation above average in Size? Select the correct choice below and fill in the answer box to complete your choice. - Price should be 0.948 standard deviation(s) above the mean in price. c) What would one predict about the Price of a home 2 standard deviations below average in Size? Select the correct choice below and fill in the answer box to complete your choice. - Price should be 1.896 standard deviation(s) below the mean in price.
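These answers follow from two facts: for simple linear regression, r is the square root of R² (taken as positive here, on the assumption that larger homes cost more), and the predicted z-score of Price is r times the z-score of Size. A brief sketch, with illustrative names:

```python
from math import sqrt

r_squared = 0.899

# Assumption: the association is positive, so take the positive root.
r = sqrt(r_squared)

def predicted_price_z(size_z):
    """Predicted z-score of Price given the z-score of Size: z_price = r * z_size."""
    return r * size_z

print(round(r, 3))                      # correlation, part a
print(round(predicted_price_z(1), 3))   # part b: 1 SD above average in Size
print(round(predicted_price_z(-2), 3))  # part c: 2 SDs below average in Size
```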
Students in an introduction to statistics course were asked to describe their politics as "Liberal," "Moderate," or "Conservative." The results are shown in the table. Complete parts a through d below. (38,51,89,37,44,81,9,20,29)
a) What percent of the class is male? 57.8% b) What percent of the class considers themselves to be "Conservative"? 14.6% c) What percent of the males in the class consider themselves to be "Conservative"? 17.4% d) What percent of all students in the class are males who consider themselves to be "Conservative"? 10.1%
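The four percentages can be reproduced from the flattened table. The layout below (rows Liberal, Moderate, Conservative; columns Female, Male, Total) is an assumption inferred from the answers, since the original table headers are not shown.

```python
# Assumed layout, inferred from the answers: (female, male) counts per row.
counts = {
    "Liberal": (38, 51),
    "Moderate": (37, 44),
    "Conservative": (9, 20),
}

total_students = sum(f + m for f, m in counts.values())
total_males = sum(m for _, m in counts.values())
conservative_females, conservative_males = counts["Conservative"]
total_conservatives = conservative_females + conservative_males

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

pct_male = percent(total_males, total_students)                       # part a
pct_conservative = percent(total_conservatives, total_students)       # part b
pct_conservative_given_male = percent(conservative_males, total_males)  # part c
pct_male_and_conservative = percent(conservative_males, total_students)  # part d

print(pct_male, pct_conservative, pct_conservative_given_male, pct_male_and_conservative)
```

Parts c and d use the same count (20 conservative males) but different denominators: the males only for the conditional percentage, the whole class for the joint percentage.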
For each of the following, list the sample space and tell whether you think the outcomes are equally likely. a) Toss 2 coins; record the order of heads and tails. b) A family has 2 children; record the number of boys. c) Flip a coin until you get a head or 3 consecutive tails. d) Roll two dice; record the larger number.
a) Which of the following is the sample space for recording the order of heads and tails when tossing 2 coins? Let H represent getting a head and T getting a tail. - {HH, HT, TH, TT} b) Are the outcomes equally likely? - yes c) Which of the following is the sample space for the number of boys in the family? - {0, 1, 2} d) Are the outcomes equally likely? - no e) What is the sample space for flipping a coin until you get a head or 3 consecutive tails? - {H, TH, TTH, TTT} f) Are the outcomes equally likely? - no g) What is the sample space for the larger number when two dice are rolled? - {1, 2, 3, 4, 5, 6} h) Are the outcomes equally likely? - no
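Enumerating the underlying equally likely outcomes shows why two of these sample spaces are not equally likely. The sketch assumes fair dice, and assumes each child is equally likely to be a boy or a girl, independently of the other; neither assumption is stated in the question.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Larger number on two dice: 36 equally likely (first, second) rolls.
rolls = list(product(range(1, 7), repeat=2))
larger_counts = Counter(max(first, second) for first, second in rolls)
larger_probs = {
    value: Fraction(count, len(rolls))
    for value, count in sorted(larger_counts.items())
}

# Number of boys in a 2-child family: 4 equally likely birth orders (assumed).
families = list(product("BG", repeat=2))
boy_counts = Counter(family.count("B") for family in families)
boy_probs = {
    boys: Fraction(count, len(families))
    for boys, count in sorted(boy_counts.items())
}

print(larger_probs)  # the probability grows with the value
print(boy_probs)     # one boy is twice as likely as zero or two
```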
The accompanying regression analysis looks at the relationship between the number of runs scored and the average attendance at home games for 30 baseball teams in a recent year. Complete parts a through d below.
c) Interpret the meaning of the slope of the regression line in this context. Select the correct choice below and fill in the answer boxes within your choice. - Every run scored adds an average of 56.641 people in attendance. d) In general, what would a negative residual mean in this context? - It means the team's average attendance is lower than the expected average for a team that scores that many runs.