Probability & Statistics Module #3


Students at a large state university system are upset over the rate at which their fees have increased in the last five years (2005-2010). A small group presents before the state legislature and reports a predicted fee for 2020 based on their model. What error could they be accused of?

extrapolating

A student in an intro stats course collects data at her university. She wants to model the relationship between student jobs and GPA. She collects a random sample of students and asks each for their GPA and the number of hours per week they work. What graph should she make to check the conditions for linear​ regression?

A scatterplot of work hours and GPA.

Shown on the right is a scatterplot of the production budgets​ (in millions of​ dollars) vs. the running time​ (in minutes) for major release movies in 2005. Dramas are plotted as red​ x's and all other genres are plotted as blue dots. A separate least squares regression line has been fitted to each group. a) What are the units for the slopes of these​ lines? b) In what way are dramas and other movies similar with respect to this​ relationship? c) In what way are dramas different from other genres of movies with respect to this​ relationship?

A. Million dollars per minute B. They have the same rate of increase in budget per increase in runtime. C. On average dramas cost about​ $20 million less for the same runtime.

A random sample of records of home sales from Feb. 15 to Apr. 30, 1993, from the files maintained by the Albuquerque Board of Realtors gives the Price and Size (in square feet) of 117 homes. A regression to predict Price (in thousands of dollars) from Size has an R² of 71.4%. The residuals plot indicated that a linear model is appropriate. Complete parts a through c below. a) What are the variables and units in this regression? b) What units does the slope have? c) Do you think the slope is positive or negative? Explain.

A. Price​ (in thousands of​ dollars) is y and Size​ (in square​ feet) is x B. The slope has units of thousands of dollars per square foot. C. The slope is positive. As the size of the home​ increases, the price should also increase.

Tell what each of the residual plots to the right indicates about the appropriateness of the linear model that was fit to the data. A. Residuals start high at the left, dip, then curve up to the right. B. Residuals are scattered widely across the plot with no pattern. C. Residuals start low at the left, rise, then curve down to the right.

A. The curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear. B. The scattered residuals plot indicates an appropriate linear model. C. The curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear.

A researcher wants to determine if the nicotine content of a cigarette is related to "tar". A collection of data (in milligrams) on 29 cigarettes produced the accompanying scatterplot, residuals plot, and regression analysis. a) Is the linear model appropriate here? Explain. b) What does R² = 92.4% mean in this context?

A. The linear model could be appropriate. There is some curvature to the residuals but not enough to completely disregard the linear model. Some more data points may be required. B. The linear model on tar content accounts for​ 92.4% of the variability in nicotine content.

Is there evidence that the age at which women get married has changed over the past 100​ years? The accompanying scatterplot shows the trend in age at first marriage for American women. a) Is there a clear​ pattern? Describe the trend. b) Is the association​ strong? c) Is the correlation​ high? Explain. d) Is a linear model​ appropriate? Explain.

A. The trend appears to be linear up to about​ 1940, but from 1940 to about 1970 the trend appears to be nonlinear. From 1975 or so to the​ present, the trend appears to be linear. B. The association appears relatively strong. C. ​No, as a whole the graph is clearly nonlinear. D. No, a linear model would not be​ appropriate, although one could fit a linear model to the period from 1975 to 2003.

A least squares regression line was calculated to relate the length (cm) of newborn boys to their weight in kg. The line is weight = −5.22 + 0.1635 × length. Explain in words what this model means. Should new parents (who tend to worry) be concerned if their newborn's length and weight don't fit this equation? a) What does the given model mean? b) Should new parents (who tend to worry) be concerned if their newborn's length and weight don't fit this equation?

A. The weight of a newborn boy can be predicted as −5.22 kg plus 0.1635 kg per cm of length. B. No, because this is a model fit to data. No particular baby should be expected to fit this model exactly.
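
As a rough illustration of how this line is used, the sketch below evaluates it for a single length. The function name and the 50 cm example length are assumptions chosen for illustration; they do not come from the card.

```python
def predicted_weight_kg(length_cm: float) -> float:
    """Predicted weight (kg) of a newborn boy from length (cm), using the card's fitted line."""
    return -5.22 + 0.1635 * length_cm


# Hypothetical newborn who is 50 cm long (length picked only for illustration).
weight = predicted_weight_kg(50.0)
print(f"Predicted weight: {weight:.3f} kg")
```

The result is a prediction for a typical boy of that length, so an individual baby landing above or below it is expected.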

If you find any outliers or high leverage points in your​ data, you should delete them from the analysis.

False

A variable that is not part of the model but affects the way variables in the model appear to be related is called​ a(n) _____________.

Lurking variable

There is a strong correlation between the temperature and the number of skinned knees on playgrounds. Does this tell us that warm weather causes children to​ trip?

No. In warm​ weather, more children will go outside and play.

Some friends of yours in a political science class are angry about a new town ordinance restricting​ off-campus parties. They make an online survey asking​ students' opinions. This type of sampling might be classified as a​ __________ sample.

convenience

For many​ people, breakfast cereal is an important source of fiber in their diets. Cereals also contain​ potassium, a mineral shown to be associated with maintaining a healthy blood pressure. An analysis of the amount of fiber​ (in grams) and the potassium content​ (in milligrams) in servings of 77 breakfast cereals produced the regression model Potassium=38+27Fiber. If your cereal provides 9 grams of fiber per​ serving, how much potassium does the model estimate you will​ get?

281 milligrams of potassium
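
The arithmetic behind this answer can be checked with a short sketch. The function name is an assumption for illustration; the coefficients are the ones given in the card.

```python
def predicted_potassium_mg(fiber_g: float) -> float:
    """Predicted potassium (mg) from fiber (g), using Potassium = 38 + 27 * Fiber."""
    return 38 + 27 * fiber_g


estimate = predicted_potassium_mg(9)
print(estimate)  # 281
```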

Which of the following is a true statement about residuals? a) The regression line is the line that minimizes the standard deviation of the residuals. b) The residual plot for a model that is a good fit to the data should not show any pattern or have any unusual features. c) A residual is the difference between the actual data value and the value predicted by the model. d) All of the above

All of the above

Least squares means that the square of the largest residual is as small as it could possibly be.

False. Least squares means that the sum of the squares of all the residuals is minimized.
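
A minimal sketch of this idea, assuming a small made-up data set: the closed-form least squares line is computed, and nudging the intercept or slope away from it never lowers the sum of squared residuals. The data values and function names are hypothetical.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum over all points of (observed y - predicted y) squared."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)

# Any small change to the fitted line should give a sum of squares at least as large.
for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sum_squared_residuals(xs, ys, b0 + d0, b1 + d1) >= best
```

Note that the criterion involves every residual, not only the largest one.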

Choose the linear model that passes through the most data points on the scatterplot.

False. The least squares line often passes through none of the points; it is chosen to minimize the sum of the squared residuals.

A student in an intro stats course collects data at her university. She wants to model the relationship between student jobs and GPA. She collects a random sample of students and asks each for their GPA and the number of hours per week they work. She checks the conditions and makes a linear model. If GPA is the response​ variable, what units will the slope of her line​ be?

GPA points/hr

Noting a recent study predicting the increase in cell phone​ costs, a friend remarks that by the time​ he's a​ grandfather, no one will be able to afford a cell phone. Explain where his thinking went awry.

He is extrapolating into the future. It is impossible to know if a trend like this will continue so far into the future.

A CEO complains that the winners of his​ "rookie junior executive of the​ year" award often turn out to have less impressive performance the following year. He wonders whether the award actually encourages them to slack off. Can you offer a better​ explanation? Which of the following is a better explanation for why the winners of the​ "rookie junior executive of the​ year" award often turn out to have less impressive performance the following​ year?

Perhaps they​ weren't really better than other rookie​ executives, but just happened to have a lucky year.

An analysis of the amount of fiber​ (in grams) and the potassium content​ (in milligrams) in servings of 77 breakfast cereals produced the regression model Potassium=39+28Fiber. Explain what the slope means.

The model predicts that cereals will have approximately 28 more milligrams of potassium for every additional gram of fiber.

In justifying his choice of a model, a student wrote, "I know this is the correct model because R² = 99.4%." a) Is this reasoning correct? Explain. b) Does this model allow the student to make accurate predictions? Explain.

A. No. The scatterplot should be examined first to see if the conditions are satisfied. B. No, the linear model might not fit the data everywhere.

A regression of total revenue on ticket sales determined by a concert production company is given below. Revenue = -12,289 + 33.12 * Ticket Sales a) Management is considering adding a​ stadium-style venue that would seat​ 10,000. What does this model predict that revenue would be if the new venue were to sell​ out? b) Why would it be unwise to assume that this model accurately predicts revenue for this​ situation?

A. Revenue = $318,911 B. An extrapolation this far from the data is unreliable.
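
The sellout figure in part A follows directly from the equation, as this sketch shows. The function name is an assumption for illustration.

```python
def predicted_revenue(tickets_sold: int) -> float:
    """Predicted revenue (dollars), using Revenue = -12,289 + 33.12 * Ticket Sales."""
    return -12_289 + 33.12 * tickets_sold


sellout = predicted_revenue(10_000)
print(f"${sellout:,.0f}")  # $318,911
```

The code will happily return a number for any ticket count; it cannot tell whether 10,000 seats lies far outside the range of the data used to fit the line, which is the concern in part B.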

You are doing a study for a​ non-profit group helping​ at-risk children in your city. Suppose you know that​ 14.2% of the children in your city live in poverty. This percentage is an example of a ​ __________.

Population parameter

Which matters more about a sample you draw from a​ population?

The size of the sample (not the fraction of the population that is sampled)

A study based on data in which no one manipulates any experimental factors is called an​ _______________.

observational study

The model predicts that cereals will have approximately 28 more milligrams of potassium for every additional gram of fiber.

The true potassium contents of cereals vary from the predicted amounts with a standard deviation of 30.98 milligrams.

The residuals are the observed​ y-values minus the​ y-values predicted by the linear model.

True. The residuals are the observed​ y-values minus the​ y-values predicted by the linear model.

To look for​ outliers, and to check the Equal Variance​ Assumption, a​ ____________ should be created.

residual plot

If our data in a scatterplot is​ "straight enough" we can model our data with a​ _____.

linear model

Is there any evidence that an animal's gestation period is related to the animal's lifespan? The scatterplot shows Gestation Period (in days) vs. Life Expectancy (in years) for 18 species of mammals. The highlighted point at the far right represents humans. a) For these data, r=0.541. This is not a very strong relationship. Do you think the association would be stronger if humans were removed? Explain. b) Is there reasonable justification for removing humans from the data set? Explain. c) Here are the scatterplot and regression analysis for the 17 nonhuman species. Comment on the strength of the association. d) Interpret the slope of the line. e) A certain mammal has a life expectancy of about 24 years. Estimate the expected gestation period of this species.

A. Stronger. Both slope and correlation would increase. B. Yes, restricting the study to nonhuman animals would justify it. C. The association is moderately strong D. On​ average, for every year increase in life​ expectancy, the gestation period increases by about 12.97 days. E. 395.4 days

A concert production company examined its records. The manager made the following scatterplot. The company places concerts in two​ venues, a​ smaller, more intimate theater​ (plotted with blue​ circles) and a larger​ auditorium-style venue​ (red x's). a) Describe the relationship between talent cost and total revenue.​ (Remember: direction,​ form, strength,​ outliers.) b) How are the results for the two venues​ similar? c) How are they​ different?

A. The scatterplot shows a strong positive linear relationship between talent cost and total revenue. There is 1 outlier that stands apart from the majority of the data. B. Both venues show an increase of revenue with talent cost. C. The larger venue has greater variability. Revenue for that venue is more difficult to predict.

You recently began an internship at your local chapter of savethepigeons.com. Concerned about a city ballot initiative dealing with the​ environment, you conduct a telephone survey of local residents. What are some possible sources of bias in your​ results?

Response bias, non-response bias, and undercoverage of the population

You are trying to study the amount of financial aid students at your University receive. You sample 50 students and find out the average size of their financial aid packages. The average of your sample is a​ __________.

Sample statistic

An analysis of spending by a sample of credit card bank cardholders shows that spending by cardholders in January (Jan) is related to their spending in December (Dec). The assumptions and conditions of the linear regression seemed to be satisfied and an analyst was about to predict January spending using the model Jan = $612.07 + $0.403 × Dec. Another analyst worried that different types of cardholders might behave differently. She examined the spending patterns of the cardholders and placed them into five market segments. Then she plotted the data using different colors and symbols for the five different segments. Look at this plot carefully and discuss why she might be worried about the predictions from the model.

The different segments are not scattered at random throughout the residual plot. Each segment may have a different relationship.

For many people, breakfast cereal is an important source of fiber in their diets. Cereals also contain potassium, a mineral shown to be associated with maintaining a healthy blood pressure. An analysis of the amount of fiber (in grams) and the potassium content (in milligrams) in servings of 77 breakfast cereals produced the regression model Potassium=38+27Fiber. From this model you can estimate a cereal's potassium content from the amount of fiber it contains. In this context, what does it mean to say that a cereal has a negative residual?

The potassium content is actually lower than the model predicts for a cereal with that much fiber.
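
A small sketch of a negative residual under this model. The cereal below, with 9 g of fiber and 250 mg of potassium, is hypothetical and chosen only to produce a negative value; the function name is also an assumption.

```python
def predicted_potassium_mg(fiber_g: float) -> float:
    """Predicted potassium (mg) from fiber (g), using Potassium = 38 + 27 * Fiber."""
    return 38 + 27 * fiber_g


# Hypothetical cereal: values invented for illustration.
fiber_g = 9
observed_mg = 250

# Residual = observed value minus predicted value.
residual = observed_mg - predicted_potassium_mg(fiber_g)
print(residual)  # -31
```

Here the cereal has 31 mg less potassium than the model predicts for its fiber content.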

Researchers collected data on the annual mortality rate (deaths per 100,000) for males in 20 large towns and the water hardness in terms of the calcium concentration (parts per million, ppm) in the drinking water. a) The display to the right shows the relationship between mortality and calcium concentration for these towns. Describe what you see in this scatterplot, in context. b) Here is the regression analysis of mortality and calcium concentration. What is the regression equation? c) Interpret the slope of this line in context. d) Explain the meaning of the y-intercept of the line. e) The largest residual has a value of 81.2. Explain what this value means. f) The hardness of a certain town's municipal water is about 239 ppm of calcium. Use this equation to predict the mortality rate in this town. g) Explain the meaning of R-squared in this situation.

A. There is a fairly strong, negative, linear relationship between calcium concentration and mortality rate. Towns with harder water tended to have lower mortality rates. B. Mortality = 1852.377 − 5.031 × Calcium C. For each additional ppm of calcium, the model predicts a decrease of 5.031 deaths per 100,000 in mortality. D. The model predicts that a town with 0 ppm calcium concentration would have a mortality rate of 1852.377 deaths per 100,000. E. The town had 81.2 more deaths per 100,000 people than the model predicts. F. The town is expected to have a mortality rate of 649.968 deaths per 100,000 people. G. 97.4% of the variability in the mortality can be accounted for by a linear model on calcium concentration.
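
The prediction in part F and the intercept in part D can be reproduced with a short sketch. The function name is an assumption for illustration; the coefficients are those in part B.

```python
def predicted_mortality(calcium_ppm: float) -> float:
    """Predicted male mortality (deaths per 100,000) from calcium concentration (ppm)."""
    return 1852.377 - 5.031 * calcium_ppm


at_239_ppm = predicted_mortality(239)
at_zero_ppm = predicted_mortality(0)  # the y-intercept

print(f"{at_239_ppm:.3f} deaths per 100,000")
print(f"{at_zero_ppm:.3f} deaths per 100,000")
```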

Which of the following is a characteristic of a good​ experiment?

Comparative, randomized, placebo-controlled, and double-blinded

A regression analysis of 117 homes for sale produced the following model, where price is in thousands of dollars and size is in square feet. Price = 47.88 + 0.068 × Size a) Explain what the slope of the line says about housing prices and house size. b) What price would you predict for a 2500-square-foot house in this market? c) A real estate agent shows a potential buyer a 1300-square-foot house, saying that the asking price is $6500 less than what one would expect to pay for a house of this size. What is the asking price? d) What is the $6500 called?

A. For every additional square foot of area of a​ house, the price is predicted to increase by ​$68. B. $217,880 C. $129,780 D. Residual
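
Parts B and C can be worked through step by step, as in this sketch. The function and variable names are assumptions for illustration; the model output is in thousands of dollars, so it is multiplied by 1000.

```python
def predicted_price_thousands(size_sqft: float) -> float:
    """Predicted price (thousands of dollars), using Price = 47.88 + 0.068 * Size."""
    return 47.88 + 0.068 * size_sqft


# Part B: predicted price of a 2500-square-foot house, in dollars.
price_2500 = predicted_price_thousands(2500) * 1000

# Part C: the asking price sits $6500 below the prediction, so the residual is -6500.
expected_1300 = predicted_price_thousands(1300) * 1000
residual = -6500
asking_price = expected_1300 + residual

print(f"${price_2500:,.0f}")    # $217,880
print(f"${asking_price:,.0f}")  # $129,780
```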

The scatterplot provided shows the gestation periods and life expectancies for several animal species. The plot contains two points that may be of concern. The point in the upper right corner of this scatterplot is for elephants, and the other point at the far right is for hippos. a) By removing one of these points, we could make the association appear to be stronger. Which point? Explain. b) Would the slope of the line increase or decrease? c) Should we just keep removing animals to increase the strength of the model? Explain. d) The slope of the scatterplot's regression line is 15.5. If we remove elephants from the scatterplot, the slope of the regression line becomes 11.6 days per year. Do you think elephants were an influential point? Explain.

A. Removing hippos would make the association​ stronger, since hippos are more of a departure from the pattern. B. ​Increase, because the point for the hippos is below the regression line. C. No. Only data points that are outliers should be removed. D. Yes, removing it lowered the slope significantly.

Players in any sport who are having great​ seasons, turning in performances that are much better than anyone might have​ anticipated, often are pictured on the cover of Sports Illustrated.​ Frequently, their performances then falter​ somewhat, leading some athletes to believe in a​ "Sports Illustrated​ jinx." Similarly, it is common for phenomenal rookies to have less stellar second​ seasons, the​ so-called "sophomore​ slump." While​ fans, athletes, and analysts have proposed many theories about what leads to such​ declines, a statistician might offer a simpler​ (statistical) explanation. Explain. What would be a better explanation for the decrease in performance of the Sports Illustrated cover​ athlete?

People on the cover are usually there for outstanding performances. Because they are so far from the​ mean, the performance in the next year is likely to be closer to the mean.
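
Regression to the mean can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: performance is modeled as a fixed skill plus independent year-to-year luck, both drawn from standard normal distributions, and the "cover athletes" are taken to be the top 1% of first-year performers.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 10_000
# Assumed model: performance = stable skill + luck that is redrawn each year.
skill = [random.gauss(0, 1) for _ in range(N)]
year1 = [s + random.gauss(0, 1) for s in skill]
year2 = [s + random.gauss(0, 1) for s in skill]

# "Cover athletes": the top 1% of performers in year 1.
top = sorted(range(N), key=lambda i: year1[i], reverse=True)[: N // 100]

mean_top_year1 = sum(year1[i] for i in top) / len(top)
mean_top_year2 = sum(year2[i] for i in top) / len(top)
mean_all_year2 = sum(year2) / N

print(f"Top group, year 1: {mean_top_year1:.2f}")
print(f"Top group, year 2: {mean_top_year2:.2f}")
print(f"Everyone, year 2:  {mean_all_year2:.2f}")
```

In this model nothing changes about the athletes between years. The top group still does better than average in year 2 because of skill, but worse than in year 1 because the good luck that helped put them on the cover is not repeated.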

A group of students in your intro stats class design an experiment to test whether popcorn stored in the freezer pops better​ (fewer kernels left​ un-popped) than room temperature popcorn. In​ addition, they also want to test different power levels on their microwaves. The​ factor(s) in this experiment​ is(are) ___________________.

Storage temperature and microwave power level

