STAT 3

What is a reasonable value for level of significance?

0.05

Teresa thinks that a person's age and BMI and a runner's 200 M dash time influence a runner's 400 meter time. She samples 17 runners and runs a multiple linear regression. The MSE is 24.4 and the mean square of the model is 123.5. The R2 multiple correlation coefficient is _________.

0.5404 SSM=370.5, SSE=315.12. R2=370.5/(315.12+370.5)=.5404
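
The arithmetic in this answer can be checked with a short sketch. It assumes the sums of squares quoted in the answer are taken as given; the function name `r_squared` is hypothetical and introduced only for illustration. SSM is the mean square of the model times the number of explanatory variables (123.5 × 3).

```python
def r_squared(ssm: float, sse: float) -> float:
    """Return R^2 = SSM / SST, where SST = SSM + SSE."""
    return ssm / (ssm + sse)

# Values quoted in the answer above (taken as given, not re-derived).
ssm = 123.5 * 3   # mean square of the model times p = 3 explanatory variables
sse = 315.12      # error sum of squares as quoted in the answer

print(round(r_squared(ssm, sse), 4))  # 0.5404
```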

Valerie wishes to conduct a study where she wants to see if the height of the father and the height of the mother help to predict the height of their child. She samples 22 fully grown males. She measures their heights and the heights of each of their parents. The value of R2 is ________.

0.56 This is SSM/SST=1204/2150=0.56

Frank is a high school mathematics teacher. He is interested in what habits affect his students' final exam performance. He surveyed a random 60 out of 100 students in his classes and asked each one how many hours he or she spent studying. He also rated their class participation on a scale from 1 to 10. His predicted model equation is: ŷ = 40 + 2.25*Studying + 3.89*Participation. A student spent 2 hours studying and got a 3 in participation and received a 72 on the exam. The value of the residual is _______.

15.83 ei = yi - ŷi = 72 - 56.17 = 15.83
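
A minimal sketch of the residual calculation, using the fitted equation from the question. The function name `predict_exam_score` is a hypothetical helper, not part of the original card.

```python
def predict_exam_score(studying_hours: float, participation: float) -> float:
    """Fitted equation from the card: y-hat = 40 + 2.25*Studying + 3.89*Participation."""
    return 40 + 2.25 * studying_hours + 3.89 * participation

observed = 72
y_hat = predict_exam_score(studying_hours=2, participation=3)
residual = observed - y_hat  # residual = observed value minus predicted value

print(round(y_hat, 2), round(residual, 2))  # 56.17 15.83
```

A positive residual means the student scored above what the fitted equation predicts.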

Sally is trying to predict model home values for her city based on home size and number of bedrooms. She looked at the last 32 sales in her city. The predicted response equation is [equation not shown]. For a 95% confidence interval, the value for t* is _______ (round to 3 decimal places).

2.045 This is found by looking up the critical value on table D, using 32-2-1=29 degrees of freedom and a confidence level of 95%.

z* for 98% confidence is ________. (Hint: Use t-Table A, or software.) Please answer to three decimal places.

2.326 For 98% confidence, z* equals 2.326. To find this using the t-Table, go to the top row of the table showing the confidence levels. Find the column for 98% confidence. Follow that column down to the 3rd row from the bottom called the z* row. You should read 2.326 in the intersection of the 98% column and z* row.
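
For the "or software" route mentioned in the hint, one possible sketch uses Python's standard-library `statistics.NormalDist`. The helper name `z_star` is an assumption made for this example.

```python
from statistics import NormalDist

def z_star(confidence: float) -> float:
    """Critical value z* that leaves (1 - confidence)/2 in each tail of N(0, 1)."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

print(round(z_star(0.98), 3))  # 2.326
```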

Frank is a high school mathematics teacher. He is interested in what habits affect his students' final exam performance. He surveyed a random 63 out of 100 students in his classes and asked each one how many hours he or she spent studying. He also rated their class participation on a scale from 1 to 10. His predicted model equation is: ŷ = 40 + 2.25*Studying + 3.89*Participation. The standard errors are SEbStudying = 1.01 and SEbParticipation = 1.78. The value of t* for the 99% confidence intervals is ______. (Round to 4 decimal places)

2.6603 Degrees of freedom= 63-2-1=60. The t* value for 99% confidence level is 2.6603

True or False: The U.S. Census Bureau gathered data to find the percentage of each state's population living in a metropolitan area (x) and each state's per capita spending in dollars (y). Statistical software gives the following table of results for the least-squares regression line. Because the confidence interval for the slope β1 is (-0.548, 0.746), sometimes the linear relationship between x and y is negative and sometimes positive, depending on the value of x.

False

True or false: The P-value is the probability of failing to reject a true null hypothesis.

False

An endocrinologist is interested in the effects of depression on the thyroid. It is believed that healthy subjects have a mean thyroxin (a hormone related to thyroid function) level of 7.0 micrograms/100 ml and a standard deviation of 1.6 micrograms/100 ml. The endocrinologist wants to assess whether the mean thyroxin level is different for those with depression. She samples 35 subjects with depression and obtains a sample mean of 7.82 micrograms/100 ml for thyroxin. She finds the P-value to be 0.0124 for testing H0: μ = 7 versus Ha: μ ≠ 7. Is the following interpretation of this P-value correct or incorrect? The P-value is the probability that the true mean thyroxin is 7 micrograms/100 ml.

False The P-value is NOT about the chance the null hypothesis is true.

True or false: Whenever the symbol > is found in the alternative hypothesis, we declare the test to be two-sided.

False > is only one direction (greater than the claimed value), so it is a one-sided test.

True or False: Edwin Hubble collected data from 24 galaxies measuring the distance a galaxy is from Earth (x) and the velocity with which it appears to be receding (y) to investigate if there was a linear relationship between the two variables. He used the following model: μy = β0 + β1x. Based on the information given, the 95% confidence interval for slope is (378.92, 529.40).

False For a confidence interval for slope, we use the general form for a confidence interval: estimate ± (t-table value)*(SE of the estimate). This interval did not use a value for t in its calculation.

True or false: We never express hypotheses in terms of population parameters.

False Hypotheses are claims about population parameters, never sample outcomes.

True or False: In a multiple linear regression, the standard deviation, σ, is assumed to be different for all subpopulations.

False Multiple linear regression assumes that the standard deviation of the response is the same in all subpopulations.

True or False: Teresa thinks that a person's age and BMI and a runner's 200 M dash time influence a runner's 400 meter time. She samples 17 runners and runs a multiple linear regression. The MSE is 24.4 and the mean square of the model is 123.5. The R2 multiple correlation coefficient is 0.8359.

False SSM=370.5, SSE=315.12. R-squared=370.5/(315.12+370.5)=.5404

True or False: The U.S. Environmental Protection Agency collected data from 82 automobiles to study the relationship between vehicle weight (x) and average miles per gallon (y). Statistical software gives the following table and s value 4.281 for the least-squares regression line.

False The regression standard error does not directly give the margin of error for a confidence interval. The confidence interval for the slope uses the standard error of the estimate b1, so the confidence interval is b1 ± t*SEb = -1.112 ± (1.990)(0.058).

True or False: Edwin Hubble collected data from 24 galaxies measuring the distance a galaxy is from Earth (x) and the velocity with which it appears to be receding (y) to investigate if there was a linear relationship between the two variables. He used the following model: μy = β0 + β1x. For a confidence interval for slope, we use the general form for a confidence interval: estimate ± (t-table value)*(SE of the estimate). Based on the information in the table, the value for the standard error of the estimate is 83.44.

False The value given is the standard error of the least-squares intercept a. To calculate the confidence interval for slope, we need SEb = 75.24, the standard error of the least-squares slope b.
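
The Hubble cards can be tied together with a small sketch. It assumes the slope estimate b = 454.16 from the least-squares equation quoted elsewhere in this set, SEb = 75.24 from the answer above, and a table value t* = 2.074 for 24 - 2 = 22 degrees of freedom at 95% confidence (this t* is supplied here as an assumption; it does not appear in the original cards). The function name is hypothetical.

```python
def slope_confidence_interval(b: float, se_b: float, t_star: float) -> tuple[float, float]:
    """General form: estimate +/- (t-table value) * (SE of the estimate)."""
    margin = t_star * se_b
    return (b - margin, b + margin)

b, se_b = 454.16, 75.24   # slope estimate and its standard error from the cards
t_star = 2.074            # assumed table value for 22 degrees of freedom, 95% confidence

# Leaving out t* (i.e. using a multiplier of 1) reproduces the incorrect interval.
wrong = slope_confidence_interval(b, se_b, 1.0)
right = slope_confidence_interval(b, se_b, t_star)

print(tuple(round(v, 2) for v in wrong))  # (378.92, 529.4)
print(tuple(round(v, 2) for v in right))  # (298.11, 610.21)
```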

True or False: The statistical model for a multiple linear regression is: µy = β0 + β1x1 + β2x2 + ... + βpxp.

False This is the equation for the mean response. It is a function of the explanatory variables.

True or False: Sally is trying to predict model home values for her city based on home size and number of bedrooms. She looked at the last 32 sales in her city. The predicted response equation is [equation not shown]. Her alternative hypothesis is Ha: βsqfeet ≠ 0. The p-value for this test is .0067.

False This is the one-tailed p-value. We are doing a two-tailed test so you need to multiply this number by 2.

True or False: The Chicago Bears believe that their quarterback's passer rating [0-158.3] and total running yards influence their point total in each game. The population regression equation is: µpoints = β0 + β1PassRating + β2(80).

False This is the subpopulation regression equation for games with 80 yards of rushing.

True or False: Sunnyville High School wishes to predict a student's ACT test performance. They believe that age (in years), GPA [0-4.0], and average final exam grades [0-100] for the previous semester all influence a student's ACT score. The population regression equation is: µscore = β1Age + β2GPA + β3Final Exam.

False This regression equation does not include the intercept.

The label on a can of a particular brand of extra large olives states that there are about 33 olives in each can. A gourmet cook feels that the claim of 33 olives per can is too high, and that the average number of olives per can is fewer than 33. He samples 35 cans and finds x̄ = 32.9. What alternative hypothesis does he want to test?

Ha: μ < 33

Consider the plot shown below. Which of the following conditions for regression inference is violated according to the plot?

None of the three conditions cited here are violated according to this plot. The scatter of points in this residual plot supports the conditions of a linear relationship between x and y and constant standard deviation. We would still need to check the Normality condition before proceeding with inference of the regression.

An outcome that would rarely happen if a claim were true is good evidence that the claim is

Not true

A real estate firm collected data from home sales to study the relationship between area of living space measured in square feet and the selling price in thousands of dollars. Suppose a realtor wants to estimate the mean selling price for all homes that have 2000 ft2 of living space. Statistical software gives the following table.

She is 95% confident that the mean selling price is between $123,022 and $132,009. The 95% confidence interval for mean selling price is needed to say something about all 2000 ft2 homes

An observed outcome that would rarely happen if a claim were true is good evidence for what conclusion?

That the claim is not true. An observed outcome that would rarely happen if a claim were true is good evidence that the claim is not true.

When we say that our confidence is 95%, what is our confidence in?

The confidence interval procedure. Our confidence is in the procedure used to calculate our confidence interval. For 95% confidence, the confidence interval procedure creates confidence intervals that capture the population parameter 95% of the time.
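
The idea that confidence belongs to the procedure can be illustrated with a small simulation. This is a sketch under stated assumptions: a Normal population with known σ, a z-interval, and illustrative values (μ = 25, σ = 5, n = 10) chosen for this example rather than taken from this card. The function name `coverage_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage_rate(mu, sigma, n, confidence, repetitions, seed=0):
    """Fraction of simulated z-intervals (sigma known) that capture the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n and build the interval x-bar +/- margin.
        sample_mean = mean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / repetitions

rate = coverage_rate(mu=25.0, sigma=5.0, n=10, confidence=0.95, repetitions=2000)
print(rate)  # should land close to 0.95
```

Any single interval either contains μ or it does not; the 95% describes how often the method succeeds over many repetitions.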

True or false: The primary purpose of a test of significance is to give a clear statement of the degree of evidence against the null hypothesis.

True

True or false: A randomized trial of interventions for reducing transmission of HIV reported an incident rate ratio of 1.00, meaning that the intervention group and the control group both had the same rate of HIV-1 Infection. The 95% confidence interval was 0.63 to 1.58. We should conclude that the intervention may help and may hurt - more data are needed.

True Right - the confidence interval makes this clear.

True or false: An airline wants to know the average time it takes their passengers to claim their luggage from the carousel. The time to claim luggage for this airline is known to be Normally distributed with mean μ and standard deviation σ = 5 minutes. The airline took a simple random sample of 10 passengers and calculated a 95% confidence interval to be 21.9 to 28.1 minutes. Suppose the airline decides to increase the confidence level to 99% and they don't want to increase their margin of error. They can increase their sample size.

True Since increasing the confidence level from 95% to 99% will increase the margin of error, they need to do something to decrease the margin of error. Increasing the sample size will do this.

True or False: ABC insurance company wishes to predict the amount of money a customer will cost them if they have a claim. They believe that age (in years), sex, and car age (in years) all affect the cost of the accident. The population regression equation is µcost = β0 + β1Sex + β2Age + β3CarAge.

True The population regression equation says that the mean response depends on the explanatory variables.

True or False: In a linear regression of y predicted by x, the standard deviation of y is assumed to be the same for all values of x.

True The standard deviation of y allows us to determine whether the data points fall close to the regression line or are widely scattered. A condition for inference is that the spread of response values is constant across all values of x.

True or False: We assume that the residuals of a multiple linear regression are an SRS of the standard normal distribution.

True We assume the deviations (residuals) are independent and normally distributed with a mean of zero and a constant standard deviation.

True or false: When doing regression, it is possible to differentiate between the regression line for a sample and for the whole population.

True. Good - estimating features of the population is usually the goal of doing regression on a sample.

In a test of significance we make a claim and ask if the data give evidence ____________ it.

against

Suppose we are testing H0: μ = 100 versus Ha: μ > 100. Which of the following will have the smallest P-value?

x̄ = 120 Since this test is one-sided with greater than in the alternative, an x̄ of 120 is the farthest above μ = 100 and will have the smallest P-value.

To estimate the mean response μy = β0 + β1x when x takes on a specific value, a ________ should be used.

confidence interval

An airline wants to know the average time it takes their passengers to claim their luggage. The time to claim luggage for this airline is known to be Normally distributed with mean μ and standard deviation σ = 5 minutes. If the airline took a simple random sample of 100 passengers instead of 10 passengers, the margin of error for a 95% confidence interval would ____________.

decrease
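
The airline cards can be checked numerically. The sketch below assumes the usual z-interval margin of error for known σ, m = z*·σ/√n; the function name `margin_of_error` is introduced only for this example.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(confidence: float, sigma: float, n: int) -> float:
    """Margin of error m = z* * sigma / sqrt(n) for a z-interval with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)

sigma = 5  # minutes, as stated in the card
print(f"{margin_of_error(0.95, sigma, 10):.2f}")   # 3.10
print(f"{margin_of_error(0.95, sigma, 100):.2f}")  # 0.98
print(f"{margin_of_error(0.99, sigma, 10):.2f}")   # 4.07
```

The first value matches the half-width of the 21.9 to 28.1 interval quoted in the earlier airline card. Going from 10 to 100 passengers shrinks the margin, while raising the confidence level to 99% at n = 10 widens it.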

Increasing the desired margin of error will ________________ the required sample size.

decrease Increasing the desired margin of error will require a smaller sample size. In the sample size formula n = (z*σ/m)², the margin of error m is in the denominator, so making m larger lowers the result of the calculation.
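
A short sketch of the sample size calculation, assuming the standard formula n = (z*·σ/m)² rounded up to the next whole number. The values σ = 5 and 99% confidence are illustrative choices, and `required_sample_size` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(confidence: float, sigma: float, margin: float) -> int:
    """Smallest n with z* * sigma / sqrt(n) <= margin, i.e. n = (z* * sigma / margin)^2 rounded up."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)

# Illustrative values: sigma = 5 and 99% confidence are assumptions for this sketch.
for m in (2.0, 3.1, 4.0):
    print(m, required_sample_size(0.99, 5, m))
# 2.0 42
# 3.1 18
# 4.0 11
```

Because m sits in the denominator, each larger desired margin of error yields a smaller required n.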

Edwin Hubble collected data from 24 galaxies measuring the distance a galaxy is from Earth (x) and the velocity with which it appears to be receding (y) to investigate if there was a linear relationship between the distance and the velocity. He used the following model: μy = β0 + β1x. Then, β1 represents the average change in velocity for a one megaparsec increase in ________ for all galaxies.

distance Since β1 is a parameter for slope, it signifies the average rate of change in velocity given distance.

A real estate firm collected data from home sales to study the relationship between area of living space measured in square feet and the selling price in dollars. If we compute a 95% prediction interval for the selling price of a 1700 ft2 home and a 95% confidence interval for the mean selling prices of all 1700 ft2 homes, the center of the prediction interval will be ________ the center of the confidence interval.

equal to The least-squares regression line provides the center for both of these intervals. Both would use ŷ = b0 + b1(1700), the fitted value from the sample estimates of β0 and β1, as the estimate for the response.

Frank is a high school mathematics teacher. He is interested in what habits affect his students' final exam performance. He surveyed a random 60 out of 100 students in his classes and asked each one how many hours he or she spent studying. He also rated their class participation on a scale from 1 to 10. The response variable is __________.

final exam performance

Sample statistic values that are unlikely to occur if a claim were true are said to be ______________ with the claim.

inconsistent

We assume that with a linear relationship between two variables, for any fixed value of x, the response y follows a ________ distribution.

normal

A test of significance uses evidence provided by sample data to assess whether a claim about ___________ is supported or refuted.

a population parameter

We always express hypotheses in terms of

parameters

The mean response has a linear relationship with x given by a population regression line with slope β1 and intercept β0. Since they are values that correspond to the entire population, they are unknown ________.

parameters Because the slope β1 and intercept β0 are values based on the population, they are unknown parameters and must be estimated with sample statistics b and a, respectively.

If P-value < α, we ___________________ the null hypothesis.

reject

Which of the following values provides an estimate for the amount of variability of the response y about the regression line?

s This is the estimate of σ based on the sample data.

If the population standard deviation were 2 instead of 5, the margin of error would be _______.

smaller Smaller standard deviations will make the margin of error smaller, because σ appears in the numerator of the margin of error formula, m = z*σ/√n.

In a linear regression of y predicted by x, the population ________ of y is assumed to be the same for all values of x.

standard deviation The standard deviation of y allows us to determine whether the data points fall close to the regression line or are widely scattered. A condition for inference is that the spread of response values is constant across all values of x.

The purpose of a confidence interval for μ is to give a range of reasonable values for __________________.

the population mean A confidence interval for μ gives a range of reasonable values for the unknown population mean, μ. Its basic purpose is to estimate the value of the parameter μ.

True or false: In a test of significance we typically want to find evidence against the null hypothesis.

True

Edwin Hubble collected data from 24 galaxies measuring the distance a galaxy is from Earth (x) and the velocity with which it appears to be receding (y) to investigate if there was a linear relationship between the distance and the velocity. He used the following model: μy = β0 + β1x. Then, β1 represents the average change in ________ for a one megaparsec increase in distance for all galaxies.

velocity Since β1 is a parameter for slope, it signifies the average rate of change in velocity given distance.

The reasoning of a test of significance, as all inference, is based on _________________.

what would happen if we repeated the procedure many times This is the basis of all inference including confidence intervals.

Suppose we are testing a claim that μ = 60 and that the standard deviation of the sampling distribution of x̄ equals 2. Which of the following sample means gives the most evidence against the claim?

x̄ = 54 This value is furthest from the claimed mean of 60, so it is least likely to happen if the claim is true.

Edwin Hubble collected data from 24 galaxies measuring the distance a galaxy is from Earth (x) and the velocity with which it appears to be receding (y) to investigate if there was a linear relationship between the variables using the following model: μy = β0 + β1x. Based on the information in the table, what is the correct equation for the least-squares regression line?

ŷ = −40.78 + 454.16x

ABC insurance company wishes to predict the cost of the accident if they have a claim. They believe that age (in years), sex, and car age (in years) all affect the cost of the accident. The population regression equation is:

µcost = β0 + β1Sex + β2Age + β3CarAge This is the correct population regression equation.

The label of a particular brand of extra large olives states that there are about 33 olives in each can. A gourmet cook feels that the claim of 33 olives per can is too high, and that the average number of olives per can is less than 33. He samples 35 cans and finds x̄ = 32.9. Assuming that the standard deviation is 0.5 olives, the value of the test statistic for testing H0: μ = 33 is __________. (Give answer as X.XX)

-1.18 The test statistic is z = (x̄ - μ)/(σ/√n) = (32.9 - 33)/(0.5/√35) = -1.183, which rounds to -1.18.
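
The test statistic can be reproduced with a few lines. This is a sketch assuming the one-sample z statistic with known σ; the function name `z_statistic` is hypothetical, and the one-sided P-value is an extra step not asked for in the card.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """One-sample z statistic: (x-bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / sqrt(n))

z = z_statistic(x_bar=32.9, mu0=33, sigma=0.5, n=35)
print(round(z, 2))  # -1.18

# For Ha: mu < 33 the P-value is the lower-tail area below z.
p_value = NormalDist().cdf(z)
print(round(p_value, 3))
```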

Brittany believes that the height of a car (in feet), the outdoor temperature (Fahrenheit), and the average driving speed (Miles per hour) all help explain the miles per gallon that a car gets. She takes a random sample of 28 cars throughout the United States and collects this data on each of their cars for 15 gallons of gas. The fitted regression equation is

0.035; 0.201 Correct. 0.118 ± 2.064(0.04) = [0.035, 0.201]

Frank is a high school mathematics teacher. He is interested in what habits affect his students' final exam performance. He surveyed a random 63 out of 100 students in his classes and asked each one how many hours he or she spent studying. He also rated their class participation on a scale from 1 to 10. From the following ANOVA table, you can calculate the R2 value to be _________.

0.339 R2=SSM/SST=1648/4867=.3386

Sally is trying to predict model home values for her city based on home size and number of bedrooms. She looked at the last 32 sales in her city. From the following ANOVA table, the R2 multiple correlation coefficient is _______

0.4841 R2 = SSM/SST = 1275/(1275 + 1359) = 1275/2634 = 0.4841

Brittany believes that the height of a car (in feet), the outdoor temperature (Fahrenheit), and the average driving speed (Miles per hour) all help explain the miles per gallon that a car gets. She takes a random sample of 28 cars throughout the United States and collects this data on each of their cars for 15 gallons of gas. The Mean Square of the Error in the following ANOVA table is ________.

21.78436 Correct. MSE=522.8247/24=21.7844
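
A minimal sketch of the mean square error calculation, using the SSE quoted in the answer and the error degrees of freedom n - p - 1 with n = 28 cars and p = 3 explanatory variables. The function name is introduced only for illustration.

```python
def mean_square_error(sse: float, n: int, p: int) -> float:
    """MSE = SSE / DFE, where the error degrees of freedom are n - p - 1."""
    return sse / (n - p - 1)

mse = mean_square_error(sse=522.8247, n=28, p=3)  # DFE = 28 - 3 - 1 = 24
print(round(mse, 4))  # 21.7844
```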

Valerie wishes to conduct a study where she wants to see if the height of the father and the height of the mother help to predict the height of their child. She samples 22 fully grown males. She measures their heights and the heights of each of their parents. From the ANOVA table below, the Mean Square of the Error is ________.

49.79

There are many factors that determine the selling price of a home. Sally is interested in estimating how much her home and her friends' homes are worth in the current market. She believes that the number of bedrooms, the total square footage, the number of bathrooms, sale price of nearby homes, and the location (census block group) are all important factors. There are p=_____ explanatory variables.

5

The predicted response equation is ŷ = 55.2 + 5.1x1 - 3.2x2. If x1 = 5 and x2 = 3, the predicted value of y is ________.

71.1 ŷ = 55.2 + 5.1(5) - 3.2(3) = 71.1

Regarding linear regression, which of the following does not accurately describe what a residual is?

A point that falls outside of the overall pattern of the regression line

If sample statistic values are unlikely to occur if a claim about a parameter were true, what can we conclude?

That the claim should be rejected. Sample statistic values that are unlikely to occur if the claim is true are NOT consistent with the claim, so they give evidence that the claim is NOT true.

A light bulb manufacturer knows the mean life of their standard light bulb is 1500 hours. They have developed a new full spectrum light bulb and are interested in testing whether the mean life of their new light bulb is different than their standard light bulb. The mean life of a random sample of 100 full spectrum light bulbs was x̄ = 1460 hours. What direction should you use in the alternative hypothesis for this study?

Different from They want to see if the mean life of their new light bulb is different than their standard bulb. So, they want different from in the alternative.

What is statistical inference on μ?

Drawing conclusions about a population mean based on a sample mean.

Backgammon's Pizza claims that their mean delivery time is 30 minutes. Because you had to wait so long for your pizza, you decide to test whether the average delivery time is longer than 30 minutes. A random sample of 15 deliveries had a mean delivery time of x̄ = 33.5 minutes. What are the appropriate null and alternative hypotheses for this study?

H0: μ = 30 versus Ha: μ > 30 You suspect that the mean delivery time is longer than 30 minutes, so this is what you want to establish.

An endocrinologist is interested in the effects of depression on the thyroid. It is believed that healthy subjects have a mean thyroxin (a hormone related to thyroid function) level of 7.0 micrograms/100 ml and a standard deviation of 1.6 micrograms/100 ml. The endocrinologist wants to assess whether the mean thyroxin level is different for those with depression. She samples 35 subjects with depression and obtains a sample mean of 7.82 micrograms/100 ml for thyroxin. What null and alternative hypotheses should she test?

H0: μ = 7 versus Ha: μ ≠ 7

True or False: There are 66 degrees of freedom in a model with 72 observations and 6 explanatory variables in the model.

False. The number of degrees of freedom in the model is n-p-1 = 72-6-1 = 65.

We reject the null hypothesis whenever P-value is _________ α.

Less than

What tells us how close x̄ is likely to be to μ?

The sampling distribution of x̄. This provides information so that we can figure out how close x̄ might be to μ.

The manufacturer of a new type of light bulb did a study to estimate its mean life. They reported that the mean life of the light bulb is 1600 hours with a margin of error of 25 hours with 95% confidence. Interpret this margin of error.

The true mean life of the light bulb differs from the estimated mean by no more than 25 hours for the middle 95% of the x̄'s from all possible samples of size n. If the study were conducted over and over again, 95% of the sample means would differ from the true mean by no more than 25 hours.

A very large school district in Connecticut wanted to estimate the average SAT (for college admissions) score of this year's graduating class. The district took a simple random sample of 100 seniors and calculated the 95% confidence interval for μ as 505 to 520 points. Consider this interpretation of the confidence interval: For the sample of 100 graduating seniors, 95% of their SAT scores were between 505 and 520 points with 95% confidence. What needs to be done to fix this statement?

The true parameter needs to be stated correctly and in context. Confidence intervals address an unknown population mean. This interval talks about what the scores in the sample were. We know what the values in the sample were. Further, individual values are more variable than means. A better interpretation is, With 95% confidence, the mean SAT score of all graduating students in this Connecticut school district is between 505 and 520 points.

The values β0, β1, and σ are ________ for a population regression line that describes the linear relationship between two variables.

parameters

True or False: A real estate firm collected data from home sales to study the relationship between area of living space measured in square feet and the selling price in dollars. If we give a prediction interval for one home whose size is 2000 ft2, this interval estimates the selling price for that one home.

True The described interval for a single home is a prediction interval, whereas a confidence interval estimates the mean selling price for all homes of size 2000 ft2.

True or False: To find an interval estimate for the response y of an individual when x takes on a specific value, a prediction interval should be used.

True Since y could take on different values due to variability, we use a prediction interval.

A randomized trial of interventions for reducing transmission of HIV reported an incident rate ratio of 1.00, meaning that the intervention group and the control group both had the same rate of HIV-1 Infection. The 95% confidence interval was 0.63 to 1.58. We should conclude:

The intervention may help and may hurt; more data are needed.

To estimate the mean response when x takes on a specific value, a ________ should be used.

confidence interval Since the mean response μy is a parameter whose value is fixed, we use a confidence interval.

What describes how often confidence intervals will capture the true parameter value in repeated sampling?

confidence level The confidence level is the success rate for the confidence interval estimation method.

When data are a random sample from a population, the sample mean is an estimate of the _______ mean.

population

To find an interval estimate for the response y of an individual when x takes on a specific value, a ________ interval should be used.

prediction Since y could take on different values due to variability, we use a prediction interval.

Which one of the following is NOT part of a confidence interval for μ?

the population size The population size does not affect the confidence interval for μ. The population size does not matter as long as it is large relative to the sample size.

What do we hope to capture within a confidence interval?

the value of the unknown parameter

You run a two-sided statistical test for a population mean, and find z = 2.34. At a significance level of .05, you should conclude that ________

there is good evidence to support the alternative hypothesis.
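
The conclusion follows from comparing the two-sided P-value with α. A small sketch, assuming a standard Normal reference distribution; the function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Two-sided P-value: twice the tail area beyond |z| under N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
p = two_sided_p_value(2.34)
print(round(p, 4))  # 0.0193
print(p < alpha)    # True
```

Since the P-value is below α = .05, the null hypothesis is rejected, which is what the answer means by good evidence for the alternative.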

The slope of the population regression line ________ is estimated by the unbiased estimator ________

β1 ... b1 β1 is the population parameter for slope, and b1 is the sample statistic.

