Stat 113 Final Exam


According to the Empirical​ Rule, 68% of the area under the normal curve is between μ−σ and μ+σ. What percent of the area under the normal curve is between μ and μ+σ​?

34%

A college student was interested in the average amount college students spend on entertainment each week. He randomly sampled 200 students and found the following​ 95% confidence​ interval: (24,28) dollars per week. What is the value of the margin of​ error?

$2
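As a minimal sketch of the arithmetic behind this card: for a symmetric confidence interval, the point estimate is the midpoint and the margin of error is half the width. The function name `interval_summary` is an assumption made for illustration.

```python
def interval_summary(lower, upper):
    """Return (point estimate, margin of error) for a symmetric interval."""
    center = (lower + upper) / 2   # midpoint of the interval
    margin = (upper - lower) / 2   # half the interval width
    return center, margin


center, margin = interval_summary(24, 28)
print(center, margin)  # 26.0 2.0
```

The same function covers the other interval cards in this set, for example `interval_summary(45, 60)` for the egg-length interval.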

Suppose the equation of a​ least-squares regression line is y=−3.17−2.4x. What can be said about the​ y-intercept?

It is −3.17.

Suppose the equation of a​ least-squares regression line is y=−3.17−2.4x. What can be said about the correlation​ coefficient?

It is​ negative, but its exact value cannot be determined from the given information.

In​ regression, what can be said about the sum of the residuals of all the​ observations?

It will always be 0.

Which of the following is NOT needed to construct a​ boxplot?

Mean

Which information can you obtain from a​ stem-and-leaf plot but not from a​ histogram?

Minimum and maximum data values

Researchers wondered what the average braking distance is for new cars traveling at 60 MPH. They randomly sampled 40 new cars made by two different companies. For each car, the same driver obtained a speed of 60 MPH and then pushed on the brake pedal as hard as she could. The average stopping distance for these 40 cars was 155.2 feet. Suppose σ is known to be 12 feet. Assume that the braking distances of the cars are independent. Are the other conditions satisfied to use the one-sample z-methods to construct a confidence interval?

No. Even though the distribution of sample means will be approximately normal because of the large sample​ size, this sample of new cars is not representative of all new cars since a random sample of all new cars was not taken.

Suppose your statistics professor teaches two sections of your course this semester. She gives the same exam to each class. Their performance is summarized below. Can she conclude that the overall mean on the exam was 80 (the average of the two individual class means)? Why or why not?
First Class: n=32, mean=75, standard deviation=5.6
Second Class: n=38, mean=85, standard deviation=7.2

No. Since the class sizes are​ different, she would need to find the weighted mean.
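A short sketch of the weighted mean for the two classes, using the class sizes as weights. The helper name `weighted_mean` is hypothetical.

```python
def weighted_mean(means, sizes):
    """Combine group means, weighting each by its group size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)


overall = weighted_mean([75, 85], [32, 38])
print(round(overall, 2))  # 80.43
```

The result sits slightly above 80 because the larger class has the higher mean.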

Suppose a newspaper surveys 250 adults in a nearby town and inquires about their cell phone carrier. The accompanying table summarizes the results. Does this table describe a relative frequency distribution? Why or why not?
Carrier   Percent
A         30
B         30
C         10
D         20
E         5

No. The sum of the relative frequencies is​ 95%, not​ 100%.

The numbers used to separate the classes of a frequency​ distribution, but without the gaps created by class​ limits, are called​ ____________________.

class boundaries.

A​ ____________________ is found by adding the lower and upper class limit and then dividing the sum by 2.

class midpoint

The​ ____________________ is the difference between two consecutive lower class limits or two consecutive upper class limits.

class width

A quantitative variable that has an infinite number of possible values that are not countable is called​ _______.

continuous

The​ ____________________ for a class is the sum of the frequencies for that class and all previous classes.

cumulative class frequency

A quantitative variable that has a finite or countable number of values is called​ _______.

discrete

A​ __________ random variable has either a finite or countable number of values.

discrete

Which terms are used to describe events that have no outcomes in​ common?

disjoint or mutually exclusive

The more variable the​ data, the​ _______ accurate the sample mean will be as an estimate of the population mean.

less

When performing a linear regression​ analysis, it is important that the relationship between the two quantitative variables be​ _______.

linear

​Typically, the idea of the​ _______ hypothesis is that of​ "no effect,"​ "no difference," or​ "no change."

null

The​ _________________ is/are a subset of the population that is being studied.

sample

The​ _______ of a probability experiment is the collection of all possible outcomes.

sample space

Suppose that a researcher is interested in the average standardized test score for fifth graders in a local school district. The fifth graders at a specific school would comprise a​ ___________ and their average test score would be a​ ___________.

sample; statistic

A​ z-score represents how many​ ______________ a data value is above or below the​ ______________.

standard deviations; mean
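Written as a formula, with x the data value, μ the mean, and σ the standard deviation (the usual symbols, assumed here since the card does not name them):

```latex
z = \frac{x - \mu}{\sigma}
```

A positive z places the value above the mean and a negative z places it below.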

A​ _________________ is a numerical measurement describing some characteristic of a sample.

statistic

A regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A partial output from the simple linear regression analysis is given below. A hypothesis test is to be performed to determine if weight of cars is a significant predictor of fuel efficiency of cars. Determine the correct test statistic (with degrees of freedom, if needed) that should be used for this hypothesis test.
Variable   Coefficient   SE(Coef)
Constant   46.08         3.412
Weight     −4.87         1.339

t with 35 degrees of freedom

Determine whether the graph can represent a Normal density function or explain why it cannot. (Graph has two bumps) (Concept hw 7, question 22)

​No, because this graph is not​ bell-shaped.

In a normal​ distribution, approximately​ 99.7% of the area under the normal curve is within how many standard​ deviation(s) of the​ mean?

Three

Determine whether the graph can represent a Normal density function or explain why it cannot. (Concept hw 7, question 23)

Yes

What is a variable other than x and y that simultaneously affects both variables​ called?

a lurking variable

​A(n) ____________________ is a bar graph in which the height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the​ same, and the rectangles touch each other.

histogram

For a given degrees of​ freedom, the larger the​ chi-square statistic, the​ ____________ evidence there is to reject the null hypothesis.

more

The larger the​ sample, the​ _______ accurate the sample mean will be as an estimate of the population mean.

more

A sample is said to be __________ if the statistics computed from it accurately reflect the corresponding population parameters.

representative

According to the Empirical​ Rule, 68% of the area under the normal curve is within one standard deviation of the mean. What percent of the area under the normal curve is more than one standard deviation from the​ mean?

32%

The margin of error is​ _____________ the width of the confidence interval.

half

A correlation coefficient can be 0.

The statement is true.

A research organization wanted to estimate the average number of hours a college student sleeps per night during the school year. After randomly sampling 150 college​ students, the research organization determined the following​ 95% confidence​ interval: (7.1​ hours/night, 7.5​ hours/night). What is the value of the average number of hours slept per night during the school year for all college​ students?

​We're 95% confident that​ it's somewhere between 7.1 and 7.5 hours per night.

Can the standard deviation ever be larger than the​ variance? Explain.

​Yes; if the variance is less than​ one, then its square root​ (the standard​ deviation) will be larger than the variance.

Can a qualitative variable have values that are​ numeric? Why or why​ not?

​Yes; it is possible to have numeric variables that do not count or measure​ anything, and, as a​ result, are qualitative rather than quantitative.

A professor wondered if there was a difference in the proportion of students who dropped math classes between females and males. The professor randomly selected 20 math classes around campus and recorded the gender of the individual and whether or not a student enrolled in the class at the beginning of the term dropped the class at some point during the term. Assuming all conditions are​ satisfied, which of the following tests should the researcher​ use?

​two-sample z-test for proportions

Click the icon to view the table of areas under the​ t-distribution.

(a) Find the t-value such that the area in the right tail is 0.05 with 27 degrees of freedom. 1.703
(b) Find the t-value such that the area in the right tail is 0.01 with 19 degrees of freedom. 2.539
(c) Find the t-value such that the area left of the t-value is 0.15 with 12 degrees of freedom. [Hint: Use symmetry.] -1.083
(d) Find the critical t-value that corresponds to 95% confidence. Assume 20 degrees of freedom. 2.086

A baseball pitcher threw a​ no-hitter. The accompanying​ side-by-side boxplot shows the pitch speed​ (in miles per​ hour) for all of his pitches during the game. Complete parts​ (a) through​ (f) below. ​(a) Which pitch is typically thrown the​ fastest? ​(b) Which pitch is most erratic as far as pitch speed​ goes? ​(c) Which pitch is more consistent as far as pitch speed​ goes, the cut fastball or the​ four-seam fastball? ​(d) Are there any outliers for the​ pitcher's cut​ fastball? If​ so, approximate the pitch speed of any outliers. Select the correct choice below​ and, if​ necessary, fill in the answer box to complete your choice. ​(e) Describe the shape of the distribution of the​ pitcher's curveball. ​(f) Describe the shape of the distribution of the​ pitcher's four-seam fastball. Image: (Concept hw 3, question 36)

(a) ​Two-seam fastball (b) ​Two-seam fastball (c) ​Four-seam fastball (d) Outlier(s) at 90 miles per hour (e) The distribution is symmetric. (f) The distribution is skewed right.

Determine the value of each expression below. ​(a)​ 7! ​(b)​ 0! ​(c) 9C4 ​(d) 10C3 ​(e) 9P2 ​(f) 12P4

(a) 7! = 5040 (b) 0! = 1 (c) 9C4 = 126 (d) 10C3 = 120 (e) 9P2 = 72 (f) 12P4 = 11880
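These values can be checked with Python's standard `math` module (`math.comb` and `math.perm` need Python 3.8 or later). The dictionary name `values` is just an illustrative choice.

```python
import math

values = {
    "7!": math.factorial(7),
    "0!": math.factorial(0),
    "9C4": math.comb(9, 4),     # combinations: order does not matter
    "10C3": math.comb(10, 3),
    "9P2": math.perm(9, 2),     # permutations: order matters
    "12P4": math.perm(12, 4),
}

for name, value in values.items():
    print(name, "=", value)
```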

The quality of the orange juice produced by a manufacturer is constantly monitored. There are numerous sensory and chemical components that combine to make the​ best-tasting orange juice. One manufacturer developed a quantitative index of the​ "sweetness" of orange juice.​ (The higher the​ index, the sweeter the orange​ juice.) The manufacturer wondered if there was a relationship between the amount of​ water-soluble pectin​ (in parts per​ million) in the orange juice and the sweetness index. Data were collected on these two variables during 24 production runs in a particular plant. Review the accompanying scatterplot. Which is the value of the correlation coefficient with all outliers​ included? (Try to answer this question without calculating the correlation​ coefficient.) Image: (Concept hw 4, question 18)

+0.48

Data were collected on many different variables of a fast food​ chain's sandwiches several years ago. Two variables were the serving size​ (in ounces) of a sandwich and the number of calories in the sandwich. Review the accompanying scatterplot of serving size versus number of calories. What is the correlation​ coefficient? (Try to figure out the correct answer without calculating the correlation​ coefficient.) Image: (Concept hw 4, question 21)

+0.80

There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser​ (duration) and the time between when that eruption ends and the next eruption begins​ (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. What is the correlation​ coefficient? (Try to figure out the correct answer without calculating the correlation​ coefficient.) Image: (Concept hw 4, question 18)

+0.88

In a study conducted to examine the quality of fish after 7 days in ice​ storage, ten raw fish of the same kind and approximately the same size were caught and prepared for ice storage. The fish were placed in ice storage at different times after being caught. A measure of fish quality was given to each fish after 7 days in ice storage. Review the accompanying sample data and​ scatterplot, where​ "Time" is the number of hours after being caught that the fish was placed in ice storage and​ "Fish Quality" is the measure given to each fish after 7 days in ice storage​ (higher numbers mean better​ quality). What is the correlation​ coefficient? (Try to figure out the correct answer without calculating the correlation​ coefficient.) Image: (Concept hw 4, question 17)

-0.99

A regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A partial output from the simple linear regression analysis is given below. A hypothesis test is to be performed to determine if weight of cars is a significant predictor of fuel efficiency of cars. Determine the correct way to get the test statistic in this hypothesis test.
Variable   Coefficient   SE(Coef)
Constant   46.08         3.412
Weight     −4.87         1.339

-4.87/1.339
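As a worked version of this ratio, writing b_1 for the estimated slope (a symbol assumed here; the output labels it "Weight"), the slope is divided by its standard error and compared with a t distribution on n − 2 degrees of freedom:

```latex
t = \frac{b_1}{SE(b_1)} = \frac{-4.87}{1.339} \approx -3.64,
\qquad df = n - 2 = 37 - 2 = 35
```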

Suppose two events E and F are disjoint. What is​ P(E and​ F)?

0

If the area under the standard normal curve to the left of z=−1.72 is​ 0.0427, then what is the area under the standard normal curve to the right of z=​1.72?

0.0427

In a study conducted to examine the quality of fish after 7 days in ice storage, ten raw fish of the same kind and approximately the same size were caught and prepared for ice storage. The fish were placed in ice storage at different times after being caught. A measure of fish quality was given to each fish after 7 days in ice storage. The sample data are shown below, where "Time" is the number of hours after being caught that the fish was placed in ice storage and "Fish Quality" is the measure given to each fish after 7 days in ice storage (higher numbers mean better quality). The least-squares regression equation is y=8.4425−0.1495x, where x is "Time" and y is predicted "Fish Quality." What is the residual for the first observation, (0,8.5)?
Time:          0    0    2    3    5    6    7    9    11   12
Fish Quality:  8.5  8.4  8.0  8.1  7.8  7.6  7.3  7.0  6.8  6.7

0.0575
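The following sketch refits the least-squares line from the ten data points on this card and recomputes the residual (observed minus predicted). It uses the textbook formulas slope = Sxy/Sxx and intercept = ȳ − slope·x̄. The variable names are illustrative, not taken from the original problem.

```python
time = [0, 0, 2, 3, 5, 6, 7, 9, 11, 12]
quality = [8.5, 8.4, 8.0, 8.1, 7.8, 7.6, 7.3, 7.0, 6.8, 6.7]

n = len(time)
x_bar = sum(time) / n
y_bar = sum(quality) / n

# Sums of squares and cross-products about the means
sxx = sum((x - x_bar) ** 2 for x in time)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time, quality))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(time, quality)]

print(round(intercept, 4), round(slope, 4))  # 8.4425 -0.1495
print(round(residuals[0], 4))                # 0.0575
print(abs(sum(residuals)) < 1e-9)            # True
```

The last line illustrates the earlier card on residuals: for a least-squares line with an intercept, the residuals sum to zero up to floating-point rounding.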

If the area under the standard normal curve between z=−1.46 and z=0 is​ 0.4279, then what is the area under the standard normal curve between z=0 and z=​1.46?

0.4279
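Both symmetry cards can be checked numerically with `statistics.NormalDist` from the Python standard library (3.8+). This is only a sketch confirming that mirrored areas under the standard normal curve match.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

left_tail = z.cdf(-1.72)            # area to the left of z = -1.72
right_tail = 1 - z.cdf(1.72)        # area to the right of z = 1.72

below_mean = z.cdf(0) - z.cdf(-1.46)   # area between -1.46 and 0
above_mean = z.cdf(1.46) - z.cdf(0)    # area between 0 and 1.46

print(round(left_tail, 4), round(right_tail, 4))
print(round(below_mean, 4), round(above_mean, 4))
```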

Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting​ for, either North High School or South High School. From the results of his​ survey, Eric obtained a​ 95% confidence interval of​ (0.52,0.68) for the proportion of all adults in the city rooting for North High. What proportion of the 150 adults in the survey said they were rooting for North High​ School?

0.60

After constructing any relative frequency​ distribution, what should be the sum of the relative​ frequencies?

1 or​ 100%

Suppose the probability that a randomly selected​ man, aged 55​ - 59, will die of cancer during the course of the year is 300/100,000. How would you find the probability that a man in this age category does NOT die of cancer during the course of the​ year?

1-0.003

The expression zα denotes the​ z-score with an area of​ _______ to its left.

1-α
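A small sketch of this definition using the standard library's `NormalDist.inv_cdf`: since z_α has area α to its right, it is the point with cumulative area 1 − α to its left. The helper name `z_alpha` is hypothetical.

```python
from statistics import NormalDist


def z_alpha(alpha):
    """z-score with area alpha to its right, i.e. 1 - alpha to its left."""
    return NormalDist().inv_cdf(1 - alpha)


print(round(z_alpha(0.05), 3))   # 1.645
print(round(z_alpha(0.025), 2))  # 1.96
```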

The methods of statistics follow a process. Place the processes in the correct order.

1. Identify the Research Objective
2. Collect the Data Needed to Answer the Research Question(s)
3. Describe the Data
4. Perform Inference

Suppose a​ four-digit alarm code is formed by choosing digits from 0 to​ 9, with repetition allowed. Which of the following expressions would be a correct way to count the number of such​ codes?

10•10•10•10
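One way to confirm the count, sketched with `itertools.product`: enumerate every four-digit code with repetition allowed and compare the total with the multiplication rule.

```python
from itertools import product

digits = range(10)  # the digits 0 through 9

# Enumerate every ordered choice of four digits, repetition allowed
codes = sum(1 for _ in product(digits, repeat=4))

print(codes)  # 10000
```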

According to the Empirical​ Rule, 68% of the area under the normal curve is within one standard deviation of the mean. What percent of the area under the normal curve is more than one standard deviation above the​ mean?

16%

Classified ads in a newspaper offered 21 used cars of the same make and model for sale. A regression analysis was performed with the age of the car​ (in years) as the explanatory variable and asking price​ (in dollars) as the response variable. A​ 95% confidence interval for the slope of the population regression line is ​(−​1439,−​1011). What is the margin of​ error?

214

A graduate student wanted to estimate the average time spent studying among graduate students at her school. She randomly sampled graduate students from her school and obtained a​ 99% confidence interval of​ (17,25) hours/week. What is the margin of​ error?

4​ hours/week

According to the Empirical​ Rule, 95% of the area under the normal curve is within two standard deviations of the mean. What percent of the area under the normal curve is more than two standard deviations from the​ mean?

5%
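The Empirical Rule cards in this set all follow from the 68% and 95% figures plus the symmetry of the normal curve. A brief sketch of that arithmetic, with illustrative variable names:

```python
within_one = 68   # percent within one standard deviation of the mean
within_two = 95   # percent within two standard deviations of the mean

mean_to_plus_one = within_one / 2   # between mu and mu + sigma
outside_one = 100 - within_one      # more than one sd from the mean
above_plus_one = outside_one / 2    # more than one sd above the mean
outside_two = 100 - within_two      # more than two sd from the mean

print(mean_to_plus_one, outside_one, above_plus_one, outside_two)
# 34.0 32 16.0 5
```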

Researchers studied the mean egg length​ (in millimeters) for a bird population. After taking a random sample of​ eggs, they obtained a​ 95% confidence interval of​ (45,60). What is the value of the sample​ mean?

52.5 mm

Researchers studied the mean egg length​ (in millimeters) for a bird population. After taking a random sample of​ eggs, they obtained a​ 95% confidence interval of​ (45,60). What is the value of the margin of​ error?

7.5 mm

A Type II Error is made...

A Type II Error is made when​ there's not enough evidence to reject the null​ hypothesis, but the null hypothesis is not true.

Suppose that you have data which indicates that​ 90% of adults in a nearby town have cell phones. Of those who have cell​ phones, 30% use Carrier​ A, 30% use Carrier​ B, 10% use Carrier​ C, 20% use Carrier​ D, 5% use Carrier​ E, and​ 5% use other carriers. Would a bar graph or pie chart be better if the goal is to compare Carrier B and Carrier​ C? Explain.

A bar graph would be better since you are trying to compare two​ parts, not a part to the whole. The angles might be difficult to judge on a pie​ chart, making it hard to directly compare two sectors.

Which of the following is a correct explanation of what a confidence interval​ is?

A confidence interval is a range of values used to estimate the true value of a population parameter. The confidence level is the probability the interval actually contains the population​ parameter, assuming that the estimation process is repeated a large number of times.

Which distribution shape​ (skewed left, skewed​ right, or​ symmetric) is most likely to result in the mean being substantially smaller than the​ median?

A distribution that is skewed left will likely have a mean that is smaller than the median since the extreme values in the tail tend to pull the mean to the left.
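A tiny illustration with made-up data (the scores below are hypothetical, chosen so that one low value forms a left tail):

```python
from statistics import mean, median

# Hypothetical exam scores; the single low score creates a left tail
scores = [20, 75, 80, 85, 88, 90, 92, 95]

print(mean(scores))    # 78.125
print(median(scores))  # 86.5
```

The low outlier drags the mean well below the median, while the median barely reacts to it.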

Why is it important that the relationship between the explanatory and response variable be linear when performing a linear regression​ analysis?

A linear regression analysis relies on a straight line being fit between the points on a scatterplot.

In​ regression, what is the difference between an observed value of the response variable and its predicted value​ called?

A residual

It is hypothesized that​ 50% of Americans attend church regularly. Which of the following would be an example of making a Type I​ Error?

A study was conducted that had evidence to reject the null hypothesis. In​ reality, half of Americans actually do attend church regularly.

Which of the following is not a valid explanation of the Law of Large​ Numbers?

A. The relative frequency approximations for an event tend to get better with more observations.
B. If one looks at the proportion of times an event has occurred over a long period of time (or over a large number of trials), one can be more certain of the likelihood of its occurrence.
C. The more times an experiment is repeated, the closer the relative frequency of an event tends to get to the actual probability of the event.
D. These statements are all valid.
ANSWER: D

Explain why Social Security Number is considered a qualitative variable even though it contains numbers.

Addition and subtraction of Social Security Numbers does not provide meaningful results. This makes it qualitative even though it is numeric.

A student at a large university was interested in how students and faculty felt about the behavior of calling an instructor by their first name. She randomly selected 100 students and 75 faculty and asked each to rate the behavior of calling an instructor by their first name on a scale from 1 to​ 5, where 1=totally inappropriate and 5=totally appropriate. Call each​ individual's rating of this behavior the perception score. The student wanted to construct a​ 95% confidence interval for the difference in the average perception score between students and faculty at this large university. Which condition for using the​ two-sample t-methods for inference should this student be concerned​ about?

All conditions are most likely met in this problem.​ Therefore, this student can make inferences to the populations of interest by using the​ two-sample t-methods to construct the confidence interval.

A fire insurance company wants to examine the relationship between the amount of fire damage in major residential fires to the distance between the burning house and the nearest fire station. A sample of 15 recent fires in a particular city was taken. The amount of damage to the house (in thousands of dollars) and the distance from the burning house to the nearest fire station (in miles) was recorded. Part of the output from a simple linear regression analysis is provided. Assume all conditions are met for simple linear regression inference. What is the response variable?
Predictor   Coef     SE Coef   T   P
Constant    16.301   6.015
distance    3.554    1.663
S = 9.8101   R-Sq = 26.0%

Amount of damage to the house​ (in thousands of​ dollars)

The Empirical Rule tells the approximate percentage of the data which falls into certain ranges. To which distributions does the Empirical Rule​ apply?

Any normal distribution

State the condition required to use the Empirical Rule to check for unusual observations in a binomial experiment.

As a rule of​ thumb, if X is binomially​ distributed, the Empirical Rule can be used when np(1−p)≥10

In sampling without​ replacement, the assumption of independence required for a binomial experiment is violated. Under what circumstances can we sample without replacement and still use the binomial probability formula to approximate​ probabilities?

As a rule of​ thumb, if the sample size is less than 5​% of the population​ size, the trials can be considered nearly independent.
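The two rules of thumb on these binomial cards can be written as simple checks. This is a sketch with hypothetical function names and example numbers; the thresholds (10 and 5%) are the ones stated on the cards.

```python
def empirical_rule_ok(n, p):
    """Rule of thumb: the binomial is roughly bell-shaped when np(1-p) >= 10."""
    return n * p * (1 - p) >= 10


def nearly_independent(sample_size, population_size):
    """Rule of thumb: sample is less than 5% of the population."""
    return sample_size < 0.05 * population_size


print(empirical_rule_ok(100, 0.5))    # True  (np(1-p) = 25)
print(empirical_rule_ok(20, 0.5))     # False (np(1-p) = 5)
print(nearly_independent(40, 1000))   # True  (4% of the population)
print(nearly_independent(100, 1000))  # False (10% of the population)
```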

Explain the Law of Large Numbers. How does this law apply to gambling​ casinos?

As the number of repetitions of a probability experiment​ increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome. This applies to casinos because they are able to make a profit in the long run because they have a small statistical advantage in each game.
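A short simulation can illustrate the idea. The sketch below assumes a fair coin (p = 0.5) and a fixed random seed so the run is repeatable; the function name and checkpoints are illustrative choices.

```python
import random


def running_proportion(p, n_trials, seed=0):
    """Simulate n_trials Bernoulli(p) trials; record the proportion of successes."""
    rng = random.Random(seed)
    checkpoints = {}
    successes = 0
    for i in range(1, n_trials + 1):
        if rng.random() < p:
            successes += 1
        if i in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[i] = successes / i
    return checkpoints


props = running_proportion(p=0.5, n_trials=100_000)
for n, prop in props.items():
    print(n, prop)
```

The early proportions can wander noticeably, while the later ones settle close to 0.5. A casino's small per-game edge becomes dependable in the same way over a very large number of plays.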

A​ p-value is the probability​ _____________.

A​ p-value is the probability of observing the actual​ result, a sample​ mean, for​ example, or something more unusual just by chance if the null hypothesis is true.

What is an advantage to using a​ stem-and-leaf plot instead of a histogram to display​ data?

A​ stem-and-leaf plot allows for retrieval of the original data from the plot while the histogram does not.

Which probability method requires that an experiment have equally likely​ outcomes?

Classical

This occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable not accounted for in the study.

Confounding

Suppose every student in a class is surveyed and it is reported that​ 75% of the class plans to take another math class. Is this an example of descriptive or inferential​ statistics? Explain.

Descriptive​ statistics; The results of the class sample are described without making any generalizations about the population of all students at the school.

A researcher randomly assigns the individuals in a study to groups, intentionally manipulates the value of the explanatory variable and controls other explanatory variables at fixed values, and then records the value of the response variable for each individual.

Designed Experiment

If a researcher wants to claim causation between an explanatory variable and a response​ variable, which of the following should they​ use?

Designed experiment

The Addition Rule​ P(E or ​F)=​P(E)+​P(F) applies only to which type of​ events?

Disjoint

A simple random sample of 150 adults is obtained and each​ person's red blood cell count​ (in cells per​ microliter) is measured. The sample mean is 4.63. The population standard deviation for red blood cell counts is 0.54. Which of the following is true regarding the distribution of sample​ means?

Even though the distribution of the population data is not​ given, the distribution of sample means will be approximately normal because the sample size is large enough according to the Central Limit Theorem.

There were 100 random samples of 100 individuals taken from a population which is known to have a moderately skewed distribution with some mean and a known standard deviation. A​ 95% confidence interval for the population mean was constructed for each of the 100 random samples using the​ one-sample z-methods. Which of the following statements is​ true?

Even though the population data are​ skewed, the distribution of sample means will be approximately normal because of the large sample size.​ Therefore, approximately​ 95% of the 100 confidence intervals constructed using the​ one-sample z-methods will capture the true population mean.
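The coverage claim can be explored by simulation. The sketch below makes several assumptions not in the card: the skewed population is Exponential(1), with mean 1 and standard deviation 1, and 2000 intervals are built instead of 100 so the observed proportion is more stable.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(n_intervals=2000, n=100, seed=1):
    """Proportion of 95% z-intervals that capture the true mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0                   # Exponential(1): mean 1, sd 1
    z_star = NormalDist().inv_cdf(0.975)   # about 1.96
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / n_intervals


print(coverage())
```

The printed proportion should land near 0.95. With only 100 intervals, as on the card, the count that captures the mean would vary more from run to run.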

Nurses wondered if birth weights of babies are going up. They knew that the average birth weight of a baby last year was 7.6 pounds. A random sample of 15 weights of babies at the hospital where the nurses work gave an average birth weight of 7.9 pounds. Nurses felt that the birth weights this year were normally distributed. Which of the following is true about the distribution of sample​ means?

Even though the sample size is less than​ 30, the distribution of sample means will be normal because the population data follow a normal distribution.

There were 100 random samples of 10 individuals taken from a population which is known to have a normal distribution with some mean and a known standard deviation. A​ 95% confidence interval for the population mean was constructed for each of the 100 random samples using the​ one-sample z-methods. Which of the following statements is​ true?

Even though the sample size is​ small, the distribution of sample means will be​ normal, since the population data follow a normal distribution.​ Therefore, approximately​ 95% of the 100 confidence intervals constructed using the​ one-sample z-methods will capture the true population mean.

Explain why the mean should not be found for a sample of zip codes. Which measure of center should be used​ instead?

Even though they are numeric​ data, zip codes are qualitative since they do not measure or count anything. The mean cannot be found since adding zip codes would be meaningless. For qualitative​ data, the mode is the only measure of center that can be found.

What must be true for a sample to be considered a simple random​ sample?

Every member​ (or sample) must have the same chance of being selected as every other member​ (or sample of the same​ size).

In​ regression, what is predicting outside the range of the​ x-values from the sample data​ called?

Extrapolation

Researchers conducted a study and obtained a​ p-value of 0.75. Based on this​ p-value, what conclusion should the researchers​ draw?

Fail to reject the null hypothesis but do not accept the null hypothesis as true either.

A confidence interval indicates how confident we are with the hypothesized value for the population mean.

False

There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser (duration) and the time between when that eruption ends and the next eruption begins (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. The least-squares regression equation is y=33.967+10.358x, where y is the interval from the end of the current eruption to the beginning of the next eruption and x is the duration of current eruption. For a duration of 4 minutes, y=75.4 minutes. This means that a visitor will have to wait exactly 75.4 minutes after the current eruption ends before the next eruption begins. Is this statement true or false? Image: (Concept hw 4, question 27)

False

Steve calculated a correlation coefficient between gas price and miles driven as −0.15. Steve said there was a strong negative association between gas price and miles driven. Is this statement true or​ false?

False.

Both of the graphs represent normal distributions with a standard deviation of σ=2. Determine which of the two normal distributions has a mean of μ=8 and which has a mean of μ=14. Explain how you know which is which. Image: (Concept hw 7, question 25)

Graph A has a mean of μ=8. Graph B has a mean of μ=14. A normal curve will be centered over its mean.

Both of the graphs represent normal distributions with a mean of μ=10. Determine which of the two normal distributions has a standard deviation of σ=2 and which has a standard deviation of σ=3. Explain how you know which is which. (Concept hw 7, question 24)

Graph A has a standard deviation of σ=2. Graph B has a standard deviation of σ=3. Since Graph B is a wider​ graph, it has a greater spread and a larger standard deviation.

Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting​ for, either North High School or South High School. Of the surveyed​ adults, 96 said they were rooting for North High while the rest said they were rooting for South High. Eric wants to determine if this is evidence that more than half the adults in this city will root for North High School. Which of the following is the correct null​ hypothesis?

H0: p=​0.5, where p=the proportion of all adults in this city rooting for North High School
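The card asks only for the null hypothesis. As a hedged follow-on sketch, the one-proportion z statistic for Eric's data could be computed as below, assuming the usual test with the standard error taken under H0 and a one-sided (greater-than) alternative.

```python
import math
from statistics import NormalDist

n, successes, p0 = 150, 96, 0.5

p_hat = successes / n                  # sample proportion, 0.64
se = math.sqrt(p0 * (1 - p0) / n)      # standard error under H0
z = (p_hat - p0) / se                  # test statistic
p_value = 1 - NormalDist().cdf(z)      # area to the right of z

print(round(z, 2))
print(p_value < 0.001)
```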

Alex hypothesized​ that, on​ average, students study less than the recommended two hours per credit hour each week outside of class. Which of the following is the null hypothesis Alex will​ test?

H0: μ=2 hours per week per credit

A regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A partial output from the simple linear regression analysis is given below. Determine the null hypothesis if a hypothesis test is to be performed to determine if weight of cars is a significant predictor of fuel efficiency of cars.
Variable   Coefficient   SE(Coef)
Constant   46.08         3.412
Weight     −4.87         1.339

H0: weight of cars is not a significant predictor of fuel efficiency of cars, meaning the slope of the regression line in the population is 0.

Alex hypothesized​ that, on​ average, students study less than the recommended two hours per credit hour each week outside of class. Which of the following is​ Alex's alternative​ hypothesis?

H1: μ<2 hours per week per credit

A commuter has a choice of two routes for his morning drive to work. In an effort to determine the best route, he collects data on his drive time for each route. If he is interested in a route with a predictable drive time, which route should he choose and why? Image: (Concept hw 3, question 33)

He should choose the Rural Roads route. The smaller IQR indicates a smaller spread and more consistent times for that route.

Explain why​ z-scores would be an appropriate way to compare the heights of the​ world's tallest man and tallest woman.

Height distributions for men and women have different centers and​ spreads, making it difficult to compare male and female heights directly.

Benjamin performed a​ two-tailed one-sample​ t-test and obtained a p-value=1. What conclusion should he​ make?

His sample mean must have been exactly equal to his hypothesized value for the population mean.

Suppose that you have data which indicates that​ 90% of adults in a nearby town have cell phones. Of those who have cell​ phones, 30% use Carrier​ A, 30% use Carrier​ B, 10% use Carrier​ C, 20% use Carrier​ D, 5% use Carrier​ E, and​ 5% use other carriers. Which of the following is not a reasonable graph to display this​ information?

Histogram

An association of Realtors reports​ state-by-state median​ existing-home prices for each quarter. Why do you suppose they use the median instead of the​ mean? What might be the disadvantage of reporting the​ mean?

Home prices are probably skewed to the right and not symmetric. This makes the median a better representation of the center than the mean which would be influenced by the extremely high priced homes. Reporting the mean would give the impression that the​ "typical" home price is higher than it is.

Researchers wondered if brain size has an effect on a​ person's IQ. From a sample of 20​ individuals, the equation of the​ least-squares regression line is y=71.8+​0.0286x, where x represents the size of a brain in cubic centimeters and y represents IQ. What is the interpretation of the​ y-intercept?

IQ is predicted to be 71.8 for a brain size of 0 cubic centimeters.

Researchers wondered if brain size has an effect on a​ person's IQ. From a sample of 20​ individuals, the equation of the​ least-squares regression line is y=71.8+​0.0286x, where x represents the size of a brain in cubic centimeters and y represents IQ. What is the interpretation of the​ slope?

IQ is predicted to increase by 0.0286 for every 1 cubic centimeter increase in brain size.

Which of the following would increase the width of a confidence interval for a population​ mean?

Increase the level of confidence

Suppose every student in a class is surveyed and it is found that​ 75% of the class plans to take another math class. It is reported that​ 75% of all students at the school plan to take another math class. Is this an example of descriptive or inferential​ statistics? Explain.

Inferential​ statistics; the results of the class sample are extended to make a generalization about the population of all students at the school.

Which measure of spread is considered​ resistant?

Interquartile range

In a typical​ boxplot, the length of the box indicates which measure of​ spread?

Interquartile range​ (IQR)

A freshman in college wanted to determine if the "Freshman 15" is true. That is, this student wanted to determine if freshmen in college gain more than 15 pounds during their freshman year. She randomly selected 50 freshmen during the first week of school at the beginning of the year and weighed them. During finals week of the last term of the year, she weighed the same 50 students. She recorded the weight change of each: a positive value indicated a weight gain while a negative value indicated a weight loss during the year. Based on her sample, a 95% confidence interval for the average weight change of freshmen during their freshman year is (8.9,12.1) lbs. What conclusion can be made based on this confidence interval?

It appears that the "Freshman 15" is not true. That is, it appears that freshmen do not gain more than 15 pounds during their freshman year, on average, since the upper bound is less than 15.

Regression was performed on test data for 37 car models to examine the association between the weight​ (thousands of​ pounds) of the car and the fuel efficiency​ (miles per gallon​ (MPG)). A​ 95% confidence interval for the slope of the regression line is ​(−​7.6,−​2.2). Interpret this confidence interval.

It can be said with​ 95% confidence that fuel efficiency of a car will decrease between 2.2 and 7.6 miles per gallon for every 1000 pound increase in the weight of the car.

The simple linear regression model is yi=β0+β1xi+εi. For what does yi ​ stand?

It is the observed value of the response variable for the ith observation in the population.

The simple linear regression model is yi=β0+β1xi+εi. For what does εi​ stand?

It is the residual of the ith observation in the population.

The simple linear regression model is yi=β0+β1xi+εi. For what does β1​ stand?

It is the slope of the population regression line.

Data were collected on many different variables of a fast food​ chain's sandwiches several years ago. Two variables were the serving size​ (in ounces) of a sandwich and the number of calories in the sandwich. Review the accompanying scatterplot of serving size versus number of calories. There are a couple of potential outliers—the sandwich with a serving size of around 6.75 ounces and about 260​ calories, and the sandwich with a serving size of around 8.4 ounces and about 720 calories. If these two observations were​ removed, how would the correlation coefficient​ change? Image: (Concept hw 4, question 22)

It would increase since the removal of these data points makes the relationship stronger.

If an observation has a residual of​ 0, which of the following statements is​ true?

Its predicted value is the same as its observed value.

This is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, these are typically related to explanatory variables considered in the study.

Lurking Variable

When analyzing two quantitative​ variables, what is the first thing that should be​ done?

Make a scatterplot.

Data were collected on many different variables of a fast food​ chain's sandwiches several years ago. Two variables were the serving size​ (in ounces) of a sandwich and the number of calories in the sandwich. A hungry customer wanted to estimate the number of calories in a sandwich based on its serving size. With this in​ mind, which variable would go on the​ y-axis in the​ scatterplot?

Number of calories goes on the​ y-axis, since it is the response variable.

A researcher measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, the researcher observes the behavior of individuals in the study and records the values of the explanatory and response variables.

Observational Study

In a normal​ distribution, approximately​ 68% of the area under the normal curve is within how many standard​ deviation(s) of the​ mean?

One

Classified ads in a newspaper offered 21 used cars of the same make and model for sale. A regression analysis was performed with the age of the car​ (in years) as the explanatory variable and asking price​ (in dollars) as the response variable. A​ 95% confidence interval for the slope of the population regression line is ​(−​1439,−​1011). What is the correct interpretation of this confidence​ interval?

One can be​ 95% confident that the asking price of used cars of the same make and model will decrease between​ $1011 and​ $1439 for every one year increase in the age of the car.

The entire group of individuals to be studied is called a ______.

Population

Allow classification of individuals based on some attribute or characteristic

Qualitative Variable

This provides numerical measures of individuals. The values of these can be added or subtracted, and provide meaningful results.

Quantitative Variable

Suppose you want to know if more technical service calls are made to homes with cable television or with satellite dish television. Should you use frequencies or relative frequencies to make the​ comparison? Why?

Relative frequencies should be used since there is likely a difference in the number of users of cable and satellite television. If you make comparisons using​ frequencies, the results can be very misleading for different population sizes.

The variable of interest in the outcome of a study _______.

Response Variable

What​ factor(s) affect the accuracy of the sample mean as an estimate of the population​ mean?

Sample size and variability

Jan performed a study and obtained a​ p-value of 1.24. What conclusion should Jan​ make?

She made an error since it is not possible to get a​ p-value of 1.24.

Which measure of center must be equal to an actual data​ value? Explain why.

Since the mode is the most frequent observation that occurs in the data​ set, it must be an actual value from the data set.

This histogram shows the heights of 20 students in a statistics class. Explain why it is not appropriate to find summary statistics for this distribution.

Since there appear to be two​ modes, this data probably represents men and women and should be split into those two groups before finding any summary statistics. (two highest points)

A researcher wants to assess the effects of taking prenatal vitamins on the health of​ newborns, using the newborn weight as the response variable. Explain why it might be inappropriate to use a designed experiment to address this research objective.

Since there is a perceived benefit to taking prenatal​ vitamins, there would be ethical issues in intentionally denying them to some pregnant women.

Which of the following statements is true concerning standardizing data into​ z-scores?

Standardizing data into​ z-scores does not change the shape of a distribution of a variable.

This is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. It is also about providing a measure of confidence in any conclusions.

Statistics

Ronnie randomly sampled 80 college students, 50 living in a dorm and the other 30 living in an apartment. She asked each how much they spent on food and beverages (non-alcoholic) within the last 7 days. A 95% confidence interval for the difference in the mean amount spent on food and drinks over the past 7 days between students living in a dorm and students living in an apartment (dorm−apartment) is (−$25.80,−$11.20). Which of the following is true regarding the difference in sample means?

Students living in an apartment spent​ $18.50 more, on​ average, on food and drinks over the past 7 days than students living in a dorm.
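
The arithmetic behind this answer can be sketched in a few lines. The sketch assumes the interval is symmetric about the point estimate, as intervals from t- and z-methods are; the function name `center_and_margin` is made up for illustration.

```python
def center_and_margin(lower, upper):
    """Return (point estimate, margin of error) for a symmetric interval."""
    center = (lower + upper) / 2   # midpoint = difference in sample means
    margin = (upper - lower) / 2   # half the width = margin of error
    return center, margin

# Dorm minus apartment interval from the card above.
diff, moe = center_and_margin(-25.80, -11.20)
print(round(diff, 2), round(moe, 2))
```

The midpoint is negative because the difference is taken as dorm minus apartment, so apartment students spent more on average. The same function applied to an interval such as (65, 71) gives a margin of error of 3.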

Suppose that a binomial random variable X is counting the number of patients with cancer at a particular hospital. How will​ "success" be defined in this​ situation?

Success would be defined as selecting a patient at the hospital who has cancer.

Days before a presidential election, an article based on a nationwide random sample of registered voters reported the following statistic, "52% (±3%) of registered voters will vote for Robert Smith." What is the "±3%" called?

The "±​3%" is called the margin of error.

Identify the properties of​ Student's t-distribution.

The area under the curve is​ 1; half the area is to the right of 0 and half the area is to the left of 0. As the sample size n​ increases, the distribution​ (and the density​ curve) of the​ t-distribution becomes more like the standard normal distribution. As t gets extremely​ large, the graph​ approaches, but never​ equals, zero.​ Similarly, as t gets extremely small​ (negative), the graph​ approaches, but never​ equals, zero. It is symmetric around t=0.

If a professor adds 10 points to each​ student's final exam​ score, how will it affect the distribution of final exam​ scores?

The center will​ change, but the shape and spread will remain the same.
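
A small check of this answer, using made-up exam scores (the list `scores` is hypothetical, not from the course): adding a constant shifts the mean by that constant and leaves the standard deviation alone.

```python
from math import isclose
from statistics import mean, pstdev

scores = [55, 62, 70, 78, 85]          # hypothetical exam scores
curved = [s + 10 for s in scores]      # professor adds 10 points to each

mean_shift = mean(curved) - mean(scores)
same_spread = isclose(pstdev(curved), pstdev(scores))
print(mean_shift, same_spread)
```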

What is wrong with the following class limits for organizing weight data for a sample of 200 adult men in the United​ States? ​140-150 pounds ​150-160 pounds ​160-170 pounds ​170-180 pounds ​180-190 pounds ​190-200 pounds ​200-210 pounds ​210-220 pounds ​220-230 pounds

The classes are overlapping.

Suppose an experiment consists of rolling a fair die ten times and recording the number of sevens obtained. If event E is defined as getting at least one​ 7, how would you describe the complement of​ E?

The complement of event E is the event that no sevens are obtained.

April calculated a correlation coefficient between sex and GPA as −0.25. She said there is a weak correlation between a​ person's sex and their GPA. Which of the following is an appropriate comment about​ April's statement?

The correlation coefficient does not make sense to describe the relationship between a categorical and quantitative variable.

What is the definition of the correlation​ coefficient?

The correlation coefficient is a measure that describes the direction and strength of the linear relationship between two quantitative variables.

A​ chi-square test is being performed to test the null hypothesis that the distribution of eye color is the same for both males and females. In order for conclusions from the​ chi-square test to be valid to all males and​ females, which of the following does not have to be​ true?

The distribution of sample means is approximately normal.

Birth weights in the United States are normally distributed with σx=500 grams. A random sample of 15 babies was taken. The average birth weight of these 15 babies was 3450 grams. Which of the following is true regarding the distribution of sample​ means?

The distribution of sample means will be normal since birth weights of all babies are normally distributed.

Which of the statements below is true concerning bar​ graphs?

The height of each bar represents the​ category's frequency or relative frequency.

A high correlation coefficient indicates that the relationship between the two quantitative variables must be linear.

The statement is false.

A​ 95% confidence interval for μ1−μ2 using the​ two-sample t-methods with 40 degrees of freedom is​ (65,71). The margin of error is​ _____________.

The margin of error is 3.

Suppose there are ten​ five- and​ six-year-olds attending a birthday party. When a​ 30-year-old mother walks into the room with an infant in her​ arms, what happens to the mean age in the​ room? What happens to the standard deviation of ages in the​ room?

The mean and standard deviation will both change.
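
To see both changes numerically, here is a rough sketch with assumed ages: five 5-year-olds and five 6-year-olds, a 30-year-old mother, and an infant counted as age 0. The exact ages are illustrative only.

```python
from statistics import mean, pstdev

kids = [5, 6] * 5            # assumed ages of the ten children
room = kids + [30, 0]        # mother and infant (infant taken as age 0)

print(mean(kids), pstdev(kids))
print(mean(room), pstdev(room))
```

With these assumed ages the mean rises and the standard deviation grows considerably, since both new ages lie far from the children's ages.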

What is the mean of a probability​ distribution?

The mean is the expected value of the random variable.

Which of the following is a property of the standard normal​ curve, but not necessarily a property of every normal​ curve?

The mean is zero and the standard deviation is one.

Identify which statement about the mean of a discrete random variable is not true or state that they are all true.

The mean must be a possible value of the random variable.

Suppose a normal model has a standard deviation of σ=​10, and​ 40% of the values are below 75. Which of the following must be true about the​ mean? You should be able to answer without doing any calculations.

The mean must be greater than 75.

Suppose a normal model has a standard deviation of σ=​10, and​ 60% of the values are below 75. Which of the following must be true about the​ mean? You should be able to answer without doing any calculations.

The mean must be less than 75.

The following stem-and-leaf plot shows the daily high temperature in a town on April 1st for twenty-four random years. Would you expect the mean to be higher or lower than the median? Explain.
5 | 1 1 2 2 3 4 6 6 6 7 8
6 | 0 0 1 2 4 4 9
7 | 2 3 6
8 | 1 2
9 | 0

The mean should be higher than the​ median, since the distribution is skewed right.

If a professor adds 10 points to each​ student's final exam​ score, how will it affect the class mean on the final​ exam?

The mean will increase by 10 points.

​Suppose, on the warmest day of the​ month, the daily high temperature in a city is accidentally recorded as 700 instead of 70 degrees Fahrenheit. Compare the effect this mistake will have on the mean monthly high temperature to the effect on the median monthly high temperature.

The mean will increase​ significantly, but the median will not change as a result of the mistake.
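
The effect can be illustrated with a hypothetical 30-day month of highs (the values in `correct` are invented for this sketch) in which the warmest day is 70 and is then mis-recorded as 700.

```python
from statistics import mean, median

# Hypothetical month of daily highs in degrees Fahrenheit.
correct = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67] * 3
correct[-1] = 70                       # warmest day of the month
recorded = correct[:-1] + [700]        # same data with the typo

mean_change = mean(recorded) - mean(correct)
median_change = median(recorded) - median(correct)
print(mean_change, median_change)
```

The error adds 630 degrees to the total, so with 30 days the mean rises by 21 degrees, while the middle values (and therefore the median) are untouched.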

A sample of thirty users of a popular social networking site yielded the histogram on the right for the number of friends. Which measure of central tendency better describes the​ "center" of the​ distribution? Image: (the higher points of the graph are on the left, lowest point on the right meaning that it is skewed to the right)

The median is a better measure of the center of the data since the distribution is skewed to the right.

How can you tell from a boxplot if the distribution is skewed​ right?

The median is to the left of the center of the​ box, and the right whisker is substantially longer than the left whisker.

Allie calculated a correlation coefficient of −0.5. She made a mistake in her calculation since the correlation coefficient cannot be negative.

The statement is false.

Describe the null and alternative hypotheses.

The null hypothesis typically contains an equality while the alternative hypothesis will contain an inequality.

A fire insurance company wants to examine the relationship between the amount of fire damage in major residential fires to the distance between the burning house and the nearest fire station. A sample of 15 recent fires in a particular city was taken. The amount of damage to the house (in thousands of dollars) and the distance from the burning house to the nearest fire station (in miles) was recorded. Part of the output from a simple linear regression analysis is provided. Assume all conditions are met for simple linear regression inference. In words, what is the correct alternative hypothesis to determine if the distance the burning house is to the nearest fire station is a significant predictor of damage to a burning house?
Predictor   Coef     SE Coef   T   P
Constant    16.301   6.015
distance    3.554    1.663
S = 9.8101   R-Sq = 26.0%

The number of miles the burning house is from the nearest fire station helps explain the amount of damage to the burning house.

Suppose you want to count the number of​ four-letter passwords that can be formed using the letters in the word NUMBER. The four letters in the password must all be different. Which of the following expressions is true concerning the number of such​ passwords?

The number of passwords can be counted by using 6•5•4•3 or 6P4.
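
As a sanity check, the count can be confirmed three ways: by listing every ordered selection of four distinct letters, by the permutation formula, and by the product 6•5•4•3. This is only an illustrative sketch using Python's standard library.

```python
from itertools import permutations
from math import perm

# Every ordered arrangement of 4 distinct letters from NUMBER.
passwords = {"".join(p) for p in permutations("NUMBER", 4)}

count_by_listing = len(passwords)
count_by_formula = perm(6, 4)      # 6P4
count_by_product = 6 * 5 * 4 * 3
print(count_by_listing, count_by_formula, count_by_product)
```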

Suppose 5 objects are to be chosen from 10 distinct objects with no repetition allowed. Which will be​ larger, the number of permutations or the number of​ combinations?

The number of permutations will be larger since there are more ways to choose if different orderings are counted separately.
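
A short computation makes the comparison concrete. Each combination of 5 objects can be ordered in 5! ways, so the permutation count is the combination count times 5!.

```python
from math import comb, factorial, perm

n_perm = perm(10, 5)    # ordered selections of 5 from 10
n_comb = comb(10, 5)    # unordered selections of 5 from 10
ratio = n_perm // n_comb

print(n_perm, n_comb, ratio, factorial(5))
```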

Junior wondered how students felt about a proposed increase in student fees. He orally surveyed a group of students waiting in line for food at the food court. He asked their year in school​ (freshman, sophomore,​ junior, senior,​ graduate) and whether or not each supported the fee increase. Why should Junior question the results from a​ chi-square test?

The opinion of the group of students surveyed may not be representative of the opinion of all students at that​ university, and the opinions in the sample may not be independent.

What does it mean to say that the trials in a binomial experiment are independent of each​ other?

The outcome of one trial does not affect the outcomes of the other trials.

Suppose x̄=60, H0: μx=50, HA: μx>50, and the p-value from a one-sample test is 0.04. What does this p-value mean?

The probability of getting a sample mean of 60 or more if the true population mean is 50 is 0.04.

State an advantage and a disadvantage of using the range instead of the variance as a measure of dispersion in sample data.

The range is easier to​ calculate, but it is too affected by extreme values in the data set.

When looking at a scatterplot of two quantitative​ variables, what do we typically look​ for?

The relationship between the two variables and if there are any deviations from the pattern​ (outliers or clusters of​ points, for​ example).

Mark performed a​ two-sample z-test for proportions to test the hypothesis that there was no difference in the proportion who support increasing student fees between male and female students at a particular university. Mark obtained a​ z-statistic of 0. Based on this​ information, which of the following is always​ true?

The sample proportions will be the same for both male and female students at this university.

A community college school board is negotiating a new contract with the college faculty. The distribution of faculty salaries is skewed right by several faculty members who make over​ $100,000 per year. If the school board wants to give the community the impression that the faculty are already​ overpaid, should they advertise the mean or median of the faculty​ salaries?

The school board should use the mean to make their argument. The mean will be higher than the median since it will be influenced by the few high salaries.

There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser​ (duration) and the time between when that eruption ends and the next eruption begins​ (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. The​ least-squares regression equation is y=33.967+​11.358x, where y is the interval from the end of the current eruption to the beginning of the next eruption and x is the duration of current eruption. In this​ equation, what is​ 11.358? Image: (Concept hw 4, question 28)

The slope of the​ least-squares regression line

A medical study was investigating if getting a flu shot actually reduced the risk of developing the flu. From a group of adult​ volunteers, researchers randomly assigned half to receive an injection that contained the drug believed to reduce the risk of getting the flu and the other half to receive an injection containing no active ingredient​ (i.e. sugar​ water). A hypothesis test was performed and a​ p-value of 0.0002 was obtained. Which of the following statements is​ true?

The small​ p-value indicates strong evidence to reject the null hypothesis. Because an experiment was​ performed, it can be concluded that the reduction in the risk of getting the flu was caused by the flu shot.

If all the data values in a set are​ identical, what can you conclude about the standard​ deviation?

The standard deviation is zero.

Suppose a normal model has a mean of μ=​100, and​ 95% of the values are between 90 and 110. Which of the following must be true about the standard​ deviation?

The standard deviation must be approximately equal to 5.

Suppose a normal model has a mean of μ=​100, and​ 50% of the values are between 90 and 110. Which of the following must be true about the standard​ deviation?

The standard deviation must be greater than 10.

Suppose a normal model has a mean of μ=​100, and​ 80% of the values are between 90 and 110. Which of the following must be true about the standard​ deviation?

The standard deviation must be less than 10.
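
Although the three cards above are meant to be answered without calculation, the answers can be checked. In a normal model centered at 100, the interval 90 to 110 extends 10 units on each side, so the standard deviation is 10 divided by the z-value that captures the stated central area. The helper name `sd_for_central_coverage` is invented for this sketch.

```python
from statistics import NormalDist

def sd_for_central_coverage(half_width, coverage):
    """Standard deviation for which mean ± half_width holds `coverage` of the area."""
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    return half_width / z

for coverage in (0.95, 0.50, 0.80):
    print(coverage, round(sd_for_central_coverage(10, coverage), 2))
```

The results come out near 5 for 95%, above 10 for 50%, and below 10 for 80%, matching the three answers.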

Although​ rare, it is possible to get a​ p-value from a​ two-sided test greater than 1.

The statement is false.

A correlation coefficient close to 1 is evidence of a​ cause-and-effect relationship between the two variables.

The statement is false.

Determine if the following statement is true or false. Benjamin was investigating the relationship between outside temperature on a given day and number of hours spent outside that day. After sampling 25 people on 25 different​ days, he obtained the displayed scatterplot. He should use the correlation coefficient to describe the strength of the relationship between temperature and hours spent outside. Image: (Concept hw 4, question 2)

The statement is false.

Heather was investigating the relationship between outside temperature and type of activity people were engaged in​ (indoor versus​ outdoor). She can use the correlation coefficient to describe the strength of this relationship as long as the relationship is linear.

The statement is false.

Researchers conducted a study and obtained a​ p-value of 0.30. Because the​ p-value is quite​ high, there is evidence to accept the null hypothesis.

The statement is false.

The population will be normally distributed if the sample size is 30 or more.

The statement is false.

Identify the requirements for a discrete probability distribution.

The sum of the probabilities must equal one. Each probability must be between zero and one inclusive.
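
The two requirements translate directly into a check. This is a minimal sketch; the function name `is_valid_distribution` is hypothetical, and the example probabilities are the carrier shares from the cell phone card earlier in this set.

```python
from math import isclose

def is_valid_distribution(probabilities):
    """True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    return in_range and isclose(sum(probabilities), 1.0)

carriers = [0.30, 0.30, 0.10, 0.20, 0.05, 0.05]
print(is_valid_distribution(carriers))      # satisfies both requirements
print(is_valid_distribution([0.5, 0.6]))    # sums to more than one
print(is_valid_distribution([1.2, -0.2]))   # sums to one, but out of range
```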

Which of the following is not a criterion for the binomial​ distribution?

The trials must be dependent.

What is wrong with the following definition of the correlation​ coefficient? The correlation coefficient measures the strength and direction of the linear relationship between two variables.

The two variables must be quantitative.

A graduate student wanted to estimate the average time spent studying among graduate students at her school. She randomly sampled graduate students from her school and obtained a​ 99% confidence interval of​ (17,25) hours/week. Which of the following would be true if the level of confidence was lowered to​ 95%?

The width of the confidence interval would be smaller.

A research organization wanted to estimate the average number of hours a college student sleeps per night during the school year. After randomly sampling 150 college​ students, the research organization determined the following​ 95% confidence​ interval: (7.1​ hours/night, 7.5​ hours/night). What would happen to the width of the confidence interval if the level of confidence increased​ (assuming everything else remained the​ same)?

The width of the confidence interval would increase.
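
The link between confidence level and width can be sketched with normal critical values. This is an approximation made for illustration: the graduate student's interval was probably built with t-methods, and the sketch assumes the standard error stays the same when the level changes.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

margin_99 = (25 - 17) / 2    # margin of error of the 99% interval (17, 25)
# Same standard error, smaller critical value at 95% confidence.
margin_95 = margin_99 * z_critical(0.95) / z_critical(0.99)
print(round(margin_99, 2), round(margin_95, 2))
```

Under these assumptions the margin shrinks from 4 hours to roughly 3 hours, so the 95% interval is narrower than the 99% interval.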

Explain why it is misleading to use the term​ "average" to describe your typical bowling score.

The word​ "average" is ambiguous and can refer to any measure of center. It is better to use the specific measure of center you intend​ (mean, median, or​ mode).

If​ someone's gross annual income has a​ z-score of positive​ 2, what can be​ concluded?

Their income is 2 standard deviations above the mean income.

Gina calculated a correlation coefficient between hours studied and grade point average as​ +0.75. Which of the following is a correct statement based on this correlation​ coefficient?

There is a fairly strong positive relationship between hours studied and grade point​ average, indicating that grade point averages tend to be higher for students who study more.

Which of the following statements best describes this​ scatterplot? Image: (Concept hw 4, question 11)

There is a​ negative, moderately strong relationship between X and Y with one outlier.

Data was collected on the heights of boys at 12 and 24 months of age. The data is summarized in the following boxplots. Is there more variation in the​ boys' heights at 12 months or 24​ months? Explain how you can tell from the boxplots. Image: (concept hw 3, question 32)

There is more variation in height at 24 months. The box length is​ longer, indicating a larger interquartile range and greater spread in the data.

What does a correlation coefficient of 0​ indicate?

There is no linear relationship between the two quantitative variables.

A regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A partial output from the simple linear regression analysis is given below. A hypothesis test is to be performed to determine if weight of cars is a significant predictor of fuel efficiency of cars. Determine the correct conclusion based on the results of the hypothesis test.
Variable   Coefficient   SE(Coef)
Constant   46.08         3.412
Weight     −4.87         1.339

There is strong evidence to indicate weight of cars is a significant predictor of fuel efficiency since the​ p-value from the hypothesis test is quite small ​(<​0.001).
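
The p-value is not shown in the partial output, but it can be recovered from the slope and its standard error. The sketch below computes t = coefficient / SE with n − 2 = 35 degrees of freedom and then approximates the two-sided p-value by integrating the t density numerically with Simpson's rule. The helper names are made up here; a statistics package would normally do this step.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

slope, se_slope, n = -4.87, 1.339, 37
t_stat = slope / se_slope
df = n - 2
p_value = two_sided_p_value(t_stat, df)
print(round(t_stat, 3), df, p_value < 0.001)
```

The test statistic is about −3.64, and the resulting p-value falls below 0.001, in line with the answer above.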

What does the​ 95% represent in a​ 95% confidence​ interval?

The​ 95% represents the proportion of intervals that would contain the parameter​ (for example, the population mean or population​ proportion) if a large number of different samples is obtained.

Doug wondered how students felt about a proposed increase in student fees. He randomly sampled 100​ freshmen, 100​ sophomores, 100​ juniors, and 100 seniors at his university. He asked each whether or not they supported the fee increase. Assuming all conditions are​ satisfied, which test should Doug use to test the hypothesis that the distribution of support for the fees is the same for all four​ classes?

The​ chi-square test of homogeneity

Brett is a huge sports fan. He wondered if there was a relationship between​ someone's favorite sport and where they lived. He randomly sampled 500 American sports fans and asked each what their favorite sport was​ (football, baseball,​ basketball, hockey, or​ other) and what part of the country they lived in​ (Northeast, Southeast,​ Midwest, Rocky​ Mountains, Pacific​ Coast). Assuming all conditions are​ satisfied, which of the following tests should Brett use to test his​ hypothesis?

The​ chi-square test of independence

Brett is a huge sports fan. He hypothesized half of sports fans liked football the​ best, one-quarter liked baseball the​ best, 15% liked basketball the​ best, and​ 5% liked hockey the​ best, and the rest liked some other sport the best. He surveyed 100 sports fans and asked what sport they liked the best. Assuming all conditions are​ satisfied, which of the following tests should Brett use to test his​ hypothesis?

The​ goodness-of-fit chi-square test

Suppose increasing the sample size will not change the sample mean or the standard deviation. What will happen to the​ p-value by increasing the sample​ size?

The​ p-value will decrease.
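
One way to see this: the test statistic divides by s/√n, so a larger n gives a larger statistic and a smaller tail area. The numbers below are hypothetical (sample mean 52, hypothesized mean 50, standard deviation 10), the sketch uses a normal approximation with an upper-tailed alternative, and it assumes the sample mean differs from the hypothesized value.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p(xbar, mu0, s, n):
    """Approximate upper-tailed p-value using the standard normal curve."""
    z = (xbar - mu0) / (s / sqrt(n))
    return 1 - NormalDist().cdf(z)

# Same sample mean and standard deviation, increasing sample size.
p_values = [upper_tail_p(52, 50, 10, n) for n in (25, 100, 400)]
print([round(p, 4) for p in p_values])
```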

If a z-score is equal to zero, which of the following must be true?

The​ x-value must be equal to the mean of the distribution.

If the area to the left of a​ z-score is equal to​ 0.5, what must be​ true?

The​ z-score must be equal to zero.

If the area to the left of a​ z-score is less than​ 0.5, what must be​ true?

The​ z-score must be negative.

Suppose you want to calculate the z-score for your height. How will the z-scores compare if you use your height in inches versus centimeters?

The​ z-scores will be the same regardless of the unit used for your height because​ z-scores are unitless.
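
A quick demonstration with made-up heights: converting inches to centimeters multiplies every value, the mean, and the standard deviation by 2.54, so the factor cancels in (x − mean) / sd.

```python
from math import isclose
from statistics import mean, pstdev

def z_scores(values):
    """z-score of each value relative to the list's own mean and sd."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

heights_in = [62, 65, 68, 70, 74]              # hypothetical heights in inches
heights_cm = [h * 2.54 for h in heights_in]    # the same heights in centimeters

same = all(
    isclose(a, b, abs_tol=1e-9)
    for a, b in zip(z_scores(heights_in), z_scores(heights_cm))
)
print(same)
```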

Determine if the following statement is true or false. If it is​ false, explain why. A​ p-value is the number of standard deviations an observation is from the mean.

This statement is false. The definition given is for the​ z-statistic. A​ p-value is the probability of observing a value of a statistic or a value that is more unusual just by chance if the null hypothesis is true.

Determine if the following statement is true or false. If it is​ false, explain why. A​ p-value is the probability that the null hypothesis is true.

This statement is false. The null hypothesis will either be true or it won't be true; there is no probability associated with this fact. A p-value is the probability of observing the sample mean (for example) that we did, or something more unusual, just by chance if the null hypothesis is true.

Determine if the following statement is true or false. If it is​ false, explain why. A​ p-value is the probability of accepting the null hypothesis.

This statement is false. We never accept the null hypothesis no matter what the p-value is. A p-value is the probability of observing the sample mean (for example) that we did, or something more unusual, just by chance if the null hypothesis is true.

Explain how to find the mean of a discrete random variable.

To find the mean of a random​ variable, multiply each value of the random variable by its probability and then add those products.
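
For example, the rule can be applied to one roll of a fair six-sided die. Exact fractions are used to avoid rounding, and the function name `discrete_mean` is made up for this sketch.

```python
from fractions import Fraction

def discrete_mean(distribution):
    """Sum of value * probability over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())

# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu = discrete_mean(die)
print(mu, float(mu))
```

The mean is 7/2 = 3.5, which is not a face of the die, so the mean of a discrete random variable need not be one of its possible values.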

In​ regression, a residual can be negative. Is this statement true or​ false?

True

True or​ false? A histogram and a relative frequency​ histogram, constructed from the same​ data, always have the same basic shape.

True. A relative frequency histogram will have a different scale on the​ y-axis but the same shape as a regular histogram.

In a normal​ distribution, approximately​ 95% of the area under the normal curve is within how many standard​ deviation(s) of the​ mean?

Two

A research organization keeps track of what citizens think is the most important problem facing the country today. They randomly sampled a number of people in 2003 and again in 2009 using a different random sample of people in 2009 than in 2003 and asked them to choose the most important problem facing the country today from the following​ choices, war,​ economy, health​ care, or other. Which of the following is the correct test to use to determine if the distribution of​ "problem facing this country​ today" is different between the two different​ years?

Use a​ chi-square test of homogeneity.

The characteristics of the individuals within the population

Variable

Many drivers of cars that can run on regular gas actually buy premium in the belief that they will get better gas mileage. To test that​ belief, 10 cars in a company fleet were used. All the cars run on regular gas. Each car was filled first with either regular or premium​ gasoline, decided by a coin toss. The mileage for that tank of gas was recorded. Then the car was filled with the other type of gasoline and the mileage for that tank of gas was recorded. The difference in gas mileage between the two types of gasoline ​(premium−​regular) for the 10 cars was recorded. A​ 95% confidence interval was constructed using the paired​ t-methods: (0.5,3.5) mpg. Which of the following is the correct interpretation of this confidence​ interval?

We are​ 95% confident that average increase in gas mileage when using premium rather than regular gas is between 0.5 and 3.5 miles per gallon.

Ronnie randomly sampled 80 college students, 50 living in a dorm and the other 30 living in an apartment. She asked each how much they spent on food and beverages (non-alcoholic) within the last 7 days. A 95% confidence interval for the difference in the mean amount spent on food and drinks over the past 7 days between students living in a dorm and students living in an apartment (dorm−apartment) is (−$25.80,−$11.20). Which of the following is a correct interpretation of this confidence interval?

We are​ 95% confident that college students living in dorms spent between​ $11.20 and​ $25.80 less on food and drinks over the past 7​ days, on​ average, than college students living in apartments.

Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting​ for, either North High School or South High School. From the results of his​ survey, Eric obtained the following​ 95% confidence interval for the proportion of all adults in the city rooting for North​ High, (0.52,0.68). Interpret this confidence interval.

We are​ 95% sure that between​ 52% and​ 68% of all adults in this city will root for North High School.

A graduate student wanted to estimate the average time spent studying among graduate students at her school. She randomly sampled graduate students from her school and obtained a​ 99% confidence interval of​ (17.3,22.5) hours/week. In the context of the​ problem, which of the following interpretations is​ correct?

We are​ 99% sure that the average amount of time spent studying among graduate students at this​ student's school is between 17.3 and 22.5 hours per week.

Suppose that in a certain​ community, the probability of a randomly selected individual having red hair is 0.08 and the probability of a randomly selected individual being​ left-handed is 0.15. What additional information would be needed to find the probability of randomly selecting an individual in this community who has red hair or is​ left-handed?

We would need to know the percentage of individuals in the community who have red hair and are​ left-handed.
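
The missing piece is the overlap term in the general addition rule, P(A or B) = P(A) + P(B) − P(A and B). A short sketch follows; the two marginal probabilities come from the question, while the joint probability `p_both` is a hypothetical value chosen only to make the example run.

```python
def prob_a_or_b(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p_red = 0.08   # from the question: P(red hair)
p_left = 0.15  # from the question: P(left-handed)
p_both = 0.02  # HYPOTHETICAL: P(red hair and left-handed) is not given

p_either = prob_a_or_b(p_red, p_left, p_both)
print(round(p_either, 4))
```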

In a​ chi-square test, when would the null hypothesis be​ true?

When all observed counts are the same as their expected counts

When will a​ chi-square statistic be​ 0?

When all observed counts are the same as their expected counts
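
To see why, recall that the chi-square statistic sums (observed − expected)² / expected over all cells, so every term is zero exactly when each observed count equals its expected count. A small sketch, with made-up counts as a stated assumption:

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical expected counts for three categories.
expected = [50.0, 30.0, 20.0]

stat_match = chi_square_statistic([50, 30, 20], expected)  # observed == expected
stat_off = chi_square_statistic([55, 25, 20], expected)    # some cells differ

print(stat_match)
print(round(stat_off, 4))
```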

Identify when the interquartile range is better than the standard deviation as a measure of dispersion and explain its advantage.

When the distribution is skewed left or right or contains some extreme​ observations, then the interquartile range is preferred since it is resistant.

When are conclusions said to be​ "statistically significant"?

When the​ p-value is less than a given significance level

Tammie wondered how her friends felt about their cell phone service. She randomly selected 10 of her friends who used company A and another 10 of her friends who used company B and asked if they felt their service was​ excellent, good,​ fair, or poor. Why should Tammie not use the​ chi-square test?

With such small sample​ sizes, there is no way for all the expected counts to be at least 5.

A simple random sample of 1500 birth weights was taken from all birth weights last year. The average birth weight from this sample was 3433 grams. Researchers knew that the standard deviation of all birth weights was 495 grams. Assuming all birth weights in the sample are independent of each​ other, are the other conditions satisfied to use the​ one-sample z-methods to construct a confidence​ interval?

Yes. The sample is representative of the population of all birth weights since a random sample was taken and the distribution of sample means will be approximately normal since the sample size is more than 30.

Suppose a student earns a 75 on his statistics​ exam, and his grade has a​ z-score of 1.5. Since the class did not perform well on the​ exam, the professor announces that she will adjust the grades by adding 10 points to each score. How will this adjustment change the​ student's z-score?

The student's z-score will not change, since the adjustment shifts the entire distribution of scores but does not change the relative position of his score in the class.

Rejecting the null hypothesis when the null hypothesis is true is called​ _____________.

a Type I Error.

Typically, the direction (>​,<​, or ≠​) used in the​ _______ hypothesis is determined from the question of interest.

alternative

Elmo likes music. He wondered if listening to music while studying will improve scores on an exam. Fifty students who were to take the midterm in a week agreed to be part of a study. Half were randomly assigned to listen to classical music while studying for the exam. The other half were told not to listen to any music while studying for the exam. A hypothesis test is to be performed to determine if the average scores of those listening to music while studying for the exam were higher than those who did not listen to any music while studying for the exam. Which of the following hypothesis tests should be​ used?

a​ two-sample t-test

When a statistic consistently either underestimates or overestimates a population​ parameter, it is called​ _____________.

biased.

Before using the normal model to represent a data​ set, first check that the shape of the​ data's distribution is what​ shape?

both symmetric and unimodal

A​ two-sided test is performed when we are interested in deviations​ _____________ from the hypothesized value.

either greater than or less than

​A(n) _______ is any collection of outcomes from a probability experiment.

event

In a television​ advertisement, a company called​ "Waist Away" claimed the workout program on their set of DVDs would help people lose weight more than any other DVD workout program. To test this​ claim, an independent​ company, called​ "Slim Down," selected one other DVD program. They then randomly assigned half the volunteers to the Waist Away program and the other half to the Slim Down program. Each participant was weighed before they started the program and then regularly participated in their assigned program for one month. After one​ month, each participant was weighed again. The percent of weight lost was recorded for each​ person, where negative values indicated a weight gain. What type of study was​ performed?

experiment

Two events E and F are​ __________ if the occurrence of event E in a probability experiment does not affect the probability of event F.

independent

The sample standard deviation better estimates the population standard deviation for​ _______ sample sizes.

larger

The​ _________________ is/are the entire group of individuals or items being studied.

population

Standardizing data into​ z-scores is just shifting them by the​ _______ and rescaling them by the​ _______.

mean; standard deviation

The following stem-and-leaf plot shows the daily high temperature in a town on April 1st for twenty-four random years. Which measures of center and spread are most appropriate for this data?
5 | 1 1 2 2 3 4 6 6 6 7 8
6 | 0 0 1 2 4 4 9
7 | 2 3 6
8 | 1 2
9 | 0

median and interquartile range

The only measure of center that can be found for both quantitative and qualitative data is the ​ ______________.

mode

A frequency distribution lists the​ _________ of occurrences of each category of​ data, while a relative frequency distribution lists the​ _________ of occurrences of each category of data.

number; proportion

Do people walk faster in an airport when they are departing​ (getting on a​ plane) or after they have arrived​ (getting off a​ plane)? An interested passenger watched a random sample of people departing and a random sample of people arriving and measured the walking speed​ (in feet per​ minute) of each. What type of study design is being​ performed?

observational study

Many people believe that students gain weight as freshmen in college. To determine if this is​ true, a student randomly sampled 100 freshmen. Each was weighed when college started in the fall and again when they left for home after the spring term. Should a paired​ t-test or a​ two-sample t-test be used to determine if students weigh more at the end of their freshman year compared to the beginning of their freshman​ year, on​ average?

paired​ t-test

A​ _________________ is a numerical measurement describing some characteristic of a population.

parameter

A​ ________________ variable classifies individuals based on some attribute or characteristic.

qualitative

A​ ________________ variable counts or measures something and has numeric values.

quantitative

​A(n) __________ is a numerical measure of the outcome of a probability experiment.

random variable

In a​ boxplot, if the median is to the left of the center of the box and the right whisker is substantially longer than the left​ whisker, the distribution is skewed

right

The claim being assessed in a hypothesis test is called​ _____________.

the null hypothesis.

What does the standard error of the distribution of sample means​ estimate?

the standard deviation of the distribution of sample means

The simple linear regression model is of the form A=B+​C(xi​)+D. What does A represent in the​ model?

yi

If the null hypothesis is​ true, what will the​ chi-square statistic​ equal?

zero

The standard normal probability distribution has a mean of​ _______ and a standard deviation of​ _______.

zero; one

The expression zα denotes the​ z-score with an area of​ _______ to its right.

α

The expression zα/2 denotes the​ z-score with an area of​ _______ to its right.

α/2

The simple linear regression model is of the form A=B+​C(xi​)+D. What does B represent in the​ model?

β0

The simple linear regression model is of the form A=B+​C(xi​)+D. What does D represent in the​ model?

εi

Professional baseball players have a mean salary of​ $3.340 million​ (as of last​ year's opening​ day). Of​ course, the salaries for baseball players vary. If you had to​ guess, which of the following seems most reasonable for the IQR of professional baseball player​ salaries?

​$1,500,000

Brett is a huge sports fan. He hypothesized half of sports fans liked football the​ best, 25% liked baseball the​ best, 15% liked basketball the​ best, and​ 5% liked hockey the​ best, and the rest liked some other sport the best. He surveyed 500 sports fans and asked what sport they liked the best. Which of the following is the way to calculate the number of these 500 sports fans expected to say that basketball is their favorite sport if the null hypothesis is​ true?

​(500)(0.15)
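
The same calculation extends to every category: each expected count is the sample size times the hypothesized proportion. In this sketch the 5% for "other" is inferred from "the rest" in the question (100% minus the four stated percentages), and the variable names are illustrative.

```python
n = 500  # sports fans surveyed

# Hypothesized proportions under the null hypothesis.
null_proportions = {
    "football": 0.50,
    "baseball": 0.25,
    "basketball": 0.15,
    "hockey": 0.05,
    "other": 0.05,  # inferred: the remainder after the four stated sports
}

# Expected count = sample size * hypothesized proportion.
expected_counts = {sport: n * p for sport, p in null_proportions.items()}

for sport, count in expected_counts.items():
    print(f"{sport:10s} {count:6.1f}")
```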

A nutritionist wants to estimate the difference between the percentage of men and women who have high cholesterol. What sample size should be obtained if she wishes the estimate to be within 2 percentage points with​ 90% confidence, assuming the​ following? ​(a) She uses the estimates of​ 18.8% male and​ 20.5% female from the National Center for Health Statistics. ​(b) She does not use any prior estimates.

​(a) n=n1=n2=2135 ​(b) n=n1=n2=3382
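
These values can be reproduced with the usual sample-size formula for a difference of two proportions, n = [p1(1−p1) + p2(1−p2)]·(z/E)², rounded up. The sketch below assumes equal sample sizes and takes the critical value from Python's `statistics.NormalDist` rather than a table; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(margin, confidence, p1=0.5, p2=0.5):
    """Per-group sample size for estimating p1 - p2 within +/- margin.

    With no prior estimates, p1 = p2 = 0.5 gives the most conservative n.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    n = (p1 * (1 - p1) + p2 * (1 - p2)) * (z / margin) ** 2
    return ceil(n)  # always round up to guarantee the margin

n_a = sample_size_two_proportions(0.02, 0.90, p1=0.188, p2=0.205)
n_b = sample_size_two_proportions(0.02, 0.90)
print(n_a, n_b)
```

Note that rounding the critical value to 1.645 before computing would push each result up by one, so small differences from a textbook answer usually come down to how z was rounded.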

A fire insurance company wants to examine the relationship between the amount of fire damage in major residential fires to the distance between the burning house and the nearest fire station. A sample of 15 recent fires in a particular city was taken. The amount of damage to the house and the distance from the burning house to the nearest fire station were recorded. Part of the output from a simple linear regression analysis is given below. Which of the following is a correct interpretation of R-square?
Predictor   Coef     SE Coef   T   P
Constant    16.301   6.015
distance    3.554    1.663
S = 9.8101   R-Sq = 26.0%

​26% of the variation in the amount of damage to a house is explained by a simple linear regression with the distance of the burning house from the nearest fire station as the explanatory variable.

Concern over the weather associated with El Nino has increased interest in the possibility that the climate on earth is getting warmer. The most common theory relates an increase in atmospheric levels of carbon dioxide ​(CO2​), a greenhouse​ gas, to increases in temperature. A regression analysis of the mean annual CO2 concentration​ (in parts per​ million) in the atmosphere at the top of Mauna Loa in Hawaii and the mean annual air temperature​ (in degrees​ Celsius) over both land and sea across the globe for 37 years was performed. Assume all conditions are met for simple linear regression inference. What percent of the variation in average annual air temperatures is explained by the regression analysis with annual CO2 levels over Mauna Loa as the explanatory​ variable? Predictor Coef SE Coef T P Constant 16.301 6.015 distance 3.554 1.663 S​ = 9.8101 ​ R-Sq =​ 26.0%

​33.4%

Which of the following statements is not equivalent to the​ others?

​35% of individuals who have never married are male.

According to the Empirical​ Rule, 95% of the area under the normal curve is between μ−2σ and μ+2σ. What percent of the area under the normal curve is between μ and μ+2σ​?

​47.5%

A student was wondering if students at her university arrived on campus each day the same way as another university. At the other​ university, 60%​ drove, 30% biked or​ walked, and the other​ 10% arrived using other means of transportation. The student randomly sampled 150 students one afternoon at her university and asked how they arrived at campus that day. Which hypothesis test should the student use to determine if students at her university arrive to campus in the same proportion as the other​ university?

​Chi-square goodness of fit test

A voter was interested in comparing the proportion in favor of national health care between people who say they are​ Republicans, Democrats, and Independents. From each​ party, she randomly selected 50 people registered as a member of that party in her county and asked whether or not they were in favor of a national health care program. Which of the following hypothesis tests should this voter​ use?

​Chi-square test of homogeneity

Suppose the list below shows how many text messages Elyse sent each day for the last 10 days. If Elyse wants to know how many text messages she typically sends each​ day, which measure of central tendency better describes the typical number of text messages per​ day? 21 22 24 26 26 29 32 32 33 88

​Median; The median of 27.5 is a better representative of the center since it is resistant to the one extreme value. The mean of 33.3 is not representative of the typical number of texts since only one number is larger than the mean.
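
The figures in this answer can be checked directly from the ten values in the question. A short sketch using Python's standard `statistics` module:

```python
from statistics import mean, median

# Text messages Elyse sent on each of the last 10 days (from the question).
texts = [21, 22, 24, 26, 26, 29, 32, 32, 33, 88]

avg = mean(texts)
mid = median(texts)

# Count how many days exceed the mean; the outlier (88) pulls the mean upward.
days_above_mean = sum(1 for t in texts if t > avg)

print(f"mean = {avg}, median = {mid}, days above mean = {days_above_mean}")
```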

Which is greater in a normal​ distribution, the mean or the​ median? Explain.

​Neither; the mean and median are always equal in a normal​ distribution, since it is symmetric.

Suppose a systematic random sample of amusement park visitors is taken by selecting the 9th visitor to walk through the gates on a given day and every 15th visitor after that until 500 visitors have been surveyed. Would this constitute a simple random​ sample? Why or why​ not?

​No, because every group of 500 visitors does not have the same chance of being selected for the sample.

A student randomly sampled 15 senior male students and 15 senior female students and found their grade point average through their junior year. She obtained the accompanying scatterplot. Can the correlation coefficient be used to describe the strength of the relationship between these two​ variables? Image: (Concept hw 4, question 23)

​No, because sex is a categorical variable.

Determine whether the distribution is a discrete probability distribution. If not, state why.
x: 0, 10, 20, 30, 40, 50
P(x): 0.2, 0.2, 0.2, 0.2, 0.2, 0.2

​No, because the probabilities do not sum to 1.
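
A quick check of the two requirements (each probability between 0 and 1, and all probabilities summing to 1) can be sketched as follows; the helper name `is_valid_distribution` is an illustrative assumption.

```python
def is_valid_distribution(probabilities, tol=1e-9):
    """True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    sums_to_one = abs(sum(probabilities) - 1.0) <= tol
    return in_range and sums_to_one

# P(x) values from the question, for x = 0, 10, 20, 30, 40, 50.
p_x = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2]

total = sum(p_x)
valid = is_valid_distribution(p_x)
print(round(total, 4), valid)
```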

The probability that a randomly selected adult in a particular community is a smoker is​ 20%. The probability that a randomly selected adult in the community is a​ smoker, given that the adult earns more than​ $75,000 per​ year, is​ 10%. Are the events​ "is a​ smoker" and​ "earns more than​ $75,000 per​ year" independent? Explain.

​No, because the probability of smoking is different for people who earn over​ $75,000 per​ year, the events are not independent.

Suppose a fair die is rolled ten times and the result is recorded each time. Does this constitute a binomial​ experiment? Why or why​ not?

​No, because there are more than two outcomes for each trial.

Cards are drawn with replacement from a standard deck until a king is drawn. Does this constitute a binomial​ experiment? Why or why​ not?

​No, because there is not a fixed number of trials.

Determine whether the graph can represent a Normal density function or explain why it cannot. (Graph line ends are above the x axis) (Concept hw 7, question 21)

​No, because this graph increases as the value of x becomes very large.

Determine whether the graph can represent a Normal density function or explain why it cannot. (Graph looks symmetrical but the ends of the line go below the x axis) (Concept hw 7, question 20)

​No, because this graph is not always above the​ x-axis.

Days before a presidential​ election, a nationwide random sample of registered voters was taken. Based on this random​ sample, it was reported that​ "52% of registered voters plan on voting for Robert Smith with a margin of error of ±​3%." The margin of error was based on a​ 95% confidence level. Can we say with​ 95% confidence that Robert Smith will win the election if he needs a simple majority of votes to​ win?

​No, because​ 50% is within the bounds of the confidence interval.
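
The interval behind this answer is the point estimate plus or minus the margin of error. A minimal sketch, using the 52% and 3% from the question (the function name is illustrative):

```python
def confidence_interval(estimate, margin):
    """Return (lower, upper) bounds: estimate +/- margin of error."""
    return (estimate - margin, estimate + margin)

low, high = confidence_interval(0.52, 0.03)

# A simple majority requires more than 50%; check whether 50% is still plausible.
includes_half = low <= 0.50 <= high
print(f"({low:.2f}, {high:.2f}) includes 0.50: {includes_half}")
```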

Can the variance of a data set ever be​ negative? Explain.

​No; since the variance is based on the squared deviations from the mean and​ N, it cannot be negative.

Determine whether the graph can represent a Normal density function or explain why it cannot. (Graph is skewed to the right)

​No; this graph is not symmetric.

Which two graphs allow the reader to retrieve the original list of​ data?

​Stem-and-leaf plots and dotplots

