Stat 113 Final Exam
According to the Empirical Rule, 68% of the area under the normal curve is between μ−σ and μ+σ. What percent of the area under the normal curve is between μ and μ+σ?
34%
A college student was interested in the average amount college students spend on entertainment each week. He randomly sampled 200 students and found the following 95% confidence interval: (24,28) dollars per week. What is the value of the margin of error?
$2
Suppose the equation of a least-squares regression line is y=−3.17−2.4x. What can be said about the y-intercept?
It is −3.17.
Suppose the equation of a least-squares regression line is y=−3.17−2.4x. What can be said about the correlation coefficient?
It is negative, but its exact value cannot be determined from the given information.
In regression, what can be said about the sum of the residuals of all the observations?
It will always be 0.
Which of the following is NOT needed to construct a boxplot?
Mean
Which information can you obtain from a stem-and-leaf plot but not from a histogram?
Minimum and maximum data values
Researchers wondered what the average braking distance is for new cars traveling at 60 MPH. They randomly sampled 40 new cars made by two different companies. For each car, the same driver obtained a speed of 60 MPH and then pushed on the brake pedal as hard as she could. The average stopping distance for these 40 cars was 155.2 feet. Suppose σ is known to be 12 feet. Assume that the braking distances of the cars are independent. Are the other conditions satisfied to use the one-sample z-methods to construct a confidence interval?
No. Even though the distribution of sample means will be approximately normal because of the large sample size, this sample of new cars is not representative of all new cars since a random sample of all new cars was not taken.
Suppose your statistics professor teaches two sections of your course this semester. She gives the same exam to each class. Their performance is summarized below. Can she conclude that the overall mean on the exam was 80 (the average of the two individual class means)? Why or why not?
First class: n = 32, mean = 75, standard deviation = 5.6
Second class: n = 38, mean = 85, standard deviation = 7.2
No. Since the class sizes are different, she would need to find the weighted mean.
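As a quick check of the arithmetic, the sketch below weights each class mean by its class size. It is written in Python only for illustration; the variable and function names are hypothetical. The standard deviations are not needed to find the overall mean.

```python
# Class summaries from the question (sizes and means only).
classes = [
    {"n": 32, "mean": 75.0},
    {"n": 38, "mean": 85.0},
]

def weighted_mean(groups):
    """Combine group means, weighting each mean by its group size."""
    total_n = sum(g["n"] for g in groups)
    return sum(g["n"] * g["mean"] for g in groups) / total_n

overall = weighted_mean(classes)
unweighted = sum(g["mean"] for g in classes) / len(classes)
print(f"weighted mean = {overall:.2f}, unweighted average = {unweighted:.2f}")
```

The weighted mean, (32·75 + 38·85)/70, is pulled toward the larger second class, so it comes out slightly above 80.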
Suppose a newspaper surveys 250 adults in a nearby town and inquires about their cell phone carrier. The accompanying table summarizes the results. Does this table describe a relative frequency distribution? Why or why not?
Carrier A: 30%
Carrier B: 30%
Carrier C: 10%
Carrier D: 20%
Carrier E: 5%
No. The sum of the relative frequencies is 95%, not 100%.
The numbers used to separate the classes of a frequency distribution, but without the gaps created by class limits, are called ____________________.
class boundaries.
A ____________________ is found by adding the lower and upper class limit and then dividing the sum by 2.
class midpoint
The ____________________ is the difference between two consecutive lower class limits or two consecutive upper class limits.
class width
A quantitative variable that has an infinite number of possible values that are not countable is called _______.
continuous
The ____________________ for a class is the sum of the frequencies for that class and all previous classes.
cumulative class frequency
A quantitative variable that has a finite or countable number of values is called _______.
discrete
A __________ random variable has either a finite or countable number of values.
discrete
Which terms are used to describe events that have no outcomes in common?
disjoint or mutually exclusive
The more variable the data, the _______ accurate the sample mean will be as an estimate of the population mean.
less
When performing a linear regression analysis, it is important that the relationship between the two quantitative variables be _______.
linear
Typically, the idea of the _______ hypothesis is that of "no effect," "no difference," or "no change."
null
The _________________ is/are a subset of the population that is being studied.
sample
The _______ of a probability experiment is the collection of all possible outcomes.
sample space
Suppose that a researcher is interested in the average standardized test score for fifth graders in a local school district. The fifth graders at a specific school would comprise a ___________ and their average test score would be a ___________.
sample; statistic
A z-score represents how many ______________ a data value is above or below the ______________.
standard deviations; mean
A _________________ is a numerical measurement describing some characteristic of a sample.
statistic
A regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A partial output from the simple linear regression analysis is given below. A hypothesis test is to be performed to determine if weight of cars is a significant predictor of fuel efficiency of cars. Determine the correct test-statistic (with degrees of freedom, if needed) that should be used for this hypothesis test.
Variable    Coefficient    SE(Coef)
Constant    46.08          3.412
Weight      −4.87          1.339
t with 35 degrees of freedom (df = n − 2 = 37 − 2 = 35)
Determine whether the graph can represent a Normal density function or explain why it cannot. (Graph has two bumps) (Concept hw 7, question 22)
No, because this graph is not bell-shaped.
In a normal distribution, approximately 99.7% of the area under the normal curve is within how many standard deviation(s) of the mean?
Three
Determine whether the graph can represent a Normal density function or explain why it cannot. (Concept hw 7, question 23)
Yes
What is a variable other than x and y that simultaneously affects both variables called?
a lurking variable
A(n) ____________________ is a bar graph in which the height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other.
histogram
For a given degrees of freedom, the larger the chi-square statistic, the ____________ evidence there is to reject the null hypothesis.
more
The larger the sample, the _______ accurate the sample mean will be as an estimate of the population mean.
more
A sample is said to be __________ if the statistics computed from it accurately reflect the corresponding population parameters.
representative
According to the Empirical Rule, 68% of the area under the normal curve is within one standard deviation of the mean. What percent of the area under the normal curve is more than one standard deviation from the mean?
32%
The margin of error is _____________ the width of the confidence interval.
half
A correlation coefficient can be 0.
The statement is true.
A research organization wanted to estimate the average number of hours a college student sleeps per night during the school year. After randomly sampling 150 college students, the research organization determined the following 95% confidence interval: (7.1 hours/night, 7.5 hours/night). What is the value of the average number of hours slept per night during the school year for all college students?
We're 95% confident that it's somewhere between 7.1 and 7.5 hours per night.
Can the standard deviation ever be larger than the variance? Explain.
Yes; if the variance is less than one, then its square root (the standard deviation) will be larger than the variance.
Can a qualitative variable have values that are numeric? Why or why not?
Yes; a variable can take numeric values that do not count or measure anything (for example, zip codes), and such a variable is qualitative rather than quantitative.
A professor wondered if there was a difference between females and males in the proportion of students who drop math classes. The professor randomly selected 20 math classes around campus and, for each student enrolled in the class at the beginning of the term, recorded the student's gender and whether or not the student dropped the class at some point during the term. Assuming all conditions are satisfied, which of the following tests should the professor use?
two-sample z-test for proportions
(a) Find the t-value such that the area in the right tail is 0.05 with 27 degrees of freedom. 1.703
(b) Find the t-value such that the area in the right tail is 0.01 with 19 degrees of freedom. 2.539
(c) Find the t-value such that the area left of the t-value is 0.15 with 12 degrees of freedom. [Hint: Use symmetry.] −1.083
(d) Find the critical t-value that corresponds to 95% confidence. Assume 20 degrees of freedom. 2.086
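Python's standard library has no t-distribution, so the sketch below is a hedged, self-contained way to reproduce the table lookups: it integrates the t density numerically with Simpson's rule and inverts the resulting CDF by bisection. The helper names are hypothetical; in practice a printed t-table or a library routine such as SciPy's `scipy.stats.t.ppf` would be used instead. Part (d) puts 2.5% in each tail, so it asks for the point with 0.975 to its left.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    if x == 0:
        return 0.5
    h = abs(x) / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area_0_to_x = total * h / 3
    return 0.5 + area_0_to_x if x > 0 else 0.5 - area_0_to_x

def t_quantile(p, df, iterations=60):
    """Return t with P(T <= t) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_quantile(1 - 0.05, 27), 3))  # (a) right-tail area 0.05, df = 27
print(round(t_quantile(1 - 0.01, 19), 3))  # (b) right-tail area 0.01, df = 19
print(round(t_quantile(0.15, 12), 3))      # (c) left-tail area 0.15, df = 12
print(round(t_quantile(0.975, 20), 3))     # (d) 95% confidence, df = 20
```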
A baseball pitcher threw a no-hitter. The accompanying side-by-side boxplot shows the pitch speed (in miles per hour) for all of his pitches during the game. Complete parts (a) through (f) below.
(a) Which pitch is typically thrown the fastest?
(b) Which pitch is most erratic as far as pitch speed goes?
(c) Which pitch is more consistent as far as pitch speed goes, the cut fastball or the four-seam fastball?
(d) Are there any outliers for the pitcher's cut fastball? If so, approximate the pitch speed of any outliers.
(e) Describe the shape of the distribution of the pitcher's curveball.
(f) Describe the shape of the distribution of the pitcher's four-seam fastball.
Image: (Concept hw 3, question 36)
(a) Two-seam fastball
(b) Two-seam fastball
(c) Four-seam fastball
(d) Outlier(s) at 90 miles per hour
(e) The distribution is symmetric.
(f) The distribution is skewed right.
Determine the value of each expression below. (a) 7! (b) 0! (c) 9C4 (d) 10C3 (e) 9P2 (f) 12P4
(a) 7! = 5040 (b) 0! = 1 (c) 9C4 = 126 (d) 10C3 = 120 (e) 9P2 = 72 (f) 12P4 = 11880
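These values can be checked with Python's `math` module, which provides `factorial`, `comb` (combinations, nCr = n!/(r!(n − r)!)) and `perm` (permutations, nPr = n!/(n − r)!). This is a minimal sketch; `comb` and `perm` require Python 3.8 or later.

```python
import math

# Each expression from the question, evaluated with the standard library.
answers = {
    "7!": math.factorial(7),
    "0!": math.factorial(0),   # 0! is defined to be 1
    "9C4": math.comb(9, 4),    # order does not matter
    "10C3": math.comb(10, 3),
    "9P2": math.perm(9, 2),    # order matters
    "12P4": math.perm(12, 4),
}
for expr, value in answers.items():
    print(f"{expr} = {value}")
```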
The quality of the orange juice produced by a manufacturer is constantly monitored. There are numerous sensory and chemical components that combine to make the best-tasting orange juice. One manufacturer developed a quantitative index of the "sweetness" of orange juice. (The higher the index, the sweeter the orange juice.) The manufacturer wondered if there was a relationship between the amount of water-soluble pectin (in parts per million) in the orange juice and the sweetness index. Data were collected on these two variables during 24 production runs in a particular plant. Review the accompanying scatterplot. What is the value of the correlation coefficient with all outliers included? (Try to answer this question without calculating the correlation coefficient.) Image: (Concept hw 4, question 18)
+0.48
Data were collected on many different variables of a fast food chain's sandwiches several years ago. Two variables were the serving size (in ounces) of a sandwich and the number of calories in the sandwich. Review the accompanying scatterplot of serving size versus number of calories. What is the correlation coefficient? (Try to figure out the correct answer without calculating the correlation coefficient.) Image: (Concept hw 4, question 21)
+0.80
There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser (duration) and the time between when that eruption ends and the next eruption begins (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. What is the correlation coefficient? (Try to figure out the correct answer without calculating the correlation coefficient.) Image: (Concept hw 4, question 18)
+0.88
In a study conducted to examine the quality of fish after 7 days in ice storage, ten raw fish of the same kind and approximately the same size were caught and prepared for ice storage. The fish were placed in ice storage at different times after being caught. A measure of fish quality was given to each fish after 7 days in ice storage. Review the accompanying sample data and scatterplot, where "Time" is the number of hours after being caught that the fish was placed in ice storage and "Fish Quality" is the measure given to each fish after 7 days in ice storage (higher numbers mean better quality). What is the correlation coefficient? (Try to figure out the correct answer without calculating the correlation coefficient.) Image: (Concept hw 4, question 17)
-0.99
A regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A partial output from the simple linear regression analysis is given below. A hypothesis test is to be performed to determine if weight of cars is a significant predictor of fuel efficiency of cars. Determine the correct way to get the test-statistic in this hypothesis test.
Variable    Coefficient    SE(Coef)
Constant    46.08          3.412
Weight      −4.87          1.339
t = −4.87/1.339 ≈ −3.64 (the estimated slope divided by its standard error)
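A minimal Python sketch of the slope test statistic, using only the numbers in the partial output (n = 37 car models). Under H0 the population slope is 0, so the statistic is the estimated slope divided by its standard error, compared to a t-distribution with n − 2 degrees of freedom.

```python
# Values read from the partial regression output in the question.
n = 37          # number of car models
b1 = -4.87      # estimated slope for Weight
se_b1 = 1.339   # standard error of the slope

t_stat = b1 / se_b1  # test statistic for H0: population slope = 0
df = n - 2           # simple linear regression uses n - 2 degrees of freedom
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

A statistic this far from 0 (about 3.6 standard errors below it) is strong evidence against H0 at any common significance level.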
Suppose two events E and F are disjoint. What is P(E and F)?
0
If the area under the standard normal curve to the left of z=−1.72 is 0.0427, then what is the area under the standard normal curve to the right of z=1.72?
0.0427
In a study conducted to examine the quality of fish after 7 days in ice storage, ten raw fish of the same kind and approximately the same size were caught and prepared for ice storage. The fish were placed in ice storage at different times after being caught. A measure of fish quality was given to each fish after 7 days in ice storage. The sample data are shown below, where "Time" is the number of hours after being caught that the fish was placed in ice storage and "Fish Quality" is the measure given to each fish after 7 days in ice storage (higher numbers mean better quality). The least-squares regression equation is y=8.4425−0.1495x, where x is "Time" and y is predicted "Fish Quality." What is the residual for the first observation, (0, 8.5)?
Time: 0, 0, 2, 3, 5, 6, 7, 9, 11, 12
Fish Quality: 8.5, 8.4, 8.0, 8.1, 7.8, 7.6, 7.3, 7.0, 6.8, 6.7
0.0575
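The sketch below, in plain Python with hypothetical helper names, refits the least-squares line from the ten fish observations, computes the correlation coefficient, and then finds the residual as observed minus predicted. With the data above it should reproduce the given equation (to four decimals) and a correlation near −0.99, and the residuals from the unrounded line sum to zero up to floating-point error.

```python
import math

# Sample data from the question.
hours = [0, 0, 2, 3, 5, 6, 7, 9, 11, 12]
quality = [8.5, 8.4, 8.0, 8.1, 7.8, 7.6, 7.3, 7.0, 6.8, 6.7]

def least_squares(x, y):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

b0, b1, r = least_squares(hours, quality)
print(f"y-hat = {b0:.4f} + ({b1:.4f})x, r = {r:.2f}")

# Residual = observed - predicted, using the rounded equation from the question.
predicted_first = 8.4425 - 0.1495 * hours[0]
residual_first = quality[0] - predicted_first
print(f"residual for (0, 8.5) = {residual_first:.4f}")

# Residuals from the unrounded line sum to zero (up to floating-point error).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(hours, quality)]
print(f"sum of residuals = {sum(residuals):.2e}")
```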
If the area under the standard normal curve between z=−1.46 and z=0 is 0.4279, then what is the area under the standard normal curve between z=0 and z=1.46?
0.4279
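Both answers rely on the symmetry of the standard normal curve about z = 0. A quick check with Python's `statistics.NormalDist` (Python 3.8+) is sketched below; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1

left_tail = std_normal.cdf(-1.72)        # area to the left of z = -1.72
right_tail = 1 - std_normal.cdf(1.72)    # area to the right of z = 1.72
lower_band = std_normal.cdf(0) - std_normal.cdf(-1.46)  # between -1.46 and 0
upper_band = std_normal.cdf(1.46) - std_normal.cdf(0)   # between 0 and 1.46

print(f"left of -1.72: {left_tail:.4f}   right of 1.72: {right_tail:.4f}")
print(f"-1.46 to 0: {lower_band:.4f}   0 to 1.46: {upper_band:.4f}")
```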
Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting for, either North High School or South High School. From the results of his survey, Eric obtained a 95% confidence interval of (0.52,0.68) for the proportion of all adults in the city rooting for North High. What proportion of the 150 adults in the survey said they were rooting for North High School?
0.60
After constructing any relative frequency distribution, what should be the sum of the relative frequencies?
1 or 100%
Suppose the probability that a randomly selected man, aged 55 - 59, will die of cancer during the course of the year is 300/100,000. How would you find the probability that a man in this age category does NOT die of cancer during the course of the year?
1 − 0.003 = 0.997 (complement rule: P(not E) = 1 − P(E), with P(E) = 300/100,000 = 0.003)
The expression zα denotes the z-score with an area of _______ to its left.
1-α
The methods of statistics follow a process. Place the processes in the correct order.
1. Identify the Research Objective
2. Collect the Data Needed to Answer the Research Question(s)
3. Describe the Data
4. Perform Inference
Suppose a four-digit alarm code is formed by choosing digits from 0 to 9, with repetition allowed. Which of the following expressions would be a correct way to count the number of such codes?
10•10•10•10
According to the Empirical Rule, 68% of the area under the normal curve is within one standard deviation of the mean. What percent of the area under the normal curve is more than one standard deviation above the mean?
16%
Classified ads in a newspaper offered 21 used cars of the same make and model for sale. A regression analysis was performed with the age of the car (in years) as the explanatory variable and asking price (in dollars) as the response variable. A 95% confidence interval for the slope of the population regression line is (−1439,−1011). What is the margin of error?
214
A graduate student wanted to estimate the average time spent studying among graduate students at her school. She randomly sampled graduate students from her school and obtained a 99% confidence interval of (17,25) hours/week. What is the margin of error?
4 hours/week
According to the Empirical Rule, 95% of the area under the normal curve is within two standard deviations of the mean. What percent of the area under the normal curve is more than two standard deviations from the mean?
5%
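The Empirical Rule percentages are rounded versions of exact normal-curve areas (about 68.27%, 95.45%, and 99.73%). Because any normal distribution can be standardized, a short Python sketch using `statistics.NormalDist` on the standard normal curve recovers all of the related answers in this set: half of 68% lies between μ and μ + σ, 32% lies beyond one standard deviation, and 16% lies above μ + σ.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    inside = area_within(k)
    outside = 1 - inside
    print(f"within {k} SD: {inside:.2%}, beyond {k} SD: {outside:.2%}, "
          f"above mean + {k} SD: {outside / 2:.2%}")
```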
Researchers studied the mean egg length (in millimeters) for a bird population. After taking a random sample of eggs, they obtained a 95% confidence interval of (45,60). What is the value of the sample mean?
52.5 mm
Researchers studied the mean egg length (in millimeters) for a bird population. After taking a random sample of eggs, they obtained a 95% confidence interval of (45,60). What is the value of the margin of error?
7.5 mm
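Several questions in this set recover a point estimate or margin of error from a confidence interval. Assuming the interval is symmetric (point estimate ± margin of error, as with the z- and t-intervals used here), the estimate is the midpoint and the margin of error is half the width. The Python sketch below applies that to the intervals quoted in this set; the function name is hypothetical.

```python
def estimate_and_margin(lower, upper):
    """Return (point estimate, margin of error) for a symmetric interval."""
    point_estimate = (lower + upper) / 2  # midpoint of the interval
    margin_of_error = (upper - lower) / 2  # half the width
    return point_estimate, margin_of_error

# Intervals quoted in the questions of this study set.
intervals = {
    "entertainment spending ($/week)": (24, 28),
    "proportion rooting for North High": (0.52, 0.68),
    "slope of asking price on age ($/year)": (-1439, -1011),
    "graduate study time (hours/week)": (17, 25),
    "egg length (mm)": (45, 60),
}
for label, (lo, hi) in intervals.items():
    est, moe = estimate_and_margin(lo, hi)
    print(f"{label}: estimate = {est:g}, margin of error = {moe:g}")
```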
A Type II Error is made...
A Type II Error is made when there's not enough evidence to reject the null hypothesis, but the null hypothesis is not true.
Suppose that you have data which indicates that 90% of adults in a nearby town have cell phones. Of those who have cell phones, 30% use Carrier A, 30% use Carrier B, 10% use Carrier C, 20% use Carrier D, 5% use Carrier E, and 5% use other carriers. Would a bar graph or pie chart be better if the goal is to compare Carrier B and Carrier C? Explain.
A bar graph would be better since you are trying to compare two parts, not a part to the whole. The angles might be difficult to judge on a pie chart, making it hard to directly compare two sectors.
Which of the following is a correct explanation of what a confidence interval is?
A confidence interval is a range of values used to estimate the true value of a population parameter. The confidence level is the proportion of such intervals that would contain the population parameter if the estimation process were repeated a large number of times.
Which distribution shape (skewed left, skewed right, or symmetric) is most likely to result in the mean being substantially smaller than the median?
A distribution that is skewed left will likely have a mean that is smaller than the median since the extreme values in the tail tend to pull the mean to the left.
Why is it important that the relationship between the explanatory and response variable be linear when performing a linear regression analysis?
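A linear regression analysis fits a straight line to the points on a scatterplot; if the true relationship is curved, a straight line describes it poorly and its predictions will be systematically off.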
In regression, what is the difference between an observed value of the response variable and its predicted value called?
A residual
It is hypothesized that 50% of Americans attend church regularly. Which of the following would be an example of making a Type I Error?
A study was conducted that had evidence to reject the null hypothesis. In reality, half of Americans actually do attend church regularly.
Which of the following is not a valid explanation of the Law of Large Numbers?
A. The relative frequency approximations for an event tend to get better with more observations.
B. If one looks at the proportion of times an event has occurred over a long period of time (or over a large number of trials), one can be more certain of the likelihood of its occurrence.
C. The more times an experiment is repeated, the closer the relative frequency of an event tends to get to the actual probability of the event.
D. These statements are all valid.
ANSWER: D
Explain why Social Security Number is considered a qualitative variable even though it contains numbers.
Addition and subtraction of Social Security Numbers does not provide meaningful results. This makes it qualitative even though it is numeric.
A student at a large university was interested in how students and faculty felt about the behavior of calling an instructor by their first name. She randomly selected 100 students and 75 faculty and asked each to rate the behavior of calling an instructor by their first name on a scale from 1 to 5, where 1=totally inappropriate and 5=totally appropriate. Call each individual's rating of this behavior the perception score. The student wanted to construct a 95% confidence interval for the difference in the average perception score between students and faculty at this large university. Which condition for using the two-sample t-methods for inference should this student be concerned about?
All conditions are most likely met in this problem. Therefore, this student can make inferences to the populations of interest by using the two-sample t-methods to construct the confidence interval.
A fire insurance company wants to examine the relationship between the amount of fire damage in major residential fires and the distance between the burning house and the nearest fire station. A sample of 15 recent fires in a particular city was taken. The amount of damage to the house (in thousands of dollars) and the distance from the burning house to the nearest fire station (in miles) were recorded. Part of the output from a simple linear regression analysis is provided. Assume all conditions are met for simple linear regression inference. What is the response variable?
Predictor    Coef      SE Coef    T    P
Constant     16.301    6.015
distance     3.554     1.663
S = 9.8101    R-Sq = 26.0%
Amount of damage to the house (in thousands of dollars)
The Empirical Rule tells the approximate percentage of the data which falls into certain ranges. To which distributions does the Empirical Rule apply?
Any normal distribution
State the condition required to use the Empirical Rule to check for unusual observations in a binomial experiment.
As a rule of thumb, if X is binomially distributed, the Empirical Rule can be used when np(1−p)≥10
In sampling without replacement, the assumption of independence required for a binomial experiment is violated. Under what circumstances can we sample without replacement and still use the binomial probability formula to approximate probabilities?
As a rule of thumb, if the sample size is less than 5% of the population size, the trials can be considered nearly independent.
Explain the Law of Large Numbers. How does this law apply to gambling casinos?
As the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome. This applies to casinos because they have a small statistical advantage in each game; over a very large number of bets, the results settle close to that advantage, so casinos make a profit in the long run.
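A small simulation makes the Law of Large Numbers concrete. As a hypothetical example, the Python sketch below uses an even-money bet on red in American roulette, where the player wins with probability 18/38 (18 red pockets out of 38). With few bets the observed win proportion can wander far from 18/38, but with many bets it settles close to it, which is why the casino's small edge becomes a reliable profit. The seed and function name are arbitrary choices for reproducibility.

```python
import random

def running_proportion(p, trials, seed=0):
    """Simulate `trials` independent events with success probability p and
    return the proportion of successes observed."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(trials) if rng.random() < p)
    return successes / trials

p_win = 18 / 38  # probability the player wins an even-money bet on red
for trials in (10, 100, 10_000, 200_000):
    prop = running_proportion(p_win, trials)
    print(f"{trials:>7} bets: player won {prop:.4f} of them (probability {p_win:.4f})")
```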
A p-value is the probability _____________.
A p-value is the probability of observing the actual result, a sample mean, for example, or something more unusual just by chance if the null hypothesis is true.
What is an advantage to using a stem-and-leaf plot instead of a histogram to display data?
A stem-and-leaf plot allows for retrieval of the original data from the plot while the histogram does not.
Which probability method requires that an experiment have equally likely outcomes?
Classical
This occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable not accounted for in the study.
Confounding
Suppose every student in a class is surveyed and it is reported that 75% of the class plans to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Descriptive statistics; The results of the class sample are described without making any generalizations about the population of all students at the school.
A researcher randomly assigns the individuals in a study to groups, intentionally manipulates the value of the explanatory variable and controls other explanatory variables at fixed values, and then records the value of the response variable for each individual.
Designed Experiment
If a researcher wants to claim causation between an explanatory variable and a response variable, which of the following should they use?
Designed experiment
The Addition Rule P(E or F)=P(E)+P(F) applies only to which type of events?
Disjoint
A simple random sample of 150 adults is obtained and each person's red blood cell count (in millions of cells per microliter) is measured. The sample mean is 4.63. The population standard deviation for red blood cell counts is 0.54. Which of the following is true regarding the distribution of sample means?
Even though the distribution of the population data is not given, the distribution of sample means will be approximately normal because the sample size is large enough according to the Central Limit Theorem.
There were 100 random samples of 100 individuals taken from a population which is known to have a moderately skewed distribution with some mean and a known standard deviation. A 95% confidence interval for the population mean was constructed for each of the 100 random samples using the one-sample z-methods. Which of the following statements is true?
Even though the population data are skewed, the distribution of sample means will be approximately normal because of the large sample size. Therefore, approximately 95% of the 100 confidence intervals constructed using the one-sample z-methods will capture the true population mean.
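The claim about roughly 95% coverage can be checked by simulation. The Python sketch below assumes a hypothetical right-skewed population (an exponential distribution with mean 1 and standard deviation 1, standing in for the unspecified skewed population), draws many samples of size 100, builds a one-sample z-interval with the known σ for each, and reports the fraction of intervals that capture the true mean. It uses 2000 replications rather than 100 so the rate is more stable.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(n, reps, seed=1):
    """Fraction of 95% z-intervals that capture the true mean when sampling
    from a right-skewed exponential population with mean 1 and SD 1."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0                   # known population parameters
    z_star = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

print(f"coverage with n = 100: {coverage_rate(n=100, reps=2000):.3f}")
```

Rerunning with a much smaller n (say 5) would typically show coverage drifting below 95%, since the sampling distribution of the mean is then still noticeably skewed.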
Nurses wondered if birth weights of babies are going up. They knew that the average birth weight of a baby last year was 7.6 pounds. A random sample of 15 weights of babies at the hospital where the nurses work gave an average birth weight of 7.9 pounds. Nurses felt that the birth weights this year were normally distributed. Which of the following is true about the distribution of sample means?
Even though the sample size is less than 30, the distribution of sample means will be normal because the population data follow a normal distribution.
There were 100 random samples of 10 individuals taken from a population which is known to have a normal distribution with some mean and a known standard deviation. A 95% confidence interval for the population mean was constructed for each of the 100 random samples using the one-sample z-methods. Which of the following statements is true?
Even though the sample size is small, the distribution of sample means will be normal, since the population data follow a normal distribution. Therefore, approximately 95% of the 100 confidence intervals constructed using the one-sample z-methods will capture the true population mean.
Explain why the mean should not be found for a sample of zip codes. Which measure of center should be used instead?
Even though they are numeric data, zip codes are qualitative since they do not measure or count anything. The mean cannot be found since adding zip codes would be meaningless. For qualitative data, the mode is the only measure of center that can be found.
What must be true for a sample to be considered a simple random sample?
Every member (or sample) must have the same chance of being selected as every other member (or sample of the same size).
In regression, what is predicting outside the range of the x-values from the sample data called?
Extrapolation
Researchers conducted a study and obtained a p-value of 0.75. Based on this p-value, what conclusion should the researchers draw?
Fail to reject the null hypothesis but do not accept the null hypothesis as true either.
A confidence interval indicates how confident we are with the hypothesized value for the population mean.
False
There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser (duration) and the time between when that eruption ends and the next eruption begins (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. The least-squares regression equation is y=33.967+10.358x, where y is the interval from the end of the current eruption to the beginning of the next eruption and x is the duration of the current eruption. For a duration of 4 minutes, y=75.4 minutes. This means that a visitor will have to wait exactly 75.4 minutes after the current eruption ends before the next eruption begins. Is this statement true or false? Image: (Concept hw 4, question 27)
False
Steve calculated a correlation coefficient between gas price and miles driven as −0.15. Steve said there was a strong negative association between gas price and miles driven. Is this statement true or false?
False.
Both of the graphs represent normal distributions with a standard deviation of σ=2. Determine which of the two normal distributions has a mean of μ=8 and which has a mean of μ=14. Explain how you know which is which. Image: (Concept hw 7, question 25)
Graph A has a mean of μ=8. Graph B has a mean of μ=14. A normal curve will be centered over its mean.
Both of the graphs represent normal distributions with a mean of μ=10. Determine which of the two normal distributions has a standard deviation of σ=2 and which has a standard deviation of σ=3. Explain how you know which is which. (Concept hw 7, question 24)
Graph A has a standard deviation of σ=2. Graph B has a standard deviation of σ=3. Since Graph B is a wider graph, it has a greater spread and a larger standard deviation.
Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting for, either North High School or South High School. Of the surveyed adults, 96 said they were rooting for North High while the rest said they were rooting for South High. Eric wants to determine if this is evidence that more than half the adults in this city will root for North High School. Which of the following is the correct null hypothesis?
H0: p=0.5, where p=the proportion of all adults in this city rooting for North High School
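The question only asks for H0, but Eric's counts (96 of 150) are enough to sketch how the right-tailed test of H0: p = 0.5 versus H1: p > 0.5 would continue. The Python sketch below assumes the usual large-sample conditions hold (here n·p0·(1 − p0) = 150·0.5·0.5 = 37.5 ≥ 10) and uses the standard error computed under H0; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Right-tailed one-proportion z-test of H0: p = p0 vs H1: p > p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z(96, 150, 0.5)
print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

A P-value this small would lead Eric to reject H0 at any common significance level.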
Alex hypothesized that, on average, students study less than the recommended two hours per credit hour each week outside of class. Which of the following is the null hypothesis Alex will test?
H0: μx=2 hours per week per credit
A regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A partial output from the simple linear regression analysis is given below. Determine the null hypothesis if a hypothesis test is to be performed to determine if weight of cars is a significant predictor of fuel efficiency of cars.
Variable    Coefficient    SE(Coef)
Constant    46.08          3.412
Weight      −4.87          1.339
H0: weight of cars is not a significant predictor of fuel efficiency of cars, meaning the slope of the regression line in the population is 0.
Alex hypothesized that, on average, students study less than the recommended two hours per credit hour each week outside of class. Which of the following is Alex's alternative hypothesis?
H1: μ<2 hours per week per credit
A commuter has a choice of two routes for his morning drive to work. In an effort to determine the best route, he collects data on his drive time for each route. If he is interested in a route with a predictable drive time, which route should he choose and why? Image: (Concept hw 3, question 33)
He should choose the Rural Roads route. The smaller IQR indicates a smaller spread and more consistent times for that route.
Explain why z-scores would be an appropriate way to compare the heights of the world's tallest man and tallest woman.
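Height distributions for men and women have different centers and spreads, making it difficult to compare male and female heights directly. A z-score expresses each height as the number of standard deviations above or below its own group's mean, which puts both heights on a common scale.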
Benjamin performed a two-tailed one-sample t-test and obtained a p-value=1. What conclusion should he make?
His sample mean must have been exactly equal to his hypothesized value for the population mean.
Suppose that you have data which indicates that 90% of adults in a nearby town have cell phones. Of those who have cell phones, 30% use Carrier A, 30% use Carrier B, 10% use Carrier C, 20% use Carrier D, 5% use Carrier E, and 5% use other carriers. Which of the following is not a reasonable graph to display this information?
Histogram
An association of Realtors reports state-by-state median existing-home prices for each quarter. Why do you suppose they use the median instead of the mean? What might be the disadvantage of reporting the mean?
Home prices are probably skewed to the right and not symmetric. This makes the median a better representation of the center than the mean, which would be influenced by the extremely high-priced homes. Reporting the mean would give the impression that the "typical" home price is higher than it is.
Researchers wondered if brain size has an effect on a person's IQ. From a sample of 20 individuals, the equation of the least-squares regression line is y=71.8+0.0286x, where x represents the size of a brain in cubic centimeters and y represents IQ. What is the interpretation of the y-intercept?
IQ is predicted to be 71.8 for a brain size of 0 cubic centimeters.
Researchers wondered if brain size has an effect on a person's IQ. From a sample of 20 individuals, the equation of the least-squares regression line is y=71.8+0.0286x, where x represents the size of a brain in cubic centimeters and y represents IQ. What is the interpretation of the slope?
IQ is predicted to increase by 0.0286 for every 1 cubic centimeter increase in brain size.
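A minimal Python sketch of the fitted line y = 71.8 + 0.0286x: predicting IQ at two brain sizes one cubic centimeter apart shows that the predictions differ by exactly the slope. The helper name `predict_iq` and the brain size of 1000 cubic centimeters are illustrative choices, not part of the original problem.

```python
# Least-squares line from the brain size / IQ problem: y-hat = 71.8 + 0.0286x
INTERCEPT = 71.8
SLOPE = 0.0286

def predict_iq(brain_size_cc):
    """Predicted IQ for a brain size in cubic centimeters (hypothetical helper)."""
    return INTERCEPT + SLOPE * brain_size_cc

# A 1 cubic centimeter increase changes the prediction by the slope.
change = predict_iq(1001) - predict_iq(1000)
print(round(change, 4))  # 0.0286
```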
Which of the following would increase the width of a confidence interval for a population mean?
Increase the level of confidence
Suppose every student in a class is surveyed and it is found that 75% of the class plans to take another math class. It is reported that 75% of all students at the school plan to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Inferential statistics; the results of the class sample are extended to make a generalization about the population of all students at the school.
Which measure of spread is considered resistant?
Interquartile range
In a typical boxplot, the length of the box indicates which measure of spread?
Interquartile range (IQR)
A freshman in college wanted to determine if the "Freshman 15" is true. That is, this student wanted to determine if freshmen in college gain more than 15 pounds during their freshman year. She randomly selected 50 freshmen during the first week of school at the beginning of the year and weighed them. During finals week of the last term of the year, she weighed the same 50 students. She recorded the weight change of each; a positive value indicated a weight gain, while a negative value indicated a weight loss during the year. Based on her sample, a 95% confidence interval for the average weight change of freshmen during their freshman year is (8.9,12.1) lbs. What conclusion can be made based on this confidence interval?
It appears that the "Freshman 15" is not true. That is, it appears that freshmen do not gain more than 15 pounds during their freshman year, on average, since the upper bound of the interval is less than 15.
Regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A 95% confidence interval for the slope of the regression line is (−7.6,−2.2). Interpret this confidence interval.
It can be said with 95% confidence that the fuel efficiency of a car will decrease, on average, by between 2.2 and 7.6 miles per gallon for every 1000-pound increase in the weight of the car.
The simple linear regression model is yi=β0+β1xi+εi. For what does yi stand?
It is the observed value of the response variable for the ith observation in the population.
The simple linear regression model is yi=β0+β1xi+εi. For what does εi stand?
It is the residual (random error) of the ith observation in the population: the deviation of yi from the population regression line.
The simple linear regression model is yi=β0+β1xi+εi. For what does β1 stand?
It is the slope of the population regression line.
Data were collected on many different variables of a fast food chain's sandwiches several years ago. Two variables were the serving size (in ounces) of a sandwich and the number of calories in the sandwich. Review the accompanying scatterplot of serving size versus number of calories. There are a couple of potential outliers—the sandwich with a serving size of around 6.75 ounces and about 260 calories, and the sandwich with a serving size of around 8.4 ounces and about 720 calories. If these two observations were removed, how would the correlation coefficient change? Image: (Concept hw 4, question 22)
It would increase since the removal of these data points makes the relationship stronger.
If an observation has a residual of 0, which of the following statements is true?
Its predicted value is the same as its observed value.
This is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, these are typically related to explanatory variables considered in the study.
Lurking Variable
When analyzing two quantitative variables, what is the first thing that should be done?
Make a scatterplot.
Data were collected on many different variables of a fast food chain's sandwiches several years ago. Two variables were the serving size (in ounces) of a sandwich and the number of calories in the sandwich. A hungry customer wanted to estimate the number of calories in a sandwich based on its serving size. With this in mind, which variable would go on the y-axis in the scatterplot?
Number of calories goes on the y-axis, since it is the response variable.
A researcher measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, the researcher observes the behavior of individuals in the study and records the values of the explanatory and response variables.
Observational Study
In a normal distribution, approximately 68% of the area under the normal curve is within how many standard deviation(s) of the mean?
One
Classified ads in a newspaper offered 21 used cars of the same make and model for sale. A regression analysis was performed with the age of the car (in years) as the explanatory variable and asking price (in dollars) as the response variable. A 95% confidence interval for the slope of the population regression line is (−1439,−1011). What is the correct interpretation of this confidence interval?
One can be 95% confident that the asking price of used cars of the same make and model will decrease, on average, by between $1011 and $1439 for every one-year increase in the age of the car.
The entire group of individuals to be studied is called a ______.
Population
These allow classification of individuals based on some attribute or characteristic.
Qualitative Variable
These provide numerical measures of individuals. Their values can be added or subtracted and provide meaningful results.
Quantitative Variable
Suppose you want to know if more technical service calls are made to homes with cable television or with satellite dish television. Should you use frequencies or relative frequencies to make the comparison? Why?
Relative frequencies should be used since there is likely a difference in the number of users of cable and satellite television. If you make comparisons using frequencies, the results can be very misleading for different population sizes.
The variable of interest in a study, the one whose values are the outcomes being measured, is the _______.
Response Variable
What factor(s) affect the accuracy of the sample mean as an estimate of the population mean?
Sample size and variability
Jan performed a study and obtained a p-value of 1.24. What conclusion should Jan make?
She made an error since it is not possible to get a p-value of 1.24.
Which measure of center must be equal to an actual data value? Explain why.
Since the mode is the most frequent observation that occurs in the data set, it must be an actual value from the data set.
This histogram shows the heights of 20 students in a statistics class. Explain why it is not appropriate to find summary statistics for this distribution.
Since there appear to be two modes (the two highest peaks in the histogram), this data probably represents men and women and should be split into those two groups before finding any summary statistics.
A researcher wants to assess the effects of taking prenatal vitamins on the health of newborns, using the newborn weight as the response variable. Explain why it might be inappropriate to use a designed experiment to address this research objective.
Since there is a perceived benefit to taking prenatal vitamins, there would be ethical issues in intentionally denying them to some pregnant women.
Which of the following statements is true concerning standardizing data into z-scores?
Standardizing data into z-scores does not change the shape of a distribution of a variable.
This is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. It is also about providing a measure of confidence in any conclusions.
Statistics
Ronnie randomly sampled 80 college students, 50 living in a dorm and the other 30 living in an apartment. She asked each how much they spent on food and beverages (non-alcoholic) within the last 7 days. A 95% confidence interval for the difference in the mean amount spent on food and drinks over the past 7 days between students living in a dorm and students living in an apartment (dorm−apartment) is (−$25.80,−$11.20). Which of the following is true regarding the difference in sample means?
Students living in an apartment spent $18.50 more, on average, on food and drinks over the past 7 days than students living in a dorm.
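The point estimate sits at the midpoint of a confidence interval, and the margin of error is half its width. A short Python sketch (the helper name `center_and_margin` is hypothetical) recovers the $18.50 difference above and also the margins of error for the (24, 28) and (65, 71) intervals that appear elsewhere in this set.

```python
def center_and_margin(lower, upper):
    """Return (point estimate, margin of error) for a confidence interval."""
    center = (lower + upper) / 2   # midpoint = point estimate
    margin = (upper - lower) / 2   # half the width = margin of error
    return center, margin

# dorm - apartment, in dollars: a negative center means apartments spent more
diff_center, diff_margin = center_and_margin(-25.80, -11.20)
print(round(diff_center, 2), round(diff_margin, 2))  # -18.5 7.3
print(center_and_margin(24, 28))  # (26.0, 2.0)
print(center_and_margin(65, 71))  # (68.0, 3.0)
```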
Suppose that a binomial random variable X is counting the number of patients with cancer at a particular hospital. How will "success" be defined in this situation?
Success would be defined as selecting a patient at the hospital who has cancer.
Days before a presidential election, an article based on a nationwide random sample of registered voters reported the following statistic: "52% (±3%) of registered voters will vote for Robert Smith." What is the "±3%" called?
The "±3%" is called the margin of error.
Identify the properties of Student's t-distribution.
The area under the curve is 1; half the area is to the right of 0 and half the area is to the left of 0. As the sample size n increases, the distribution (and the density curve) of the t-distribution becomes more like the standard normal distribution. As t gets extremely large, the graph approaches, but never equals, zero. Similarly, as t gets extremely small (negative), the graph approaches, but never equals, zero. It is symmetric around t=0.
If a professor adds 10 points to each student's final exam score, how will it affect the distribution of final exam scores?
The center will change, but the shape and spread will remain the same.
What is wrong with the following class limits for organizing weight data for a sample of 200 adult men in the United States? 140-150 pounds, 150-160 pounds, 160-170 pounds, 170-180 pounds, 180-190 pounds, 190-200 pounds, 200-210 pounds, 210-220 pounds, 220-230 pounds
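The classes are overlapping. For example, a man who weighs exactly 150 pounds could be placed in either the 140-150 class or the 150-160 class.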
Suppose an experiment consists of rolling a fair die ten times and recording the number of sevens obtained. If event E is defined as getting at least one 7, how would you describe the complement of E?
The complement of event E is the event that no sevens are obtained.
April calculated a correlation coefficient between sex and GPA as −0.25. She said there is a weak correlation between a person's sex and their GPA. Which of the following is an appropriate comment about April's statement?
The correlation coefficient does not make sense to describe the relationship between a categorical and a quantitative variable.
What is the definition of the correlation coefficient?
The correlation coefficient is a measure that describes the direction and strength of the linear relationship between two quantitative variables.
A chi-square test is being performed to test the null hypothesis that the distribution of eye color is the same for both males and females. In order for conclusions from the chi-square test to be valid to all males and females, which of the following does not have to be true?
The distribution of sample means is approximately normal.
Birth weights in the United States are normally distributed with σx=500 grams. A random sample of 15 babies was taken. The average birth weight of these 15 babies was 3450 grams. Which of the following is true regarding the distribution of sample means?
The distribution of sample means will be normal since birth weights of all babies are normally distributed.
Which of the statements below is true concerning bar graphs?
The height of each bar represents the category's frequency or relative frequency.
A high correlation coefficient indicates that the relationship between the two quantitative variables must be linear.
The statement is false.
A 95% confidence interval for μ1−μ2 using the two-sample t-methods with 40 degrees of freedom is (65,71). The margin of error is _____________.
The margin of error is 3.
Suppose there are ten five- and six-year-olds attending a birthday party. When a 30-year-old mother walks into the room with an infant in her arms, what happens to the mean age in the room? What happens to the standard deviation of ages in the room?
The mean and standard deviation will both change.
What is the mean of a probability distribution?
The mean is the expected value of the random variable.
Which of the following is a property of the standard normal curve, but not necessarily a property of every normal curve?
The mean is zero and the standard deviation is one.
Identify which statement about the mean of a discrete random variable is not true or state that they are all true.
The mean must be a possible value of the random variable.
Suppose a normal model has a standard deviation of σ=10, and 40% of the values are below 75. Which of the following must be true about the mean? You should be able to answer without doing any calculations.
The mean must be greater than 75.
Suppose a normal model has a standard deviation of σ=10, and 60% of the values are below 75. Which of the following must be true about the mean? You should be able to answer without doing any calculations.
The mean must be less than 75.
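These two answers can be checked with Python's `statistics.NormalDist` (Python 3.8+): if a proportion p of values lies below x, then x = μ + zσ, where z is the standard normal quantile for p, so μ = x − zσ. The helper `mean_from_percentile` is an illustrative name, not part of the original problems.

```python
from statistics import NormalDist

def mean_from_percentile(x, proportion_below, sigma):
    """Solve x = mu + z * sigma for mu, where z is the standard normal quantile."""
    z = NormalDist().inv_cdf(proportion_below)  # negative when proportion_below < 0.5
    return x - z * sigma

print(round(mean_from_percentile(75, 0.40, 10), 2))  # 77.53 (greater than 75)
print(round(mean_from_percentile(75, 0.60, 10), 2))  # 72.47 (less than 75)
```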
The following stem-and-leaf plot shows the daily high temperature in a town on April 1st for twenty-four random years. Would you expect the mean to be higher or lower than the median? Explain.
5 | 1 1 2 2 3 4 6 6 6 7 8
6 | 0 0 1 2 4 4 9
7 | 2 3 6
8 | 1 2
9 | 0
The mean should be higher than the median, since the distribution is skewed right.
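Reading the plot with the stem as the tens digit and the leaf as the units digit (an assumption about the plot's key, which is not shown), a short Python sketch confirms the answer: the long right tail pulls the mean above the median.

```python
from statistics import mean, median

# Stem-and-leaf plot from the question (assumed key: stem = tens digit, leaf = units digit)
plot = {
    5: [1, 1, 2, 2, 3, 4, 6, 6, 6, 7, 8],
    6: [0, 0, 1, 2, 4, 4, 9],
    7: [2, 3, 6],
    8: [1, 2],
    9: [0],
}
temps = [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]

print(len(temps))             # 24
print(round(mean(temps), 2))  # 62.92
print(median(temps))          # 60.0
```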
If a professor adds 10 points to each student's final exam score, how will it affect the class mean on the final exam?
The mean will increase by 10 points.
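A minimal sketch with three hypothetical exam scores shows both effects of adding 10 points: the mean shifts up by 10, while the standard deviation (spread) is unchanged.

```python
from statistics import mean, stdev

scores = [70, 80, 90]              # hypothetical exam scores
curved = [s + 10 for s in scores]  # add 10 points to every score

print(mean(scores), mean(curved))    # 80 90
print(stdev(scores), stdev(curved))  # 10.0 10.0
```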
Suppose, on the warmest day of the month, the daily high temperature in a city is accidentally recorded as 700 instead of 70 degrees Fahrenheit. Compare the effect this mistake will have on the mean monthly high temperature to the effect on the median monthly high temperature.
The mean will increase significantly, but the median will not change as a result of the mistake.
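To illustrate, here is a sketch using a hypothetical five-day list of daily highs (shortened from a full month for readability): recording 70 as 700 drags the mean far upward, while the median stays put because it depends only on the middle value.

```python
from statistics import mean, median

highs = [62, 65, 66, 68, 70]  # hypothetical daily highs; 70 is the warmest day
typo = [62, 65, 66, 68, 700]  # warmest day recorded as 700 by mistake

print(mean(highs), median(highs))  # 66.2 66
print(mean(typo), median(typo))    # 192.2 66
```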
A sample of thirty users of a popular social networking site yielded the histogram on the right for the number of friends. Which measure of central tendency better describes the "center" of the distribution? Image: (histogram with its tallest bars on the left and a long tail to the right, i.e., skewed right)
The median is a better measure of the center of the data since the distribution is skewed to the right.
How can you tell from a boxplot if the distribution is skewed right?
The median is to the left of the center of the box, and the right whisker is substantially longer than the left whisker.
Allie calculated a correlation coefficient of −0.5. She made a mistake in her calculation since the correlation coefficient cannot be negative.
The statement is false.
Describe the null and alternative hypotheses.
The null hypothesis typically contains an equality while the alternative hypothesis will contain an inequality.
A fire insurance company wants to examine the relationship between the amount of fire damage in major residential fires and the distance between the burning house and the nearest fire station. A sample of 15 recent fires in a particular city was taken. The amount of damage to the house (in thousands of dollars) and the distance from the burning house to the nearest fire station (in miles) were recorded. Part of the output from a simple linear regression analysis is provided. Assume all conditions are met for simple linear regression inference. In words, what is the correct alternative hypothesis to determine if the distance from the burning house to the nearest fire station is a significant predictor of damage to a burning house?
Predictor | Coef | SE Coef | T | P
Constant | 16.301 | 6.015 | |
distance | 3.554 | 1.663 | |
S = 9.8101, R-Sq = 26.0%
The number of miles the burning house is from the nearest fire station helps explain the amount of damage to the burning house.
Suppose you want to count the number of four-letter passwords that can be formed using the letters in the word NUMBER. The four letters in the password must all be different. Which of the following expressions is true concerning the number of such passwords?
The number of passwords can be counted by using 6•5•4•3 or 6P4.
Suppose 5 objects are to be chosen from 10 distinct objects with no repetition allowed. Which will be larger, the number of permutations or the number of combinations?
The number of permutations will be larger since there are more ways to choose if different orderings are counted separately.
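Both counting answers can be checked with `math.perm` and `math.comb` (available in Python 3.8+). The last line shows why permutations outnumber combinations: each unordered group of 5 can be arranged in 5! orders.

```python
import math

# Four-letter passwords from the six distinct letters N, U, M, B, E, R, no repeats
print(6 * 5 * 4 * 3, math.perm(6, 4))  # 360 360

# Choosing 5 of 10 distinct objects
print(math.perm(10, 5))  # 30240 (order matters)
print(math.comb(10, 5))  # 252 (order ignored)
print(math.perm(10, 5) == math.comb(10, 5) * math.factorial(5))  # True
```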
Junior wondered how students felt about a proposed increase in student fees. He orally surveyed a group of students waiting in line for food at the food court. He asked their year in school (freshman, sophomore, junior, senior, graduate) and whether or not each supported the fee increase. Why should Junior question the results from a chi-square test?
The opinion of the group of students surveyed may not be representative of the opinion of all students at that university, and the opinions in the sample may not be independent.
What does it mean to say that the trials in a binomial experiment are independent of each other?
The outcome of one trial does not affect the outcomes of the other trials.
Suppose x̄ = 60, H0: μ = 50, HA: μ > 50, and the p-value from a one-sample test is 0.04. What does this p-value mean?
The probability of getting a sample mean of 60 or more if the true population mean is 50 is 0.04.
State an advantage and a disadvantage of using the range instead of the variance as a measure of dispersion in sample data.
The range is easier to calculate, but it uses only the two most extreme values, so it is heavily affected by extreme values in the data set.
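A small sketch with a hypothetical sample of eight values compares three measures of spread before and after one value is replaced by an outlier. The range and variance both jump, while the interquartile range, the resistant measure, does not change. The quartiles come from `statistics.quantiles`, whose default "exclusive" method is one of several quartile conventions, so textbook or calculator quartiles may differ slightly.

```python
from statistics import quantiles, variance

data = [1, 2, 3, 4, 5, 6, 7, 8]           # hypothetical sample
with_outlier = [1, 2, 3, 4, 5, 6, 7, 80]  # same sample, largest value replaced by 80

def spread(values):
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    return {
        "range": max(values) - min(values),
        "variance": variance(values),   # sample variance
        "IQR": q3 - q1,
    }

print(spread(data))          # range 7, IQR 4.5
print(spread(with_outlier))  # range 79, IQR still 4.5, much larger variance
```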
When looking at a scatterplot of two quantitative variables, what do we typically look for?
The relationship between the two variables and if there are any deviations from the pattern (outliers or clusters of points, for example).
Mark performed a two-sample z-test for proportions to test the hypothesis that there was no difference in the proportion who support increasing student fees between male and female students at a particular university. Mark obtained a z-statistic of 0. Based on this information, which of the following is always true?
The sample proportions will be the same for both male and female students at this university.
A community college school board is negotiating a new contract with the college faculty. The distribution of faculty salaries is skewed right by several faculty members who make over $100,000 per year. If the school board wants to give the community the impression that the faculty are already overpaid, should they advertise the mean or median of the faculty salaries?
The school board should use the mean to make their argument. The mean will be higher than the median since it will be influenced by the few high salaries.
There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser (duration) and the time between when that eruption ends and the next eruption begins (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. The least-squares regression equation is y=33.967+11.358x, where y is the interval from the end of the current eruption to the beginning of the next eruption and x is the duration of current eruption. In this equation, what is 11.358? Image: (Concept hw 4, question 28)
The slope of the least-squares regression line
A medical study was investigating if getting a flu shot actually reduced the risk of developing the flu. From a group of adult volunteers, researchers randomly assigned half to receive an injection that contained the drug believed to reduce the risk of getting the flu and the other half to receive an injection containing no active ingredient (i.e. sugar water). A hypothesis test was performed and a p-value of 0.0002 was obtained. Which of the following statements is true?
The small p-value indicates strong evidence to reject the null hypothesis. Because an experiment was performed, it can be concluded that the reduction in the risk of getting the flu was caused by the flu shot.
If all the data values in a set are identical, what can you conclude about the standard deviation?
The standard deviation is zero.
Suppose a normal model has a mean of μ=100, and 95% of the values are between 90 and 110. Which of the following must be true about the standard deviation?
The standard deviation must be approximately equal to 5.
Suppose a normal model has a mean of μ=100, and 50% of the values are between 90 and 110. Which of the following must be true about the standard deviation?
The standard deviation must be greater than 10.
Suppose a normal model has a mean of μ=100, and 80% of the values are between 90 and 110. Which of the following must be true about the standard deviation?
The standard deviation must be less than 10.
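The three standard-deviation answers follow from one relationship: if a share p of a normal model lies within μ ± 10, then 10 = zσ, where z is the standard normal value that leaves (1 − p)/2 in each tail. A sketch with `statistics.NormalDist` (the helper name `sigma_from_central_share` is hypothetical):

```python
from statistics import NormalDist

def sigma_from_central_share(half_width, share):
    """sigma such that `share` of a normal model lies within mu +/- half_width."""
    z = NormalDist().inv_cdf((1 + share) / 2)  # about 1.96 for share = 0.95
    return half_width / z

for share in (0.95, 0.50, 0.80):
    print(share, round(sigma_from_central_share(10, share), 2))
# 0.95 5.1   (approximately 5)
# 0.5 14.83  (greater than 10)
# 0.8 7.8    (less than 10)
```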
Although rare, it is possible to get a p-value from a two-sided test greater than 1.
The statement is false.
A correlation coefficient close to 1 is evidence of a cause-and-effect relationship between the two variables.
The statement is false.
Determine if the following statement is true or false. Benjamin was investigating the relationship between outside temperature on a given day and number of hours spent outside that day. After sampling 25 people on 25 different days, he obtained the displayed scatterplot. He should use the correlation coefficient to describe the strength of the relationship between temperature and hours spent outside. Image: (Concept hw 4, question 2)
The statement is false.
Heather was investigating the relationship between outside temperature and type of activity people were engaged in (indoor versus outdoor). She can use the correlation coefficient to describe the strength of this relationship as long as the relationship is linear.
The statement is false.
Researchers conducted a study and obtained a p-value of 0.30. Because the p-value is quite high, there is evidence to accept the null hypothesis.
The statement is false.
The population will be normally distributed if the sample size is 30 or more.
The statement is false.
Identify the requirements for a discrete probability distribution.
The sum of the probabilities must equal one. Each probability must be between zero and one inclusive.
Which of the following is not a criterion for the binomial distribution?
The trials must be dependent.
What is wrong with the following definition of the correlation coefficient? The correlation coefficient measures the strength and direction of the linear relationship between two variables.
The two variables must be quantitative.
A graduate student wanted to estimate the average time spent studying among graduate students at her school. She randomly sampled graduate students from her school and obtained a 99% confidence interval of (17,25) hours/week. Which of the following would be true if the level of confidence was lowered to 95%?
The width of the confidence interval would be smaller.
A research organization wanted to estimate the average number of hours a college student sleeps per night during the school year. After randomly sampling 150 college students, the research organization determined the following 95% confidence interval: (7.1 hours/night, 7.5 hours/night). What would happen to the width of the confidence interval if the level of confidence increased (assuming everything else remained the same)?
The width of the confidence interval would increase.
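A sketch of why confidence level and width move together, under a simplifying assumption: it treats both intervals as z-intervals with the same center and standard error. The actual course problems may use t critical values, but the direction of the change is the same. It rebuilds the (17, 25) interval at 95% and the (7.1, 7.5) interval at 99%. The helper name `recompute_interval` is hypothetical.

```python
from statistics import NormalDist

def recompute_interval(lower, upper, old_level, new_level):
    """Rebuild a z-based interval at a new confidence level (same center and SE)."""
    center = (lower + upper) / 2
    z_old = NormalDist().inv_cdf((1 + old_level) / 2)
    z_new = NormalDist().inv_cdf((1 + new_level) / 2)
    standard_error = (upper - lower) / 2 / z_old  # margin = z * SE
    margin = z_new * standard_error
    return center - margin, center + margin

lo95, hi95 = recompute_interval(17, 25, 0.99, 0.95)
print(round(lo95, 2), round(hi95, 2))  # narrower than (17, 25)
lo99, hi99 = recompute_interval(7.1, 7.5, 0.95, 0.99)
print(round(lo99, 2), round(hi99, 2))  # wider than (7.1, 7.5)
```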
Explain why it is misleading to use the term "average" to describe your typical bowling score.
The word "average" is ambiguous and can refer to any measure of center. It is better to use the specific measure of center you intend (mean, median, or mode).
If someone's gross annual income has a z-score of positive 2, what can be concluded?
Their income is 2 standard deviations above the mean income.
Gina calculated a correlation coefficient between hours studied and grade point average as +0.75. Which of the following is a correct statement based on this correlation coefficient?
There is a fairly strong positive relationship between hours studied and grade point average, indicating that grade point averages tend to be higher for students who study more.
Which of the following statements best describes this scatterplot? Image: (Concept hw 4, question 11)
There is a negative, moderately strong relationship between X and Y with one outlier.
Data was collected on the heights of boys at 12 and 24 months of age. The data is summarized in the following boxplots. Is there more variation in the boys' heights at 12 months or 24 months? Explain how you can tell from the boxplots. Image: (Concept hw 3, question 32)
There is more variation in height at 24 months. The box length is longer, indicating a larger interquartile range and greater spread in the data.
What does a correlation coefficient of 0 indicate?
There is no linear relationship between the two quantitative variables.
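The word "linear" matters here. In the sketch below, the hypothetical points y = x² for x = −2, …, 2 follow a perfect curved pattern yet give r = 0, while points on a straight line give r = 1. The Pearson formula is written out by hand so it runs on any Python 3 version.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length quantitative lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [x ** 2 for x in xs]))     # 0.0: perfect curve, no linear relationship
print(pearson_r(xs, [3 * x + 1 for x in xs]))  # 1.0: perfect linear relationship
```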
A regression was performed on test data for 37 car models to examine the association between the weight (thousands of pounds) of the car and the fuel efficiency (miles per gallon (MPG)). A partial output from the simple linear regression analysis is given below. A hypothesis test is to be performed to determine if weight of cars is a significant predictor of fuel efficiency of cars. Determine the correct conclusion based on the results of the hypothesis test.
Variable | Coefficient | SE(Coef)
Constant | 46.08 | 3.412
Weight | −4.87 | 1.339
There is strong evidence to indicate weight of cars is a significant predictor of fuel efficiency since the p-value from the hypothesis test is quite small (<0.001).
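The test statistic for H0: β1 = 0 is the slope estimate divided by its standard error, t = b1 / SE(b1), with n − 2 degrees of freedom. The sketch below computes it for the car-weight output (n = 37) and for the fire-damage output (n = 15), whose T and P columns are blank. The p-value then comes from a t table or software, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. That step is left out here so the sketch uses only the standard library, and the helper name is hypothetical.

```python
def slope_t_statistic(coef, se_coef, n):
    """t = b1 / SE(b1) for H0: beta1 = 0, with n - 2 degrees of freedom."""
    return coef / se_coef, n - 2

t_weight, df_weight = slope_t_statistic(-4.87, 1.339, n=37)
print(round(t_weight, 3), df_weight)  # -3.637 35

t_distance, df_distance = slope_t_statistic(3.554, 1.663, n=15)
print(round(t_distance, 3), df_distance)  # 2.137 13
```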
What does the 95% represent in a 95% confidence interval?
The 95% represents the proportion of intervals that would contain the parameter (for example, the population mean or population proportion) if a large number of different samples is obtained.
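This long-run meaning can be simulated. The sketch below assumes a hypothetical normal population (μ = 50, σ = 10, σ known), draws many samples of size 30, builds a 95% z-interval from each, and reports the fraction of intervals that capture μ. The result should land near 0.95. All parameter values are illustrative.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(trials=2000, n=30, mu=50.0, sigma=10.0, level=0.95, seed=1):
    """Fraction of z-intervals (sigma known) that capture the true mean mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(coverage_rate())  # close to 0.95
```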
Doug wondered how students felt about a proposed increase in student fees. He randomly sampled 100 freshmen, 100 sophomores, 100 juniors, and 100 seniors at his university. He asked each whether or not they supported the fee increase. Assuming all conditions are satisfied, which test should Doug use to test the hypothesis that the distribution of support for the fees is the same for all four classes?
The chi-square test of homogeneity
Brett is a huge sports fan. He wondered if there was a relationship between someone's favorite sport and where they lived. He randomly sampled 500 American sports fans and asked each what their favorite sport was (football, baseball, basketball, hockey, or other) and what part of the country they lived in (Northeast, Southeast, Midwest, Rocky Mountains, Pacific Coast). Assuming all conditions are satisfied, which of the following tests should Brett use to test his hypothesis?
The chi-square test of independence
Brett is a huge sports fan. He hypothesized half of sports fans liked football the best, one-quarter liked baseball the best, 15% liked basketball the best, and 5% liked hockey the best, and the rest liked some other sport the best. He surveyed 100 sports fans and asked what sport they liked the best. Assuming all conditions are satisfied, which of the following tests should Brett use to test his hypothesis?
The goodness-of-fit chi-square test
Suppose increasing the sample size will not change the sample mean or the standard deviation. What will happen to the p-value by increasing the sample size?
The p-value will decrease.
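A sketch of why, assuming a one-sided z-test with a known σ (the question does not say which test; a t-test behaves the same way). The values x̄ = 52, μ0 = 50, and σ = 10 are hypothetical. The standard error σ/√n shrinks as n grows, so the same gap between x̄ and μ0 produces a larger test statistic and a smaller p-value.

```python
from statistics import NormalDist

def upper_tail_p_value(xbar, mu0, sigma, n):
    """One-sided z-test p-value for HA: mu > mu0 (sigma treated as known)."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)

for n in (25, 100, 400):
    print(n, upper_tail_p_value(52, 50, 10, n))  # the p-value shrinks as n grows
```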
If a z-score is equal to zero, which of the following must be true?
The x-value must be equal to the mean of the distribution.
If the area to the left of a z-score is equal to 0.5, what must be true?
The z-score must be equal to zero.
If the area to the left of a z-score is less than 0.5, what must be true?
The z-score must be negative.
Suppose you want to calculate the z-score for your height. How will the z-scores compare if you use your height in inches versus centimeters?
The z-scores will be the same regardless of the unit used for your height because z-scores are unitless.
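A quick check with hypothetical numbers (a 70-inch height, a mean of 69 inches, a standard deviation of 3 inches): converting every quantity to centimeters multiplies the numerator and denominator of z by the same 2.54, so the factor cancels.

```python
def z_score(x, mean, sd):
    return (x - mean) / sd

CM_PER_INCH = 2.54
# Hypothetical: height 70 in, population mean 69 in, standard deviation 3 in
z_inches = z_score(70, 69, 3)
z_cm = z_score(70 * CM_PER_INCH, 69 * CM_PER_INCH, 3 * CM_PER_INCH)
print(round(z_inches, 4), round(z_cm, 4))  # 0.3333 0.3333
```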
Determine if the following statement is true or false. If it is false, explain why. A p-value is the number of standard deviations an observation is from the mean.
This statement is false. The definition given is for the z-statistic. A p-value is the probability of observing a value of a statistic or a value that is more unusual just by chance if the null hypothesis is true.
Determine if the following statement is true or false. If it is false, explain why. A p-value is the probability that the null hypothesis is true.
This statement is false. The null hypothesis will either be true or it won't be true; there is no probability associated with this fact. A p-value is the probability of observing a sample mean (for example) that we did or something more unusual just by chance if the null hypothesis is true.
Determine if the following statement is true or false. If it is false, explain why. A p-value is the probability of accepting the null hypothesis.
This statement is false. We never accept the null hypothesis no matter what the p-value is. A p-value is the probability of observing a sample mean (for example) that we did or something more unusual just by chance if the null hypothesis is true.
Explain how to find the mean of a discrete random variable.
To find the mean of a random variable, multiply each value of the random variable by its probability and then add those products.
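A sketch using a fair six-sided die as the example distribution (an illustrative choice). It first checks the two requirements for a discrete probability distribution, then computes the mean as Σ x·P(x). Exact fractions avoid rounding. The result, 7/2, is not a possible roll, which is why the mean need not be a possible value of the random variable.

```python
from fractions import Fraction

# Fair six-sided die: each face 1..6 has probability 1/6
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Requirements for a discrete probability distribution
assert all(0 <= p <= 1 for p in distribution.values())  # each probability in [0, 1]
assert sum(distribution.values()) == 1                  # probabilities sum to one

# Mean (expected value): multiply each value by its probability, then add
expected_value = sum(x * p for x, p in distribution.items())
print(expected_value)  # 7/2
```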
In regression, a residual can be negative. Is this statement true or false?
True
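A sketch with five hypothetical data points fits the least-squares line by hand and lists the residuals (observed minus predicted). Some residuals are negative (points below the line), and together they sum to 0, up to floating-point rounding.

```python
def least_squares(xs, ys):
    """Slope and intercept of the least-squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
b1, b0 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # observed - predicted
print(round(b0, 2), round(b1, 2))        # 2.2 0.6
print([round(r, 2) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)        # True
```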
True or false? A histogram and a relative frequency histogram, constructed from the same data, always have the same basic shape.
True. A relative frequency histogram will have a different scale on the y-axis but the same shape as a regular histogram.
In a normal distribution, approximately 95% of the area under the normal curve is within how many standard deviation(s) of the mean?
Two
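The Empirical Rule percentages can be checked against the exact areas under the standard normal curve with `statistics.NormalDist` (Python 3.8+). The last line gives the area between the mean and one standard deviation above it, half of the 68%.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    print(k, round(z.cdf(k) - z.cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Between the mean and one standard deviation above it
print(round(z.cdf(1) - z.cdf(0), 4))  # 0.3413
```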
A research organization keeps track of what citizens think is the most important problem facing the country today. They randomly sampled a number of people in 2003 and again in 2009, using a different random sample of people in 2009 than in 2003, and asked them to choose the most important problem facing the country today from the following choices: war, economy, health care, or other. Which of the following is the correct test to use to determine if the distribution of "problem facing this country today" is different between the two years?
Use a chi-square test of homogeneity.
The characteristics of the individuals within the population
Variable
Many drivers of cars that can run on regular gas actually buy premium in the belief that they will get better gas mileage. To test that belief, 10 cars in a company fleet were used. All the cars run on regular gas. Each car was filled first with either regular or premium gasoline, decided by a coin toss. The mileage for that tank of gas was recorded. Then the car was filled with the other type of gasoline and the mileage for that tank of gas was recorded. The difference in gas mileage between the two types of gasoline (premium−regular) for the 10 cars was recorded. A 95% confidence interval was constructed using the paired t-methods: (0.5,3.5) mpg. Which of the following is the correct interpretation of this confidence interval?
We are 95% confident that the average increase in gas mileage when using premium rather than regular gas is between 0.5 and 3.5 miles per gallon.
Ronnie randomly sampled 80 college students, 50 living in a dorm and the other 30 living in an apartment. She asked each how much they spent on food and beverages (non-alcoholic) within the last 7 days. A 95% confidence interval for the difference in the mean amount spent on food and drinks over the past 7 days between students living in a dorm and students living in an apartment (dorm−apartment) is (−$25.80,−$11.20). Which of the following is a correct interpretation of this confidence interval?
We are 95% confident that college students living in dorms spent between $11.20 and $25.80 less on food and drinks over the past 7 days, on average, than college students living in apartments.
Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting for, either North High School or South High School. From the results of his survey, Eric obtained the following 95% confidence interval for the proportion of all adults in the city rooting for North High, (0.52,0.68). Interpret this confidence interval.
We are 95% sure that between 52% and 68% of all adults in this city will root for North High School.
A graduate student wanted to estimate the average time spent studying among graduate students at her school. She randomly sampled graduate students from her school and obtained a 99% confidence interval of (17.3,22.5) hours/week. In the context of the problem, which of the following interpretations is correct?
We are 99% sure that the average amount of time spent studying among graduate students at this student's school is between 17.3 and 22.5 hours per week.
Suppose that in a certain community, the probability of a randomly selected individual having red hair is 0.08 and the probability of a randomly selected individual being left-handed is 0.15. What additional information would be needed to find the probability of randomly selecting an individual in this community who has red hair or is left-handed?
We would need to know the percentage of individuals in the community who have red hair and are left-handed.
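The missing piece is the joint probability in the addition rule, P(A or B) = P(A) + P(B) − P(A and B). A minimal sketch follows; P(red hair) = 0.08 and P(left-handed) = 0.15 come from the question, while the joint value 0.02 is a hypothetical number used only to show the arithmetic.

```python
# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).

def prob_union(p_a, p_b, p_a_and_b):
    """Probability that A or B (or both) occurs."""
    return p_a + p_b - p_a_and_b

p_red = 0.08            # from the question
p_left = 0.15           # from the question
p_red_and_left = 0.02   # HYPOTHETICAL: this is the information the question says is missing

print(round(prob_union(p_red, p_left, p_red_and_left), 4))  # 0.21
```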
In a chi-square test, when would the observed data agree exactly with the null hypothesis?
When all observed counts are the same as their expected counts
When will a chi-square statistic be 0?
When all observed counts are the same as their expected counts
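The chi-square statistic adds up (observed − expected)² / expected over every cell, so it is 0 exactly when every observed count matches its expected count. A small Python sketch with illustrative counts:

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Every observed count equals its expected count, so each term is 0.
print(chi_square_statistic([25, 50, 25], [25, 50, 25]))  # 0.0
# Any mismatch makes the statistic positive.
print(chi_square_statistic([30, 10], [20, 20]))  # 10.0
```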
Identify when the interquartile range is better than the standard deviation as a measure of dispersion and explain its advantage.
When the distribution is skewed left or right or contains some extreme observations, then the interquartile range is preferred since it is resistant.
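To see what "resistant" means in practice, here is a sketch using the text-message counts from the Elyse question later in this list, once without and once with the extreme value 88. Quartiles are computed with Python's `statistics.quantiles` default method; other quartile conventions give slightly different numbers.

```python
import statistics

def iqr(data):
    """Interquartile range Q3 - Q1 using statistics.quantiles' default method."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

without_outlier = [21, 22, 24, 26, 26, 29, 32, 32, 33]
with_outlier = without_outlier + [88]

for label, data in [("without 88", without_outlier), ("with 88", with_outlier)]:
    print(label, "SD =", round(statistics.stdev(data), 2), "IQR =", iqr(data))
```

The standard deviation jumps from about 4.5 to about 19.7 when 88 is added, while the IQR barely moves (9.0 to 8.75).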
When are conclusions said to be "statistically significant"?
When the p-value is less than a given significance level
Tammie wondered how her friends felt about their cell phone service. She randomly selected 10 of her friends who used company A and another 10 of her friends who used company B and asked if they felt their service was excellent, good, fair, or poor. Why should Tammie not use the chi-square test?
With such small sample sizes, there is no way for all the expected counts to be at least 5.
A simple random sample of 1500 birth weights was taken from all birth weights last year. The average birth weight from this sample was 3433 grams. Researchers knew that the standard deviation of all birth weights was 495 grams. Assuming all birth weights in the sample are independent of each other, are the other conditions satisfied to use the one-sample z-methods to construct a confidence interval?
Yes. The sample is representative of the population of all birth weights since a random sample was taken and the distribution of sample means will be approximately normal since the sample size is more than 30.
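With the conditions met, the interval itself is x̄ ± z*·σ/√n. The question does not state a confidence level, so the sketch below assumes 95%.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """One-sample z confidence interval for a mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# 95% is an ASSUMED confidence level; the other numbers come from the question.
low, high = z_interval(3433, 495, 1500)
print(round(low, 1), round(high, 1))
```

The margin of error works out to roughly 25 grams, so the interval is about 3408 to 3458 grams.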
Suppose a student earns a 75 on his statistics exam, and his grade has a z-score of 1.5. Since the class did not perform well on the exam, the professor announces that she will adjust the grades by adding 10 points to each score. How will this adjustment change the student's z-score?
His z-score will not change, since the adjustment shifts the entire distribution of scores but does not change the relative position of his score in the class.
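A quick check of this in Python: adding the same constant to every score raises the mean by that constant and leaves the standard deviation unchanged, so every z-score stays the same. The scores are hypothetical.

```python
import statistics

def z_scores(scores):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]

scores = [52, 58, 61, 64, 67, 75]   # HYPOTHETICAL exam scores
curved = [x + 10 for x in scores]   # add 10 points to each score

before = z_scores(scores)
after = z_scores(curved)
print([round(z, 3) for z in before])
print([round(z, 3) for z in after])  # same values as the line above
```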
Rejecting the null hypothesis when the null hypothesis is true is called _____________.
a Type I Error.
Typically, the direction (>,<, or ≠) used in the _______ hypothesis is determined from the question of interest.
alternative
Elmo likes music. He wondered if listening to music while studying will improve scores on an exam. Fifty students who were to take the midterm in a week agreed to be part of a study. Half were randomly assigned to listen to classical music while studying for the exam. The other half were told not to listen to any music while studying for the exam. A hypothesis test is to be performed to determine if the average scores of those listening to music while studying for the exam were higher than those who did not listen to any music while studying for the exam. Which of the following hypothesis tests should be used?
a two-sample t-test
When a statistic consistently either underestimates or overestimates a population parameter, it is called _____________.
biased.
Before using the normal model to represent a data set, first check that the shape of the data's distribution is what shape?
both symmetric and unimodal
A two-sided test is performed when we are interested in deviations _____________ from the hypothesized value.
either greater than or less than
A(n) _______ is any collection of outcomes from a probability experiment.
event
In a television advertisement, a company called "Waist Away" claimed the workout program on their set of DVDs would help people lose weight more than any other DVD workout program. To test this claim, an independent company, called "Slim Down," selected one other DVD program. They then randomly assigned half the volunteers to the Waist Away program and the other half to the Slim Down program. Each participant was weighed before they started the program and then regularly participated in their assigned program for one month. After one month, each participant was weighed again. The percent of weight lost was recorded for each person, where negative values indicated a weight gain. What type of study was performed?
experiment
Two events E and F are __________ if the occurrence of event E in a probability experiment does not affect the probability of event F.
independent
The sample standard deviation better estimates the population standard deviation for _______ sample sizes.
larger
The _________________ is/are the entire group of individuals or items being studied.
population
Standardizing data into z-scores is just shifting them by the _______ and rescaling them by the _______.
mean; standard deviation
The following stem-and-leaf plot shows the daily high temperature in a town on April 1st for twenty-four random years. Which measures of center and spread are most appropriate for this data?
5 | 1 1 2 2 3 4 6 6 6 7 8
6 | 0 0 1 2 4 4 9
7 | 2 3 6
8 | 1 2
9 | 0
median and interquartile range
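Reading the plot with stems as tens digits and leaves as units digits (an assumption about the plot's key, which is not shown), a short sketch confirms the right skew: the long upper tail pulls the mean above the median.

```python
import statistics

# Stems are tens digits, leaves are units digits (e.g., stem 5 with leaf 1 is 51).
stem_leaf = {
    5: [1, 1, 2, 2, 3, 4, 6, 6, 6, 7, 8],
    6: [0, 0, 1, 2, 4, 4, 9],
    7: [2, 3, 6],
    8: [1, 2],
    9: [0],
}
temps = [10 * stem + leaf for stem, leaves in stem_leaf.items() for leaf in leaves]

print(len(temps))                        # 24
print(statistics.median(temps))          # 60.0
print(round(statistics.mean(temps), 2))  # 62.92
```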
The only measure of center that can be found for both quantitative and qualitative data is the ______________.
mode
A frequency distribution lists the _________ of occurrences of each category of data, while a relative frequency distribution lists the _________ of occurrences of each category of data.
number; proportion
Do people walk faster in an airport when they are departing (getting on a plane) or after they have arrived (getting off a plane)? An interested passenger watched a random sample of people departing and a random sample of people arriving and measured the walking speed (in feet per minute) of each. What type of study design is being performed?
observational study
Many people believe that students gain weight as freshmen in college. To determine if this is true, a student randomly sampled 100 freshmen. Each was weighed when college started in the fall and again when they left for home after the spring term. Should a paired t-test or a two-sample t-test be used to determine if students weigh more at the end of their freshman year compared to the beginning of their freshman year, on average?
paired t-test
A _________________ is a numerical measurement describing some characteristic of a population.
parameter
A ________________ variable classifies individuals based on some attribute or characteristic.
qualitative
A ________________ variable counts or measures something and has numeric values.
quantitative
A(n) __________ is a numerical measure of the outcome of a probability experiment.
random variable
In a boxplot, if the median is to the left of the center of the box and the right whisker is substantially longer than the left whisker, the distribution is skewed
right
The claim being assessed in a hypothesis test is called _____________.
the null hypothesis.
What does the standard error of the distribution of sample means estimate?
the standard deviation of the distribution of sample means
The simple linear regression model is of the form A = B + C(xᵢ) + D. What does A represent in the model?
yᵢ
If every observed count equals its expected count under the null hypothesis, what will the chi-square statistic equal?
zero
The standard normal probability distribution has a mean of _______ and a standard deviation of _______.
zero; one
The expression zα denotes the z-score with an area of _______ to its right.
α
The expression zα/2 denotes the z-score with an area of _______ to its right.
α/2
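Both critical values can be read off the inverse of the standard normal CDF: the z-score with area α to its right is the (1 − α) quantile. A minimal sketch using Python's standard library:

```python
from statistics import NormalDist

def z_upper(alpha):
    """z-score with area alpha to its right under the standard normal curve."""
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha)

alpha = 0.05
print(round(z_upper(alpha), 3))      # z_alpha   -> 1.645
print(round(z_upper(alpha / 2), 3))  # z_alpha/2 -> 1.96
```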
The simple linear regression model is of the form A = B + C(xᵢ) + D. What does B represent in the model?
β₀
The simple linear regression model is of the form A = B + C(xᵢ) + D. What does D represent in the model?
εᵢ
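In practice β₀ and β₁ are estimated by least squares, and the residuals (observed y minus predicted y) estimate the error terms. The sketch below uses hypothetical data; it shows that individual residuals can be negative while their sum is 0 (up to floating-point rounding).

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]   # HYPOTHETICAL data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(round(b0, 2), round(b1, 2))        # 2.2 0.6
print([round(r, 2) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)        # True
```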
Professional baseball players have a mean salary of $3.340 million (as of last year's opening day). Of course, the salaries for baseball players vary. If you had to guess, which of the following seems most reasonable for the IQR of professional baseball player salaries?
$1,500,000
Brett is a huge sports fan. He hypothesized that half of sports fans like football best, 25% like baseball best, 15% like basketball best, 5% like hockey best, and the rest like some other sport best. He surveyed 500 sports fans and asked what sport they liked best. Which of the following shows how to calculate the number of these 500 sports fans expected to say that basketball is their favorite sport if the null hypothesis is true?
(500)(0.15)
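Each expected count is n times the hypothesized proportion. A small sketch of all five expected counts; "other" gets the remaining 5%.

```python
# Hypothesized proportions from the question; "other" is the remaining 5%.
hypothesized = {
    "football": 0.50,
    "baseball": 0.25,
    "basketball": 0.15,
    "hockey": 0.05,
    "other": 0.05,
}
n = 500
assert abs(sum(hypothesized.values()) - 1) < 1e-9  # proportions must sum to 1

expected = {sport: n * p for sport, p in hypothesized.items()}
print(expected["basketball"])  # 75.0
```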
A nutritionist wants to estimate the difference between the percentage of men and women who have high cholesterol. What sample size should be obtained if she wishes the estimate to be within 2 percentage points with 90% confidence, assuming the following? (a) She uses the estimates of 18.8% male and 20.5% female from the National Center for Health Statistics. (b) She does not use any prior estimates.
(a) n=n1=n2=2135 (b) n=n1=n2=3382
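These answers follow from the sample-size formula for a difference in proportions, n = [p̂₁(1 − p̂₁) + p̂₂(1 − p̂₂)]·(z/E)², with p̂₁ = p̂₂ = 0.5 when no prior estimates are used (giving n = 0.5·(z/E)²), always rounding up. A sketch using the exact normal quantile for 90% confidence:

```python
import math
from statistics import NormalDist

def n_two_proportions(margin, confidence, p1=None, p2=None):
    """Per-group sample size so a CI for p1 - p2 has at most the given margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    if p1 is None or p2 is None:
        variance_part = 0.25 + 0.25  # no prior estimates: use p = 0.5 for both groups
    else:
        variance_part = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(variance_part * (z / margin) ** 2)  # always round up

print(n_two_proportions(0.02, 0.90, 0.188, 0.205))  # 2135
print(n_two_proportions(0.02, 0.90))                # 3382
```

Using the rounded table value z = 1.645 instead gives 2136 and 3383, so a small difference from an answer key can come from how z is rounded.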
A fire insurance company wants to examine the relationship between the amount of fire damage in major residential fires and the distance between the burning house and the nearest fire station. A sample of 15 recent fires in a particular city was taken. The amount of damage to the house and the distance from the burning house to the nearest fire station were recorded. Part of the output from a simple linear regression analysis is given below. Which of the following is a correct interpretation of R-square?
Predictor   Coef     SE Coef   T   P
Constant    16.301   6.015
distance    3.554    1.663
S = 9.8101   R-Sq = 26.0%
26% of the variation in the amount of damage to a house is explained by a simple linear regression with the distance of the burning house from the nearest fire station as the explanatory variable.
Concern over the weather associated with El Niño has increased interest in the possibility that the climate on earth is getting warmer. The most common theory relates an increase in atmospheric levels of carbon dioxide (CO2), a greenhouse gas, to increases in temperature. A regression analysis of the mean annual CO2 concentration (in parts per million) in the atmosphere at the top of Mauna Loa in Hawaii and the mean annual air temperature (in degrees Celsius) over both land and sea across the globe for 37 years was performed. Assume all conditions are met for simple linear regression inference. What percent of the variation in average annual air temperatures is explained by the regression analysis with annual CO2 levels over Mauna Loa as the explanatory variable?
33.4%
Which of the following statements is not equivalent to the others?
35% of individuals who have never married are male.
According to the Empirical Rule, 95% of the area under the normal curve is between μ−2σ and μ+2σ. What percent of the area under the normal curve is between μ and μ+2σ?
47.5%
A student wondered whether students at her university arrived on campus each day in the same way as students at another university. At the other university, 60% drove, 30% biked or walked, and the other 10% arrived by other means of transportation. The student randomly sampled 150 students one afternoon at her university and asked how they arrived on campus that day. Which hypothesis test should the student use to determine whether students at her university arrive on campus in the same proportions as students at the other university?
Chi-square goodness of fit test
A voter was interested in comparing the proportion in favor of national health care between people who say they are Republicans, Democrats, and Independents. From each party, she randomly selected 50 people registered as a member of that party in her county and asked whether or not they were in favor of a national health care program. Which of the following hypothesis tests should this voter use?
Chi-square test of homogeneity
Suppose the list below shows how many text messages Elyse sent each day for the last 10 days. If Elyse wants to know how many text messages she typically sends each day, which measure of central tendency better describes the typical number of text messages per day? 21 22 24 26 26 29 32 32 33 88
Median; The median of 27.5 is a better representative of the center since it is resistant to the one extreme value. The mean of 33.3 is not representative of the typical number of texts since only one number is larger than the mean.
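Both numbers in this answer can be checked directly with Python's standard library:

```python
import statistics

texts = [21, 22, 24, 26, 26, 29, 32, 32, 33, 88]
mean = statistics.mean(texts)
median = statistics.median(texts)

print(median)                          # 27.5
print(mean)                            # 33.3
print(sum(x > mean for x in texts))    # 1 (only 88 exceeds the mean)
```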
Which is greater in a normal distribution, the mean or the median? Explain.
Neither; the mean and median are always equal in a normal distribution, since it is symmetric.
Suppose a systematic random sample of amusement park visitors is taken by selecting the 9th visitor to walk through the gates on a given day and every 15th visitor after that until 500 visitors have been surveyed. Would this constitute a simple random sample? Why or why not?
No, because not every group of 500 visitors has the same chance of being selected for the sample.
A student randomly sampled 15 senior male students and 15 senior female students and found their grade point average through their junior year. She obtained the accompanying scatterplot. Can the correlation coefficient be used to describe the strength of the relationship between these two variables? (Scatterplot not shown; Concept HW 4, question 23.)
No, because sex is a categorical variable.
Determine whether the distribution is a discrete probability distribution. If not, state why.
x      0    10   20   30   40   50
P(x)   0.2  0.2  0.2  0.2  0.2  0.2
No, because the probabilities do not sum to 1.
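The two rules for a discrete probability distribution (every P(x) between 0 and 1, and the P(x) summing to 1) can be checked mechanically. A minimal sketch:

```python
import math

def is_discrete_distribution(probs):
    """Check that 0 <= P(x) <= 1 for every x and that the P(x) sum to 1."""
    each_valid = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0)
    return each_valid and sums_to_one

probs = [0.2] * 6  # P(x) for x = 0, 10, 20, 30, 40, 50
print(round(sum(probs), 10))            # 1.2
print(is_discrete_distribution(probs))  # False
```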
The probability that a randomly selected adult in a particular community is a smoker is 20%. The probability that a randomly selected adult in the community is a smoker, given that the adult earns more than $75,000 per year, is 10%. Are the events "is a smoker" and "earns more than $75,000 per year" independent? Explain.
No. The probability of smoking is different for adults who earn more than $75,000 per year (10%) than for adults in general (20%), so the events are not independent.
Suppose a fair die is rolled ten times and the result is recorded each time. Does this constitute a binomial experiment? Why or why not?
No, because there are more than two outcomes for each trial.
Cards are drawn with replacement from a standard deck until a king is drawn. Does this constitute a binomial experiment? Why or why not?
No, because there is not a fixed number of trials.
Determine whether the graph can represent a Normal density function or explain why it cannot. (Graph not shown: the ends of the curve are above the x-axis. Concept HW 7, question 21)
No, because this graph increases as the value of x becomes very large.
Determine whether the graph can represent a Normal density function or explain why it cannot. (Graph not shown: the curve looks symmetric, but its ends go below the x-axis. Concept HW 7, question 20)
No, because this graph is not always above the x-axis.
Days before a presidential election, a nationwide random sample of registered voters was taken. Based on this random sample, it was reported that "52% of registered voters plan on voting for Robert Smith with a margin of error of ±3%." The margin of error was based on a 95% confidence level. Can we say with 95% confidence that Robert Smith will win the election if he needs a simple majority of votes to win?
No, because 50% is within the 95% confidence interval, which runs from 49% to 55%.
Can the variance of a data set ever be negative? Explain.
No; the variance is computed from squared deviations from the mean, and squares cannot be negative, so the variance cannot be negative.
Determine whether the graph can represent a Normal density function or explain why it cannot. (Graph is skewed to the right)
No; this graph is not symmetric.
Which two graphs allow the reader to retrieve the original list of data?
Stem-and-leaf plots and dotplots