Final Exam #1
A researcher wished to compare the average amount of time spent in extracurricular activities by high school students in a suburban school district with that in a school district of a large city. The researcher obtained simple random samples of students from both district types, and the results are summarized below. Suburban: sample mean 6 hours, sample s.d. 3 hours, sample size 60 students. Large City: sample mean 4 hours, sample s.d. 2 hours, sample size 40 students. Let μ1 and μ2 represent the mean amount of time spent in extracurricular activities per week by the populations of all high school students in the suburban and city school districts, respectively. Which of the following is a 90% confidence interval for μ1 − μ2?
(1.17 hours, 2.83 hours). Clearly, this is a confidence interval for a difference of two means. The key question when working with two means is: are the samples dependent or independent? If they are dependent, then we use a T-Interval (Paired t in SALT) on the differenced data. If they are independent, then we use a 2-SampTInterval (Two sample t in SALT). In dependent samples, the individuals in one group dictate/influence the individuals in the other group - the samples depend on each other. The clearest case of dependent samples is when the same individual provides both a before and after measurement - since the individual must be the same in each sample for the difference to be meaningful, this is a dependent sample. In independent samples, the samples do not depend on each other. Are the samples in this question dependent or independent?
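As a rough check of the interval, here is a minimal Python sketch that treats the two samples as independent (separate random samples from each district type) and uses the unpooled two-sample t interval. The critical value `t_star` is an assumption: it is a t-table value for about 98 degrees of freedom, not something given on the card.

```python
import math

# Summary statistics from the question (group 1 = suburban, group 2 = large city).
mean1, sd1, n1 = 6.0, 3.0, 60
mean2, sd2, n2 = 4.0, 2.0, 40

# Variance of each sample mean, then the standard error of the difference.
v1, v2 = sd1**2 / n1, sd2**2 / n2
std_error = math.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom for the unpooled interval.
welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Assumption: t* for 90% confidence at roughly 98 d.f., read from a t table.
t_star = 1.6606

diff = mean1 - mean2
lower = diff - t_star * std_error
upper = diff + t_star * std_error
print(round(welch_df, 1), round(lower, 2), round(upper, 2))
```

The calculator's 2-SampTInterval finds the critical value itself; the sketch only makes the standard error and degrees of freedom visible.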
A least-squares regression line was fitted to the weights versus age of a group of many young children. The equation of the line is y^=16.60+0.65t, where y^ is the predicted weight in pounds and t is the age of the child in months. A 20-month-old child in this group has an actual weight of 25 pounds. Which of the following is the residual weight, in pounds, for this child?
-4.60 residual = actual value of response variable for an observation - predicted/expected value of the response variable for the same observation. In notation, we have: residual = y-y^ The response variable is represented on the y-axis of the scatterplot. To obtain the predicted/expected value of the response variable for an observation, we utilize the least-squares regression line. This line is of the form y^= ax + b, where a is the slope of the line and b is the y-intercept. Insert the actual value of the explanatory variable, x, for an observation into the least-squares regression equation in order to obtain the predicted value of the response for that observation. The actual value of y for a specific observation should be stated in the problem or given in the data set.
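The arithmetic for this card can be sketched in a few lines of Python. The function name `predicted_weight` is only an illustrative label for the regression equation given in the question.

```python
def predicted_weight(age_months):
    """Least-squares line from the question: y^ = 16.60 + 0.65 t."""
    return 16.60 + 0.65 * age_months

actual_weight = 25.0                      # observed weight in pounds
predicted = predicted_weight(20)          # predicted weight at 20 months
residual = actual_weight - predicted      # residual = actual - predicted
print(round(predicted, 2), round(residual, 2))
```

A negative residual means the child weighs less than the line predicts for that age.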
The weights of packages delivered by a parcel service are normally distributed with mean 10.5 pounds and standard deviation 4.0 pounds. What is the probability that a randomly selected package will weigh exactly 12.0 pounds?
0.00 The probability that you get any single value from a continuous distribution is 0, since there is no area that can be calculated at the point.
You send a survey to a simple random sample of 150 people at the university. The typical response rate to surveys has been 50%. Assuming this rate still holds and the actions of each person related to the survey are independent of the others, what is the probability that 65 or fewer people will reply?
0.060 Remember, the binomial is a *special* discrete distribution in that it has calculator shortcuts that can assist with probabilities and it has some simple formula shortcuts for its mean (μ = np), variance (σ^2 = np(1-p)), and standard deviation (σ = square root of np(1-p)). The hardest part will be recognizing the binomial! Watch for a fixed number of trials where you are counting the number of successes.
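For readers without a calculator handy, the cumulative binomial probability can be sketched directly from the binomial formula. The helper name `binom_cdf` is hypothetical; it plays the role of the calculator's binomial cdf shortcut.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# 150 people surveyed, 50% response rate, 65 or fewer replies.
prob = binom_cdf(65, 150, 0.5)
print(round(prob, 3))
```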
A diagnostic test for the presence of a virus has a 0.005 probability of producing a false positive (test indicates virus is present, when, in fact, it is not). Test results for different individuals are independent. If the 140 employees of a medical clinic are tested and all 140 are known to be free of the virus, what is the probability that at least one false positive will occur?
0.5043 There are two approaches to this question that are equally valid: 1) Use probability theory. The phrase "at least" can point us to the complement. The complement of an event is everything other than the event. If the event we are trying to find is that there is at least one false positive (which means 1, 2, 3, ..., or 140 false positives), then the complement is that there are no false positives. We can find the probability of no false positives and then subtract it from 1 in order to find the probability that there is at least 1 false positive. 2) Use the binomial distribution. Let X = the number of people out of 140 that have a false positive. Then, X is a count - we are counting the number of successes (false positives) in a fixed number of independent trials (140 employees). The variable, X, has a binomial distribution with n = 140 and p = 0.005. We can translate the question to P(X≥1) and then solve using technology.
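Both approaches can be sketched in Python to confirm that they agree. The variable names are illustrative only.

```python
from math import comb

p, n = 0.005, 140   # false-positive probability and number of employees

# Approach 1: complement rule, 1 - P(no false positives).
via_complement = 1 - (1 - p) ** n

# Approach 2: binomial, P(X >= 1) summed over k = 1, ..., 140.
via_binomial = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(1, n + 1))

print(round(via_complement, 4), round(via_binomial, 4))
```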
Debbie and Eric are anxiously awaiting word on whether they have gotten into medical school. Debbie guesses that her probability of getting in is 0.60 and that Eric's is 0.50. Assuming that the outcomes for Debbie and Eric are independent, what is the probability that at least one of them gets into medical school?
0.8 At least one of the two gets in if Debbie gets in or Eric gets in (or both). Thus, this probability can be represented with an "or". Let E = {Debbie gets into medical school} and F = {Eric gets into medical school}. P(E or F) = P(E) + P(F) - P(E and F). We are given that P(E) = 0.60 and P(F) = 0.50. How do we obtain the "and" probability? There are two rules for "and" probability calculations - the general multiplication rule (P(E and F) = P(E) x P(F|E)) and the multiplication rule for independent events (P(E and F) = P(E) x P(F)). We are told that these events are independent, so we are fortunate that we can use the easier multiplication rule for independent events. See if that can help you arrive at the correct answer here. OR The phrase "at least" can point us to the complement. The complement of an event is everything other than the event. If the event we are trying to find is that Debbie or Eric gets into medical school, then the complement is that neither of them gets into medical school. We could find the probability that Debbie does not get in and Eric does not get in. After finding this, we would subtract it from 1 in order to find the probability that at least 1 of them gets in.
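A short Python sketch of the two routes described above, using the independence assumption stated in the question:

```python
p_debbie, p_eric = 0.60, 0.50

# Route 1: addition rule, with P(E and F) = P(E) x P(F) for independent events.
addition_rule = p_debbie + p_eric - p_debbie * p_eric

# Route 2: complement rule, 1 - P(neither gets in).
complement_rule = 1 - (1 - p_debbie) * (1 - p_eric)

print(round(addition_rule, 2), round(complement_rule, 2))
```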
A dean of the College of Business (CoB) wanted to know whether the mean GPA of business majors was different from the mean GPA of students in the College of Arts and Sciences (CAS). The dean surveyed a random sample of 20 students in the CoB and a random sample of 20 students in the CAS. Boxplots for the samples indicate no outliers. Normal probability plots for these samples are below: Which of the following is the appropriate inferential technique for the dean's question?
2-sample T-Test The key question when working with two means is: are the samples dependent or independent? If they are dependent, then we use a T-Test (Paired t in SALT) on the differenced data. If they are independent, then we use a 2-SampTTest (Two sample t in SALT). In dependent samples, the individuals in one group dictate/influence the individuals in the other group - the samples depend on each other. The clearest case of dependent samples is when the same individual provides both a before and after measurement - since the individual must be the same in each sample for the difference to be meaningful, this is a dependent sample. In independent samples, the samples do not depend on each other.
The following table gives the grams of fat and number of calories per 100 grams for several fast food products: Based on the least-squares regression line, what is the predicted calories per 100 grams of a fast food item that has 14.25 grams of fat per 100 grams?
258.4 To obtain the predicted/expected value of the response variable for an observation, we utilize the least-squares regression line. This line is of the form y^=ax+b, where a is the slope of the line and b is the y-intercept. Plug the actual value of the explanatory variable (x) into the least-squares regression equation, and that will yield the predicted value of the response variable (y^).
Suppose 74.9% of all students on campus have a job. From a sample of 84 students on campus, 64.3% of the students sampled have a job. Which of the following is true?
74.9% is a parameter and 64.3% is a statistic. A parameter describes a population; a statistic describes a sample.
Do people feel hungrier after sampling a healthy food? A researcher investigating this question assigned volunteers to one of three groups: a group given a new health bar ("Healthy"), a group given the same health bar but told it was a candy bar ("Candy"), and a group given no snack at all ("No Snack"). All volunteers were asked to rate their hunger level on a 7-point scale (1-not at all hungry, 7-very hungry). The output below is from a hypothesis test performed in Minitab, a statistical software: Using a 5% significance level, which of the following is the correct conclusion of the hypothesis test?
At least one group had a different mean hunger rating.
The histogram below shows the number of miles per gallon achieved on the highway for a random sample of compact cars from the model year 2005. What is the sample size?
Between 40 and 79 "Frequency" means count. We see frequency labeled on the y-axis, so we can use that to count/estimate the number of cars in each class. For example, there appear to be approximately 4 cars that get between 16.5 and 19.5 mpg. We should add up the number of cars in each class to obtain the total sample size, or the count of all the cars.
In a study of students at Lewis & Clark Community College, a professor compared chosen seat location (front, middle, back) of students in mathematics courses to their overall GPA. The Minitab output analyzing the data is given below. The professor tested to see whether the mean GPA is the same for the three locations or not. Which of the following is the null hypothesis for the appropriate test? (a) The variable follows the given distribution. (b) The variables are independent. (c) The variables are dependent. (d) μ1 = μ2 = μ3
D
Suppose a random sample of size 50 is drawn from a population distribution that is skewed left with a mean of 0.67 and a standard deviation of 0.14. Which of the following three graphs shows the approximate sampling distribution of sample means for all samples of size 50?
Plot A. Recall the following details surrounding the sampling distribution of sample means: Shape: approx. Normal if (1) n≥30 or (2) the population is normally distributed. Center: μx¯=μx Spread: σx¯=σx/ square root n The spread gets smaller and smaller as the sample size increases. The higher our sample size, the closer our results should be to the true average, or mean.
When Ty Cobb had a batting average of 0.420 in 1911, the mean batting average was 0.266 with a standard deviation of 0.0371. When Ted Williams had a batting average of 0.406 in 1941, the mean batting average was 0.267 with a standard deviation of 0.0326. Assuming that batting averages are always approximately normal, which hitter was better relative to his peers?
Ted Williams was better. The phrase "...relative to..." should point you to the calculation of z-scores. A z-score is defined as: z = (x − μ)/σ or z = (x − x¯)/s The z-score represents the number of standard deviations that an observation is from the mean.
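The comparison can be sketched in Python with the numbers from the question. The helper `z_score` is just the z-score formula written as a function.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

cobb = z_score(0.420, 0.266, 0.0371)       # 1911 season
williams = z_score(0.406, 0.267, 0.0326)   # 1941 season

# The larger z-score marks the hitter who stood out more among his peers.
better = "Ted Williams" if williams > cobb else "Ty Cobb"
print(round(cobb, 2), round(williams, 2), better)
```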
Five years ago, the mean household expenditure for energy was $1,493. An economist believes that the average has increased since then. In a simple random sample of 35 households, the economist found a mean expenditure for energy of $1,618 with a standard deviation of $321. The P-value for the appropriate hypothesis test was 0.0275. What is the correct interpretation of the P-value?
There is a 0.0275 probability of obtaining a sample mean of $1,618 or higher, assuming the population mean is $1,493. The P-value tells us the probability, assuming the null hypothesis is true, of obtaining a sample that is as extreme as or more extreme than the sample obtained.
Emily has asthma, a condition where fluid builds in the air ducts of the lungs. Each day she tests her lung capacity by breathing into a device that measures her lung capacity, which varies daily. If daily readings for a week result in an average of 140 ml with a standard deviation of 2 ml, which of the following represents the 99% confidence interval for her mean lung capacity assuming that the population of measurements is normal?
(137.2, 142.8) We can calculate confidence intervals using technology (TI, Excel, SALT). One sample mean => T-Interval; One sample proportion => 1-PropZInterval; Two means (dependent samples) => T-Interval on the differenced data; Two means (independent samples) => 2-SampTInterval; Two proportions => 2-PropZInterval.
Real estate exam scores are normally distributed with a mean of 430 and a standard deviation of 20. What proportion of the scores fall between 398 and 406?
0.060
A bag of assorted candy contains the following proportions of six candies: What is the probability of picking a Tootsie Pop?
0.10 The probabilities for a legitimate discrete probability distribution must sum to 1.
The following distribution represents the number of YumYum bars purchased per day from a single vending machine: Yum Yum bars 0,1,2,3,4,5 Probability 0.06, 0.58, 0.22, 0.10, 0.03, 0.01. What is the probability that more than 2 YumYum bars are purchased on a randomly selected day?
0.14
Tiffany is giving presents to her mother and father for a special occasion. Her mother is usually very pleased with gifts; liking them 80% of the time. Her father is harder to please; liking his gifts only 30% of the time. If her parents reactions are independent, what is the probability that exactly one (not both) of Tiffany's parents will like their gift?
0.62 Independence is a valuable property in probability calculations, as it allows us to calculate the intersection (the "and" probability) of two events by multiplying the probabilities of the two events together. If E and F are independent events, then:P(E and F)=P(E)×P(F)
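"Exactly one" splits into two disjoint cases, each an "and" of independent events. A minimal Python sketch with the probabilities from the question:

```python
p_mother, p_father = 0.80, 0.30

# Case 1: mother likes her gift AND father does not.
only_mother = p_mother * (1 - p_father)

# Case 2: father likes his gift AND mother does not.
only_father = (1 - p_mother) * p_father

# The two cases cannot happen together, so their probabilities add.
exactly_one = only_mother + only_father
print(round(exactly_one, 2))
```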
10% of patients who try a particular medication experience side effects. Let X represent the number of patients from a sample of size 15 that experience side effects while on a particular medication. What is the standard deviation of X?
1.16 Remember, the binomial is a *special* discrete distribution in that it has calculator shortcuts that can assist with probabilities and it has some simple formula shortcuts for its mean (μ = np), variance (σ^2 = np(1-p)), and standard deviation (σ = square root of np(1-p)). The hardest part will be recognizing the binomial! Watch for a fixed number of trials where you are counting the number of successes.
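The formula shortcuts for this card, sketched in Python with n = 15 and p = 0.10 from the question:

```python
import math

n, p = 15, 0.10

mean = n * p                   # μ = np
variance = n * p * (1 - p)     # σ^2 = np(1-p)
std_dev = math.sqrt(variance)  # σ = square root of np(1-p)
print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```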
Suppose that you wish to conduct a study to determine the proportion of college students who have hypertension (high blood pressure). What is the minimum sample size that would be needed for the estimate to be within 3 percentage points of the population proportion with 95% confidence, assuming no prior estimate is available?
1068 NOTE: for 95% confidence, the value of the z-score in the equations below is always 1.96. The Peck textbook hard codes the 1.96 into the formula, so you may not be accustomed to seeing a z-score in that formula. Sample size formulas: One mean: n ≥ (s·zα/2 / E)^2. One proportion, with prior estimate: n ≥ p^(1−p^)(zα/2 / E)^2; without prior estimate: n ≥ 0.25(zα/2 / E)^2. Two proportions, with prior estimates: n ≥ (p^1(1−p^1) + p^2(1−p^2))(zα/2 / E)^2; without prior estimates: n ≥ 0.50(zα/2 / E)^2. To find the z-score in these formulas, you can use the invNorm function on the TI calculator. We place the confidence level in the middle of the distribution. That leaves some remaining area that is split evenly between the two tails of the distribution. Label all of these areas so that you can use invNorm consistently with your calculator.
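A minimal Python sketch of the one-proportion formula without a prior estimate. The function name `sample_size_no_prior` is hypothetical, `NormalDist().inv_cdf` stands in for invNorm, and the result is rounded up because a sample size must be a whole number that meets the bound.

```python
import math
from statistics import NormalDist

def sample_size_no_prior(margin, confidence):
    """n >= 0.25 (z / E)^2, rounded up to the next whole number."""
    # Confidence level in the middle leaves (1 - confidence)/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(0.25 * (z / margin) ** 2)

n_needed = sample_size_no_prior(margin=0.03, confidence=0.95)
print(n_needed)
```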
In years past, the mean household expenditure for energy was $1,493. An economist believes that the average expenditure for energy is different now. To test her belief, she takes a random sample of 35 households and produces the following output (some pieces of which have been replaced with question marks): What is the margin of error for the 95% confidence interval?
110.3 A confidence interval can be represented as: point estimate ± margin of error. The point estimate is the starting point of the interval, and it can be found right in the middle of the interval. If the interval is for a mean, then the point estimate is a sample mean. If the interval is for a proportion, then the point estimate is a sample proportion. The margin of error is added to and subtracted from the point estimate in order to obtain the interval. So, the interval is two margins of error wide. A confidence interval is often presented with two numbers in parentheses, which are calculated as (point estimate - margin of error, point estimate + margin of error). If the point estimate and either the lower bound or upper bound of the confidence interval are known, then it should be possible to determine the margin of error. Confidence interval: (L, U) => point estimate = (L+U)/2 => margin of error = (U-L)/2 = point estimate - L = U - point estimate
The following data represent the cost of a one night stay in Brand X and Brand Y hotels for a random sample of 10 cities. What is the lower bound of a 90% confidence interval for the mean difference (Brand X - Brand Y) in price?
33.5 The key question when working with two means is: are the samples dependent or independent? If they are dependent, then we use a T-Interval (Paired t in SALT) on the differenced data. If they are independent, then we use a 2-SampTInterval (Two sample t in SALT). In dependent samples, the individuals in one group dictate/influence the individuals in the other group - the samples depend on each other. The clearest case of dependent samples is when the same individual provides both a before and after measurement - since the individual must be the same in each sample for the difference to be meaningful, this is a dependent sample. In independent samples, the samples do not depend on each other.
Clay is a star linebacker on a professional football team. He weighed himself three times a day for 4 days, and the sample of weights produced a mean of 327.9 pounds and a standard deviation of 10.5 pounds. A normal probability plot suggests the population of weights is approximately normal. If Clay estimates his weight using a 90% confidence interval, what is the margin of error of the confidence interval?
5.44
The magnitude of earthquakes in a region is normally distributed with mean 5.1 and standard deviation 0.4. What magnitude would place an earthquake in the top 8% of earthquakes?
5.66
Radon measurements at a particular location follow a normal distribution with a mean of 4.1 picocuries (pCi) and a standard deviation of 0.2 pCi. What proportion of all radon measurements taken at this location are below 4.15 pCi's?
59.8% For the normal distribution, we have two types of questions: Find the area/probability/proportion => for the TI, these are normalcdf questions. Find the z-score/percentile/weight/time/x-axis value => for the TI, these are invNorm questions.
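The two question types map onto two methods of Python's standard-library `statistics.NormalDist`, sketched here with the radon numbers: `cdf` plays the role of normalcdf (area to the left) and `inv_cdf` plays the role of invNorm. Under these numbers the area comes out near 0.5987, which agrees with the card's answer up to rounding in the last digit.

```python
from statistics import NormalDist

radon = NormalDist(mu=4.1, sigma=0.2)

# Area question (normalcdf style): proportion of readings below 4.15 pCi.
proportion_below = radon.cdf(4.15)

# Value question (invNorm style): which reading has that much area to its left?
reading = radon.inv_cdf(proportion_below)

print(round(proportion_below, 4), round(reading, 2))
```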
An economist believes that the percentage of urban households with internet access is greater than the percentage of rural households with internet access. They obtain a simple random sample of 800 urban households and find that 338 of them have internet access. They also obtain a random sample of 750 rural households and find that 292 of them have internet access. Which of the following represents the P-value for the appropriate hypothesis test using this sample data?
9.2% "Percentage" is interchangeable with proportion. In this question, then, we are comparing two proportions. Our only hypothesis test for two proportions is the 2-PropZTest (Two sample proportion in SALT). Pay attention to how you enter the samples, as it will guide the direction in your alternative hypothesis.
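What the 2-PropZTest does internally can be sketched in Python, assuming the usual pooled standard error under the null hypothesis and a right-tailed alternative (urban proportion greater than rural). Urban is entered as sample 1 so that the direction matches the economist's belief.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 338, 800   # urban households with internet access
x2, n2 = 292, 750   # rural households with internet access

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)   # common proportion assumed under H0

std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / std_error

# Right-tailed P-value: area above the observed z under the standard normal.
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 3))
```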
A study is conducted to better understand how the age of a car impacts its price. Treating age as the predictor/explanatory variable, data on 10 cars of a specific model yield a coefficient of determination of: R^2 =0.64 Which of the following is an appropriate interpretation of this value?
Approximately 64% of the variation in price is explained by the least-squares regression line. There are two statistics that are regularly reported for least-squares regression: 1. The linear correlation coefficient (r). This value describes the direction and strength of the linear relationship between two variables. The values of r range from -1 to +1. The closer this value is to -1 or +1, the stronger the relationship between the variables. 2. The coefficient of determination (R^2). The values of R^2 range from 0 to 1. This value reveals the percentage of total variation in the response variable that is explained by the least-squares regression line. It should be noted that R^2 = (r)^2.
An advertisement for a home being sold contains the variables "square footage" and "average monthly utility costs". Which of the following best describes the two variables?
Both variables are numerical. Numerical variables are (1) numbers (2) that provide meaningful results when mathematical operations are performed on them.
Thirteen brands of cigarettes are tested and the level of tar and nicotine are measured. Using tar as the predictor/explanatory variable, a regression analysis is performed and yields the results given below. Regression analysis: nicotine (mg) versus tar (mg). S = 0.112120, R-Sq = 93.4%, R-Sq(adj) = 92.8%. What is an appropriate interpretation of the slope of the regression equation?
For each additional 1 mg of tar, the level of nicotine is predicted to increase by 0.0575. The least-squares regression line is of the form y^= ax + b, where a is the slope of the line and b is the y-intercept. The slope tells us the predicted change (could be an increase or a decrease) in the response variable for each additional unit of the explanatory variable, x. The y-intercept tells us the predicted value of the response variable when the explanatory variable is 0.
The weather forecast predicts that there is a 30% probability of rain. Which of the following options best reflects the meaning of this probability?
If we looked at all days with this same prediction, it will have rained on about 30% of them.
The Centers for Disease Control and Prevention (CDC) reported that the diastolic blood pressures of adult women in the U.S. are normally distributed with mean 80.5 and standard deviation 9.9. What interval represents the middle 86% of diastolic blood pressures for all adult women in the U.S.?
NOT (69.8,91.2) For the normal distribution, we have two types of questions: Find the area/probability/proportion => for the TI, these are normalcdf questions. Find the z-score/percentile/weight/time/x-axis value => for the TI, these are invNorm questions. This question is a bit more challenging. Note that we are looking for the two blood pressures that would separate the middle 86% of blood pressures from the other 14% (100-86) of blood pressures. Given that we are looking for blood pressures, this would be answered with the invNorm on the TI.
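A sketch of the invNorm approach in Python, with `NormalDist.inv_cdf` standing in for invNorm. The remaining 14% is split evenly, 7% in each tail, so the bounds are the 7th and 93rd percentiles. Under these assumptions the bounds come out near 65.9 and 95.1, which is consistent with the card marking (69.8, 91.2) as not the answer.

```python
from statistics import NormalDist

blood_pressure = NormalDist(mu=80.5, sigma=9.9)

middle = 0.86
tail = (1 - middle) / 2   # area left over in each tail

lower = blood_pressure.inv_cdf(tail)       # 7th percentile
upper = blood_pressure.inv_cdf(1 - tail)   # 93rd percentile
print(round(lower, 1), round(upper, 1))
```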
A sociologist wanted to determine the percentage of current U.S. residents that only speak English at home. The 2000 Census Supplementary Survey (CSS) reported this percentage as 82.4%. What is the smallest sample size that should be obtained to have a 95% confidence interval with a margin of error of 2%, assuming the sociologist uses the CSS estimate?
NOT 1692 NOTE: for 95% confidence, the value of the z-score in the equations below is always 1.96. The Peck textbook hard codes the 1.96 into the formula, so you may not be accustomed to seeing a z-score in that formula. Sample size formulas: One mean: n ≥ (s·zα/2 / E)^2. One proportion, with prior estimate: n ≥ p^(1−p^)(zα/2 / E)^2; without prior estimate: n ≥ 0.25(zα/2 / E)^2. Two proportions, with prior estimates: n ≥ (p^1(1−p^1) + p^2(1−p^2))(zα/2 / E)^2; without prior estimates: n ≥ 0.50(zα/2 / E)^2. To find the z-score in these formulas, you can use the invNorm function on the TI calculator. We place the confidence level in the middle of the distribution. That leaves some remaining area that is split evenly between the two tails of the distribution. Label all of these areas so that you can use invNorm consistently with your calculator.
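A minimal Python sketch of the one-proportion formula with a prior estimate, using the CSS value of 82.4% as p^ and a 2% margin of error. The function name `sample_size_with_prior` is hypothetical. Under these assumptions the formula gives 1393, which is consistent with the card marking 1692 as not the answer.

```python
import math
from statistics import NormalDist

def sample_size_with_prior(p_hat, margin, confidence):
    """n >= p^(1 - p^)(z / E)^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(p_hat * (1 - p_hat) * (z / margin) ** 2)

n_needed = sample_size_with_prior(p_hat=0.824, margin=0.02, confidence=0.95)
print(n_needed)
```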
In testing hypotheses, which of the following would be strong evidence against the null hypothesis?
Obtaining data that produces a small P-value. The P-value approach includes comparing the P-value to the significance level. If the P-value is less than the significance level, then the decision is to reject the null hypothesis (P−value<α⟹ Reject H0). If the P-value is greater than the significance level, then the decision is to not reject the null hypothesis (P−value>α⟹ Do NOT Reject H0).
Peri and Quentin are both trying to estimate the mean weight of UWEC students' backpacks. Peri takes a random sample of 20 students, weighs their backpacks, and computes the 95% confidence interval. Quentin takes a random sample of 40 students, weighs their backpacks, and computes the 90% confidence interval. Whose confidence interval, Peri's or Quentin's, is narrower?
Quentin's A confidence interval can be represented as: point estimate ± margin of error. The width of a confidence interval is determined by the margin of error. There are three items that impact the margin of error: The confidence level (the higher the confidence level, the higher the margin of error - if you want to be more confident that the true population parameter is contained in your interval, then your interval must be wider). The sample size (the higher the sample size, the lower the margin of error - larger sample sizes should get our sample statistic closer to the true parameter, meaning we do not require as much margin for error). The sample standard deviation (the higher the sample standard deviation, the higher the margin of error - if there is more variation in our sample, then we want a wider interval to account for it).
Machines used to fill plastic bottles with a soft drink vary in the amount put into each bottle. A particular machine is known to have a mean of 2.0 liters and a standard deviation of 0.05 liters. A random sample of 45 bottles is taken by a quality control manager and the volume of each bottle is measured. Which of the following options describes the sampling distribution of sample means of size 45?
Shape: approx. normal; Center: 2.0L; Spread: 0.0075L Recall the following details surrounding the sampling distribution of sample means: Shape: approx. Normal if (1) n≥30 or (2) the population is normally distributed. Center: μx¯=μx Spread: σx¯=σx/ square root n The spread gets smaller and smaller as the sample size increases. The higher our sample size, the closer our results should be to the true average, or mean.
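The center and spread for this card can be sketched in Python from the machine's mean, standard deviation, and the sample size of 45:

```python
import math

mu, sigma, n = 2.0, 0.05, 45   # liters, liters, bottles per sample

center = mu                     # mean of the sample means equals μ
spread = sigma / math.sqrt(n)   # standard deviation of the sample means
approx_normal = n >= 30         # large-sample condition for the shape

print(center, round(spread, 4), approx_normal)
```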
To determine the flow characteristics of oil through a valve, the inlet oil temperature is measured in degrees Fahrenheit. A researcher wishes to determine if the mean oil inlet temperature is less than 98 degrees. Twelve randomly selected readings were taken and the sample mean and sample standard deviation were 93.333 and 3.393, respectively. A normal probability plot based on the sample shows it is safe to assume the population distribution is approximately normal, and there are no outliers in the boxplot. Which of the following options regarding the appropriate hypothesis test is correct?
The alternative hypothesis is H1:μ<98 and the appropriate test statistic is t with 11 degrees of freedom. The direction of the alternative can be found by looking for the researcher's claim/belief/interest in the question. Here, the researcher "wishes to determine" if the mean temperature is less than 98 degrees - the alternative hypothesis should be consistent with this. A hypothesis test for one mean is a T-Test (One sample t in SALT). A hypothesis test for one proportion is a 1-PropZTest (One sample proportion in SALT). Which test should you perform? For a hint, look to the sample. If you are provided a sample mean, then you are going to test the population mean with a T-Test. If you are provided a sample proportion or the results are described as x out of n (e.g., 9 out of 10 dentists), then you are going to test the population proportion with the 1-PropZTest. For the t-distribution, the degrees of freedom are equal to the sample size minus one (d.f. = n-1).
Experiments on learning in animals sometimes measure how long it takes mice to find their way through a maze. The mean time is 18 seconds for one particular maze. A researcher thinks that a loud noise will cause the mice to complete the maze faster. She measures how long each of 10 mice takes with a noise as stimulus. The sample mean is 16.5 seconds. What is the alternative hypothesis in words?
The mean time to complete the maze is less than 18 seconds.
Least-squares regression is performed, treating the number of absences as the predictor variable and the final grade in the class as the response variable. The resulting least-squares regression line is: y^=−2.8x+88.7 Which of the following is an appropriate interpretation of the y-intercept?
The predicted final grade of students who miss 0 classes is 88.7% The least-squares regression line is of the form y^=ax+b, where a is the slope of the line and b is the y-intercept. The slope tells us the predicted change (could be an increase or a decrease) in the response variable for each additional unit of the explanatory variable, x. The y-intercept tells us the predicted value of the response variable when the explanatory variable is 0.
Based on the scatter diagram below, which of the answer choices provides the most appropriate interpretation of the linear correlation coefficient?
The variables, x and y, have a fairly strong, positive linear relationship.
A random sample of golf scores gives the following summary statistics: n = 20, x¯ = 84.5, s = 11.4; Min. = 56, Lower Quart. (Q1) = 78, Median = 86, Upper Quart. (Q3) = 91, Max. = 112. What can be said about the number of outliers?
There are at least two outliers
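The answer can be checked with a short Python sketch. It assumes the usual 1.5 × IQR fence rule for flagging outliers, which the card does not state explicitly; the variable names are illustrative.

```python
# Five-number summary from the question
minimum, q1, median, q3, maximum = 56, 78, 86, 91, 112

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr     # values above this are flagged as outliers

low_outlier = minimum < lower_fence
high_outlier = maximum > upper_fence

print(lower_fence, upper_fence)
print(low_outlier, high_outlier)
```

Both the minimum and the maximum fall outside the fences, so there is at least one outlier on each side. The summary alone cannot say whether other scores also fall outside, hence "at least two".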
A medical researcher is interested in the mean hemoglobin reading (in grams per deciliter) of surgical patients. She randomly selects 9 patients, obtains data on their hemoglobin reading, and uses a statistical software to produce the following output: The null and alternative hypotheses are displayed in the software output (look for phrase "Test of..."). At a 1% significance level, what is the appropriate conclusion of the researcher's test?
There is evidence that the mean hemoglobin reading of surgical patients is different from 14. A hypothesis test for one mean is a T-Test (One sample t in SALT). A hypothesis test for one proportion is a 1-PropZTest (One sample proportion in SALT). For a hint on which to perform, look to the sample. If you are provided a sample mean, then you are going to test the population mean with a T-Test. If you are provided a sample proportion or the results are described as x out of n (e.g., 9 out of 10 dentists), then you are going to test the population proportion with the 1-PropZTest. For the t-distribution, the degrees of freedom are equal to the sample size minus one (d.f.=n−1).
Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. That is, that the distribution of absences is uniform across the days of the work week. Suppose a sample of 70 absent days was taken and the days absent were distributed as follows: The appropriate hypothesis test yields a test statistic of 0.7143. Is there evidence to suggest that the distribution of employee absences is not uniform? Said differently, is there evidence to suggest the proportion of absences is not the same for all work days?
There is not evidence, at the 10% significance level, to suggest the distribution of employee absences is not uniform. A test for how well the data follows...how well the data fits...how good the fit...it is a χ2 GOODNESS OF FIT test. To carry out this test, you need to make room for three lists on the calculator. The first list should contain the observed frequencies which should be provided in the question. The second list should contain the expected proportions from the null. In the case the null hypothesis is the uniform distribution, all proportions should be equal. If it were binomial, then we would use the binomial distribution to calculate expected proportions. The third list should be the expected frequencies, which are determined by multiplying the proportions in the second list by the total number of observations. By executing this test, you should be able to match the test statistic provided and the P-value will be provided in the calculator output right alongside the test statistic. You'll need to remember that the degrees of freedom for the goodness of fit test are equal to the number of categories minus one (d.f.=# of categories−1).
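The table of absences is not reproduced on this card, so the sketch below starts from the stated test statistic (0.7143) and only finds the P-value. It uses plain Python rather than a calculator and relies on the closed form of the chi-square right-tail area for an even number of degrees of freedom; the helper name chi2_sf_even is made up for illustration.

```python
import math

def chi2_sf_even(x, df):
    """Right-tail area of the chi-square distribution for even df.

    For df = 2k the tail is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df % 2 != 0 or df < 2:
        raise ValueError("this helper only handles even df >= 2")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

categories = 5                 # five work days
df = categories - 1            # d.f. = # of categories - 1
test_statistic = 0.7143        # given in the question
alpha = 0.10

p_value = chi2_sf_even(test_statistic, df)
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```

The P-value is far above 0.10, so the null hypothesis of a uniform distribution is not rejected.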
The results of a recent national survey reported that 70% of Americans recycle at least some of the time. As part of their final project in statistics class, Nayla and Roberto survey 5 random students on campus and ask them if they recycle at least some of the time. They then repeat this experiment 1000 times. The results of their research are as follows: Number who recycle: 0, 1, 2, 3, 4, 5. Frequency: 2, 25, 138, 306, 359, 170. Nayla and Roberto want to see if there is evidence to support the belief that the random variable X, the number of students out of 5 who recycle at least some of the time, follows the following historical distribution: Number who recycle: 0, 1, 2, 3, 4, 5. Probability: 0.00243, 0.02835, 0.1323, 0.3078, 0.36015, 0.16807. The appropriate hypothesis test yields a test statistic of 0.7539. What is the conclusion for this test using a 5% significance level?
There is not sufficient evidence to suggest that the distribution of X does not follow the historical distribution. Central to this problem is whether the data suggests that the distribution follows the historical pattern. A test for how well the data follows...how well the data fits...how good the fit...it is a χ2 GOODNESS OF FIT test. To carry out this test, you need to make room for three lists on the calculator. The first should contain the observed frequencies, which are provided by the top table. The second should contain the expected proportions from the null. These come from the second table. The third should be the expected frequencies, which are determined by multiplying the proportions in the second list by the total number of observations. By executing this test, you should be able to match the test statistic provided and the P-value will be provided in the calculator output right alongside the test statistic. Once you have the P-value, compare it to the significance level to draw the appropriate conclusion. P-value < α ⟹ Reject H0 ⟹ "There is sufficient evidence..." P-value > α ⟹ Do NOT Reject H0 ⟹ "There is NOT sufficient evidence..."
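The three-list procedure can be sketched in plain Python instead of on a calculator. Two assumptions are made here: the frequency for 0 recyclers is taken as 2, which is the value that makes the six counts total 1000, and the probabilities are used exactly as printed in the question. The helper chi2_sf_odd is a made-up name; it uses the closed form of the chi-square right-tail area for odd degrees of freedom.

```python
import math

def chi2_sf_odd(x, df):
    """Right-tail area of the chi-square distribution for odd df (x > 0)."""
    if df % 2 != 1 or df < 1:
        raise ValueError("this helper only handles odd df >= 1")
    total = math.erfc(math.sqrt(x / 2.0))                     # tail for df = 1
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)  # step from df 1 to 3
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

# List 1: observed frequencies (first count assumed to be 2)
observed = [2, 25, 138, 306, 359, 170]
# List 2: expected proportions, as printed in the question
proportions = [0.00243, 0.02835, 0.1323, 0.3078, 0.36015, 0.16807]
# List 3: expected frequencies = proportion * total number of observations
n = sum(observed)
expected = [p * n for p in proportions]

test_statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf_odd(test_statistic, df)
reject_null = p_value < 0.05
print(round(test_statistic, 4), round(p_value, 3), reject_null)
```

The computed statistic matches the 0.7539 given in the question, and the P-value is well above 0.05, so the null hypothesis is not rejected.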
A bottling company needs to produce bottles that can hold 12 ounces of liquid for a local beer maker. Periodically, the company gets complaints that their bottles are not holding enough liquid. To test this claim, the bottling company randomly samples 15 bottles and finds the average amount of liquid held by the bottles is 11.90 ounces and the standard deviation is 0.20 ounces. It is safe to assume that the population sampled is normally distributed. Using a 5% significance level, which of the following is the correct conclusion from the appropriate hypothesis test?
There is sufficient evidence to suggest the bottles are not holding enough liquid. To make a conclusion from a hypothesis test, you need a test statistic and/or P-value. Use technology to find these pieces. Here, you will notice there is one sample (of 15 bottles) and you are given the average for the sample - what test should be performed? The P-value approach includes comparing the P-value to the significance level. If the P-value is less than the significance level, then the decision is to reject the null hypothesis (P-value < α ⟹ Reject H0). If the P-value is greater than the significance level, then the decision is to not reject the null hypothesis (P-value > α ⟹ Do NOT Reject H0). Conclusions should follow this general pattern: "There (is/is not) sufficient evidence, at the α level of significance, to suggest (H1 in words)." There is evidence if we reject the null; there is not evidence if we do not reject the null. Hint: "do not reject" means "there is not evidence" - not pairs with not.
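For readers without a calculator at hand, the one-sample t-test can be sketched in plain Python. The hypotheses H0: μ = 12 versus H1: μ < 12 are inferred from the complaint in the question. The helpers t_pdf and t_cdf are made-up names; the CDF is approximated by Simpson's rule on the t density, so the P-value is a numerical approximation rather than calculator output.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

# Values from the question
n, sample_mean, sample_sd = 15, 11.90, 0.20
mu_0 = 12.0       # H0: mu = 12, H1: mu < 12
alpha = 0.05

t_stat = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))
df = n - 1
p_value = t_cdf(t_stat, df)      # left-tailed test
reject_null = p_value < alpha
print(round(t_stat, 4), round(p_value, 4), reject_null)
```

The P-value falls below 0.05, so the null hypothesis is rejected, in line with the stated conclusion.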
The following data represent the number of fatalities involving children under the age of 13 and a motor vehicle, by type of fatality and gender in 2013: Passenger: male = 588, female = 568; Pedestrian: male = 215, female = 114; Bicycle: male = 79, female = 14. A hypothesis test is conducted to determine whether type of fatality and gender are independent at the 5% significance level. The test has a test statistic of 55.63. What is the proper conclusion of the test?
Variables are dependent. Like the last question, this test for independence yields a chi-square test statistic. You can find the P-value using the χ2-cdf (Chi Square Distribution in SALT). For a chi-square test, we always calculate the area to the right of the test statistic. You'll need to remember that the degrees of freedom for the chi-square test for independence are equal to the number of rows minus one times the number of columns minus one (d.f. = (# of rows − 1) × (# of columns − 1)). You do NOT include "TOTAL" rows or "TOTAL" columns in your counts. The alternative way to find the P-value is to conduct the appropriate test on your calculator. Here, you enter the raw data (no total rows or columns) into a matrix on your calculator, and then complete the χ2-Test. Once you have the P-value, compare it to the significance level to draw the appropriate conclusion. P-value < α ⟹ Reject H0 ⟹ "There is sufficient evidence..." P-value > α ⟹ Do NOT Reject H0 ⟹ "There is NOT sufficient evidence..."
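The matrix-based test can also be sketched in plain Python using the counts from the question. Expected counts are computed as (row total × column total) / grand total. Because this table has 2 degrees of freedom, the right-tail area has the simple closed form exp(−x/2), and that shortcut applies only to d.f. = 2. Variable names are illustrative.

```python
import math

# Rows: passenger, pedestrian, bicycle; columns: male, female (no totals)
observed = [
    [588, 568],
    [215, 114],
    [79, 14],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

test_statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-test_statistic / 2)   # right-tail area, valid for df = 2 only
reject_null = p_value < 0.05
print(round(test_statistic, 2), df, p_value, reject_null)
```

The statistic reproduces the 55.63 given in the question and the P-value is essentially zero, so independence is rejected.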
A histogram of the amount of time you have to spend waiting for the bus, in minutes, for 100 total mornings is shown below. Which of the following are plausible values for the mean (x¯) and median (M)?
x¯ = 5.2, M = 5.6. The relationship between the mean and the median can tell us about the shape of a distribution, and vice versa: mean < median ⟹ skewed LEFT; mean > median ⟹ skewed RIGHT; mean ≈ median ⟹ symmetric.
A medical researcher is interested in determining if there is an association between adults over 50 who walk regularly and level of blood pressure. A random sample of 236 adults over 50 is selected and the results are as follows: Are the conditions for the researcher's hypothesis test satisfied?
Yes, because all of the expected frequencies are greater than 5. Pay attention to the presentation of the data here. We have one variable comprising the rows of the table, and another variable comprising the columns, with counts for the combinations of the variables in a table. The only time we see such a presentation is with the χ2-Test. In this question, you are asked about the conditions tied to the chi-square test for independence. Those conditions are: All expected frequencies must be greater than or equal to 1. No more than 20% of the expected frequencies can be less than 5.
For which of the following is a binomial distribution a reasonable probability model?
You buy a single Powerball ticket each week, and x is the number of times you win a prize in one year. The binomial is a discrete counting distribution. Generally, the distribution applies when counting the number of successes in a fixed number of trials. There are four criteria for the binomial distribution: Fixed number of trials (n). Trials are independent (outcome of an individual trial does not impact the outcome of any other trials). Two possible outcomes - success or failure. The probability of success is constant. Watch for a fixed sample size where you are given a single probability of "success". If, in that question, you can convince yourself that you are counting the number of times something happens in a fixed sample size, you probably want to lean into the binomial. Be sure to check all criteria to confidently utilize the binomial distribution.
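To make the Powerball setting concrete, here is a minimal Python sketch of the binomial probabilities. The question gives no win probability, so p = 0.04 is a purely hypothetical value chosen for illustration, and n = 52 assumes one ticket per week for a full year.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 52      # fixed number of trials: one ticket per week (assumed 52 weeks)
p = 0.04    # hypothetical constant chance of winning a prize on one ticket

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * prob for k, prob in enumerate(pmf))

print(round(pmf[0], 4))     # probability of winning no prizes all year
print(round(sum(pmf), 6))   # probabilities over all outcomes total 1
print(round(mean, 4))       # equals n * p
```

Each weekly ticket is an independent trial with two outcomes and the same chance of success, which is why the four criteria hold in this setting.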