Stats final MA 180 UAB
Suppose the equation of a least-squares regression line is y^ = −3.17 − 2.4x. What can be said about the y-intercept?
-3.17
Suppose two events E and F are disjoint. What is P(E and F)?
0
The probability of observing a particular value of a continuous random variable _______.
0
According to the Empirical Rule, 68% of the area under the normal curve is within one standard deviation of the mean. What percent of the area under the normal curve is more than one standard deviation from the mean?
32%
Can a qualitative variable have values that are numeric? Why or why not?
Yes. Numeric values can serve as labels without counting or measuring anything (for example, a rating from 1 to 5).
An experiment was performed to look at the reflectivity of different paints used on roads. Four paints were used (call them Paint 1, Paint 2, Paint 3, and Paint 4). Twenty-four sections of roads with similar travel patterns and weather patterns were used. Each type of paint was randomly assigned to one of the 24 sections of road so that each paint type was used on 6 different road sections. The percent of reflectivity after 6 months was determined and recorded. How many degrees of freedom does the F-statistic have in this problem?
3 numerator and 20 denominator
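The degrees-of-freedom arithmetic on this card can be checked with a short Python sketch. The function name `anova_df` is made up for illustration, and it assumes a one-way ANOVA with a known number of groups and total sample size.

```python
def anova_df(num_groups, total_n):
    """Return (numerator df, denominator df) for a one-way ANOVA F-statistic."""
    numerator = num_groups - 1            # number of groups minus 1
    denominator = total_n - num_groups    # total sample size minus number of groups
    return numerator, denominator

# Paint example from the card: 4 paints, 24 road sections in total.
print(anova_df(4, 24))
```

The same rule is stated in words on the later cards about numerator and denominator degrees of freedom.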
John performed a one-sample z-test for proportions and obtained a p-value of 0.35. John decided to reject the null hypothesis. What is the probability John made a Type I Error?
0.35
After constructing any relative frequency distribution, what should be the sum of the relative frequencies?
1 or 100%
Suppose the probability that a randomly selected man, aged 55-59, will die of cancer during the course of the year is 300/100,000. How would you find the probability that at least 1 man out of 1,000 of this age will die of cancer during the course of the year?
1-(0.997)^1000
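As a quick check of the complement rule used here, the calculation can be sketched in Python. The variable names are illustrative, and the sketch assumes the 1,000 men are independent of one another.

```python
p_death = 300 / 100_000    # probability one man dies of cancer during the year
n_men = 1_000

# Complement rule: P(at least 1) = 1 - P(none), assuming independence.
p_none = (1 - p_death) ** n_men
p_at_least_one = 1 - p_none

print(round(p_at_least_one, 4))
```

The result is roughly 0.95, so it is very likely that at least one of the 1,000 men dies of cancer during the year.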
What critical value should be used to construct a 90% confidence interval for the population mean when the population standard deviation is known?
1.645
What critical value should be used to construct a 95% confidence interval for the population mean when the population standard deviation is known?
1.96
According to the Empirical Rule, 68% of the area under the normal curve is within one standard deviation of the mean. What percent of the area under the normal curve is more than one standard deviation above the mean?
16%
What critical value should be used to construct a 99% confidence interval for the population mean when the population standard deviation is known?
2.58
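The three z critical values on these cards can be reproduced with Python's standard library. This is a minimal sketch; the helper name `z_critical` is an assumption made for illustration, and it covers only two-sided intervals with a known population standard deviation.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2             # area left in each tail
    return NormalDist().inv_cdf(1 - tail)   # z with that much area above it

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```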
According to the Empirical Rule, 95% of the area under the normal curve is within two standard deviations of the mean. What percent of the area under the normal curve is more than two standard deviations from the mean?
5%
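The Empirical Rule percentages on these cards are rounded. A short Python sketch using the standard normal curve shows where the 32%, 16%, and 5% come from; the function name `area_within` is illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

outside_one = 1 - area_within(1)   # both tails beyond 1 SD (about 32%)
above_one = outside_one / 2        # upper tail only, by symmetry (about 16%)
outside_two = 1 - area_within(2)   # both tails beyond 2 SD (about 5%)

print(round(outside_one, 3), round(above_one, 3), round(outside_two, 3))
```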
A random sample of 25 students in an Introductory Statistics course were asked how many hours of sleep they got last night. The average of these 25 students was 5.4 hours with a standard deviation of 1.3 hours. Suppose all conditions are met for inference using the one-sample t-methods. Calculate the upper bound of a 95% confidence interval for the mean number of hours students in the Introductory Statistics course slept last night.
5.9366 hours [StatCrunch: Stats > t Tests > One Sample > with summary > input values]
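The upper bound can also be worked by hand. The sketch below assumes the t critical value 2.0639 (95% confidence, 24 degrees of freedom) has been read from a t table, since Python's standard library has no t distribution; the variable names are illustrative.

```python
import math

n = 25
sample_mean = 5.4
sample_sd = 1.3
t_star = 2.0639   # assumed table value: 95% confidence, df = n - 1 = 24

standard_error = sample_sd / math.sqrt(n)   # 1.3 / 5 = 0.26
margin = t_star * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(round(lower, 4), round(upper, 4))
```

The upper bound agrees with the 5.9366 hours on the card.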
A medical study was investigating if getting a flu shot actually reduced the risk of developing the flu. A hypothesis test is performed. Suppose the null hypothesis was rejected with a p-value of 0.0002. The power of the test was 0.90. What type of error could be made and what is the probability of making that error?
A Type I error could be made with a probability of 0.0002.
A confidence interval for a population mean __________.
A confidence interval for a population mean gives a range of plausible values for the true population mean at a stated level of confidence.
A critical value is _____________.
A critical value is the number of standard errors (or standard deviations) to move from the mean of a sampling distribution to correspond to a specified level of confidence.
Which distribution shape (skewed left, skewed right, or symmetric) is most likely to result in the mean being substantially smaller than the median?
A distribution that is skewed left will likely have a mean that is smaller than the median since the extreme values in the tail tend to pull the mean to the left.
Why is it important that the relationship between the explanatory and response variable be linear when performing a linear regression analysis?
A linear regression analysis relies on a straight line being fit between the points on a scatterplot.
When should a paired t-test be performed instead of a two-sample t-test?
A paired t-test should be performed instead of a two-sample t-test when each observation in one group has a dependence on a particular observation in the other group.
When should a paired t-test be performed?
A paired t-test should be performed when the variable of interest is quantitative, there are two groups being compared, and the samples taken are dependent.
Which of the following statements is true about a normal density curve as σ increases? A) The curve becomes more spread out. B) The curve becomes less spread out. C) There is no change in the spread of the curve. D) There is not enough information to determine the effect on the spread of the curve.
A) The curve becomes more spread out.
It is hypothesized that 50% of Americans attend church regularly. Which of the following would be an example of making a Type I Error?
A study was conducted that had evidence to reject the null hypothesis. In reality, half of Americans actually do attend church regularly.
Which of the following statements is not a requirement for a probability density function or state that they all are. A) The curve must be symmetric and centered at zero. B) The total area under the curve must equal one. C) Every point on the curve must be on or above the x-axis. D) These are all requirements for a probability density function
A) The curve must be symmetric and centered at zero.
Identify which of the following statements about the graph of a probability density function is true or state that they are both true or neither are true. A) The graph must always be on or above the horizontal axis. B) The graph must always be to the right of the vertical axis. C) Both of the first two statements are true. D) Neither of the first two statements is true.
A) The graph must always be on or above the horizontal axis.
An investigator conducts an experiment with four treatment groups. The response variable is growth of a plant during the experiment. She performs an F-test. What is the null hypothesis the researcher is testing with the F-test?
All four treatment groups have the same average growth of the plants during the experiment.
Which of the following statements is not true about binomial probability distributions?
As the probability of success increases, the probability distribution for a binomial variable becomes bell shaped.
An investigator conducts an experiment with four treatment groups. The response variable is growth of a plant during the experiment. She performs an F-test. What is the alternative hypothesis the researcher is testing with the F-test?
At least one treatment group has a different average growth of plants during the experiment, but not all four necessarily have different mean growths.
A p-value is the probability _____________.
A p-value is the probability of observing the actual result (a sample mean, for example) or something more unusual just by chance if the null hypothesis is true.
Identify which of the following is not a property of the standard normal curve. A) The mean, median, and mode are all equal to zero. B) It has inflection points at μ ± 2σ. C) It is symmetric about its mean of zero, and has standard deviation equal to 1. D) As the value of z increases, the graph approaches, but never equals, zero.
B) It has inflection points at μ ± 2σ.
Which of the following would increase the width of a confidence interval for a population mean? A) Decrease the sample standard deviation. B) Increase the sample size C) Increase the level of confidence D) All of the above
C) Increase the level of confidence
Which of the following is a property of the standard normal curve, but not necessarily a property of every normal curve? A) The mean, median, and mode are all equal. B) The area under the curve is one. C) The mean is zero and the standard deviation is one. D) The curve is symmetric about the mean.
C) The mean is zero and the standard deviation is one.
Suppose every student in a class is surveyed and it is reported that 75% of the class plans to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Descriptive statistics; The results of the class sample are described without making any generalizations about the population of all students at the school.
Researchers conducted a study and obtained a p-value of 0.75. Based on this p-value, what conclusion should the researchers draw?
Fail to reject the null hypothesis but do not accept the null hypothesis as true either.
Type II error
Failing to reject a false null hypothesis
Which of the following statements about probability is not true?
For any event E, 0 < P(E) < 1, where P(E) is the probability of event E.
It is recommended that adults get 8 hours of sleep each night. A researcher hypothesized college students got less than the recommended number of hours of sleep each night, on average. The researcher randomly sampled 50 college students and calculated a sample mean of 7.9 hours per night. The researcher performed a hypothesis test. What is the null hypothesis?
H0: μx=8 hours per night
Alex hypothesized that, on average, students study less than the recommended two hours per credit hour each week outside of class. Which of the following is Alex's alternative hypothesis?
H1: μ<2 hours per week per credit
Why does the formula for calculating the sample variance involve squaring the difference between each value and the mean?
If the differences were not squared, the sum of all deviations from the mean would always be zero, since the positive deviations are balanced by the negative deviations.
In the formula for calculating the sample variance, why do we divide by n−1 instead of n?
If the formula divided by n, the sample variance would be biased and would consistently underestimate the population variance.
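Both points about the sample variance can be seen on a small example. This is a sketch on a hypothetical data set, not a proof: the unsquared deviations cancel, and dividing by n gives a smaller number than dividing by n−1.

```python
import statistics

data = [2, 4, 4, 5, 10]                   # hypothetical sample
mean = sum(data) / len(data)
deviations = [x - mean for x in data]

print(sum(deviations))                    # unsquared deviations cancel out

sum_sq = sum(d ** 2 for d in deviations)
var_n_minus_1 = sum_sq / (len(data) - 1)  # sample variance (divide by n - 1)
var_n = sum_sq / len(data)                # dividing by n gives a smaller value

print(var_n_minus_1, var_n, statistics.variance(data))
```

Python's `statistics.variance` uses the n−1 divisor, so it matches `var_n_minus_1`.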
Suppose every student in a class is surveyed and it is found that 75% of the class plans to take another math class. It is reported that 75% of all students at the school plan to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Inferential statistics; the results of the class sample are extended to make a generalization about the population of all students at the school.
A student wondered if more than 10% of students enrolled in an introductory Chemistry class dropped before the midterm. He noticed that 2 out of 15 of his friends in the class dropped before the midterm. Based on his sample, he performed a hypothesis test. Is the hypothesis test a one-tailed or two-tailed test?
It is a one-tailed test since the alternative hypothesis states that the parameter is greater than the hypothesized value.
Suppose the equation of a least-squares regression line is y^ = −3.17 − 2.4x. What can be said about the correlation coefficient?
It is negative, but its exact value cannot be determined from the given information.
When analyzing two quantitative variables, what is the first thing that should be done?
Make a scatterplot.
The probability that a randomly selected adult in a particular community is a smoker is 20%. The probability that a randomly selected adult in the community is a smoker, given that the adult earns more than $75,000 per year, is 10%. Are the events "is a smoker" and "earns more than $75,000 per year" independent? Explain.
No, because the probability of smoking is different for people who earn over $75,000 per year, the events are not independent.
A survey found that 5% of adults have not visited a dentist in the last five years. Suppose you ask 50 adults selected at random if they have visited a dentist in the last five years. Should a normal distribution be used to approximate the distribution of the random variable x that counts the number of adults who have not visited a dentist in the last five years?
No; since np < 5, the normal distribution should not be used.
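The check behind this answer is simple arithmetic. In the sketch below, the function name `normal_approx_ok` and the extra n(1−p) condition are assumptions added for illustration; the card itself only mentions np, and some courses use a different rule of thumb.

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule-of-thumb check for approximating a binomial with a normal curve.

    The card only states the n*p condition; the n*(1-p) condition is an
    added assumption that mirrors it for the failure count.
    """
    return n * p >= threshold and n * (1 - p) >= threshold

print(50 * 0.05)                    # expected number who have not visited a dentist
print(normal_approx_ok(50, 0.05))
```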
Data were collected on many different variables of a fast food chain's sandwiches several years ago. Two variables were the serving size (in ounces) of a sandwich and the number of calories in the sandwich. A hungry customer wanted to estimate the number of calories in a sandwich based on its serving size. With this in mind, which variable would go on the y-axis in the scatterplot?
Number of calories goes on the y-axis, since it is the response variable.
In a normal distribution, approximately 68% of the area under the normal curve is within how many standard deviation(s) of the mean?
One
The ____ is/are the entire group of individuals or items being studied
Population
In regression, what is the proportion of variation in the response variable that is explained by the regression model called?
R^2
Type I error
Rejecting a true null hypothesis
Suppose you want to know if more technical service calls are made to homes with cable television or with satellite dish television. Should you use frequencies or relative frequencies to make the comparison? Why?
Relative frequencies should be used since there is likely a difference in the number of users of cable and satellite television. If you make comparisons using frequencies, the results can be very misleading for different population sizes.
Jan performed a study and obtained a p-value of 1.24. What conclusion should Jan make?
She made an error since it is not possible to get a p-value of 1.24.
Which measure of center must be equal to an actual data value? Explain why.
Since the mode is the most frequent observation in the data set, it must be an actual value from the data set.
Which of the following statements correctly describes the complement of event E?
The complement of event E is the set of outcomes which are in the sample space but not in event E.
April calculated a correlation coefficient between sex and GPA as −0.25. She said there is a weak correlation between a person's sex and their GPA. Which of the following is an appropriate comment about April's statement?
The correlation coefficient does not make sense to describe the relationship between a categorical and quantitative variable.
What is the definition of the correlation coefficient?
The correlation coefficient is a measure that describes the direction and strength of the linear relationship between two quantitative variables.
A collection of data on class sizes at a community college produces the five-number summary below. Comment on the shape of the distribution of class sizes. Min=12 Q1=22 Q2=35 Q3=38 Max=40
The distribution appears to be skewed left since the median is further from the first quartile than the third quartile. Also, the left whisker would be longer than the right whisker in a boxplot for the data.
The following five-number summary represents the annual snowfall totals for a Midwest town for the last 75 years. Comment on the shape of the distribution of snowfall totals. Min=14 Q1=17 Q2=21 Q3=29 Max=38
The distribution appears to be skewed right since the median is closer to the first quartile than the third quartile. Also, the right whisker would be longer than the left whisker in a boxplot of the data.
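The reasoning on the two five-number-summary cards can be written as a rough rule. The function below is a hypothetical helper, and the rule is only a heuristic: it compares the two halves of the box and the two whiskers, as the answers do.

```python
def skew_from_summary(minimum, q1, median, q3, maximum):
    """Rough skew direction from a five-number summary (heuristic only)."""
    lower_box, upper_box = median - q1, q3 - median
    left_whisker, right_whisker = q1 - minimum, maximum - q3
    if lower_box > upper_box and left_whisker > right_whisker:
        return "skewed left"
    if upper_box > lower_box and right_whisker > left_whisker:
        return "skewed right"
    return "unclear or roughly symmetric"

print(skew_from_summary(12, 22, 35, 38, 40))   # class sizes
print(skew_from_summary(14, 17, 21, 29, 38))   # snowfall totals
```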
How is the best-fitting line between the points in a scatterplot defined?
The line that gives the smallest sum of the squared vertical distances between each point and the line
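To make the definition concrete, here is a minimal sketch that fits a least-squares line to a few hypothetical points and compares its sum of squared vertical distances with that of another line. The function names and the data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs, ys = [1, 2, 3, 4], [2, 4, 5, 8]             # hypothetical data
intercept, slope = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
other = sum_squared_residuals(xs, ys, 0.0, 2.0)  # some other candidate line

print(intercept, slope, best, other)
```

Any other line, such as the one with intercept 0 and slope 2 tried here, gives a sum of squared distances at least as large as the fitted line's.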
What is the mean of a probability distribution?
The mean is the expected value of the random variable.
Identify which statement about the mean of a discrete random variable is not true or state that they are all true.
The mean must be a possible value of the random variable.
Which measure of center (mean or median) is resistant? Explain what it means for that measure to be resistant.
The median is resistant because it is not sensitive to extreme values in the data set. If the largest observation was doubled, for example, the median would not change since that largest value does not factor into its computation.
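The doubling example in this answer is easy to try. The data below are hypothetical; the sketch doubles the largest observation and compares the mean and median before and after.

```python
import statistics

data = [3, 5, 7, 9, 11]                   # hypothetical data set
doubled = data[:-1] + [data[-1] * 2]      # largest observation doubled: 11 -> 22

median_before, median_after = statistics.median(data), statistics.median(doubled)
mean_before, mean_after = statistics.mean(data), statistics.mean(doubled)

print(median_before, median_after)        # the median does not move
print(mean_before, mean_after)            # the mean is pulled upward
```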
Suppose x̄ = 60, H0: μx = 50, HA: μx > 50, and the p-value from a one-sample test is 0.04. What does this p-value mean?
The probability of getting a sample mean of 60 or more if the true population mean is 50 is 0.04.
When looking at a scatterplot of two quantitative variables, what do we typically look for?
The relationship between the two variables and if there are any deviations from the pattern (outliers or clusters of points, for example).
Is the average body temperature of humans really 98.6°F? After sampling 15,600 healthy people from around the country, researchers found a sample mean of 98.5°F. The p-value was 0.0001. Which of the following is true?
The results are "statistically significant" because the sample size was quite large and the p-value was quite small.
Describe the sample variance in words rather than with a formula.
The sample variance is the sum of the squared deviations from the mean, divided by (n − 1).
In the least-squares regression equation y^ = 33.967 + 11.358x, what is 11.358?
The slope of the least squares regression line
Identify the requirements for a discrete probability distribution.
The sum of the probabilities must equal one. Each probability must be between zero and one, inclusive.
Which of the following is NOT a condition of the Analysis of Variance model?
The treatment group means must fall on a straight line.
Which of the following is not a criterion for the binomial distribution?
The trials must be dependent.
What is wrong with the following definition of the correlation coefficient? The correlation coefficient measures the strength and direction of the linear relationship between two variables.
The two variables must be quantitative.
If someone's gross annual income has a z-score of positive 2, what can be concluded?
Their income is 2 standard deviations above the mean income
Gina calculated a correlation coefficient between hours studied and grade point average as +0.75. Which of the following is a correct statement based on this correlation coefficient?
There is a fairly strong positive relationship between hours studied and grade point average, indicating that grade point averages tend to be higher for students who study more.
What does a correlation coefficient of 0 indicate?
There is no linear relationship between the two quantitative variables.
Researchers timed 21 subjects as they tried to complete paper-and-pencil mazes. Each subject attempted a maze both with and without the presence of a floral aroma. Subjects were randomized with respect to which trial they did first. Suppose a paired t-test is to be performed to determine whether there is evidence to indicate that the time to complete the maze is faster in scented trials compared to unscented trials, on average. The p-value from the paired t-test is 0.11. Which of the following is the most appropriate conclusion based on this p-value?
There is not sufficient evidence to indicate that the individuals complete mazes faster with a floral aroma present compared to when no floral aroma is present, on average.
A certain marathon has had a wheelchair division since 1977. An interested fan wondered who is faster: the men's marathon winner or the women's wheelchair marathon winner, on average. A paired t-test was performed, and the p-value was found to be 0.001. Which of the following is the correct conclusion?
There is sufficient evidence to indicate that the men's running winning time and the women's wheelchair winning time each year are different, on average.
Brett is a huge sports fan. He hypothesized half of sports fans liked football the best, one-quarter liked baseball the best, 15% liked basketball the best, and 5% liked hockey the best, and the rest liked some other sport the best. He surveyed 100 sports fans and asked what sport they liked the best. Assuming all conditions are satisfied, which of the following tests should Brett use to test his hypothesis?
The goodness-of-fit chi-square test
Why does sample size need to be accounted for in the t-distribution?
The t-distribution changes for different sample sizes.
Suppose you want to calculate the z-score for your height. How will the z-scores compare if you use your height in inches versus centimeters?
The z-scores will be the same regardless of the unit used for your height because z-scores are unitless.
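A quick way to see this is to compute the same person's z-score in both units. The heights below are hypothetical, and `z_score` is an illustrative helper that uses the sample mean and sample standard deviation of the list it is given.

```python
import statistics

heights_in = [60, 64, 66, 70, 75]             # hypothetical heights in inches
heights_cm = [h * 2.54 for h in heights_in]   # the same heights in centimeters

def z_score(value, data):
    """Number of standard deviations that value lies from the mean of data."""
    return (value - statistics.mean(data)) / statistics.stdev(data)

z_in = z_score(heights_in[3], heights_in)     # the 70-inch person
z_cm = z_score(heights_cm[3], heights_cm)     # the same person in centimeters

print(z_in, z_cm)
```

The conversion factor 2.54 multiplies both the deviation from the mean and the standard deviation, so it cancels in the ratio.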
In a normal distribution, approximately 99.7% of the area under the normal curve is within how many standard deviation(s) of the mean?
Three
Explain how to find the mean of a discrete random variable.
To find the mean of a random variable, multiply each value of the random variable by its probability and then add those products.
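The recipe in this answer translates directly into code. The sketch below uses a fair six-sided die as an assumed example; `expected_value` is an illustrative name, and the distribution is passed as a dictionary mapping each value to its probability.

```python
def expected_value(distribution):
    """Mean of a discrete random variable given as {value: probability}."""
    total = sum(distribution.values())
    if abs(total - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Multiply each value by its probability, then add the products.
    return sum(x * p for x, p in distribution.items())

die = {face: 1 / 6 for face in range(1, 7)}   # fair six-sided die (assumed example)
mean = expected_value(die)

print(mean)
```

The mean of a die roll is 3.5, which is not a value the die can show. That is why the statement "the mean must be a possible value of the random variable" is the one that is not true.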
True or false? A histogram and a relative frequency histogram, constructed from the same data, always have the same basic shape.
True. A relative frequency histogram will have a different scale on the y-axis but the same shape as a regular histogram.
In a normal distribution, approximately 95% of the area under the normal curve is within how many standard deviation(s) of the mean?
Two
A research organization keeps track of what citizens think is the most important problem facing the country today. They randomly sampled a number of people in 2003 and again in 2009 using a different random sample of people in 2009 than in 2003 and asked them to choose the most important problem facing the country today from the following choices, war, economy, health care, or other. Which of the following is the correct test to use to determine if the distribution of "problem facing this country today" is different between the two different years?
Use a chi-square test of homogeneity.
When is it appropriate to use the pooled two-sample t-methods?
Use the pooled two-sample t-methods when the samples come from different populations with the same, or nearly the same, standard deviations.
A graduate student wanted to estimate the average time spent studying among graduate students at her school. She randomly sampled graduate students from her school and obtained a 99% confidence interval of (17.3,22.5) hours/week. In the context of the problem, which of the following interpretations is correct?
We are 99% sure that the average amount of time spent studying among graduate students at this student's school is between 17.3 and 22.5 hours per week.
In a chi-square test, when would the null hypothesis be true?
When all observed counts are the same as their expected counts
When will a chi-square statistic be 0?
When all observed counts are the same as their expected counts
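The chi-square statistic adds up (observed − expected)² / expected over all categories, so it is zero only when every term is zero. A minimal sketch with hypothetical counts follows; the function name is illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [50, 25, 15, 5, 5]                 # hypothetical expected counts
same = chi_square_statistic([50, 25, 15, 5, 5], expected)
different = chi_square_statistic([45, 30, 15, 5, 5], expected)

print(same, different)
```

When the observed counts match the expected counts exactly, the statistic is 0. Any mismatch makes it positive.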
When are conclusions said to be "statistically significant"?
When the p-value is less than a given significance level
Elmo likes music. He wondered if listening to music while studying will improve scores on an exam. Fifty students who were to take the midterm in a week agreed to be part of a study. Half were randomly assigned to listen to classical music while studying for the exam. The other half were told not to listen to any music while studying for the exam. A hypothesis test is to be performed to determine if the average scores of those listening to music while studying for the exam were higher than those who did not listen to any music while studying for the exam. Which of the following hypothesis tests should be used?
a two-sample t-test
The _________________ is/are a subset of the population that is being studied.
sample
The probability of obtaining x successes in n independent trials of a binomial experiment is given by P(x) = nCx · p^x · (1−p)^(n−x), where p is the probability of success. What does the n−x represent in the formula?
the number of failures
The F-statistic in a one-way Analysis of Variance problem has how many numerator degrees of freedom?
the number of groups being compared minus 1
The probability of obtaining x successes in n independent trials of a binomial experiment is given by P(x) = nCx · p^x · (1−p)^(n−x), where p is the probability of success. What does nCx represent in the formula?
the number of ways to get x successes in n trials
The probability of obtaining x successes in n independent trials of a binomial experiment is given by P(x) = nCx · p^x · (1−p)^(n−x), where p is the probability of success. What does (1−p)^(n−x) represent in the formula?
the probability of failure raised to the number of failures
The probability of obtaining x successes in n independent trials of a binomial experiment is given by P(x) = nCx · p^x · (1−p)^(n−x), where p is the probability of success. What does p^x represent in the formula?
the probability of success raised to the number of successes
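The pieces of the binomial formula named on these cards can be labeled one by one in code. This is a small sketch using Python's `math.comb`; the function name `binomial_pmf` and the example numbers are assumptions for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n independent trials), each with success probability p."""
    ways = comb(n, x)                    # nCx: ways to get x successes in n trials
    success_part = p ** x                # probability of success, x times
    failure_part = (1 - p) ** (n - x)    # probability of failure, n - x times
    return ways * success_part * failure_part

# Example: exactly 2 heads in 3 fair coin flips.
print(binomial_pmf(2, 3, 0.5))
```

Summing `binomial_pmf(x, n, p)` over x = 0, 1, ..., n gives 1, as required of a discrete probability distribution.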
Describe the sample standard deviation in words rather than with a formula.
The sample standard deviation is the square root of the sum of the squared deviations from the mean divided by (n − 1).
What does the standard error of the distribution of sample means estimate?
the standard deviation of the distribution of sample means
The F-statistic in a one-way Analysis of Variance problem has how many denominator degrees of freedom?
the total sample size of all groups combined minus the number of groups being compared
It is assumed that approximately 15% of adults in the U.S. are left-handed. Consider the probability that among 100 adults selected in the U.S., there are at least 30 who are left-handed. Given that the adults surveyed were selected without replacement, can the probability be found by using the binomial probability formula with x counting the number who are left-handed? Why or why not?
Yes, because the 100 adults represent less than 5% of the U.S. adult population, the trials can be treated as independent.
Cuckoos lay their eggs in the nests of other (host) birds. The eggs are then adopted and hatched by the host birds, but the potential host birds lay eggs of different sizes. A random sample of sparrow host eggs and wagtail host eggs was taken and the length of the cuckoo eggs for each host was recorded. Based on the sample data, suppose a 95% confidence interval for the difference in mean lengths of cuckoo eggs (sparrow hosts−wagtail hosts) is (−0.6, −0.1) mm. Is there evidence at the 5% significance level to indicate that cuckoos do change the size of their eggs between sparrow and wagtail hosts, on average?
Yes, since 0 is not between the lower and upper bounds of the confidence interval.
A professor wondered if there was a difference in the proportion of students who dropped math classes between females and males. The professor randomly selected 20 math classes around campus and recorded the gender of the individual and whether or not a student enrolled in the class at the beginning of the term dropped the class at some point during the term. Assuming all conditions are satisfied, which of the following tests should the researcher use?
two-sample z-test for proportions