MA180 Final
Suppose two events E and F are disjoint. What is P(E and F)?
0
The probability of observing any particular value of a continuous random variable is _______.
0
What critical value should be used to construct a 90% confidence interval for the population mean when the population standard deviation is known?
z=1.645
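The 1.645 value can be checked numerically. The sketch below is a minimal Python example using the standard library's `statistics.NormalDist` (Python 3.8 or later). The helper name `z_critical` is illustrative and is not part of the course materials. It finds the z value that leaves half of the leftover area, alpha/2, in each tail.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value z* for a confidence level such as 0.90."""
    # The leftover area alpha is split evenly between the two tails,
    # so z* is the (1 - alpha/2) quantile of the standard normal curve.
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
```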
For any set of data, at least _______ of the data will be within two standard deviations of the mean. For a bell-shaped distribution, approximately _______ of the data will be within two standard deviations of the mean.
3/4, 95%
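The Chebyshev bound 1 - 1/k^2 holds for any data set, while the Empirical Rule percentages are rounded areas under the normal curve. The minimal Python sketch below compares the two. The helper names are hypothetical. It also covers the related items on this sheet. The area more than k standard deviations from the mean is 1 minus the area within k, and by symmetry half of that area lies above the mean.

```python
from statistics import NormalDist

def chebyshev_min(k: float) -> float:
    """Chebyshev's guaranteed minimum fraction within k SDs (meaningful for k > 1)."""
    return 1 - 1 / k**2

def normal_within(k: float) -> float:
    """Exact fraction of a normal distribution within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    within = normal_within(k)
    outside = 1 - within
    print(f"k={k}: Chebyshev >= {chebyshev_min(k):.4f}, normal = {within:.4f}, "
          f"outside = {outside:.4f}, above = {outside / 2:.4f}")
```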
According to the Empirical Rule, 68% of the area under the normal curve is within one standard deviation of the mean. What percent of the area under the normal curve is more than one standard deviation from the mean?
32%
John performed a one-sample z-test for proportions and obtained a p-value of 0.35. John decided to reject the null hypothesis. What is the probability John made a Type I Error?
0.35
In a normal distribution, approximately 68% of the area under the normal curve is within how many standard deviation(s) of the mean?
1
After constructing any relative frequency distribution, what should be the sum of the relative frequencies?
1 or 100%
Suppose the probability that a randomly selected man, aged 55-59, will die of cancer during the course of the year is 300/100,000. How would you find the probability that at least 1 man out of 1,000 of this age will die of cancer during the course of the year?
1-(0.997)^1000
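The expression above uses the complement rule, P(at least one) = 1 - P(none). The small Python sketch below assumes each man's outcome is independent of the others, which is the same assumption the formula makes. `prob_at_least_one` is a hypothetical helper name.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """P(at least one success in n independent trials), via the complement."""
    # P(none) = (1 - p)^n, so P(at least one) = 1 - (1 - p)^n
    return 1 - (1 - p) ** n

p_death = 300 / 100_000   # 0.003 per man per year, from the question
answer = prob_at_least_one(p_death, 1000)
print(f"{answer:.4f}")    # about 0.95
```

The result is roughly 0.95, so it is very likely that at least one of the 1,000 men dies of cancer during the year.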
A hungry college student just finished eating a large cheese and pepperoni pizza from Doug's Pizza Palace and felt there wasn't enough cheese and pepperoni on it, since he had gotten large pizzas with more cheese and pepperoni from this place in the past. This led him to wonder what the average weight of a large cheese and pepperoni pizza was at Doug's Pizza Palace. The student ordered a large cheese and pepperoni pizza from Doug's Pizza Palace on 15 randomly selected days during a particular term and weighed each one before eating it. Suppose all conditions are met for inference using the one-sample t-methods. The student constructed a 95% confidence interval for the population mean using the one-sample t-methods. How many degrees of freedom does the t critical value have?
14
According to the Empirical Rule, 68% of the area under the normal curve is within one standard deviation of the mean. What percent of the area under the normal curve is more than one standard deviation above the mean?
16%
In a normal distribution, approximately 95% of the area under the normal curve is within how many standard deviation(s) of the mean?
2
Can pleasant smells improve learning? Researchers timed 21 subjects as they tried to complete paper-and-pencil mazes. Each subject attempted a maze both with and without the presence of a floral aroma. Subjects were randomized with respect to whether they did the scented trial first or second. Suppose a paired t-test is to be performed to determine whether there is evidence to indicate that the time to complete the maze is faster in scented trials compared to unscented trials, on average. The summary statistics for the difference in time to complete the maze (in seconds) between the unscented and scented trials (unscented - scented) are x̄ = 3.85 and s = 13.01. How many degrees of freedom does the t-statistic have?
20
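The degrees of freedom for a paired t-test come from the number of pairs: n - 1 = 21 - 1 = 20. The same n - 1 rule gives 14 for the 15 pizzas. The sketch below also computes the test statistic from the summary statistics in the maze question, assuming the null hypothesis is a mean difference of 0. `paired_t` is a hypothetical helper. Turning t into a p-value needs the t distribution, which the Python standard library does not provide, so that step is omitted.

```python
import math

def paired_t(mean_diff: float, sd_diff: float, n: int, mu0: float = 0.0):
    """One-sample t statistic on paired differences; returns (t, df)."""
    se = sd_diff / math.sqrt(n)     # standard error of the mean difference
    t = (mean_diff - mu0) / se
    return t, n - 1                 # df = number of pairs minus 1

t_stat, df = paired_t(3.85, 13.01, 21)
print(f"t = {t_stat:.3f}, df = {df}")
```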
In a normal distribution, approximately 99.7% of the area under the normal curve is within how many standard deviation(s) of the mean?
3
According to the Empirical Rule, 95% of the area under the normal curve is within two standard deviations of the mean. What percent of the area under the normal curve is more than two standard deviations from the mean?
5%
For any set of data, at least _______ of the data will be within three standard deviations of the mean. For a bell-shaped distribution, approximately _______ of the data will be within three standard deviations of the mean.
8/9, 99.7%
A medical study was investigating if getting a flu shot actually reduced the risk of developing the flu. A hypothesis test is performed. Suppose the null hypothesis was rejected with a p-value of 0.0002. The power of the test was 0.90. What type of error could be made and what is the probability of making that error?
A Type I error could be made with a probability of 0.0002
A confidence interval for a population mean __________.
A confidence interval for a population mean gives a range of plausible values for the true population mean at a stated level of confidence.
A critical value is _____________.
A critical value is the number of standard errors (or standard deviations) to move from the mean of a sampling distribution to correspond to a specified level of confidence.
Why is it important that the relationship between the explanatory and response variable be linear when performing a linear regression analysis?
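A linear regression analysis fits a straight line through the points on a scatterplot; if the relationship is not linear, that line will not describe the data well, and conclusions or predictions based on it can be misleading.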
When should a paired t-test be performed instead of a two-sample t-test?
A paired t-test should be performed instead of a two-sample t-test when each observation in one group is paired with (depends on) a particular observation in the other group.
When should a paired t-test be performed?
A paired t-test should be performed when the variable of interest is quantitative, there are two groups being compared, and the samples taken are dependent.
Researchers conducted a study and obtained a p-value of 0.75. Based on this p-value, what conclusion should the researchers draw?
Fail to reject the null hypothesis but do not accept the null hypothesis as true either.
An investigator conducts an experiment with four treatment groups. The response variable is growth of a plant during the experiment. She performs an F-test. What is the alternative hypothesis the researcher is testing with the F-test?
At least one treatment group has a different average growth of plants during the experiment, but not all four necessarily have different mean growths.
A p-value is the probability _____________.
A p-value is the probability of observing the actual result (a sample mean, for example) or something more extreme just by chance, if the null hypothesis is true.
If two data sets use different units of measure, which should you calculate to compare the variability of the two sets of data?
Coefficient of variation
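The coefficient of variation divides the standard deviation by the mean, so the units cancel and data sets measured in different units can be compared. The minimal Python sketch below uses hypothetical height data. It shows that converting the same measurements from inches to centimeters leaves the CV unchanged. The helper name is illustrative.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (a unitless ratio)."""
    return statistics.stdev(data) / statistics.mean(data)

heights_in = [60, 66, 72]                      # hypothetical heights, inches
heights_cm = [h * 2.54 for h in heights_in]    # same people, centimeters
cv_in = coefficient_of_variation(heights_in)
cv_cm = coefficient_of_variation(heights_cm)
print(cv_in, cv_cm)                            # equal: the units cancel
```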
Suppose every student in a class is surveyed and it is reported that 75% of the class plans to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Descriptive statistics: The results of the class sample are described without making any generalizations about the population of all the students at the school.
What must be true for a sample to be considered a simple random sample?
Every member (or sample) must have the same chance of being selected as every other member (or sample of the same size).
Which of the following statements is true concerning random and simple random samples?
Every simple random sample is also a random sample.
For any event E, 0 ≤ P(E) ≤ 1, where P(E) is the probability of E.
It is recommended that adults get 8 hours of sleep each night. A researcher hypothesized college students got less than the recommended number of hours of sleep each night, on average. The researcher randomly sampled 50 college students and calculated a sample mean of 7.9 hours per night. The researcher performed a hypothesis test. What is the null hypothesis?
H0: μ = 8 hours per night
Alex hypothesized that, on average, students study less than the recommended two hours per credit hour each week outside of class. Which of the following is Alex's alternative hypothesis?
H1: μ < 2 hours per week per credit hour
Why does the formula for calculating the sample variance involve squaring the difference between each value and the mean?
If the differences were not squared, the sum of all deviations from the mean would always be zero, since the positive deviations are balanced by the negative deviations.
Why does the formula for calculating the sample variance involve division by n-1 instead of n?
If the formula involved division by n, the sample variance would be biased and would consistently underestimate the population variance.
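Both answers can be seen on a small hypothetical data set. The Python sketch below uses only the standard library. It first shows that the raw deviations sum to zero. It then computes the sample variance by dividing the sum of squared deviations by n - 1. Finally, it compares that value with `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by n). The sample standard deviation is the square root of this variance.

```python
import statistics

data = [4, 8, 6, 5, 7]                  # hypothetical sample
mean = statistics.mean(data)
deviations = [x - mean for x in data]
print(sum(deviations))                  # always 0: positives cancel negatives

# Sample variance: sum of squared deviations divided by n - 1
n = len(data)
s2 = sum(d ** 2 for d in deviations) / (n - 1)
print(s2, statistics.variance(data))    # the two agree
print(statistics.pvariance(data))       # divides by n instead, so it is smaller
```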
Which of the following would increase the width of a confidence interval for a population mean?
Increase the level of confidence
Suppose every student in a class is surveyed and it is found that 75% of the class plans to take another math class. It is reported that 75% of all students at the school plan to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Inferential statistics; the results of the class sample are extended to make a generalization about the population of all students at the school.
Suppose the equation of a least-squares regression line is ŷ = -3.17 - 2.4x. What can be said about the y-intercept?
It is -3.17
A student wondered if more than 10% of students enrolled in an introductory Chemistry class dropped before the midterm. He noticed that 2 out of 15 of his friends in the class dropped before the midterm. Based on his sample, he performed a hypothesis test. Is the hypothesis test a one-tailed or two-tailed test?
It is a one-tailed test since the alternative hypothesis states that the parameter is greater than the hypothesized value
It is hypothesized that 50% of Americans attend church regularly. Which of the following would be an example of making a Type I Error?
Concluding that the proportion of Americans who attend church regularly is not 50% (rejecting H0: p = 0.5) when in fact it is 50%.
Suppose the equation of a least-squares regression line is ŷ = -3.17 - 2.4x. What can be said about the correlation coefficient?
It is negative, but its exact value cannot be determined from the given information.
When analyzing two quantitative variables, what is the first thing that should be done?
Make a scatterplot.
Data were collected on many different variables of a fast food chain's sandwiches several years ago. Two variables were the serving size (in ounces) of a sandwich and the number of calories in the sandwich. A hungry customer wanted to estimate the number of calories in a sandwich based on its serving size. With this in mind, which variable would go on the y-axis in the scatterplot?
Number of calories goes on the y-axis, since it is the response variable.
In regression, what is the proportion of variation in the response variable that is explained by the regression model called?
R^2
Suppose you want to know if more technical service calls are made to homes with cable television or with satellite dish television. Should you use frequencies or relative frequencies to make the comparison? Why?
Relative frequencies should be used, since the numbers of cable and satellite television users are likely different. If you make comparisons using frequencies when the group sizes differ, the results can be very misleading.
Jan performed a study and obtained a p-value of 1.24. What conclusion should Jan make?
She made an error, since it is not possible to get a p-value of 1.24; a p-value is a probability and must be between 0 and 1.
Which measure of center must be equal to an actual data value? Explain why.
Since the mode is the most frequent observation that occurs in the data set, it must be an actual value from the data set.
If a professor adds 10 points to each student's final exam score, how will it affect the distribution of final exam scores?
The center will change, but the shape and the spread will remain the same.
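Here is a quick numerical check using Python's `statistics` module and hypothetical scores. Adding a constant shifts every measure of center by that constant. The deviations from the mean stay the same, so the standard deviation is unchanged.

```python
import statistics

scores = [62, 75, 81, 90, 55]          # hypothetical final exam scores
curved = [s + 10 for s in scores]      # professor adds 10 points to each

print(statistics.mean(scores), statistics.mean(curved))    # center shifts by 10
print(statistics.stdev(scores), statistics.stdev(curved))  # spread unchanged
```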
Which of the following statements correctly describes the complement of event E?
The complement of event E is the set of outcomes which are in the sample space but not in event E.
April calculated a correlation coefficient between sex and GPA as -0.25. She said there is a weak correlation between a person's sex and their GPA. Which of the following is an appropriate comment about April's statement?
The correlation coefficient does not make sense to describe the relationship between a categorical and quantitative variable.
What is the definition of the correlation coefficient?
The correlation coefficient is a measure that describes the direction and strength of the linear relationship between two quantitative variables.
Which of the following statements is true about a normal density curve as σ increases?
The curve becomes more spread out.
Identify which of the following statements is not a requirement for a probability density function or state that they all are.
The curve must be symmetric and centered at zero.
Identify which of the following statements about the graph of a probability density function is true or state that they are both true or neither are true.
The graph must always be on or above the horizontal axis.
Which of the following statements is true about a normal density curve as σ increases?
The height of the curve decreases.
How is the best-fitting line between the points in a scatterplot defined?
The line that gives the smallest sum of the squared vertical distances between each point and the line
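The least-squares slope and intercept have closed forms: b1 = Sxy/Sxx and b0 = y-bar - b1 x-bar. The Python sketch below uses hypothetical data and a hypothetical helper name to compute them along with the correlation coefficient. It shows that r always has the same sign as the slope, which is why the line ŷ = -3.17 - 2.4x implies a negative r. In simple linear regression, R^2 equals r^2.

```python
import math

def least_squares(x, y):
    """Return (intercept, slope, r) for the least-squares line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                     # slope minimizing the sum of squared residuals
    b0 = y_bar - b1 * x_bar            # line passes through (x-bar, y-bar)
    r = sxy / math.sqrt(sxx * syy)     # same sign as the slope
    return b0, b1, r

b0, b1, r = least_squares([1, 2, 3, 4], [4, 5, 4, 2])   # hypothetical data
print(f"y-hat = {b0:.2f} + ({b1:.2f})x, r = {r:.3f}, R^2 = {r**2:.3f}")
```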
Which of the following is a property of the standard normal curve, but not necessarily a property of every normal curve?
The mean is zero and the standard deviation is one.
Identify which statement about the mean of a discrete random variable is not true or state that they are all true.
The mean must be a possible value of the random variable.
What does it mean to say that the trials in a binomial experiment are independent of each other?
The outcome of one trial does not affect the outcomes of the other trials.
Suppose x̄ = 60, H0: μ = 50, HA: μ > 50, and the p-value from a one-sample test is 0.04. What does this p-value mean?
The probability of getting a sample mean of 60 or more if the true population mean is 50 is 0.04.
When looking at a scatterplot of two quantitative variables, what do we typically look for?
The relationship between the two variables and if there are any deviations from the pattern (outliers or clusters of points, for example).
Is the average body temperature of humans really 98.6 degrees? After sampling 15,600 healthy people from around the country, researchers found a sample mean of 98.5 degrees. The p-value was 0.0001. Which of the following is true?
The results are "statistically significant" because the sample size was quite large and the p-value was quite small.
Describe the sample standard deviation in words rather than with a formula.
The sample standard deviation is the square root of the quotient of the sum of the squared deviations from the mean and (n-1).
Describe the sample variance in words rather than with a formula.
The sample variance is the sum of the squared deviations from the mean, divided by (n-1)
Identify the requirements for a discrete probability distribution.
The sum of the probabilities must equal one. Each probability must be between zero and one inclusive.
Which of the following is NOT a condition of the Analysis of Variance model?
The treatment group means must fall on a straight line.
Which of the following is not a criterion for the binomial distribution?
The trials must be dependent.
What is wrong with the following definition of the correlation coefficient? The correlation coefficient measures the strength and direction of the linear relationship between two variables.
The two variables must be quantitative.
Suppose you want to calculate the z-score for your height. How will the z-scores compare if you use your height in inches versus centimeters?
The z-scores will be the same regardless of the unit used for your height, because z-scores are unitless.
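A z-score is (x - mean)/SD, so a change of units multiplies the numerator and the denominator by the same factor, and the factor cancels. The minimal Python sketch below uses hypothetical values: a height of 70 in, a mean of 66 in, and an SD of 4 in are made up for illustration. The helper name is also illustrative.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

CM_PER_INCH = 2.54
# Hypothetical: height 70 in; population mean 66 in, SD 4 in
z_in = z_score(70, 66, 4)
z_cm = z_score(70 * CM_PER_INCH, 66 * CM_PER_INCH, 4 * CM_PER_INCH)
print(z_in, z_cm)   # equal up to floating-point rounding
```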
If someone's gross annual income has a z-score of positive 2, what can be concluded?
Their income is 2 standard deviations above the mean income.
Gina calculated a correlation coefficient between hours studied and grade point average as +0.75. Which of the following is a correct statement based on this correlation coefficient?
There is a fairly strong positive relationship between hours studied and grade point average, indicating that grade point averages tend to be higher for students who study more.
What does a correlation coefficient of 0 indicate?
There is no linear relationship between the two quantitative variables.
Which distribution shape (skewed left, skewed right, or symmetric) is most likely to result in the mean being substantially smaller than the median?
Skewed left: the long left tail pulls the mean down, so the mean is likely to be substantially smaller than the median.
Is the length of 3/4" screws different than 3/4", on average? A random sample of 3/4" screws produced the following 95% confidence interval for the mean length of 3/4" screws: (0.748,0.754) in inches. Which of the following is true?
There is not enough evidence at the 5% significance level to indicate the mean length of 3/4" screws is different than 3/4".
Researchers timed 21 subjects as they tried to complete paper-and-pencil mazes. Each subject attempted a maze both with and without the presence of a floral aroma. Subjects were randomized with respect to which trial they did first. Suppose a paired t-test is to be performed to determine whether there is evidence to indicate that the time to complete the maze is faster in scented trials compared to unscented trials, on average. The p-value from the paired t-test is 0.11. Which of the following is the most appropriate conclusion based on this p-value?
There is not sufficient evidence to indicate that the individuals complete mazes faster with a floral aroma present compared to when no floral aroma is present, on average.
Clifford likes dogs. He wondered how much dog owners spend on their dogs in a year. He hypothesized that dog owners spend more than $1000 a year on their dogs, on average. He sampled 75 dog owners in a local community and found that these 75 dog owners spent an average of $1075 on their dogs in a year. Suppose σ = $175. Assume that these 75 dog owners are representative of all dog owners in terms of amount spent on their dogs in a year. The p-value is 0.0001. What conclusion should be made?
There is strong evidence to reject the null hypothesis.
A certain marathon has had a wheelchair division since 1977. An interested fan wondered who is faster: the men's marathon winner or the women's wheelchair marathon winner, on average. A paired t-test was performed, and the p-value was found to be 0.001. Which of the following is the correct conclusion?
There is sufficient evidence to indicate that the men's running winning time and the women's wheelchair winning time each year are different, on average.
Brett is a huge sports fan. He hypothesized that half of sports fans liked football best, one-quarter liked baseball best, 15% liked basketball best, 5% liked hockey best, and the rest liked some other sport best. He surveyed 100 sports fans and asked what sport they liked best. Assuming all conditions are satisfied, which of the following tests should Brett use to test his hypothesis?
The goodness-of-fit chi-square test
Why does sample size need to be accounted for in the t-distribution?
The t-distribution changes for different sample sizes.
Explain how to find the mean of a discrete random variable.
To find the mean of a random variable, multiply each value of the random variable by its probability and then add those products.
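The two requirements for a discrete probability distribution, and the recipe for its mean, translate directly into code. In the Python sketch below, the helper names and the distribution are hypothetical. The uniform distribution on 0 to 3 also shows that the mean (1.5) need not be a possible value of the random variable.

```python
import math

def check_distribution(probs):
    """Check the two requirements for a discrete probability distribution."""
    each_ok = all(0 <= p <= 1 for p in probs)     # 0 <= P(x) <= 1
    sums_to_one = math.isclose(sum(probs), 1.0)   # sum of P(x) = 1
    return each_ok and sums_to_one

def discrete_mean(values, probs):
    """Mean (expected value): multiply each value by its probability, then add."""
    return sum(x * p for x, p in zip(values, probs))

values = [0, 1, 2, 3]               # hypothetical random variable
probs = [0.25, 0.25, 0.25, 0.25]
print(check_distribution(probs))    # True
print(discrete_mean(values, probs)) # 1.5, which is not a possible value of X
```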
True or false? A histogram and a relative frequency histogram, constructed from the same data, always have the same basic shape.
True. A relative frequency histogram will have a different scale on the y-axis but the same shape as a regular histogram.
A research organization keeps track of what citizens think is the most important problem facing the country today. They randomly sampled people in 2003 and again in 2009, using a different random sample of people in 2009 than in 2003, and asked them to choose the most important problem facing the country today from the following choices: war, economy, health care, or other. Which of the following is the correct test to use to determine if the distribution of "problem facing this country today" is different between the two years?
Use a chi-square test of homogeneity.
When is it appropriate to use the pooled two-sample t-methods?
Use the pooled two-sample t-methods when the samples come from different populations with the same, or nearly the same, standard deviations.
A graduate student wanted to estimate the average time spent studying among graduate students at her school. She randomly sampled graduate students from her school and obtained a 99% confidence interval of (17.3,22.5) hours/week. In the context of the problem, which of the following interpretations is correct?
We are 99% sure that the average amount of time spent studying among graduate students at this student's school is between 17.3 and 22.5 hours per week.
What is the mean of a probability distribution?
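The mean of a probability distribution is the expected value of the random variable: the long-run average of its values over many repetitions of the experiment.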
In a chi-square test, when would the null hypothesis be true?
When all observed counts are the same as their expected counts
When will a chi-square statistic be 0?
When all observed counts are the same as their expected counts
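The chi-square statistic sums (observed - expected)^2 / expected over the categories, so it equals 0 exactly when every observed count equals its expected count. The Python sketch below uses Brett's hypothesized proportions with n = 100. The observed counts and the helper name are hypothetical.

```python
def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Brett's hypothesized proportions: football, baseball, basketball, hockey, other
proportions = [0.50, 0.25, 0.15, 0.05, 0.05]
n = 100
expected = [n * p for p in proportions]       # 50, 25, 15, 5, 5

print(chi_square_stat(expected, expected))    # 0.0: every observed count matches
observed = [45, 30, 15, 6, 4]                 # hypothetical survey results
print(chi_square_stat(observed, expected))    # about 1.9
```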
When are conclusions said to be "statistically significant"?
When the p-value is less than a given significance level
Which measure of center (mean or median) is resistant? Explain what it means for that measure to be resistant.
The median is resistant: extreme values (outliers) do not substantially affect it, whereas they can pull the mean toward them.
Which of the following statements is not true about binomial probability distributions?
It is assumed that approximately 15% of adults in the U.S. are left-handed. Consider the probability that among 100 adults selected in the U.S., there are at least 30 who are left-handed. Given that the adults surveyed were selected without replacement, can the probability be found by using the binomial probability formula with x counting the number who are left-handed? Why or why not?
Yes, because the 100 adults represent less than 5% of the U.S. adult population, the trials can be treated as independent.
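Once the trials can be treated as independent, the probability is a sum of binomial terms. Below is a minimal Python sketch using `math.comb` (Python 3.8 or later). `binom_pmf` is a hypothetical helper.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.15
# P(X >= 30): add the probabilities of 30, 31, ..., 100 left-handers
p_at_least_30 = sum(binom_pmf(k, n, p) for k in range(30, n + 1))
print(f"{p_at_least_30:.6f}")
```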
Can a qualitative variable have values that are numeric? Why or why not?
Yes: it is possible to have numeric variables that do not count or measure anything, and, as a result, are qualitative rather than quantitative.
Elmo likes music. He wondered if listening to music while studying will improve scores on an exam. Fifty students who were to take the midterm in a week agreed to be part of a study. Half were randomly assigned to listen to classical music while studying for the exam. The other half were told not to listen to any music while studying for the exam. A hypothesis test is to be performed to determine if the average scores of those listening to music while studying for the exam were higher than those who did not listen to any music while studying for the exam. Which of the following hypothesis tests should be used?
a two-sample t-test
In 1993, the British Medical Journal published an article titled, "Is Friday the 13th Bad for Your Health?" Researchers in Britain examined how Friday the 13th affects human behavior. One question was whether people tend to stay at home more on Friday the 13th. The accompanying data give the number of cars passing Junctions 9 and 10 on the M25 motorway for consecutive Fridays (the 6th and 13th) for five different time periods. Assuming all conditions for inference are met, which test is appropriate to use to answer the researcher's question of interest?
paired t-test
The _________________ is/are the entire group of individuals or items being studied.
population
The _________________ is/are a subset of the population that is being studied.
sample
The F-statistic in a one-way Analysis of Variance problem has how many numerator degrees of freedom?
the number of groups being compared minus 1
What does the standard error of the distribution of sample means estimate?
the standard deviation of the distribution of sample means
The F-statistic in a one-way Analysis of Variance problem has how many denominator degrees of freedom?
the total sample size of all groups combined minus the number of groups being compared
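Both degrees-of-freedom rules come out of the one-way ANOVA computation. The Python sketch below uses four hypothetical groups of three, matching the four treatment groups in the plant-growth question. The F-statistic therefore has 4 - 1 = 3 numerator and 12 - 4 = 8 denominator degrees of freedom. `one_way_anova` is an illustrative helper, not a library function.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total sample size
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# hypothetical growth data for four treatment groups
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]))
```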
The probability that a randomly selected adult in a particular community is a smoker is 20%. The probability that a randomly selected adult in the community is a smoker, given that the adult earns more than $75,000 per year, is 10%. Are the events "is a smoker" and "earns more than $75,000 per year" independent? Explain.
No, because the probability of smoking is different for people who earn over $75,000 per year, the events are not independent.
A survey found that 5% of adults have not visited a dentist in the last five years. Suppose you ask 50 adults selected at random if they have visited a dentist in the last five years. Should a normal distribution be used to approximate the distribution of the random variable x that counts the number of adults who have not visited a dentist in the last five years?
No; since np = 50(0.05) = 2.5 < 5, the normal distribution should not be used.
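Here is a rule-of-thumb check in Python. The answer above uses np ≥ 5. This sketch also checks n(1 - p) ≥ 5, a common companion condition. The threshold is left as a parameter because textbooks differ on the exact criterion. The helper name is hypothetical.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 5) -> bool:
    """Rule-of-thumb check: both n*p and n*(1-p) at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(50 * 0.05, normal_approx_ok(50, 0.05))   # 2.5 False
```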
Cuckoos lay their eggs in the nests of other (host) birds. The eggs are then adopted and hatched by the host birds, but the potential host birds lay eggs of different sizes. A random sample of sparrow host eggs and wagtail host eggs was taken, and the length of the cuckoo eggs for each host was recorded. Based on the sample data, suppose a 95% confidence interval for the difference in mean lengths of cuckoo eggs (sparrow hosts - wagtail hosts) is (-0.6, -0.1) mm. Is there evidence at the 5% significance level to indicate that cuckoos do change the size of their eggs between sparrow and wagtail hosts, on average?
Yes, since 0 is not between the lower and upper bounds of the confidence interval.
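The screw and cuckoo-egg questions rely on the same duality. A two-sided test at the 5% level rejects H0 exactly when the hypothesized value falls outside the matching 95% confidence interval, assuming the interval and the test are built with the same method. A minimal Python sketch follows; the helper name is hypothetical.

```python
def ci_rejects(lower: float, upper: float, null_value: float) -> bool:
    """Two-sided test via a CI: reject H0 when null_value is outside [lower, upper]."""
    return not (lower <= null_value <= upper)

print(ci_rejects(0.748, 0.754, 0.75))   # False: 3/4 inch is inside, fail to reject
print(ci_rejects(-0.6, -0.1, 0.0))      # True: 0 is outside, reject
```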
A professor wondered if there was a difference in the proportion of students who dropped math classes between females and males. The professor randomly selected 20 math classes around campus and, for each student enrolled at the beginning of the term, recorded the student's gender and whether the student dropped the class at some point during the term. Assuming all conditions are satisfied, which of the following tests should the professor use?
two-sample z-test for proportions