Stats Final Exam
Which of the following is a good confounding variable for the relationship between the rate of ice cream consumption and the number of sunburns?
Hot temperatures.
In the presence of no small or large outliers, the 'whiskers' on a box-and-whisker plot are:
Min, Max
Group 1 Questions
...
Group 3 Questions
...
Group 2 Questions
....
In a sample of 289 New Yorkers, researchers examined the impact of income on educational outcomes. The cases are:
289 New Yorkers.
Which of the following is a possible bootstrap sample from an original sample of 3, 4, 2, 7, 9?
7, 4, 3, 2, 7
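A bootstrap sample has the same size as the original sample and is drawn from it with replacement, so values may repeat but no new values can appear. A minimal Python sketch of that check follows; the helper name is_possible_bootstrap and the second candidate sample are hypothetical, used only for illustration.

```python
def is_possible_bootstrap(candidate, original):
    """Return True if candidate could be drawn with replacement from original."""
    # Same size as the original, and every value must occur in the original.
    return len(candidate) == len(original) and set(candidate) <= set(original)

original = [3, 4, 2, 7, 9]

# 7 appears twice, which sampling with replacement allows.
print(is_possible_bootstrap([7, 4, 3, 2, 7], original))

# 8 never appears in the original sample, so this one is impossible.
print(is_possible_bootstrap([3, 4, 2, 7, 8], original))
```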
Question 1:2
A recent study* examining the link between schizophrenia and culture interviewed 60 people who had been diagnosed with schizophrenia and who heard voices in their heads. The participants were evenly split between the US, India, and Ghana, and each was interviewed to determine whether the voices were mostly negative, mostly neutral, or mostly positive. The results are shown in the table below. *Bower, B., "Hallucinated voices' attitudes vary with culture," Science News, December 10, 2014. (a) What proportion of all the participants felt that the voices are mostly negative? (b) What proportion of all US participants felt that the voices are mostly negative? (c) What proportion of non-US participants felt that the voices are mostly negative? (d) What proportion of participants hearing positive voices are from the US? (e) Does culture appear to be associated with how voices are perceived by people with schizophrenia?
Question 1:1
Does belief in one true love differ by education level? The two-way table below shows results for these two variables. A person's education is categorized as HS (high school degree or less), Some (some college), or College (college graduate or higher). (a) Find the proportion who agree that there is only one true love for each education level. Does there seem to be an association between education level and agreement with the statement? If so, in what direction? (b) What proportion of people participating in the survey have a college degree or higher? (c) What proportion of the people who disagree with the statement have a high school degree or less?
Consider the following equation: TEMP = 37.7 + 0.23 CHIRPS What is the interpretation for the coefficient on CHIRPS?
For each additional chirp, the predicted temperature increases by 0.23.
In the regression line, y-hat = a + bx, how should we interpret the value of a?
It is the predicted y value when x = 0.
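The slope and intercept interpretations can be checked numerically with the cricket equation from the earlier card, TEMP = 37.7 + 0.23 CHIRPS. This is a small sketch; the function name predict_temp and the chirp counts 100 and 101 are assumptions chosen for illustration.

```python
def predict_temp(chirps):
    """Predicted temperature from TEMP = 37.7 + 0.23 * CHIRPS."""
    return 37.7 + 0.23 * chirps

# Intercept: the predicted value when the explanatory variable is 0.
print(predict_temp(0))

# Slope: the change in the prediction for one additional chirp.
change_per_chirp = predict_temp(101) - predict_temp(100)
print(round(change_per_chirp, 2))
```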
A statistic is a number that is:
computed from data in a sample.
In the presence of small and large outliers, the 'whiskers' on a box-and-whisker plot are:
Q1 - 1.5*IQR, Q3 + 1.5*IQR
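The two cutoffs can be computed directly from the quartiles. The sketch below uses made-up data and Python's statistics.quantiles with the "inclusive" method; quartile conventions differ between tools, so Excel or a textbook may give slightly different values for the same data.

```python
import statistics

# Hypothetical data with one small and one large outlier.
data = [2, 10, 12, 13, 14, 15, 16, 18, 40]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr

# Any value beyond the cutoffs is flagged as an outlier.
outliers = [x for x in data if x < lower_cutoff or x > upper_cutoff]

print(lower_cutoff, upper_cutoff, outliers)
```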
Question 2:1
Life expectancy for all the different countries in the world ranges from a low of only 45.6 years (in Sierra Leone) to a high of 83.8 years (in Hong Kong). Life expectancies are clustered at the high end, with about half of all countries having a life expectancy between about 74 and the maximum of 83.8. A few countries, such as Sierra Leone, have a very low life expectancy. The full dataset is in AllCountries, which can be found at the following link: http://www.lock5stat.com/datapage2e.html (a) What is the shape of the distribution of life expectancies for all countries? Provide the histogram created in Excel. Remember, to produce a histogram you should create bins with a fixed width between the min and max. A good bin width here would be 5. (b) Calculate the mean and median. What does the difference between the mean and the median indicate about the data?
II Group 1
Use technology to find the regression line to predict Y from X based on the data given below: Provide the equation of the regression line and give an interpretation of the slope and intercept. Answer: Using the Excel Regression command in the Data Analysis ToolPak, we get that the intercept is 0.395 and the slope coefficient on X is 0.349. Thus, the equation of the regression line is: Ŷ = 0.395 + 0.349 × X. The slope coefficient means that for a one-unit increase in X, the predicted Y increases by 0.349. The intercept (0.395) is the predicted value of Y when X = 0.
Question
Use technology to find the regression line to predict Y from X based on the data given below: Provide the equation of the regression line and give an interpretation of the slope and intercept. Using the Excel Regression command in the Data Analysis ToolPak, we get that the intercept is 47.333 and the slope coefficient on X is 1.857. Thus, the equation of the regression line is: Ŷ = 47.333 + 1.857 × X. The slope coefficient means that for a one-unit increase in X, the predicted Y increases by 1.857. The intercept (47.333) is the predicted value of Y when X = 0.
Question 3:1
Use the StudentSurvey dataset from the link: http://www.lock5stat.com/datapage2e.html We want to examine the distribution of student SAT scores using the five-number summary and additional statistics. (Use the variable called SAT, NOT the VerbalSAT or MathSAT variables.) Statistics: Min, Q1, Median, Q3, Max, IQR, Q1 - 1.5*IQR, Q3 + 1.5*IQR. (a) Calculate the statistics above. (b) Are there any outliers? How did you determine this? (c) Create a box-and-whisker plot of the SAT variable and submit it on WileyPLUS.
One of the proper visualizations for a *single* categorical variable is:
a bar chart
When the data has a quantitative variable measured for two groups (a categorical variable), the statistic to compare across groups would be:
a difference in means
An appropriate summary statistic for a *single* categorical variable is:
a frequency table.
One of the proper visualizations for the relationship between a categorical and quantitative variable is:
a set of box plots.
One of the proper visualizations for two categorical variables is:
a side-by-side bar chart.
Generalizable results come from
a simple random sample.
An appropriate summary statistic for two categorical variables is:
a two-way table
A bootstrap sample can be generated for:
any type of statistic.
A confounding variable
affects both the explanatory and response variables.
A simple random sample is important in statistics because:
it provides generalizable results.
The least squares line is the line which:
minimizes the sum of squared residuals.
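To make "minimizes the sum of squared residuals" concrete, the sketch below fits a line to a small made-up dataset using the standard least squares formulas, then compares its sum of squared residuals with that of a slightly shifted line. The data and the shift of 0.5 are assumptions for illustration.

```python
# Hypothetical data, not taken from the exam.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

def sum_squared_residuals(a, b):
    """Sum of (y - yhat)^2 for the line yhat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

best = sum_squared_residuals(intercept, slope)
shifted = sum_squared_residuals(intercept + 0.5, slope)

# The fitted line has the smaller sum of squared residuals.
print(best < shifted)
```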
The margin of error:
reflects the precision of the sample statistic.
A parameter is a number that describes:
some aspect of a population
What is the general equation for a 95% confidence interval?
statistic ± 2*SE
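As a quick arithmetic illustration of statistic ± 2*SE, the sketch below computes the interval endpoints. The function name and the example values (a statistic of 50 with SE 1.5) are hypothetical.

```python
def confidence_interval_95(statistic, se):
    """Approximate 95% interval: statistic plus or minus 2 standard errors."""
    margin_of_error = 2 * se
    return statistic - margin_of_error, statistic + margin_of_error

low, high = confidence_interval_95(50, 1.5)
print(low, high)
```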
A bootstrap distribution is obtained by:
taking repeated random samples, with replacement, from the original sample and computing the statistic for each.
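A rough sketch of building a bootstrap distribution for the sample mean, reusing the sample 3, 4, 2, 7, 9 from the earlier card. The choice of 1000 resamples and the fixed seed are assumptions; the standard deviation of the bootstrap statistics serves as an estimate of the standard error.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

original = [3, 4, 2, 7, 9]

boot_means = []
for _ in range(1000):
    # random.choices samples with replacement; keep the original sample size.
    resample = random.choices(original, k=len(original))
    boot_means.append(statistics.mean(resample))

# Spread of the bootstrap statistics estimates the standard error.
se = statistics.stdev(boot_means)
print(f"bootstrap SE of the mean: {se:.2f}")
```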
The range is given by:
the difference between the min and the max.
The deviation in standard deviation measures:
the distance from each point to the mean.
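The range and the deviations behind the standard deviation can be worked out by hand for a small sample. The sketch below again uses 3, 4, 2, 7, 9 and assumes the sample standard deviation (dividing by n - 1), which is what statistics.stdev computes.

```python
import math
import statistics

data = [3, 4, 2, 7, 9]

# Range: difference between the max and the min.
data_range = max(data) - min(data)

# Deviations: distance from each point to the mean.
mean = statistics.mean(data)
deviations = [x - mean for x in data]

# Sample standard deviation built from the squared deviations.
sd = math.sqrt(sum(d ** 2 for d in deviations) / (len(data) - 1))

print(data_range, deviations, round(sd, 3))
```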
A sampling distribution is:
the distribution of sample statistics.
The 95 percent rule refers to: Note: SD means Standard Deviation.
the proportion of observations within two SDs of the mean.
Cases are
the participants in the sample
The standard error of a statistic, SE, is:
the standard deviation of the sample statistic.
The 20th percentile means:
the value below which 20 percent of the observations fall.
The residual for each data point is:
y - yhat