Math 180 Test 4
In regression, what is the proportion of variation in the response variable that is explained by the regression model called?
R squared
The F-statistic in a one-way Analysis of Variance problem has how many denominator degrees of freedom?
the total sample size of all groups combined minus the number of groups being compared
When analyzing two quantitative variables, what is the first thing that should be done?
make a scatterplot.
The F-statistic in a one-way Analysis of Variance problem has how many numerator degrees of freedom?
the number of groups being compared minus 1
An experiment was performed to look at the reflectivity of different paints used on roads. Four paints were used (call them Paint 1, Paint 2, Paint 3, and Paint 4). Twenty-four sections of roads with similar travel patterns and weather patterns were used. Each type of paint was randomly assigned to one of the 24 sections of road so that each paint type was used on 6 different road sections. The percent of reflectivity after 6 months was determined and recorded. How many degrees of freedom does the F-statistic have in this problem?
3 numerator and 20 denominator
Why is it important that the relationship between the explanatory and response variable be linear when performing a linear regression analysis?
A linear regression analysis relies on a straight line being fit between the points on a scatterplot.
An investigator conducts an experiment with four treatment groups. The response variable is growth of a plant during the experiment. She performs an F-test. What is the null hypothesis the researcher is testing with the F-test?
All four treatment groups have the same average growth of the plants during the experiment.
In a goodness-of-fit test, what is the expected value of each category, if we are not testing that the sample data fit a uniform distribution (we are not assuming that the outcomes are equally likely)?
E=np p= probability of each category n=sample size
An investigator conducts an experiment with four treatment groups. The response variable is growth of a plant during the experiment. She performs an F-test. What is the alternative hypothesis the researcher is testing with the F-test?
At least one treatment group has a different average growth of plants during the experiment, but not all four necessarily have different mean growths.
What's a good method for testing whether different medical treatments produce different results?
Construct a table of treatments vs. results, and then preform a test of independence
What kind of test is a Goodness-of-Fit test?
Goodness-of-fit is a right-tailed chi-square test
What is being tested in a test of independence?
If the row and column variable of a contingency table are independent (two-way frequency)
What is being tested in a Goodness-of-Fit test?
If the sample data fits a claimed distribution
Say we conduct a one-way ANOVA test and determine that the null hypothesis should be rejected. What does this signify for the means under consideration in the test?
If we reject the null hypothesis it means at least one of the populations have a different mean.
Suppose the equation of a least-squares regression line is y hat =−3.17−2.4x. What can be said about the y-intercept?
It is −3.17.
Suppose the equation of a least-squares regression line is y hat =−3.17−2.4x. What can be said about the correlation coefficient?
It is negative, but its exact value cannot be determined from the given information.
Can linear regression analysis determine if two variables are causally related? Why or why not?
NO! Linear regression analysis can only tells us if two variable are correlated
Data were collected on many different variables of a fast food chain's sandwiches several years ago. Two variables were the serving size (in ounces) of a sandwich and the number of calories in the sandwich. A hungry customer wanted to estimate the number of calories in a sandwich based on its serving size. With this in mind, which variable would go on the y-axis in the scatterplot?
Number of calories goes on the y-axis, since it is the response variable.
What kind of test is an ANOVA test?
One-way ANOVA is a right-tailed F test
What are you testing when you use oneway ANOVA?
One-way ANOVA tests if three or more populations have the same mean
What kind of test is a test of independence?
Test of independence is a right-tailed chi-square test
April calculated a correlation coefficient between sex and GPA as −0.25. She said there is a weak correlation between a person's sex and their GPA. Which of the following is an appropriate comment about April's statement?
The correlation coefficient does not make sense to describe the relationship between a categorical and quantitative variable.
What is the definition of the correlation coefficient?
The correlation coefficient is a measure that describes the direction and strength of the linear relationship between two quantitative variables.
How is the best-fitting line between the points in a scatterplot defined?
The line that gives the smallest sum of the squared vertical distances between each point and the line
Why is it called a one-way analysis of variance when you are testing means?
The name comes from the use of variances to determine if the differences among the mean is large. It's called one-way because there is only one category the observations are grouped by.
When looking at a scatterplot of two quantitative variables, what do we typically look for?
The relationship between the two variables and if there are any deviations from the pattern (outliers or clusters of points, for example).
There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser (duration) and the time between when that eruption ends and the next eruption begins (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. The least-squares regression equation is ModifyingAbove y with caretyequals=33.967plus+11.358x, where y is the interval from the end of the current eruption to the beginning of the next eruption and x is the duration of current eruption. In this equation, what is 11.358?
The slope of the least-squares regression line
Which of the following is NOT a condition of the Analysis of Variance model?
The treatment group means must fall on a straight line.
What is wrong with the following definition of the correlation coefficient? The correlation coefficient measures the strength and direction of the linear relationship between two variables.
The two variables must be quantitative.
Gina calculated a correlation coefficient between hours studied and grade point average as +0.75. Which of the following is a correct statement based on this correlation coefficient?
There is a fairly strong positive relationship between hours studied and grade point average, indicating that grade point averages tend to be higher for students who study more.
What does a correlation coefficient of 0 indicate?
There is no linear relationship between the two quantitative variables.
Given r = 0.1, what does that tell you about the graph of the paired data?
There is probably not correlation, so the scatter plot wouldn't look like a straight line pattern.
Brett is a huge sports fan. He hypothesized half of sports fans liked football the best, one-quarter liked baseball the best, 15% liked basketball the best, and 5% liked hockey the best, and the rest liked some other sport the best. He surveyed 100 sports fans and asked what sport they liked the best. Assuming all conditions are satisfied, which of the following tests should Brett use to test his hypothesis?
The goodness-of-fit chi-square test
A research organization keeps track of what citizens think is the most important problem facing the country today. They randomly sampled a number of people in 2003 and again in 2009 using a different random sample of people in 2009 than in 2003 and asked them to choose the most important problem facing the country today from the following choices, war, economy, health care, or other. Which of the following is the correct test to use to determine if the distribution of "problem facing this country today" is different between the two different years?
Use a chi-square test of homogeneity.
In a chi-square test, when would the null hypothesis be true?
When all observed counts are the same as their expected counts
When will a chi-square statistic be 0?
When all observed counts are the same as their expected counts
Exactly what is r and what does it tell us?
r is the linear correlation coefficient for a sample. It tells us how well the paired data 'fit' a straight line.
When r is close to 1, what is probably true? What about when r is close to -1? When r is close to 0?
r= 1 -> strong positive correlation so the data 'fits' a straight line. r=0 -> no correlation r= -1 -> strong negative correlation