Stats Midterm 1
Suppose the equation y = 3.45 - 2.58x represents a valid regression equation and X can be used to predict Y. From this information, we know that X and Y have _____________ correlation.
A negative
A flat histogram (with a line straight across) contains no variability whatsoever, according to our definition.
False
If variables A and B are related in a certain way in a two way table (with 2 variables), no matter how many other variables you look at in addition to these two, the relationship will always stay the same.
False
In a boxplot you can tell the exact pattern of the data set (beyond just whether the data is skewed or symmetric.)
False
Suppose the correlation between two variables X and Y is .8. That means the correlation between Y and X is -.8.
False
Suppose the correlation between yards rushing and yards passing is .6. That means the correlation between feet rushing and feet passing is .6 x 12 (since you multiply yards by 12 to convert to feet).
False
The median must be one of the numbers in the data set.
False
The standard deviation has no units.
False
You can compare two marginal distributions to see if the corresponding two variables are related.
False
What is the X variable in an experiment
Independent variable
If you add 10 to every value of a data set, what happens to the standard deviation?
It stays the same
Suppose your data represent revenues from a group of 20 stores in a retail chain across the country, and revenue is measured in millions of dollars. The standard deviation of this data set would also be measured in millions of dollars.
True
The slices on a pie chart represent relative frequencies.
True
The starting point can affect the way a graph looks.
True
Bob wants to estimate the percentage of people who own a dog in his town, and he goes to all the apartment buildings to carry out his survey. He leaves out all the houses in the town. What kind of bias is this?
Bias due to undercoverage
Which type of graph is best for COMPARING two or more quantitative data sets, a boxplot or a histogram?
Boxplot
Boxplot A and Boxplot B are drawn on the same axes. The box part of Boxplot A is shorter in length than the box part of Boxplot B. What can you tell about the two data sets?
Can't tell anything
In the hospital example in your lecture notes, you found hospital A didn't do as well as hospital B when all the data was in one two-way table, but when an additional variable was examined, hospital A was better in all cases. What was that additional (confounding) variable that ended up reversing the results?
Condition of Patient
What should the residual plot look like if the regression line fits the data well?
Random patterns, no fan shapes, points fall around y=0
Suppose: 1. all the points on a scatterplot lie perfectly on a straight line going uphill 2. the mean of X and the mean of Y are both 2 3 the standard deviations of X and Y are exactly the same. Can you find the equation of the best fitting line with this information? (Hint: Think of the '5 number' way of finding the best-fitting line.)
Yes
What kind of sample occurs when you put an ad in the newspaper and ask readers to take your survey?
A self-selected sample
What is the most common observational study?
A survey
Your boss gives you the following regression equation. Selling price = $5,240 + $33.80 (Number of Square Feet). How do you interpret the slope for this equation?
As square feet increase by 1, selling price increases by $33.80
If a residual is negative, then that data point lies _________________ the regression line.
Below
A bar graph using StatCrunch is a good way to visualize a marginal or a conditional distribution.
True
What is the standard deviation of the data set 1, 1, 1, 1?
0
If 2 corresponding conditional distributions are different from each other in a two-way table, we know that the variables are related.
True
When a difference in treatment is decided to be due to more than random chance, what do you call the results?
Statistically significant
If a data set is skewed to the left, how will the mean and median compare?
The mean will be less then the median
What does SSE stand for?
sum of squares for error
If you could choose four numbers from 1, 2, 3, 4 and repeated numbers were allowed (such as 1, 1, 3, 2), which set of four numbers would give you the largest standard deviation? (No calculations needed.
1, 1, 4, 4
A five number summary contains the min, max, Q1, Q3, and what other value?
Median
Which of the following summary measures cannot be directly calculated from a boxplot? Mean, sample size, SD
None
Which of the following can never be negative? Mean, SD, median
SD
An outlier in a data set can significantly affect the value of the mean but not the median.
True
A __________ distribution summarizes the information from one variable ONLY, without considering ANY information from another variable.
Marginal
A researcher is trying to predict the linear relationship between January revenue and yearly revenue for her company. The correlation turns out to be .60. How does she interpret this correlation?
Moderate positive linear relationship
Suppose 35% of the women in a poll of Americans support candidate A for president (and 65% do not.) These results make up what kind of distribution?
Conditional distribution of support (yes/no) given women
Mike marks down the gas mileage of his two cars every time he fills them up with gas for 6 months straight. At the end he notes that his Mustang gets better mileage than his Corvette. Is this an experiment or an observational study?
Observational Study
As we heard in lecture, the "average distance from the mean" is measured by the __________________________.
Standard Deviation
Smoker Nonsmoker Total Male 125_____ 376 Female 104 233 337 Total 229 484 713 Above is a two-way table examining the relationship between gender and whether or not a person smokes. What number belongs in the missing cell?
251
A listing of all possible values in a data set and how often they occurred is called a data _____________________.
Distribution
If you add the same positive number to every value of a data set what happens to the standard deviation?
Does not change
If the mean of a data set is large, the standard deviation has to be large also.
False
If there are a few very small values in a data set compared to the rest of the data, the mean will be larger than the median.
False
Suppose the correlation between X =price of a gallon of gasoline and Y = price of a gallon of milk is r = .30 Should we go on and try to make predictions for milk prices using gasoline prices using a straight line?
False
Your boss gives you the following regression equation. X = square feet and Y = selling price Selling price = $5,240 + $33.80 (Number of Square Feet). Does it make sense to interpret the Y-intercept for this equation?
False
Which is more affected by skewness, the IQR or standard deviation?
SD
Correlation is affected by outliers.
True
Suppose 35% of the women in a poll support candidate A for president (and 65% do not.) In the same poll 45% of the men support candidate A for president (and 55% do not). Are gender and support (yes/no) related?
True
You can have two data sets with the same mean but different standard deviations.
True
A two-way table allows us to examine the relationship between _____________________ variables.
two categorical
The personnel department keeps records on all employees in a company. Here is the information they keep in one of their data files: Employee identification number Last name First name Middle initial Department Number of years with the company Salary ($) Education Level (high school, some college, or college degree) Age (years) Which of the following combinations of variables would be appropriate to examine with a scatterplot?
Age and Salary
Suppose you have 4 data sets whose scatterplots all show possible linear relationships. The four data sets have correlations of -0.10, +0.25, -0.90, and +0.80, respectively. Which of the correlations shows the strongest linear relationship?
-0.90
A researcher is trying to use January temperatures to predict latitude. This means January temperature is the X (independent) variable and latitude is the Y (dependent) variable.
True
A two-way table has this name because it contains rows going one way, and columns going the other way.
True
Which type of graph is made from the 5-number summary?
boxplot