Stats Midterm
What is the range of correlation?
-1 to 1
What is the correct interpretation of a correlation of -0.87?
A strong negative linear relationship
If given "and" probabilities, should you use a table or a tree?
A table. If given only marginal and conditional, use a tree
What is a confounding variable?
A variable you did not include in the study that may have had an effect on the results
The personnel department keeps records on all employees in a company. Here is the information they keep in one of their data files: Employee identification number Last name First name Middle initial Department Number of years with the company Salary Education (coded as high school, some college, or college degree) Age Which of the following combinations of variables would be appropriate to examine with a scatterplot? - Salary and First Name - Age and Salary - Education and Age - Number of years with the company and Education
Age and Salary. Scatterplots only show relationships between quantitative variables.
What is sample space?
All possible outcomes. Events are subsets of sample space.
If you add 10 to every value of a data set, which of the following will also increase by 10? - Both the median and mean will increase by 10 - The median - The mean - Neither the median nor the mean will increase by 10
Both the median and mean will increase by 10
Suppose the correlation between X = price of a gallon of gasoline and Y = price of a gallon of milk is r = .40. Then the correlation between the price of a HALF gallon of milk and the price of a HALF gallon of gas must be r = .4/2 = .20.
False. Correlation doesn't change if units change.
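A quick check of this card, as a minimal sketch: the prices below are made up for illustration and the `correlation` helper is written out by hand rather than taken from any course material. Halving both variables (gallon to half gallon) or swapping X and Y should leave r unchanged.

```python
import math
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation r for paired quantitative data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up prices per gallon, for illustration only.
gas = [3.10, 3.25, 3.40, 3.05, 3.60, 3.55]
milk = [3.80, 3.60, 4.10, 3.90, 4.00, 4.30]

r_gallon = correlation(gas, milk)
# Change of units: price per half gallon is half the price per gallon.
r_half_gallon = correlation([g / 2 for g in gas], [m / 2 for m in milk])
# Swap the roles of X and Y.
r_swapped = correlation(milk, gas)

print(r_gallon, r_half_gallon, r_swapped)
```

All three printed values should agree up to floating-point rounding, because r is computed from standardized values and treats X and Y symmetrically.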
Suppose 40% of all OSU students own a Tablet PC and an iPhone. Write this as a probability.
P(PC and iPhone) = 0.40
How do you calculate IQR?
Q3 - Q1
What are the possible design flaws of a survey?
Question Wording and Type of Survey
If you multiply data by a # < 1, what happens to standard deviation?
S decreases
What is a strong correlation? Moderate? Weak?
Strong = ±0.7, Moderate = ±0.5, Weak = ±0.3
What is the most common observational study?
Surveys
What are the three types of shape for a histogram?
Symmetric, skewed right (more data on the left) and skewed left (more data on the right)
What is the coefficient of determination?
The % of variability in y that is explained by x; R-squared. The higher R-squared is, the more of the variability in y is explained by changes in x.
How do you know if a table shows joint distributions?
Neither the rows nor the columns add up to 100% on their own; all of the cells together add up to 100%
In computer output, which value is b0, the y-intercept?
The constant coefficient. The other coefficient is the slope
What is a response variable?
The dependent variable
What is a factor?
The independent variable
A two-way table allows us to examine the relationship between categorical variables.
True
If A and B are independent, all you need is P(A) and P(B) to calculate P(A or B).
True
If there are a few very large values in a data set compared to the rest of the data, the mean will be larger than the median.
True
In thinking about the 5-number summary, the percentage of data below Q1 and above Q3 combined is the same as the percentage of data in the IQR.
True
Suppose your data represent revenues from a group of 20 stores in a retail chain across the country, and revenue is measured in millions of dollars. The first quartile of this data set would also be measured in millions of dollars.
True
True or False: If conditional distribution is the same as the marginal distribution, there is no relationship between the two.
True.
Suppose 20% of OSU students are business majors, and of the business majors, 60% have internships over the summer. Of those who are not business majors, 30% have internships over the summer. What percentage of ALL students have internships over the summer?
Use the Law of Total Probability. P(B) P(I | B) + P(NB) P(I | NB) = (0.20)(0.60) + (0.80)(0.30) = 0.36
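The arithmetic for this card can be sketched as follows. The variable names are illustrative; the three probabilities come straight from the question.

```python
# Given in the question.
p_business = 0.20                    # P(B)
p_intern_given_business = 0.60       # P(I | B)
p_intern_given_not_business = 0.30   # P(I | NB)

# Law of Total Probability: weight each conditional by the size of its group.
p_not_business = 1 - p_business
p_intern = (p_business * p_intern_given_business
            + p_not_business * p_intern_given_not_business)

print(round(p_intern, 2))  # 0.36
```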
What are confounding variables?
Variables that are present but not accounted for in the study and that can affect the results
Does standard deviation have units?
Yes, the same as the original data
How do you compute percent of difference?
(1st-2nd)/2nd = %; means whatever condition is being tested is % more likely on 1st than 2nd.
Which point is on every least square regression line?
(X bar, Y bar)
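A small sketch of why this holds, assuming the usual least squares formulas (slope = Sxy / Sxx, intercept = Y bar minus slope times X bar). The data are made up and `least_squares` is a hand-written helper for illustration.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data, for illustration only.
square_feet = [1100, 1400, 1600, 1900, 2300]
price = [42000, 53000, 58500, 70000, 83000]

slope, intercept = least_squares(square_feet, price)

# Plug X bar into the line: the prediction should equal Y bar.
predicted_at_mean = intercept + slope * mean(square_feet)
print(predicted_at_mean, mean(price))
```

Because the intercept is defined as Y bar minus slope times X bar, plugging in X bar always returns Y bar, whatever the data.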
Suppose you have 4 data sets whose scatterplots all show possible linear relationships. The four data sets have correlations of -0.10, +0.25, -0.90, and +0.80, respectively. Which of the correlations shows the strongest linear relationship?
-0.90
Suppose a school figures that 70% of adults will purchase a candy bar from a 6th grader during a fund-raiser. A sixth grader randomly selects 10 adults. What's the chance that at least one of them will buy a candy bar?
1 - (1 - 0.70)^10 = 1 - (0.30)^10. Use the complement: P(at least one) = 1 - P(none).
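The complement rule for this card, P(at least one) = 1 - P(none), can be sketched as below. It assumes the 10 adults decide independently, which the random selection in the question is meant to justify.

```python
p_buy = 0.70      # chance a single adult buys a candy bar
n_adults = 10

# P(none buy) = P(one adult does not buy) multiplied n times (independence).
p_none = (1 - p_buy) ** n_adults
p_at_least_one = 1 - p_none

print(p_at_least_one)
```

The result is extremely close to 1, since all 10 adults refusing has probability 0.30^10.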
What is the complement of neither?
At least one
What is the complement of "at most 1" when there are only two events?
Both (that is, more than 1)
Which of the following is NOT a typical way that experimenters use to avoid bias in experiments? A. Randomly assign subjects to treatments B. Make sure the researcher does not know which subject got which treatment C. Randomly select the subjects to participate in the experiment D. All of the above
C.
What are the advantages of the boxplot?
Can immediately see median and IQR and whether or not data is skewed
What are the weaknesses of boxplots?
Can't see the symmetry or modes
What is slope?
Change in y variable/1 unit change in x variable
Suppose you have the following probability distribution. What is the name of this distribution? Agree (n=100) Male = 0.45 Female = 0.55
Conditional distribution of gender given agree
What are the three types of biased sample in a survey?
Convenience, Volunteer, and Undercoverage
How do you get conditional distributions from a two way table?
Divide joint ("and") distribution by marginal distribution
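As a sketch of the division, here is a made-up joint distribution for gender and opinion (the four cell values are assumptions chosen for illustration). Dividing each "and" probability in the Agree column by the marginal P(Agree) gives the conditional distribution of gender given agree.

```python
# Made-up joint ("and") probabilities; all four cells sum to 1.
joint = {
    ("Male", "Agree"): 0.18,
    ("Male", "Disagree"): 0.32,
    ("Female", "Agree"): 0.22,
    ("Female", "Disagree"): 0.28,
}

# Marginal probability of Agree: add up the Agree column.
p_agree = sum(p for (gender, opinion), p in joint.items() if opinion == "Agree")

# Conditional distribution of gender given Agree: joint / marginal.
gender_given_agree = {
    gender: p / p_agree
    for (gender, opinion), p in joint.items()
    if opinion == "Agree"
}

print(gender_given_agree)
```

A conditional distribution always adds up to 1 (100%), which is a quick way to tell it apart from a column of joint probabilities.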
What is the difference between observational studies and experiments?
Experiments give treatments, observational studies do not
A conditional distribution summarizes the information from one variable ONLY, without considering ANY information from another variable.
False
A researcher is trying to determine the January temperature in regions of the United States using the degrees of latitude. After collecting data, she creates a scatterplot. Given the relationship the researcher is trying to predict, the latitude is the dependent variable and the temperature is the independent variable.
False
Your boss gives you the following regression equation. Selling price = $5,240 + $33.80 (Number of Square Feet). What is the correct interpretation of the slope of this equation?
For every additional square foot, we expect a home's selling price to increase by $33.80.
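To see the slope interpretation in numbers, here is a minimal sketch using the equation from the question. The function name `predicted_price` and the 1,500 square foot example are illustrative choices, not part of the original problem.

```python
def predicted_price(square_feet):
    """Selling price = $5,240 + $33.80 * (number of square feet)."""
    return 5240 + 33.80 * square_feet


# Compare two homes that differ by exactly one square foot.
increase = predicted_price(1501) - predicted_price(1500)
print(round(increase, 2))  # 33.8
```

The difference equals the slope no matter which starting size is used, as long as it stays within the range of the data.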
Which of the following summary measures can be directly calculated from a boxplot? - IQR - Standard deviation - Mean - Sample size
IQR
What are the criteria for a good experiment?
Makes comparisons, avoids bias, and collects enough data
If data is skewed right, how does mean relate to median?
Mean is greater than (>) median
If data is skewed left, how does mean relate to median?
Mean is less than (<) median
What is the 5 number summary for boxplots?
Minimum, Q1, median (Q2), Q3, and Maximum
What is the complement of "at most 1"?
More than 1
Can standard deviation be negative?
Never
Do bigger boxes mean more data on boxplots?
No
Does correlation have units?
No
Does switching x and y change correlation?
No
If x is out of the range of the data, can you still make predictions?
No
Bob and Bill live in an apartment together. Bob is in the apartment 30% of the time overall. But when Bill is in the apartment, Bob is only there 10% of the time. Let Event A = "Bill is in the apartment", and let Event B = "Bob is in the apartment." Are Events A and B independent?
No. P(B) = 0.30 and P(B | A) = 0.10. They are not equal, meaning the events are dependent.
What does n stand for in the correlation equation?
Number of pairs of data
A veterinarian collects data on 100 of his patients who come in every year for their annual check-ups. After 5 years, he compares the health status of the dogs to the cats. What type of study is this?
Observational study
What are residuals?
Observed y - predicted y. If the line fits well, the residuals should show no pattern, no fanning out, no unusually large residuals (outliers in the y direction), and no influential points (outliers in the x direction)
What is Simpson's Paradox?
A comparison gives one set of results, but those results may be reversed once a 3rd variable is taken into account. The most informative data set is the one that is broken down the furthest
Suppose 70% of Facebook users have Twitter accounts. Write this as a probability.
P(Twitter | Facebook) = 0.70
Suppose 40% of OSU students have internships over the summer, and of those who have internships, 60% of them are business majors. What percentage of all OSU students are business students and have internships over the summer?
P(Business and Internship) = 0.40 * 0.60 = 0.24
If you multiply data by a # > 1, what happens to standard deviation?
S increases
If you add the same number to all data points, what happens to standard deviation?
The differences between the data will be the same, so S is the same
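The shifting and scaling cards can be checked together with a short sketch. The data set is made up for illustration, and `stdev` here is the sample standard deviation from Python's standard library.

```python
from statistics import mean, median, stdev

# Made-up data, for illustration only.
data = [2, 4, 4, 5, 7, 9, 11]

shifted = [x + 10 for x in data]   # add the same number to every value
halved = [x * 0.5 for x in data]   # multiply by a number < 1
tripled = [x * 3 for x in data]    # multiply by a number > 1

print("mean:  ", mean(data), "->", mean(shifted))
print("median:", median(data), "->", median(shifted))
print("SD original:", stdev(data))
print("SD shifted: ", stdev(shifted))   # same as original
print("SD halved:  ", stdev(halved))    # smaller
print("SD tripled: ", stdev(tripled))   # larger
```

Adding a constant moves the center (mean and median) but not the spread; multiplying by a positive constant rescales the standard deviation by that same constant.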
What is the Interquartile Range?
The distance taken up by the middle 50% of the data. If high concentration of data in the middle, IQR is small
If a data set is skewed to the left, how will the mean and median compare?
The mean will be less than the median
What do you use as the center in skewed data?
The median
How do you measure variability?
The more the data is concentrated away from the center, the more variability there is -or- the bigger the gap between the mean and median, the more variability
How do we interpret the y-intercept?
It is the predicted y when x = 0, but it can only be interpreted if there is data near x = 0 and x = 0 makes sense in context
An experimenter compares a single brand of popcorn to see how much popcorn is popped using different time settings on the same microwave. The time settings are 1.5 minutes, 2 minutes, 2.5 minutes, and 3 minutes. In this situation, what is the factor?
Time setting
What are the possible implementation flaws of a survey?
Timing, Nonresponse, and Response Bias
The mean is influenced by outliers (values that are much larger or much smaller than the rest of the data.)
True
True or False: Correlation only relates 2 quantitative variables and shows linear relationships only.
True
Can standard deviation equal zero?
Yes, when all data values are the same (so each one equals the sample mean)
If 60% of male-owned businesses are successful in their first year, and 60% of female-owned businesses are successful in their first year, are gender and having a successful business in their first year independent?
Yes. P(Successful | Male) = P(Successful | Female) = 0.60
If the conditional distribution is different from the corresponding marginal distribution in a two-way table, we know that the variables are NOT related.
False
A flat histogram contains no variability whatsoever, according to our definition.
False.
The units of r, the correlation coefficient, are the same as the X variable.
False. Correlation has no units.
If you switch X and Y the sign of the correlation changes.
False. Switching X and Y does not affect correlation
If the correlation coefficient, r, between two variables is zero, we can conclude that there is no relationship between the two variables.
False. There is no linear relationship, but there might be other kinds of relationships.
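One standard illustration of this card, sketched with made-up data: y is completely determined by x through y = x^2, yet the correlation is zero because the relationship is curved, not linear. The `correlation` helper is written by hand for the example.

```python
import math
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation r for paired quantitative data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfect, but curved, relationship: y = x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)
```

A scatterplot of these points would show a clear U shape, which is why the plot should always be checked before relying on r.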
A STAT 1430 student is interested in examining the relationship between the number of bedrooms in a home and its selling price. After downloading a valid data set from the internet, the student creates a scatterplot and calculates the correlation. The correlation value they calculate is 0.67. This implies that the selling price of a house tends to increase as the number of bedrooms increases.
True
Your boss gives you the following regression equation. Selling price = $5,240 + $33.80 (Number of Square Feet). The residuals have units of dollars.
True
Suppose the equation Y = 3.45 - 2.58 (X) represents a valid regression equation: From this information, we know that X and Y have a negative correlation.
True, because slope is negative
True or False: If conditional distributions are close/same, then there is no relationship between the two.
True. If they are different, then a relationship exists
Bob wants to do a telephone survey based on 100 people. Knowing that some people won't answer the phone, he selects a random sample of 200 names to be safe, so if someone isn't home, he can just call the next person on the list. He continues this way until he gets 100 responses. Will this sampling method create bias in Bob's data?
Yes
Is correlation affected by outliers or skewness?
Yes, because formula for r includes means and SDs which are affected.
A manager of a retail store is interested in the relationship between a person's annual income and their total purchase amount. Could he measure this relationship by finding the correlation?
Yes, because income and total purchase amount are quantitative variables.
Is standard deviation affected by outliers and skewness?
Yes, because it depends on the sample mean, which is itself affected by outliers and skewness
Outliers significantly affect the value of the median.
False
Your boss gives you the following regression equation. Selling price = $5,240 + $33.80 (Number of Square Feet). It makes sense to interpret the Y-intercept for this equation.
False
Thomas wants to know what percentage of females buy his product. Thomas knows his customers are 50% male and 50% female, and that 30% of all his customers buy his product. He collects data on people who have already bought his product, and asks their gender. He finds that 70% of the people who bought his product were female. What method would you use to answer Thomas' original question?
Bayes Rule
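Working the Bayes' Rule calculation for this card, as a minimal sketch: the three inputs come from the question, and the variable names are illustrative. The goal is P(Buy | Female), but the data give P(Female | Buy), so the conditioning has to be flipped.

```python
# Given in the question.
p_female = 0.50            # P(Female): half of all customers are female
p_buy = 0.30               # P(Buy): 30% of all customers buy the product
p_female_given_buy = 0.70  # P(Female | Buy): 70% of buyers are female

# Bayes' Rule: P(Buy | Female) = P(Female | Buy) * P(Buy) / P(Female)
p_buy_and_female = p_female_given_buy * p_buy
p_buy_given_female = p_buy_and_female / p_female

print(round(p_buy_given_female, 2))  # 0.42
```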
Boxplot A and Boxplot B are drawn on the same axes. If Boxplot A is shorter in length than boxplot B, it also has to contain less data than Boxplot B.
False
Changing the number of bins will never change the shape of a histogram.
False