Stats Exam 1
Suppose P(A) = .2, P(B) = .4 and P(A|B) = .3. Find P(A and B).
.12
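A quick Python check of the multiplication rule, P(A and B) = P(A|B) × P(B). This is a minimal sketch, and the function name is made up for illustration. P(A) = .2 is not needed for this question.

```python
def prob_a_and_b(p_b: float, p_a_given_b: float) -> float:
    """Multiplication rule: P(A and B) = P(A|B) * P(B)."""
    return p_a_given_b * p_b


# Values from the question: P(B) = .4 and P(A|B) = .3
answer = prob_a_and_b(p_b=0.4, p_a_given_b=0.3)
print(round(answer, 2))  # 0.12
```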
Suppose P(A) = .2 and P(A|B) = .2 so A and B are independent. Find P(A|Not B).
.2
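Why the answer is still .2: if P(A|B) = P(A), then A happens at the same rate whether or not B happens. The sketch below checks this for a few values of P(B). The question does not give P(B), so those values are arbitrary assumptions, and the function name is hypothetical.

```python
def prob_a_given_not_b(p_a: float, p_a_given_b: float, p_b: float) -> float:
    """P(A | not B) = P(A and not B) / P(not B)."""
    p_a_and_b = p_a_given_b * p_b      # multiplication rule
    p_a_and_not_b = p_a - p_a_and_b    # the part of A that lies outside B
    return p_a_and_not_b / (1 - p_b)


# P(B) is not given in the question, so try several made-up values
for p_b in (0.1, 0.4, 0.7):
    print(p_b, round(prob_a_given_not_b(p_a=0.2, p_a_given_b=0.2, p_b=p_b), 4))
```

Every line prints 0.2 for P(A|Not B), no matter which P(B) is assumed.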
Suppose 80% of students wear backpacks. You randomly choose 2 students. What is the chance that exactly one of them is wearing a backpack?
.32
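One way to see where .32 comes from is to list every outcome for the two students and add up the ones with exactly one backpack. This sketch assumes the two students are chosen independently (the question implies this but does not state it).

```python
from itertools import product


def prob_exactly_one(p_backpack: float) -> float:
    """P(exactly one of two independently chosen students wears a backpack)."""
    prob = {True: p_backpack, False: 1 - p_backpack}  # True = wears a backpack
    total = 0.0
    for first, second in product([True, False], repeat=2):
        if first + second == 1:  # bools add as 0/1, so this means exactly one
            total += prob[first] * prob[second]
    return total


print(round(prob_exactly_one(0.8), 2))  # 0.32 (= .8 x .2 + .2 x .8)
```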
If r = .81, what is the value of the coefficient of determination?
.66
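The coefficient of determination is r squared. A short check, rounded to two decimal places at the end:

```python
r = 0.81
r_squared = r ** 2              # coefficient of determination
print(round(r_squared, 4))      # 0.6561
print(round(r_squared, 2))      # 0.66
```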
Suppose 40% of OSU students have internships over the summer, and of those who have internships, 60% of them are business majors. What percentage of all OSU students are business students and have internships over the summer?
24%
Suppose 20% of OSU students are business majors, and of the business majors, 60% have internships over the summer. Of those who are not business majors, 30% have internships over the summer. What percentage of ALL students have internships over the summer?
36%
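This is the law of total probability: weight each group's internship rate by the size of that group, then add. A small Python sketch (function and parameter names are only illustrative):

```python
def prob_internship(p_business: float,
                    p_intern_given_business: float,
                    p_intern_given_not_business: float) -> float:
    """Law of total probability over business / non-business majors."""
    from_business = p_business * p_intern_given_business            # .20 x .60 = .12
    from_others = (1 - p_business) * p_intern_given_not_business    # .80 x .30 = .24
    return from_business + from_others


answer = prob_internship(0.20, 0.60, 0.30)
print(f"{answer:.0%}")  # 36%
```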
A list of the values in a data set and how often each one occurs is called what?
A data distribution
Making comparisons, avoiding bias, and having enough data are the three criteria for what, according to our notes?
A good experiment
The equation of a regression line is Y = 5 + 10X, where X = hours studied and Y = exam score. How do you interpret the slope?
As study time increases by 1 hour exam score increases by 10 points.
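One way to check the slope interpretation is to plug in two X values that differ by 1 hour and compare the predicted scores. A minimal sketch (the function name is hypothetical):

```python
def predicted_score(hours_studied: float) -> float:
    """Regression line from the question: Y = 5 + 10X."""
    return 5 + 10 * hours_studied


# Any two X values one hour apart differ by the slope, 10 points
print(predicted_score(3) - predicted_score(2))  # 10
print(predicted_score(0))  # 5, the Y-intercept (predicted score with 0 hours studied)
```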
If you add 10 to every value of a data set, which of the following will also increase by 10?
Both the median and mean will increase by 10.
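A quick check with a small made-up data set (the numbers are arbitrary): adding 10 to every value shifts the mean and the median by 10, while spread measures such as the standard deviation stay the same.

```python
import statistics

data = [2, 4, 4, 5, 9]             # made-up example data
shifted = [x + 10 for x in data]   # add 10 to every value

print(statistics.mean(data), statistics.mean(shifted))      # mean goes up by 10
print(statistics.median(data), statistics.median(shifted))  # median goes up by 10
print(statistics.stdev(data), statistics.stdev(shifted))    # spread is unchanged
```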
What type of variable was not accounted for in an experiment but affected the results?
Confounding variable
Going to the Oval and asking OSU students for their opinion on tuition is what type of sample?
Convenience sample
A researcher is trying to determine the January temperature in regions of the United States using the degrees of latitude. After collecting data, she creates a scatterplot. Given the relationship the researcher is trying to predict, the latitude is the dependent variable and the temperature is the independent variable. (T/F)
False
If you switch X and Y the sign of the correlation changes. (T/F)
False
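Correlation treats X and Y symmetrically, so swapping them leaves r unchanged. A sketch that computes r from its definition on made-up data (the helper name is hypothetical):

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]    # made-up data
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y), pearson_r(y, x))  # the same value both ways
```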
Outliers significantly affect the value of the median. (T/F)
False
Simpson's paradox always happens when you add a third variable to your two-way table. (T/F)
False
The sample size can be found by looking at the data in a boxplot. (T/F)
False
The units of r, the correlation coefficient, are the same as the X variable. (T/F)
False
There are different amounts of data in each section of a boxplot. (T/F)
False
You can never interpret the Y-intercept of the regression line. (T/F)
False
Residuals are found by taking what?
Observed minus predicted (residual = observed Y - predicted Y)
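For example, take the regression line Y = 5 + 10X (X = hours studied, Y = exam score) and suppose, hypothetically, that a student studied 3 hours and scored 30. The sketch computes the residual as observed minus predicted:

```python
def residual(observed: float, predicted: float) -> float:
    """Residual = observed - predicted."""
    return observed - predicted


hours, score = 3, 30               # hypothetical student
predicted = 5 + 10 * hours         # regression line Y = 5 + 10X gives 35
print(residual(score, predicted))  # -5: negative, so the point lies below the line
```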
If you are comparing two conditional distributions (for example, Opinion given Male compared to Opinion given Female) and the results are the same, what do you conclude?
Opinion and gender are independent.
Which of the following is the same as P(A or B)?
Both P(A or B or both) and P(at least one)
How do we use the correlation and the coefficient of determination (R-squared)?
R-squared measures any kind of relationship; correlation only measures linear relationships.
What type of bias is minimized by making a survey confidential and/or anonymous?
Response bias
Which measure of variability measures the concentration of the data around the mean?
Standard deviation
When a difference found in the results is larger than what we think is due to chance, what do we call the results?
Statistically Significant
What type of sample compares subgroups within the population?
Stratified random sample
What does SSE stand for?
Sum of Squares for Error
Which of the following summary measures can be directly calculated from a boxplot?
The IQR
Which statistic measures the distance between the middle 50% of the data?
The IQR
The variable whose effects you want to study in an experiment is called what?
The factor
If a data set is skewed to the left, how will the mean and median compare?
The mean will be less than the median.
Which measure of center splits the ordered data in half?
The median
OSU wanted to research how much money students spent on textbooks each semester. From a random sample of 200 students, they found that the average amount spent on textbooks for a semester is $300 and the distribution is skewed right. This indicates that:
The median amount spent on textbooks would be less than $300.
If you are comparing two conditional distributions (for example, Purchase given Saw the Ad compared to Purchase given Didn't See the Ad) and the results are the same, what do you conclude?
There is no relationship between the two variables.
A two-way table allows us to examine the relationship between categorical variables. (T/F)
True
Correlation is affected by outliers and skewness. (T/F)
True
If A and B are independent, all you need is P(A) and P(B) to calculate P(A or B). (T/F)
True
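For independent events, P(A and B) = P(A) × P(B), so the addition rule P(A or B) = P(A) + P(B) - P(A and B) needs only P(A) and P(B). A sketch with made-up probabilities (the function name is hypothetical):

```python
def prob_a_or_b_independent(p_a: float, p_b: float) -> float:
    """Addition rule using P(A and B) = P(A) * P(B); valid only if A and B are independent."""
    return p_a + p_b - p_a * p_b


print(round(prob_a_or_b_independent(0.2, 0.4), 2))  # 0.52
```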
If there are a few very large values in a data set compared to the rest of the data, the mean will be larger than the median. (T/F)
True
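A quick illustration with made-up numbers: two very large values pull the mean far above the median, while the median barely notices them.

```python
import statistics

data = [20, 22, 25, 27, 30, 200, 250]  # made-up data with two very large values
print(statistics.mean(data))    # pulled up by 200 and 250
print(statistics.median(data))  # the middle value, 27
```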
If you switch X and Y the slope and Y-intercept of the regression line will change. (T/F)
True
Suppose your data represent revenues from a group of 20 stores in a retail chain across the country, and revenue is measured in millions of dollars. The first quartile of this data set would also be measured in millions of dollars. (T/F)
True
The mean is influenced by outliers (values that are much larger or much smaller than the rest of the data). (T/F)
True
Undercoverage happens during what stage of the sampling process?
When the sample is being selected.
A manager of a retail store is interested in the relationship between a person's annual income and their total purchase amount. Could he measure this relationship by finding the correlation?
Yes, because income and total purchase amount are quantitative variables.
Suppose you give 10 people a taste test where they each try samples of two different brands of soda. You randomize the order in which the soda samples are given to the participants. After drinking both samples, they tell you which soda they liked best. What is a/the factor in this experiment?
Brand of Soda
A confidential survey is one in which they cannot link you to your data even if they wanted to. (T/F)
False
A flat histogram contains no variability whatsoever, according to our definition. (T/F)
False
An anonymous survey is one in which they can link you to your data but they promise that they won't do so. (T/F)
False
An influential point is defined to be a point with a large residual. (T/F)
False
Boxplot A and Boxplot B are drawn on the same axes. If Boxplot A is shorter in length than Boxplot B, it also has to contain less data than Boxplot B. (T/F)
False
Changing the number of bins will never change the shape of a histogram. (T/F)
False
For A and B to be independent, we need P(A|B) to equal P(A^c |B). (T/F)
False
IQR is affected by outliers and skewness. (T/F)
False
If a point has a negative residual that means the point lies above the line. (T/F)
False
If the coefficient of determination is .81, the value of the correlation must be .9. (T/F)
False
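Why the statement is false: .81 has two square roots, so r could be +.9 or -.9. A quick check:

```python
import math

r_squared = 0.81
r_positive = math.sqrt(r_squared)
print(round(r_positive, 2), round(-r_positive, 2))  # 0.9 -0.9
# Both values square to .81, so R-squared alone cannot tell you the direction.
```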
If the conditional distribution is different from the corresponding marginal distribution in a two-way table, we know that the variables are NOT related. (T/F)
False
If the correlation coefficient, r, between two variables is 0, we can conclude that there is no relationship between the two variables. (T/F)
False
Suppose you have events A and B. Which of the following is the same as the probability of at least one?
P(A or B)
Suppose 70% of Facebook users have Twitter accounts. Write this as a probability.
P(Twitter | Facebook) = 0.70
Suppose you have 4 data sets whose scatterplots all show possible linear relationships. The four data sets have correlations of -0.10, +0.25, -0.90, and +0.80, respectively. Which of the correlations shows the strongest linear relationship?
-.90
The variable that represents the outcome being measured in an experiment is called what?
Dependent Variable
What does it mean for a sample to be truly random, according to our notes?
Every sample of the same size has the same chance of being selected.
A conditional distribution summarizes the information from one variable ONLY, without considering ANY information from another variable. (T/F)
False
Which type of distribution shows the overall percentage in each of the 4 cells of a two-way table?
Joint Distribution
What type of relationship does data with a correlation of -.5 have?
Moderate downhill linear relationship
Which of the following is the complement of "at most 1"?
More than 1
Bob puts an ad in the school newspaper asking people to go to a certain website and take a survey. What type of sample will Bob get?
A self-selected sample.
What do we call the four cells inside of a two-way table of probabilities?
Joint Probabilities
In a histogram, the horizontal (X) axis is the variable you are measuring, and the vertical (Y) axis is the number or percentage of individuals in each group. (T/F)
True
In thinking about the 5-number summary, the percentage of data below Q1 and above Q3 combined is the same as the percentage of data in the IQR. (T/F)
True
The difference between an experiment and an observational study is an experiment randomly assigns subjects to treatments. (T/F)
True