Stat 1430 Midterm OSU
Which type of probabilities are in each of the 4 cells of a two-way table?
"And" probabilities
Interquartile Range (IQR)
- Calculated by finding the first and third quartiles - Q3-Q1 - Not as affected by outliers
Correlation
- Denoted by "r" - Can only be between -1 and 1 - Gives strength and direction of LINEAR relationship - Doesn't change if you switch X and Y, but slope does - Coefficient of determination is r squared
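As a rough illustration of the correlation card above, the sketch below computes r by hand and checks that swapping X and Y leaves r unchanged while the slope changes. The data (`hours`, `scores`) and the helper names `pearson_r` and `slope` are made up for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r_xy = pearson_r(hours, scores)    # X = hours, Y = scores
r_yx = pearson_r(scores, hours)    # X and Y switched
slope_xy = slope(hours, scores)    # points gained per extra hour
slope_yx = slope(scores, hours)    # hours per extra point (a different number)
r_squared = r_xy ** 2              # coefficient of determination

print(r_xy, r_yx, slope_xy, slope_yx, r_squared)
```

For a least-squares line the two slopes multiply to r squared, which is one way to see that the slope depends on which variable is X but r does not.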
Standard Deviation
- A measure of spread: roughly the typical distance of the data points from the mean - Susceptible to outliers
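A small sketch comparing the IQR and standard deviation cards above: replacing one value with an outlier leaves the IQR alone but inflates the standard deviation. The data are invented, and the `inclusive` quartile method is just one of several conventions (a course textbook may compute quartiles slightly differently).

```python
import statistics

def iqr(data):
    """IQR = Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data sets: identical except for the last value
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

iqr_clean = iqr(clean)
iqr_outlier = iqr(with_outlier)
sd_clean = statistics.stdev(clean)
sd_outlier = statistics.stdev(with_outlier)

print(iqr_clean, iqr_outlier)   # the IQR does not move
print(sd_clean, sd_outlier)     # the standard deviation grows a lot
```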
The AND Probability
- Notation = P(A and B) - The probability that both events A and B occur (at the same time) - Part = Those with both characteristics A and B - Whole = Everyone
The Marginal Probability
- Notation = P(A) - The probability of event 'A' out of everything is ____ - Part = Those with characteristic 'A' - Whole = Everybody (Bottom right of two-way table)
The Conditional Probability
- Notation = P(A|B) - The probability of event 'A' given that event 'B' has occurred - Part = Those with characteristic 'B' who also have characteristic 'A' (both A and B) - Whole = Those with characteristic 'B'
Two-Way Tables
- One variable is displayed across the rows, one variable is displayed down the columns (Categorical variables) - Each variable can be displayed using a single bar/pie chart, called the marginal distribution of ________ - The outside of the table holds the marginal totals - The inside of the table holds the AND totals
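To make the part/whole idea concrete, here is a minimal sketch that pulls an AND, a marginal, and a conditional probability out of one two-way table. The counts and category names are hypothetical, chosen only so the arithmetic is easy to follow.

```python
# Hypothetical two-way table of counts: (gender, opinion) -> count
counts = {
    ("male", "yes"): 30,
    ("male", "no"): 20,
    ("female", "yes"): 36,
    ("female", "no"): 14,
}

# Grand total (bottom right corner of the table)
total = sum(counts.values())

# Marginal totals (outside of the table)
male_total = sum(c for (gender, _), c in counts.items() if gender == "male")
yes_total = sum(c for (_, opinion), c in counts.items() if opinion == "yes")

# AND probability: part = male and yes, whole = everyone
p_male_and_yes = counts[("male", "yes")] / total

# Marginal probability: part = all males, whole = everyone
p_male = male_total / total

# Conditional probability: part = male and yes, whole = males only
p_yes_given_male = counts[("male", "yes")] / male_total

print(p_male_and_yes, p_male, p_yes_given_male)
```

The only thing that changes between the AND and the conditional probability here is the denominator (the "whole").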
The Median
- Taking the middle number - Not as affected by outliers
The Mean (Average)
- The sum of all of the observations divided by how many you had - Susceptible to outliers
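A quick sketch of why the mean is called susceptible to outliers while the median is not: changing one value to an extreme number drags the mean but leaves the middle value in place. The numbers are invented for illustration.

```python
import statistics

# Hypothetical data sets: identical except for the largest value
clean = [30, 32, 35, 38, 40]
with_outlier = [30, 32, 35, 38, 400]

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

print(mean_clean, mean_outlier)      # the mean jumps
print(median_clean, median_outlier)  # the median stays put
```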
If you add the same value to every single number in a data set, the standard deviation changes by the same value. (T/F)
False
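The answer above can be checked with a short sketch: adding the same constant to every value shifts the mean by that constant but leaves the standard deviation unchanged, because every distance from the mean stays the same. The data and the shift of 10 are arbitrary choices for this example.

```python
import statistics

# Hypothetical data set and the same data with 10 added to every value
data = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [x + 10 for x in data]

mean_change = statistics.mean(shifted) - statistics.mean(data)
sd_change = statistics.stdev(shifted) - statistics.stdev(data)

print(mean_change)  # the mean moves by the constant
print(sd_change)    # the standard deviation does not move
```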
The sample size can be found by looking at the data in a boxplot. (T/F)
False
The timing of a question CANNOT affect the results. (T/F)
False
If the coefficient of determination is .81, the value of the correlation must be .9 (T/F)
False (it could be negative)
What is a residual?
How far off each data point is from our regression line
Complements of Events
If P(A) is the chance that event A occurs, we say that P(A^c) is the chance that event A does not occur - P(A^c) = 1-P(A)
What is nonresponse bias?
Individuals chosen for the sample do not reply
What is an independent variable?
A.K.A. the explanatory variable, it is what is set or controlled in the trial, like treatment 'A' vs treatment 'B'
Which type of distribution shows the overall percentage in each of the 4 cells of a two-way table?
Joint distribution
What type of relationship does data with a correlation of -.5 have?
Moderate downhill linear relationship
The equation of a regression line is y=20+5x, where x=hours studied, and y=exam score. Study time ranged from 8-15 hours. Should we interpret the y-intercept?
No (x=0 is far outside the observed range of 8-15 hours, so interpreting it would be extrapolation)
Are disjoint events independent?
No, if we know event "A" occurred, we know that event "B" could not have occurred as well
What is a confounding variable?
Not recorded in the trial, but could affect the outcome
Residuals are found by taking
Observed - Predicted
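As a worked sketch of Observed - Predicted, the code below reuses the regression line y=20+5x from the exam-score card. The observed scores are hypothetical, picked to fall inside the 8-15 hour range.

```python
def predict(hours):
    """Predicted exam score from the line y = 20 + 5x."""
    return 20 + 5 * hours

# Hypothetical observations: hours studied and the scores actually earned
hours_studied = [8, 10, 12, 15]
observed = [62, 68, 83, 93]

predicted = [predict(x) for x in hours_studied]

# Residual = observed - predicted (positive means the point is above the line)
residuals = [obs - pred for obs, pred in zip(observed, predicted)]

print(predicted)
print(residuals)
```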
If you are comparing two conditional distributions, for example, Opinion given male vs Opinion given female, and the results are the same, what can you conclude?
Opinion and gender are independent
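One way to sketch this check in code: compute the conditional distribution of opinion for each gender and compare. The counts below are hypothetical and were chosen so the two conditional distributions match exactly; real data would rarely agree this cleanly.

```python
import math

# Hypothetical counts: gender -> {opinion: count}
table = {
    "male": {"yes": 30, "no": 20},
    "female": {"yes": 60, "no": 40},
}

def conditional_distribution(row):
    """Distribution of opinion given one gender (whole = that row's total)."""
    row_total = sum(row.values())
    return {opinion: count / row_total for opinion, count in row.items()}

given_male = conditional_distribution(table["male"])
given_female = conditional_distribution(table["female"])

# Marginal P(yes): whole = everyone in the table
grand_total = sum(sum(row.values()) for row in table.values())
p_yes = sum(row["yes"] for row in table.values()) / grand_total

# Same conditional distributions, so knowing gender tells us nothing new
same = all(
    math.isclose(given_male[opinion], given_female[opinion])
    for opinion in given_male
)

print(given_male, given_female, p_yes, same)
```

When the conditional distributions match, each one also equals the marginal distribution, which is the P(A|B) = P(A) form of independence.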
How to calculate AND probabilities
P(A and B) = P(A) * P(B|A) or P(B) * P(A|B)
A and A^c are disjoint events. (T/F)
True
A confounding variable can cause the results of a two-way table to reverse when it is added to the data set. (T/F)
True
What are Disjoint Events?
Two events are called disjoint events if they cannot occur at the same time
What type of relationship does data with a correlation of -.3 have?
Weak downhill linear relationship
Conditional Probability Example
What is the probability that an OSU athlete is on the football team? - P(football | athlete)= # of players/# of athletes - P(F|A) = 85/1000 = 0.085 --> 8.5%
Marginal Probability Example
What is the probability that an OSU student is on the football team? - P(football)= # of players/# of students - P(football)= 85/50,000 = 0.0017 --> 0.17%
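The two football examples can be tied together with the multiplication rule P(A and B) = P(A) * P(B|A). This sketch uses the same round numbers as the cards (85 players, 1,000 athletes, 50,000 students) and assumes every football player is counted as an athlete, so "football and athlete" is the same group as "football".

```python
import math

# Counts taken from the example cards
football_players = 85
athletes = 1_000
students = 50_000

# Marginal: part = athletes, whole = all students
p_athlete = athletes / students

# Conditional: part = football players, whole = athletes
p_football_given_athlete = football_players / athletes

# Multiplication rule: P(athlete and football) = P(athlete) * P(football | athlete)
p_football_and_athlete = p_athlete * p_football_given_athlete

# Assumption: all football players are athletes, so this should match P(football)
p_football = football_players / students

print(p_football_and_athlete, p_football)
```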
IQR is affected by outliers. (T/F)
False
If the correlation is 0 you know there is no relationship between X and Y. (T/F)
False (correlation only measures LINEAR relationships, so there could still be a nonlinear one)
If r=-.7 what is the value of the coefficient of determination?
.49, it is r squared
If r=.81 what is the value of the coefficient of determination?
About .66, just .81 squared (.6561)
Making comparisons, avoiding bias, and having enough data are the three criteria for what?
A good experiment
What is a categorical variable?
A measurement that must be placed in a "group", like color or occupation
Why are residuals important?
A pattern in our residuals indicates that some other factor may be at work (A nonlinear relationship of some kind)
What is a volunteer sample?
A.K.A. a self-selected sample, people choose for themselves whether or not to answer the survey
What is a dependent variable?
A.K.A. the response variable, it is what is being recorded or the outcome of the trial
What is a quantitative variable?
Any kind of measurement that lies on the number line, like height, speed, weight
Going to the oval and asking OSU students for their opinion on tuition is what type of sample?
Convenience sample
What is a stratified random sample?
Divide populations into subgroups of interest and randomly sample from the groups (like certain seating sections in a football stadium)
What is a convenience sample?
Does not employ any randomization strategy, basically "Asking the first 20 people you see"
What is a simple random sample?
Every person in the population has the same chance of being selected (like a random # generator)
What does it mean for a sample to be truly random?
Every sample of the same size has the same chance of being selected
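The sampling cards above can be sketched with Python's standard `random` module. The population of ID numbers, the two strata names, and the seed are all made up for this example; `random.sample` draws without replacement, so every group of the requested size is equally likely.

```python
import random

rng = random.Random(1430)  # arbitrary fixed seed so the sketch is repeatable

# Hypothetical population: 100 student ID numbers
population = list(range(1, 101))

# Simple random sample: 10 distinct people drawn from the whole population
srs = rng.sample(population, k=10)

# Stratified random sample: split into subgroups of interest,
# then draw a random sample inside each subgroup
strata = {
    "section_A": population[:50],
    "section_B": population[50:],
}
stratified = {
    name: rng.sample(members, k=5) for name, members in strata.items()
}

print(srs)
print(stratified)
```

A convenience sample, by contrast, would be something like `population[:10]`: the first ten people on the list, with no randomization at all.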
A confidential survey is one in which they cannot link you to your data. (T/F)
False (that describes an anonymous survey)
An anonymous survey is one in which they can link you to your data but they promise they won't do so. (T/F)
False (that describes a confidential survey)
What is response bias?
People may have a motive to respond to the survey inaccurately
What type of bias is minimized by making a survey confidential and/or anonymous?
Response bias
What is undercoverage bias?
Some groups in the population are left out of the process of choosing people (When a group of people are inadequately represented)
When a difference found in the results is larger than what we think is due to chance, what do we call the result?
Statistically significant
SSE is equal to what?
The Sum of Squares for Error: the sum of the squared residuals, which can be calculated for ANY line going through the data.
The variable whose effects you want to study in an experiment is called the what?
The factor