STAT 1430
Types of Probability
"AND" = joint probability: A and B both occur. "OR" = probability that A or B (or both) occurs. Marginal = probability of a single event. Conditional = probability of A given that B has occurred.
Which type of probability is in each of the 4 cells of a two-way table of probabilities?
"And" probabilities
Box Plots notes
- Can't tell what the sample size is - Bigger boxes DO NOT mean more data - Boxplots can be horizontal or vertical - You CANNOT see the mean - There is always 25% of the data in each section of a boxplot; what differs is how concentrated the data are within each section - Strengths: compares data sets, shows skewed vs. symmetric shapes, easy to determine center and variability - Weakness: can't tell the exact shape
P(D|M) P(M|D)
- P(D|M): out of males, who are Democrats - P(M|D): out of Democrats, who are male - P(A|B) is the probability of A given B has occurred: A is what we want to know, B is what we know
standard deviation properties
- Same units as the original data - Never negative - Can equal zero - Is affected by outliers and skewness - Multiplying all values by the same number changes the standard deviation - Adding the same number to all values does not
If you switch X and Y, which of the following will change?
- The slope of the regression line - The Y-intercept of the regression line
Suppose you are using cereal price to predict milk price; they have a correlation of -.70. The average cereal price is $3.00 with standard deviation $0.50 and the average milk price (gallon) is $2.50 with standard deviation $0.25. What is the slope of the regression line?
-.35
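The slope above comes from the standard relationship slope = r x (SD of Y / SD of X). A minimal Python sketch of that arithmetic follows; the function names are illustrative, and the intercept step (mean of Y minus slope times mean of X) is an extra that the card does not ask for.

```python
def regression_slope(r, sd_x, sd_y):
    """Slope of the least-squares line: b = r * (s_y / s_x)."""
    return r * sd_y / sd_x


def regression_intercept(mean_x, mean_y, slope):
    """Intercept of the least-squares line: a = y_bar - b * x_bar."""
    return mean_y - slope * mean_x


# X = cereal price, Y = milk price (values from the card above)
slope = regression_slope(r=-0.70, sd_x=0.50, sd_y=0.25)
intercept = regression_intercept(mean_x=3.00, mean_y=2.50, slope=slope)

print(round(slope, 2))      # -0.35
print(round(intercept, 2))  # 3.55
```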
A business has 3 branches, A, B, and C. Branch A gets 20% of the business, Branch B gets 50%, and Branch C gets 30%. We know the following information: Branch A: chance of running out of single dollars in a day is .15 Branch B: chance of running out of single dollars in a day is .05 Branch C: chance of running out of single dollars in a day is .10. What is the chance that you go to Branch A and they will have run out of single dollars? Choose the closest answer
.03 .20x.15=.03
A business has 3 branches, A, B, and C. Branch A gets 20% of the business, Branch B gets 50%, and Branch C gets 30%. We know the following information: Branch A: chance of running out of single dollars in a day is .15 Branch B: chance of running out of single dollars in a day is .05 Branch C: chance of running out of single dollars in a day is .10. What is the chance that you go to any branch of this business and they will have run out of single dollars? Choose the closest answer
.10 (.20x.15)+(.50x.05)+(.30x.10)=.03+.025+.03=.085, so .10 is the closest answer
A business has 3 branches, A, B, and C. Branch A gets 20% of the business, Branch B gets 50%, and Branch C gets 30%. We know the following information: Branch A: chance of running out of single dollars in a day is .15 Branch B: chance of running out of single dollars in a day is .05 Branch C: chance of running out of single dollars in a day is .1 Suppose a Branch has run out of single dollars and you get the phone call. Is it most likely to be Branch A, Branch B, or Branch C?
A or C
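The three branch questions above use the multiplication rule, the Law of Total Probability, and Bayes Rule in turn. A small Python sketch, with variable names chosen only for illustration, works through all three with the numbers from the cards.

```python
# P(branch): share of the business each branch gets
share = {"A": 0.20, "B": 0.50, "C": 0.30}
# P(run out | branch): chance each branch runs out of single dollars in a day
p_out_given_branch = {"A": 0.15, "B": 0.05, "C": 0.10}

# Multiplication rule: P(branch and run out) = P(branch) * P(run out | branch)
joint = {b: share[b] * p_out_given_branch[b] for b in share}

# Law of Total Probability: add the joint probabilities over all branches
p_out = sum(joint.values())

# Bayes Rule: P(branch | run out) = P(branch and run out) / P(run out)
posterior = {b: joint[b] / p_out for b in share}

print({b: round(p, 3) for b, p in joint.items()})      # A: 0.03, B: 0.025, C: 0.03
print(round(p_out, 3))                                 # 0.085
print({b: round(p, 3) for b, p in posterior.items()})  # A and C tie at about 0.353
```

A and C tie because their joint probabilities are equal (.03 each), which is why the answer is "A or C".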
Standard deviation has no units.
False
Suppose the correlation between X and Y is .3. If you double all the X values and double all the Y values, the correlation between 2X and 2Y is .6.
False
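To see why doubling does not change the correlation, here is a short Python sketch that computes Pearson's r by hand. The data values are made up for illustration and are not from the course.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, used only to demonstrate the property
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

r_original = pearson_r(xs, ys)
r_doubled = pearson_r([2 * x for x in xs], [2 * y for y in ys])

# Both values print the same: rescaling X and Y leaves r unchanged
print(r_original, r_doubled)
```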
There can be different amounts of data in each section of a boxplot
False
Correlation measures the strength and direction of any relationship between X and Y.
False - Correlation measures the strength and direction of a LINEAR relationship between two quantitative variables
If you add the same value to every single number in a data set, the standard deviation also changes by that same value.
False. Adding the same value to every number shifts the data but leaves the standard deviation unchanged.
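A quick check of the shifting and scaling properties of the standard deviation, sketched with Python's standard `statistics` module. The data set is made up for illustration.

```python
from statistics import stdev

# Hypothetical data, used only to demonstrate the properties
data = [2, 4, 4, 6, 9]
shifted = [x + 10 for x in data]  # add the same value to every number
scaled = [x * 3 for x in data]    # multiply every number by the same value

sd_data = stdev(data)
sd_shifted = stdev(shifted)
sd_scaled = stdev(scaled)

# Adding leaves the standard deviation alone; multiplying by 3 triples it
print(sd_data, sd_shifted, sd_scaled)
```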
Undercoverage means you had a lot of nonresponse in your sample.
False. Undercoverage = a subgroup of the population is excluded from the very beginning. Example: you want OSU student opinion on tuition but take a random sample from the dorms only. Issue: the sample represents only the population that remains after the subgroup is excluded (here, all students who don't live in dorms are left out). To get all students' opinions you have to sample a different way.
The 4 cells of a two way table contain conditional probabilities.
False. They are joint probabilities.
The best fitting line has an SSE of Zero
False. The best-fitting line has the smallest SSE of any line, but SSE equals zero only if every point falls exactly on the line.
Which of the following statistics can be negative?
If data permits: Correlation Slope of the regression line Y-intercept of the regression line
If there is no relationship between two variables in a two-way table, then the two variables are said to be:
Independent
The five-number summary of a single data set of 100 numbers would be which of the following?
Min Q1 Median Q3 Max
Suppose 40% of the new employees at your company are males and 60% of the "old employees" are males. Are gender and type of employee (new/old) independent?
No
The equation of a regression line is Y = 20 + 5X where X = hours studied and Y = exam score. Study time data ranged from 8 to 15 hours. Should we interpret the Y-intercept here?
No. You should not interpret the Y-intercept in this situation.
Suppose 40% of the new employees at your company are males and 30% of the "old employees" are males. What percentage of ALL the employees are male?
Not enough information
Table
P(A and B)
P(B and A)
P(B) P(A|B), where P(A|B) = P(A and B)/P(B)
"OR" Probability/ at least one
P(A or B)= P(A)+P(B)-P(A and B)
P(A and B) (dependent events)
P(A) x P(B|A)
Law of Total Probability
P(A)=P(A and B)+P(A and not B). On a table, adding down or across is the law of total probability.
Tree
P(A)P(B|A)
Complement rule
P(A^C)=1-P(A): probability of what you want = 1 minus probability of what you do not want
Bob is a telemarketer and he makes a sale 20% of the time. Suppose he makes ten calls. What is the chance he makes AT LEAST ONE sale?
P(At least 1) = 1 - P(None)= 1 - P(No No No .... No )= 1 - P(No)^10= 1 - .8^10=.8926
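The same complement-rule calculation as a small Python sketch. It assumes, as the card does, that the calls are independent with the same chance of a sale each time; the function name is illustrative.

```python
def p_at_least_one(p_success, n_trials):
    """P(at least one success) = 1 - P(no successes) for independent trials."""
    p_none = (1 - p_success) ** n_trials
    return 1 - p_none


# Bob: 20% chance of a sale per call, ten calls
print(round(p_at_least_one(0.20, 10), 4))  # 0.8926
```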
Bayes Rule
P(A|B) = P(B|A)P(A)/P(B) We have P(B|A) but we want P(A|B)
Definition of Conditional Probability
P(B | A) = P(A and B) / P(A) probability of B given A has occurred
"AND" probability /joint distribution Example
P(F)=.6 P(M)=.4 P(Yes|F)=.25 P(Yes|M)=.30. P(F and Yes)=P(F)P(Yes|F)=.6x.25=.15. P(M and Yes)=P(M)P(Yes|M)=.4x.3=.12
A confounding variable can cause the results of a two-way table to reverse when it is added to the data set.
True
A listing of all the possible values of a data set and how often they occur is called a distribution.
True
The median is not affected by outliers
True
What it means to be disjoint
Two events, say A and B, are defined as being disjoint if the occurrence of one precludes the occurrence of the other; that is, they have no common outcome.
Quartiles
Values that divide a data set into four equal parts. Q1 = 25th percentile. Q2 = 50th percentile = median. Q3 = 75th percentile. IQR = Q3 - Q1.
P(A) = .20, P(B) = .30, P(A and B) = .06. Are A and B independent?
Yes
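The check behind this answer is whether P(A and B) equals P(A) x P(B). A minimal Python sketch follows; the function name and the tolerance are assumptions made to handle floating-point rounding.

```python
from math import isclose


def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """A and B are independent when P(A and B) = P(A) * P(B)."""
    return isclose(p_a * p_b, p_a_and_b, abs_tol=tol)


print(are_independent(0.20, 0.30, 0.06))  # True: .20 x .30 = .06
print(are_independent(0.20, 0.30, 0.10))  # False: .20 x .30 is not .10
```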
An experiment gives 3 different dosage levels of a drug to 3 groups of people. The first dosage level is a fake pill (or placebo) for comparison. We measure the blood pressure of the participants before and after the study and write down the amount by which blood pressure changed. What is the response variable?
blood pressure change
Which is best if you want to compare several data sets regarding shape, center, and variability?
boxplot
Your company operates in 4 regions and your boss numbers them 1, 2, 3 4. Is this variable quantitative or categorical?
categorical
The second level branches on a tree are marginal probabilities.
False. Marginal probabilities are on the first-level branches; the second-level branches are conditional probabilities.
Which is better to use to see the most clear pattern in the data?
histogram
If the correlation is .2 what does that tell you about using a regression line to fit your data?
it's a weak positive linear relationship, do not proceed with a regression line
Undercoverage
Occurs when some groups in the population are left out of the process of choosing the sample
self-selected sample
Members of a population volunteer to be in the sample. Ex.: you post an ad and people reach out to you if they want to participate
A correlation of -.6 is considered to be what?
Moderately strong. In absolute value: .1-.5 = weak, .7-1.0 = strong
You randomly choose 100 students from Stat 1350 to take a survey. 60 of them take the survey. What can occur with the other 40 people?
nonresponse bias
convenience sample
Only members of a population who are easy to reach are selected. Ex.: catching people at the union because it's close to where you live
response variable (dependent variable)
The result or change that occurs due to the experimental variable; what comes out of the experiment
median greater than mean
skewed left
mean greater than median
skewed right
Which of the following is in the same units as the original data?
standard deviation Q1 y-intercept of the regression line
Experiments are ___ than observational studies
stronger
A boxplot is a one-dimensional graph
true
If the median is closer to Q1 than it is to Q3 then the data is skewed right.
True. If the median were closer to Q3, the data would be skewed left.
A longer box in the boxplot means more variability in the data.
true variability = how spread out the data is
Confidentiality is ___ than anonymity
weaker
Simpson's Paradox
when averages are taken across different groups, they can appear to contradict the overall averages
IQR is affected by outliers.
False
If the correlation is 0 you know there is no relationship between X and Y.
False
Correlation is in the same units as X and Y.
False
P(A) = .3, P(B|A) = .4, P(B) = .5 What is P(A and B)?
.12 P(A and B)= P(A) P(B|A)
If A and B are independent events with P(A) = 0.20 and P(B) = 0.60, then P(A|B) is:
.20 P(A)P(B)/P(B)
P(A) = .2 and P(B) = .3. Suppose A and B are INDEPENDENT . What is P(A or B)?Choose the closest answer.
.4 1. P(A or B) = P(A)+P(B)-P(A and B) 2. Independent, so P(A and B) = P(A) x P(B) = .2x.3 = .06 3. .2+.3-.06=.44
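The addition rule can be sketched in a few lines of Python. For independent events P(A and B) is the product of the two probabilities, and for disjoint events it is 0; the function name is illustrative.

```python
def p_a_or_b(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


p_a, p_b = 0.2, 0.3

# Independent events: P(A and B) = P(A) * P(B)
p_or_independent = p_a_or_b(p_a, p_b, p_a * p_b)

# Disjoint events: P(A and B) = 0
p_or_disjoint = p_a_or_b(p_a, p_b, 0.0)

print(round(p_or_independent, 2))  # 0.44
print(round(p_or_disjoint, 2))     # 0.5
```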
If the stock went up today, what is the chance it also went up yesterday? Choose the closest answer.
.4 P(A and B) / total for A
if r = -.7, what is the value of the coefficient of determination?
.49
What is the chance that they did the SAME THING on both days? Choose the closest answer.
.5 [P(A and B) + P(Not A and Not B)] / total for everything
P(A) = .2 and P(B) = .3. Suppose A and B are DISJOINT. What is P(A or B)?Choose the closest answer.
.5 P(A or B) disjoint = P(A)+P(B)
P(A and B) disjoint
0
A company owner has 12 sales representatives and she finds a .85 correlation between years in sales and number of sales. The regression analysis is above. What is the slope of the regression line?
49.41
What % of people were employed and 25 or over? 70/100 70/200 70/150
70/200. We're looking for the % of PEOPLE, so the denominator has to be the total (200)
The third quartile is the same thing as the _____________ percentile
75th
What % of employed people were 18-25? 80/200 80/150 80/100 None of the other choices is correct
80/150. We're looking for the % of EMPLOYED people, so the denominator has to be 150
What is the conditional distribution of age for those who are employed? 80/150 and 70/150 80/100 and 20/100
80/150 and 70/150 conditional distribution= a distribution of values for one variable that exists when you specify the values of other variables.
What is statistical significance?
A result due to more than chance
A business has 3 branches, A, B, and C. Branch A gets 20% of the business, Branch B gets 50%, and Branch C gets 30%. We know the following information: Branch A: chance of running out of single dollars in a day is .15 Branch B: chance of running out of single dollars in a day is .05 Branch C: chance of running out of single dollars in a day is .10 Which Branch is most likely to run out of single dollars in a day?
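A. Its chance of running out on a given day (.15) is the highest of the three. Note that P(go to the branch and it has run out) is .03 for both A and C and .025 for B.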
What is qualitative data?
Data based on categories rather than numbers (e.g., region, political party). Height and age, by contrast, are quantitative.
marginal distribution
Distribution of values of that variable among all individuals described by the table. P(A) P(B) P(A') P(B')
How to determine if A and B are independent
Events A and B are independent if any one of these holds: - P(A and B) = P(A) · P(B) - P(A|B)=P(A) - P(A|B)=P(A| not B) - Random sample
What is the statistical definition of a random sample?
Every sample of that same size has an equal chance of being selected.
What does it mean for a sample to be truly random, according to our notes?
Every sample of the same size has the same chance of being selected.
A confidential survey is one in which they cannot link you to your data.
False
A flat histogram indicates no variability in the data.
False
Bob picks a name from the phone book using a random number generator, and then takes the first 100 names that come after that to make a sample. Is Bob's sample random?
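False. Only blocks of 100 consecutive names can be chosen, so not every sample of 100 has the same chance of being selected.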
Extrapolation is what?
Plugging in X values outside the range of the data
Units of residuals
Same as the units of Y. Residual = observed value of Y - predicted value of Y
You send out an email to all the students in Stat 1430 and you tell them to go to your website and do a survey. 100 students come forward. What kind of sample is this?
Self-selected sample
Response Bias
A systematic pattern of inaccurate answers from respondents in a sample survey
Which measure of variability measures the concentration of the data around the mean?
Standard deviation
If you want to ask the question: "How is the view from your seat?" where your population is the OSU's football stadium, what kind of sample should you use?
Stratified Random Sample
Complement rule example At most
Telemarketer with P(yes)=.10 makes 3 calls. What is the chance of getting at most 1 yes? P(no)=1-P(yes)=1-.10=.9. P(no yeses)=(.9)^3=.729 or 72.9%. P(exactly 1 yes)=3x(.1)(.9)(.9)=.243. P(at most 1 yes)=.729+.243=.972
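The "at most 1 yes" total is the sum of two binomial probabilities: P(0 yeses) plus P(exactly 1 yes). A short Python sketch using `math.comb` follows. It assumes independent calls with the same chance of a yes on each, and the function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_at_most(k_max, n, p):
    """P(at most k_max successes): add the pmf for k = 0, 1, ..., k_max."""
    return sum(binomial_pmf(k, n, p) for k in range(k_max + 1))


# Telemarketer: P(yes) = .10 on each of 3 calls
print(round(binomial_pmf(0, 3, 0.10), 3))  # 0.729
print(round(binomial_pmf(1, 3, 0.10), 3))  # 0.243
print(round(p_at_most(1, 3, 0.10), 3))     # 0.972
```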
If you are predicting gas price using temperature, which is the X variable?
Temperature
SSE is equal to what?
The Sum of Squares for Error for any line going through the data.
conditional distribution
The distribution of one variable restricted to a single row (or column) of another variable in a two way table. A conditional distribution is found by dividing the values in the row (or column) by the row (or column) total. - Pie chart
When finding the correlation if you are given R-squared, you take the square root first. Then what do you look at to determine the sign for the correlation?
The sign on the slope
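A minimal Python sketch of that two-step recipe: take the square root of R-squared, then copy the sign from the slope. The function name is illustrative, and the slope value in the example is an assumed input that only supplies the sign.

```python
from math import copysign, sqrt


def correlation_from_r_squared(r_squared, slope):
    """Recover r from R-squared; r takes the same sign as the regression slope."""
    return copysign(sqrt(r_squared), slope)


# R-squared = .49 with a negative slope gives r = -.7
print(round(correlation_from_r_squared(0.49, slope=-0.35), 2))  # -0.7
# The same R-squared with a positive slope gives r = +.7
print(round(correlation_from_r_squared(0.49, slope=2.0), 2))    # 0.7
```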