Stat 1430 - homework/practice midterm/old tests for midterm
true
If A and B are independent then P(A or B) becomes P(A)+P(B)-P(A)P(B).
mean < median
If a data set is skewed to the left, how will the mean and median compare?
points fall around the horizontal line Y = 0, random, no fan shapes
What should the residual plot look like if the regression line fits the data well?
The sign on the slope
When finding the correlation if you are given R-squared, you take the square root first. Then what do you look at to determine the sign for the correlation?
Whether gender of driver is related to the color being red (or not)
Which of the following questions can be answered by using a two-way table?
sample size, mean, SD
Which of the following summary measures cannot be directly calculated from a boxplot?
Bayes' Rule
Which technique should you use when you have P(B|A) and some other probabilities but you are looking for P(A|B)?
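A minimal sketch of Bayes' Rule for the card above. The function name and the numbers are hypothetical and chosen only for illustration; P(B) is built from the law of total probability.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A), and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    # Bayes' Rule: P(A|B) = P(B|A)P(A) / P(B)
    return p_b_given_a * p_a / p_b

# Hypothetical inputs (not from the course notes)
posterior = bayes_posterior(p_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(posterior, 3))
```

With a rare event A, P(A|B) can come out far smaller than P(B|A), which is why the two should not be confused.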
box plot
Which type of graph is made from the 5-number summary?
SD
Which is more affected by skewness, the IQR or standard deviation?
SD
Which of the following can never be negative?
independent
If there is no relationship between two variables, then the variables are ________.
yes
P(A) = .20, P(B) = .30, P(A and B) = .06. Are A and B independent?
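The independence check on this card can be sketched as follows: A and B are independent exactly when P(A and B) = P(A)P(B). The helper name and the tolerance are assumptions made for this example.

```python
import math

def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A) * P(B), up to rounding error."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

# Values from the card: .20 * .30 = .06, which matches P(A and B)
print(are_independent(0.20, 0.30, 0.06))
```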
can't tell
Suppose 40% of the new employees at your company are males and 30% of the "old employees" are males. What percentage of ALL the employees are male?
no
Suppose 40% of the new employees at your company are males and 60% of the "old employees" are males. Are gender and type of employee (new/old) independent?
P(have disease | test positive)=.90
Suppose 90% of the patients who test positive for a disease actually have the disease. Write this as a probability.
false
Suppose the correlation between X and Y is .3. If you double all the X values and double all the Y values, the correlation between 2X and 2Y is .6.
false
Suppose the correlation between two variables X and Y is .8. That means the correlation between Y and X is -.8.
false
Suppose the correlation between yards rushing and yards passing is .6. That means the correlation between feet rushing and feet passing is .6 x 3 (since you multiply yards by 3 to convert to feet).
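The three correlation cards above share one idea: r does not change when every value is multiplied by a positive constant (doubling, a unit conversion), and it does not change when X and Y are swapped. A small sketch with made-up data; the `pearson_r` helper and the numbers are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only
rushing = [80, 120, 95, 150, 60]
passing = [210, 250, 190, 300, 220]

scale = 3.0  # any positive constant gives the same result
r_original = pearson_r(rushing, passing)
r_rescaled = pearson_r([scale * x for x in rushing], [scale * y for y in passing])
r_swapped = pearson_r(passing, rushing)

print(math.isclose(r_original, r_rescaled), math.isclose(r_original, r_swapped))
```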
more than 9
Suppose you make 10 telemarketer calls. Which of the following is the complement of "at most" 9 sales?
true
Suppose your data represent revenues from a group of 20 stores in a retail chain across the country, and revenue is measured in millions of dollars. The standard deviation of this data set would also be measured in millions of dollars.
no, they contain joint ("and") probabilities
The 4 cells of a two-way table contain conditional probabilities.
SSE is equal to what?
The Sum of Squares for Error for any line going through the data.
moderate positive linear relationship
The correlation turns out to be .60
true
You cannot see the mean on a boxplot.
two categorical
A two-way table allows us to examine the relationship between _____________________ variables.
random
All good samples are
true
An outlier in a data set can significantly affect the value of the mean but not the median
false
The sample size can be found by looking at the data in a boxplot.
false
The second level branches on a tree are marginal probabilities.
true
The wording of a survey question can affect the results
false
There can be different amounts of data in each section of a boxplot.
smallest
To find the best fitting line, you find the line with the ________ SSE.
true
Conditional probabilities are present on a tree.
false
Conditional probabilities can be found in the cells of a two-way table.
What type of variable was not accounted for in an experiment but affected the results?
Confounding variable
true
Correlation is affected by outliers.
no, no units
Correlation is in the same units as X and Y.
no, linear
Correlation measures the strength and direction of any relationship between X and Y.
16%
Two randomly chosen customers are buying laundry soap at the store. 40% of the laundry soap on the shelves is Tide brand. What is the chance that both of them purchase Tide laundry soap?
false
Undercoverage means you had a lot of nonresponse in your sample.
A result due to more than chance
What is statistical significance?
Every sample of that same size has an equal chance of being selected.
What is the statistical definition of a random sample? Choose the best answer.
when A and B are disjoint
When is P(A or B) = P(A) + P(B)?
Mean of X, Mean of Y, SD of X, SD of Y, and r.
Which five descriptive statistics do you need to find the equation of the best fitting line?
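A sketch of how those five statistics give the line: the slope is r times (SD of Y / SD of X), and the intercept is (mean of Y) minus slope times (mean of X). The function name and the sample numbers are hypothetical.

```python
def line_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return (slope, intercept) of the best fitting line from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical summary statistics, for illustration only
slope, intercept = line_from_summary(mean_x=10, mean_y=50, sd_x=2, sd_y=8, r=0.5)
print(slope, intercept)
```

Because the SDs are never negative, the slope always carries the same sign as r.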
boxplot
Which is best if you want to compare several data sets regarding shape, center, and variability?
SD
Which measure of variability measures the concentration of the data around the mean?
boxplot
Which type of graph of quantitative data fits the following description: It shows skewed vs. symmetric shapes; it's easy to determine center and variability; it's good for skewed data sets; and it's easy to compare data sets:
false
You can compare two marginal distributions to see if the corresponding two variables are related.
no
You can never interpret the Y-intercept of the regression line.
nonresponse bias
You randomly choose 100 students from Stat 1350 to take a survey. 60 of them take the survey. What can occur with the other 40 people?
self-selected
You send out an email to all the students in Stat 1430 and you tell them to go to your website and do a survey. 100 students come forward. What kind of sample is this?
categorical
Your company operates in 4 regions and your boss numbers them 1, 2, 3, 4. Is this variable quantitative or categorical?
Statistically significant
a difference in treatment is decided to be due to more than random chance
Extrapolation is what?
Plugging in X values outside the range of the data
same as data
SD units
As square feet increase by 1, selling price increases by $33.80
Selling price = $5,240 + $33.80 x (square feet). How do you interpret the slope for this equation?
false, it has the same units as the data
Standard deviation has no units.
true
Starting with the multiplication rule, you can show that P(B|A)=P(A and B)/P(A)
When a difference found in the results is larger than what we think is due to chance, what do we call the results?
Statistically significant
SSE
Sum of Squares for Error
12%
Suppose 20% of OSU students are business majors, and of the business majors, 60% have internships over the summer. What percentage of OSU are business majors and have internships over the summer?
Conditional distribution of support (yes/no) given women
Suppose 35% of the women in a poll of Americans support candidate A for president (and 65% do not.) These results make up what kind of distribution?
yes
Suppose 35% of the women in a poll support candidate A for president (and 65% do not.) In the same poll 45% of the men support candidate A for president (and 55% do not). Are gender and support (yes/no) related?
negative
Suppose the equation y = 3.45 - 2.58x represents a valid regression equation and X can be used to predict Y. From this information, we know that X and Y have _____________ correlation.
P(Female and Voting for A) = 0.30
Suppose the probability of someone being female and voting for Candidate A is 30%. What is the notation for this probability?
true
The median is not affected by outliers
The variable whose effects you want to study in an experiment is called the what?
factor
If you add the same value to every single number in a data set, the standard deviation also changes by that same value.
false
50th percentile, median, Q2
The five-number summary contains the min, max, Q1, Q3, and what other value?
highest Q3
highest boxplot top
Which is better to use to see the most clear pattern in the d
histogram
Which statistic measures the distance between the middle 50% of the data?
IQR
symmetric box plot
median line in middle
False
median must be one of the numbers in the data set
What type of relationship does data with a correlation of -.5 have?
moderate downhill (negative) linear relationship
survey
most common observational study
Simpson's paradox always happens when you add a third variable to your two-way table.
no
What type of bias is minimized by making
response
quantitative
scatterplot
less on bottom, line lower on box
skew right
true
slices on a pie chart represent relative frequencies
0
standard deviation of the data set 1, 1, 1, 1
true
starting point can affect the way a graph looks
SD
the "average distance from the mean" is measured by the
In a histogram, the horizontal (X) axis is the variable you are measuring, and the vertical (Y) axis is the number or percenta
true
yes, it is possible
two data sets with the same mean but different standard deviations
boxplot
Which type of graph is best for COMPARING two or more quantitative data sets, a boxplot or a histogram?
independent
x variable in an experiment
true
If you multiply every single number in a data set by the same value, the standard deviation is also multiplied by that same value.
same
If you add the same positive number to every value of a data set what happens to the standard deviation?
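The two transformation cards above can be checked numerically: adding a constant shifts every value and the mean together, so the SD stays the same, while multiplying by a positive constant stretches every distance from the mean by that factor. A minimal sketch with made-up data:

```python
import math
from statistics import stdev

data = [2, 4, 4, 5, 7, 9]          # made-up data set
shifted = [x + 10 for x in data]   # add the same number to every value
scaled = [x * 3 for x in data]     # multiply every value by the same number

sd_original = stdev(data)
sd_shifted = stdev(shifted)
sd_scaled = stdev(scaled)

print(math.isclose(sd_shifted, sd_original))     # SD unchanged by adding
print(math.isclose(sd_scaled, 3 * sd_original))  # SD multiplied by 3
```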
temperature
If you are predicting gas price using temperature, which is the X variable?
.12
P(A) = .3, P(B|A) = .4, P(B) = .5 What is P(A and B)?
.2
P(A) = .5, P(B) = .4 and P(A and B) = .1 What is P(B|A)?
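The last two cards use the multiplication rule in both directions: P(A and B) = P(A) x P(B|A), and rearranged, P(B|A) = P(A and B) / P(A). A small sketch using the numbers from the cards; the function names are made up for this example.

```python
def joint_from_conditional(p_a, p_b_given_a):
    """Multiplication rule: P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a

def conditional_from_joint(p_a_and_b, p_a):
    """Definition of conditional probability: P(B|A) = P(A and B) / P(A)."""
    return p_a_and_b / p_a

p_joint = joint_from_conditional(0.3, 0.4)        # first card: about .12
p_conditional = conditional_from_joint(0.1, 0.5)  # second card: about .2
print(round(p_joint, 2), round(p_conditional, 2))
```

In the first card, P(B) = .5 is extra information that the calculation does not need.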
false
The difference between an experiment and an observational study is an experiment randomly assigns subjects to treatments.
true
The difference between an experiment and an observational study is an experiment randomly assigns subjects to treatments.
the 5 numbers that are marked off on a boxplot
The five-number summary of a single data set of 100 numbers would be which of the following?
IQR is affected by outliers.
false
same
If you add 10 to every value of a data set, what happens to the standard deviation?
yes
A longer box in the boxplot means more variability in the data.
law of total probability
If you sum down the first column of values in this table to get .1 + .3 = .4, which statistical technique are you using behind the scenes?
Stratified Random Sample
If you want to ask the question: "How is the view from your seat?" where your population is the OSU's football stadium, what kind of sample should you use?
condition of patient
In the hospital example in your lecture notes, you found hospital A didn't do as well as hospital B when all the data was in one two-way table, but when an additional variable was examined, hospital A was better in all cases. What was that additional (confounding) variable that ended up reversing the results?
self-selected sample
What kind of sample occurs when you put an ad in the newspaper and ask readers to take your survey?
BELOW
If a residual is negative, then that data point lies _________________ the regression line.
it's a weak positive linear relationship, do not proceed with a regression line
If the correlation is .2 what does that tell you about using a regression line to fit your data?
false
If the mean of a data set is large, the standard deviation has to be large also
.4
If the stock went up today, what is the chance it also went up yesterday? Choose the closest answer.
false
If there are a few very small values in a data set compared to the rest of the data, the mean will be larger than the median
false
If variables A and B are related in a certain way in a two way table (with 2 variables), no matter how many other variables you look at in addition to these two, the relationship will always stay the same.
1, 1, 4, 4
If you could choose four numbers from 1, 2, 3, 4 and repeated numbers were allowed (such as 1, 1, 3, 2), which set of four numbers would give you the largest standard deviation
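A brute-force check of this card: try every set of four numbers drawn from 1 to 4 with repeats allowed, and keep the one with the largest standard deviation. This is only a sketch; it uses the population SD (`pstdev`), but the sample SD gives the same winner because every set has the same size.

```python
from itertools import combinations_with_replacement
from statistics import pstdev

# Every multiset of four values from {1, 2, 3, 4}, as sorted tuples
candidates = list(combinations_with_replacement([1, 2, 3, 4], 4))

# The set whose values sit farthest from their own mean has the largest SD
best = max(candidates, key=pstdev)
print(best, pstdev(best))
```

The winner puts half the values at each extreme, as far from the mean as possible.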
false
If you have all the information filled in a two-way table, you can fill in all the information on a tree. But not the other way around
.4
P(A) = .2 and P(B) = .3. Suppose A and B are independent. What is P(A or B)? Choose the closest answer.
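A sketch of the calculation behind this card. The general addition rule is P(A or B) = P(A) + P(B) - P(A and B), and independence lets you replace P(A and B) with P(A)P(B). The function name is an assumption for this example.

```python
def p_or_independent(p_a, p_b):
    """P(A or B) when A and B are independent."""
    p_a_and_b = p_a * p_b  # independence: P(A and B) = P(A) * P(B)
    return p_a + p_b - p_a_and_b

p_union = p_or_independent(0.2, 0.3)
print(round(p_union, 2))  # .2 + .3 - .06, so .4 is the closest listed answer
```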
true
P(A) = P(A and B) + P(A and Bc) where Bc means "B complement"
The sum of the squared residuals equals SSE.
How do the residuals relate to the SSE?
Residuals are found by taking
Observed - predicted
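The residual and SSE cards fit together as in the sketch below: each residual is observed minus predicted, and SSE is the sum of the squared residuals. The data and the candidate line are made up, and the function names are illustrative.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sse(xs, ys, slope, intercept):
    """Sum of Squares for Error: add up the squared residuals."""
    return sum(e ** 2 for e in residuals(xs, ys, slope, intercept))

# Made-up data and a candidate line y = 0 + 2x
xs = [1, 2, 3, 4]
ys = [2, 4, 7, 7]
res = residuals(xs, ys, slope=2, intercept=0)
total = sse(xs, ys, slope=2, intercept=0)
print(res, total)
```

A negative residual marks a point that lies below the line, and the best fitting line is the one that makes this total as small as possible.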
response bias
When an individual in the sample responds but does not give the correct data, this is called:
75th
3rd, third
If you are predicting U.S. movie box office revenue by using Opening Weekend Revenue, which variable is X and which is Y
Opening Weekend Revenue is X and U.S. Box office revenue is Y.
marginal
A __________ distribution summarizes the information from one variable ONLY, without considering ANY information from another variable.
true
A bar graph using StatCrunch is a good way to visualize a marginal or a conditional distribution.
yes
A boxplot is a one-dimensional graph
false
A confidential survey is one in which they cannot link you to your data even if they wanted to.
false
A confidential survey is one in which they cannot link you to your data.
true
A confounding variable can cause the results of a two-way table to reverse when it is added to the data set.
moderate
A correlation of -.6 is considered to be what?
false
A flat histogram (with a line straight across) contains no variability whatsoever, according to our definition.
false
A flat histogram indicates no variability in the data.
distribution
A listing of all possible values in a data set and how often they occurred is called a data
true
A listing of all the possible values of a data set and how often they occur is called a distribution.
true
A researcher is trying to use January temperatures to predict latitude. This means January temperature is the X (independent) variable and latitude is the Y (dependent) variable.
no
Bob and Bill live in an apartment together. Bob is in the apartment 30% of the time overall. But when Bill is in the apartment, Bob is only there 10% of the time. Are Bob and Bill independent in terms of being in the apartment?
false
Bob picks a name from the phone book using a random number generator, and then takes the first 100 names that come after that to make a sample. Is Bob's sample random?
bias from undercoverage
Bob wants to estimate the percentage of people who own a dog in his town, and he goes to all the apartment buildings to carry out his survey. He leaves out all the houses in the town. What kind of bias is this?
can't tell
Boxplot A and Boxplot B are drawn on the same axes. The box part of Boxplot A is shorter in length than the box part of Boxplot B. What can you tell about the two data sets?
What does it mean for a sample to be truly random, according to our notes?
Every sample of the same size has the same chance of being selected.
true
If 2 corresponding conditional distributions are different from each other in a two-way table, we know that the variables are related.
yes
If 60% of male-owned businesses are successful in their first year, and 60% of female-owned businesses are successful in their first year, are gender and having a successful business in their first year independent?
false
In a boxplot you can tell the exact pattern of the data set (beyond just whether the data is skewed or symmetric.)
Simpson's paradox
In our lecture notes is an example involving two hospitals, A and B. If you compare patient outcomes for the hospitals, B is safer (has a lower death rate). But if you look only at the patients in poor condition, A is safer, and if you only look at the patients in good condition, A is safer. What is going on with this example?
not independent
In the 2016 Presidential election, 63% of females voted and 59% of males voted. Does this mean that gender and voting are independent or not independent?
about 61%
In the 2016 Presidential election, 63% of registered voting females actually voted and 59% of registered voting males actually voted. If research shows that 45% of registered voters are female and 55% of registered voters are male, what percentage of ALL registered voters actually voted in 2016?
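This card is an application of the law of total probability: weight each group's voting rate by that group's share of registered voters, then add. A minimal sketch using the numbers from the card; the variable names are made up.

```python
# Share of registered voters in each group (from the card)
p_female = 0.45
p_male = 0.55

# Voting rate within each group (from the card)
p_vote_given_female = 0.63
p_vote_given_male = 0.59

# Law of total probability: P(vote) = sum of P(group) * P(vote | group)
p_vote = p_female * p_vote_given_female + p_male * p_vote_given_male
print(round(p_vote, 3))
```

The result lands between 59% and 63%, closer to 59% because males make up the larger share of registered voters.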
a good experiment
Making comparisons, avoiding bias, and having enough data are the three criteria for what, according to our notes?
observational study
Mike marks down the gas mileage of his two cars every time he fills them up with gas for 6 months straight. At the end he notes that his Mustang gets better mileage than his Corvette. Is this an experiment or an observational study?
Suppose you have events A and B. Which of the following is the same as the probability of at least one?
P(A or B)
If you are comparing two conditional distributions, (for example Opinion given Male compared to Opinion given Female), and the results are the same, what do you conclude?
Opinion and gender are independent.
Which type of probabilities are in each of the 4 cells of a two-way table of probabilities?
joint ("and") probabilities
Which of the following is (are) true?
If A and B are independent, then P(A|B) = P(A)
false
If the coefficient of determination is .81, the value of the correlation must be .9.
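Why this is false: R-squared alone gives only the size of the correlation, so r could be .9 or -.9. The sign comes from the slope of the regression line. A sketch under that reading; the function name is hypothetical, and the slope values are illustrative.

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """Take the square root of R-squared, then attach the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

r_uphill = correlation_from_r_squared(0.81, slope=2.58)     # positive slope
r_downhill = correlation_from_r_squared(0.81, slope=-2.58)  # negative slope
print(round(r_uphill, 2), round(r_downhill, 2))
```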
true
If the median is closer to Q1 than it is to Q3 then the data is skewed right. (lower line)