Stat 1430 Final
Normal Distribution Shape
-continuous distribution, x ranges from negative infinity to infinity
-f(x) has no closed-form integral, so we use the Normal Table to find the area underneath the curve
Suppose 1063 people out of a random sample of 2054 adults report that they play video games. What is the standard error of p hat?
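.0110 (p hat = 1063/2054 = .5175; SE = square root of (p hat x (1 - p hat)/n) = square root of (.5175 x .4825/2054) = .0110)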
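The arithmetic for this card can be checked with a short sketch. The function name `se_p_hat` is an assumption made for illustration; the formula is the usual standard error of a sample proportion.

```python
import math

def se_p_hat(successes: int, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Numbers from the card: 1063 of 2054 adults play video games.
se = se_p_hat(1063, 2054)
print(round(se, 4))  # about 0.011
```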
If X is a continuous random variable with f(x) = ½ for 0<x<2, what is the probability that X = 1?
0
continuous random variable
If you cannot tell me what the next highest value is, the variable is this
Suppose X = 3 (constant). Find the mean of X
Mean (3) = Mean(0X + 3) = 0 + 3 = 3
If X has a normal distribution with variance 10 and Y has a normal distribution with variance 10 and X, Y are independent, what is the variance of X - Y?
Var(X-Y) = Var(X) + Var(Y) = 10 + 10 = 20
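A quick simulation can make it plausible that the variances add even though the variables are subtracted. This is only a sketch: the means of 0, the seed, and the number of draws are assumptions, since the card gives only the variances.

```python
import random
import statistics

random.seed(1430)  # fixed seed so the run is repeatable (arbitrary choice)

sd = 10 ** 0.5     # variance 10 means SD = sqrt(10)
n_draws = 50_000   # assumed simulation size

# Means are set to 0 as an assumption; they do not affect the variance.
diffs = [random.gauss(0, sd) - random.gauss(0, sd) for _ in range(n_draws)]

var_diff = statistics.variance(diffs)
print(round(var_diff, 1))  # should land near 20, not near 0
```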
Suppose X = 3 (constant). Find the standard deviation of X.
Variance(3) = Var(0X+3) = 0^2 x Var(X) = 0
SD(3) = square root of 0 = 0 (This should make sense, since a constant like 3 has no deviation.)
normal approximation
When our sample size is large, we can approximate the binomial distribution with a normal distribution
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Calculate the test-statistic. Show formula and all calculations.
Z = (xbar - muo)/(sigma/square root of n) = (32 - 28)/(10/square root of 40) = 2.53
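The same test statistic, written as a small sketch. The helper name `z_statistic` is hypothetical; it assumes the population standard deviation is known, as the card states.

```python
import math

def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Z = (xbar - mu0) / (sigma / sqrt(n)) for a test about a mean, sigma known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

# Numbers from the card: xbar = 32, mu0 = 28, sigma = 10, n = 40.
z = z_statistic(32, 28, 10, 40)
print(round(z, 2))  # 2.53
```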
The formula for a confidence interval for p involves a Z-value. Why is this? (Assume n is large.)
because of the Central Limit Theorem
P(Z < -2.74) = P(Z > 2.74)
true
Binomial Requirements
1.) Two possible outcomes: success and failure
2.) Independent trials
3.) Fixed number of trials set beforehand
4.) Fixed probability of success for each trial
Suppose you want to estimate the percentage of all OSU students who will take classes this summer. You take a random sample of 200 OSU students and find that 50 of them will take classes this summer. What is the statistic in this problem?
25%
If X is a random variable with a standard deviation of 10, then 3X has a standard deviation equal to
30
A population has a mean of 45 and a standard deviation of 9. A random sample of size 81 is taken from the population. What are the mean and standard deviation of x bar?
45 and 1
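A minimal sketch of where "45 and 1" comes from: the mean of x bar equals the population mean, and its standard deviation is sigma divided by the square root of n. The function name `xbar_mean_and_sd` is an assumption.

```python
import math

def xbar_mean_and_sd(mu: float, sigma: float, n: int):
    """Return (mean, SD) of the sampling distribution of x bar."""
    return mu, sigma / math.sqrt(n)

# Numbers from the card: mu = 45, sigma = 9, n = 81.
mean_xbar, sd_xbar = xbar_mean_and_sd(45, 9, 81)
print(mean_xbar, sd_xbar)  # 45 1.0
```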
Suppose the time to serve a customer (X) has a normal distribution with mean 5 minutes and standard deviation 2 minutes. You want to investigate the 5% of customer service times that were the longest. What cutoff point are you looking for?
95th percentile for X
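The card only asks which cutoff to look for, but the value itself can be sketched with Python's standard library in place of the Normal Table. The variable names are assumptions.

```python
from statistics import NormalDist

# Service time X ~ Normal(mean 5 minutes, SD 2 minutes), as stated on the card.
service_time = NormalDist(mu=5, sigma=2)

# The longest 5% of times lie above the 95th percentile.
cutoff = service_time.inv_cdf(0.95)
print(round(cutoff, 2))  # about 8.29 minutes
```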
discrete vs. continuous
Discrete: ∑ sum up several probabilities
Continuous: ∫ integrate density function over a range
f(x) for continuous is analogous to P(x) for discrete
f(x) is a DENSITY and can be greater than 1
The Normal Table
Edges: z values
Interior: Probability to the left of the given z under the normal curve
Suppose your p-value in a hypothesis test is .055. Using the standards from this class what do you conclude?
Fail to reject Ho
If X is a continuous random variable and its probability density function is f(x), then we know that the values of f(x) must always lie between 0 and 1.
False (can be greater than 1)
If you want to estimate the population mean, which technique do you use?
Find a confidence interval for μ
Suppose you have a multiple choice test and each question has 4 possible answers. If someone guesses, they would be expected to get 25% of the problems right in the long term. You believe your students did better than just guessing on your exam. If you conducted a hypothesis test for this, what would your hypotheses be?
Ho: p = .25 and Ha: p > .25
Suppose a random variable X is continuous with density function f(x) and x is any number greater than 0. You want to find P(X≤5). What do you do?
Integrate f(x) from 0 to 5.
Suppose X is the salary of an employee at your company. X has a mean of $50,000 and standard deviation $5,000. Everyone at your company gets a $10,000 bonus, but no raise. What's the mean and SD of the new salaries?
Mean(X + 10,000) = Mean(X) + 10,000 = 50,000 + 10,000 = $60,000
Var(X + 10,000) = Var(X) = 5,000^2 = 25,000,000 (units of variance don't make sense)
SD(X + 10,000) = SD(X) = $5,000
Suppose X is the salary of an employee at your company. X has a mean of $50,000 and standard deviation $5,000. Everyone at your company gets a 20% raise. What are the mean and SD of the new salaries?
Mean(X) = 50,000
Note: A raise of 20% gives a new salary of X + .20X = 1.20X
Mean(1.20X) = 1.20 x Mean(X) = 1.20(50,000) = $60,000
Var(1.20X) = 1.20^2 x Var(X) = 1.44 x (5,000^2) = 36,000,000 (units of variance don't make sense)
SD(1.20X) = sq root of 36,000,000 = $6,000
If X has a normal distribution with mean 10 and Y has a normal distribution with mean 10, what is the mean of X - Y?
Mean(X-Y) = Mean(X)-Mean(Y)=10-10 = 0
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Did we need the Central Limit Theorem to do any calculations or draw conclusions in the above problems regarding the average monthly internet access fee?
No because X had a normal distribution to start with
true/false: We may get different conclusions for the same problem if the hypotheses are different. For example: the conclusion for H0: μ=8 and HA: μ<8 might be different from the conclusion for H0: μ=8 and HA: μ≠8.
True
Suppose you want to estimate the percentage of all OSU students who will take classes this summer. You take a random sample of 200 OSU students and find that 50 of them will take classes this summer. What is the population in this problem?
all OSU students
example of a random variable
height of a student in a class (We don't know before we measure the student what his or her height is, but we have an idea of what values are more likely than others)
discrete random variables
if you can tell the next highest possible value, the variable is this
Suppose you have a confidence interval for the population mean. If the population standard deviation were to increase (and everything else stayed the same) then the margin of error for your confidence interval would:
increase
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Up to this point you were testing to see if the average fee is more than $28 per month. Now suppose you want to test whether or not the average fee is $28. (Assume all the other information given in the original problem stays the same.) What would happen to each of the following items under this new testing scenario? Circle your answers; no calculations or explanations needed. The p-value would
increase (double)
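The doubling can be seen by computing both p-values from the same Z. This is a sketch that uses `statistics.NormalDist` in place of the Normal Table; the variable names are assumptions.

```python
import math
from statistics import NormalDist

# Same data as the card: xbar = 32, mu0 = 28, sigma = 10, n = 40.
z = (32 - 28) / (10 / math.sqrt(40))

# Ha: mu > 28 uses only the upper tail.
p_one_sided = 1 - NormalDist().cdf(z)

# Ha: mu != 28 counts both tails, so the p-value doubles.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_one_sided, 4), round(p_two_sided, 4))  # roughly 0.0057 and 0.0114
```

Both p-values stay below .05 here, which is why the decision about Ho does not change in this particular problem.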
The Milbert Marketing Group recently conducted a study of buying habits of the residents of Apex, North Carolina. From the Apex telephone directory they randomly selected 200 individuals and asked them how much they spent on purchases of DVD movies in the past month. They found that these individuals had spent an average of $32 with a margin of error of $4.4 (constructed using 95% confidence.) If X has a Normal Distribution, then the (sampling) distribution of x bar is
exactly normal for any n
If you increase n, what happens to the margin of error of your confidence interval for p?
it decreases
x has some unknown distribution, what do we know about the distribution of x bar?
it has an approximate normal distribution if n is large enough
What happens to the standard deviation (aka standard error) of x bar if you have to reduce the sample size?
it increases
As n increases, what happens to mu of x bar
it stays the same
We collected some data and wanted to know if the data came from a normal distribution. We made a normal probability plot and it showed a straight line. What does this tell us?
it tells us it is reasonable to treat our data as coming from a normal distribution
you can use a normal distribution to approximate a binomial distribution when which of the following conditions are met?
np greater than or equal to 10, n(1-p) greater than or equal to 10
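The two conditions are easy to check mechanically. A minimal sketch, where the function name `normal_approx_ok` and the `threshold` parameter are assumptions and the example inputs are borrowed from other cards in this set:

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10) -> bool:
    """True when both np and n(1-p) are at least the threshold (10 in this class)."""
    return n * p >= threshold and n * (1 - p) >= threshold

# OSU summer classes card: n = 200, p = 0.25 gives np = 50 and n(1-p) = 150.
print(normal_approx_ok(200, 0.25))  # True

# Golfer survey card: n = 30, p = 0.10 gives np = 3, which is too small.
print(normal_approx_ok(30, 0.10))   # False
```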
If you roll a die 100 times and find the average, and I roll a die 200 times and find the average, who is more likely to have an average that is greater than 5?
you (Because you roll fewer times, your results have a higher chance to be further away from the mean (3.5). The more times you roll the closer to 3.5 your results are likely to be.)
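To put rough numbers on this, the sketch below computes the standard deviation of the average for 100 and 200 rolls and how many of those standard deviations separate 5 from 3.5. It assumes a fair six-sided die; the helper name `se_of_average` is hypothetical.

```python
import math

# Mean and SD of a single roll of a fair six-sided die (assumed fair).
die_mean = sum(range(1, 7)) / 6
die_var = sum((face - die_mean) ** 2 for face in range(1, 7)) / 6
die_sd = math.sqrt(die_var)

def se_of_average(n: int) -> float:
    """SD of the average of n rolls: sigma / sqrt(n)."""
    return die_sd / math.sqrt(n)

for n in (100, 200):
    se = se_of_average(n)
    z = (5 - die_mean) / se  # how many SDs of x bar an average of 5 is above 3.5
    print(n, round(se, 3), round(z, 1))
```

With 200 rolls, an average of 5 is more standard deviations away from 3.5 than with 100 rolls, so it is even less likely.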
Which of the following is a test statistic?
z score
Suppose the total number of calls coming into a single helpline has a mean of 10 per hour with standard deviation 5. What is the mean and SD of the total number of calls per hour if there are twice as many helplines as before?
Let X = number of calls per hour per helpline
Mean(2X) = 2 x Mean(X) = 2 x 10 = 20 calls
Var(2X) = 4 x Var(X) = 4 x 5^2 = 4 x 25 = 100 (note: variance = SD squared)
SD(2X) = sq root of 100 = 10 calls
Suppose the total number of calls coming into a single helpline has a mean of 10 per hour with standard deviation 5. What is the mean and SD of the total number of calls per hour if there are 2 more helplines than before?
Let X = number of calls per hour per helpline
Mean(X + 2) = Mean(X) + 2 = 10 + 2 = 12 calls
Var(X + 2) = Var(X) = 5^2 = 25 (variance = std deviation squared)
SD(X + 2) = sq root of 25 = 5 calls
Suppose X is the salary of an employee at your company. X has a mean of $50,000 and standard deviation $5,000. Everyone at your company gets BOTH a 20% raise AND a $10,000 bonus. What's the mean and SD of the new salaries?
Mean(1.20X + 10,000) = 1.20 x Mean(X) + 10,000 = 60,000 + 10,000 = $70,000
Var(1.20X + 10,000) = 1.20^2 x Var(X) + 0 = 36,000,000 (units don't make sense)
SD(1.20X + 10,000) = sq root of 36,000,000 = $6,000
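The bonus, raise, and raise-plus-bonus cards all apply one rule: for aX + b, the mean becomes a x Mean(X) + b and the SD becomes |a| x SD(X). A minimal sketch, with the helper name `linear_transform` as an assumption:

```python
def linear_transform(mean: float, sd: float, a: float = 1.0, b: float = 0.0):
    """Return (mean, SD) of aX + b given the mean and SD of X.

    Adding b shifts the mean only; multiplying by a scales both mean and SD.
    """
    return a * mean + b, abs(a) * sd

# Salary cards: Mean(X) = 50,000 and SD(X) = 5,000.
print(linear_transform(50_000, 5_000, b=10_000))          # bonus only
print(linear_transform(50_000, 5_000, a=1.20))            # raise only
print(linear_transform(50_000, 5_000, a=1.20, b=10_000))  # raise and bonus
```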
A random variable X has a normal distribution with a mean of 40 and a variance of 25. Given that X = 20, its corresponding z- score is
-4.0
Increasing Sample Size of Sampling Distributions
-Intuitively, if you find the averages of larger samples, these averages will be less spread out than single observations
-Eventually, as you continue to increase n, you will sample the whole population each time, which would make sigma of x bar = 0
What's the same between Continuous and Discrete?
-Probability/Density is never negative (always greater than or equal to 0)
-Area under the curve/columns sums to 1
The variance of X+Y equals the variance of X plus the variance of Y if X and Y are independent.
true
true/false: A 95% confidence interval is wider than a 90% confidence interval if all else remains the same.
true
true/false: A large p-value means you have little evidence against Ho.
true
true/false: A p-value can change if you take a new sample.
true
true/false: The Central Limit Theorem only applies to the SHAPE of the distribution of X-bar, not to its mean or standard error.
true
true/false: The smaller α (alpha) gets, the harder it is to reject Ho
true
true/false: There is a 100% chance that your sample mean lies within your 95% confidence interval.
true
true/false: We use statistics to estimate population parameters, not the other way around.
true
Suppose you want to estimate the percentage of all American families planning a vacation for the summer. Your confidence interval is 30% to 40%. What was your value of p hat?
35% (the interval is p hat plus or minus the margin of error, so p hat is the midpoint of 30% and 40%)
Central Limit Theorem
-The Central Limit Theorem is what tells us that our averages will be approximately normally distributed if we take large samples
-It's not something we calculate
-We use this theorem to set up Hypothesis Tests
-We use the normal distribution because of the way the sampling distribution is constructed
Sampling Distribution Notation
-X: a single observation of our variable
-X-bar: our discovered average based on our sample
-mu of x: the expected value of x
-mu of x bar: the expected value of x bar (which is equal to the expected value of each observation, mu of x)
-mu: the actual mean of the population
The owner of a restaurant claims that the time a customer takes to decide what they want on the menu (X) has a mean of 3.2 minutes. (Assume X has a normal distribution with standard deviation .8 minutes.) You believe the mean is more than 3.2 minutes. Your sample of 25 customers chosen at random has a mean time of 3.6 minutes. What is your p-value?
0.0062
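The p-value on this card can be reproduced as a sketch, using `statistics.NormalDist` in place of the Normal Table. Variable names are assumptions.

```python
import math
from statistics import NormalDist

# Numbers from the card: xbar = 3.6, mu0 = 3.2, sigma = 0.8, n = 25.
z = (3.6 - 3.2) / (0.8 / math.sqrt(25))

# Ha: mu > 3.2, so the p-value is the area to the right of Z.
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 4))  # roughly 2.5 and 0.0062
```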
The Milbert Marketing Group recently conducted a study of buying habits of the residents of Apex, North Carolina. From the Apex telephone directory they randomly selected 200 individuals and asked them how much they spent on purchases of DVD movies in the past month. They found that these individuals had spent an average of $32 with a margin of error of +- $4.4 (constructed using 95% confidence.) In this instance the population of interest is
All residents of Apex, NC.
Bob's Z-score on an exam is 0.25. What is the correct interpretation? Circle one
Bob's score is 0.25 standard deviations above the mean
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Estimate the average monthly internet access fee based on the information given in the original problem. Show any formulas needed, show all work and use proper units.
CI for mean: xbar +/- 1.96(sigma/square root of n) = 32 +/- 1.96(10/square root of 40) = 32 plus or minus 3.10 = (28.90, 35.10) dollars
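The interval above can be checked with a short sketch. The function name `z_interval` is an assumption, and the default `z_star=1.96` corresponds to 95% confidence as used on the card.

```python
import math

def z_interval(xbar: float, sigma: float, n: int, z_star: float = 1.96):
    """Confidence interval for a mean with known sigma: xbar +/- z* x sigma/sqrt(n)."""
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Numbers from the card: xbar = 32, sigma = 10, n = 40.
low, high = z_interval(32, 10, 40)
print(round(low, 2), round(high, 2))  # 28.9 35.1
```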
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Suppose the p-value for this problem turns out to be .0057 (do not calculate or dispute). Based on this p-value, what do you conclude about Ho and about the magazine's claim?
Conclusion about Ho (and explain why): Our p-value = .0057 is less than our cutoff (significance level) alpha = .05. Conclusion: reject Ho.
Conclusion about magazine's claim (in common language - 10 words or less): "Based on our data, we reject the claim that the mean monthly internet access fee for all households is $28. We conclude it's more than that." (from our Ha)
the uniform distribution
Continuous distribution
Shape: Horizontal straight line, because the density is the same for each point
Which of the following statements about margin of error is false?
a. Increasing the sample size increases the margin of error.
b. Increasing the population standard deviation increases the margin of error.
c. Increasing the confidence level increases the margin of error.
a
A survey is conducted to study the backgrounds of professional golfers. A random sample of 30 professional golfers was surveyed. The question they were asked was: "Did your parents play golf?" 10% of the golfers said yes. true/false: the 10% in this problem is population parameter
false (statistic)
true/false: x bar = mu of x
false (the mean of x bar equals mu, not x bar itself)
the mean of x is equal to x bar
false (the mean of x is equal to the mean of x bar)
A survey is conducted to study the backgrounds of professional golfers. A random sample of 30 professional golfers was surveyed. The question they were asked was: "Did your parents play golf?" 10% of the golfers said yes. Let X be the number of professional golfers that said YES to your survey question. What is the name of the distribution for X?
binomial distribution
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Up to this point you were testing to see if the average fee is more than $28 per month. Now suppose you want to test whether or not the average fee is $28. (Assume all the other information given in the original problem stays the same.) What would happen to each of the following items under this new testing scenario? Circle your answers; no calculations or explanations needed. The alternative hypothesis would:
change
Suppose your confidence interval for the percentage of all American families planning a vacation for the summer is 30% to 40%. Now suppose the media reported that 50% of American families go on vacation during the summer, would you agree or disagree with them, based on your data?
disagree
A p-value in hypothesis testing means the same thing as the sample proportion.
false
If you have a Z-value of Z = 0.6, that means 60% of the data lie below you.
false
if you take a random sample of 50 M&M's and record the number of M&Ms of each color, you have a binomial distribution.
false
true/false: you need the CLT to be able to say that the standard deviation of x bar equals (sigma divided by square root of n)
false (CLT only applies to shape)
Let's say we have a random sample of n=61 and are testing a two-sided hypothesis. The calculated z value is .04. Is this sufficient evidence to reject the null hypothesis? Why or why not?
No (Z is the test statistic, not the p-value. If Z is small, your data is close to the claim in Ho, so there is not much evidence against it.)
true/false: If you are not able to prove the alternative hypothesis, this means the null hypothesis is correct.
false (just means you don't have enough evidence against it)
true/false: if someone claims the population mean is 5 and you believe it's greater than 5, you write the following hypotheses: Ho: x-bar=5 and Ha: x-bar > 5
false (x-bar is about the data, need mu's)
is p hat the sample proportion or the population proportion
sample proportion
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Up to this point you were testing to see if the average fee is more than $28 per month. Now suppose you want to test whether or not the average fee is $28. (Assume all the other information given in the original problem stays the same.) What would happen to each of the following items under this new testing scenario? Circle your answers; no calculations or explanations needed. The significance level would
stay the same
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Up to this point you were testing to see if the average fee is more than $28 per month. Now suppose you want to test whether or not the average fee is $28. (Assume all the other information given in the original problem stays the same.) What would happen to each of the following items under this new testing scenario? Circle your answers; no calculations or explanations needed. The test statistic (Z) would:
stay the same
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) Up to this point you were testing to see if the average fee is more than $28 per month. Now suppose you want to test whether or not the average fee is $28. (Assume all the other information given in the original problem stays the same.) What would happen to each of the following items under this new testing scenario? Circle your answers; no calculations or explanations needed. Your decision about whether or not to reject Ho in this particular problem would:
stay the same
The Milbert Marketing Group recently conducted a study of buying habits of the residents of Apex, North Carolina. From the Apex telephone directory they randomly selected 200 individuals and asked them how much they spent on purchases of DVD movies in the past month. They found that these individuals had spent an average of $32 with a margin of error of +-$4.4 (constructed using 95% confidence.) In this instance the parameter of interest is
the average amount spent on DVDs by all residents.
An internet magazine reports the monthly internet access fee for all households has a normal distribution with mean $28. You think the mean fee is actually more than that. You select a random sample of 40 households and find the average monthly internet access fee is $32. (Assume the population standard deviation is $10.) What are your null and alternative hypotheses? Label them clearly.
Ho: mu = 28
Ha: mu > 28
The Central Limit Theorem is important in statistics because:
It says that for large n (n ≥ 30 as a rule of thumb), the sampling distribution of X-bar is approximately normal even when X itself is not normal.
random variable
A variable whose exact value we don't know beforehand. We know its distribution, or its likely values, but not the exact value it will take.
The Milbert Marketing Group recently conducted a study of buying habits of the residents of Apex, North Carolina. From the Apex telephone directory they randomly selected 200 individuals and asked them how much they spent on purchases of DVD movies in the past month. They found that these individuals had spent an average of $32 with a margin of error of +-$4.4 (constructed using 95% confidence.) In this instance the sample is
The 200 individuals contacted.
if x has a normal distribution when does x bar have a normal distribution?
the central limit theorem is not needed, n can be any size
Suppose you take a random sample of size n and look at the average x bar. As n increases, which of the following statements is true?
the mean of x bar stays the same