Central Limit Theorem, combined
Central Limit Theorem....
: For any population with mean μ and standard deviation σ, the distribution of sample means for samples of size n approaches a normal distribution with mean μ and standard deviation σ/√n as n approaches infinity
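The statement above can be checked with a small simulation. This is a minimal sketch, not part of the original notes: the exponential population, the sample size of 50, the 5000 repetitions, and the function name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(n, num_samples, rate=1.0, seed=0):
    """Draw num_samples samples of size n from an exponential population
    (a strongly skewed, non-normal distribution) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(num_samples)
    ]


if __name__ == "__main__":
    n = 50
    means = simulate_sample_means(n, num_samples=5000)
    # An exponential population with rate 1 has mean 1 and standard deviation 1,
    # so the CLT predicts sample means centred on 1 with spread 1 / sqrt(n).
    print("mean of sample means:", statistics.fmean(means))
    print("sd of sample means:  ", statistics.stdev(means))
    print("sigma / sqrt(n):     ", 1 / n ** 0.5)
```

The first printed value should land near 1 and the second near the third, even though the population itself is far from normal.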
(T/F) If we obtain a lot of samples and calculate their means, the average of those sample means is pretty close to the actual population mean
true
standard error equation
(σ / √n)
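As a quick illustration of the formula, the sketch below computes the standard error for a few sample sizes. The function name `standard_error` and the population standard deviation of 10 are hypothetical values picked for the example.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical population standard deviation of 10.
    for n in (4, 25, 100):
        print(n, standard_error(10, n))
```

Quadrupling the sample size halves the standard error, because n sits under a square root.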
What can we do to increase power?
- increase sample size
2 Properties of Distribution of Sample Means
1. The mean of the sample means equals the population mean 2. It is approximately normally distributed (for a large enough n)
Standard Error of the Mean tells....
: tells how well a sample mean represents the population mean
Which of these is a correct null hypothesis? A H0: μ = 12 B H0: μ > 12 C H0: x̄ = 12
A
Q4c State why, in answering part (b), you did not need to use the Central Limit Theorem.
A machine produces steel rods with LENGTHS THAT ARE NORMALLY DISTRIBUTED
Alternative Hypothesis
Ha
If Sample Size Increases and The Standard Error of the Mean Decreases, how does this represent the population mean?
Represents the population mean better
Sample Size Increases.....
The standard error of the mean decreases
Q8bii State why use of the Central Limit Theorem was required in calculating this confidence interval.
The taxi journey times are not known to be normally distributed
Q3c State why, in part (a), use of the Central Limit Theorem was not necessary.
The volume, in millilitres, of lemonade in mini-cans may be " ASSUMED TO BE NORMALLY DISTRIBUTED. "
Q11aiii State why use of the Central Limit Theorem is not required in answering part (a)(i).
The weight of sand in a bag can be MODELLED BY A NORMAL RANDOM VARIABLE
Q7aii State why, in calculating your confidence interval, use of the Central Limit Theorem was not necessary.
The weights of packets of sultanas may be ASSUMED TO BE NORMALLY DISTRIBUTED
Ho: A diet of fast foods has no effect on liver function What is two way Ha? One way Ha ?
Two: A diet of fast food affects liver function One: A diet of fast food increases liver function One: A diet of fast food decreases liver function
Q2c State, with justification, whether you made use of the Central Limit Theorem in constructing the confidence interval in part (a).
Yes. The volumes in the population are not known to be normally distributed. (No need to mention sample size here.)
Exponential Distribution
a continuous random variable (RV) that appears when we are interested in the intervals of time between some random events
Uniform Distribution
a continuous random variable (RV) that has equally likely outcomes over the domain, a < x < b; often referred to as the Rectangular Distribution because the graph of the pdf has the form of a rectangle.
Normal Distribution
a continuous random variable (RV) with pdf f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)), where μ is the mean of the distribution and σ is the standard deviation.
Mean
a number that measures the central tendency; a common name for mean is "average." The term "mean" is a shortened form of "arithmetic mean."
X(bar) ~ N(μ, (σ / √n)) is....
sampling distribution of the mean
sample statistics: mean: Sd:
x bar s
population parameters: mean: Sd:
μ σ
Standard Error of the Mean can be stated as:
"How far can we expect a sample mean to be from the population mean?"
Ho: Growth rates of forest trees are unaffected by increases in carbon dioxide levels in the atmosphere. Two Ha: One Ha:
Two: Growth rates of forest trees are affected by increases in carbon dioxide levels in the atmosphere One: Growth rates of forest trees increase with increases in carbon dioxide levels in the atmosphere One: Growth rates of forest trees decrease with increases in carbon dioxide levels in the atmosphere
2 Influencing Factors of Standard Error of the Mean
1. Amount of variability in the population 2. size of the sample.
Hypothesis Testing Steps
1. Check Assumptions 2. Hypotheses 3. use sample data to collect an estimate of that parameter 4. Compare your estimate to the claim (critical value) 5. Make a conclusion about the claim (reject or fail to reject)
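The steps above can be sketched as a one-sample z-test. This is only an illustration under stated assumptions: the population standard deviation is treated as known, the test is two-sided, and the function name `one_sample_z_test` and the numbers in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 against Ha: mu != mu0.

    Assumes a random sample, independent observations, and a known
    population standard deviation sigma (step 1, checked by the user).
    Returns (z, p_value, reject_h0).
    """
    se = sigma / math.sqrt(n)                      # standard error of the mean
    z = (sample_mean - mu0) / se                   # step 4: estimate vs. claim
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p_value, p_value < alpha             # step 5: reject or fail to reject


if __name__ == "__main__":
    # Hypothetical data: sample mean 103 from n = 100, testing H0: mu = 100.
    z, p, reject = one_sample_z_test(103, 100, sigma=15, n=100)
    print(z, p, "reject H0" if reject else "fail to reject H0")
```

Comparing the p-value to alpha is equivalent to comparing z to a critical value; the p-value form is used here because it matches the decision rule on the other cards.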
Why does Mean 1 Not Equal Mean 2.
: because of random sampling error: different random samples from the same population give different means
Normal distribution of Distribution of Sample Means
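: regardless of the shape of the population distribution, the distribution of sample means is approximately normal when the sample size is large enough (and exactly normal for any n if the population itself is normal).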
Standard Error of the Mean equation is the same as the
: same as the standard deviation of the sampling distribution given by the Central Limit Theorem, σ/√n
Which hypothesis should be written as an inequality? A the alternative hypothesis B the null hypothesis C either the alternative or the null hypothesis
A
Which of these is NOT a correct alternative hypothesis to correspond with H 0: μ = 8? A H a: μ ≠ 8 B H a: μ ≤ 8 C H a: μ > 8
B
The Central Limit Theorem (CLT) tells us that for any population distribution, if we draw many samples of a large size, n, then the distribution of sample means, called the sampling distribution, will:
Be approximately normally distributed. Have a mean equal to the population mean, μ. Have a standard deviation equal to the standard error of the mean, σ/√n.
Which of these is NOT a correct null hypothesis? A H 0: μ 1 = μ 2 B H 0: μ 1 - μ 2 = 0 C H 0: μ 1 < μ 2
C
CLT Practical Rules Commonly Used #2
If the original population is itself normally distributed, then the sample means will be normally distributed for any sample size n (not just for values of n larger than 30)
Q9bi Explain the reason for the statistician's statement.
If you go three standard deviations below the mean, you're at - 115 minutes. This is impossible but three standard deviations below the mean is not unusual in a Normal Distribution.
Q6bi Explain why Y is unlikely to be normally distributed.
If you go two standard deviations below the mean, you're at - 13 minutes. This is impossible but two standard deviations below the mean is not unusual in a Normal Distribution.
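The argument can be made numeric by asking how much probability a normal model would place on impossible negative times. The sketch below is an illustration only: the mean of 15 minutes and standard deviation of 14 minutes are hypothetical values chosen so that two standard deviations below the mean gives −13, since the question's actual figures are not recorded in these notes.

```python
from statistics import NormalDist


def prob_negative(mean, sd):
    """Probability that a normal model with this mean and sd gives a value below 0."""
    return NormalDist(mean, sd).cdf(0)


if __name__ == "__main__":
    mean, sd = 15, 14   # hypothetical values: mean - 2 * sd = -13
    print("two sd below the mean:", mean - 2 * sd)
    print("P(time < 0) under the normal model:", prob_negative(mean, sd))
```

A model that assigns a sizeable probability to negative times cannot describe the data well, which is the point of the written answer.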
Q12 Has the CLT been used anywhere in this question?
No. " The weight of rice in a packet may be modelled by a normal distribution. "
Q1 Are we using CLT in this question?
No. The length of one-metre galvanised-steel straps used in house building " MAY BE MODELLED BY A NORMAL DISTRIBUTION "
Q10bii State why the distribution of (Ybar), the mean of a random sample of 60 single visits, is approximately normal.
Since the sample size is over 30, the CLT applies.
As Variability of Population Increases....
Standard error of the mean increases.
CLT Conclusion #2
The mean of the sample means will be the population mean µ
CLT Given #1
The random variable x has a distribution (which may or may not be normal) with mean µ and standard deviation σ
Q9bii Give a reason why, despite the statistician's statement, your answer to part (a)(iii) is still valid.
The sample size is over 30 so the CLT applies
CLT Conclusion #3
The standard deviation of the sample means will be σ/√n
(T/F) Given a reasonably large sample size, the distribution of sample means from *any population is normal.
True (approximately normal). NOTE: it says ANY population. NOTE: it says distribution of SAMPLE MEANS.
(T/F) Two sided Ha is preferred over one sided Ha. Explain why if true or false
True, because a one-sided Ha has to be justified: you need a reason, before seeing the data, to expect a particular direction
Two way Ha is .... while one way Ha is...
Two way means that you are not taking a side, but one way means you are taking a side
Type II Error
When you fail to reject a false null hypothesis: you say your experiment didn't work but it did. (Beta denotes this error)
Type I Error
When you reject a true null hypothesis: you say your experiment worked, but it didn't (your alpha (α) is the probability that a Type I error occurs when the null is true)
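A simulation can illustrate the link between alpha and the Type I error rate. This is a sketch under assumptions not in the original notes: a standard normal population, a two-sided z-test with known sigma, and the hypothetical function name `type_one_error_rate`.

```python
import math
import random
from statistics import NormalDist, fmean


def type_one_error_rate(alpha=0.05, n=25, trials=4000, seed=1):
    """Estimate how often a two-sided z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0   # H0 is true: the data really have mean mu0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        z = (xbar - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1   # a Type I error: H0 is true but was rejected
    return rejections / trials


if __name__ == "__main__":
    print("estimated Type I error rate:", type_one_error_rate())
```

The estimated rate should come out close to the chosen alpha, here 0.05.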
If Variability of the Population Increases and Standard Error of the Mean Increases, how does this represent the population mean?
Won't represent the population mean as well
sampling distribution of the mean
X(bar) ~ N(μ, (σ / √n))
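Once the sampling distribution is known, probabilities about the sample mean follow directly. The sketch below is an illustration with hypothetical numbers (μ = 100, σ = 15, n = 36) and a hypothetical function name; note that the second argument of N(μ, σ/√n) is used here as a standard deviation, as on the card.

```python
import math
from statistics import NormalDist


def prob_sample_mean_above(threshold, mu, sigma, n):
    """P(X-bar > threshold) when X-bar ~ N(mu, sigma / sqrt(n))."""
    sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
    return 1 - sampling_dist.cdf(threshold)


if __name__ == "__main__":
    # Hypothetical population: mean 100, sd 15; samples of size 36.
    print(prob_sample_mean_above(105, mu=100, sigma=15, n=36))
```

With these numbers the standard error is 15/6 = 2.5, so a sample mean of 105 sits two standard errors above μ.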
the average value of n independent instances of random variables from ANY probability distribution (with finite variance) will have approximately a standard normal distribution when
it is standardized by subtracting its mean and dividing by its standard deviation, and n is sufficiently large
If you hold everything else constant, then an increase in Type I error means a (decrease/increase) in Type II error
decrease
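The trade-off can be seen by computing the Type II error probability for several alpha levels while everything else is fixed. This is a sketch under assumptions not in the notes: a one-sided (upper-tail) z-test with known sigma, and hypothetical values for the means, sigma, and n.

```python
import math
from statistics import NormalDist


def type_two_error(alpha, mu0, mu_true, sigma, n):
    """Beta for an upper-tail z-test of H0: mu = mu0 when the true mean is mu_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha)              # rejection cutoff in z units
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))      # true effect in standard errors
    return NormalDist().cdf(z_crit - shift)               # chance of failing to reject


if __name__ == "__main__":
    # Hypothetical setting: H0 mean 100, true mean 105, sigma 15, n = 25.
    for alpha in (0.01, 0.05, 0.10):
        print(alpha, type_two_error(alpha, 100, 105, 15, 25))
```

Allowing a larger alpha lowers the rejection cutoff, so a false null is missed less often and beta falls.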
Ha (equal/ do not equal) while Ho (equal/ do not equal)
do not equal equal
(T/F) To reject the null hypothesis, we want p > α
false; we want p < α
Central Limit Theorem, CLT
for any given population with a mean μ and a standard deviation σ, the distribution of sample means for samples of size n will have a mean of μ and a standard deviation of σ/√n and will approach a normal distribution as n approaches infinity
Standard Error Z means
how far the sample mean is from the population mean / how far sample means are, in general, from the population mean
As sample sizes increase, the graph of the distribution of sample means from a skewed population gets...
closer to normal; the larger the sample sizes, the better the approximation
The standard error is smaller than the standard deviation of our (sample/population) and is usually estimated from the standard deviation of our (sample/population)
population sample
What happens when the degrees of freedom increases
The t distribution gets closer to the standard normal distribution: the peak gets taller, toward the normal peak, and the tails get thinner
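The effect of the degrees of freedom can be checked numerically by evaluating the t density at its centre and out in the tail. This is a minimal sketch using the standard formula for Student's t density; the function names and the choice of df = 1, 5, 30 and the tail point t = 4 are assumptions made for the example.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    for df in (1, 5, 30):
        print(df, "peak:", t_pdf(0, df), "tail at 4:", t_pdf(4, df))
    print("normal", "peak:", normal_pdf(0), "tail at 4:", normal_pdf(4))
```

As df grows, the peak value rises toward the normal peak and the density at t = 4 falls toward the normal value.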
If we calculate the 95% confidence interval, we would expect
the interval to contain the true population mean for about 95% of the samples we could draw
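This interpretation can be illustrated by building many intervals from repeated samples and counting how often they capture the true mean. The sketch assumes a normal population with known sigma and a z-based interval; the population values and the function name `ci_coverage` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def ci_coverage(level=0.95, mu=50.0, sigma=10.0, n=30, trials=2000, seed=2):
    """Fraction of z-based confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("share of intervals containing the true mean:", ci_coverage())
```

The 95% describes the procedure over repeated sampling: any single interval either contains the true mean or does not.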
Standard Error of the Mean wants...
to tell us how well the sample mean represents the population mean; how confident we can be in the estimate depends on it.
(T/F) Always want to assume your null hypothesis is true
true
(T/F) You NEVER accept the null hypothesis
true; you either reject the Ho or fail to reject the Ho
t distribution
used when we have a sample and don't know the population variance (so we estimate it with s)
Can statistical and research hypothesis equal?
yes
Are Hypotheses hard to prove that they are right?
yes; it's easier to prove something is wrong or NOT equal to what you think
3 Distributions
1. Population 2. Sample 3. Distribution of Sample Means.
variances
The differences between planned amounts and actual amounts
CLT Conclusion #1
The distribution of sample x̄ will, as the sample size increases, approach a normal distribution
Type I and Type II errors are (inversely/directly) related when everything else is held constant
inversely
dependent samples t-test
means of two conditions, when the same sample is used for both
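For a dependent samples test, the calculation works on the differences within each pair. The sketch below only computes the t statistic and degrees of freedom; the function name and the before/after scores are hypothetical, and the critical value or p-value would still come from a t table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for a dependent (paired) samples t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]   # one difference per subject
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)      # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


if __name__ == "__main__":
    # Hypothetical scores for the same four subjects under two conditions.
    print(paired_t_statistic([10, 12, 9, 11], [12, 14, 10, 13]))
```

Working with differences turns the problem into a one-sample test of whether the mean difference is 0.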
Q6bii State why (Ybar), the mean of a random sample of 35 gas meter installations, is likely to be approximately normally distributed.
A sample size of over 30 means that the CLT applies.
The null and alternative hypotheses are written about A a population parameter. B sample data. C a sample statistic.
A; Ho and Ha are always about population parameters; however, when you're testing them, you're looking at sample statistics
how would you describe a scenario where α < p?
Fail to reject the null hypothesis. There is not enough evidence to reject the null, and not enough evidence to indicate that Ha is true. (This does not show that the null is true.)
(T/F) you can accept the null but never do anything to Ha
False; you can never accept the null (you only reject it or fail to reject it), and you never do anything to Ha
CLT Practical Rules Commonly Used #1
For samples of size n larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution. The approximation gets better as the sample size n becomes larger.
I DON'T UNDERSTAND THIS
Given H0 is true, the p-value of a test is the probability of getting a test statistic as extreme as, or more extreme than, the one observed, in the direction of Ha. The size of our p-value will drive our conclusion about H0
Research hypothesis Reading Program X (RPX) will improve children's reading ability What is the Ha and Ho of a statistical hypothesis
Ho: Children using RPX will score the same on a comprehension test as children using the old reading program Ha: Children using RPX will have greater reading comprehension scores than children using the old reading program
You believe that men are more likely to play FPS video games than women. What is your Ho? What is your Ha?
Ho: Men and women play equal amounts of FPS video games (you want to collect data that shows this is wrong) Ha: Time playing video games for men > for women
Q10bi Explain why a normal distribution is unlikely to provide an adequate model for Y.
If you go two standard deviations below the mean, you're at - 31 minutes. This is impossible but two standard deviations below the mean is not unusual in a Normal Distribution.
Q14d The normal distribution provides a good model for many continuous distributions which arise in production processes or in nature. Explain why the Central Limit Theorem provides another reason for the importance of the normal distribution.
If you have a large enough sample size, then the means of those samples are approximately normally distributed regardless of the population's distribution.
single sample t-test
a statistic to evaluate whether a sample mean statistically differs from a specific value
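A minimal sketch of the calculation behind a single sample t-test follows. It computes only the t statistic and degrees of freedom, using the sample standard deviation s in place of the unknown σ; the function name and the data are hypothetical, and the comparison against a critical value is left to a t table or a statistics package.

```python
import math
import statistics


def one_sample_t_statistic(data, mu0):
    """t = (x-bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)   # estimated standard error
    return (statistics.fmean(data) - mu0) / se, n - 1


if __name__ == "__main__":
    # Hypothetical measurements, testing against a claimed mean of 5.0.
    print(one_sample_t_statistic([5.1, 4.9, 5.6, 5.8, 6.0], mu0=5.0))
```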
independent samples t-test
a test to determine if there is a difference between two separate, independent groups; conducted when researchers wish to compare mean values of two groups
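For two separate groups, one common version of the calculation pools the two sample variances. The sketch below uses that pooled-variance form, which assumes the two populations have equal variances; this is an assumption of the example, not something stated on the card, and the function name and data are hypothetical.

```python
import math
import statistics


def independent_t_statistic(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Weighted average of the two sample variances (equal-variance assumption).
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    return diff / se, n_a + n_b - 2


if __name__ == "__main__":
    # Hypothetical scores from two unrelated groups.
    print(independent_t_statistic([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

When the equal-variance assumption is doubtful, Welch's version of the test is the usual alternative.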
Q5e Indicate where, if anywhere, in this question you needed to make use of the Central Limit Theorem.
a) No - route A journey times are normally distributed. b) No - route B journey times are normally distributed. c) Not relevant d) Yes - Car journey times are from an unknown distribution. Sample size of 36 journeys allows us to use CLT.
CLT
statistical theory that states that, given a sufficiently large sample size from a population with a finite level of variance, the sample means will be approximately normally distributed and their mean will be approximately equal to the mean of the population.
z test
test of the data compared to a general population with population parameters
sum of squares, SS
the sum of the squared deviation scores
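A deviation score is the distance of each value from the mean, so the definition translates into a couple of lines. The function name and the data below are hypothetical.

```python
import statistics


def sum_of_squares(data):
    """Sum of squared deviations of each value from the mean of the data."""
    mean = statistics.fmean(data)
    return sum((x - mean) ** 2 for x in data)


if __name__ == "__main__":
    # Hypothetical scores with mean 5.
    print(sum_of_squares([2, 4, 4, 4, 5, 5, 7, 9]))
```

Dividing SS by n − 1 gives the sample variance, whose square root is the sample standard deviation s.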
(T/F) Z- scores only work for N(0,1)
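true, in the sense that z-table probabilities refer to the standard normal N(0,1); a value from any other normal distribution is first standardized as z = (x − μ) / σ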
Hypothesis Testing Step: Check Assumptions
- Assumptions are requirements that must be met for the statistical test to be valid - Assumptions include: random sampling, independent observations
independent observations (Assumptions)
- has to be an independent observation, that is one observation doesn't affect others.
Null Hypothesis
- hypothesis that nothing happened - it didn't work - usually that something equals something else (or 0) - No difference between groups
Statistical Hypotheses
- a claim and a counterclaim about a population parameter
α-value (alpha)
- usually 5% or 0.05 - if results at least this extreme would occur less than 5% of the time when the null is true, then the null can be rejected - alpha is set by the researcher, conventionally at 5% - WHAT PERCENT RISK (of a Type I error) ARE WE GOING TO TAKE
p-value
- the probability of getting values at least as extreme as ours if the null were true - a very low probability is evidence against the null
steps to developing a hypothesis:
1. Gather information (read and observe) 2. Research question 3. make research hypothesis 4. Make statistical hypothesis *Statistical hypothesis will more likely have numbers involved
Q13 Has the CLT been used anywhere in this question?
No. " The heights can be modelled by a normal distribution. "
Ho is
Null Hypothesis
Central Limit Theorem
Regardless of the distribution of a population, as n increases, the distribution of the *means of *random samples from the population will approach a normal distribution, specifically: N(μ, (σ / √n)) - sampling distribution of the means
CLT equation
X̄ ~ N(μ, σ/√n): as n increases, the sampling distribution of the mean approaches a normal distribution with mean μ and standard deviation σ/√n, regardless of the distribution of the population
how would you describe a scenario where α > p?
Reject the null hypothesis: the data support Ha. Always use the word "significant", e.g. "There is a significant difference/finding/etc."
how are research hypothesis different from statistical hypothesis?
Research gives expected relationship between two variables, while statistical makes assumptions based off of a parameter. Compares Ha and Ho.
CLT Given #2
Samples all of the same size n are randomly selected from the population of x values.
standard error
the standard deviation of a sampling distribution. For the sample mean it is σ/√n, so it is smaller than the standard deviation of the population whenever n > 1; in practice it is estimated from the sample as s/√n
Power
probability of correctly rejecting a false null hypothesis (1 − β)
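Power can be computed directly for a simple case, which also shows why a larger sample helps. This is a sketch under assumptions not in the notes: a one-sided (upper-tail) z-test with known sigma, and hypothetical values for the null mean, the true mean, and sigma.

```python
import math
from statistics import NormalDist


def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of an upper-tail z-test of H0: mu = mu0 when the true mean is mu_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha)              # rejection cutoff in z units
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))      # true effect in standard errors
    return 1 - NormalDist().cdf(z_crit - shift)           # P(reject H0 | mu = mu_true)


if __name__ == "__main__":
    # Hypothetical setting: H0 mean 100, true mean 105, sigma 15.
    for n in (25, 50, 100):
        print(n, power_one_sided_z(100, 105, 15, n))
```

A larger n shrinks the standard error, so the same true difference is more standard errors away from the null value and is detected more often.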
if small p-value ............. Ho if big p-value............. Ho
reject fail to reject
The goal is to (accept/reject) null hypothesis
reject; rejecting the null is evidence (not proof) that your experiment worked