STA 2023 Unit 2
confidence interval
is a range of values with an associated probability or confidence level C. The probability quantifies the chance that the interval contains the true population parameter
mean of a random variable
is a weighted average of the possible values of X, reflecting the fact that all outcomes might not be equally likely.
variance of a random variable
is a weighted average of the squared deviations (X − µX)² of the variable X from its mean µX.
test statistic
is based on the statistic that estimates the parameter, for example the standardized sample mean z = (x̄ − µ0) / (σ/√n).
The mean of the sampling distribution
is equal to the population mean μ.
The probability of making a Type II error
is labeled beta
Type II error
is made when we fail to reject the null hypothesis and the null hypothesis is false (incorrectly keep a false H0).
Type I error
is made when we reject the null hypothesis and the null hypothesis is actually true (incorrectly reject a true H0).
critical value z*
is related to the chosen confidence level C. C is the area under the standard Normal curve between −z* and z*
power of a test of hypothesis with fixed significance level α
is the probability that the test will reject the null hypothesis when the alternative is true. the probability that the data gathered in an experiment will be sufficient to reject a wrong null hypothesis
The probability of making a Type I error
is the significance level alpha
binomial probability
is the number of possible arrangements of the k successes (the binomial coefficient) multiplied by the probability of any one specific arrangement: P(X = k) = (n choose k) p^k (1 − p)^(n − k)
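The formula on this card can be sketched in Python as follows. The function name `binomial_pmf` and the coin-flip example are illustrative choices, not part of the course notes.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a count X of successes with X ~ B(n, p)."""
    # comb(n, k) counts the arrangements of k successes among n observations;
    # p**k * (1 - p)**(n - k) is the probability of any one such arrangement.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: probability of exactly 5 heads in 10 fair coin flips.
p_five_heads = binomial_pmf(5, 10, 0.5)
print(round(p_five_heads, 4))  # 0.2461
```

Summing `binomial_pmf(k, n, p)` over k = 0, ..., n should give 1, which is a quick sanity check on any binomial calculation.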
The standard deviation of the sampling distribution
is σ/√n, where n is the sample size.
Averages are more/less variable than individual observations.
less
mean of binomial distribution
µ = np
margin of error formula
m = z* · σ/√n
statistical significance doesn't tell you about the ____ of the effect, only whether there is one
magnitude
Let X = your bank balance at the start of the day, and let Y = your bank balance at the end of the same day. And let us suppose that your starting balances vary with mean $2500 and standard deviation $37, while your ending balances vary with mean $3700 and standard deviation $34, and the two variables are correlated with ρ=0.75. Let us suppose that instead I will give you $60 cash at the start of each day. On average, how much will you start out with each day? What is the standard deviation of those amounts?
µ(b+X) = b + µX = 60 + 2500 = $2,560. Adding a constant does not change the spread: σ²(b+X) = σ²X, so σ(b+X) = σX = $37.
tests of significance
a method of statistical inference that assesses the evidence for a claim about a population
confidence intervals
a method of statistical inference that estimates the value of a population parameter
independent events use the ______ rule
multiplication rule P(BBB)= P(B) x P(B) x P(B)
Let X = your bank balance at the start of the day, and let Y = your bank balance at the end of the same day. And let us suppose that your starting balances vary with mean $2500 and standard deviation $37, while your ending balances vary with mean $3700 and standard deviation $34, and the two variables are correlated with ρ=0.75. On average, how much is deposited per day? What is the standard deviation of daily deposits
µ(Y−X) = µY − µX = 3700 − 2500 = $1,200. σ²(Y−X) = σ²X + σ²Y − 2ρσXσY = (37)² + (34)² − (2 × 0.75 × 37 × 34) = 638, so σ(Y−X) = √638 ≈ $25.26.
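As a check on the arithmetic, the rule for the difference of two correlated variables can be sketched in Python. The helper name `diff_mean_sd` is made up for this illustration.

```python
from math import sqrt

def diff_mean_sd(mu_x, sd_x, mu_y, sd_y, rho):
    """Mean and standard deviation of Y - X when X and Y have correlation rho."""
    mean = mu_y - mu_x
    # Variances add, minus the correlation term for a difference.
    variance = sd_x**2 + sd_y**2 - 2 * rho * sd_x * sd_y
    return mean, sqrt(variance)

# Bank balance example: X = starting balance, Y = ending balance.
mean_deposit, sd_deposit = diff_mean_sd(2500, 37, 3700, 34, 0.75)
print(mean_deposit, round(sd_deposit, 2))  # 1200 25.26
```

Note that the positive correlation makes the deposits less variable than either balance on its own.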
If A happens and it does NOT prevent B from happening
not disjoint
A company tests whether the mean volume of tea in their bottles is 500 ml, as stated on the label. Here, the company would likely be more concerned that the bottles contain less than advertised, which would likely lead to consumer complaints of false advertising. Thus we test:
one-sided test: H0 : µ = 500 ml Ha : µ < 500 ml
test of statistical significance
tests a specific hypothesis using sample data to decide on the validity of the hypothesis.
a SMALL p value implies
that random variation because of the sampling process alone is not likely to account for the observed difference.
If A happens and it DOES alter the chances that B will happen
dependent
finite sample spaces deal with ____ data
discrete
discrete or continuous & sample space: the exact number of characters printed on a randomly selected page of this packet
discrete S= {1,2,3,4...}
discrete or continuous & sample space: the number of eyelashes on a randomly selected giraffe
discrete S= {1,2,3,4...}
complement rule
P(A) = 1 − P(Aᶜ), where Aᶜ is the complement of A. The complement rule states that the probability of an event not occurring is 1 minus the probability that it does occur.
A complete set of probabilities always add up to
1
the total area under a probability histogram is always
1 just as the total area under a density curve is always 1
how to calculate the variance of a discrete random variable
σ²X = (x1 − µX)² · p1 + (x2 − µX)² · p2 + ...
You are in charge of quality control in your food company. You sample randomly four packs of cherry tomatoes, each labeled 1/2 lb. (227 g). The average weight from your four boxes is 222 g. H0 : µ = 227 g (µ is the average weight of the population of packs) Ha : µ ≠ 227 g (µ is either larger or smaller) What is the probability of drawing a random sample such as yours if H0 is true? Find the P Value, is it significant? Do you reject the null hypothesis?
(see notes) P-value = 0.0456. Yes, it is significant (P < 0.05), so yes, you reject the null hypothesis.
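The card does not state the population standard deviation, so the sketch below ASSUMES σ = 5 g, a value chosen only because it reproduces a P-value close to the one on the card. The function name `two_sided_z_test` is also just illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(x_bar, mu0, sigma, n):
    """Return (z, P-value) for H0: mu = mu0 versus Ha: mu != mu0 with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Two-sided: double the tail area beyond |z| under the standard Normal curve.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

SIGMA_ASSUMED = 5.0  # grams; ASSUMPTION, not given on the card
z, p_value = two_sided_z_test(x_bar=222, mu0=227, sigma=SIGMA_ASSUMED, n=4)
print(round(z, 2), round(p_value, 4))  # -2.0 0.0455
```

Under that assumption the exact P-value is about 0.0455; a printed table that rounds each tail to 0.0228 gives 0.0456.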
how to find the mean of a discrete random variable
µX = (x1 × p1) + (x2 × p2) + ... (multiply each value by its probability, then add)
random variable
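a numerical outcome of a random process; many random variables can be defined for one random process. ex) rolling 2 dice (random process); random variable: the sum of the two faces, which takes values 2 to 12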
the probability of a single event of a density curve is
0
P-value of ___ or less is considered significant
0.05
Let X = your bank balance at the start of the day, and let Y = your bank balance at the end of the same day. And let us suppose that your starting balances vary with mean $2500 and standard deviation $37, while your ending balances vary with mean $3700 and standard deviation $34, and the two variables are correlated with ρ=0.75. Let us suppose that I will give you 5% of your starting balance every day. On average how much will I give you every day? What is the standard deviation of those amounts?
µ(0.05X) = 0.05 µX = 0.05(2500) = $125. σ²(0.05X) = (0.05)² × (37)² = 3.42, so σ(0.05X) = √3.42 ≈ $1.85.
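A small Python sketch of the scaling rule, where 5% of the starting balance corresponds to a multiplier of 0.05. The helper name `scale_mean_sd` is hypothetical.

```python
def scale_mean_sd(a, mu_x, sd_x):
    """Mean and standard deviation of aX: the mean scales by a, the sd by |a|."""
    return a * mu_x, abs(a) * sd_x

# 5% of a starting balance with mean $2500 and standard deviation $37.
mean_gift, sd_gift = scale_mean_sd(0.05, 2500, 37)
print(round(mean_gift, 2), round(sd_gift, 2))  # 125.0 1.85
```

The correlation with the ending balance plays no role here, because only X is involved.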
Let us suppose that the left eye spherical measurements on eyeglass prescriptions are Normally distributed with mean 0 diopters and standard deviation 1 diopter. We will randomly select 35 people and calculate their mean left eye spherical measurement. (c) What is the probability that the sample mean turns out to be within 0.4 diopters of the population mean?
1 − 0.0082 − 0.0082 = 0.9836 ≈ 98%
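The same probability can be computed without rounding z, using Python's standard library. This is only a sketch; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean of n = 35 measurements from N(0, 1).
x_bar_dist = NormalDist(mu=0, sigma=1 / sqrt(35))

# P(-0.4 < x_bar < 0.4)
p_within = x_bar_dist.cdf(0.4) - x_bar_dist.cdf(-0.4)
print(round(p_within, 3))  # about 0.982
```

The small gap from the table-based 0.9836 comes from rounding z ≈ −2.37 to −2.4 before looking it up.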
the power of a test is
1 − β.
the population parameter "mu" must be within roughly ___ standard deviations from the sample average, in ___ of all samples.
2; 95%
Decrease σ.
A larger variance σ² implies a larger spread of the sampling distribution, σ/√n. Thus, the larger the variance, the lower the power. The variance is in part a property of the population, but it is possible to reduce it to some extent by carefully designing your study.
law of large numbers
As the number of randomly drawn observations in a sample increases, the mean of the sample gets closer and closer to the population mean µ. It only applies to really large numbers of observations.
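A quick simulation can illustrate the idea. The sketch below assumes a fair six-sided die (population mean 3.5) and a fixed random seed so that the run is repeatable; both choices are illustrative.

```python
import random

def sample_mean_of_rolls(n_rolls, seed=0):
    """Mean of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

# The sample mean should settle near the population mean 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_rolls(n))
```

With only 10 rolls the sample mean can be far from 3.5; with 100,000 rolls it is typically within a few hundredths.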
confidence interval to test a two-sided hypothesis.
C = 1 - α.
In a large population of adults, the mean IQ is 112 with standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign. The distribution of the sample mean IQ is: A) Exactly normal, mean 112, standard deviation 20 B) Approximately normal, mean 112, standard deviation 20 C) Approximately normal, mean 112 , standard deviation 1.414 D) Approximately normal, mean 112, standard deviation 0.1
C) Approximately normal, mean 112 , standard deviation 1.414
If A happens and it DOES prevent B from happening?
Disjoint and dependent
The FDA tests whether a generic drug has an absorption extent similar to the known absorption extent of the brand-name drug it is copying. Higher or lower absorption would both be problematic, thus we test:
two-sided test: H0 : µgeneric = µbrand Ha : µgeneric ≠ µbrand
mean and margin of error for confidence interval
mean ± m, where m is called the margin of error: µ is within x̄ ± m. Ex: 120 ± 6. Equivalently, as the two endpoints of an interval: µ is within (x̄ − m) to (x̄ + m). Ex: 114 to 126
statistical inference
Methods for drawing conclusions about a population from sample data
Increase α
More conservative significance levels (lower α) yield lower power. Thus, using an α of .01 will result in lower power than using an α of .05.
Let us suppose that the left eye spherical measurements on eyeglass prescriptions are Normally distributed with mean 0 diopters and standard deviation 1 diopter. We will randomly select 35 people and calculate their mean left eye spherical measurement. (a) Write down the distribution of the sample mean.
N(0, 1/√35)
the sample mean distribution is N()
N(μ, σ/√n)
discrete or continuous & sample space: the exact square footage of a randomly selected apartment
continuous S = {x : 0 ≤ x < ∞}
The fees in a sample of 292 bankruptcy cases was examined. x = $1078 and s = $592. What is the distribution of the sample means of x? Find the middle 95% of the sample means distribution.
Approximately Normal (mean μ, standard deviation σ/√n), estimated here as N($1078, $34.6) since s/√n = 592/√292 ≈ 34.6. The middle 95% is roughly ± 2 standard deviations from the mean: $1078 ± 2 × $34.6, approximately ($1008.80, $1147.20).
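The standard error and the "± 2 standard deviations" range can be recomputed with a few lines of Python. This is a sketch that uses the sample standard deviation s in place of σ, as the card does.

```python
from math import sqrt

x_bar, s, n = 1078, 592, 292

# Estimated standard deviation of the sample mean (s used in place of sigma).
standard_error = s / sqrt(n)

# Middle 95% of the sampling distribution: roughly mean +/- 2 standard errors.
low = x_bar - 2 * standard_error
high = x_bar + 2 * standard_error
print(round(standard_error, 1), round(low, 1), round(high, 1))
```

Carrying the unrounded standard error through gives endpoints of about $1008.7 and $1147.3, within a few cents of the hand calculation.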
A coin is flipped 10 times. Each outcome is either a head or a tail. The variable X is the number of heads among those 10 flips, our count of "successes." Find B (n,p)
On each flip, the probability of success, "head," is 0.5. The number X of heads among 10 flips has the binomial distribution B(n = 10, p = 0.5).
The probability that you obtain two heads OR two tails in two coin flips
P(HH or TT) = P(HH) + P(TT) = 0.25 + 0.25 = 0.50
discrete data
data that can only take on a limited number of values.
Let us suppose that 17% of all hospital admissions are for gun-related injuries. X = the number of gun-related admissions in 29 randomly selected hospital admissions is a Binomial random variable. What is the probability that more than 5 but less than 12 are gun-related among 29 randomly selected hospital admissions?
P(5 < X < 12) = P(X = 6) + P(X = 7) + P(X = 8) + ... + P(X = 11) = P(X ≤ 11) − P(X ≤ 5) = binomcdf(29, .17, 11) − binomcdf(29, .17, 5) = 0.3683
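For readers without a calculator that has `binomcdf`, the same quantity can be sketched in plain Python. The function names `binom_pmf` and `binom_cdf` are illustrative stand-ins, not a real library API.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), playing the role of the calculator's binomcdf."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# More than 5 but fewer than 12 gun-related admissions out of 29.
p_between = binom_cdf(11, 29, 0.17) - binom_cdf(5, 29, 0.17)
print(round(p_between, 4))  # about 0.3683
```

Subtracting the two cumulative probabilities leaves exactly the terms for X = 6 through X = 11.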
What is the probability, if we pick one woman at random, that her height will be some value X? For instance, between 68 and 70 inches P(68 < X < 70)? N(64.5, 2.5)
Probability= 0.0669
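A short sketch of how this area under the Normal curve can be computed with Python's standard library, assuming N(64.5, 2.5) means mean 64.5 inches and standard deviation 2.5 inches:

```python
from statistics import NormalDist

# Heights of women: Normal with mean 64.5 in and standard deviation 2.5 in.
heights = NormalDist(mu=64.5, sigma=2.5)

# P(68 < X < 70) is the area under the density curve between 68 and 70.
p_68_to_70 = heights.cdf(70) - heights.cdf(68)
print(round(p_68_to_70, 4))  # 0.0669
```

The corresponding z-scores are 1.4 and 2.2, so a table lookup gives the same answer.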
small p values and null hypothesis
REJECT null hypothesis The true property of the population is significantly different from what was stated in H0.
A basketball player shoots three free throws. What is the number of baskets made?
S = { 0, 1, 2, 3 }
A basketball player shoots three free throws. What are the possible sequences of hits (H) and misses (M)?
S = { HHH, HHM, HMH, HMM, MHH, MHM, MMH, MMM } Note: 8 elements, 2³
If you flip two coins, and the first flip does not affect the second flip, what is the sample space and the probability of each of these events?
S = {HH, HT, TH, TT}. The probability of each of these events is 1/4, or 0.25.
ex) coin flip what is the sample space? probability of heads AND tails
S= {H,T} P of heads= 0.5 P of tails= 0.5
p value
Tests of statistical significance quantify the chance of obtaining a particular random sample result if the null hypothesis were true. This is a way of assessing the "believability" of the null hypothesis given the evidence provided by a random sample.
binomial distribution
The distribution of the count X of successes in the binomial setting is the binomial distribution with parameters n and p: B(n,p). The parameter n is the total number of observations. The parameter p is the probability of success on each observation. The count of successes X can be any whole number between 0 and n
binomial coefficient
The number of ways of arranging k successes in a series of n observations (with constant probability p of success) is the number of possible combinations (unordered sequences): (n choose k) = n! / (k! (n − k)!)
null hypothesis, H0
The statement being tested in a test of significance
S= sample space
This is a set, or list, of ALL possible outcomes of a random process.
significance level, alpha
This value is decided arbitrarily before conducting the test.
central limit theorem
When randomly sampling from any population with mean μ and standard deviation σ, when n is large enough, the sampling distribution of x̄ is approximately Normal: x̄ ~ N(μ, σ/√n).
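The theorem can be illustrated with a small simulation. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), a sample size of 50, and a fixed seed; all of these are arbitrary choices made for the illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, n_samples, seed=0):
    """Draw n_samples samples of size n from an exponential population (mu = sigma = 1)."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(n_samples)
    ]

n = 50
sample_means = simulate_sample_means(n, n_samples=2000)

# The sample means should centre on mu = 1 with spread close to sigma / sqrt(n).
print(round(mean(sample_means), 3), round(stdev(sample_means), 3), round(1 / sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are far from Normal.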
discrete random variable
X has a finite number of possible values
continuous random variable
X takes all values in an interval (infinite)
higher confidence C implies
a larger margin of error m (thus less precision in our estimates).
probability
of any outcome of a random phenomenon can be defined as the proportion of times the outcome would occur in a very long series of repetitions.
probability distribution
of a random variable X tells us what values X can take and how to assign probabilities to those values.
Increasing the sample size
decreases the spread of the sampling distribution and therefore increases power. But there is a tradeoff between gain in power and the time and cost of testing a larger sample
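To make the effect concrete, here is a sketch of a power calculation for a one-sided z test (Ha: µ < µ0). The numbers are hypothetical and do not come from the notes: µ0 = 500, a true mean of 498, σ = 5, and α = 0.05. The function name is also illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_lower(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of the z test of H0: mu = mu0 vs Ha: mu < mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    # Reject H0 when x_bar falls below this cutoff (inv_cdf(alpha) is negative).
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * se
    # Power = probability that x_bar lands in the rejection region under the true mean.
    return NormalDist(mu_true, se).cdf(cutoff)

# Hypothetical values: mu0 = 500, true mean 498, sigma = 5.
for n in (10, 25, 50):
    print(n, round(power_one_sided_lower(500, 498, 5, n), 3))
```

If the true mean equals µ0, the function returns α, which matches the definition of the significance level as the probability of a Type I error.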
continuous random variables can be represented by (1)
density curve
event
a subset of the sample space
Think of the random process of repeatedly selecting a two-kid family and recording the number of girls in each family. Let X = the number of girls in a randomly chosen two-kid family. a) what are the possible values of x? (b) Let us assume that the probability of getting a girl is 0.6 for all families. Calculate the probability of getting each number of girls that you wrote down above. (c) Construct a probability distribution table for X. (d) Calculate the mean (e) Calculate the variance (f) calculate the standard deviation
a) X ∈ {0, 1, 2}; b)/c) P(X = 0) = 0.16, P(X = 1) = 0.48, P(X = 2) = 0.36; d) µX = 1.2; e) σ²X = (0 − 1.2)²(0.16) + (1 − 1.2)²(0.48) + (2 − 1.2)²(0.36) = 0.48; f) σX = √0.48 ≈ 0.69 (square root of the variance)
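The mean, variance, and standard deviation of a discrete random variable can be computed from its probability table. Below is a minimal sketch using the two-kid family distribution; the helper name `mean_var_sd` is hypothetical.

```python
from math import sqrt

def mean_var_sd(distribution):
    """Mean, variance, and sd of a discrete random variable given {value: probability}."""
    mean = sum(x * p for x, p in distribution.items())
    variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
    return mean, variance, sqrt(variance)

# Number of girls in a two-kid family when P(girl) = 0.6.
girls = {0: 0.16, 1: 0.48, 2: 0.36}
mean, variance, sd = mean_var_sd(girls)
print(round(mean, 2), round(variance, 2), round(sd, 2))  # 1.2 0.48 0.69
```

Because X here is binomial with n = 2 and p = 0.6, the results also agree with the shortcuts np = 1.2 and np(1 − p) = 0.48.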
disjoint events use ______ rule
addition rule P (A or B)= P(A) + P(B)
hypothesis
an assumption or a theory about the characteristics of one or more variables in one or more populations.
size of the effect
an important factor in determining power. Larger effects are easier to detect.
discrete or continuous & sample space: the exact amount of gas it takes to drive from FSU to a certain off-campus home on a randomly selected day
continuous S = {x : 0 ≤ x < ∞}
the individual outcomes of a random phenomenon are always
disjoint
mean of a random variable is also called
expected value of x
assigning probabilities empirically
from our knowledge of numerous similar past events ex) Mendel discovered the probabilities of inheritance of a given trait from experiments on peas without knowing about genes or DNA.
assigning probabilities theoretically
from our understanding of the phenomenon and symmetries in the problem. ex) A six-sided fair die: each side has the same chance of turning up. ex) Genetic laws of inheritance based on the process of meiosis
confidence interval in relation to null hypothesis
gives a black and white answer: Reject or don't reject H0. But it also estimates a range of likely values for the true population mean µ.
two-tail or two-sided test
has these null and alternative hypotheses: H0 : µ = [a specific number] Ha : µ ≠ [a specific number]
one-tail or one-sided test
has these null and alternative hypotheses: H0 : µ = [a specific number] Ha : µ < [a specific number] OR H0 : µ = [a specific number] Ha : µ > [a specific number]
Two events A and B are disjoint if
they have no outcomes in common and can never happen together. The probability that A or B occurs is then the sum of their individual probabilities.
If A happens and it does NOT alter the chances that B will happen
independent
how to find the standard deviation of a random variable
positive square root of the variance
The sampling distribution of a statistic is the
probability distribution of that statistic.
discrete random variables can be represented by (2)
probability distribution table; probability histogram
a lower confidence level C
produces a smaller margin of error m (thus better precision in our estimates).
p value in relation to null hypothesis
quantifies how strong the evidence is against the H0. But if you reject H0, it doesn't provide any information about the true population mean µ.
random process
series of independent trials where the outcome of each trial is unpredictable, but a pattern of outcomes shows up in the long run. ex) rolling dice
variance of binomial distribution
σ²X = np(1 − p)
standard deviation of binomial distribution
σX = √(np(1 − p))
sampling distribution of a statistic
the distribution of all possible values taken by the statistic when all possible samples of a fixed size n are taken from the population.
independent
the outcome of a new coin flip is not influenced by the result of the previous flip
alternative hypothesis , Ha
the statement we suspect is true instead of the null hypothesis
how do we find specific z* value?
use table of z/t values (table C)
If the P-value is greater than α (P > α)
we fail to reject H0.
If the P-value is equal to or less than α (P ≤ α)
we reject H0.
confidence interval formula
x̄ ± z* · σ/√n
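The formula can be sketched in Python, taking z* from the standard Normal curve so that the central area equals the confidence level C. The function name and the input values (x̄ = 120, σ = 15, n = 25) are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval x_bar +/- z* sigma/sqrt(n) for a mean with known sigma."""
    # z* leaves area C between -z* and z*, so (1 + C) / 2 lies to its left.
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs, not taken from the notes.
low, high = z_confidence_interval(x_bar=120, sigma=15, n=25, confidence=0.95)
print(round(low, 2), round(high, 2))  # 114.12 125.88
```

Raising `confidence` to 0.99 widens the interval, and lowering it narrows the interval, in line with the cards on higher and lower confidence levels.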
Let us suppose that 17% of all hospital admissions are for gun-related injuries. X = the number of gun-related admissions in 29 randomly selected hospital admissions is a Binomial random variable. What is the distribution of X?
x is B (29, .17)
Suppose that for FSU students, the probability that an individual is eating pizza at any given time is 40%. That means if I randomly sample an FSU student, the probability that they are eating pizza is 40%. Suppose we randomly sample 3 FSU students. Let X = the number among 3 randomly sampled students who are eating pizza. Construct the probability distribution for X.
x    | 0       1                2                3
p(x) | 0.6^3   3(0.4)(0.6^2)    3(0.4^2)(0.6)    0.4^3
     | 0.216   0.432            0.288            0.064
Let us suppose that 17% of all hospital admissions are for gun-related injuries. X = the number of gun-related admissions in 29 randomly selected hospital admissions is a Binomial random variable. Sample space of x?
x= {0,1,2,...,29}
Let us suppose that the left eye spherical measurements on eyeglass prescriptions are Normally distributed with mean 0 diopters and standard deviation 1 diopter. We will randomly select 35 people and calculate their mean left eye spherical measurement. (d) What is the 90th percentile of the sampling distribution of the sample mean?
z = 1.28 for the 90th percentile. 1.28 = (x̄ − 0) / (1/√35), so x̄ = 1.28 × (1/√35) ≈ 0.216 diopters
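The percentile can also be found directly with the inverse Normal function in Python's standard library. This is a sketch; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean of n = 35 measurements from N(0, 1).
x_bar_dist = NormalDist(mu=0, sigma=1 / sqrt(35))

# Value below which 90% of sample means fall.
percentile_90 = x_bar_dist.inv_cdf(0.90)
print(round(percentile_90, 3))  # about 0.217
```

The hand calculation gives 0.216 because it uses the table value z = 1.28 rather than the more precise 1.2816.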
Let us suppose that the left eye spherical measurements on eyeglass prescriptions are Normally distributed with mean 0 diopters and standard deviation 1 diopter. We will randomly select 35 people and calculate their mean left eye spherical measurement. (b) What is the probability that the sample mean turns out to be less than -0.4 diopters?
z = (−0.4 − 0) / (1/√35) ≈ −2.4, so P(x̄ < −0.4) = 0.0082 = 0.82%
You invest 20% of your funds in Treasury bills and 80% in an "index fund" that represents all U.S. common stocks. Your rate of return over time is proportional to that of the T-bills (X) and of the index fund (Y), such that R = 0.2X + 0.8Y. Based on annual returns between 1950 and 2003: Annual return on T-bills µX = 5.0% σX = 2.9% Annual return on stocks µY = 13.2% σY = 17.6% Correlation between X and Yρ = −0.11 Find the mean and standard deviation
µR = 0.2µX + 0.8µY = (0.2 × 5) + (0.8 × 13.2) = 11.56%. σ²R = σ²(0.2X) + σ²(0.8Y) + 2ρσ(0.2X)σ(0.8Y) = (0.2)²σ²X + (0.8)²σ²Y + 2ρ(0.2σX)(0.8σY) = (0.2)²(2.9)² + (0.8)²(17.6)² + (2)(−0.11)(0.2 × 2.9)(0.8 × 17.6) = 196.786, so σR = √196.786 = 14.03%
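The portfolio calculation combines the scaling rule and the rule for sums of correlated variables. A Python sketch is shown below; the function name `portfolio_mean_sd` is made up for this example.

```python
from math import sqrt

def portfolio_mean_sd(w_x, mu_x, sd_x, w_y, mu_y, sd_y, rho):
    """Mean and sd of R = w_x * X + w_y * Y when X and Y have correlation rho."""
    mean = w_x * mu_x + w_y * mu_y
    # Each weighted component has sd equal to weight * sd; the cross term uses rho.
    sd_wx, sd_wy = w_x * sd_x, w_y * sd_y
    variance = sd_wx**2 + sd_wy**2 + 2 * rho * sd_wx * sd_wy
    return mean, sqrt(variance)

# 20% T-bills (X), 80% index fund (Y); all values in percent.
mean_r, sd_r = portfolio_mean_sd(0.2, 5.0, 2.9, 0.8, 13.2, 17.6, -0.11)
print(round(mean_r, 2), round(sd_r, 2))  # 11.56 14.03
```

The slightly negative correlation lowers the portfolio's standard deviation a little compared with the uncorrelated case.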