BBSS-E
T/F A large P-value in a test will favor a rejection of the null hypothesis.
False. A small P-value in a test will favor a rejection of the null hypothesis. The smaller the P-value of the test, the more evidence there is to reject the null hypothesis.
T/F In a hypothesis test, you assume the alternative hypothesis is true.
False. In a hypothesis test, you assume the null hypothesis is true.
Hypothesis Testing
method for testing a claim or hypothesis about a parameter in a population using data measured in a sample (likelihood it would be true)
Null hypothesis (Hsub0)
contains a statement of equality (greater than or equal to, less than or equal to, equal to)
alpha
denotes the level of significance of a hypothesis test, and it is the probability of committing a type I error
Power
probability of rejecting a false null
null distribution
the sampling distribution of outcomes for a test statistic under the assumption that the null hypothesis is true
Use the confidence interval to find the margin of error and the sample mean. (1.66,2.04)
(2.04-1.66)/2 = .19 Margin of error = .19 2.04-.19 = 1.85 Sample mean = 1.85
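A minimal Python sketch of the same arithmetic, assuming the interval is symmetric about the sample mean. The function name is illustrative and not from the source.

```python
def center_and_margin(lower, upper):
    """Recover the sample mean and margin of error from a symmetric interval."""
    margin = (upper - lower) / 2   # half the interval width
    center = (upper + lower) / 2   # midpoint = sample mean
    return center, margin

center, margin = center_and_margin(1.66, 2.04)
print(round(center, 2), round(margin, 2))  # 1.85 0.19
```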
T-stat
(x-bar - mu0)/(s/root(n)), where x-bar is the sample mean, mu0 is the mean stated in H0, and s is the sample SD
null hypothesis (H0)
(stated as the null) a statement about a population parameter, such as the population mean, that is assumed to be true
-P-values are probabilities, so they are always a number between .... -The order of ... of the P-value matters more than its exact numerical value.
0 and 1 magnitude
100!/98!
100x99 (everything else cancels out)
t distribution
The sampling distribution of the test statistic
probability rule: the probability that an event A does not occur equals 1 ... the probability that it does occur
minus
alternative, Ha
more general statement that complements yet is mutually exclusive with null hypothesis
use the values population mean = 6.39 sample mean = 5.1 to find the sampling error
sample mean (x-bar) - population mean (mu) = sampling error -1.29
2 biases that can occur
sample selection bias: e.g. survivorship bias, excluding situations that haven't survived; time-period bias: sensitivity to the starting/ending dates of the sample
Playing the game of roulette, where the wheel consists of slots numbered 00, 0, 1, 2, ..., 41 To play the game, a metal ball is spun around the wheel and is allowed to fall into one of the numbered slots.
sample space = {00, 0, 1, 2, ..., 41}. outcomes= 43
Identify the sample space of the probability experiment and determine the number of outcomes in the sample space. -Guessing the last digit in the price of a TV
sample space= 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 outcomes= 10
A lower confidence level C produces a ... margin of error m (more ... less accuracy).
smaller precision
p-value
smallest level of significance at which the null can be rejected; if the P is low, reject the null
significance tests
someone makes a claim about the unknown value of a population parameter; the test checks a specific hypothesis using sample data to decide on the validity of the hypothesis
The null hypothesis, H0, is a very ... statement about a parameter of the population(s).
specific
List 4 steps to hypothesis testing
state hypothesis, set criteria for decision, compute test stat, make decision
r measures the ... and ... of a linear association
strength direction
Wording effects can influence _____________________________.
survey results
P(A or B) = P(A) + P(B)
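true only when A and B are disjoint (mutually exclusive); in general, P(A or B) = P(A) + P(B) - P(A and B)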
one sided vs two sided
two sided: symmetric, not-equal sign; one sided: asymmetric and specific, greater-than or less-than signs
For ... tests, 2P(Z>/= absolute value(z)) because of the symmetry of the normal curve
two-sided
summarize data about two categorical variables or factors collected on the same set of individuals
two-way tables
null hypothesis
designated H0, is the hypothesis that the researcher wants to Reject
r^2, the coefficient of ..., is the square of the correlation coefficient
determination
for continuous probabilities events are defined over the ... of values
intervals
... normal calculations- when you are seeking the range of values that correspond to a given proportion/area under the curve -find the desired area/proportion in the middle of the table -then look at the corresponding z-value from the left column and top row
inverse
Because ... have small chance variation, very small population effects can be highly significant if the sample is large.
large random samples
Higher confidence C implies a ... margin of error m (less precision more ...).
larger accuracy
to calculate the area between two z-values, first get the area under N(0, 1) to the left for each z-value from the table and then subtract the smaller area from the ...
larger area
significance level: alpha
largest P-value tolerated for rejecting H0, decided arbitrarily before conducting test -when p<=a, we reject null -when p>a, we fail to reject null
as the number of randomly drawn observations (n) in a sample increases: -the mean of the sample gets closer and closer to the population mean -the sample proportion gets closer and closer to the population proportion p
law of large numbers
the ... describes what would happen if we took samples of increasing size n
law of large numbers
the area between z1 and z2 = area ... of z1 - area ... of z2
left left
How do you Carry out a hypothesis test
-You assume the null hypothesis is true -Then consider how likely the observed value of the test statistic was to occur -If the likelihood is below a given threshold, then you reject the null hypothesis
steps in hypothesis testing
1. Stating the hypotheses. 2. Identifying the appropriate test statistic and its probability distribution. 3. Specifying the significance level. 4. Stating the decision rule. 5. Collecting the data and calculating the test statistic. 6. Making the statistical decision. 7. Making the economic or investment decision.
List the four survey challenges.
1. Undercoverage or selection bias 2. Nonresponse 3. Wording effects 4. Response bias
Name the three sampling processes.
1. Voluntary response sampling 2. Convenience sampling 3. Probability sampling
Level of confidence 90% = critical value ___
1.645
Level of confidence 99% = critical value ___
2.575
A probability experiment consists of rolling a 6-sided die. Find the probability of the event below. rolling a number less than 3
2/6= 0.333
almost all 99.7% of observations are within ... of the mean
3 standard deviations
The access code for a car's security system consists of four digits. The first digit cannot be 1 and the last digit must be odd. How many different codes are available?
4,500 using 0= 9x10x10x5
a sample size of ... or more will typically be good enough to overcome an extremely skewed population and mild outliers in the sample
40
the poisson distribution is skewed when u < ...
5
how many different groups of 3 can be selected from 5 ppl
5!/(3!(5-3)!) = 10
A certain lottery has 35 numbers. In how many different ways can 4 of the numbers be selected? (Assume that order of selection is not important.)
52360 nCr = n!/((n-r)!r!) = 35!/((35-4)!4!) = 1256640/4! = 1256640/24
There are 50 members on the board of directors for a certain non-profit institution. If they must elect a chairperson, first vice chairperson, second vice chairperson, and secretary, how many different slates of candidates are possible?
5527200 50x49x48x47
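The counting cards above can be checked with Python's standard library. This is a small sketch, assuming Python 3.8+ for `math.comb` and `math.perm`; the variable names are illustrative.

```python
import math

# 100!/98!: everything below 99 cancels, leaving 100 x 99
ratio = math.factorial(100) // math.factorial(98)

# Groups of 3 chosen from 5 people (order ignored, so a combination)
groups = math.comb(5, 3)

# Lottery: 4 numbers chosen from 35, order ignored
lottery = math.comb(35, 4)

# Board slate: 4 distinct offices filled from 50 members (order matters)
slates = math.perm(50, 4)

print(ratio, groups, lottery, slates)  # 9900 10 52360 5527200
```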
Critical region
A region of the probability distribution which, if the test statistic falls within it, would cause you to reject the null hypothesis
Hypothesis =
A statement made about the value of a population parameter
One-tailed hypothesis
Alternate Hypothesis; H(1): p<... and H(1): p>...
Two-tailed hypothesis
Alternate Hypothesis; H(1): p≠...
What does the symbol H1 stand for?
Alternative Hypothesis
Interval Estimate
An interval, or range of values, used to estimate a population parameter
Sixth Step
Compare the OV to the CV
Fifth Step
Compute the CV using the appropriate table
Fourth Step
Compute the test statistic value to get the OV
Statistical significance only says whether the effect observed is likely to be due to chance alone because of random sampling
Doesn't tell about the magnitude of the effect. May not be practically important. With a large sample size, a small effect can be significant.
T/F You toss a fair coin nine times and it lands tails up each time. The probability it will land heads up on the tenth flip is greater than 0.5.
False- You toss a fair coin nine times and it lands tails up each time. The probability it will land heads up on the tenth flip is exactly 0.5.
The probability that event A or event B will occur is P(A or B)=P(A)+P(B)−P(A or B).
False. The last term should be -P(A and B): P(A or B) = P(A) + P(B) - P(A and B).
One- tailed test
Hypothesis tests with alternative hypotheses in the form H1 : p < and H1: p >
Two-tailed tests
Hypothesis tests with an alternative hypothesis in the form H1 : p /=
Null Hypothesis (Ho)
Hypothesis that you assume is correct.
Seventh Step
If the OV is greater than the CV, reject the null. If the OV is less than the CV, fail to reject the null (this does not prove the null is correct).
Error rates
Occur when we've made a mistake in drawing our statistical conclusion, There are Type I and Type II errors.
general addition rule: P(A or B)
P(A) + P(B) - P(A and B)
multiplication rule for independent events: if A and B are independent then: P(A and B) =
P(A)P(B)
general multiplication rule: the probability that any two events, A and B, both occur is: P(A and B) =
P(A)P(B|A)
Bayes' theorem: if we know the conditional probability P(B|A) and the individual probability P(A) we can use Bayes' theorem to find the conditional probability of ...
P(A|B)
when two events A and B are independent, P(B|A) = ... because no information is gained from the knowledge of event A
P(B)
for example, for P(x</= 2) =
P(X = 0) + P(X = 1) + P(X = 2)
A ... quantifies how strong the evidence is against the H0. But if you reject H0, it doesn't provide any information about the true population mean µ.
P-value
The probability, if H0 was true, of obtaining a sample statistic at least as extreme (in the direction of Ha) as the one obtained.
P-value
Decide if the situation involves permutations, combinations, or neither. Explain your reasoning. The number of ways 19 people can line up in a row for concert tickets. Does the situation involve permutations, combinations, or neither?
Permutations. The order of the 19 people in line matters.
general decision rule for a two-tailed test is
Reject H0 if: test statistic > upper critical value or test statistic < lower critical value
parametric test
T-test, Z-test, chi square, f-test 1. concerned w parameters (mean/variance) 2. the validity depends on a definite set of assumptions concerned with the parameters of distribution
Construct the indicated confidence interval for the population mean μ using the t-distribution. c=0.95, x bar=13.1, s=3.0, n=5
Tc= 2.776 margin of error= 3.7 xbar +- 3.7 (9.4, 16.8)
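A short sketch of the t-interval computation, assuming the critical value 2.776 (95%, 4 degrees of freedom) is read from a t-table as on the card rather than computed. The function name is illustrative.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean: xbar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin, margin

low, high, margin = t_interval(xbar=13.1, s=3.0, n=5, t_crit=2.776)
print(round(margin, 1), round(low, 1), round(high, 1))  # 3.7 9.4 16.8
```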
Alternative Hypothesis H1
Tells you about the parameter if your assumption is shown to be wrong
What is the alternative hypothesis, H₁?
Tells you about the parameter if your assumption is shown to be wrong
Significance level =
The probability of rejecting the null hypothesis when it's true. E.g. a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference
Critical value
The first value to fall inside of the critical region
Null Hypothesis H0
The hypothesis that is assumed to be correct
What is the null hypothesis, H₀?
The hypothesis that you assume to be correct
Write a statement that represents the complement of the given probability. The probability of randomly choosing a tea drinker who has a college degree (Assume that you are choosing from the population of all tea drinkers.)
The probability of choosing a tea drinker who does not have a college degree
What is the actual significance level of a hypothesis test?
The probability of incorrectly rejecting the null hypothesis
Actual significance level
The probability of incorrectly rejecting the null hypothesis.
In the general population, one woman in ten will develop breast cancer. Research has shown that 1 woman in 650 carries a mutation of the BRCA gene. Seven out of 10 women with this mutation develop breast cancer.
The probability that a randomly selected woman will develop breast cancer given that she has a mutation of the BRCA gene = 0.7. The probability that a randomly selected woman will carry the gene mutation and develop breast cancer = 0.7 x (1/650) = 0.0011. The events are dependent.
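The BRCA card combines the general multiplication rule with an independence check. The sketch below reproduces those numbers and, as an extra step that is not on the card, applies Bayes' theorem to the same figures. Variable names are illustrative.

```python
import math

p_cancer = 1 / 10                  # P(cancer) in the general population
p_mutation = 1 / 650               # P(mutation)
p_cancer_given_mutation = 7 / 10   # P(cancer | mutation)

# General multiplication rule: P(A and B) = P(A) * P(B | A)
p_mutation_and_cancer = p_mutation * p_cancer_given_mutation

# Independent events would satisfy P(cancer | mutation) == P(cancer)
events_independent = math.isclose(p_cancer_given_mutation, p_cancer)

# Bayes' theorem (extra step): P(mutation | cancer)
p_mutation_given_cancer = p_mutation_and_cancer / p_cancer

print(round(p_mutation_and_cancer, 4))    # 0.0011
print(events_independent)                 # False
print(round(p_mutation_given_cancer, 4))  # 0.0108
```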
Test statistic =
The result of the experiment, or the statistic that is calculated from the sample
What is the test statistic?
The result of the experiment or the statistic that is calculated from the sample (eg the number of heads in 8 tosses)
A combination is an ordered arrangement of objects.
The statement is false. A true statement would be "A permutation is an ordered arrangement of objects." A permutation is an ordered arrangement of objects. The number of different permutations of n distinct objects is n!. On the other hand, a combination is a selection of r objects from a group of n objects without regard to order and is denoted by nCr.
Why do we standardize to a t-scale?
We standardize to a t-scale, allowing us to use 1 test for every situation
Point estimate
a single value estimate for a population parameter
hypothesis
a statement or proposed explanation for an observation, a phenomenon, or a scientific problem that can be tested using the research method (a hypothesis is often a statement about the value for a parameter in a population)
alternate hypothesis (H1)
a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis
effect size
a statistical measure of the size of an effect in a population, which allows researchers to describe how far scores shifted in the population, or the percent of variance that can be explained by a given variable
one-sample z test
a statistical procedure used to test hypotheses concerning the mean in a single population with a known variance
Type III error
a type of error possible with one-tailed tests in which a decision would have been to reject the null hypothesis, but the researcher decides to retain the null hypothesis because the rejection region was located in the wrong tail (the "wrong tail" refers to the opposite tail from where a difference was observed and would have otherwise been significant)
rejection point/ critical value for the test statistic
a value with which the computed test statistic is compared to decide whether to reject or not reject the null hypothesis
A standard deck of cards contains 52 cards. One card is selected from the deck. (a) Compute the probability of randomly selecting a heart or diamond (b) Compute the probability of randomly selecting a heart or diamond or spade. (c) Compute the probability of randomly selecting an eight or spade
a.) 26/52 = 0.5 b.) 39/52 = 0.75 c.) (4+13-1)/52 = 16/52 = 0.308
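One way to check the card answers is to enumerate the deck and count. This is a sketch with illustrative names; exact fractions are used so the eight of spades is visibly counted only once in part (c).

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "spades", "clubs"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Probability of an event as (favorable cards) / (total cards)."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))

p_a = prob(lambda c: c[1] in ("hearts", "diamonds"))
p_b = prob(lambda c: c[1] in ("hearts", "diamonds", "spades"))
p_c = prob(lambda c: c[0] == "8" or c[1] == "spades")

print(p_a, p_b, p_c)  # 1/2 3/4 4/13
```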
The odds of an event occurring are 2:6. Find (a) the probability that the event will occur and (b) the probability that the event will not occur.
a.) 2+6=8 so 2/8=0.25 b.)6/8= 0.75
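The odds-to-probability conversion can be written as a tiny helper. This sketch assumes odds are given as "for : against"; the function name is illustrative.

```python
def odds_to_probability(for_count, against_count):
    """Convert odds of for:against into P(event) and P(not event)."""
    total = for_count + against_count
    return for_count / total, against_count / total

p_occurs, p_not = odds_to_probability(2, 6)
print(p_occurs, p_not)  # 0.25 0.75
```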
z statistic
an inferential statistic used to determine the number of standard deviations in a standard normal distribution that a sample mean deviates from the population mean stated in the null hypothesis
Experiments compare the response to a given treatment versus _____________________, _______________________, and or _____________________.
another treatment; the absence of treatment, a control; a placebo, a fake treatment
central limit theorem components: -the larger the sample size n, the better the ... of normality -many statistical tests assume normality for the sampling distribution and the central limit theorem tells us that, if the sample size is large enough, we can safely make this assumption even if the raw data appear ...
approximation non-normal
probabilities are computed as ... under the corresponding portion of the density curve for the chosen interval
areas
A one-tail or one-sided alternative is ... and ...: -Ha: µ < [a specific value or another parameter] OR -Ha: µ > [a specific value or another parameter]
asymmetric specific
chi-square distributions
asymmetrical family of distributions; a different distribution exists for each possible value of degrees of freedom. Bounded below by 0 (only positive values, because of X^2). Used for tests concerning the variance of a single normally distributed population. Sensitive to violations of its assumptions: if the sample is not random or does not come from a normally distributed population, inferences might be faulty
The P-value is the area under N(µ0, σ/√n) for values of x̅ ... in the direction of Ha as that of our random sample.
at least as extreme
What determines the choice of a one-sided versus two-sided test is the question we are asking and what we know about the problem ... performing the test. If the question or problem is asymmetric, then Ha should be one-sided. If not, Ha should be two-sided.
before
Whats a type 2 error also called?
beta
the ... counts the number of ways in which k successes can be arranged among n observations
binomial coefficient
the number of ways of arranging k successes in a series of n observations (with constant probability p of success) is the number of possible combinations (unordered sequences)
binomial coefficient
the center and spread of the ... for a count X are defined by the mean (u) and standard deviation (o)
binomial distribution
... are models for some categorical variables, typically representing the number of successes in a series of n independent trials
binomial distributions
P(X = k)
binomial probability
ex. we can conclude that there is a correlation between bear lengths and weights but we cannot conclude that greater lengths cause more weight
causality
association, however strong, does not imply ...
causation
confidence interval equation
center +/- margin of error (m) xbar +/- z*sigma / sqrt(n) -confidence level C represents an area of corresponding size C under sampling distribution
Use the given statement to represent a claim. Write its complement and state which is H0 and which is Ha. mu greater than or equals 568
complement of the claim: mu < 568 H0: mu greater than or equal to 568 Ha: mu less than 568
alternative hypothesis (Hsub-a)
complement of the null hypothesis statement that must be true if H0 is false and it contains a statement of strict inequality (greater than, less than, not equal to)
the distribution of one factor for each level of the other factor
conditional distribution
... reflect how the probability of an event can be different if we know that some other event has occurred or is true
conditional probabilities
A ... gives a black and white answer: Reject or don't reject H0. But it also estimates a range of likely values for the true population mean µ.
confidence interval
The ... C determines the value of z* (in Table C). The ... also depends on z*.
confidence level margin of error
we say that two variables are ... when their effects on a response variable cannot be distinguished from each other
confounded
Match the level of confidence, c=0.98, with its representation on the number line, given x bar =56.7, σ=8.9, and n=55.
constructing a confidence interval for mu: find Zc for c=0.98 (about 2.33), OR use STAT-->TESTS-->7 (ZInterval) and enter the data with C-Level .98. Margin of error = 2.8, so the interval estimate is x bar +- 2.8 = (53.9, 59.5)
think of the poisson distribution as describing the number of items in ...
containers
.... contain an infinite number of events
continuous sample spaces
What range is a small Cohens d?
d < .2
Whats a large Cohens d?
d> .8
caution
data must be a probability sample or come from a randomized experiment; the sampling distribution must be approximately normal; to use the z procedure, we must know sigma
test statistic
degrees of freedom: n-1
we use ... to model continuous probability distributions because they assign probabilities over the range of values making up the sample space
density curves
If you picked different samples from a population, you would probably get ... sample means ( x̅ ) and virtually none of them would actually equal the true population mean, u.
different
Case-control studies start with two random samples of individual with ____________________________, and look for exposure factors in the subjects' past.
different outcomes
have a sample space that is made up of a list of individual outcomes
discrete probability models
discrete variables that can take on only certain values (a whole number or a descriptor)
discrete sample space
positive predictive value: P(... | ...)
disease positive test
two events are ... or ... if they can never happen together or have no common outcome
disjoint mutually exclusive
A _____________________ experiment is one in which neither the subjects nor the experimenter know which individuals received which treatment until the experiment is completed.
double-blind
What should you do in step 4?
draw a normal dist
The individuals in an experiment are the _________________________. If they are human, we call them _________________.
experimental units; subjects
Cross-sectional studies measure the ________________ and the ________________ at the same time.
exposure; outcome
prediction outside the range is ... which you should avoid
extrapolation
the binomial coefficient "n choose k" uses the ... notation "!"
factorial
The explanatory variables in an experiment are often called _____________.
factors
Step 3
find test statistic( z score) and alpha
directional tests, one-tailed tests
hypothesis tests in which the alternative hypothesis is stated as greater than (>) or less than (<) a value stated in the null hypothesis (hence the researcher is interested in a specific alternative to the null hypothesis)
uncertainty and confidence
if you picked different samples from a population, you would get different sample means (xbar) and virtually none of them would actually equal the true population mean, u
One way to increase the precision of a confidence interval without decreasing the level of confidence is to ___
increase the sample size
Whats the relationship btw effect size, sample, and power?
as effect size or sample size increases, power increases
two events are ... if knowing that one event is true or has happened does not change the probability of the other event
independent
test statistic for a test of differences between 2 populations (normal distributions; variances unknown, either assumed equal or not assumed equal)
independent random samples; pool the variances when they are assumed equal
an observation that markedly changes the regression if removed; this is often an isolated point
influential individual
an association may exist between x and y even when there is no significant linear correlation; could be a nonlinear association
linearity
... is a variable that is not among the explanatory or response variables in a study, and yet may influence the relationship between the variables studied
lurking variable
Observational studies often fail to yield clear causal conclusions, because the explanatory variable is confounded with ___________________.
lurking variables
confidence level and margin of error
m = z*sigma / sqrt(n); higher C = larger margin of error, less precision and more accuracy; lower C = smaller margin of error, more precision and less accuracy; z* values: .90: 1.645, .95: 1.96, .99: 2.575
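To see the trade-off numerically, the sketch below evaluates m = z* sigma / sqrt(n) at the three z* values listed on the card. The values sigma = 8.9 and n = 55 are borrowed purely for illustration and are an assumption, as are the names.

```python
import math

# z* values as listed on the card, keyed by confidence level C
Z_STAR = {0.90: 1.645, 0.95: 1.96, 0.99: 2.575}

def margin_of_error(z_star, sigma, n):
    """m = z* * sigma / sqrt(n), for a known population sigma."""
    return z_star * sigma / math.sqrt(n)

sigma, n = 8.9, 55  # illustrative values only
margins = {c: margin_of_error(z, sigma, n) for c, z in Z_STAR.items()}

for c, m in margins.items():
    print(f"C = {c:.2f}: m = {m:.2f}")
```

Higher confidence levels give wider intervals for the same sigma and n, which is the "less precision, more accuracy" point on the card.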
Statistical significance doesn't tell about the ... of the effect.
magnitude
A confidence interval ("CI") can be expressed as: -a center ± a ... m: μ within x̅ ± m -an ...: μ within (x̅ − m) to (x̅ + m)
margin of error interval
we can examine each factor in a two-way table separately by studying the row totals and column totals because they represent the ... expressed in percents
marginal distributions
Test Statistic
math formula that identifies how far and how many standard deviations a sample outcome is from the value stated in a null hypothesis
Define Cohen's d
measures effect size as the number of SDs the mean shifted; the larger d is, the larger the effect in the population
-You may need a certain margin of error (e.g., drug trial, manufacturing specs). In many cases, the population variability (σ) is fixed, but we can choose the number of measurements (...). -Using simple algebra, you can find what sample size is needed to obtain a desired margin of error.
n
Binomial distribution conditions: -the total number of observations ... is fixed in advance -each observation falls into just one of two categories: ... and ... -the outcomes of all n observations are statistically ... -all n observations have the same probability p of ...
n success failure independent "success"
find what sample size is needed to obtain desired margin of error
n = (z*sigma/m)^2
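A minimal sketch of the sample-size formula, rounding up to the next whole number because a fractional observation is not possible. The inputs (z* = 1.96, sigma = 1.0, m = 0.5) are hypothetical and the function name is illustrative.

```python
import math

def required_sample_size(z_star, sigma, margin):
    """Smallest n with z* * sigma / sqrt(n) <= margin: n = (z* * sigma / m)^2, rounded up."""
    return math.ceil((z_star * sigma / margin) ** 2)

# Hypothetical inputs: 95% confidence, sigma = 1.0, desired margin 0.5
print(required_sample_size(1.96, 1.0, 0.5))  # 16
```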
n over k =
n!/(k!(n-k)!)
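The binomial coefficient feeds directly into the binomial probability P(X = k). The sketch below uses hypothetical values (n = 5 trials, p = 0.5) and illustrative function names; it also shows a cumulative probability built as P(X = 0) + P(X = 1) + P(X = 2).

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k) as the sum of P(X = 0), ..., P(X = k)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Hypothetical example: 5 fair coin tosses
print(math.comb(5, 2))          # 10
print(binomial_pmf(2, 5, 0.5))  # 0.3125
print(binomial_cdf(2, 5, 0.5))  # 0.5
```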
For the same confidence level, ... confidence intervals can be achieved by using ... sample sizes.
narrower large
when x is smaller than the mean, the z is ...
negative
differences in the means
no paired observations from the 2 samples; independent samples
sometimes we are just told that a variable has an approximately ... distribution
normal
the sample size depends on the population distribution and more observations are required if the population distribution is far from ...
normal
... are used to model many biological variables and they can describe a population distribution or a probability distribution
normal curves
To test H0: µ = µ0 using a random sample of size n from a Normal population with known standard deviation σ, we use the ... N(µ0, σ/√n).
null sampling distribution
the standard deviation of the sampling distribution of means is ...
σ/√n
binomial distributions describe the possible number of times that a particular event will occur in a sequence of ...
observations
The Hawthorne Effect also known as the ___________________________ is a term used to describe a type of bias that may occur due to behavior modification because of study enrollment.
observer effect
type II error
occurs if the NULL hypothesis is not rejected when it's false
Type I error
occurs if the NULL hypothesis is rejected when it's true
odds v. probability
odds of 2:3 (2/3) means probability of success is 2/5
If you obtain a different t value between two examples, what might the difference be from?
one tailed versus two tailed, assuming alpha is the same.
For ... tests, P(Z >/= z) or P(Z </= z)
one-sided
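The one-sided and two-sided P-value rules can be sketched with only the standard library by building the standard normal CDF from `math.erf`. The function names and the `alternative` labels are assumptions made for this illustration.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value(z, alternative="two-sided"):
    """P-value for an observed z statistic under the stated alternative."""
    if alternative == "greater":    # Ha: mu > mu0, P(Z >= z)
        return 1 - normal_cdf(z)
    if alternative == "less":       # Ha: mu < mu0, P(Z <= z)
        return normal_cdf(z)
    if alternative == "two-sided":  # Ha: mu != mu0, 2 * P(Z >= |z|)
        return 2 * (1 - normal_cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(p_value(1.96), 3))              # 0.05
print(round(p_value(1.645, "greater"), 3))  # 0.05
```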
A matched pairs design is a repeated measures design if the experiment involves only _____________________________________________.
only one individual undergoing two treatments
You randomly select one card from a standard deck. Event A is selecting a king. Determine the number of outcomes in event A. Then decide whether the event is a simple event or not.
outcomes= 4 simple event= no
an observation that lies outside the overall pattern
outlier
... have unusually large residuals (in absolute value)
outliers
A statistic is unbiased if it does not ___
overestimate or underestimate the population parameter
c-confidence interval for a population proportion p
p hat - E is less than p is less than p hat + E, where E = z score times the square root of (p hat times q hat / n)
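A short sketch of the proportion interval, with E = z_c * sqrt(p-hat * q-hat / n). The counts (40 successes in 100 trials) and z_c = 1.96 are hypothetical, and the function name is illustrative.

```python
import math

def proportion_interval(x, n, z_c):
    """Interval p_hat +/- E, where E = z_c * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n        # point estimate of successes
    q_hat = 1 - p_hat    # point estimate of failures
    error = z_c * math.sqrt(p_hat * q_hat / n)
    return p_hat - error, p_hat + error, error

# Hypothetical data: 40 successes in 100 trials, 95% confidence
low, high, error = proportion_interval(40, 100, 1.96)
print(round(low, 3), round(high, 3), round(error, 3))  # 0.304 0.496 0.096
```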
DEPENDENT test concerning mean differences
paired observation comparison tests: statistical test for differences in dependent items; use a single t-test with just the mean and SD of the differences
mean of differences
paired observations from the 2 samples; the samples are not independent
simple linear regression: data comes in ... (xi, yi) where xi, is the ith observation for variable x and yi is the ith observation for variable y
pairs
Undercoverage or selection bias occurs when ________________________________________.
parts of the population are systematically left out
Nonresponse occurs when ___________________________________________.
people choose not to participate
Response bias occurs when _________________________.
people lie
-When we take a random sample, we can compute the sample mean and an interval of size ... around the mean. -Based on the ~68-95-99.7% rule, we can expect that: ... of all intervals computed with this method capture the parameter μ.
plus-or-minus 2σ/√n ~95%
q hat
point estimate for population proportion of *failures*
p hat
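point estimate for the population proportion of successes (the sample proportion)
Point estimate for p
p hat = x/n, the sample proportion of successes (p itself is the population proportion of successes)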
... is a sample statistic representing the population correlation coefficient ρ
r
the value of ... is always in between -1 and 1 or -1 </= r </= 1
r
each factor can have any number of levels and if the row factor has "r" levels and the column factor has "c" levels, we say that the two-way table is an "..." table
r by c
the value of ... is the proportion of variation in y that is explained by x
r^2
in a ... event, outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions
random
Requirements for making inferences about ρ, using r: 1. Paired data (x,y) must be a ... 2. A scatterplot must confirm that the points approximate a ... pattern 3. Outliers should be removed if they are known to be ...
random sample straight-line errors
in addition to x, there may be a variety of other factors affecting y, such as ... or other factors not included in the study
random variation
confidence interval
range of values with an associated probability -quantifies chance that interval contains unknown population parameter
When P-value ≤ α, we ... H0.
reject
When p < .05 (and equal .05 ) what do you do?
reject
What are the two decisions that you can make from performing a hypothesis test?
reject the null hypothesis fail to reject the null hypothesis
When the z score falls within the ... region (shaded area on the tail-side), the p-value is smaller than α and you have shown ....
rejection statistical significance
Experiments use _____________________: several or many individuals are studied.
replication
the vertical distances from each point to the least-squares regression line are called ... and the sum of all the residuals is by definition 0
residuals
When p > .05 what do you do?
retain (fail to reach significance)
Step 4
retain or reject null hypothesis
Case-control studies -> ________________________
retrospective
If n is not a whole number, then ___
round n up to the next whole number
Determine the number of outcomes in the event. Decide whether the event is a simple event or not. - A computer is used to select randomly a number between 1 and 9, inclusive. Event C is selecting selecting a number greater than 4.
sample space= 9 9-4=5 Event C= 5 outcomes simple event? no bc C has more than 1 outcome
A ________________________ is an observational study that relies on a random sample drawn from the entire population.
sample survey
different random ... taken from the same population will give different ... but there is a predictable pattern in the long run
samples statistics
a ... describes what would happen if we took all possible random samples of a fixed size n
sampling distribution
you should begin any investigation into the association between 2 variables by constructing a ...; that can have a positive, negative or no correlation
scatterplot
The ..., is the largest P-value tolerated for rejecting H0 (how much evidence against H0 we require). This value is decided arbitrarily before conducting the test.
significance level, α
Someone makes a claim about the unknown value of a population parameter. We check whether or not this claim makes sense in light of the "evidence" gathered (sample data).
significance tests
Define hypothesis
statement/explanation for obs from pop that can be tested w/research method
if the data have approximately a normal distribution, the normal quantile plot will have a roughly ... pattern
straight line
A ________________________________ has percentages of individuals of certain types.
stratified random sample
Cross-sectional studies -> ______________________
surveys
A two-tail or two-sided alternative is ...: Ha: µ ... [a specific value or another parameter]
symmetric not =
Use the given confidence interval to find the margin of error and the sample mean. (13.7,23.1)
take the avg of the endpoint (add both values and divide by 2) subtract the mean from upper endpoint to find margin of error mean- 18.4 margin of error-4.7
Use the given confidence interval to find the margin of error and the sample proportion. (0.772,0.798)
take the avg of the endpoints to find p hat: (0.772 + 0.798)/2 = 0.785; subtract the endpoints and divide the answer by 2 to find the margin of error: (0.798 - 0.772)/2 = 0.013; check: left endpoint + margin of error = 0.772 + 0.013 = 0.785 = p hat
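A minimal Python sketch of the endpoint arithmetic used in the two confidence-interval cards above. The function name `interval_to_estimate` is an illustrative assumption, not something from the course material.

```python
def interval_to_estimate(lower, upper):
    """Return (point estimate, margin of error) for a symmetric interval."""
    center = (lower + upper) / 2   # midpoint = sample mean or p hat
    margin = (upper - lower) / 2   # half the width = margin of error
    return center, margin

# The two intervals from the cards above.
for lower, upper in [(13.7, 23.1), (0.772, 0.798)]:
    center, margin = interval_to_estimate(lower, upper)
    print(f"({lower}, {upper}): estimate = {center:.3f}, margin of error = {margin:.3f}")
```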
Sampling Error
the difference between the point estimate and the actual parameter value
permutations
ordered arrangements of n objects taken r at a time: nPr = n!/(n-r)!, e.g. 5P3 = 5!/2! = 60
alpha level
the level of significance or criterion for a hypothesis test; is the largest probability of committing a Type I error that we will allow and still decide to reject the null hypothesis
p value
the probability of obtaining a sample outcome, given that the value stated in the null hypothesis is true (the p value for obtaining a sample outcome is compared to the level of significance)
Type I error
the probability of rejecting a null hypothesis that is actually true (researchers directly control for the probability of committing this type of error)
Type II error (beta error)
the probability of retaining a null hypothesis that is actually false
Population Proportion
the probability of success in a single trial of a binomial experiment
Level of Confidence
the probability that the interval estimate contains the population parameter, assuming that the estimation process is repeated a large number of times
The point estimate for p is given by ___
the proportion of successes in a sample and is denoted by p hat = x/n where x is the number of successes in the sample and n is the sample size.
obtained value
the value of a test statistic (often compared to the critical value(s) of a hypothesis test to make a decision, when the obtained value exceeds a critical value, we decide to reject the null hypothesis, otherwise we retain the null hypothesis)
use a z test when
the variance is KNOWN
binomial parameters: -the parameter n is the ... number of observations -the parameter p is the probability of ... on each observation -the count of successes X can be any whole number between ... and ...
total success 0 n
In a completely randomized experimental design, individuals are randomly assigned to groups, and then the groups are randomly assigned to ___________________.
treatments
... are used to represent probabilities graphically and facilitate computations
tree diagrams
What should you do in step 1?
use both words and symbols
f-test
used for tests concerning the inequality of 2 variances; family of asymmetrical distributions bounded from below by 0 (like chi-squared); defined by 2 degrees of freedom; right-skewed, and rejection is always on the right side
the standard deviation of the sampling distribution measures how much the statistic x-bar ... from sample to sample
varies
null hypothesis, H0
very specific statement about parameter of populations
permutation
ways in which things are ordered -fit 7 ppl into 3 chairs -how many ways can we fit 3 balls into 2 cups? 7x6x5 OR 3x2
use of sampling distribution
we take one random sample of size n and rely on known properties of sampling distribution -remember 68, 95, 99.7% rule
lower and higher confidence levels present a ... situation
win/lose
variable ... is the independent, predictor, or explanatory variable
x
p hat =
x/n
Find the margin of error for the given values of c, s, and n. c=0.90, s=5, n=24
1.714 times (5/sqrt24)= 1.7
Level of confidence 95% = critical value ___
1.96
the ... is the count multiplied by the probability of any specific arrangement of the k successes
binomial probability
F-stat
sample deviation 1 ^2 / sample deviation 2 ^2
The most unbiased point estimate of the population mean is the ___
sample mean
Find the critical value Tc for the confidence level c=0.99 and sample size n=14.
3.012 T-distribution table: look up 0.99 and n-1 (13) and find corresponding number
Let p be the population proportion for the following condition. Find the point estimates for p and q. In a survey of 1487 adults from country A, 738 said that they were not confident that the food they eat in country A is safe. The point estimate for p hat is... q hat...
738/1487= 0.496 q hat= 1- Ans= 0.504
0!
=1
Hypothesis
A statement made about the value of a population parameter
P(A|B) not equal to P(B|A)
Bayes' theorem
(T/F) To estimate the value of p, the population proportion of successes, use the point estimate x.
False, to estimate the value of p, use the point estimate p hat = x/n
3 things into 2 spaces
P (3,2)
the conditional probability of event B, given event A is: P(B|A) =
P(A and B)/P(A)
addition rule for disjoint events: P(A or B) =
P(A) + P(B)
a list or description of all possible outcomes of a random process
S or sample space
pooling
Sp^2 (pooled variance); use when you assume population variances are equal; an estimate drawn from the combination of two different samples
Third Step
Select the appropriate test statistic
Second Step
Set the level of risk (alpha level)
Find the P-value for the indicated hypothesis test with the given standardized test statistic, z. Decide whether to reject H0 for the given level of significance α. Two-tailed test with test statistic z=−2.18 and α=0.02
Since this is a two-tailed test and the test statistic is left of center, the P-value is twice the area to the left of the test statistic: P = 2 x normalcdf(-10000,-2.18,0,1) = 0.0292. Since 0.0292 > α = 0.02, fail to reject H0.
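The two-tailed P-value calculation above can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8+) standing in for the calculator's normalcdf. The helper name `two_tailed_p_value` is an assumption made for illustration.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Twice the standard normal tail area beyond |z|."""
    return 2 * NormalDist().cdf(-abs(z))

z = -2.18
alpha = 0.02
p_value = two_tailed_p_value(z)

# Decision rule from the cards: reject H0 only when P <= alpha.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"P-value = {p_value:.4f}, alpha = {alpha}: {decision}")
```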
... P-values are strong evidence AGAINST H0 and we reject H0. The findings are "statistically significant."
Small
First Step
State the null and research hypotheses
Alternative Hypothesis (Ha)
The claim about the population that we are trying to find evidence for.
Polling Organisations
Use small samples to make inferences about a population
Why don't we use the sample mean during hypothesis testing?
We do not use the sample mean because we are using inferential statistics, not descriptive statistics
When do we reject the null hypothesis?
We reject the null hypothesis when the sample mean falls within the critical region
factorials (!)
used without replacement
When taking a random sample from a Normal population with known standard deviation σ, a level C confidence interval for µ is: x-bar +/- .../sqrt(n) or x-bar +/- m -m is the margin of error for this level C confidence interval. It is calculated using a ... (z*) and the standard deviation -σ/√n is the standard deviation of the ... distribution -C is the area under the ... between −z* and z*
z* z critical value sampling N(0,1)
a ... measures the number of standard deviations that a data value x is from the mean u
z-score
we can standardize data by computing a ...
z-score
for a normal quantile plot: the data points are ranked and the percentile ranks are converted to ...; the z-scores are then used for the ... axis and the actual data values are used for the ... axis; use technology to obtain normal quantile plots
z-scores horizontal vertical
if r is close to ..., we conclude that there is no significant linear correlation between x and y
zero
the area under N(0,1) for a single value of z is ...
zero
hypothesis testing
Comparing sample mean to the null hypothesis - the hypothesized/population value. If your data is unlikely, the null is rejected and if your data is likely, you fail to reject the null. As well, the NULL CAN NEVER BE ACCEPTED.
Find the critical value(s) and rejection region(s) for the type of z-test with level of significance α. Include a graph with your answer. Right-tailed test, α=0.10
The critical value is z = 1.28 (1 - alpha = 0.9; invNorm(0.9,0,1) = 1.28). The rejection region is z > 1.28. Pick the right-tailed graph
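A short Python sketch of the invNorm lookup for a right-tailed test, using `statistics.NormalDist().inv_cdf` as the inverse standard normal CDF. The function name `right_tailed_critical_value` is hypothetical.

```python
from statistics import NormalDist

def right_tailed_critical_value(alpha):
    """z0 with area alpha to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)

z0 = right_tailed_critical_value(0.10)
print(f"Reject H0 when z > {z0:.2f}")
```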
degrees of freedom
The number of individual scores that can vary without changing the sample mean. Statistically written as 'N-1' where N represents the number of subjects.
p-value
The probability of observing a value as extreme as your data or greater under the assumption that the null hypothesis is true
Critical Value of a Test Statistic (tcrit)
The value of a test statistic that corresponds to a specified level of chance probability. Can determine with qt() function or from a table. You need to say how much that area is and the degrees of freedom.
What is the purpose of the null hypothesis?
To state that there is no difference within the experiment regarding intervention/manipulation
What is the purpose of the alternative hypothesis?
To state that there will be a significant difference somewhere based on intervention/manipulation
Hypothesis
a statement or proposed explanation for an observation, phenomenon, or scientific problem that can be tested using the research method. Often a statement about the value for a parameter in a population
Null Hypothesis
assuming something is true
We have ... that μ falls within the interval computed.
confidence C
Alpha
is always towards the tail, it is the cutoff point, it reveals a surprising value that would reject the null
a standardized sampling distribution is a
t distribution or standard normal distribution
P-value
the probability of obtaining a sample outcome , given that the value stated in the null hypothesis is true
Type 1 Error
the probability of rejecting a null hypothesis that is actually true
Z-stat
(X-mean)/(SD/root(n))
Find the minimum sample size n needed to estimate μ for the given values of c, σ, and E. c=0.98, σ=8.2, and E=2 E= margin of error c- confidence level
n = (Zc x sigma / E) squared. 1 - 0.98 = 0.02; 1 - 0.02/2 = 0.99; Zc = invNorm(0.99) = 2.33. (2.33 x 8.2 / 2) squared = 91.26, so round up to n = 92
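A hedged Python sketch of the minimum-sample-size formula n = (Zc x sigma / E) squared for the values in the question (c = 0.98, sigma = 8.2, E = 2). The function name and the optional `z` argument are assumptions for illustration. Note that the rounded table value Zc = 2.33 and the unrounded critical value land on opposite sides of a whole number here, so the two approaches give different answers.

```python
import math
from statistics import NormalDist

def min_sample_size(c, sigma, E, z=None):
    """Smallest whole n with margin of error at most E at confidence level c.

    If z is not supplied, the unrounded critical value is computed.
    """
    if z is None:
        z = NormalDist().inv_cdf((1 + c) / 2)
    return math.ceil((z * sigma / E) ** 2)   # always round n up

print(min_sample_size(0.98, 8.2, 2, z=2.33))  # rounded table value
print(min_sample_size(0.98, 8.2, 2))          # unrounded critical value
```

Textbook answers usually follow the rounded table value, so check which convention your course expects.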
power
(in hypothesis testing) the probability of rejecting a false null hypothesis (specifically, the probability that a randomly selected sample will show that the null hypothesis is false when the hypothesis is indeed false)
Determine whether the events E and F are independent or dependent. Justify your answer. - E. A person living at least 70 years. F: The same person regularly handling venomous snakes - E: A randomly selected person finding cheese revolting F: Another randomly selected person finding cheese delicious - E: The unusually foggy weather in London on May 8 F: The number of car accidents in London on May 8
-E and F are dependent because regularly handling venomous snakes can affect the probability of a person living at least 70 years - E cannot affect F and vice versa because the people were randomly selected, so the events are independent. -The unusually foggy weather in London on May 8 could affect the number of car accidents in London on May 8, so E and F are dependent.
Based on Cohen's d what is a small effect?
.2 - .5
What's a medium Cohens d?
.2 < d < .8
Based on Cohen's d what is a medium effect?
.5 - .8
binomial distributions are skewed when p is close to ... or close to ... especially if the sample is small
0 1
probability rule: probabilities range from...
0 to 1
if x has the N(μ, σ) distribution then z has the N(...) distribution
0, 1
You toss a coin and randomly select a number from 0-9. What is the probability of getting tails and selecting a 9?
0.05 (1/20)
A probability experiment consists of rolling a 20-sided die. Find the probability of the event below. rolling a prime number
0.4
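The 20-sided die answer can be checked by listing the outcomes in the event, as in this small Python sketch. The helper `is_prime` is written here for illustration only.

```python
from fractions import Fraction

def is_prime(k):
    """Trial-division primality check, adequate for small k."""
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k ** 0.5) + 1))

faces = range(1, 21)                        # sample space: 1..20
event = [k for k in faces if is_prime(k)]   # outcomes in the event
probability = Fraction(len(event), len(faces))

print(event)
print(probability, float(probability))
```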
Nine of the 50 digital video recorders (DVRs) in an inventory are known to be defective. What is the probability you randomly select an item that is not defective?
0.82 (50-9=41) (41/50)
probability rule: the probability of the complete sample space S must equal ...
1
the closer r^2 gets to ..., the better the model explains the data
1
the total area under a density curve represents the whole population (sample space) and equals ... (100%)
1
when x is ... standard deviation larger than the mean then z = 1
1
68% of all observations are within ... of the mean
1 standard deviation
Steps of Hypothesis testing
1) State the hypothesis 2)Set the criteria for a decision 3)Compute the test statistic 4)Make decision
Assuming that no questions are left unanswered, in how many ways can a ten-question true/false quiz be answered?
1,024 2x2x2x2x2x2x2x2x2x2=1024
Find the critical value zc necessary to form a confidence interval at the level of confidence shown below. c = 0.81
1-0.81 = .19 .19/2 = .095 use technology (invNorm(0.095) = -1.31) or look up the area in a standard normal table to find the answer 1.31
Find the critical value Zc necessary to form a confidence interval at the level of confidence shown below. c=0.89
(1 - 0.89)/2 = 0.055; invNorm(0.055) = -1.60, so Zc = 1.60
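A minimal Python sketch of the critical-value lookup for a confidence level c, assuming the usual relationship Zc = invNorm((1 + c)/2), which is the positive counterpart of invNorm((1 - c)/2). The function name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(c):
    """Positive z with area c between -z and z under N(0, 1)."""
    return NormalDist().inv_cdf((1 + c) / 2)

# Confidence levels that appear in these cards.
for c in (0.81, 0.89, 0.90, 0.95, 0.98, 0.99):
    print(f"c = {c:.2f}: Zc = {critical_z(c):.3f}")
```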
q hat
1-p hat
t-tests examples
1. Single sample 2. Paired 3. Two-sample (independent). Note: chi-squared is a separate test for categorical data, not a t-test
Constructing a Confidence Interval for a Population Proportion
1. Identify the sample statistics n and x 2. Find the point estimate p hat 3. Verify that the sampling distribution of p hat can be approximated by a normal distribution 4. Find the critical value, that corresponds to the given level of confidence c 5. Find the margin of error E 6. Find the left and right endpoints and form the confidence interval
The Belmont Report was created partly in response to the Tuskegee Syphilis Study. Name the three main aims of the report.
1. Respect for persons 2. Beneficence 3. Justice
Hypothesis testing procedure
1. State hypothesis 2. Select appropriate test statistic 3. Specify level of significance 4. State the decision rule regarding the hypothesis 5. Collect the sample and calculate the sample statistics 6. Make a decision regarding the hypothesis 7. Make a decision based on the results of the test
Constructing a Confidence Interval for a Population Mean
1. Verify that standard deviation is known, and either the population is normally distributed or n is greater than or equal to 30 2. Find the sample statistics n and x bar 3. Find the critical value that corresponds to the given level of confidence 4. Find the margin of error E 5. Find the left and right endpoints and form the confidence interval
nonparametric test
1. not concerned with parameters 2. makes minimal assumptions about the populations from which the sample comes from use when: 1. when the data we use does not meet distribution assumptions 2. the data are given in ranks 3. hypothesis we are addressing does not concern a parameter
2 types of Hypotheses
1. null- what we want to reject. what we are testing for (Ho) 2. alternative- what we accept when the null hypothesis is rejected (Ha)
4 possible outcomes to testing a null hypothesis
1. we reject a false null hypothesis- this is correct 2. we do not reject a true null hypothesis- this is correct 3. we reject a true null hypothesis- type 1 error 4. we do not reject a false null hypothesis- type 2 error
level of confidence 90% 95% 98% 99%
1.645 1.96 2.33 2.575
You have 13 different video games. How many different ways can you arrange the games side by side on a shelf?
13!
In a random sample of 50 refrigerators, the mean repair cost was $139.00 and the population standard deviation is $15.70. A 90% confidence interval for the population mean repair cost is (135.35,142.65). Change the sample size to n=100. Construct a 90% confidence interval for the population mean repair cost. Which confidence interval is wider? Explain.
15.7 divided by sqrt of 100 times 1.645 plus and minus the mean to get (136.42, 141.58) The n=50 confidence interval is wider because a smaller sample is taken, giving less information about the population.
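The refrigerator comparison can be reproduced with a short Python sketch of the z-interval x bar ± Zc x sigma/sqrt(n). The function name `z_interval` is an assumption, and the code uses the unrounded critical value rather than the table value 1.645.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, c):
    """Confidence interval for mu when the population sigma is known."""
    z = NormalDist().inv_cdf((1 + c) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

for n in (50, 100):
    low, high = z_interval(139.00, 15.70, n, 0.90)
    print(f"n = {n}: ({low:.2f}, {high:.2f}), width = {high - low:.2f}")
```

Doubling n shrinks sigma/sqrt(n), which is why the n = 100 interval comes out narrower.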
when x is ... standard deviations larger than the mean then z = 2
2
Researchers found that people with depression are five times more likely to have a breathing-related sleep disorder than people who are not depressed. Identify the two events described in the study. Do the results indicate that the events are independent or dependent?
2 events= depressions and breathing-related sleep disorder dependent
about 95% of all observations are within ... of the mean
2 standard deviations
a sample size of ... or more is generally enough to obtain a normal sampling distribution from a skewed population, even with mild outliers in the sample
25
Outside a home, there is a 6-key keypad with letters A, B, C, D, E and F that can be used to open the garage if the correct six-letter code is entered. Each key may be used only once. How many codes are possible?
6!= 720
A restaurant offers a $12 dinner special that has 7 choices for an appetizer, 11 choices for an entrée, and 4 choices for a dessert. How many different meals are available when you select an appetizer, an entrée, and a dessert?
7x11x4=308
Space shuttle astronauts each consume an average of 3000 calories per day. One meal normally consists of a main dish, a vegetable dish, and two different desserts. The astronauts can choose from 10 main dishes, 9 vegetable dishes, and 14 desserts. How many different meals are possible?
8190 10x9x (desserts) 2 desserts (14x13/2)
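The counting cards above (keypad codes, dinner special, astronaut meals) can be checked with Python's `math` module. This is a sketch; the variable names are illustrative only.

```python
import math

# Keypad: 6 distinct keys, each used once, order matters.
keypad_codes = math.perm(6, 6)

# Dinner special: one choice from each course (multiplication rule).
dinner_specials = 7 * 11 * 4

# Astronaut meal: two different desserts from 14, order does not matter.
dessert_pairs = math.comb(14, 2)
astronaut_meals = 10 * 9 * dessert_pairs

print(keypad_codes, dinner_specials, astronaut_meals)
```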
Margin of error for the given values of c, σ, and n. c = .9 σ = 5.1 n = 121
90% = 1.645 E = zc (σ/√n) E = 1.645 (5.1/ √121) = .763
You are given the sample mean and the population standard deviation. Use this information to construct the 90% and 95% confidence intervals for the population mean. Interpret the results and compare the widths of the confidence intervals. A random sample of 40 home theater systems has a mean price of $145.00. Assume the population standard deviation is $15.50. n= 40 x bar= 145 sigma= 15.50
90%: find the margin of error. Zc for 90% = 1.645; 1.645(15.5/sqrt40) = 4.03; left endpoint (145-4.03) = 140.97; right endpoint (145+4.03) = 149.03. 95%: Zc = 1.96; 1.96(15.5/sqrt40) = 4.80; left endpoint (145-4.80) = 140.20; right endpoint (145+4.80) = 149.80. With 90% confidence, it can be said that the population mean price lies in the first interval. With 95% confidence, it can be said that the population mean price lies in the second interval. The 95% confidence interval is wider than the 90%.
For the same sample statistics, which level of confidence would produce the widest confidence interval?
99%, because as the level of confidence increases, zc increases.
Based on Cohen's d what is a large effect?
> .8
use the normal approximation for binomial when both np and nq are ...
≥ 10
Significance level
A critical probability associated with a statistical hypothesis test that indicates how likely an inference supporting a difference between an observed value and some statistical expectation is true.
When you calculate the number of permutations of n distinct objects taken r at a time, what are you counting?
A permutation is an ordered arrangement of objects. The number of different permutations of n distinct objects is n!. The number of ordered arrangements of n objects taken r at a time.
How many different 10-letter words (real or imaginary) can be formed from the following letters? B, B, Z, Z, N, N, J, A, K, C
A permutation of nondistinct items without replacement is the number of ways n objects can be arranged (order matters) in which there are n1 of one kind, n2 of a second kind, ..., and nk of a kth kind, where n = n1 + n2 + ... + nk. The number of such permutations is n!/(n1! x n2! x ... x nk!). Here: 10!/(2! x 2! x 2!) = 453600
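A small Python sketch of the nondistinct-permutation formula n!/(n1! x n2! x ... x nk!), applied to the ten letters in the question. The function name `distinguishable_permutations` is an assumption for illustration.

```python
from collections import Counter
from math import factorial

def distinguishable_permutations(items):
    """Number of distinct orderings of items that may contain repeats."""
    total = factorial(len(items))
    for count in Counter(items).values():
        total //= factorial(count)   # divide out orderings of identical items
    return total

letters = "BBZZNNJAKC"
print(distinguishable_permutations(letters))
```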
paired samples t-test
A special type of single-sample t-test. The sampling unit gets both treatments and within it, we take the difference between each pair of measurements and compare whether the difference between pairs of data is different from a mean of 0. Ho: There is no difference between the two groups (mean difference=0) Ha: There is a difference between the two groups (doesn't = 0)
For the same sample statistics, which level of confidence would produce the widest confidence interval?
As the level of confidence increases, Zc increases causing wider intervals. 99%, because as the level of confidence increases, Zc increases.
we express a binomial distribution for the count X of successes among n observations as a function of the parameters n and p:
B(n, p)
Hypothesis Testing
Can help polling organisations to assess the accuracy of their predictions
Classify the statement as an example of classical probability, empirical probability, or subjective probability. Explain your reasoning. The probability of choosing five numbers from 1 to 36 to match five numbers drawn by the lottery is 1/376,992 ≈ 0.0000027.
Classical because each outcome in the sample space is equally likely
Give at least one difference and one similarity between "hypothesis testing" and "estimation" for a population mean.
Differences: 1) Hypothesis testing leads to a yes/no (reject/fail to reject) decision while estimation produces a numeric value (e.g. estimate of the population mean), or a pair of values (.e.g confidence interval for the population mean), but does not result in an immediate yes/no decision. 2) Hypothesis testing answers the question "is a pre-determined population mean likely, based on what we saw in the sample?" Estimation of the mean provides an estimate of the population mean, working from the information in the sample. Similarities: 1) Both hypothesis testing and estimation are done using one sample from the population to make an inference about the population. 2) Both hypothesis testing and estimation rely on the sample being randomly selected from the population. 3) Both hypothesis testing and estimation make use of the sample mean and the standard error from the sample as part of their calculations. They just finish up with different calculations afterwards. 4) Both hypothesis testing and estimation involve the concept of the sampling distribution, and relating the probability of a 'rare event' in the tails of that sampling distribution to what was observed in the sample.
Explain how the complement can be used to find the probability of getting at least one item of a particular type.
Getting "none of the items" is the set of all outcomes in the sample space that are not included in "at least one item." Using the definition of the complement of an event and the fact that the sum of the probabilities of all outcomes is 1, the following formula is obtained. P(at least one item) = 1 − P(none of the items)
Finding a minimum sample size to estimate mu
Given a c-confidence level and a margin of error E, the minimum sample size n needed to estimate the population mean is n = (Zc x sigma / E) squared, rounded up to the next whole number when it is not a whole number
Whats the symbol for null hypothesis?
H0
What hypotheses are there?
H0 - the null hypothesis - the hypothesis that you assume to be correct H1 - the alternative hypothesis - tells you about the parameter if the H0 is shown to be wrong
Whats the symbol for alternative hypothesis
H1
Why might you want to have a narrower confidence interval when doing statistical inference?
If you have a narrower confidence interval for a variable, then if your null hypothesis is not true, you are more likely to collect a sample that leads to the null hypothesis mean being outside the confidence interval, meaning that you would correctly reject the null hypothesis. With a wider confidence interval, the null hypothesis mean is in principle more likely to fall in the interval, and so you might fail to reject the null, even though it isn't true.
If your sample mean is towards the center of the null distribution, what does that tell you, and what would the hypothesis test result be? E.g.
If you see a sample mean that is close to the center of the standardized null distribution, as shown, this indicates that the data and the null distribution/null hypothesis are not in conflict: there is a high probability that your sample could have been observed if the null hypothesis were true. In probabilistic terms, we say that the probability of your observed value or something more extreme is high under the null distribution. This is indicated by the large area in the tail of the distribution up to the tobs value. In hypothesis testing terms, you would fail to reject the null hypothesis, because there is no substantial evidence that your sample isn't from the null distribution.
If your sample mean is towards the tails of the null distribution, what does that tell you, and what would the hypothesis test result be? E.g.
If you see a sample mean that is far from the center of the standardized null distribution, as shown, this indicates that the data and the null distribution/null hypothesis are in disagreement at some level: there is only a low probability that your sample could have been observed if the null hypothesis were true. In probabilistic terms, we say that the probability of your observed value or something more extreme is low under the null distribution. This is indicated by the small area in the tail of the distribution up to the tobs value. In hypothesis testing terms, you would reject the null hypothesis, because either (a) your sample is just a very strange/unexpected sample from the null (but that has low probability, as indicated by the very small tail), or (b) the null hypothesis actually isn't true, and your sample came from a different population. Since the probability of (a) is very small, it makes sense to say that there is strong evidence for (b).
Explain the difference between the z-test for μ using rejection region(s) and the z-test for μ using a P-value.
In the z-test using rejection region(s), the test statistic is compared with critical values. The z-test using a P-value compares the P-value with the level of significance α. A rejection region (or critical region) of the sampling distribution is the range of values for which the null hypothesis is not probable. A critical value z0 separates the rejection region from the nonrejection region. To use a rejection region to conduct a z-test, calculate the standardized test statistic z. If the standardized test statistic is in the rejection region, then reject H0. If the standardized test statistic is not in the rejection region, then fail to reject H0. To use a P-value to make a conclusion in a hypothesis test, compare the P-value with α. If P ≤ α, then reject H0. If P > α, then fail to reject H0.
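The two approaches described above can be put side by side in a Python sketch that returns both decisions for one standardized test statistic. The function name `z_test_decisions` and the `tail` labels are assumptions made for illustration; `statistics.NormalDist` supplies the standard normal CDF and its inverse.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def z_test_decisions(z, alpha, tail):
    """Return (reject by rejection region, reject by P-value, P-value).

    tail is "left", "right", or "two".
    """
    if tail == "left":
        in_region = z < STANDARD_NORMAL.inv_cdf(alpha)
        p_value = STANDARD_NORMAL.cdf(z)
    elif tail == "right":
        in_region = z > STANDARD_NORMAL.inv_cdf(1 - alpha)
        p_value = 1 - STANDARD_NORMAL.cdf(z)
    elif tail == "two":
        in_region = abs(z) > STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
        p_value = 2 * STANDARD_NORMAL.cdf(-abs(z))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return in_region, p_value <= alpha, p_value

by_region, by_p, p = z_test_decisions(-2.18, 0.02, "two")
print(by_region, by_p, round(p, 4))
```

Away from the exact boundary, both returned decisions match, since the two methods are equivalent ways of applying the same rule.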
When it comes to the process of hypothesis testing, what is the specific type of statistics we use?
Inferential Statistics
independent two-sample t-test
Involves more than 1 group and is used to evaluate whether two populations have different means. H0: mean 1= mean 2 HA: mean 1 doesn't =mean2
correlations are calculated using means and standard deviations and so they are ... resistant to outliers
NOT
Decide if the events are mutually exclusive. Event A: Electing a president of the United States Event B: Electing a female candidate
No, because someone who is elected to be President can be female.
What does the symbol Ho stand for?
Null Hypothesis
Paired and independent two-sampled tests are similar but how do they differ in comparing means between 2 groups?
Paired sample designs reduce variation among sampling units from other factors while independent sample designs have greater statistical degrees of freedom for the same effort and can discern differences
describes the count X occurrences of an event in fixed, finite intervals of time or space when -occurrences are all ... -and the probability of an occurrence is the ... over all possible intervals
Poisson distribution independent same
What is the rejection region?
Region that is the representation of whether or not we should reject the null hypothesis - aka critical region
Steps in Hypothesis Testing
Steps: 1.State null (Ho) and alternative (Ha) hypothesis 2. Establish your null distribution and test statistic i)Change x-axis to mean values. Insert sampling and null distributions; can compare how far away your data is. ii)Standardize to t-score (refer to equation) iii)Set alpha (significance level). Usually 5%. If your data is further from this, it's significant iv) calculate tobs v) calculate p-value. 3. Conduct statistical test Compare data to null hypothesis via statistical test. P>α, fail to reject the null hypothesis. P≤α, reject null hypothesis. 4. Draw conclusions 1)P≤α reveals that the mean is significantly less than null hypothesis and that the data provide strong evidence that the sample is not from the null hypothesis 2)P>α, reveals that the mean is NOT significantly less than null hypothesis and that the data do not provide strong evidence that the sample is not from the null hypothesis
chi squared test of independence
Tests for independence among categorical variables. Ho: Categorical variables are independent and HA: Categorical variables are not independent For our example of colour blindness, a Chi-squared test hypothesis would be: Ho: There is no difference in the degree of colour blindness between males and females HA: There is a difference in the degree of colour blindness between males and females
What assumption is made when thinking about the test statistic, tobs, and the null distribution?
The assumption for the null distribution is that the null hypothesis is true. Since the null hypothesis is essentially "everything is as expected here, nothing interesting is happening", the null distribution gives you the distribution of sample means you would expect to see, just due to sampling variation
Use the values on the number line to find the sampling error. x bar= 3.8 mu= 4.25
The difference between the point estimate and the actual parameter value is called the sampling error. x bar - mu 3.8-4.25= -0.45
game 1: 1/10 game 2: 1:10 which is better to play?
The probability of winning the first game is 1/10. The probability of winning the second game is number of wins/ number of outcomes= 1/11 Since the second probability is smaller, it would be wiser to play the first game.
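The comparison above depends on reading "1:10" as odds of winning (1 win to 10 losses), as the card's answer does. A minimal Python sketch under that assumption follows; the function name `odds_to_probability` is hypothetical.

```python
from fractions import Fraction

def odds_to_probability(wins, losses):
    """Convert odds of winning wins:losses into a probability."""
    return Fraction(wins, wins + losses)

game_1 = Fraction(1, 10)              # stated as a probability
game_2 = odds_to_probability(1, 10)   # stated as odds 1:10

better = "game 1" if game_1 > game_2 else "game 2"
print(game_1, game_2, better)
```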
Test Statistic
The result of the experiment or the statistic that's calculated from the sample
(T/F) The point estimate for the population proportion of failures is 1 - p hat
The statement is true
single-sample T-tests
This is used to compare a single obtained sample mean to a known or hypothesized population mean. Can be 1-tailed or 2-tailed. For example (2-tailed) Ho: There is no difference between the mean number of eggs per fish in the sample and the threshold of 1100. Ha: There is a difference between the mean number of eggs per fish in the sample and the threshold of 1100. For example (1-tailed) Ho: The mean number of eggs per fish in the sample is not less than the threshold of 1100. Ha: The mean number of eggs per fish in the sample is less than the threshold of 1100.
If two events are mutually exclusive, they have no outcomes in common.
True
T/F If two events are independent, P(A|B) = P(B).
False. Two events A and B are independent if P(B|A) = P(B) or if P(A|B) = P(A).
Find the margin of error for the given values of c, σ, and n. c=0.95, σ=2.9, n=64
Zc x (sigma/sqrt of n) c= .95 (Zc of 1.96) 0.711
null hypothesis
a claim about a population parameter (i.e. mean) that takes a skeptical viewpoint (Ho). For example, Ho: The flu vaccine has no effect
alternative hypothesis
a claim about a population parameter that represents everything not included in the null hypothesis (Ha). For example, Ha: The flu vaccine has an effect
level of significance, significance level
a criterion of judgement upon which a decision is made regarding the value stated in a null hypothesis; the criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true (usually at 5%; less than 5% = reject the null)
Significance Level
a criterion of judgement upon which a decision is made regarding the value stated that the actual value of pop. parameter is less than, greater than, or not equal to the value stated in the null hypothesis
Critical Value
a cutoff value that defines the boundaries beyond which less than 5% of sample means can be obtained if null is true (sample means beyond this are rejected)
critical value
a cutoff value that defines the boundaries beyond which less than 5% of sample means can be obtained if the null hypothesis is true (sample means obtained beyond a critical value will result in a decision to reject the null hypothesis)
one-tailed test
a directional test, reflecting a directional hypothesis. For example, we are expecting "the mean to be less than this [mean value]".
two-tailed test
a hypothesis test in which the research hypothesis does not indicate a direction of the mean difference or change in the dependent variable, but merely indicates that there will be a mean difference
test statistic
a mathematical formula that identifies how far or how many standard deviations a sample outcome is from the value stated in a null hypothesis; allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true (value is used to make a decision regarding a null hypothesis)
Cohen's d
a measure of effect size in terms of the number of standard deviations that mean scores shifted above or below the population mean stated by the null hypothesis (larger the value of the d, larger the effect in the population)
hypothesis testing, significance testing
a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample; in this method, we test a hypothesis by determining the likelihood that a sample statistic would be selected if the hypothesis regarding the population parameter were true
Population Parameter
a numerically valued attribute of a model for a population
Alternative Hypothesis
a statement that directly contradicts the null and offers all other possible solutions
test statistic (t-test)
a statistic whose value helps determine whether a null hypothesis should be rejected and is used to compare the means of two groups.
One Sample Z-Test
a statistical procedure used to test hypotheses concerning the mean in a single population with a known variance
t-test
a statistical test used to evaluate the size and significance of the difference between two means
State whether the standardized test statistic z indicates that you should reject the null hypothesis. (left-tailed) (a) z=1.208 (b) z=−1.364 (c) z=−1.467 (d) z=- 1.189 a) For z=1.208, should you reject or fail to reject the null hypothesis?
a) Fail to reject H0 because z > −1.285. b) Reject H0 because z < −1.285. c) Reject H0 because z < −1.285. d) Fail to reject H0 because z > −1.285.
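The left-tailed decision rule on this card can be written as a one-line comparison. A small sketch, using the critical value −1.285 quoted in the answer; the function name is hypothetical.

```python
Z_CRITICAL = -1.285  # left-tailed critical value quoted on the card

def left_tailed_decision(z, z_critical=Z_CRITICAL):
    """Reject H0 only when z falls in the left rejection region."""
    return "reject H0" if z < z_critical else "fail to reject H0"

decisions = {z: left_tailed_decision(z) for z in (1.208, -1.364, -1.467, -1.189)}
for z, decision in decisions.items():
    print(z, decision)
```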
A light bulb manufacturer guarantees that the mean life of a certain type of light bulb is at least 750 hours. A random sample of 24 light bulbs has a mean life of 728 hours. Assume the population is normally distributed and the population standard deviation is 65 hours. At α=0.02, do you have enough evidence to reject the manufacturer's claim? This is left-tailed. a)) Identify the null hypothesis and alternative hypothesis. b) Identify the critical value(s). C) Identify the standardized test statistic d) Decide whether to reject or fail to reject the null hypothesis (e) interpret the decision in the context of the original claim.
a) H0: mu ≥ 750 (claim) Ha: mu < 750 b) critical z: invNorm(alpha,0,1) = -2.05 c) z = (728 - 750)/(65/sqrt(24)) = -1.66 (used STAT tests menu, p. 368) d) Fail to reject H0 because z = -1.66 is not less than -2.05. e) There is not sufficient evidence to reject the claim that mean bulb life is at least 750 hours.
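The light bulb problem can be reproduced step by step. A sketch using only the numbers given in the question; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, sigma, n, alpha = 750, 728, 65, 24, 0.02

z = (xbar - mu0) / (sigma / sqrt(n))   # standardized test statistic
z_crit = NormalDist().inv_cdf(alpha)   # left-tailed critical value
reject = z < z_crit                    # rejection region is the left tail

print(round(z, 2), round(z_crit, 2), "reject H0" if reject else "fail to reject H0")
```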
Determine whether to reject or fail to reject H0 at the level of significance of a)α=0.07 and b) α=0.02. H0: μ=123, Ha: μ≠123, and P=0.0396
a) Reject H0 because P<0.07 b) Fail to reject H0 because P>0.02
During a 52-week period, a company paid overtime wages for 16 weeks and hired temporary help for 7 weeks. During 4 weeks, the company paid overtime and hired temporary help. Complete parts (a) and (b) below. (a) Are the events "selecting a week that contained overtime wages" and "selecting a week that contained temporary help wages" mutually exclusive? (b) If an auditor randomly examined the payroll records for only one week, what is the probability that the payroll for that week contained overtime wages or temporary help wages?
a.) No b.) 0.365 (16/52 + 7/52 - 4/52)
A company that makes cartons finds that the probability of producing a carton with a puncture is 0.03, the probability that a carton has a smashed corner is 0.08, and the probability that a carton has a puncture and has a smashed corner is 0.002 a.) mutually exclusive? b.) If a quality inspector randomly selects a carton, find the probability that the carton has a puncture or has a smashed corner.
a.) no b.)0.108 (.03+.08-.002)
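Both of the "or" problems above use the addition rule P(A or B) = P(A) + P(B) - P(A and B). A minimal sketch with exact fractions; the helper name is an assumption.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_both):
    """Addition rule for two events that may overlap."""
    return p_a + p_b - p_both

# Overtime (16 weeks) or temporary help (7 weeks), 4 weeks with both, out of 52.
payroll = p_union(Fraction(16, 52), Fraction(7, 52), Fraction(4, 52))

# Puncture (0.03) or smashed corner (0.08), both with probability 0.002.
carton = p_union(Fraction("0.03"), Fraction("0.08"), Fraction("0.002"))

print(round(float(payroll), 3), float(carton))
```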
(a) List an example of two events that are independent. (b) List an example of two events that are dependent.
a.) rolling a die twice b.) Drawing one card from a standard deck, not replacing it, and then selecting another card
estimation
allows us to describe the distribution of the population parameters (How large is the effect)
What's another name for a type 1 error?
alpha
Because a two-sided test is symmetric, you can easily use a confidence interval to test a two-sided hypothesis. C = 1-... You just have to do 1- C and divide by ...
alpha 2
The probability that the test statistic will fall inside rejection region due to chance alone is equal to:
alpha; one minus the confidence level; the significance level
A null and alternative hypothesis are given. Determine whether the hypothesis test is left-tailed, right-tailed, or two-tailed. H0:σ ≥ 66 Ha: σ < 6
always determined by Ha -less than (L tail), greater than (R tail), not equal to (split tail) left-tailed test
The confidence level C (in %) represents an ... of corresponding size C under the sampling distribution.
area
Find the P-value for a left-tailed hypothesis test with a test statistic of Z=−1.15. Decide whether to reject H0 if the level of significance is α=0.05
area to the left of z = normalcdf(-10000, -1.15, 0, 1) = 0.1251, so P-value = 0.1251. To use a P-value to make a conclusion in a hypothesis test, compare the P-value with α. If P ≤ α, then reject H0. If P > α, then fail to reject H0. Since P > α, fail to reject H0.
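The calculator step normalcdf can be mirrored with the standard normal CDF. A sketch; the `tail` argument and function name are assumptions added for illustration.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """P-value for a z statistic; tail is 'left', 'right', or 'two'."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    return 2 * (1 - cdf(abs(z)))  # two-tailed: double the outer tail area

p = z_p_value(-1.15, "left")
alpha = 0.05
print(round(p, 4), "reject H0" if p <= alpha else "fail to reject H0")
```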
error of interpreting r: data based on ...
averages
can be analyzed to determine if there is an association between the two variables
bivariate (paired) data
Determine which numbers could not be used to represent the probability of an event.
can't be less than 0 or greater than 1 -can be % -can be fraction -can be any decimal places (not just two)
If two events are mutually exclusive, why is P(A and B)=0?
cannot occur at the same time Two events are said to be mutually exclusive if they cannot occur simultaneously.
error of interpreting r: concluding that correlation implies ...
causality
example: even though the population of a is strongly skewed, the sampling distribution of x-bar when n=25 is approximately normal, as expected from the ...
central limit theorem
when randomly sampling from any population with mean u and standard deviation o, when n is large enough, the sampling distribution of x-bar is approximately normal
central limit theorem
Statistical significance only says whether the effect observed is likely to be due to ... because of random sampling.
chance alone
The number of ways a five- member committee can be chosen from 10 people.
combo- order doesnt matter- all equal positions
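Since order does not matter, the count is a combination, which Python exposes as `math.comb`.

```python
from math import comb

# Number of five-member committees that can be chosen from 10 people.
committees = comb(10, 5)
print(committees)
```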
Cohort studies enlist individuals of _______________________________, and keep track of them over a long period of time.
common demographic
With significance tests, you should plot your results, ... them with a baseline or similar studies.
compare
Use the given statement to represent a claim. Write its complement and state which is H0 and which is Ha. sigma = 3
complement: sigma does not equal 3 H0: sigma = 3 Ha: sigma does not equal 3
continuous variables that can take on any one of an infinite number of possible values over an interval
continuous sample space
exists between two variables when one of them is linearly related to the other in some way
correlation
averages may suppress individual variation and may inflate the ...
correlation coefficient
significance, statistical significance
describes a decision made concerning a value stated in the null hypothesis (when the null hypothesis is rejected, we reach significance; when the null hypothesis is retained, we fail to reach significance)
alternative hypothesis
designated Ha, is what is concluded if there is sufficient evidence to reject the null hypothesis -The alternative hypothesis can be one-sided or two-sided. A one-sided test is referred to as a one-tailed test, and a two-sided test is referred to as a two-tailed test.
Step 1
determine null and alternative hypothesis
What's an effect size?
diff btw hypothesized parameter values
Effect
difference between a sample mean and the population mean stated in the null (significant to reject the null)
combinations
division 5C3= (5P3)/3!
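The identity on this card (a combination is a permutation count divided by the number of orderings) can be checked directly.

```python
from math import comb, factorial, perm

ordered = perm(5, 3)                  # 5P3: ordered selections of 3 from 5
unordered = ordered // factorial(3)   # divide out the 3! orderings of each selection
print(ordered, unordered, comb(5, 3))
```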
What's the diff between effect size and statistical sign?
effect size is practical
if we conclude that there is a linear correlation between x and y, we can find a linear equation that expresses y in terms of x and that ... can be used to predict the values of y for given values of x (simple linear regression)
equation
Use the confidence interval to find the estimated margin of error. Then find the sample mean. A biologist reports a confidence interval of(2.3,3.5) when estimating the mean height (in centimeters) of a sample of seedlings.
estimated margin of error= 3.5-2.3 (all divided by 2)= 0.6 sample mean=2.3+0.6 (left margin plus) OR 3.5-0.6 (right margin/upper limit MINUS) =2.9
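Recovering the margin of error and sample mean from an interval is just half the width and the midpoint. A small sketch; the function name is hypothetical.

```python
def from_interval(lower, upper):
    """Return (margin of error, sample mean) for a symmetric confidence interval."""
    margin = (upper - lower) / 2   # half the interval width
    center = (upper + lower) / 2   # midpoint is the point estimate
    return margin, center

margin, mean = from_interval(2.3, 3.5)
print(round(margin, 2), round(mean, 2))
```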
an ... is a subset of the sample space
event
When P-value > α, we ... H0.
fail to reject
T/F If you want to support a claim, write it as your null hypothesis.
false. If you want to support a claim, write it as your alternative hypothesis. A hypothesis test can only reject or fail to reject the null hypothesis. Failing to reject the null hypothesis does not mean that the null hypothesis is true. So to support a claim, the desired result of the test would be to reject the opposite of that claim. Thus, the opposite of the claim should be stated as the null hypothesis, and the claim should be the alternative hypothesis.
Step 2
find critical value, how many tails?, p-value
Construct the confidence interval for the population mean μ. c=0.98, x bar=4.3, σ=0.6, and n=50
find margin of error then subtract it from xbar to find left value, add to xbar to find right value (upper limit). For c = 0.98 the area in each tail is 0.01, so Zc = invNorm(0.99) = 2.326. E = 2.326(0.6/sqrt(50)) = 0.197. xbar minus and plus E = (4.10, 4.50)
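A sketch of the interval calculation for this card. Note that for a two-sided interval with c = 0.98 the critical value is the 0.99 quantile of the standard normal (about 2.326), since 0.01 sits in each tail. The function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, c):
    """Confidence interval for a mean when the population sigma is known."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)  # leaves (1 - c)/2 in each tail
    margin = z_c * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(4.3, 0.6, 50, 0.98)
print(round(low, 2), round(high, 2))
```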
central limit theorem
for any given distribution with a mean and variance, the sampling distribution of the mean approaches a normal distribution as sample size increases (its standard deviation is sd/sqrt(n))
r^2 represents the ... of the variation in y that is explained by the regression model
fraction
The alternative hypothesis, Ha, is a more ... statement that complements yet is mutually exclusive with the null hypothesis.
general
a conditional percent is computed using the counts within a single row or a single column and the denominator is the corresponding row or column total rather than the table ... total
grand
most of the time we just don't know if the population is normal and all we have is sample data so: -we can summarize the data with a ... and describe its shape -if the sample is ..., the shape of the histogram should be similar to the shape of the population distribution -the ... can help guess whether the sampling distribution should look roughly normal or not
histogram random central limit theorem
List 2 ways to calc effect size
how far scores shifted in pop, percent of variance that can be explained by given variable
A test of statistical significance tests a specific ... using sample data to decide on the validity of the hypothesis.
hypothesis
nondirectional tests, two-tailed tests
hypothesis tests in which the alternative hypothesis is stated as not equal to a value stated in the null hypothesis; hence, the researcher is interested in any alternative to the null hypothesis
the ... is a necessary mathematical descriptor of the regression line and it does not describe a specific property of the data
intercept
you can use the equation of the ... to predict y for any value of x within the range studied
least-squares regression
the ... is the unique line such that the sum of the ... between the data points and the line is zero, and the sum of the squared vertical distances is the smallest possible
least-squares regression line vertical distances
averages are ... variable than individual observations
less
When the standard error of a sample mean is decreased by increasing n, it becomes ___
less variable
If an experiment has several factors, a treatment is a combination of specific _____________ of each factor.
levels
if r is close to -1 or 1, we conclude that there is significant ... correlation
linear
least squares regression is only for ... so always plot the raw data to confirm
linear associations
... measures the strength of the linear association between paired x and y quantitative values in a sample
linear correlation coefficient r
t-test v normal test
both have mean = 0; t distribution has sd > 1 but standard normal has sd = 1; t distribution has fatter tails
Cohen's D
measure of effect size in terms of # of SD's that the mean scores shifted above or below the pop. mean stated by the null
For the statement below, write the claim as a mathematical statement. State the null and alternative hypotheses and identify which represents the claim. A laptop manufacturer claims that the mean life of the battery for a certain model of laptop is less than 9 hours.
mu < 9 H0: mu greater than or equal to 9 Ha: mu < 9 The alternative hypothesis Ha: μ < 9 is the claim.
For the statement below, write the claim as a mathematical statement. State the null and alternative hypotheses and identify which represents the claim. A laptop manufacturer claims that the mean life of the battery for a certain model of laptop is less than 3 hours.
mu <3 hours H0: mu > or equal to 3 hours (equality is always null) Ha: mu < 3hours The alternative hypothesis Ha: μ<3 is the claim.
In a random sample of 24 people, the mean commute time to work was 32.8 minutes and the standard deviation was 7.3 minutes. Assume the population is normally distributed and use a t-distribution to construct a 90% confidence interval for the population mean μ. What is the margin of error of μ? Interpret the results.
n=24 mean=32.8 s=7.3 (sample standard deviation) Tc=1.714 (df=23) margin of error = 1.714(7.3/sqrt(24)) = 2.6 confidence interval = (30.2, 35.4) With 90% confidence, it can be said that the population mean commute time is between the bounds of the confidence interval.
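The t-interval arithmetic can be sketched as follows. The Python standard library has no t quantile function, so the critical value 1.714 from the card is passed in directly rather than computed; that is an assumption of this sketch.

```python
from math import sqrt

def t_interval(xbar, s, n, t_c):
    """Confidence interval for a mean using a supplied t critical value."""
    margin = t_c * s / sqrt(n)
    return margin, (xbar - margin, xbar + margin)

# t_c = 1.714 is the table value quoted on the card for 90% confidence, df = 23.
margin, (low, high) = t_interval(32.8, 7.3, 24, 1.714)
print(round(margin, 1), round(low, 1), round(high, 1))
```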
A ______________________ is a control in which the outcome is expected to stay the same or no response is expected.
negative control
when the sampling distribution is ..., we can standardize the value of a sample mean x-bar to obtain a z-score and this z-score can then be used to find areas under the sampling distribution from the normal probability table
normal
one way to assess if a data set has an approximately normal distribution is to plot the data on a ....
normal quantile plot
a family of symmetrical, bell-shaped curves defined by a mean (u) and a standard deviation (o); N(u, o)
normal/ Gaussian distribution
Find the P-value for the indicated hypothesis test with the given standardized test statistic, z. Decide whether to reject H0 for the given level of significance α. Right-tailed test with test statistic z=1.29 and α=0.04
normalcdf(1.29, 10000, 0, 1) p-value = 0.0985 Fail to reject H0 because the p-value is higher than alpha
when a variable in a population is normally distributed, the sampling distribution of the sample mean x-bar is also ...
normally distributed
P-values that are ... don't give enough evidence against H0 and we fail to reject H0. Beware: We can never "prove H0."
not small
What are the two types of hypotheses used in a hypothesis test? How are they related?
null and alternative The null hypothesis H0 is a statistical hypothesis that contains a statement of equality, such as ≤, =, or ≥. The alternative hypothesis Ha is the complement of the null hypothesis. It is a statement that must be true if H0 is false and it contains a statement of strict inequality, such as >, ≠, or <. They are complements
A ________________ is a number summarizing a characteristic of the population while a _________________ is a number summarizing a characteristic of a sample.
parameter; statistic
when x is larger than the mean the z is ...
positive
A _________________________ is a control in which the outcome is expected to change.
positive control
Type 3 Error
possible with directional tests in which a decision would have been to reject the null, but the researcher decides to retain the null because the rejection region was located in the wrong tail
Statistical significance may not be ... important.
practically
What's the p value?
prob of getting extreme results given that null is true
What's a type 2 error?
prob of keeping false null (false negative)
What's a type 1 error?
prob of rejecting h0 when its true (false positive)
Define power
prob of rejecting null assuming alternative is true
a ... is assigned for each possible simple event in the sample space S
probability
we define the ... of any outcome of a random phenomenon as the proportion of times the outcome would occur in a very long series of repetitions
probability
mathematically describe the outcome of random processes
probability models
Type II error
probability of failing to reject null hypothesis when it is false/false negatives. It is something that we have no control over. For example, Display Ad A is not effective in driving conversions but is accepted as true.
Type I error
probability of rejecting null hypothesis when it is true/false positive. It is something that we decide. For example, a person is judged as guilty when the person actually did not commit the crime
A study found that 34% of the assisted reproductive technology (ART) cycles resulted in pregnancies. Twenty-five percent of the ART pregnancies resulted in multiple births.
probability that a randomly selected ART cycle resulted in a pregnancy and produced a multiple birth= (.34x0.25)= 0.085 The probability that a randomly selected ART cycle that resulted in a pregnancy did not produce a multiple birth= 0.750 unusual? No, this is not unusual because the probability is not less than or equal to 0.05
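The ART card combines the multiplication rule with the complement rule. A brief sketch using the card's numbers; variable names are illustrative.

```python
p_pregnancy = 0.34             # P(cycle results in pregnancy)
p_multiple_given_preg = 0.25   # P(multiple birth | pregnancy)

# Multiplication rule: P(pregnancy and multiple birth)
p_both = p_pregnancy * p_multiple_given_preg

# Complement rule: P(not multiple birth | pregnancy)
p_single_given_preg = 1 - p_multiple_given_preg

unusual = p_single_given_preg <= 0.05
print(round(p_both, 3), p_single_given_preg, unusual)
```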
P-value
probability, if H0 was true, of obtaining a sample statistic at least as extreme as one obtained -small: statistically significant, reject null -large: don't give enough evidence against H0, fail to reject...we can NEVER prove H0
A probability experiment consists of rolling a eight-sided die and spinning the spinner shown at the right (4 colors). The spinner is equally likely to land on each color. Use a tree diagram to find the probability of the given event. Then tell whether the event can be considered unusual. -Event: rolling a number less than 4 and the spinner landing on red
probability= 0.094 (3/32) unusual? No, bc it's not close enough to 0. (An event that occurs with a probability of 0.05 or less is typically considered unusual.)
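In place of a tree diagram, the sample space can be enumerated directly. A sketch: only "red" is named on the card, so the other three spinner colors are placeholder assumptions.

```python
from fractions import Fraction
from itertools import product

die = range(1, 9)                              # eight-sided die
colors = ["red", "blue", "green", "yellow"]    # only "red" comes from the card
outcomes = list(product(die, colors))          # 32 equally likely outcomes

favorable = [(d, c) for d, c in outcomes if d < 4 and c == "red"]
prob = Fraction(len(favorable), len(outcomes))
unusual = prob <= Fraction(5, 100)

print(prob, round(float(prob), 3), unusual)
```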
Cohort studies -> _____________________
prospective
The point estimate for the population proportion of failures is
q hat = 1 - p hat
The margin of error does not cover all errors: The margin of error in a confidence interval covers only ... Undercoverage, nonresponse or other forms of bias are often more ... than random sampling error (e.g., our elections polls). The margin of error does not take these into account at all.
random sampling error serious
Experiments ____________________ the assignment of subjects to treatments.
randomize
Caution About Z Procedures for a Mean: -The data must be a probability sample or come from a .... Statistical inference cannot remedy basic design flaws, such as voluntary response samples or uncontrolled experiments. -The sampling distribution must be approximately .... This is not true in all instances (if the population is skewed, you will need a large enough sample size to apply the central limit theorem). -To use a z procedure for a population mean, we must know ..., the population standard deviation. This is often an ... requisite.
randomized experiment Normal σ unrealistic
A _____________________________ design gives two or more treatments to each subject over time, in random order.
repeated measures
Use of Sampling Distributions: -If the population is N(μ,σ), the ... is N(μ,σ/√n). -If not, the sampling distribution is ~N(μ,σ/√n) if n is .... -We take one random sample of size n, and rely on the ... of the sampling distribution.
sampling distribution large enough known properties
the ... is the probability distribution of that statistic for samples of a given size n taken from a given population
sampling distribution of a statistic
the value of r does not change if all values of either variable are converted to a different ...
scale
this regression equation expresses an association between x and y
simple linear regression
a linear regression model with one predictor variable is a .... model
simple linear regression (SLR)
A ________________________________ is made of randomly selected individuals.
simple random sample, SRS
the ... of the regression line describes how much we expect y to change, on average for every unit change in x
slope
With a large sample size, even a ... effect could be significant.
small
Because ... have a lot of chance variation, even large population effects can fail to be significant if the sample is small.
small random samples
don't subtract the z-value because normal curves are not ...
square
establishing causation from an observed association can be done if: 1. the association is ... 2. the association is ... 3. higher doses are associated with ... responses 4. the alleged cause precedes the ... 5. the alleged cause is ...
strong consistent stronger effect plausible
the probability that a binomial random variable takes any range of values is the ... of each probability for getting exactly that many successes in n observations
sum
effect
the difference between a sample mean and the population mean stated in the null hypothesis (an effect is not significant when we retain the null hypothesis; an effect is significant when we reject the null hypothesis)
the probability that the confidence interval contains p is c, assuming that ___
the estimation process is repeated a large number of times
Type II error
the failure to reject the null hypothesis when it is actually false.
Margin of Error
the greatest possible distance between the point estimate and the value of the parameter it is estimating
confidence interval
the range of values within which a population parameter is estimated to lie
rejection region
the region beyond a critical value in a hypothesis test (when the value of a test statistic is in the rejection region, we decide to reject the null hypothesis, otherwise we retain the null hypothesis)
Type I error
the rejection of the null hypothesis when it is actually true
A ____________________ is any specific experimental condition applied to the subjects.
treatment
A matched pairs design chooses pairs of subjects that are closely matched, like twins, and each pair is randomly assigned ____________________.
treatments
Bayes' theorem can be extended to events with more than ... outcomes
two
the mean of the sampling distribution x-bar is ...
u
there is no tendency for a sample average to fall systematically above or below ..., even if the population distribution is ...
u skewed
Voluntary response sampling and convenience sampling are biased while probability sampling is __________________.
unbiased
x-bar is an ... estimate of the population mean u
unbiased
A confidence interval is a range of values with an associated probability, or confidence level, C. This probability quantifies the chance that the interval contains the ....
unknown population parameter
t test
use when the test concerns the value of an underlying or population mean; hypothesis testing uses a statistic (t-stat) that follows the t-distribution. The t-distribution is a probability distribution defined by a single parameter, degrees of freedom; each value of degrees of freedom gives one distribution in the family of distributions. It has mean = 0 and sd > 1, with more probability for outcomes distant from the mean (fatter tails). As the number of degrees of freedom increases with sample size, the t-distribution approaches the standard normal distribution. Use for tests of the population mean of a normally distributed population with unknown variance (population variance = sd^2).
chi-square test
used for hypothesis tests concerning the variance of a normally distributed population: chi-square = (n-1)(sample sd^2)/(hypothesized population sd^2)
Obtained Value
value of a test statistic
Critical Values
values that separate sample statistics that are probable from sample statistics that are improbable, or unusual
a statistic computed from a random sample is a random ...
variable
spearman rank correlation coefficient Rs
used when the data for a t test on two variables based on the correlation coefficient meaningfully depart from the distribution assumptions. Essentially the same thing as the correlation coefficient, but calculated on the ranks of the two variables. Gives a number from -1 to 1. -1 = perfectly inverse (decreasing) relationship, 1 = perfectly increasing relationship, 0 = no correlation
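To illustrate "calculated on the ranks", here is a rough pure-Python sketch: rank each variable (averaging ties), then apply the ordinary correlation formula to the ranks. The helper names and the sample data are hypothetical.

```python
from math import sqrt

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied block
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Ordinary linear correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), spearman(x, y))
```

On data like this the rank-based coefficient reaches 1 while the ordinary r stays below 1, because the ranks only track whether y goes up when x goes up.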
test for population mean
with known standard deviation sigma: z=(xbar - u0)/(sigma/sqrt(n)) then use the chart to see where p-value falls near alpha, but doesn't provide any information about true population mean u
the value of r is not affected by the choice of ... and ...
x y
In a survey of 608 males ages 18-64, 396 say they have gone to the dentist in the past year. Construct 90% and 95% confidence intervals for the population proportion. Interpret the results and compare the widths of the confidence intervals. If convenient, use technology to construct the confidence intervals.
x= 396 n=608 p hat= 0.651 q hat= 0.349 Zc 90%= 1.645 margin of error E = 1.645 times sqrt(p hat times q hat/n) = 0.032 endpoint L= p hat - E= 0.619 endpoint R= p hat + E= 0.683 With the given confidence, it can be said that the population proportion of males ages 18-64 who say they have gone to the dentist in the past year is between the endpoints of the given confidence interval. The 95% confidence interval is wider.
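Both intervals for the dentist survey can be computed with the same helper. A minimal sketch using the normal approximation for a proportion; the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, c):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    margin = z_c * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low90, high90 = proportion_interval(396, 608, 0.90)
low95, high95 = proportion_interval(396, 608, 0.95)

print(round(low90, 3), round(high90, 3))
print(round(low95, 3), round(high95, 3))
```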
p hat= x/n
x= number of successes in survey n= sample size
variable ... is the dependent or response variable
y
... = intercept + slope x
y-hat
... is the predicted value of y for a given value of x
y-hat
the probability of an event being equal to a single numerical value is ... when the sample space is continuous
zero
In hypothesis testing, does choosing between the critical value method or the P-value method affect your conclusion?
No, because both involve comparing the test statistic's probability with the level of significance The P-value method converts the standardized test statistic to a probability (P-value) and compares this with the level of significance, whereas the critical value method converts the level of significance to a z-score and compares this with the standardized test statistic. Thus, both methods will result in the same conclusion.
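The agreement between the two methods can be spot-checked numerically. A sketch for a right-tailed z test; the test values of z are arbitrary illustrations, and alpha = 0.04 is borrowed from an earlier card.

```python
from statistics import NormalDist

std = NormalDist()

def reject_by_p_value(z, alpha):
    """P-value method: convert z to a right-tail probability, compare with alpha."""
    return 1 - std.cdf(z) <= alpha

def reject_by_critical_value(z, alpha):
    """Critical value method: convert alpha to a z cutoff, compare with z."""
    return z >= std.inv_cdf(1 - alpha)

alpha = 0.04
results = [(z, reject_by_p_value(z, alpha), reject_by_critical_value(z, alpha))
           for z in (0.5, 1.29, 1.7, 1.8, 2.5)]
for z, by_p, by_crit in results:
    print(z, by_p, by_crit)
```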