STATS FINAL SEM TWO MAJOR STUDY SET
how to analyze spread in a distribution:
use standard deviation or variance (with mean); use the IQR, found from the lower and upper quartiles (with median)
what is extrapolation
using the least-squares regression line to predict values for x-values outside the range of the data used to build the line (these predictions are often unreliable)
how are variance and standard deviation related?
standard deviation is the square root of the variance (variance = standard deviation squared) Sx= square root of [ (1/(n-1)) x sum of (xi - xbar)^2 ]
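A minimal Python sketch of the variance / standard deviation relationship above. The data values and function names are made up for illustration only.

```python
import math

def sample_variance(data):
    # Sum of squared deviations from the mean, divided by n - 1
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    # Standard deviation is the square root of the variance
    return math.sqrt(sample_variance(data))

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5
print(sample_variance(scores))       # 32 / 7, about 4.57
print(sample_sd(scores))             # about 2.14
```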
when analyzing events with multiple outcomes, what visual aid will be the most beneficial?
Venn diagram
how do we interpret a confidence INTERVAL?
we are # % confident that the interval from minimum to maximum contains the true parameter (in context)
how to analyze shape in a distribution:
Normal distribution shape: median = mean; skewed left shape: median > mean; skewed right shape: mean > median; uniform shape: mean = median; bimodal shape: mean = median only if the two peaks are symmetric
to analyze a normal distribution, and find a probability of a sample statistic occurring, given an assumed population mean and standard deviation with the function in the calculator.....
normalcdf: lower, upper, mean, standard deviation
finding p-values using t-distributions....
2nd vars (distribution): 6: tcdf: lower, upper, df
empirical rule
68-95-99.7 for normal distributions
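A quick way to check the 68-95-99.7 rule, sketched with Python's standard library `statistics.NormalDist`. The helper name `area_within` is just an illustrative choice.

```python
from statistics import NormalDist

def area_within(k):
    # Area under the standard normal curve within k standard deviations of the mean
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```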
how to calculate mean of binomial curve
(n)(p)= mean
chi-square test statistic is....(and it is the same for alllll tests)
sum of (observed-expected)^2 / expected, added up over every category or cell
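A short Python sketch of the chi-square statistic, assuming the observed and expected counts are already known. The counts below are invented for the example.

```python
def chi_square_statistic(observed, expected):
    # Add up (observed - expected)^2 / expected over every category
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20]   # hypothetical observed counts
expected = [20, 20, 20]   # hypothetical expected counts
print(chi_square_statistic(observed, expected))  # 4/20 + 4/20 + 0
```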
Stratified Random Sample
(only for observational study) individuals are classified into groups by shared characteristic (strata) , then sampled within the groups -Pros: balances representation of each type of individual, improves results because of equal representation, reduces variability of possible sample results -Cons: complex, resource demanding
how do you calculate the probability of an outcome
(p)= # of corresponding outcomes to event A/ total # of outcomes in sample space
how do you interpret the y-intercept for the least-square regression line
when (x) is 0, the predicted (y) is (y-intercept)
Conditions: Large counts (sample size)
-ensures the sampling distribution is approximately normal so inference is appropriate (center, spread, shape) how to meet condition for proportion: (n)(phat) >= 10 and (n)(1-phat) >= 10 how to meet condition for means: n >= 30 (central limit theorem) OR (t procedures with n < 30) graph of sample shows no strong skew or outliers, so it is plausible the population is approximately normal IF POPULATION HAS AN approximately normal distribution, this condition can be considered 'met' REGARDLESS of sample size
Conditions: Independent
-how to meet condition for proportion and mean (samples and observational studies): n < 10% of the population; this lets us treat the observations as independent even though we sample without replacement
Conditions: random
-how to meet condition for proportion and mean: problem will say if the sample is randomly selected or not - this ensures that the sample accounts for variability and is a good equal representation for the population
how can power be increased?
-increasing alpha= makes it easier to declare the evidence convincing -increase sample size -use a better experimental design (stratifying or blocking possibly)
how to analyze/ find center in a distribution:
-mean (add up all data points/ # of data points) mean is NOT resistant to the effect of outliers -median (the middle of the data, the median can be more closely analyzed with IQR) median IS RESISTANT to the effects of outliers -it is best to analyze with mean unless the data is skewed or has outliers, at which point median should be used
for continuous random variables what is the probability of getting exactly one given outcome
0
when testing a claim about population mean and the population standard deviation is NOT known for a significance test use the formula...
1 sample t test formula: t= (xbar-population mean) / (sample standard deviation / square root of n)
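A minimal Python sketch of the one-sample t statistic above. The function name and the numbers are assumptions for illustration; finding the p-value still needs tcdf (or software with a t-distribution).

```python
import math

def one_sample_t(xbar, mu0, s, n):
    # t = (sample mean - hypothesized mean) / (s / sqrt(n)), with df = n - 1
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    return t, df

t, df = one_sample_t(xbar=52, mu0=50, s=4, n=16)
print(t, df)  # 2.0 15
```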
when testing a claim about a population mean and the population standard deviation is known for a significance test use the formula...
1 sample z test for mean formula: z= (xbar - mu0) / (population standard deviation / square root of n), where mu0 is the hypothesized population mean
when testing a claim about a population proportion for a significance test use the formula...
1 sample z test for proportion in calculator: stat: tests: 1 PropZtest formula: z= (phat-po) / (square root of (po)(1-po)/n)
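A sketch of the one-proportion z test in Python, using `statistics.NormalDist` from the standard library for the p-value. The function name and the sample numbers are hypothetical, and a two-sided Ha is assumed.

```python
import math
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    # x successes out of n trials, tested against the hypothesized proportion p0
    phat = x / n
    z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Two-sided p-value: area in both tails beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_prop_z_test(x=60, n=100, p0=0.5)
print(round(z, 3), round(p_value, 4))
```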
how to find t*
(1 - confidence level)/ 2= a InvT: area= a df= n-1 calculate, then use the positive value = t*
how to find Z*
(1 - confidence level)/ 2 = a InvNorm: area= a mean=0 standard deviation= 1 calculate, then use the positive value = z*
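The same z* lookup can be sketched in Python with `statistics.NormalDist().inv_cdf`, which plays the role of InvNorm here. The function name `z_star` is an illustrative choice.

```python
from statistics import NormalDist

def z_star(confidence_level):
    # Area left in one tail, e.g. 0.025 for 95% confidence
    tail_area = (1 - confidence_level) / 2
    # inv_cdf of a lower-tail area is negative, so flip the sign
    return -NormalDist().inv_cdf(tail_area)

print(round(z_star(0.95), 3))  # about 1.96
print(round(z_star(0.90), 3))  # about 1.645
```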
the expected value (mean) of a geometric random variable
1/p
when estimating the difference between two population means and the population standard deviations are NOT known for a confidence interval use the formula...
2 sample t interval for difference in population means (xbar1-xbar2) +/- t* (square root of [1ST sample standard deviation^2 / n1 + 2ND sample standard deviation^2 / n2]) on calc: stat: tests: 2-SampTInt
when testing a claim about the difference between two population means and the population standard deviations are not known for a significance test use the formula...
2 sample t test calculator: stat: tests: 2-SampTTest t= [(xbar1-xbar2)-(pop mean1- pop mean2)] / (square root of [1ST sample standard deviation^2/ n1 + 2ND sample standard deviation^2/ n2])
when testing a claim about the difference between two population means and the population standard deviations are known for a significance test use the formula..
2 sample z test in calculator: stat: tests: 2-SampZTest z= [(xbar1-xbar2) -(Pop Mean1- Pop mean2)] / (square root of [1ST pop standard deviation^2 / n1 + 2ND pop standard deviation^2 / n2])
when estimating the difference between two population proportions for a confidence interval use this formula
2 sample z interval for difference in population proportions (phat1-phat2) +/- z* (square root of [(phat1)(1-phat1)/ (n1) + (phat2)(1-phat2)/ (n2)] ) on calculator: stat: tests: 2-PropZInt
when testing a claim about the difference between two population proportions for a significance test use the formula...
2 sample z test in calculator: 2-PropZTest formula: YOU HAVE TO POOL Pc= (X1+X2)/ (N1+N2) z= [(phat1-phat2)-0] / (square root of [Pc(1-Pc) / n1 + Pc(1-Pc) / n2])
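A minimal Python sketch of the pooled two-proportion z statistic. The function name and the counts are made up for illustration.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    phat1, phat2 = x1 / n1, x2 / n2
    # Pooled proportion: all successes over all trials
    pc = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pc * (1 - pc) / n1 + pc * (1 - pc) / n2)
    return (phat1 - phat2 - 0) / se

z = two_prop_z(x1=60, n1=100, x2=40, n2=100)
print(round(z, 3))  # about 2.828
```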
when estimating the difference between two population means and the population standard deviations are known for a confidence interval use the formula...
2 sample z interval for difference in population means (xbar1 - xbar2) +/- z* (square root of [1ST pop standard deviation^2/ n1 + 2ND pop standard deviation^2/ n2] ) on calculator stat: tests: 2-SampZInt
when do we use Chi-squared tests? what do chi-square tests allow us to measure?
chi-square tests are for categorical data; they compare observed counts of categorical data to the counts we would expect
Simple Random Sample (SRS)
A sample of size (n) is chosen in such a way (random selection) that every group of (n) individuals in the population has an equal chance to be selected as the sample -a randINT can be used on the calc for random sampling, numbers/names in a hat, using table D, etc -Pros: fair, not too hard with adequate data on respondents -cons: difficult if you do not already have population information, can cause undercoverage
what is an event in probability
Any collection of outcomes from some chance process -a subset of a sample space ex: flipping a coin twice {HH, HT, TH, TT}- 4 outcomes (possible combinations of flipping a coin twice) inside the brackets -p(at least one heads) = 3/4, because 3 out of 4 outcomes include flipping a H -probability model: each outcome has a 1/4 probability of happening
Using Binomial distributions: BINS
B: Binary- the possible outcomes are classified as "success" or "failure" I: Independent- knowing results of one trial does not help you predict another N: Number- the number of trials of the chance process is set in advance S: same- same probability of success on each trial
For a linear regression, the confidence interval statistic becomes
CI= b+/- t*(SEb) SE Coef is found in calculator output (little box with numbers in it) t* is in calculator
DOFS
D-direction (positive, negative, no association: r-value tells us about the association between x and y) O-outliers (may alter the equation or the regression line or line of best fit) F-Form/shape (linear, non-linear, curved, clusters: tells us if correlation is linear or not) S-strength (weak, moderate, strong correlation: measures the strength of correlation)
what is the central limit theorem
Draw a simple random sample from any population with mean and standard deviation, when sample size is LARGE, the sampling distribution of the sample mean is approximately normal - n >= 30
how to handle curved data (linear transformations): exponential regression
EQUATION FORMAT: yhat= ab^x which means log(y)= log(a)+ x log(b) OR ln(y)= ln(a)+ x ln(b) TRANSFORMATIONS: x ---> x, y -----> log(y)/ ln(y)
how to handle curved data (linear transformations): power regression
EQUATION FORMAT: y=ax^p which means log(y)= log(a)+ p log(x) OR ln(y)= ln(a)+ p ln(x) TRANSFORMATIONS: x--> log(x)/ ln(x), y---> log(y)/ ln(y)
what is the alternative hypothesis and what are the three different types of hypotheses you could have? (answer is slightly different for 1 sample and 2 sample tests)
Ha: what we are trying to find evidence for 3 different types: parameter > value, parameter < value, parameter does not equal value 1 sample: true parameter is NOT the same as the hypothesized value 2 sample: the two population parameters are NOT the same
significance tests for linear regression: hypotheses are
Ho: assume there is no difference between the variables (this means slope=0) Ha: can be <,>, or not equal
what is a null hypothesis (Ho) and what does the null hypothesis always assume to be true (answer slightly different for 1 sample and 2 sample tests)
Ho: the claim we weigh evidence against, ALWAYS assumes there is no difference 1 sample: Ho assumes no difference between the true parameter and the hypothesized value 2 sample: Ho assumes no difference between the two population parameters
what are null and alternative hypotheses for chi-square test of homogeneity?
Ho: there is no difference in the true distribution of (blank) and (blank) ha: there is a difference in true distributions of (blank) and (blank)
conditions for significance tests and confidence intervals for linear regression: LINER
L: LINEAR- scatterplot is approximately linear, no curve, no pattern in the residuals I: INDEPENDENT- (use n<10% of pop) N: NORMAL- check for skew/ outliers in a graph of the residuals E: EQUAL SD- residuals show roughly the same amount of scatter for all x-values R: RANDOM- data come from a random sample or randomized experiment
Use the calculator LinRegTInt for.....
a confidence interval for the slope of the least-squares regression line
If X has a binomial distribution with parameters of n and p, then use the formulas...
MEAN OF A SAMPLE PROPORTION (phat)= population proportion STANDARD DEVIATION OF A SAMPLE PROPORTION: square root of population proportion( 1- population proportion) / sample size
If sample mean (xbar) is the mean of a random sample of size n from an infinite population with a mean and standard deviation then use the formulas....
MEAN OF A SAMPLE: population mean STANDARD DEVIATION OF A SAMPLE: standard deviation/ square root of sample size
Voluntary response bias
Only volunteers participate in the survey, causing bias ex: doing a survey about if the school should spend money on a football team or a dance team, and anyone who volunteers can take it; the survey might become biased because football players, dance members, or people who hate one of the two are the most likely to volunteer, and this could cause the estimate to be under or overestimated
how do you calculate the conditional probability of a given event? what is the formula?
P( A given B) = P(A and B)/ P(B)
what formula do you use when calculating the probability of getting more than one outcome for a given event? this is an OR STATEMENT
P( A or B) = P(A)+ P(B) - P(A and B)
what formula do you use when calculating the probability of multiple events all happening
P(A and B) = P(A) x P(B) IF A and B ARE INDEPENDENT IF THEY ARE NOT INDEPENDENT USE P(A and B)= P(A) x P(B given A) P(A and B)= P(B) x P(A given B)
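The conditional, "or", and "and" rules above can be checked with a few lines of Python. The probabilities are hypothetical, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical probabilities for two events A and B
p_a = Fraction(1, 2)
p_b = Fraction(2, 5)
p_a_and_b = Fraction(1, 5)

p_a_or_b = p_a + p_b - p_a_and_b   # P(A or B) = P(A) + P(B) - P(A and B)
p_a_given_b = p_a_and_b / p_b      # P(A given B) = P(A and B) / P(B)
independent = p_a_given_b == p_a   # independent when P(A given B) = P(A)

print(p_a_or_b, p_a_given_b, independent)  # 7/10 1/2 True
```

Because these events turn out to be independent, P(A) x P(B) = 1/5 matches P(A and B).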
what is conditional probability?
P(A given B) = probability of event occurring/ probability of given
GeometricCDF is used for...
P(x <= K), the probability that the first success will happen on or before the Kth trial
binomcdf finds...
P(x <= k) (n,p,k)
1- geometricCDF is used for...
P(x > K)
1-binomcdf finds...
P(x > k)
binompdf finds...
P(x=k) (n,p,k) n= # of trials k= # of successes p= probability of success
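A rough Python equivalent of the calculator's binomial functions, using `math.comb` for nCr. The names `binom_pdf` and `binom_cdf` are illustrative stand-ins, not calculator or library functions.

```python
from math import comb

def binom_pdf(n, p, k):
    # P(X = k): nCr(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    # P(X <= k): add up P(X = 0) through P(X = k)
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

print(binom_pdf(10, 0.5, 3))      # 120/1024
print(binom_cdf(10, 0.5, 3))      # 176/1024
print(1 - binom_cdf(10, 0.5, 3))  # P(X > 3)
```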
GeometricPDF is used for...
P(x=k) , the probability that the first success will happen at the Kth trial
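A matching Python sketch for the geometric functions. Again the function names are made up for illustration.

```python
def geomet_pdf(p, k):
    # P(X = k): k - 1 failures followed by the first success
    return (1 - p) ** (k - 1) * p

def geomet_cdf(p, k):
    # P(X <= k): 1 minus the chance that the first k trials all fail
    return 1 - (1 - p) ** k

print(geomet_pdf(0.2, 3))      # about 0.128
print(geomet_cdf(0.2, 3))      # about 0.488
print(1 - geomet_cdf(0.2, 3))  # P(X > 3), about 0.512
```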
Interquartile range
Q3-Q1= IQR it is an outlier if it is SMALLER than Q1-(1.5)(IQR) it is an outlier if it is BIGGER than Q3 + (1.5)(IQR)
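The 1.5 x IQR outlier rule as a small Python sketch, assuming Q1 and Q3 are already known. The function names and quartile values are hypothetical.

```python
def outlier_fences(q1, q3):
    # Anything below the low fence or above the high fence is an outlier
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x, q1, q3):
    low, high = outlier_fences(q1, q3)
    return x < low or x > high

print(outlier_fences(10, 20))   # (-5.0, 35.0)
print(is_outlier(40, 10, 20))   # True
print(is_outlier(30, 10, 20))   # False
```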
How to design a method...
Random number generator is your friend! 1) assign each (unit, subject, etc) to a different number between (blank) and (blank) 2) describe how you will implement the sampling method you want to use 3) Randomly select (blank) numbers, ignoring repeats, and include the (unit,subject, etc) that corresponds with those numbers in your sample
what is a sample space in probability
S of a chance process, this is a set of all possible outcomes ex: flipping a coin ONCE { H T}= between brackets represents sample space, flipping a coin once means u can only get a heads or tails
SOCS (use when analyzing a distribution of data)
S: shape O: outliers C: center S: spread
what is the four-step process of statistical inference for a confidence interval
STATE: we will use a (blank) interval to estimate, with (blank) % confidence the true (mean/proportion) of (context). PLAN: type of interval, check conditions: Large counts/ sample size, random, and independence DO: conduct type of interval, find interval CONCLUDE: we are (blank)% confident that the interval from minimum to maximum contains the true parameter in context
what are independent events
Two events (A and B) are independent if the occurrence of one event does not change the probability that the other event will happen -they can happen at the same time, but knowing the outcome of one will not help you predict the outcome of the other P(A given B)= P(A) P(B given A) =P(B) means A and B are independent
what to say in STATE for a confidence interval:
We will use a (blank) interval to estimate, with (blank) % confidence, the true (mean/proportion) of (context)
Just because X and Y are correlated that doesn't mean...
X causes Y BECAUSE correlation does not imply causation
Matched pairs design
a common type of randomized block design for comparing two treatments, each block consists of a matching pair, or similar experimental units chance is used to determine which unit in the pair gets the treatment - sometimes- single experimental unit gets both treatments, so you are comparing the unit to itself, accounting for more variability pro: can compare a unit to itself con: order of treatment can sometimes affect results
Geometric distribution is
a probability distribution for the number of trials it will take to get the first success; unlike binomial, there is no set number of trials
what is a probability model
a description of some chance process that consists of two parts: 1) a sample space 2) a probability for each outcome
bias
a design of a study is considered this if it consistently under or overestimates the value you want -it can be anything in a study that causes a sample to not be representative of the population of interest
sampling bias
a sampling design that consistently under or overestimates the value -a non random sample, not all individuals or subjects were equally likely to be selected for the sample, causing bias
confounding
a variable other than the explanatory (independent) variable that might produce an effect on the response, so its effect cannot be separated from the treatment's (pretty much all factors can confound experiments unless controlled)
what is sample distribution
a graph of data taken from one sample
treatment
a specific condition applied to the individuals in an experiment -if an experiment has several explanatory variables, this is a combination of specific values of these variables
when analyzing a series of multiple events, each with multiple possible outcomes, what visual aid will be helpful
a two way table
when and why do you use a chi-square test for homogeneity?
comparing distributions of a categorical variable in 2 or more populations -observed distribution to observed distribution (are we brother/sister) use X^2-Test in calculator (observed counts go in matrix [A]; the calculator computes the expected counts and stores them in matrix [B])
4 principles of a good experiment: Comparison
use a design that compares two or more treatments -control group vs treatment group -placebo vs drug
what variables are used to represent the probability that Type I error and Type II error, respectively will happen
alpha= P(type one error) beta (fancy B)= P(type two error)
if no alpha for a significance test is given use...
alpha=0.05
completely randomized design
an experimental design in which the experimental units are assigned to the treatments entirely by chance (random assignment) pros: should reduce bias by a lot con: more variability in results because it does not account for any other variables (no blocking)
probability situation: multiple outcomes, mutually exclusive
answer: add probabilities P(A U B)= P(A) + P(B) also means P(A or B)= P(A) + P(B) REMEMBER P(A and B)= 0 for mutually exclusive events, they cannot happen at the same time
probability situation: multiple outcomes- NOT mutually exclusive
answer: add probabilities but subtract the overlap (if using venn diagram just add up the 3 sections in the diagram) formula: P(A U B) = P(A) + P(B) - P(A and B)
probability situation: multiple events- dependent
answer: multiply probabilities, account for the change in probability with each trial, account for combinations (nCr) formula: nCr (n over k) x Pevent1 x Pevent2 x Pevent3 .....etc, etc REMEMBER THESE PROBABILITIES CHANGE
probability situation: multiple events- independent
answer: multiply probabilities, and account for COMBINATIONS in which these events can occur (nCr) formula: nCr (n over k) x (Psuccess)^# of successes x (Pfail)^# of fails
probability situation: conditional probability (A given B)
answer: probability of both events/ probability of the given event P(A | B) = P(A and B)/ P(B) or P(B | A)= P(A and B)/ P(A)
probability situation: probability of "at least one"
answer: the opposite of "none" 1-P(0)
In an Experimental study you...
apply a treatment
4 principles of a good experiment: Random Assignment
assigning participants to experimental and control groups by chance, this helps create roughly equivalent groups of experimental units by balancing the effects of other variables (can be done with random number generator, names in hats, etc)
how do you interpret a p-value, what does the p-value mean?
assuming Ho (null hypothesis) is true in context, there is a (blank-whatever p-value is) probability of getting a sample statistic as extreme as or more extreme than the one observed, purely by chance.
how do you interpret the results of a test for which the p-value is greater than alpha?
because the p-value is greater than the significance level (0.05), we fail to reject the Ho. we do not have enough convincing evidence in support of Ha in context.
how do you interpret the results of a test for which the p-value is less than alpha?
because the p-value is less than the significance level (0.05), we can reject the Ho. there is convincing evidence in support of Ha in context.
why can two events that are mutually exclusive NEVER be independent?
because if A and B are mutually exclusive they cannot happen at the same time, so knowing that A happened tells you B cannot happen: P(B given A)= 0, which is not P(B) (as long as B is possible). the occurrence of one changes the probability of the other, so they cannot be independent.
how to calculate df for Chi-square test of GOF
df= # of categories- 1
how to calculate df for chi-square test of homogeneity and independence
df= (# of rows-1)(# of columns-1)
df for a linear regression confidence interval is
df= n-2
how do we calculate the degrees of freedom of a t-distribution?
df=n-1
definition of the margin of error
distance between the sample statistic and the ends of the confidence interval
In an Observational study you...
do not apply treatment, you use a survey or observe
power increases when...
type one error increases, and type two error decreases
transforming and combining a random variable changes variable distribution: multiplying/ dividing by a CONSTANT number
effect on center/ mean: gets multiplied or divided by the constant effect on spread: standard deviation also gets multiplied or divided by the constant (use its absolute value)
transforming and combining a random variable changes variable distribution: adding/ subtracting a CONSTANT number
effect on center/ mean: goes up or down with adding or subtracting on data effect on spread: NONE
transforming and combining a random variable changes variable distribution: combining (adding or subtracting two random variables to each other)
effect on center: mean of (x+y)= meanx + meany, mean of (x-y)= meanx - meany effect on spread: IF x and y are independent, standard deviation^2= St.dev^2x + st.dev^2y for BOTH adding and subtracting (variances always add)
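A small Python sketch of combining two random variables, assuming X and Y are independent (the variance rule does not hold otherwise). The function name and the numbers are hypothetical.

```python
import math

def combine_independent(mean_x, sd_x, mean_y, sd_y, subtract=False):
    # Means add (or subtract); variances ALWAYS add for independent variables
    mean = mean_x - mean_y if subtract else mean_x + mean_y
    sd = math.sqrt(sd_x ** 2 + sd_y ** 2)
    return mean, sd

print(combine_independent(10, 3, 6, 4))                 # (16, 5.0)
print(combine_independent(10, 3, 6, 4, subtract=True))  # (4, 5.0)
```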
what are mutually exclusive outcomes
events that cannot occur at the same time and have no outcomes in common P(event A and event B)= 0 ex: possibility of you passing and failing the test at the same time
how do you calculate the expected value of a discrete random variable
expected value (x)= sum of (xi)(pi) ex: 45(.34) + 44(.33) etc etc
How do we interpret a confidence LEVEL?
for example= 95% confidence level If we were to take many samples of sample size n and calculate many intervals, about 95% of the intervals will capture the parameter (in context)
use InvNorm when...
given a percentage or probability (area) and looking for the value or z-score that goes with it
if dealing with t-distribution and your sample size is NOT 30 or more, what other method can you use to catch for normality?
graph the data and look for outliers or skew to determine if normal or not
control group
group in experiment that does not receive the treatment, used for comparison
cluster sample
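population is divided into groups (clusters), often based on location of individuals, then whole clusters are randomly selected and everyone in the chosen clusters is sampled Pros: convenient while somewhat precise cons: not as precise as SRS or strata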
what information does the residual plot give you?
helps determine if data is linear and appropriate to be modeled by a least square regression line
what does a confidence interval allow us to do?
helps us narrow down our options for what ever we are trying to find, they allow us to take a statistic and find the true parameter in an interval
what are the null and alternative hypotheses for chi-square test of goodness of fit?
ho: states a claim about single variable in the population of interest ha: states that the categorical variable does not have the claimed distribution
what are the null and alternative hypotheses for the chi-square test of independence?
ho: there is no association between (blank) and (blank) in population of (blank) ha: there is an association between (blank) and (blank) in the population of (blank)
what is a residual?
how far a data point is from the least square regression line -the difference between an observed value and the predicted value of the response variable
sampling error (margin of error)
how far we can expect the estimate to be from the true value ex: a measure of the accuracy of an opinion poll -measures variability
what is the margin of error
how far we can expect the sample statistic to vary from the population parameter
when to use nCr...
in calc, type in sample size of your data (n) then go to math: prb: scroll to nCr, click it: then type in your number of successes (k) use when you have multiple events that are independent
when making a claim about a study or experiment that utilizes matched pairs for a significance test use the formula...
in calculator: T-Test on the list of paired differences (tests the mean difference) -use when the data come in matched pairs, so the values have to stay paired in order or it becomes messed up
where is the point estimate of a confidence interval
in the middle
what is sampling distribution
it describes all possible values of a statistic and how often they happen, a graph of statistics taken from multiple samples
4 principles of a good experiment: Control
keep other variables that might affect the response variable constant in all groups ex: if you are testing a new type of hair conditioner, make sure the subjects all use the same shampoo, shower the same amount of times, have a similar type of hair, make sure the new conditioner is the only thing that could change the subjects hair and not another variable
how to find range
largest data point- smallest data point -range is not resistant to the effect of outliers
expected value
long term average based on all the probabilities mean value
use Normalcdf when...
looking for percentage/ probability
explanatory variable (x):
may help explain/ predict changes in response variable, independent variable
Binomial curve: CENTER can be found with
mean=np number of trials x probability of success= expected # of successes
response variable (y):
measures the outcome of a study, dependent variable
what are the two types of hypotheses used in significance tests?
null hypothesis (Ho) alternative hypothesis (Ha)
Non response bias
occurs when an individual chosen for the sample cannot be contacted or refuses to participate ex: if a school sends out a survey to teachers about vacation days and only 10 out of 30 teachers reply then because of the 20 teachers who did not reply there might be a non- response bias that causes the estimate to be under or over estimated
Under-coverage bias
occurs when members of the population cannot be chosen in a sample (you cannot gain responses from prisoners, the homeless, children, so you really cannot survey the entire population) ex: if you are surveying people's opinions about race affecting the criminal justice system and you are unable to survey prisoners of color in jail about their opinions, this is under-coverage which might affect whether your estimate is over or underestimated
when estimating a population mean and the population standard deviation is NOT known for a confidence interval use the formula...
one sample t interval xbar +/- t* (sample standard deviation/ square root of n) on calculator stat: test: T interval
when estimating a population mean and the population standard deviation is known for a confidence interval use this formula....
one sample z interval for sample mean xbar +/- z*(population standard deviation/ square root of n) on calculator: stat: test: Z interval
when estimating a population proportion for a confidence interval use this formula...
one sample z interval for sample proportion phat +/- z* (square root of phat(1-phat) / n) on calculator: stat: tests: 1-PropZInt
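A Python sketch of the one-proportion z interval, with `statistics.NormalDist().inv_cdf` standing in for InvNorm to get z*. The function name and the sample counts are assumptions for illustration.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence_level=0.95):
    phat = x / n
    # z* leaves (1 - confidence level)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin_of_error = z_star * math.sqrt(phat * (1 - phat) / n)
    return phat - margin_of_error, phat + margin_of_error

low, high = one_prop_z_interval(x=60, n=100)
print(round(low, 3), round(high, 3))  # about 0.504 and 0.696
```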
type one and type two error always go in the (blank) direction
opposite
when running a chi-square test what 3 things must you report?
p-value test used df
after running a t significance test what are three things you should report....
p-value found with formula reject/ fail to reject ho df
after running a z significance test what are two things you should report...
p-value found with formula reject/ fail to reject ho
if a sample proportion is not given, assume...
p= 0.5, assumes greatest margin of error
blind study
patients (experimental units) do not know which group they are in (control group or experimental group)
placebo effect
perceived effect on a subject when in reality they did not receive the treatment
what is a parameter
population mean, standard deviation, and proportion describes the population
how is power calculated?
power= 1- P(type II error) =1- fancy B
what is the r^2 value (coefficient of determination)
r^2= 1- [summation of (yi - yhat)^2 / summation of (yi - ybar)^2] -the proportion of the variation in y that is accounted for by the least squares regression line (the linear relationship with x), usually given as a percent
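A minimal Python sketch of the r^2 formula. It assumes the predicted values come from the least squares regression line; here the made-up points (1,2), (2,4), (3,5), (4,9) give yhat = -0.5 + 2.2x.

```python
def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of (yi - yhat)^2
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                  # sum of (yi - ybar)^2
    return 1 - ss_residual / ss_total

y = [2, 4, 5, 9]               # hypothetical observed values
y_hat = [1.7, 3.9, 6.1, 8.3]   # predictions from yhat = -0.5 + 2.2x
print(round(r_squared(y, y_hat), 3))  # about 0.931
```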
conditions for chi-square tests:
random independent/ 10%: n < 10% of population large counts- all expected counts must be larger than 5, find all the expected counts and check
4 principles of a good experiment: Replication
repeat an experiment over and over to make sure response variable is constant -use enough experimental units in each group that any differences in results/ effects of the treatment can be distinguished from chance differences between the groups
double-blind study
neither the subjects nor the researchers who interact with them know which group is control or experimental -prevents bias in measuring results
how do you calculate a residual?
residual= observed y- predicted y = y- yhat (plug the x-value into the least square regression line and then subtract it from the y-value)
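The residual calculation as a short Python sketch. The line yhat = -0.5 + 2.2x and the data point are hypothetical.

```python
def predict(a, b, x):
    # yhat = a + bx
    return a + b * x

def residual(a, b, x, y):
    # observed y minus predicted y
    return y - predict(a, b, x)

# Point (3, 5) with the line yhat = -0.5 + 2.2x
print(round(residual(-0.5, 2.2, 3, 5), 2))  # -1.1, the point is below the line
```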
for homogeneity and independence chi-square tests what formula do you use to calculate expected value?
row total x column total / table total
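A Python sketch of the expected-count formula for a two-way table. The observed table is made up, and the function name is illustrative.

```python
def expected_counts(observed):
    # observed is a list of rows; each expected count is
    # (row total x column total) / table total
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

observed = [[10, 20],
            [30, 40]]
print(expected_counts(observed))  # [[12.0, 18.0], [28.0, 42.0]]
```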
type one error and power always go in the (blank) direction
same
what is a statistic
sample mean, standard deviation, proportion describes the samples
Convenience sample
sampled based on convenience (subjects availability, reach, etc) pros: easy and quick cons: bad bad results
Randomized block design ("blocking")
separating experimental units into groups (blocks) based off something they have in common and randomly dividing the members of each block to each treatment so you are comparing each block to the other subjects in its own block pro: accounts for variation in experimental units con: you've influenced it more than a completely randomized design by accounting more for variability
geometric distribution shape is always
skewed right as you continue the probability of having a success gets less and less
if you adjust sample sizes of a confidence interval it changes by the...
square root of the amount
Binomial curve: SPREAD can be found with
standard deviation= square root of np(1-p)
how to calculate the standard deviation of binomial curve
standard deviation= square root of (n)(p)(1-p)
what formula can you use to calculate the spread (standard deviation) of a discrete random variable by hand?
variance (st. dev^2)= sum of [(x - mean)^2 x P(X=x)] over all values of x standard deviation= square root of the variance
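A by-hand style calculation in Python of the mean and standard deviation of a discrete random variable. The probability model below is invented for the example.

```python
import math

def expected_value(values, probs):
    # Sum of each value times its probability
    return sum(x * p for x, p in zip(values, probs))

def rv_standard_deviation(values, probs):
    mu = expected_value(values, probs)
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return math.sqrt(variance)

values = [0, 1, 2]           # hypothetical outcomes
probs = [0.25, 0.5, 0.25]    # probabilities must add to 1
print(expected_value(values, probs))                   # 1.0
print(round(rv_standard_deviation(values, probs), 4))  # about 0.7071
```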
formula for confidence interval:
statistic +/- critical value x standard deviation of the statistic statistic: mean or proportion critical value: z* or t* standard deviation: use formula sheet; when it is estimated from sample data it is called the standard error
sample
subgroup of individuals that we want to take data from, represent the population -measured with a survey, experimenting etc -this is used instead of population more often because of convenience, money, time
loaded questions
systematic pattern of inaccurate answers in a survey caused by poor wording of questions, order of questions, anything that can create a bias in the subjects response
significance test for linear regression test statistic is
t= (b - hypothesized slope)/ SEb, the hypothesized slope is usually 0; b and SEb can be found in calculator output
when and why do you use a chi-square test for independence?
testing if variables have an association or not, if they have no association they are independent but if they have an association they are not independent (are we related to each other?) -1 sample, 2 variables use X^2-Test in calculator (observed counts go in matrix [A]; the calculator computes the expected counts and stores them in matrix [B])
what is a z-score
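the number of standard deviations a data point is away from the mean -can be calculated for any distribution, but only use the normal curve (normalcdf) to turn it into a probability when the data is approximately normal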
what happens to the margin of error (thus the WIDTH of the confidence interval) if we decrease the confidence level
the decreased confidence level decreases the margin of error (narrower interval)
what happens to the margin of error (thus the WIDTH of the confidence interval) if we decrease the sample size
the decreased sample size will increase the margin of error
population
the entire group of individuals we want information about -measured with a census
generalizability
the extent to which the results of a sample (or experimental group) can be applied to a certain population
what happens to the margin of error (thus the WIDTH of the confidence interval) if we increase the confidence level
the increased confidence level will increase the margin of error (wider interval)
what happens to the margin of error (thus the WIDTH of the confidence interval) if we increase the sample size
the increased sample size decreases the margin of error
experimental units (subjects when human)
the physical entities which can be assigned, at random, to a treatment
how do you interpret slope for the least-square regression line
the predicted (y) increases/decreases by (slope) for each 1 (unit) increase in (x)
what is the definition of power?
the probability that the test will reject Ho at a chosen significance level when the Ha is true
how can a small sample size affect the validity of the sample? (this is related to sampling error)
the value would be less precise with a greater margin of error, we can expect the estimate from this smaller sample to be further from the true value than a larger sample would be
what is a discrete random variable
there are gaps between the possible values, usually something you count, takes a countable list of values 1, 2, 3, 4, 5, ......
Type two error and power always go in the (blank) direction
this error and power always go in opposite directions
what is a type II error?
this error occurs when you fail to reject the Ho, but in reality you should have rejected the Ho
what is a type I error?
this error occurs when you reject the Ho, but in reality you should have failed to reject the Ho
what is a proportion
this is a section of the population or sample
what is a mean
this is the average of the data in that population or sample
what is a continuous random variable
this random variable can take any number of values, no gaps between the numbers 1.01, 1.02, 1.03.....45.847923874 etc
power decreases when...
type one error decreases, and type 2 error increases
what do you say in STATE for a significance test
we will use a (blank) test to test the following hypotheses (Ho vs Ha) at the (blank)=alpha level Ho= no change/ no difference Ha= there is change/ there is a difference
false answers
when people do not tell the truth on surveys causing the overall results to be under or over estimated
when do we use a t-distribution?
when we have to use s (sample standard deviation) to estimate population standard deviation
what is the Law of Large numbers
when we observe more and more repetitions of any chance process, the proportion of times that specific outcome occurs begins to approach a single value
when and why do you use a Chi-square goodness of fit?
when you need to compare the observed distribution of ONE categorical variable in ONE population to a hypothesized (expected) one expected counts= (sample size)(hypothesized proportion for that category) -sample distribution to population distribution ( are you my son/ daughter?) use X^2-GOF-test in calculator
z-score equation
(x - mean)/ standard deviation
what is the formula for the least-square regression line (line of best fit)
yhat=a + bx a= y-intercept b= slope yhat= predicted value of y for any given value of x
binomial curve: SHAPE approaches normality if...
you can expect at least 5 success and 5 failures
how to calculate margin of error for confidence interval
critical value x standard error, for a proportion: z*(square root of phat(1-phat) / n)