STATS FINAL SEM TWO MAJOR STUDY SET


how to analyze spread in a distribution:

use standard deviation or variance (with the mean), or the IQR from the lower and upper quartiles (with the median)

what is extrapolation

using the least-squares regression line to predict values outside the range of x-values in the data used to fit the line -these predictions are often unreliable

how are variance and standard deviation related?

variance is the standard deviation squared; standard deviation is the square root of the variance Sx= square root of [ (1/(n-1)) x sum of (xi- xbar)^2 ]
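
A minimal Python sketch of this relationship, using only the standard library. The data list is made up for illustration; the point is that the sample standard deviation equals the square root of the sample variance, and both divide by n-1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, for illustration only

sample_variance = statistics.variance(data)  # divides by n - 1
sample_sd = statistics.stdev(data)           # square root of the sample variance

# The same variance computed straight from the formula
x_bar = sum(data) / len(data)
by_hand_variance = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

print(sample_variance, sample_sd, math.sqrt(by_hand_variance))
```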

when analyzing events with multiple outcomes, what visual aide will be the most beneficial?

Venn diagram

how do we interpret a confidence INTERVAL?

we are # % confident that the interval from minimum to maximum contains the true parameter (in context)

how to analyze shape in a distribution:

Normal distribution shape: mean = median; skewed left shape: median > mean; skewed right shape: mean > median; uniform shape: mean = median; bimodal shape: two peaks (mean = median only if the distribution is symmetric)

to analyze a normal distribution, and find a probability of a sample statistic occurring, given an assumed population mean and standard deviation with the function in the calculator.....

normalcdf (lower, upper, mean, standard deviation)

finding p-values using t-distributions....

2nd vars (distribution): 6: tcdf: lower, upper, df

empirical rule

68-95-99.7 for normal distributions
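
A quick check of the 68-95-99.7 rule, sketched with Python's standard library. The helper name `area_within` is made up for this example; it returns the area under the standard normal curve within k standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```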

how to calculate mean of binomial curve

(n)(p)= mean

chi-square test statistic is....(and it is the same for alllll tests)

sum of (observed-expected)^2 / expected, added up over every category or cell
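
A small sketch of the chi-square statistic in Python. The observed and expected counts are invented for illustration; each category contributes (observed-expected)^2 / expected, and the contributions are added up.

```python
def chi_square_statistic(observed, expected):
    """Add up (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20]  # made-up observed counts
expected = [20, 20, 20]  # made-up expected counts
chi_sq = chi_square_statistic(observed, expected)
print(chi_sq)
```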

Stratified Random Sample

(only for observational study) individuals are classified into groups by shared characteristic (strata) , then sampled within the groups -Pros: balances representation of each type of individual, improves results because of equal representation, reduces variability of possible sample results -Cons: complex, resource demanding

how do you calculate the probability of an outcome

(p)= # of corresponding outcomes to event A/ total # of outcomes in sample space

how do you interpret the y-intercept for the least-square regression line

when (x) is 0, the predicted value of (y) is (y-intercept)

Conditions: Large counts (sample size)

-ensures the sampling distribution is approximately normal, so it is appropriate for inference (center, spread, shape) how to meet condition for proportion: (n)(phat) >= 10 and (n)(1-phat) >= 10 how to meet condition for means: n >= 30 (central limit theorem); for smaller samples (t procedures), a graph of the sample data should show no strong skew or outliers IF THE POPULATION HAS AN approximately normal distribution, this condition can be considered 'met' REGARDLESS of sample size

Conditions: Independent

-how to meet condition for proportion and mean (samples and observational studies): n < 10% of the population -when sampling without replacement, this lets us treat the observations as independent

Conditions: random

-how to meet condition for proportion and mean: problem will say if the sample is randomly selected or not - this ensures that the sample accounts for variability and is a good equal representation for the population

how can power be increased?

-increasing alpha= makes it easier to declare the evidence convincing -increase sample size -use a better experimental design (stratifying or blocking possibly)

how to analyze/ find center in a distribution:

-mean (add up all data points/ # of data points) mean is NOT resistant to the effect of outliers -median (the middle of the data; pair it with the IQR to describe spread) median IS RESISTANT to the effects of outliers -it is best to analyze with mean unless the data is skewed or has outliers, at which point median should be used

for continuous random variables what is the probability of getting exactly one given outcome

0

when testing a claim about population mean and the population standard deviation is NOT known for a significance test use the formula...

1 sample t test formula: t= (xbar-population mean) / (sample standard deviation / square root of n)
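
The formula above can be sketched in Python as follows. The function name and the sample numbers are hypothetical; the p-value would still come from tcdf with df = n-1.

```python
import math

def one_sample_t(x_bar, mu_0, s, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Made-up example: xbar = 52, hypothesized mean = 50, s = 4, n = 16
t_stat = one_sample_t(x_bar=52.0, mu_0=50.0, s=4.0, n=16)
df = 16 - 1
print(t_stat, df)
```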

when testing a claim about a population mean and the population standard deviation is known for a significance test use the formula...

1 sample z test for mean formula: z= (xbar- hypothesized population mean)/ (population standard deviation / square root of n)

when testing a claim about a population proportion for a significance test use the formula...

1 sample z test for proportion in calculator: stat: tests: 1 PropZtest formula: z= (phat-po) / (square root of (po)(1-po)/n)
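
A hedged sketch of the same calculation in Python, with made-up numbers. `one_prop_z` is a name invented here; the one-sided p-value uses the standard normal curve from the standard library (this assumes Ha is "greater than").

```python
import math
from statistics import NormalDist

def one_prop_z(p_hat, p_0, n):
    """z = (phat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Made-up example: 56 successes out of 100, testing Ho: p = 0.50 vs Ha: p > 0.50
z = one_prop_z(p_hat=0.56, p_0=0.50, n=100)
p_value_upper = 1 - NormalDist().cdf(z)  # area to the right of z
print(round(z, 3), round(p_value_upper, 4))
```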

how to find t*

(1- confidence level)/ 2 = a InvT: area= a df= n-1 calculate, then take the positive value = t*

how to find Z*

(1- confidence level)/ 2 = a InvNorm: area= a mean=0 standard deviation= 1 calculate, then take the positive value = z*
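
The InvNorm steps can be mirrored in Python's standard library, as a rough sketch. `z_star` is a helper name made up for this example; the absolute value is taken because the lower-tail area gives a negative z.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """Critical value z* for a given confidence level (e.g. 0.95)."""
    tail_area = (1 - confidence_level) / 2
    return abs(NormalDist(mu=0, sigma=1).inv_cdf(tail_area))

for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))
```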

the expected value (mean) of a geometric random variable

1/p

when estimating the difference between two population means and the population standard deviations are NOT known for a confidence interval use the formula...

2 sample t interval for difference in population means (xbar1-xbar2) +/- t* (square root of [1ST sample standard deviation^2 / n1 + 2ND sample standard deviation^2 / n2]) on calc: stat: tests: 2-SampTInt

when testing a claim about the difference between two population means and the population standard deviations are not known for a significance test use the formula...

2 sample t test calculator: 2-SampTTest t= [(xbar1-xbar2)-(pop mean1- pop mean2)] / (square root of [1ST sample standard deviation^2/ n1 + 2ND sample standard deviation^2/ n2])

when testing a claim about the difference between two population means and the population standard deviations are known for a significance test use the formula..

2 sample z test in calculator: 2-SampZTest z= [(xbar1-xbar2) -(pop mean1- pop mean2)] / (square root of [1ST pop standard deviation^2 / n1 + 2ND pop standard deviation^2 / n2])

when estimating the difference between two population proportions for a confidence interval use this formula

2 sample z interval for difference in population proportions (phat1-phat2) +/- z* (square root of (phat1)(1-phat1)/ (n1) + (phat2)(1-phat2)/ (n2) ) on calculator: stat: tests: 2-PropZInt

when testing a claim about the difference between two population proportions for a significance test use the formula...

2 sample z test in calculator: 2-PropZTest formula: YOU HAVE TO POOL Pc= (X1+X2)/ (N1+N2) z= [(phat1-phat2)-0] / (square root of [Pc(1-Pc) / n1 + Pc(1-Pc) / n2])
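
A minimal sketch of the pooled calculation in Python. The counts are made up and `two_prop_z` is a hypothetical helper name; it assumes Ho says the two proportions are equal (difference of 0).

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for Ho: p1 - p2 = 0."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled proportion
    standard_error = math.sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return (p_hat1 - p_hat2) / standard_error

# Made-up example: 60 of 100 successes in group 1, 40 of 100 in group 2
z = two_prop_z(x1=60, n1=100, x2=40, n2=100)
print(round(z, 4))
```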

when estimating the difference between two population means and the population standard deviation is known for a confidence interval use the formula...

2 sample z interval for difference in population means (xbar1 - xbar2) +/- z* (square root of [FIRST pop standard deviation^2/ n1 + 2ND pop standard deviation^2/ n2]) on calculator stat: tests: 2-SampZInt

when do we use Chi-squared tests? what do chi-square tests allow us to measure?

chi-square tests are for categorical data; they compare the observed counts in each category to the counts we would expect

Simple Random Sample (SRS)

A sample of size (n) is chosen in such a way (random selection) that every group of (n) individuals in the population has an equal chance to be selected as the sample -randInt can be used on the calc for random sampling, numbers/names in a hat, using table D, etc -Pros: fair, not too hard with adequate data on respondents -cons: difficult if you do not already have population information, can cause undercoverage

what is an event in probability

Any collection of outcomes from some chance process -a subset of a sample space ex: flipping a coin twice {HH, HT, TH, TT}- 4 possible outcomes inside the brackets -the event "at least one heads" = {HH, HT, TH}, so P(at least one heads) = 3/4, because 3 out of 4 equally likely outcomes include an H -probability model: each outcome has a 1/4 probability of happening

Using Binomial distributions: BINS

B: Binary- the possible outcomes are classified as "success" or "failure" I: Independent- knowing results of one trial does not help you predict another N: Number- the number of trials of the chance process is set in advance S: same- same probability of success on each trial

For a linear regression, the confidence interval statistic becomes

CI= b+/- t*(SEb) SE Coef is found in calculator output (little box with numbers in it) t* is in calculator

DOFS

D-direction (positive, negative, no association: r-value tells us about the association between x and y) O-outliers (may alter the equation or the regression line or line of best fit) F-Form/shape (linear, non-linear, curved, clusters: tells us if correlation is linear or not) S-strength (weak, moderate, strong correlation: measures the strength of correlation)

what is the central limit theorem

Draw a simple random sample of size n from any population with mean and standard deviation; when the sample size is LARGE, the sampling distribution of the sample mean is approximately normal - n >= 30

how to handle curved data (linear transformations): exponential regression

EQUATION FORMAT: yhat= ab^x which means log(y)= log(a)+ x log(b) OR ln(y)= ln(a)+ x ln(b) TRANSFORMATIONS: x ---> x, y ---> log(y) or ln(y)

how to handle curved data (linear transformations): power regression

EQUATION FORMAT: yhat= ax^p which means log(y)= log(a)+ p log(x) OR ln(y)= ln(a)+ p ln(x) TRANSFORMATIONS: x ---> log(x) or ln(x), y ---> log(y) or ln(y)

what is the alternative hypothesis and what are the three different types of hypotheses you could have? (answer is slightly different for 1 sample and 2 sample tests)

Ha: what we are trying to find evidence for 3 different types: ha >, ha <, ha does not equal 1 sample: the population parameter is NOT the same as the hypothesized value 2 sample: the parameter of one population is not the same as the parameter of the other population

significance tests for linear regression: hypotheses are

Ho: assume there is no linear relationship between the variables (this means slope=0) Ha: slope can be <, >, or not equal to 0

what is a null hypothesis (Ho) and what does the null hypothesis always assume to be true (answer slightly different for 1 sample and 2 sample tests)

Ho: the claim we weigh evidence against, ALWAYS assumes there is no difference 1 sample: Ho assumes no difference between the population parameter and the hypothesized value 2 sample: Ho assumes no difference between the parameters of the two populations

what are null and alternative hypotheses for chi-square test of homogeneity?

Ho: there is no difference in the true distribution of (blank) and (blank) ha: there is a difference in true distributions of (blank) and (blank)

conditions for significance tests and confidence intervals for linear regression: LINER

L: LINEAR- scatterplot is approximately linear, no curve, no pattern in the residuals I: INDEPENDENT- (use n<10% of pop) N: NORMAL- check for skew/ outliers in a graph of the residuals E: EQUAL SD- residuals have roughly the same amount of scatter for all x-values (no fanning in the residual plot) R: RANDOM- data come from a random sample or randomized experiment

Use the calculator LinRegTInt for.....

A confidence interval for the slope of the least-squares regression line

If X has a binomial distribution with parameters of n and p, then use the formulas...

MEAN OF A SAMPLE PROPORTION (phat)= population proportion STANDARD DEVIATION OF A SAMPLE PROPORTION: square root of population proportion( 1- population proportion) / sample size

If sample mean (xbar) is the mean of a random sample of size n from an infinite population with a mean and standard deviation then use the formulas....

MEAN OF THE SAMPLING DISTRIBUTION OF xbar: population mean STANDARD DEVIATION OF THE SAMPLING DISTRIBUTION OF xbar: population standard deviation/ square root of sample size

Voluntary response bias

Only asking for volunteers to participate in a survey, causing bias ex: doing a survey about whether the school should spend money on a football team or a dance team, where anyone who volunteers can take it; the survey might become biased because football players, dance members, or people who hate one of the two are the ones most likely to volunteer, so the estimate could be under or overestimated

how do you calculate the conditional probability of a given event? what is the formula?

P( A given B) = P(A and B)/ P(B)

what formula do you use when calculating the probability of getting more than one outcome for a given event? this is an OR STATEMENT

P( A or B) = P(A)+ P(B) - P(A and B)

what formula do you use when calculating the probability of multiple events all happening

P(A and B) = P(A) x P(B) IF A and B ARE INDEPENDENT IF THEY ARE NOT INDEPENDENT USE P(A and B)= P(A) x P(B given A) P(A and B)= P(B) x P(A given B)

what is conditional probability?

P(A given B) = probability of event occurring/ probability of given

GeometricCDF is used for...

P(x <= K), the probability that the first success will happen before or at the Kth trial

BinomialCDF finds...

P(x <= k) (n,p,k)

1- geometricCDF is used for...

P(x > K)

1-binomialcdf finds...

P(x > k)

BinomialPDF finds...

P(x=k) (n,p,k) n= # of trials k= # of successes p= probability of success

GeometricPDF is used for...

P(x=k) , the probability that the first success will happen at the Kth trial
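
The calculator functions on these cards can be imitated in plain Python, as a sketch for checking answers by hand. The function names below are made up to echo the calculator commands; they are not a real library API.

```python
from math import comb

def binom_pdf(n, p, k):
    """P(X = k): exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(X <= k): k or fewer successes in n trials."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

def geomet_pdf(p, k):
    """P(X = k): first success happens on trial k."""
    return (1 - p) ** (k - 1) * p

def geomet_cdf(p, k):
    """P(X <= k): first success happens on or before trial k."""
    return 1 - (1 - p) ** k

print(binom_pdf(4, 0.5, 2), binom_cdf(4, 0.5, 2))
print(geomet_pdf(0.5, 3), geomet_cdf(0.5, 3))
```

Subtracting either cdf from 1 gives the "more than k" probability, P(X > k).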

Interquartile range

Q3-Q1= IQR it is an outlier if it is SMALLER than Q1-(1.5)(IQR) it is an outlier if it is BIGGER than Q3 + (1.5)(IQR)
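
The 1.5 x IQR rule can be sketched in a few lines of Python. The quartile values are made up and the function names are only for illustration.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """True if value is below the low fence or above the high fence."""
    low, high = outlier_fences(q1, q3)
    return value < low or value > high

# Made-up quartiles: Q1 = 10, Q3 = 20, so IQR = 10
low_fence, high_fence = outlier_fences(q1=10, q3=20)
print(low_fence, high_fence, is_outlier(40, 10, 20))
```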

How to design a method...

Random number generator is your friend! 1) assign each (unit, subject, etc) to a different number between (blank) and (blank) 2) describe how you will implement the sampling method you want to use 3) Randomly select (blank) numbers, ignoring repeats, and include the (unit,subject, etc) that corresponds with those numbers in your sample

what is a sample space in probability

the sample space S of a chance process is the set of all possible outcomes ex: flipping a coin ONCE {H, T}= between brackets represents sample space, flipping a coin once means you can only get a heads or tails

SOCS (use when analyzing a distribution of data)

S: shape O: outliers C: center S: spread

what is the four-step process of statistical inference for a confidence interval

STATE: we will use a (blank) interval to estimate, with (blank) % confidence the true (mean/proportion) of (context). PLAN: type of interval, check conditions: Large counts/ sample size, random, and independence DO: conduct type of interval, find interval CONCLUDE: we are (blank)% confident that the interval from minimum to maximum contains the true parameter in context

what are independent events

Two events (A and B) are independent if the occurrence of one event does not change the probability that the other event will happen -they can happen at the same time, but knowing the outcome of one will not help you predict the outcome of the other P(A given B)= P(A) P(B given A) =P(B) means A and B are independent

what to say in STATE for a confidence interval:

We will use a (blank) interval to estimate, with (blank) % confidence, the true (mean/proportion) of (context)

Just because X and Y are correlated that doesn't mean...

X causes Y BECAUSE correlation does not imply causation

Matched pairs design

a common type of randomized block design for comparing two treatments, each block consists of a matching pair, or similar experimental units chance is used to determine which unit in the pair gets the treatment - sometimes- single experimental unit gets both treatments, so you are comparing the unit to itself, accounting for more variability pro: can compare a unit to itself con: order of treatment can sometimes affect results

Geometric distribution is

a discrete probability distribution for the number of trials it takes to get the first success; unlike the binomial, the number of trials is not set in advance

what is a probability model

a description of some chance process that consists of two parts: 1) a sample space 2) a probability for each outcome

bias

a design of a study is considered this if it consistently under or overestimates the value you want -it can be anything in a study that causes a sample to not be representative of the population of interest

sampling bias

a design that consistently under or overestimates the value -a non random sample, all individuals or subjects were not equally likely to be selected for the sample, causing bias

confounding

a factor other than the independent variable that might produce an effect in an experiment (pretty much all factors can confound experiments unless controlled)

what is sample distribution

a graph of data taken from one sample

treatment

a specific condition applied to the individuals in an experiment -if an experiment has several explanatory variables, this is a combination of specific values of these variables

when analyzing a series of multiple events, each with multiple possible outcomes, what visual aide will be helpful

a two way table

when and why do you use a chi-square test for homogeneity?

comparing distributions of a categorical variable in 2 or more populations -observed distribution to observed distribution (are we brother/sister) use X^2-Test in calculator (put observed counts in matrix [A]; the calculator stores the expected counts in matrix [B])

4 principles of a good experiment: Comparison

use a design that compares two or more treatments -control group vs treatment group -placebo vs drug

what variables are used to represent the probability that Type I error and Type II error, respectively will happen

alpha= P(type I error) beta (fancy B)= P(type II error)

if no alpha for a significance test is given use...

alpha=0.05

completely randomized design

an experimental design in which the experimental units are assigned to the treatments completely at random pros: should reduce bias by a lot con: more variability in results because it does not account for other variables (no blocking)

probability situation: multiple outcomes, mutually exclusive

answer: add probabilities P(A U B)= P(A) + P(B) also means P(A or B)= P(A) + P(B) REMEMBER P(A and B)= 0 for mutually exclusive events, they cannot happen at the same time

probability situation: multiple outcomes- NOT mutually exclusive

answer: add probabilities but subtract the overlap (if using venn diagram just add up the 3 sections in the diagram) formula: P(A U B) = P(A) + P(B) - P(A and B)

probability situation: multiple events- dependent

answer: multiply probabilities, account for the change in probability with each trial, account for combinations (nCr) formula: nCr (n over k) x Pevent1 x Pevent2 x Pevent3 .....etc, etc REMEMBER THESE PROBABILITIES CHANGE

probability situation: multiple events- independent

answer: multiply probabilities, and account for COMBINATIONS in which these events can occur (nCr) formula: nCr (n over k) x (Psuccess)^# of successes x (Pfail)^# of fails

probability situation: conditional probability (A given B)

answer: probability of both events/ probability of the given event P(A | B) = P(A and B)/ P(B) or P(B | A)= P(A and B)/ P(A)

probability situation: probability of "at least one"

answer: the opposite of "none" 1-P(0)

In an Experimental study you...

apply a treatment

4 principles of a good experiment: Random Assignment

assigning participants to experimental and control groups by chance, this helps create roughly equivalent groups of experimental units by balancing the effects of other variables (can be done with random number generator, names in hats, etc)

how do you interpret a p-value, what does the p-value mean?

assuming Ho (null hypothesis) is true in context, there is a (blank-whatever p-value is) probability of getting a sample result as extreme as or more extreme than the one observed, purely by chance.

how do you interpret the results of a test for which the p-value is greater than alpha?

because the p-value is greater than the significance level (0.05), we fail to reject the Ho. we do not have enough convincing evidence in support of Ha in context.

how do you interpret the results of a test for which the p-value is less than alpha?

because the p-value is less than the significance level (0.05), we can reject the Ho. there is convincing evidence in support of Ha in context.

why can two events that are mutually exclusive NEVER be independent?

because if A and B are mutually exclusive, they cannot both happen at the same time, so knowing that A occurred tells you B cannot happen: P(B given A)= 0, which is not equal to P(B) (as long as P(B) > 0), so the events cannot be independent.

how to calculate df for Chi-square test of GOF

df= # of categories- 1

how to calculate df for chi-square test of homogeneity and independence

df= (# of rows-1)(# of columns-1)

df for a linear regression confidence interval is

df= n-2

how do we calculate the degrees of freedom of a t-distribution?

df=n-1

definition of the margin of error

distance between the sample statistic and the ends of the confidence interval

In an Observational study you...

do not apply treatment, you use a survey or observe

power increases when...

type one error increases, and type two error decreases

transforming and combining a random variable changes variable distribution: multiplying/ dividing by a CONSTANT number

effect on center/ mean: multiplied or divided by the same constant effect on spread: standard deviation is multiplied or divided by the (absolute value of the) constant

transforming and combining a random variable changes variable distribution: adding/ subtracting a CONSTANT number

effect on center/ mean: goes up or down by the constant that is added or subtracted effect on spread: NONE

transforming and combining a random variable changes variable distribution: combining (adding or subtracting two random variables to each other)

effect on center: mean of (X+Y)= meanx + meany, mean of (X-Y)= meanx - meany effect on spread (X and Y independent): variances ALWAYS add, standard deviation^2= st.dev^2x + st.dev^2y
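
A short Python sketch of combining two random variables, assuming X and Y are independent. The means and standard deviations are made up, and `combine_independent` is a hypothetical helper. Note that the variances add even when the variables are subtracted.

```python
import math

def combine_independent(mean_x, sd_x, mean_y, sd_y, subtract=False):
    """Mean and standard deviation of X + Y (or X - Y) for independent X and Y."""
    mean = mean_x - mean_y if subtract else mean_x + mean_y
    sd = math.sqrt(sd_x ** 2 + sd_y ** 2)  # variances add in both cases
    return mean, sd

# Made-up example: X has mean 10, sd 3; Y has mean 4, sd 4
sum_mean, sum_sd = combine_independent(10, 3, 4, 4)
diff_mean, diff_sd = combine_independent(10, 3, 4, 4, subtract=True)
print(sum_mean, sum_sd, diff_mean, diff_sd)
```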

what are mutually exclusive outcomes

events that cannot occur at the same time and have no outcomes in common P(event A and event B)= 0 ex: possibility of you passing and failing the test at the same time

how do you calculate the expected value of a discrete random variable

expected value (x)= sum of (xi)(pi) ex: 45(.34) + 44(.33) etc etc
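
As a rough illustration in Python, with a made-up probability distribution (the values and probabilities below are not from the card's example): multiply each value by its probability and add the products.

```python
values = [0, 1, 2]           # made-up outcomes
probs = [0.25, 0.50, 0.25]   # their probabilities, which add to 1

expected_value = sum(x * p for x, p in zip(values, probs))
print(expected_value)
```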

How do we interpret a confidence LEVEL?

for example= 95% confidence level If we were to take many samples of sample size n and calculate many intervals, about 95% of the intervals will capture the parameter (in context)

use InvNorm when...

given a percentage or probability (area) and you need the value that goes with it

if dealing with t-distribution and your sample size is NOT 30 or more, what other method can you use to check for normality?

graph the data and look for outliers or skew to determine if normal or not

control group

group in experiment that does not receive the treatment, used for comparison

cluster sample

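population is divided into groups (clusters), often based on location of individuals; clusters are chosen at random and everyone in the chosen clusters is sampled Pros: convenient while somewhat precise cons: not as precise as SRS or strata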

what information does the residual plot give you?

helps determine if data is linear and appropriate to be modeled by a least square regression line

what does a confidence interval allow us to do?

gives a range of plausible values for whatever we are trying to find; it lets us use a statistic to estimate the true parameter with an interval

what are the null and alternative hypotheses for chi-square test of goodness of fit?

ho: the categorical variable has the claimed distribution in the population of interest ha: the categorical variable does not have the claimed distribution

what are the null and alternative hypotheses for the chi-square test of independence?

ho: there is no association between (blank) and (blank) in population of (blank) ha: there is an association between (blank) and (blank) in the population of (blank)

what is a residual?

how far a data point is from the least square regression line -the difference between an observed value of the response variable and the value predicted by the regression line

sampling error (margin of error)

how far we can expect the estimate to be from the true value ex: a measure of the accuracy of an opinion poll -measures variability

what is the margin of error

how far we can expect the sample statistic to vary from the population parameter

when to use nCr...

in calc, type in sample size of your data (n) then go to math: prb: scroll to nCr, click it: then type in your number of successes (k) use when you have multiple events that are independent

when making a claim about a study or experiment that utilizes matched pairs for a significance test use the formula...

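in calculator: T-Test run on the list of differences (paired t test) formula: t= (mean of differences - 0) / (standard deviation of differences / square root of n) -use when each data value in one group is paired with a specific value in the other group, so the order of the data must be kept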

where is the point estimate of a confidence interval

in the middle

what is sampling distribution

it describes all possible values of a statistic and how often they happen, a graph of statistics taken from multiple samples

4 principles of a good experiment: Control

keep other variables that might affect the response variable constant in all groups ex: if you are testing a new type of hair conditioner, make sure the subjects all use the same shampoo, shower the same amount of times, have a similar type of hair, make sure the new conditioner is the only thing that could change the subjects hair and not another variable

how to find range

largest data point- smallest data point -range is not resistant to the effect of outliers (use with median)

expected value

long term average based on all the probabilities mean value

use Normalcdf when...

looking for percentage/ probability

explanatory variable (x):

may help explain/ predict changes in response variable, independent variable

Binomial curve: CENTER can be found with

mean=np number of trials x probability of success= expected # of successes

response variable (y):

measures the outcome of a study, dependent variable

what are the two types of hypotheses used in significance tests?

null hypothesis (Ho) alternative hypothesis (Ha)

Non response bias

occurs when an individual chosen for the sample cannot be contacted or refuses to participate ex: if a school sends out a survey to teachers about vacation days and only 10 out of 30 teachers reply then because of the 20 teachers who did not reply there might be a non- response bias that causes the estimate to be under or over estimated

Under-coverage bias

occurs when members of the population cannot be chosen in a sample (you cannot gain responses from prisoners, the homeless, children, so you really cannot survey the entire population) ex: if you are surveying people's opinions about race affecting the criminal justice system and you are unable to survey prisoners of color in jail about their opinions, this is under-coverage which might affect whether your estimate is over or underestimated

when estimating a population mean and the population standard deviation is NOT known for a confidence interval use the formula...

one sample t interval xbar +/- t* (sample standard deviation/ square root of n) on calculator stat: test: T interval

when estimating a population mean and the population standard deviation is known for a confidence interval use this formula....

one sample z interval for sample mean xbar +/- z*(population standard deviation/ square root of n) on calculator: stat: test: Z interval

when estimating a population proportion for a confidence interval use this formula...

one sample z interval for sample proportion phat +/- z* (square root of phat(1-phat) / n) on calculator: stat: tests: 1-PropZInt

type one and type two error always go in the (blank) direction

opposite

when running a chi-square test what 3 things must you report?

p-value, test used, df

after running a t significance test what are three things you should report....

p-value (found with formula), reject/ fail to reject ho, df

after running a z significance test what are two things you should report...

p-value (found with formula), reject/ fail to reject ho

if a sample proportion is not given, assume...

p= 0.5, assumes greatest margin of error

blind study

patients (experimental units) do not know which group they are in (control group or experimental group)

placebo effect

perceived effect on a subject when in reality they did not receive the treatment

what is a parameter

population mean, standard deviation, and proportion describes the population

how is power calculated?

power= 1- P(type II error) =1- beta (fancy B)

what is the r^2 value (coefficient of determination)

r^2= 1- summation of (yi- yhat)^2 / summation of (yi- ybar)^2 -the fraction of the variation in y that is accounted for by the least squares regression line, usually given as a percent
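
A minimal sketch of the r^2 calculation in Python. The observed and predicted y-values are made up; the numerator uses residuals from the regression line (yi - yhat) and the denominator uses deviations from the mean of y (yi - ybar).

```python
def r_squared(y_values, y_hats):
    """1 - (sum of squared residuals) / (sum of squared deviations from ybar)."""
    y_bar = sum(y_values) / len(y_values)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(y_values, y_hats))
    ss_total = sum((y - y_bar) ** 2 for y in y_values)
    return 1 - ss_residual / ss_total

observed_y = [1, 2, 3, 4]           # made-up observed values
predicted_y = [1.1, 1.9, 3.1, 3.9]  # made-up predictions from a regression line
r2 = r_squared(observed_y, predicted_y)
print(round(r2, 3))
```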

conditions for chi-square tests:

random; independent/ 10%: n < 10% of population; large counts- all expected counts must be at least 5, find all the expected counts and check

4 principles of a good experiment: Replication

repeat an experiment over and over to make sure response variable is constant -use enough experimental units in each group that any differences in results/ effects of the treatment can be distinguished from chance differences between the groups

double-blind study

neither the subjects nor the researchers who interact with them and measure the response know which group is control or experimental -prevents bias in measuring results

how do you calculate a residual?

residual= observed y- predicted y = y- yhat (plug the x-value into the least square regression line and then subtract it from the y-value)
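
Sketched in Python with a hypothetical regression line yhat = 2.0 + 2.5x (the intercept, slope, and data point are invented for the example):

```python
def residual(x, y, a, b):
    """Observed y minus predicted y, where predicted y = a + b * x."""
    y_hat = a + b * x
    return y - y_hat

# Made-up point (3, 10) and made-up line yhat = 2.0 + 2.5x
res = residual(x=3, y=10, a=2.0, b=2.5)
print(res)
```

A positive residual means the point sits above the line; a negative residual means it sits below.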

for homogeneity and independence chi-square tests what formula do you use to calculate expected value?

row total x column total / table total
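
The expected-count formula can be sketched for a whole two-way table in Python. The observed table is made up and `expected_counts` is a name chosen for this example.

```python
def expected_counts(observed):
    """Expected count for each cell = row total x column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

observed = [[10, 20],
            [30, 40]]  # made-up two-way table of counts
expected = expected_counts(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)(columns - 1)
print(expected, df)
```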

type one error and power always go in the (blank) direction

same

what is a statistic

sample mean, standard deviation, proportion describes the samples

Convenience sample

sampled based on convenience (subjects availability, reach, etc) pros: easy and quick cons: results are usually biased and not representative of the population

Randomized block design ("blocking")

separating experimental units into groups (blocks) based off something they have in common and randomly dividing the members of each block to each treatment so you are comparing each block to the other subjects in its own block pro: accounts for variation in experimental units con: you've influenced it more than a completely randomized design by accounting more for variability

geometric distribution shape is always

skewed right; the probability that the first success happens on a given trial gets smaller and smaller as the trial number goes up

if you adjust sample sizes of a confidence interval it changes by the...

square root of the amount

Binomial curve: SPREAD can be found with

standard deviation= square root of np(1-p)

how to calculate the standard deviation of binomial curve

standard deviation= square root of (n)(p)(1-p)

what formula can you use to calculate the spread (standard deviation) of a discrete random variable by hand?

variance (st. dev^2)= sum of (xi-mean)^2 x P(x=xi) standard deviation= square root of the variance
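
A by-hand style calculation sketched in Python, using a made-up probability distribution: find the mean first, then weight each squared deviation by its probability, add, and take the square root.

```python
import math

values = [0, 1, 2]           # made-up outcomes
probs = [0.25, 0.50, 0.25]   # their probabilities, which add to 1

mean = sum(x * p for x, p in zip(values, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
standard_deviation = math.sqrt(variance)
print(mean, variance, round(standard_deviation, 4))
```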

formula for confidence interval:

statistic +/- critical value x standard deviation of the statistic statistic: mean or proportion critical value: z* or t* standard deviation of the statistic: use formula sheet; when it is estimated from sample data it is called the standard error

sample

subgroup of individuals that we want to take data from, represent the population -measured with a survey, experimenting etc -this is used instead of population more often because of convenience, money, time

loaded questions

systematic pattern of inaccurate answers in a survey caused by poor wording of questions, order of questions, anything that can create a bias in the subjects response

significance test for linear regression test statistic is

t= (b- hypothesized slope)/ SEb, which is t= b/ SEb when Ho is slope=0 -b and SEb can be found in calculator output

when and why do you use a chi-square test for independence?

testing if variables have an association or not, if they have no association they are independent but if they have an association they are not independent (are we related to each other?) -1 sample, 2 variables use X^2-Test in calculator (put observed counts in matrix [A]; the calculator stores the expected counts in matrix [B])

what is a z-score

the number of standard deviations a data point is from the mean -can be computed for any data, but finding areas with the normal table or normalcdf ONLY works with normally distributed data

what happens to the margin of error (thus the WIDTH of the confidence interval) if we decrease the confidence level

the decreased confidence level decreases the margin of error (narrower interval)

what happens to the margin of error (thus the WIDTH of the confidence interval) if we decrease the sample size

the decreased sample size will increase the margin of error

population

the entire group of individuals we want information about -measured with a census

generalizability

the extent to which the results of a sample (or experimental group) can be applied to a certain population

what happens to the margin of error (thus the WIDTH of the confidence interval) if we increase the confidence level

the increased confidence level will increase the margin of error (wider interval)

what happens to the margin of error (thus the WIDTH of the confidence interval) if we increase the sample size

the increased sample size decreases the margin of error

experimental units (subjects when human)

the physical entities which can be assigned, at random, to a treatment

how do you interpret slope for the least-square regression line

the predicted (y) increases/decreases by (slope) for each 1 (unit) increase in (x)

what is the definition of power?

the probability that the test will reject Ho at a chosen significance level when the Ha is true

how can a small sample size affect the validity of the sample? (this is related to sampling error)

the value would be less precise with a greater margin of error, we can expect the estimate from this smaller sample to be further from the true value than a larger sample would be

what is a discrete random variable

takes a countable set of values with gaps between them, like counts 0, 1, 2, 3, 4, 5, ......

Type two error and power always go in the (blank) direction

this error and power always go in opposite directions

what is a type II error?

this error occurs when you fail to reject the Ho, but in reality you should have rejected the Ho

what is a type I error?

this error occurs when you reject the Ho, but in reality you should have failed to reject the Ho

what is a proportion

the fraction of the population or sample that has a certain characteristic

what is a mean

this is the average of the data in that population or sample

what is a continuous random variable

this random variable can take any number of values, no gaps between the numbers 1.01, 1.02, 1.03.....45.847923874 etc

power decreases when...

type one error decreases, and type 2 error increases

what do you say in STATE for a significance test

we will use a (blank) test to test the following hypotheses (Ho vs Ha) at the (blank)=alpha level Ho= no change/ no difference Ha= there is change/ there is a difference

false answers

when people do not tell the truth on surveys causing the overall results to be under or over estimated

when do we use a t-distribution?

when we have to use s (sample standard deviation) to estimate population standard deviation

what is the Law of Large numbers

when we observe more and more repetitions of any chance process, the proportion of times that specific outcome occurs begins to approach a single value

when and why do you use a Chi-square goodness of fit?

when you need to compare the observed distribution of one categorical variable from a single sample to a hypothesized (expected) one expected counts= (sample size)(hypothesized proportion for each category) -sample distribution to population distribution ( are you my son/ daughter?) use X^2-GOF-test in calculator

z-score equation

(x-mean)/ standard deviation

what is the formula for the least-square regression line (line of best fit)

yhat=a + bx a= y-intercept b= slope yhat= predicted value of y for any given value of x

binomial curve: SHAPE approaches normality if...

you can expect at least 10 successes and 10 failures: np >= 10 and n(1-p) >= 10

how to calculate margin of error for confidence interval

z*(square root of phat(1-phat) / n)
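
A hedged Python sketch of this margin of error for a one-sample proportion interval. The function name and numbers are made up; z* is taken from the standard normal curve in the standard library rather than from a table.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence_level=0.95):
    """z* x sqrt(phat(1 - phat) / n) for a one-sample proportion interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

moe_100 = margin_of_error(p_hat=0.5, n=100)
moe_400 = margin_of_error(p_hat=0.5, n=400)
print(round(moe_100, 4), round(moe_400, 4))
```

Comparing the two results shows the square-root effect of sample size: four times the sample size cuts the margin of error in half.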

