AP Statistics Semester 1

Completely Randomized Design

Experimental units assigned to the treatments completely by chance

Cumulative frequency

Add the count in each class to the counts in all the smaller classes

Statistically significant

An observed effect so large that it would rarely occur by chance

Describing the sampling distribution of barx1- barx2

Liquid L: mean 27 ounces of soft drink, standard deviation .8 ounces, 20 cups. Liquid M: mean 17 ounces, standard deviation .5 ounces, 25 cups. Mean = 27-17 = 10 ounces. Standard deviation = SQRT((.8^2/20)+(.5^2/25)) = .205 ounces

Cluster Sample

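Divide the population into groups of individuals located near each other (clusters), then take an SRS of the clusters and include every individual in the chosen clusters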

Mean and standard deviation of sampling distribution of Bar X

Mean of bar x is Mu. SD of bar x = standard deviation of population/SQRT(n)

Unimodal

Have a single peak

Sample size for desired margin of error

Solve z* * SQRT(p hat(1-p hat)/n) <= ME for n, using p hat = .5 when there is no prior estimate

Multiplying by a constant

To change units like meters to feet

Boxplot

-A central box is drawn from the first quartile to the third quartile -a line in the box marks the median -lines extend from the box out to the minimum and maximum -outliers are marked with special symbols like an asterisk

Misleading charts

-Charts that start anywhere other than 0 -Bar charts that have different widths per category

Standard deviation facts

-Should only be used when mean is the measure of center -is always greater or equal to 0 - has the same unit of measurement as the original observations - is not resistant so outliers can make it larger or smaller

Density Curve

-always on or above the horizontal axis -has an area of exactly 1

Variability of a statistic

Described by the spread of its sampling distribution. The spread is determined by the size of the random sample. Larger samples give smaller spreads. Population size does not matter just the sample size

Conditional distribution

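Describes the values of one variable among only the individuals who have a specific value of another variable (only one group)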

Probability model

Description of some chance process that consists of two parts: a sample space S and a probability for each outcome

Simple Random Sample (SRS)

Every group of n individuals in the population has an equal chance to be selected as the sample

Explanatory variable

Explain changes in response variable

C% confidence interval

Gives an interval of plausible values for a parameter. This interval is point estimate +/- margin of error. The difference between the point estimate and the true parameter value will be less than the margin of error in C% of all samples

Confidence level C

Gives the overall success rate of the method for calculating the confidence interval

Population distribution

Gives values of the variable for all individuals in the population

Parameter

Number that describes a characteristic of the population

Population distribution & distribution of sample data vs. sampling distribution

The population distribution and the distribution of sample data describe individuals, while a sampling distribution describes how a statistic varies in many samples

Confidence interval formula

Statistic +/- critical value * standard deviation of the statistic

Interquartile range

Third quartile- 1st quartile

Determining sample size example

Wants a margin of error of no more than .03 at 95% confidence. 1.96*SQRT(.5(1-.5)/n) <= .03, so (1.96/.03)*SQRT((.5)(1-.5)) <= SQRT(n), so (1.96/.03)^2*(.5)(1-.5) <= n, so 1067.111 <= n. Round up to n = 1068
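
The same calculation can be sketched in Python. This is only an illustration: the function name sample_size_for_margin is made up here, and it assumes the conservative guess p hat = .5 used on the card.

```python
import math
from statistics import NormalDist

def sample_size_for_margin(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin (hypothetical helper)."""
    # z* is the critical value that leaves (1 - confidence)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve the inequality for n and round up to a whole number of individuals
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

n = sample_size_for_margin(0.03)
print(n)
```

Rounding up matters: rounding 1067.1 down would leave the margin of error slightly above .03.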

Confidence interval example

799 teens and 2253 adults were asked about use of social media. 80% of teens and 69% of adults said they used social media. 95% confidence interval: (.8-.69) +/- 1.96*SQRT(((.8)(.2)/799)+((.69)(.31)/2253)) = .11 +/- .034 = (.076,.144)
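
A rough Python check of this interval, using only the standard library. The helper name two_proportion_interval is an assumption for illustration, not a calculator or library function.

```python
import math
from statistics import NormalDist

def two_proportion_interval(p1, n1, p2, n2, confidence=0.95):
    """z interval for p1 - p2 from two independent samples (hypothetical helper)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the difference: the two variances add
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

low, high = two_proportion_interval(0.80, 799, 0.69, 2253)
print(round(low, 3), round(high, 3))
```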

Interpreting confidence levels

"If we take many samples of the same size from this population, about 95% of them will result in an interval that captures the actual parameter value"

Difference of random variables

For D = X - Y, mean of D = mean of X - mean of Y

Experiment

Deliberately imposes some treatment on individuals to measure their responses

Cumulative relative frequency

Divide the entries in the cumulative frequency column by the total number of values and multiply by 100 for the percent

Relative frequency

Divide the value by the total number and multiply by 100 to get the percent

Subtracting a constant (for error)

Error = guess - actual answer. Subtracting the actual value (13 here) moves each data set 13 to the left to show the error in the guesses

Finding a critical value

Find the critical value for an 80% confidence interval. 20% is left out and split in half for each tail of the curve, so the area in each tail is .1. invNorm(.1,0,1) = -1.28, so the critical value is z* = 1.28
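
As a sketch, the invNorm step can be reproduced with Python's statistics.NormalDist. The function name critical_z is hypothetical.

```python
from statistics import NormalDist

def critical_z(confidence):
    """z* for a confidence level given as a decimal (hypothetical helper)."""
    tail_area = (1 - confidence) / 2          # area left out in each tail
    # inv_cdf of the lower tail is negative, so flip the sign to get z*
    return -NormalDist().inv_cdf(tail_area)

z80 = critical_z(0.80)
z95 = critical_z(0.95)
print(round(z80, 2), round(z95, 2))
```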

We can increase the power of a sig test by

Increasing the sample size, increasing the significance level, or increasing the difference that is important to detect between the null and alternative parameter values

Mean and SD of bar x example

Mu = 25 microorganisms per liter, SD = 7 microorganisms per liter, SRS of 10 adults. The mean of bar x is 25 microorganisms per liter because bar x is an unbiased estimator of Mu. SD of bar x = 7/SQRT(10) = 2.214

Multiplying or dividing by a constant effect on random variable (b)

Multiplies (or divides) measures of center and location by b and measures of spread by |b|. The shape of the distribution does not change

Mean of discrete random variable

Multiply each value by its probability and add up the products

Statistic

Number that describes a characteristic of a sample

Residuals

Observed Y- Predicted Y

Observational study

Observes individuals and measures variables of interest but does not attempt to influence the responses

Nonresponse

Occurs when an individual chosen for the sample can't be contacted or refuses to participate

Undercoverage

Occurs when some members of the population cannot be chosen in a sample

Binomial coefficient

P(X=k) = (n choose k) * p^k * (1-p)^(n-k) is the binomial probability. The binomial coefficient is (n choose k) = n!/(k!(n-k)!). Example: 5 total kids, 2 kids have type O blood: 5!/(2!3!) = (5*4*3*2*1)/((2*1)(3*2*1)). Cancel out the 3*2*1 to get (5*4)/(2*1) = 10

Geometric probability formula

P(Y=K)=(1-P)^(k-1) * P

Using binomial probability formula

P(X>3), the probability that more than 3 of the 5 kids have type O blood: 5 (from 5 nCr 4)*(.25^4)*(.75) + 1*(.25^5)*(.75^0) = .01465 + .00098 = .01563, about a 1.6% chance
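
A minimal sketch of the same binomial calculation in Python, assuming n = 5 children and p = .25 as on the card. The name binomial_pmf is chosen here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable (hypothetical helper)."""
    # comb(n, k) is the binomial coefficient n!/(k!(n-k)!)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# P(X > 3) = P(X = 4) + P(X = 5)
p_more_than_3 = sum(binomial_pmf(k, 5, 0.25) for k in (4, 5))
print(p_more_than_3)
```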

Difference between pie and bar charts

Pie charts MUST ADD UP TO 100% OR IT'S A PAC-MAN CHART; bar charts do not have to

Cumulative relative frequency graph

Plot a point corresponding to the cumulative relative frequency in each class

Random sampling

Using a chance process to determine which members to include in the sample

Bias

Using a method that favors certain outcomes over the others

One Proportion z test on calculator

Using previous example: STAT, TESTS, 1-PropZTest. p0 = .08, x = 47, n = 500, prop: >p0

Significance test for p1-p2

Using previous example. H_0: p1-p2 = 0, H_a: p1-p2 =/= 0, where p1-p2 is the difference in the proportions of students who didn't eat breakfast at the 2 schools. Random: yes. 10%: yes. Large counts: all are at least 10. P hat 1 = 19/80 = .2375, p hat 2 = 26/150 = .1733, difference = .2375-.1733 = .0642. Combined p hat = .1957, so (.1957)(.8043) = .1574. z = .0642/SQRT((.1574/80)+(.1574/150)) = 1.17. 2-PropZTest: P-value = .2427. There is not convincing evidence that the true proportions of students who didn't eat breakfast at the two schools are different
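
The test statistic and P-value can be checked with a short Python sketch. It assumes the pooled (combined) p hat is used in the standard error, as in the card; two_proportion_z_test is a made-up name, not a library call.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H_0: p1 - p2 = 0 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # combined p hat under H_0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided: double the area beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z_test(19, 80, 26, 150)
print(round(z, 2), round(p_value, 4))
```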

A confidence interval for two sided test

Using previous problem. P hat +/- z*(SQRT((p hat)(1-p hat)/n)). invNorm(.025,0,1) = -1.96, so z* = 1.96 for 95% confidence. .6 +/- 1.96*SQRT((.6)(.4)/150) = .6 +/- .078 = (.522,.678). We are 95% confident that this interval captures the true proportion of students who say they have never had a cigarette

Normal probability distributions example

What's the probability that a randomly chosen woman has a height between 68 and 70 inches? Mean = 64 inches, standard deviation = 2.7 inches. z = (68-64)/2.7 = 1.48 and z = (70-64)/2.7 = 2.22. Find the z-scores in Table A: .9868-.9306 = .0562
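
For comparison, here is a small Python sketch of the same area using statistics.NormalDist instead of Table A. The answer differs slightly from .0562 because the table rounds z to two decimals.

```python
from statistics import NormalDist

# Heights of women: mean 64 inches, standard deviation 2.7 inches (from the card)
heights = NormalDist(mu=64, sigma=2.7)

# Area between 68 and 70 = area below 70 minus area below 68
p_between = heights.cdf(70) - heights.cdf(68)
print(round(p_between, 4))
```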

Using calculator for probability distributions

What's the probability that a randomly chosen woman has a height between 68 and 70 inches? Mean = 64 inches, standard deviation = 2.7 inches. normalcdf(lower: 68, upper: 70, mean: 64, SD: 2.7) = .056

Computing the test statistic and p value

A battery company wants to test H_0: Mu = 30 vs. H_a: Mu > 30. SRS of 15 batteries: bar x = 33.9, S_x = 9.8. t = (33.9-30)/(9.8/SQRT(15)) = 1.54. We use t values. Because H_a is Mu > 30, the P-value is P(T > 1.54) with degrees of freedom = 15-1 = 14. 2nd VARS tcdf(1.54,1E99,14) = .0729

Checking conditions example

A confidence interval for the true proportion p of red beads in the container. The class sample has 107 red beads and 144 white beads (n = 251). Large counts: n*p hat = 251(107/251) = 107 >= 10 and n(1-p hat) = 251(144/251) = 144 >= 10

Regression line

Summarizes the relationship between two variables- line of best fit

Binomial setting examples

Suppose a parent has 5 children. Let X be the number of children with type O blood: the trials are independent, there is a fixed number of trials, and each trial has the same probability of success, so it IS BINOMIAL. Turn over the first ten cards of a deck and let Y be the number of aces you observe: NOT INDEPENDENT because the probability of getting an ace changes as cards are removed, so NOT BINOMIAL. Turn over the top card, put it back in the deck, and keep doing this until you get an ace: NO FIXED NUMBER OF TRIALS, SO NOT BINOMIAL

1 Sample t test for a mean on calculator

STAT, TESTS, T-Test. Inpt: Data, Mu_0 = 5, List = (whichever list holds the data), Freq = 1, Mu: <Mu_0

Variance of the sum of independent random variables

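For T = X + Y with X and Y independent: (SD of T)^2 = (SD of X)^2 + (SD of Y)^2. Variances add, standard deviations do not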

Standard Normal Table

Table of areas under a standard normal curve

Continuous random variables

Takes all values in an interval of numbers. Probability distribution is described by a density curve

Random Variable

Takes numerical values that describe the outcomes of some chance process

Chebyshev's inequality

For any distribution, not just normal curves, the proportion of observations within k standard deviations of the mean is at least 1-(1/k^2), where k = number of standard deviations away from the mean

Probability distribution

Tossing a coin 3 times has 8 equally likely outcomes. P(0 heads) = 1/8, P(1 head) = 3/8, P(2 heads) = 3/8, P(3 heads) = 1/8. A probability distribution gives all possible values and their probabilities
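
The 8 outcomes can be listed by brute force. The sketch below enumerates them in Python and counts heads; the variable names are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of 3 tosses: ('H','H','H'), ('H','H','T'), ...
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value = favorable outcomes / total outcomes
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(heads_counts.items())
}
print(distribution)
```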

Describing sampling distribution of phat1-phat2

Two bags of colored goldfish crackers. Bag 1 has 25% red crackers and bag 2 has 35% red crackers. Teacher takes an SRS of 50 crackers from bag 1 and an SRS of 40 crackers from bag 2. Shape: 50(.25) = 12.5, 50(.75) = 37.5, 40(.35) = 14, 40(.65) = 26. All are at least 10, so approximately normal. Mean: .25-.35 = -.10. Standard deviation: SQRT(((.25)(.75)/50)+((.35)(.65)/40)) = .0971

Connection between mutually exclusive and independence

Two mutually exclusive events (each with nonzero probability) can never be independent, because if one event happens the other event is guaranteed not to happen. For example, a single card cannot be both a heart and a spade

Finding binomial coefficient on calculator

Type the total number n, then Math, PRB, nCr, then type the number needed. For example 5, Math, PRB, nCr, 2 = 10

Extrapolation

Use of a regression line for prediction outside the interval of values of x. Often not accurate

Point estimators examples

A. Quality inspectors want to estimate the mean lifetime Mu of the AA batteries produced in an hour at a factory. They select fifty random batteries: the sample mean bar x is the point estimator. B. What proportion of high schoolers smoke? 2792 out of 15,425 said they smoked: p hat = 2792/15425 = .181 is the point estimate. C. The quality control inspectors in A want to investigate the variability in battery lifetime by estimating the population variance sigma^2: the sample variance S_x^2 is the point estimator

Informed consent

All individuals must be informed about the study and agree to take part before data are collected

Data Analysis

Gathering, organizing, analyzing, and interpreting Data

Independent random variables

If knowing whether any event involving X has occurred tells us nothing about the occurrence of any event involving Y

Symmetric

If the left and right sides are approximately mirror images of each other

Pie Charts and Bar Charts

Analyzing categorical data

Variables

Characteristics of individuals

Inference

Drawing conclusions that go beyond the data

Outliers

Values that stand out

Confidence interval for a difference between two means

(Barx1- barx2) +/- t*SQRT((s1^2/n1)+(s2^2/n2))

Confidence interval for a difference between two proportions

(P hat 1-p hat 2) +/- z*SQRT((p hat 1(1-p hat 1)/n1)+(p hat 2(1-p hat 2)/n2))

Test statistic

(Statistic-parameter)/standard deviation of statistic

Confidence interval on calculator

(Using previous problem) Stat, Tests, 1-PropZInt (MAKE SURE IT'S EXACTLY 1-PROPZINT). x = 246, n = 439, C-Level = .95, Calculate (make sure to press Enter only once after Calculate or else you'll only see the upper endpoint). You should get the lower and upper endpoints of the interval and the sample proportion as well as n

Standardizing

(X-Mean)/Standard deviation= Z score

Finding Combined phat

(X1+X2)/(n1+n2) School 1 said 19/80 students didn't have breakfast while school 2 said 26/150 students didn't have breakfast Combined PHat= (19+26)/(80+150)= 45/230= .1957

Variance of discrete random variables

(x1-mean)^2*(probability of x1) + (x2-mean)^2*(probability of x2) + etc. Standard deviation is the square root of the variance

One sample t test statistic

(barx-Mu)/(S_x/SQRT(n))

Individuals

(the subjects) objects described by a data set

Effects of Adding or Subtracting a constant

-changes measures of center and location -does not change shape of the distribution or measures of spread (range,IQR, Sx)

Effects of multiplying or dividing

-changes measures of center, location, and spread -does not change the shape of the distribution

Assessing normality

-plot the data and see if it is symmetric or bell shaped -check whether it follows the 68-95-99.7 rule -making a normal probability plot

Normal Curves

-symmetric, single peaked, bell shaped -mean is mu, standard deviation sigma -changing the mean without changing the standard deviation moves the normal curve along the horizontal axis without changing the spread -the standard deviation controls the spread. Curves with larger standard deviations are more spread out

Rules of probability

-The probability of any event is a number between 0 and 1 -All possible outcomes together must have probabilities that add up to 1 -If all outcomes in the sample space are equally likely, the probability that event A occurs is P(A) = number of outcomes corresponding to event A/total number of outcomes in the sample space -The probability that an event does not occur is 1 minus the probability that the event does occur. Event (not A) = A^c (complement of A) -If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities

Analyzing random variables on calculator

-values in list 1 - probabilities in list 2 -1 Variable stats

Venn Diagram

Displays sample space of two events and shows the union and intersection of the two

Area to z-scores on calculators

1. 2nd VARS 2. InvNormal Question will ask what is the x percentile Area= x percentile Mean-0 Standard deviation- 1

Z-scores to areas on calculators

1. 2nd VARS 2. Normalcdf If questions asks for greater than a value Lower= Value Upper= 1E99 (2nd , ) Mean- 0 Standard deviation- 1

Making a normal probability plot

1. Arrange data from smallest to largest and record percentiles 2. Use invNorm to find the z-score for each percentile 3. Plot each observation x against its z-score. If the plot looks like a straight line, the data are approximately normal

Criteria for establishing causation when we can't do an experiment

1. Association is very strong 2. Association is consistent 3. Larger values of the explanatory variable are associated with stronger responses 4. Alleged cause precedes the effect in time 5. Alleged cause is plausible

Types of variables

1. Categorical- cannot do math with, places into groups (zip code, phone number) 2. Quantitative- numbers you can do math with (hint- FINDING AVERAGE)

Principles of experimental design

1. Comparison- use a design that compares two or more treatments 2. Random assignment- use chance to assign experimental units to treatments 3. Control- keep other variables that might affect the response the same for all groups 4. Replication- use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups

Facts about correlation

1. Correlation makes no distinction between explanatory and response variables 2. R does not change when we change units of measurement 3. R has no unit of measurement 4. Correlation does not imply causation 5. Correlation requires both variables to be quantitative 6. Correlation only measures linear relationships 7. A value close to 1 or -1 does not mean the relationship is linear. USE YOUR EYES 8. R is strongly affected by outliers so it is not resistant 9. Correlation is not a complete summary of two-variable data

Making histogram on a calculator

1. Stat 2. Edit (enter the data in a list) 3. 2nd Y= (STAT PLOT) and select the first plot unless it is already in use 4. Choose the histogram icon 5. Press Zoom and choose ZoomStat

Choosing SRS- calculator

1. Math 2. PRB 3. randInt( 4. Enter min, max

Moving beyond a point estimate

1. The sample mean is 240.8, but bar x will probably not be exactly equal to Mu 2. By the 68-95-99.7 rule, bar x is within 2(5) = 10 of Mu in about 95% of samples, which is 2 standard deviations (5 is the SD of bar x, n = 16) 3. 240.8+10 = 250.8 and 240.8-10 = 230.8, so the interval is (230.8,250.8)

Determining sample size

1. Significance level: which is the bigger risk, a Type 1 or a Type 2 error? If Type 1, decrease alpha (the significance level), for example to .01. If Type 2, do the opposite. 2. Effect size: how large a difference between the null parameter value and the actual parameter value is important to detect 3. Power: what chance do we want our study to have to detect a difference of the size we think is important

Finding Standard deviation

1. Stat 2. Calc 3. 1-Var Stats 4. Hit Enter 5. S_x is the sample standard deviation

Calculating R and linear regression

1. Stat 2. Calc 3. LinReg(ax+b)

Paired Data

1. Subtract the paired values so you get 1 list of differences, then analyze the differences

How to make a residual plot

1. Turn on the plot with 2nd Y= (STAT PLOT) 2. Make sure it's on the first option: scatterplot 3. Move down to the Y list 4. 2nd STAT (LIST) 5. Choose RESID for your Y list

Normal calculations involving n example

1500 college students are asked how far away their home is. 35% of all students attend college within 50 miles of their home. Find the probability that the random sample will give a result within 2% of the true value (between 33% and 37%). Mean = p = .35. SD = SQRT((.35)(.65)/1500) = .0123. normalcdf(.33,.37,.35,.0123) = .8961. About 90% of samples of size 1500 will give a result within 2 percentage points of the truth

Finding t values on calculator

2nd VARS invT( (for a 95% confidence interval based on an SRS of size n = 12). Area = .025, df = 11. Calculate: should get -2.2009, so t* = 2.201

Geometric probability on calculator

2nd VARS. geometpdf = P(Y=k), geometcdf = P(Y<=k). Enter p = probability of success and the x value k

Binomial probability on calculator

2nd VARS. binompdf = P(X=k), binomcdf = P(X<=k). Trials = total n = 5 trials for blood type, p = probability of type O blood = .25, x value = 3 kids: binomcdf(5,.25,3) = .9844. When looking for P(X>k), use 1-binomcdf(n,p,k)

One sample t interval for Mu

40 light-duty engines of the same type were tested; the mean NOx reading was 1.2675 and the SD was .3332. A. Construct a 95% confidence interval for the mean amount of NOx emitted. Bar x +/- t*(S_x/SQRT(n)). invT(.025,39) = -2.023, so t* = 2.023. 1.2675 +/- 2.023(.3332/SQRT(40)) = 1.2675 +/- .1066 = (1.1609,1.3741)

68-95-99.7 rule

68% of the observations fall within 1 standard deviation of the mean 95% fall within 2 standard deviations 99.7% fall within three standard deviations ONLY WORKS FOR NORMAL CURVES

Stem plot

A display for fairly small data sets

Critical value

A multiplier that makes the interval wide enough to have the stated capture rate (the 95% with 2 standard deviations)

Performing a significance test about the mean

A researcher measures the dissolved oxygen (DO) level at 15 randomly chosen locations along a stream. The results are 4.53, 5.04, 3.29, 5.23, 4.13, 5.50, 4.83, 4.40, 5.42, 6.38, 4.01, 4.66, 2.87, 5.73, 5.55. A DO level below 5 puts aquatic life at risk. H_0: Mu = 5, H_a: Mu < 5, where Mu is the mean dissolved oxygen level. Random: 15 RANDOM spots are picked. 10%: there are an infinite number of locations so this is fine. Normal/Large sample: n < 30, so CHECK NORMALITY USING GRAPHS. 1-Var Stats: bar x = 4.77, S_x = .94. t = (4.77-5)/(.94/SQRT(15)) = -.94. Degrees of freedom = 14. Because H_a is Mu < 5, the P-value is P(T < -.94): tcdf(-1E99,-.94,14) = .1816. We fail to reject H_0; there is not convincing evidence that the mean DO level is below 5
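
The summary statistics and the t statistic can be reproduced from the raw data with a short Python sketch. Python's standard library has no t distribution, so this stops at the test statistic; the P-value would still come from tcdf or a t table.

```python
import math
from statistics import mean, stdev

# Dissolved oxygen readings from the card
do_levels = [4.53, 5.04, 3.29, 5.23, 4.13, 5.50, 4.83, 4.40,
             5.42, 6.38, 4.01, 4.66, 2.87, 5.73, 5.55]

mu_0 = 5                                  # value of Mu under H_0
n = len(do_levels)
x_bar = mean(do_levels)                   # sample mean
s_x = stdev(do_levels)                    # sample SD (divides by n - 1)

# One-sample t statistic: (statistic - parameter) / standard error
t = (x_bar - mu_0) / (s_x / math.sqrt(n))
df = n - 1
print(round(x_bar, 2), round(s_x, 2), round(t, 2), df)
```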

Treatment

A specific condition applied to the individuals

Null Hypothesis

A statement of no difference or the original hypothesis. The claim we seek evidence against. Ex. H_0:P=.8

Point estimator

A statistic that provides an estimate of a population parameter. That value is called a point estimate

Unbiased estimator

A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the value of the parameter being estimated

Sample

A subset of individuals that we actually collect data from

Influential

A value is influential if removing it would distinctly change the result of the calculation. Outliers are often influential

Linear transformations example

A. Temperature x of water follows a normal distribution with mean 34 degrees Celsius and standard deviation 2 degrees Celsius. Mean of y = 32 + 9/5*(mean of x) = 32 + 9/5(34) = 93.2 Fahrenheit. Standard deviation: SD of y = 9/5*(SD of x) = 9/5*2 = 3.6 Fahrenheit. B. Bath water should be between 90 and 100 Fahrenheit: normalcdf(90,100,93.2,3.6) = .7835
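
A minimal Python sketch of this linear transformation, assuming y = a + bx with a = 32 and b = 9/5 as on the card. Adding a shifts only the mean; multiplying by b scales both mean and SD.

```python
from statistics import NormalDist

celsius = NormalDist(mu=34, sigma=2)

a, b = 32, 9 / 5                          # Fahrenheit = 32 + (9/5) * Celsius
fahrenheit = NormalDist(
    mu=a + b * celsius.mean,              # new mean = a + b * old mean
    sigma=abs(b) * celsius.stdev,         # new SD = |b| * old SD
)

# Probability the bath water is between 90 and 100 degrees Fahrenheit
p_ok = fahrenheit.cdf(100) - fahrenheit.cdf(90)
print(round(fahrenheit.mean, 1), round(fahrenheit.stdev, 1), round(p_ok, 4))
```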

Can we use a t* critical value to calculate a confidence interval for the population mean

A. To estimate the average GPA at your school you randomly select 50 students from classes you take: NO, BECAUSE OF BIAS (the sample is not a random sample from the whole school) B. The data still have to be relatively symmetric, so strong skew caused by outliers means NO C. There is some skew but NO OUTLIERS, SO YES

A two sided test

According to the CDC, 50% of high school students have never smoked a cigarette. 150 random students are sampled and 90 say they have never smoked a cigarette. H_0: p = .5, H_a: p =/= .5, where p = the proportion of students who have never smoked a cigarette. Random: the sample is 150 RANDOM students. 10%: reasonable to assume at least 1500 students go to the high school. Large counts: 150(.5) = 75 >= 10 and 150(1-.5) = 75 >= 10. P hat = 90/150 = .6. z = (.6-.5)/SQRT((.5)(.5)/150) = 2.45. 1-PropZTest(.5,90,150,=/=p_0) gives P-value = .014. We reject H_0 because the P-value is less than alpha = .05. We have convincing evidence that the proportion of all students at the high school who say they have never smoked a cigarette is different from .5
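
As a sketch, the two-sided one-proportion z test can be written in a few lines of Python. The name one_proportion_z_test is made up for this example; note that the standard error uses p_0, not p hat.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of H_0: p = p0 (hypothetical helper)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)     # SD of p hat assuming H_0 is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails
    return z, p_value

z_smoke, p_smoke = one_proportion_z_test(90, 150, 0.5)
print(round(z_smoke, 2), round(p_smoke, 3))
```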

Mean

Add up all the values and divide by the number of values. The mean is sensitive to extreme values

Discrete random variables

Take a fixed set of possible values with gaps in between. The probabilities of all the values must add up to 1

Adding or subtracting a constant effect on random variable (a)

Adds a to measures of center and location (mean, median, percentiles). Does not change shape or measures of spread (range, IQR, standard deviation)

Least-Squares regression Line

Aka line of best fit

Confidentiality

All individual data must be kept confidential

Institutional review board

All studies must be reviewed in advance by this board. -board protects safety and well being of subjects

Distribution

All the different values of a variable and the frequency of those values

Sampling without replacement

An airline trained 25 officers, 15 male and 10 female. Of 8 captains chosen, 5 are female. How likely is that by chance? Binomial: 8 nCr 5*(.4)^5*(.6)^3 = .124. The correct probability is .106. The binomial probability assumes that the probability of choosing a female stays the same at 40% on every pick, but the picks are made without replacement. Because 8/25 is almost 1/3 of the population (far more than 10%), the binomial probability is off
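
Both numbers can be checked in Python. The sketch below assumes the "correct probability" comes from counting the ways to choose 5 of the 10 women and 3 of the 15 men out of all ways to choose 8 of 25, which reproduces .106.

```python
from math import comb

# Binomial model: pretends each pick is female with probability 10/25 = .4
binomial = comb(8, 5) * 0.4**5 * 0.6**3

# Exact count without replacement:
# (ways to pick 5 of 10 women) * (ways to pick 3 of 15 men) / (ways to pick 8 of 25)
exact = comb(10, 5) * comb(15, 3) / comb(25, 8)

print(round(binomial, 3), round(exact, 3))
```

The gap between the two values shows why the 10% condition is checked before using the binomial formula on a sample drawn without replacement.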

Factors

Another term for explanatory variables

Event

Any collection of outcomes from some chance process. Designated by capital letters. Example: A = the sum of two dice equals five, P(A) = 4/36

Geometric Setting

Arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. The probability of success must be the same on each trial

Convenience sample

Asking the first people you see

Sampling distribution of a difference between two means

Average height of 10 year old girl- 56.4 inches Standard deviation- 2.7 inches Random sample of 12 girls Spread- 2.7/SQRT(12)= .78 inches Average height of 10 year old boy- 55.7 inches Standard deviation- 3.8 inches Random sample of 8 boys Spread- 3.8/SQRT(8)= 1.34 inches

One sample t interval for a population mean

Bar x +/- t*(S_x/SQRT(n))

Bias, variability, and shape as described by a dart board

Bias means the aim is off and we miss the bulls-eye High variability means the results are scattered So we need Low bias and low variability

Voluntary response sample

Consists of people who choose themselves by responding to a general invitation

Simulating a sampling distribution

Choosing 500 SRSs of size n = 20 from a population of 200 chips, 100 red and 100 blue. Explain what a dot at .15 is: in that SRS of 20 chips there were three red chips

Stratified Random Sample

Classify the population into groups then choose a separate SRS for each group

Census

Collects data from every individual

Matched Pairs Design

Create blocks by matching pairs of similar experimental units

Dot plots and stem plots

Describe quantitative data

Describing scatterplots

Direction- negative or positive association Form- curved or linear Strength- how closely the points follow the form

Checking for independence

Dominant hand   Male   Female   Total
Right             39       51      90
Left               7        3      10
Total             46       54     100
P(left-handed | male) = 7/46 = .152 and P(left-handed) = 10/100 = .10. The events are not independent because the two probabilities are not the same
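
The independence check can be sketched in Python from the two-way table counts. Exact fractions are used so the comparison is not affected by rounding; the dictionary layout is just one possible way to store the table.

```python
from fractions import Fraction

# Two-way table counts from the card: (dominant hand, sex) -> count
counts = {
    ("right", "male"): 39, ("right", "female"): 51,
    ("left", "male"): 7,   ("left", "female"): 3,
}

total = sum(counts.values())
males = sum(c for (hand, sex), c in counts.items() if sex == "male")
lefties = sum(c for (hand, sex), c in counts.items() if hand == "left")

p_left = Fraction(lefties, total)                         # P(left-handed)
p_left_given_male = Fraction(counts[("left", "male")], males)  # P(left | male)

# Independent only if conditioning on "male" leaves the probability unchanged
independent = p_left == p_left_given_male
print(p_left, p_left_given_male, independent)
```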

Central Limit Theorem

Draw an SRS of size n from any population with mean Mu and finite SD sigma. When n is large, the sampling distribution of the sample mean bar x is approximately normal

Identifying outliers

Values below first quartile - (1.5*IQR) are low outliers. Values above third quartile + (1.5*IQR) are high outliers

Significance Test

Formal procedure for using observed data to decide between two competing claims

Block

Group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments

Type 1 Error probability examples

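H_0: p = .08, H_a: p > .08, where p is the proportion of all potatoes with blemishes in the shipment; sample of 500 potatoes at alpha = .05. Shape: approximately normal because np and n(1-p) are at least 10. Center: .08. SD: SQRT((.08)(.92)/500) = .01213. invNorm(.95,.08,.01213) = .09995, so we reject H_0 when p hat is at least about .10. If H_0 is true this happens 5% of the time, so P(Type 1 error) = alpha = .05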

Two sided test using t values

H_0: Mu = 5, H_a: Mu =/= 5, n = 37, t = -3.17. tcdf(-1E99,-3.17,36), then multiply the answer by two to account for both tails

Finding probabilities involving the sample mean example

Height of young women: mean = 64.5 inches, SD = 2.5 inches. Find the probability a randomly chosen young woman is taller than 66.5 inches: normalcdf(66.5,1E99,64.5,2.5) = .2119. Find the probability that the mean height of an SRS of 10 young women exceeds 66.5 inches: SD of bar x = 2.5/SQRT(10) = .79, normalcdf(66.5,1E99,64.5,.79) = .0057
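
A short Python sketch contrasting the two probabilities. The only change between them is the standard deviation: sigma for one individual, sigma/SQRT(n) for the sample mean.

```python
import math
from statistics import NormalDist

mu, sigma, n = 64.5, 2.5, 10

individual = NormalDist(mu, sigma)                    # one young woman
sample_mean = NormalDist(mu, sigma / math.sqrt(n))    # mean of an SRS of 10

# Area to the right of 66.5 under each curve
p_individual = 1 - individual.cdf(66.5)
p_mean = 1 - sample_mean.cdf(66.5)
print(round(p_individual, 4), round(p_mean, 4))
```

Averages vary less than individuals, so a sample mean above 66.5 is far less likely than a single height above 66.5.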

Confidence interval example

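How do the sizes of longleaf pine trees in the north and south compare? Number of each: 30. North: mean = 23.7, SD = 17.5. South: mean = 34.53, SD = 14.26. With t* = 1.699 (90% confidence, df = 29): (34.53-23.7) +/- 1.699*SQRT((14.26^2/30)+(17.5^2/30)) = 10.83 +/- 7.00 = (3.83,17.83)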

Parameters and statistics example

Identify population, parameter, sample, and statistic A poll asked a random sample of 515 U.S. adults whether or not they believe in ghosts. 160 said yes. Population: all US adults Parameter is p, the proportion of all us adults who believe in ghosts Sample is 515 interviewed And the statistic is p hat= 160/515

Performing a significance test

If more than .08 of the potatoes have blemishes the truck will be sent back. A random sample of 500 potatoes had 47 with blemishes. Alpha = .05. H_0: p = .08, H_a: p > .08, where p is the proportion of potatoes with blemishes. Random: 500 random potatoes. 10%: reasonable to assume at least 5000 potatoes in the shipment. Large counts: 500(.08) = 40 >= 10 and 500(.92) = 460 >= 10. P hat = 47/500 = .094. z = (.094-.08)/SQRT((.08)(.92)/500) = 1.15. normalcdf(1.15,1E99,0,1) = .1251 = P-value. Because the P-value of .1251 is greater than alpha = .05, we fail to reject H_0. There is not convincing evidence that more than 8% of the potatoes in the shipment have blemishes

Conditional Probability

If one event has happened, the chance that another event will happen. P(B | A): the probability that event B occurs given that event A has occurred. Ex. the probability a randomly selected student is male given that the student has pierced ears is P(male | pierced ears)

Skewed to the left or right

Skewed to the left if the left side (tail) is longer than the right; skewed to the right if the right side is longer than the left

Independent events

If the occurrence of one event does not change the probability that the other event will happen. P(A | B) = P(A) and P(B | A) = P(B)

Significance level Alpha

If the p value is smaller than alpha, we say the results of a study are statistically significant and we reject the null Hypothesis

Normal/large sample condition for sample means

If the population distribution is normal, so is the sampling distribution of bar x. This is true no matter what the sample size n is. If the population distribution is not normal, the central limit theorem says that the sampling distribution of bar x will be approximately normal if n is greater than or equal to 30

Mutually Exclusive or Disjoint

If two events have no outcomes in common and so can never occur together

Type 2 Error

If we fail to reject H_0 when H_a is true

Type 1 Error

If we reject H_0 when H_0 is true

Computing a test statistic

In an SRS of 50 free throws, a player who claims to make 80% of his shots made 32 (40 would be expected). P hat = 32/50 = .64. z = (.64-.8)/SQRT((.8)(.2)/50) = -2.83. normalcdf(-1E99,-2.83,0,1) = .0023. We can reject H_0!

Five Number Summary

Minimum, First Quartile, Median, Third Quartile, Maximum

Calculating Binomial probabilities

Using the previous example, the probability that a kid has type O blood is .25. What's the probability that out of 5 kids none have type O blood? .75^5 = .2373. What's the probability exactly one has type O blood? .75^4*.25*5 = .39551, because there are five different children who could be the one with type O blood. Probability exactly two have type O blood: .75^3*.25^2*10 = .26367

Histogram

Shows the distribution in groups (classes), for example 1-5, 6-10, etc. Those particular groups just make it easier; the classes don't always have to be those

Curve is symmetric

Mean and median are the same

Curve is skewed right

Mean is to the right of the median (vice versa for skewed left)

Sampling distribution of a sample proportion

Mean of p hat = p. Standard deviation of p hat = SQRT(p(1-p)/n). As n increases, the sampling distribution becomes approximately normal

Calculations using Central limit theorem

Mean = 1 hour, SD = 1 hour. Company will service 70 random air conditioners in the city and plans to budget an average of 1.1 hours per unit. Will this be enough? Standard deviation of bar x = 1/SQRT(70) = .12. normalcdf(1.1,1E99,1,.12) = .2023, so there is about a 20% chance that the budgeted time won't be enough to complete the work

Mean of a geometric random variable

Mean= 1/P

Formulas for mean and standard deviation of the sampling distribution of BarX1-BarX2

Mean= Mu1-Mu2 Spread- SQRT((sigma 1^2/n1)+(sigma 2^2/n2))

Mean and SD of binomial random variable

Mean= Total number * Probability SD= SQRT(NP(1-P))

Correlation

Measures direction and strength of linear relationship R is greater than or equal to -1 and less than or equal to 1

Response variable

Measures outcome of study

Standard Deviation & Variance

Measures the typical distance of values from the mean. Deviation = x - the mean. Variance = the sum of all the values of (x - the mean)^2, divided by n-1. Standard deviation = the square root of the variance

Double-Blind

Neither the subjects nor those who interact with them and measure the response know which treatment a subject received. Only the statistician knows until the end of the experiment

Linear transformations (multiplying and adding) effects on a random variable Y=a+bx

New Mean- a+(b * Old Mean) Standard deviation- |b|*old SD

Standard normal distribution

Normal distribution with mean 0 and standard deviation 1

Confidence interval for p no. 2

Out of 439 teens, 246 said yes to having had sex. Find a 95% confidence interval. invNorm(.025,0,1) = -1.96, so z* = 1.96. P hat = 246/439 = .56. .56 +/- 1.96*SQRT((.56)(.44)/439) = .56 +/- .046 = (.514,.606)
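
A small Python sketch of the one-proportion z interval, assuming the counts on the card (246 of 439). The helper name one_proportion_interval is hypothetical, and the endpoints differ in the third decimal from hand rounding because p hat is not rounded to .56 first.

```python
import math
from statistics import NormalDist

def one_proportion_interval(successes, n, confidence=0.95):
    """z interval for a population proportion (hypothetical helper)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - margin, p_hat + margin

ci_low, ci_high = one_proportion_interval(246, 439)
print(round(ci_low, 3), round(ci_high, 3))
```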

Sample proportion of successes

P Hat= Count of successes/size of sample= X/n

Proportions for samples and populations

P is simply a population proportion P hat is used to estimate the unknown parameter P

Rejecting or not rejecting H_0

P value small- reject H_0 P value large- fail to reject H_0

General addition rule

P(A U B)= P(A) + P(B) - P(A and B)

Multiplication rule for independent events

P(A and B) = P(A) * P(B). Example: the probability an individual O-ring functions properly is .977, so the probability that all six function properly is (.977)(.977)(.977)(.977)(.977)(.977) = .87

General Multiplication rule

P(A and B) = P(A) * P(B | A). Ex. 93% of teenagers use the internet and 55% of online teens have a profile on a social networking site. Find the probability that a randomly selected teen uses the internet and has posted a profile: P(online and profile) = P(online)*P(profile | online) = (.93)(.55) = .5115

Calculating conditional probabilities

P(A/B)= P(A union B)/ P(B) vice versa

Finding the probability of at least one

P(at least one positive) = 1 - P(no positives). Ex. One rapid test has probability .004 of producing a false positive. If 200 people who are free of the disease are selected, what is the chance that at least one false positive will arise? P(no false positives) = .996^200 = .4486. 1 - .4486 = .5514 chance that at least one person will receive a false positive
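
The complement rule on this card can be sketched in Python like this. The function name is hypothetical, and the calculation assumes the 200 tests are independent, which is what multiplying .996 by itself 200 times relies on.

```python
def prob_at_least_one(p_single, n):
    """P(at least one occurrence in n independent trials).

    p_single is the chance of the event on any one trial.
    """
    prob_none = (1 - p_single) ** n   # every trial avoids the event
    return 1 - prob_none

result = prob_at_least_one(0.004, 200)
print(round(result, 4))
```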

Experimental units

Smallest collection of individuals to which treatments are applied. When units are human beings they are usually called subjects

Binomial distribution

The probability distribution of a binomial random variable X

Sample space S

The list of all possible outcomes

Spread

The lowest point to the highest point

Interpreting confidence intervals

Question: if the presidential election were held, would you vote for A or B? The 95% confidence interval for A is (.48, .54). Interpreting the confidence interval- we are 95% confident that the interval from .48 to .54 captures the true proportion of all registered voters who favor candidate A. What is the point estimate? .51, the midpoint of the interval

Conditions for performing a significance test about a difference in proportions

Random. 10%. Large counts

Conditions for performing inference about mu1- mu2

Random. 10%. Large sample size: n1 >= 30 and n2 >= 30

Conditions for performing a significance test about a mean

Random. 10% of population. Normal/Large Sample- the population has a normal distribution or the sample size is at least 30. If n < 30, use a graph of the sample data to assess normality

Conditions for performing a significance test about a proportion

Random sample. 10% rule. Large counts- np and n(1-p) are both at least 10, where p is the proportion stated in H_0

Normal approximation to a binomial

Random sample of 2500 adults asked if they agreed or disagreed; 60% said they agreed. X = number who agree. Mean = 2500*.6 = 1500. Large counts: np = 1500 and n(1-p) = 1000 are both at least 10, so X is approximately normal. SD = SQRT((2500)(.6)(.4)) = 24.49. Find the probability that at least 1520 agree: normalcdf(1520, 100000000, 1500, 24.49) = .21, or about 21%
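
A minimal Python sketch of the same normal approximation, using `statistics.NormalDist` as a stand-in for the calculator's normalcdf. The function name is illustrative, and no continuity correction is applied, matching the calculation on the card.

```python
import math
from statistics import NormalDist

def normal_approx_at_least(n, p, k):
    """Approximate P(X >= k) for a binomial X with a normal curve."""
    # Large counts check: both expected counts must be at least 10
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("Large counts condition not met")
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return 1 - NormalDist(mean, sd).cdf(k)

prob = normal_approx_at_least(2500, 0.6, 1520)
print(round(prob, 2))
```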

Conditions for constructing a confidence interval about a proportion

Random- the data come from a well-designed random sample. 10%- check that n < (1/10)N. Large counts- both n * p hat and n(1 - p hat) are at least 10

Geometric settings and random variable example

Y = number of picks it takes to correctly match the lucky day. Probability of success on each pick = 1/7. Because we are counting the number of trials until the first success, Y is geometric
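
The card does not give a formula, so the following Python sketch relies on the standard geometric probability formulas P(Y = k) = (1-p)^(k-1) * p and P(Y <= k) = 1 - (1-p)^k. The function names and the choice of k = 3 are assumptions for illustration.

```python
def geometric_pmf(k, p):
    """P(Y = k): the first success happens on trial k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k, p):
    """P(Y <= k): the first success happens on or before trial k."""
    return 1 - (1 - p) ** k

p_lucky = 1 / 7                             # chance of matching the lucky day
exactly_three = geometric_pmf(3, p_lucky)   # two misses, then a match
within_three = geometric_cdf(3, p_lucky)
print(round(exactly_three, 4), round(within_three, 4))
```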

Placebo effect

When subjects respond favorably even though they received a placebo

Role of independence examples

SAT Math score X: mean 519, SD 115. SAT Reading score Y: mean 507, SD 111. Mean of X + Y = 519 + 507 = 1026. The standard deviation of X + Y cannot be computed by adding variances, because X and Y are not independent: a student who scores high on one most likely scores high on the other

Standard error of the sample mean

SE_barx= (S_x)/SQRT(N) where S_x is the standard deviation of the sample

Calculate confidence interval for p

Took an SRS of beads from a container and got 107 red beads and 144 white beads. A. Calculate and interpret a 90% confidence interval for p. B. It is claimed that 50% of the beads in the bag are red. Comment on this claim. A. P hat = 107/251 = .426. invNorm(.05,0,1) = -1.645, so z* = 1.645. P hat +/- z* * SQRT(P hat(1-P hat)/n) = .426 +/- 1.645 * SQRT((.426)(1-.426)/251) = (.375, .477). We are 90% confident that the interval from .375 to .477 captures the true proportion of red beads. B. Because .5 is not in the interval, we have reason to doubt the claim

Confidence intervals on calculator

STAT > TESTS > 2-PropZInt. x1: 639 (from (.8)(799)). n1: 799. x2: 1555. n2: 2253. C-Level: .95

Residual plot

Scatterplot of residuals against the explanatory variable

Sampling distribution of phat1-phat2

Shape- when n1p1, n1(1-p1), n2p2, and n2(1-p2) are all at least 10, the distribution is approximately normal. Standard deviation of phat1-phat2 is SQRT((p1(1-p1)/n1) + (p2(1-p2)/n2))
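
A small Python sketch of the shape check and the standard deviation formula on this card. The function names and all sample sizes and proportions below are hypothetical.

```python
import math

def large_counts_ok(n1, p1, n2, p2):
    """True when all four expected counts are at least 10."""
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(count >= 10 for count in counts)

def sd_diff_props(n1, p1, n2, p2):
    """Standard deviation of the sampling distribution of phat1 - phat2."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical populations and sample sizes
ok = large_counts_ok(100, 0.7, 200, 0.5)
sd = sd_diff_props(100, 0.7, 200, 0.5)
print(ok, round(sd, 4))
```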

Influences on sample size

Significance level- a smaller significance level needs a larger sample. Power- a higher power needs a larger sample. Effect size- detecting a smaller difference between the null parameter value and the actual parameter value needs a larger sample

Four step process of simulation

State- ask a question of interest about some chance process. Plan- describe how to use a chance device to imitate one repetition of the process, and tell what you will record at the end of each repetition. Do- perform many repetitions of the simulation. Conclude- use the results of your simulation to answer the question of interest

Differences between sample and population means

The Greek letter Mu is population mean Bar X is sample mean

One sided vs. two sided hypothesis

The alternative hypothesis is one sided if it states that the parameter is smaller than the null value or if it states that the parameter is larger than the null value. It is two sided if it states that the parameter is different from the null value

Alternative Hypothesis

The claim about the population that we are trying to find evidence for, in place of the null hypothesis. Ex. H_a: p < .8

Binomial random variable

The count X of successes in a binomial setting

Range

The difference between the largest value and the smallest

Sampling distribution

The distribution of values taken by the statistic in all possible samples of the same size from the same population

Population

The entire group of individuals we want information from

Coefficient of determination- R^2

The fraction of the variation in the values of y that is accounted for by the least squares regression line. Example: If all the points fall exactly on the regression line, then r^2 = 1 and all the variation in y is accounted for by the linear relationship with x

Simulation

The imitation of chance behavior based on a model that accurately reflects the situation

First quartile

The median of the values to the left of the median in the ordered list

Third quartile

The median of the values to the right of the median in the ordered list

Median

The midpoint of the distribution -if the number of values is odd the median is the center observation -if the number of values is even the median is the average of the two center observations -The median is a resistant measure of center

Center

The midpoint of the values

Geometric Random Variable

The number of trials Y it takes to get a success in a geometric setting.

Shape

The overall shape of the distribution (skewed right, skewed left, symmetric, etc.)

Interpreting a P value

The P value is a conditional probability: the probability of getting a result at least as extreme as the one observed, given that H_0 is true. Ex. The NIH recommends a calcium intake of 1300 mg per day for teens. The NIH says that teens aren't getting enough calcium. H_0: mu = 1300, H_a: mu < 1300, where mu is the true mean daily calcium intake in the population of teens. The researchers found bar x to be 1198 and the P value to be .1404. Interpretation: "Assuming the true mean daily calcium intake for teens is 1300 mg, there is a .1404 probability of getting a sample mean of 1198 mg or less just by chance."

Percentile

The percent of observations less than the chosen observation

Geometric distribution

The probability distribution of Y

P value

The probability, computed assuming H_0 is true, that the statistic (such as p hat or bar x) would take a value as extreme as or more extreme than the one actually observed, in the direction specified by H_a

Probability

The proportion of times the outcome would occur in a very long series of repetitions

Randomized block design

The random assignment of experimental units to treatments is carried out separately within each block

Degrees of freedom

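The statistic t = (bar x - mu)/(S_x/SQRT(n)) has a t distribution with degrees of freedom df = n-1 when the population distribution is normal. If the population is not normal, the statistic has approximately a t_n-1 distribution as long as the sample size is large enough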
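
A minimal Python sketch of the t statistic and its degrees of freedom. The data and the hypothesized mean `mu_0` are hypothetical, and `one_sample_t` is an illustrative name; `statistics.stdev` computes the sample standard deviation S_x (dividing by n-1).

```python
import math
import statistics

def one_sample_t(data, mu_0):
    """Return (t, df) for t = (bar x - mu_0) / (S_x / SQRT(n))."""
    n = len(data)
    x_bar = statistics.mean(data)
    s_x = statistics.stdev(data)          # sample SD
    standard_error = s_x / math.sqrt(n)
    t = (x_bar - mu_0) / standard_error
    return t, n - 1

sample = [2, 4, 4, 6, 9]                  # hypothetical observations
t, df = one_sample_t(sample, mu_0=3)
print(round(t, 3), df)
```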

Marginal Distributions

The totals at the bottom and far right margins of a table

10% condition

When taking an SRS of size n from a population of size N, we can use a binomial distribution to model the count of successes in the sample as long as n < (1/10)N

Standard Error

The standard deviation of a statistic when it is estimated from data. For a sample proportion: SE = SQRT((p hat(1-p hat))/n)

Confounding

When two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other

Binomial Setting

When we perform several independent trials of the same chance process and record the number of times that a particular outcome occurs. Four conditions: Binary- each trial is either a success or a failure. Independent- the result of one trial should not tell us anything about the result of any other trial. Number- the number of trials should be fixed in advance. Success- the same probability of success on each trial

Rules for adding random variables

X = Pete's passengers: mean 3.75, SD 1.0897. Y = Erin's passengers: mean 3.1, SD .943. Pete charges $150 per passenger; Erin charges $175 per passenger. Calculate the mean and SD of the total amount collected. Mean of 150X = 150*3.75 = $562.50. SD of 150X = 150*1.0897 = 163.46. Mean of 175Y = 175*3.1 = $542.50. SD of 175Y = 175*.943 = 165.03. Mean of sum = 562.50 + 542.50 = $1105. Variance of sum (X and Y independent) = (163.46)^2 + (165.03)^2 = 53,954.07. SD of sum = SQRT(53,954.07) = $232.28
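
The calculation on this card can be sketched in Python as below. The helper names `scale` and `add_independent` are made up for illustration, and the code assumes X and Y are independent, which adding variances requires.

```python
import math

def scale(mean, sd, b):
    """Mean and SD after multiplying a random variable by the constant b."""
    return b * mean, abs(b) * sd

def add_independent(mean_x, sd_x, mean_y, sd_y):
    """Mean and SD of a sum of two INDEPENDENT random variables."""
    variance = sd_x ** 2 + sd_y ** 2      # variances add, SDs do not
    return mean_x + mean_y, math.sqrt(variance)

pete_mean, pete_sd = scale(3.75, 1.0897, 150)   # Pete's passengers at $150 each
erin_mean, erin_sd = scale(3.1, 0.943, 175)     # Erin's passengers at $175 each
total_mean, total_sd = add_independent(pete_mean, pete_sd, erin_mean, erin_sd)
print(round(total_mean, 2), round(total_sd, 2))
```

The code carries the intermediate SDs unrounded, so its total SD can differ from the hand calculation by about a cent.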

Regression line equation

Y(hat)= a(y intercept) +b(slope)x

