AP STATS TERMS and INTERPRETATIONS
P(at least 1)
= 1-P(none)
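A minimal sketch of the complement rule above, assuming n independent trials that each succeed with the same probability p (the card itself does not state this assumption). The function name is hypothetical.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """P(at least one success) = 1 - P(no successes).

    Assumes n independent trials, each with success probability p.
    """
    p_none = (1 - p) ** n
    return 1 - p_none


# Made-up example: at least one six in 4 rolls of a fair die.
at_least_one_six = prob_at_least_one(1 / 6, 4)
print(round(at_least_one_six, 4))  # 0.5177
```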
Experiment vs. Observational Study
A study is an experiment only if the researchers impose a treatment on the subjects. In an observational study, no treatment is imposed; the researchers only observe and measure.
Residual
Actual minus Predicted
Linear Transformation
Adding or subtracting a constant shifts the measures of center (mean, median) but does not change spread or shape. Multiplying by a constant changes both center and spread but does not change shape.
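The effect of a linear transformation can be checked numerically. This is a small sketch on made-up data using Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, pstdev

data = [2, 4, 4, 6, 9]                # made-up values
shifted = [x + 10 for x in data]      # add a constant
scaled = [x * 3 for x in data]        # multiply by a constant

# Adding 10 moves the center by 10 and leaves the spread alone.
print(mean(data), mean(shifted), pstdev(data), pstdev(shifted))

# Multiplying by 3 multiplies both the center and the spread by 3.
print(median(data), median(scaled), pstdev(data), pstdev(scaled))
```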
SRS
An SRS of size n is a sample chosen in such a way that every set of n individuals has an equal chance of being the sample actually selected
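One way to draw an SRS in code is Python's `random.sample`, which selects without replacement so that every subset of the requested size is equally likely. The roster and seed below are made up for illustration.

```python
import random


def simple_random_sample(population, n, rng):
    """Return n distinct individuals; every set of n is equally likely."""
    return rng.sample(population, n)


roster = [f"student_{i}" for i in range(1, 31)]  # hypothetical population of 30
rng = random.Random(7)                           # seeded only for repeatability
chosen = simple_random_sample(roster, 5, rng)
print(chosen)
```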
Does ___ cause ____?
Association is NOT causation. An observed association, no matter how strong, is not evidence of causation. Only a well-designed, controlled experiment can lead to conclusions of cause and effect.
Explain a P value
Assuming that the null is true, the P value measures the chance of observing a statistic as extreme as or more extreme than the one actually observed
Binomial Distribution Conditions
B - Binary: each trial is a success or a failure. I - Independent: trials must be independent. N - Number: the number of trials must be fixed in advance. S - Same: the probability of success must be the same for each trial.
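When all four BINS conditions hold, probabilities come from the standard binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The card does not state this formula; the sketch below is one way to compute it, with a hypothetical function name.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Made-up example: exactly 2 heads in 5 flips of a fair coin.
print(binomial_pmf(2, 5, 0.5))  # 0.3125
```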
Experimental designs
CRD (Completely Randomized Design): all experimental units are allocated at random among all treatments. RBD (Randomized Block Design): experimental units are put into homogeneous blocks, and the random assignment of units to treatments is carried out separately within each block. Matched Pairs: a form of blocking in which each subject receives both treatments in a random order, or the subjects are matched in pairs as closely as possible and one subject in each pair receives each treatment, determined at random.
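The difference between a completely randomized design and a randomized block design can be sketched as follows. The function names, unit labels, and block labels are all hypothetical; this is only one way to carry out the random assignment.

```python
import random


def completely_randomized(units, treatments, rng):
    """Shuffle all units together, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    k = len(treatments)
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}


def randomized_block(blocks, treatments, rng):
    """Carry out a separate random assignment inside each block."""
    return {label: completely_randomized(members, treatments, rng)
            for label, members in blocks.items()}


rng = random.Random(1)  # seeded only for repeatability
units = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"]
crd = completely_randomized(units, ["A", "B"], rng)

# Hypothetical blocks of similar units.
blocks = {"block_1": units[:4], "block_2": units[4:]}
rbd = randomized_block(blocks, ["A", "B"], rng)
print(crd)
print(rbd)
```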
Interpret r
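Correlation measures the strength and direction of the linear relationship between x and y. r is always between -1 and 1. Values close to zero indicate a very weak linear relationship; values close to -1 or 1 indicate a strong one. Positive r means a positive association; negative r means a negative association.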
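For reference, r can be computed as the sum of the products of the z-scores divided by n - 1. The card gives only the interpretation; the sketch below uses made-up data and a hypothetical function name.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y over all points."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


hours = [1, 2, 3, 4, 5]   # made-up explanatory values
score = [2, 4, 5, 4, 5]   # made-up response values
r = correlation(hours, score)
print(round(r, 4))  # 0.7746
```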
Type 2 error
Failing to reject the null when the null is actually false
Interpret slope b of an LSRL
For every one-unit increase in the x variable (context), the y variable (context) is predicted to increase/decrease by ___ units
Chi-squared df and expected counts
Goodness of fit: df = (# of categories) - 1. Expected counts: sample size times the hypothesized proportion in each category. Homogeneity: df = (# of rows - 1)(# of columns - 1). Expected counts: (row total)(column total)/(table total).
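The df and expected-count formulas above translate directly into code. The function names and the observed table are made up for illustration.

```python
def gof_expected_counts(sample_size, hypothesized_props):
    """Goodness of fit: sample size times each hypothesized proportion."""
    return [sample_size * p for p in hypothesized_props]


def gof_df(num_categories):
    """Goodness of fit: number of categories minus 1."""
    return num_categories - 1


def two_way_expected_counts(table):
    """Two-way table: (row total)(column total) / (table total) for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def two_way_df(table):
    """Two-way table: (rows - 1)(columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


observed = [[10, 20], [30, 40]]  # made-up counts
print(two_way_expected_counts(observed))            # [[12.0, 18.0], [28.0, 42.0]]
print(two_way_df(observed))                         # 1
print(gof_expected_counts(60, [0.5, 0.25, 0.25]))   # [30.0, 15.0, 15.0]
```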
Interpreting a Confidence Interval
I am __% confident that the interval from _ to __ captures the true _____
CLT
If the population distribution is normal, the sampling distribution of the sample mean is also normal, with the same mean as the population. If the population is not normal, the sampling distribution becomes more and more normal as n increases. In either case, as n increases, the standard deviation of the sampling distribution decreases.
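The decrease in spread follows the standard formula for the standard deviation of the sample mean, sigma / sqrt(n), which the card does not write out. A small sketch, with a made-up population standard deviation of 10:

```python
from math import sqrt


def sd_of_sample_mean(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / sqrt(n)


# The spread shrinks as the sample size grows.
for n in (1, 4, 25, 100):
    print(n, sd_of_sample_mean(10, n))
```

Quadrupling the sample size only halves the standard deviation, because n sits under a square root.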
Interpreting a Confidence Level
Intervals produced with this method will capture the true population _________ in about 95% of all possible samples of this same size from this same population
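The "about 95% of all possible samples" idea can be illustrated by simulation. This sketch assumes a normal population with a known standard deviation and uses a z interval with z* = 1.96; the population values, seed, and function name are all made up.

```python
import random
from math import sqrt
from statistics import mean


def coverage_rate(mu, sigma, n, reps, rng, z_star=1.96):
    """Fraction of simulated intervals (x_bar +/- z* sigma / sqrt(n)) that capture mu."""
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps


rng = random.Random(2024)  # seeded only for repeatability
rate = coverage_rate(mu=50, sigma=10, n=25, reps=2000, rng=rng)
print(rate)  # expected to land near 0.95
```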
Two-Sample t Test: phrasing hints, null and alternative, and conclusion
KEY PHRASE: DIFFERENCE IN THE MEANS. Null: μ1 = μ2. Alternative: μ1 - μ2 < 0, > 0, or not equal to 0. μ1 - μ2 = the difference between the mean ____ for all ___ and the mean ____ for all ___. Conclusion: We do/do not have enough evidence at the .05 significance level to conclude that the difference between the mean ____ and the mean ____ is ____.
Paired t Test
Key phrase: MEAN DIFFERENCE. Same structure as the two-sample t test. μdiff = the mean difference in ___ for all ___. Conclusion: We do/do not have enough evidence at the .05 significance level to conclude that the mean difference in ____ for all ___ is _____.
Conditions for Inference for regression
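Linear: the true relationship between the variables is linear. Independent: observations are independent (10% rule when sampling without replacement). Normal: for each value of x, the responses vary normally around the regression line. Equal variance: the spread of the responses around the regression line is the same for all x. Random: random sample or randomized experiment.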
Power
Probability of rejecting the null when the null is false (that is, when the alternative is true). Power = 1 - P(Type 2 error).
Conditions for counts or Chi squared tests
Random: random sample or randomized experiment. Large sample size: all expected counts are at least 5. Independent: independent observations and independent samples or groups (10% rule when sampling without replacement).
Inference for proportions conditions
Random: random sample or randomized experiment. Normal: at least 10 successes and 10 failures (in both groups, for a two-sample problem). Independent: independent observations and independent samples/groups (10% rule when sampling without replacement).
Inference for Means Conditions
Random: random sample or randomized experiment. Normal: the population is normal or the sample size is at least 30 (CLT). Independent: independent observations and independent samples/groups (10% rule if sampling without replacement).
Type 1 Error
Rejecting the null when the null is actually true
Describe or Compare the distributions
S: Shape. O: Outliers. C: Center. S: Spread. If the question says compare, use comparison words such as greater than or less than for center and spread.
Interpret SEb (LSRL)
SEb measures the standard deviation of the estimated slope for predicting the y variable from the x variable. It tells how far the estimated slope will typically be from the true slope.
SOCS
Shape: skewness. Outliers: are there any? Center: mean or median. Spread: range, IQR, or standard deviation.
Interpret Standard Deviation
Standard deviation measures spread by giving the typical or average distance that the observations (context) are away from their mean (context)
4-Step Significance Tests
State: hypotheses, significance level, parameters defined. Plan: choose the method and check conditions. Do: compute the test statistic and P value. Conclude: interpret the result of your test in the context of the problem.
Advantage of using Stratified Random Sample over an SRS
Stratified random sampling guarantees that each of the strata will be represented. When strata are chosen properly, a stratified random sample will produce better (less variable and more precise) information than an SRS of the same size.
Unbiased Estimator
The data are collected in such a way that there is no systematic tendency to overestimate or underestimate the true value of the population parameter. The mean of the sampling distribution equals the true value of the parameter being estimated.
Interpreting Expected Value/Mean
The mean/expected value of a random variable is the long run average outcome of a random phenomenon carried out many times
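For a discrete random variable, the long-run average is computed as the sum of each value times its probability. The card gives only the interpretation; this is a small sketch with a fair die as a made-up example.

```python
def expected_value(values, probs):
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(v * p for v, p in zip(values, probs))


faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # fair die
die_mean = expected_value(faces, probs)
print(round(die_mean, 2))  # 3.5
```

No single roll can equal 3.5; the value describes the average over many rolls.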
Interpreting Probability
The probability of any outcome of a random phenomenon is the proportion of times the outcome will occur in a very long series of repetitions. Probability is a long-term relative frequency.
Bias
The systematic favoring of certain outcomes due to flawed sample selection, poor question wording, undercoverage or non response.
Outlier Rule
Upper bound: Q3 + 1.5(IQR). Lower bound: Q1 - 1.5(IQR). IQR = Q3 - Q1. Any value above the upper bound or below the lower bound is an outlier.
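The rule above can be sketched as a short function. It takes Q1 and Q3 as inputs because software packages compute quartiles in slightly different ways; the function names and numbers are made up.

```python
def outlier_bounds(q1: float, q3: float):
    """Return (lower bound, upper bound) from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value: float, q1: float, q3: float) -> bool:
    """True if the value falls outside the bounds."""
    lower, upper = outlier_bounds(q1, q3)
    return value < lower or value > upper


print(outlier_bounds(10, 20))   # (-5.0, 35.0)
print(is_outlier(40, 10, 20))   # True
print(is_outlier(30, 10, 20))   # False
```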
Extrapolation
Using an LSRL to make predictions for x values outside the range of the explanatory variable in the data
Large samples
When collected appropriately, large samples yield more precise results than small samples because in a large sample the values of the sample statistic tend to be closer to the true population parameter
Interpret LSRL y intercept "a"
When the x variable (context) is zero, the y variable (context) is estimated to be ________
Interpret Y predicted
Y predicted is the estimated or predicted y value for a given x value
Can we generalize the results to the population of interest
Yes, if a large random sample was taken from the same population we want to draw conclusions about
Interpret r^2
_% of the variation in y (context) is accounted for by the LSRL of y (context) on x (context)
Normal CDF
normalcdf(min, max, mean, std dev). invNorm(area to the left as a decimal, mean, std dev).
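Outside the calculator, the same two computations can be reproduced with Python's standard `statistics.NormalDist`. The wrapper names below are hypothetical and simply mirror the calculator argument order.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def invnorm(area_left, mean=0.0, sd=1.0):
    """Value with the given area to its left (area as a decimal)."""
    return NormalDist(mean, sd).inv_cdf(area_left)


print(round(normalcdf(-1, 1), 4))           # 0.6827
print(round(invnorm(0.975), 2))             # 1.96
print(round(normalcdf(85, 115, 100, 15), 4))  # 0.6827
```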
Interpret s
s is the standard deviation of the residuals. It measures the typical distance between the actual y values and their predicted y values.
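The pieces of an LSRL (slope, intercept, predicted values, residuals as actual minus predicted, s, and r^2) can be tied together in one sketch. The data and names are made up, and s is computed with the usual n - 2 divisor for regression, which the card does not state.

```python
from math import sqrt
from statistics import mean


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b


xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [2, 4, 5, 4, 5]   # made-up response values

a, b = fit_lsrl(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual minus predicted

sse = sum(e ** 2 for e in residuals)
s = sqrt(sse / (len(xs) - 2))                               # SD of the residuals
r_squared = 1 - sse / sum((y - mean(ys)) ** 2 for y in ys)  # share of variation explained

print(round(a, 2), round(b, 2))          # 2.2 0.6
print(round(s, 4), round(r_squared, 2))  # 0.8944 0.6
```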
Goal of blocking
To create groups of homogeneous experimental units. Benefit: reduces the effect of variation among the experimental units (context).