Stats Flashcards 1-2
Experimental Designs
1. CRD (Completely Randomized Design) - All experimental units are allocated at random among all treatments.
2. RBD (Randomized Block Design) - Experimental units are first put into homogeneous blocks. The random assignment of units to treatments is then carried out separately within each block.
3. Matched Pairs - A form of blocking in which either each subject receives both treatments in a random order, or the subjects are matched in pairs as closely as possible and the two subjects in each pair are randomly assigned to different treatments.
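The three designs differ only in where the randomization happens. Below is a minimal sketch using Python's standard random module. The unit labels, block names, and the helper functions crd_assign, rbd_assign, and matched_pairs_assign are hypothetical and exist only for illustration. The matched-pairs helper covers only the "pair two similar subjects" version, with two treatments.

```python
import random

def crd_assign(units, treatments, rng):
    """CRD: shuffle all units together, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

def rbd_assign(blocks, treatments, rng):
    """RBD: run a separate completely randomized assignment inside each block."""
    assignment = {}
    for block_units in blocks.values():
        assignment.update(crd_assign(block_units, treatments, rng))
    return assignment

def matched_pairs_assign(pairs, treatments, rng):
    """Matched pairs (two treatments): a coin flip decides which member of each pair gets treatments[0]."""
    assignment = {}
    for a, b in pairs:
        first, second = (a, b) if rng.random() < 0.5 else (b, a)
        assignment[first] = treatments[0]
        assignment[second] = treatments[1]
    return assignment

rng = random.Random(2024)  # fixed seed so the example is reproducible
units = [f"U{i}" for i in range(1, 9)]  # hypothetical experimental units
blocks = {"block 1": ["U1", "U2", "U3", "U4"], "block 2": ["U5", "U6", "U7", "U8"]}  # hypothetical blocks
print(crd_assign(units, ["A", "B"], rng))
print(rbd_assign(blocks, ["A", "B"], rng))
print(matched_pairs_assign([("U1", "U2"), ("U3", "U4")], ["A", "B"], rng))
```

Dealing the shuffled units out in turn keeps the group sizes as equal as possible. In the RBD version every block gets its own shuffle, so each treatment appears equally often within every block.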
Interpreting a Residual Plot
1. Is there a curved pattern? If so, a linear model may not be appropriate.
2. Are the residuals small in size? If so, predictions using the linear model will be fairly precise.
3. Is there increasing (or decreasing) spread? If so, predictions for larger (smaller) values of x will be more variable.
Sampling Techniques
1. SRS - Number the entire population and draw numbers from a hat (every set of n individuals has an equal chance of being selected).
2. Stratified - Split the population into homogeneous groups (strata) and select an SRS from each group.
3. Cluster - Split the population into heterogeneous groups called clusters, and randomly select whole clusters for the sample.
4. Census - An attempt to reach the entire population.
5. Convenience - Selects the individuals easiest to reach.
6. Voluntary Response - People choose themselves by responding to a general appeal.
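The three random methods can be sketched with random.sample from Python's standard library. The school of 20 students, its grade strata, its homeroom clusters, and the helper names are all hypothetical. Census, convenience, and voluntary response samples involve no random selection, so they are not coded.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every set of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: take a separate SRS from each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: randomly pick whole clusters and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(7)
# Hypothetical school: 20 students, 4 grades (strata) and 4 mixed-grade homerooms (clusters).
population = [f"S{i:02d}" for i in range(1, 21)]
strata = {grade: population[5 * k:5 * k + 5] for k, grade in enumerate(["9th", "10th", "11th", "12th"])}
clusters = {f"Room {k + 1}": population[k::4] for k in range(4)}
print(simple_random_sample(population, 5, rng))
print(stratified_sample(strata, 2, rng))
print(cluster_sample(clusters, 2, rng))
```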
Linear Transformations
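Adding "a" to every member of a data set adds "a" to the measures of position (including center), but does not change the measures of spread or the shape. Multiplying every member of a data set by "b" multiplies the measures of position by "b" and multiplies measures of spread such as the range, IQR, and standard deviation by |b| (the variance is multiplied by b^2), but does not change the shape (if b is negative, the distribution is also flipped, so the direction of any skew reverses).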
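A quick numerical check of these rules, using a small hypothetical data set with a = 10 and b = 3. The quartiles come from statistics.quantiles with Python's default method, which may not match a calculator's quartiles exactly, but the transformation rules hold for either convention.

```python
import statistics

def summary(xs):
    """Measures of center/position and spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # Python's default quartile method
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "sd": statistics.stdev(xs),
        "iqr": q3 - q1,
        "range": max(xs) - min(xs),
    }

data = [2, 4, 4, 5, 7, 9]  # hypothetical data set
a, b = 10, 3
before = summary(data)
shifted = summary([x + a for x in data])  # add a to every value
scaled = summary([b * x for x in data])   # multiply every value by b

for stat in before:
    print(f"{stat:>6}: original {before[stat]:.3f}, +a {shifted[stat]:.3f}, *b {scaled[stat]:.3f}")
```

In the printed table, the mean and median move by a in the "+a" column while sd, IQR, and range stay put. In the "*b" column every entry is 3 times the original.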
Interpret r
Correlation measures the strength and direction of the linear relationship between x and y.
- r is always between -1 and 1.
- Close to 0 = very weak
- Close to 1 or -1 = strong
- Exactly 1 or -1 = perfectly straight line
- Positive r = positive correlation
- Negative r = negative correlation
Interpret LSRL Slope "b"
For every one-unit increase in the x variable (context), the y variable (context) is predicted to increase/decrease by ____ units (context).
Outlier Rule
IQR = Q3 - Q1
Upper Bound = Q3 + 1.5(IQR)
Lower Bound = Q1 - 1.5(IQR)
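The rule translates directly into code. The sketch below assumes the "median of each half" quartile convention, which leaves out the middle value when n is odd. Other software may use a different quartile method and get slightly different fences. The data set is hypothetical.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (middle value left out when n is odd).
    Needs at least 2 values."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]
    return statistics.median(lower), statistics.median(upper)

def outlier_fences(data):
    """Return (lower bound, upper bound) from the 1.5 * IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [3, 5, 7, 8, 9, 11, 12, 14, 35]  # hypothetical data
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]
print(f"Lower bound = {low}, Upper bound = {high}, outliers: {outliers}")
```

Here Q1 = 6 and Q3 = 13, so IQR = 7 and the bounds are -4.5 and 23.5, which flags 35 as an outlier.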
Extrapolation
Using an LSRL to predict y for x-values outside the range of the observed explanatory variable. (This can lead to ridiculous conclusions if the linear trend does not continue beyond the observed data.)
What is an Outlier?
One-variable data: An outlier is any value that falls more than 1.5(IQR) above Q3 or below Q1.
Regression outlier: Any point that falls outside the pattern of the rest of the data.
Interpret LSRL y-intercept "a"
When the x variable (context) is zero, the y variable (context) is estimated to be _____.
Interpret r^2
__% of the variation in y (context) is accounted for by the LSRL of y (context) on x (context) OR __% of the variation in y (context) is accounted for by using the linear regression model with x (context) as the explanatory variable
Interpret LSRL "s"
s = ____ is the standard deviation of the residuals. It measures the typical distance between the actual y-values (context) and their predicted y-values (context).
Interpret LSRL "y-hat"
y-hat is the "estimated" or "predicted" y-value (context) for a given x-value (context)
Interpret a z-score
z = (value - mean) / standard deviation
A z-score describes how many standard deviations a value or statistic falls from the mean of the distribution, and in what direction. The farther the z-score is from zero, the more "surprising" the value of the statistic is.
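The formula and the interpretation sentence fit in a few lines. The distribution (mean 70, standard deviation 10) and the helper names z_score and describe are hypothetical.

```python
def z_score(value, mean, sd):
    """How many standard deviations value lies from the mean (sign gives direction)."""
    return (value - mean) / sd

def describe(z):
    """Turn a z-score into an interpretation phrase."""
    if z == 0:
        return "exactly at the mean"
    direction = "above" if z > 0 else "below"
    return f"{abs(z):.2f} standard deviations {direction} the mean"

mean, sd = 70, 10  # hypothetical distribution of test scores
for value in (85, 55, 70):
    z = z_score(value, mean, sd)
    print(f"{value}: z = {z:+.2f}, {describe(z)}")
```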
Using Normalcdf and Invnorm (Calculator Tips)
normalcdf(min, max, mean, standard deviation)
invNorm(area to the left as a decimal, mean, standard deviation)
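Outside the calculator, Python's statistics.NormalDist provides cdf and inv_cdf, which can stand in for these two commands. The wrapper names normalcdf and inv_norm below are hypothetical helpers that mirror the calculator's argument order, and the N(70, 10) distribution is made up.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper (mirrors the calculator's normalcdf)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value with the given area to its left (mirrors the calculator's invNorm)."""
    return NormalDist(mean, sd).inv_cdf(area_left)

# Hypothetical: test scores are approximately N(70, 10)
print(normalcdf(60, 80, 70, 10))             # about 0.683 (within 1 SD of the mean)
print(normalcdf(float("-inf"), 85, 70, 10))  # -inf plays the role of -1E99 on the calculator
print(inv_norm(0.90, 70, 10))                # 90th percentile score
```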
What is a Residual?
Residual = y - y-hat
A residual measures the difference between the actual (observed) y-value in a scatterplot and the y-value predicted by the LSRL for the corresponding x-value.
In the calculator (x-values in L1, y-values in L2, LSRL stored in Y1): L3 = L2 - Y1(L1)
Interpret LSRL "SEb"
SEb estimates the standard deviation of the sampling distribution of the sample slope b when predicting the y variable (context) from the x variable (context). It measures how far the estimated slope will typically be from the true slope.
Describe the Distribution OR Compare the Distribution
SOCS! Shape, Outliers, Center, Spread
Only discuss outliers if there are obvious outliers present.
Be sure to address Shape, Center, and Spread (SCS) in context!
If the question says "Compare": YOU MUST USE comparison phrases like "is greater than" or "is less than" for Center and Spread.
SOCS
Shape - Skewed Left (usually Mean < Median), Skewed Right (usually Mean > Median), Fairly Symmetric (Mean ≈ Median)
Outliers - Discuss them if there are obvious ones
Center - Mean or Median
Spread - Range, IQR, or Standard Deviation
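The mean-versus-median comparison is only a rule of thumb, so a graph should always confirm the shape. The sketch below turns that comparison into a hint and gathers the center and spread numbers. The commute-time data and the helper names shape_hint and center_and_spread are hypothetical.

```python
import statistics

def shape_hint(data):
    """Rule-of-thumb shape hint from mean vs. median; always confirm with a graph."""
    mean, median = statistics.mean(data), statistics.median(data)
    if mean > median:
        return "possibly skewed right (mean > median)"
    if mean < median:
        return "possibly skewed left (mean < median)"
    return "possibly fairly symmetric (mean = median)"

def center_and_spread(data):
    """Numbers for the Center and Spread parts of SOCS."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),
        "range": max(data) - min(data),
    }

commute = [5, 8, 10, 10, 12, 15, 18, 45]  # hypothetical commute times (minutes)
print(shape_hint(commute))
print(center_and_spread(commute))
```

Here the one long commute (45 minutes) pulls the mean (15.375) above the median (11). This is the typical sign of right skew, and also a candidate outlier to check.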
Interpret Standard Deviation
Standard Deviation measures spread by giving the "typical" or "average" distance that the observations (context) are away from their (context) mean
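The "typical distance from the mean" idea can be computed by hand. Square each deviation from the mean, add them up, divide by n - 1, and take the square root. This is the sample standard deviation s, which the TI-84's 1-Var Stats reports as Sx. Dividing by n instead gives the population version (σx). The data below are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean = statistics.mean(data)
squared_deviations = [(x - mean) ** 2 for x in data]
sample_sd = math.sqrt(sum(squared_deviations) / (len(data) - 1))  # s: divide by n - 1
print(f"mean = {mean}, s = {sample_sd:.3f}")
print(f"library check: {statistics.stdev(data):.3f}")
```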