AP Stats Unit 1 & 2

Variance

SD squared

describing a distribution

Shape, Outliers, Center, Spread (SOCS) +context

problem with segmented bar graph

Every bar is scaled to 100%, so differences in sample size between groups are hidden.

What does not work well for large data sets?

Stemplots

Strength

Strong: points very close together. Moderate: not as close. Weak: spread out, far apart.

Zero

Sum of residuals

Skewed Left Distribution

Tail on the left. Mean less than the median.

correlation coefficient equation

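$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$. r is not resistant to outliers. r is not affected by changes in scale or center. r = 0 represents no linear correlation.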
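
The correlation formula can be sketched in code as the average product of z-scores, dividing by n - 1. This is a minimal sketch on made-up data; the helper names (`mean`, `sample_sd`, `correlation`) are illustrative and not from the original cards.

```python
import math

# Hypothetical paired data (made-up values for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    products = (((a - mx) / sx) * ((b - my) / sy) for a, b in zip(xs, ys))
    return sum(products) / (n - 1)

r = correlation(x, y)

# Changing the scale and center of x (new units) leaves r unchanged.
x_rescaled = [10 * a + 3 for a in x]
r_rescaled = correlation(x_rescaled, y)

print(round(r, 4), round(r_rescaled, 4))
```

Both printed values should agree, which illustrates that r does not depend on the units of measurement.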

Spread

range, IQR, standard deviation

Median/IQR

resistant to outliers

Standard Deviation of Residuals equation

$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$
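
As a rough illustration, the standard deviation of the residuals can be computed from a list of residuals (observed y minus predicted y). The residuals below are made-up values, assumed to come from some least-squares line fit to five points.

```python
import math

# Hypothetical residuals (observed y - predicted y) from a line fit to n = 5 points.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
n = len(residuals)

# Sum of squared residuals, divided by n - 2, then square-rooted.
sum_squared = sum(e ** 2 for e in residuals)
s = math.sqrt(sum_squared / (n - 2))

print(round(s, 3))
```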

Ways to show categorical variables

segmented bar graph, mosaic plot, pie chart

Boxplots (box-and-whisker plots)

showing data through quartiles

Describe Shape of distribution

skewness, symmetry, unimodal, bimodal, multimodal, approximately normal, bell curve

standard deviation

spread; a measure of variability that describes the typical distance of the values from the mean

extrapolation

the act of estimation by projecting known information

z-score

a measure of how many standard deviations a value is away from the mean: z = (value - mean) / SD
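
A minimal sketch of the z-score calculation, using a hypothetical exam with mean 70 and SD 10 (made-up numbers); the function name `z_score` is illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical exam: mean 70, SD 10.
print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 70, 10))  # -1.0
```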

Minitab

A statistical software package, designed to perform statistical analysis as accurately as possible.

Frequency

The count: the number of times something occurs.

What is a categorical variable?

A characteristic of an individual that takes on values that are names or labels. Shape is irrelevant. Ex. type of fruit.

Stemplot

A graphical representation of a quantitative data set. The leading digits of each data value are presented as stems and the final digits are given as leaves.

Power Transformation

A transformation in which a power/exponent is chosen, and then each original value is raised to that power to obtain the corresponding transformed value. Do NOT pick 0 as the exponent as that would make every value 1, and an exponent of 1 is NOT a transformation either.
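
A small sketch of a power transformation, assuming positive data values and a hypothetical helper named `power_transform`. The example data are made up.

```python
def power_transform(values, power):
    """Raise every value to the chosen power (assumes positive values)."""
    if power == 0:
        # An exponent of 0 would turn every value into 1.
        raise ValueError("power 0 collapses every value to 1")
    return [v ** power for v in values]

# Hypothetical data that grow like a square: 1, 4, 9, 16, 25.
areas = [1, 4, 9, 16, 25]

# Power 1/2 (square root) straightens the pattern out.
roots = power_transform(areas, 0.5)

# Power 1 returns the data unchanged, so it is not really a transformation.
unchanged = power_transform(areas, 1)

print(roots, unchanged)
```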

What is a discrete variable?

A variable that can take only specific values in a given range. Ex. # of students in a class

explanatory variable (independent variable)

A variable that may explain changes in a second variable, or a variable that contains information that you have. (x)

normal distribution (bell curve)

Bell-shaped curve. Symmetric. Central tendency: the mean, median, and mode are equal. The standard normal distribution has a mean of 0 and an SD of 1.

How to read Histograms?

Each bin includes its left endpoint but not its right endpoint. The horizontal axis does not have to start at 0.

Larger SD

Data is farther apart

Describing a relationship between two variables

Direction, unusual features, form, and strength (DUFS)

marginal distribution

In a two-way table, the distribution of values of one variable among all individuals described by the table.

Ways to show quantitative data

Dotplots, stemplots, box and whisker plots, histograms

Dotplots

Each data value is shown as a dot above its location on a number line.

Empirical Rule (68-95-99.7 Rule)

Gives benchmarks for understanding how probability is distributed under a normal curve. In a normal distribution, about 68% of the observations are within one standard deviation of the mean, about 95% are within two standard deviations of the mean, and about 99.7% are within three standard deviations of the mean.
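
The benchmarks can be checked numerically. This sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
```

The printed percentages should land close to 68, 95, and 99.7.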

segmented bar graph

Graph used to compare the distribution of a categorical variable in each of several groups. For each group, there is a single bar with "segments" that correspond to the different values of the categorical variable. The height of each segment is determined by the percent of individuals in the group with that value. Each bar has a total height of 100%.

Influential Point

A point that, if removed, substantially changes the y-intercept a, the slope b, and/or the correlation r.

normalcdf

Input: z-scores or variable values (lower and upper bounds). Output: area or probability.

InvNorm

Input: area or probability. Output: z-score or variable value.
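
For readers without a calculator, the two directions can be imitated with Python's `statistics.NormalDist`. The functions below are stand-ins named after the calculator commands, not the calculator's own implementation, and the argument order is an assumption made for this sketch.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Values in, area out: area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area, mean=0.0, sd=1.0):
    """Area in, value out: the value with the given area to its left."""
    return NormalDist(mean, sd).inv_cdf(area)

print(round(normalcdf(-1.96, 1.96), 3))
print(round(inv_norm(0.975), 2))
```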

residual

What is left over: the difference between the observed value and the predicted value. Negative residual: the point is below the line of best fit. Positive residual: the point is above the line of best fit.

Does association imply causation?

No. Only a well-designed experiment can show causation.

Unusual Features

Outliers, high leverage, influential points

Relative Frequency

Percentage/Proportion

Direction

Positive or Negative

Correlation

Positive: y tends to go up as x goes up. Negative: y tends to go down as x goes up.

Lower outlier

Any value below Q1-(1.5*IQR)

Higher outlier

Any value above Q3+(1.5*IQR)
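
A sketch of the 1.5*IQR rule on made-up data. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data (the middle value is left out when n is odd); other quartile conventions can give slightly different fences.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves of the sorted data."""
    ordered = sorted(data)
    half = len(ordered) // 2  # assumes at least two data values
    return median(ordered[:half]), median(ordered[-half:])

# Hypothetical data set with one unusually large value.
data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 25]

q1, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier.
outliers = [v for v in data if v < lower_fence or v > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```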

Equation for IQR

Q3-Q1

IQR (interquartile range)

Q3-Q1 (middle 50%)

What is a quantitative variable?

A variable that takes numeric values, such as height, age, number of cars sold, or SAT score.

y-intercept

The predicted value of y when x=0.

cumulative frequency

The sums of the frequencies of the data values from smallest to largest.
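
A quick sketch of cumulative frequencies as running totals, using made-up counts and `itertools.accumulate` from the standard library.

```python
from itertools import accumulate

# Hypothetical frequencies for the data values 1, 2, 3, 4 (smallest to largest).
frequencies = [3, 5, 2, 6]

# Running totals of the frequencies.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [3, 8, 10, 16]
```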

response variable (dependent variable)

The variable that shows the value you want to predict. (y)

Nonlinear relationships

Two variables may be related but not linearly; in this case the relationship is curvilinear.

histograms

Used when data is continuous. The bars touch each other. Shows frequency distributions.

Scatterplot

a graphed cluster of points, each of which represents the values of two variables; the slope of the points suggests the direction of the relationship between the two variables

coefficient of determination

a measure of the amount of variation in the dependent variable about its mean that is explained by the regression equation. It tells what percent of the variation in y is explained (accounted for) by the LSRL.
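
One way to see this is to compare the variation left over after using the line with the total variation about the mean. The sketch below uses made-up observed values and assumes the predicted values come from a least-squares line fit to those same points.

```python
# Hypothetical observed y-values and the predictions from a least-squares line.
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

y_bar = sum(observed) / len(observed)

# Variation left unexplained by the line (sum of squared residuals).
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

# Total variation of y about its mean.
sst = sum((y - y_bar) ** 2 for y in observed)

# Fraction of the variation in y explained by the line.
r_squared = 1 - sse / sst
print(round(r_squared, 2))
```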

residual plot

a scatterplot of the regression residuals (y) against the explanatory variable (x)

What is a continuous variable?

a variable that can take on an infinite range of values along a specified continuum. (height)

SOCS + C

always use context

Transformation

applies a math operation to a variable

Slope Interpretation of Residuals

the predicted change in y (y-hat) for each 1-unit increase in x; change in y-hat over change in x, or b/1

Smaller SD

data is closer together

High Leverage

data points whose x values are far from the mean of x

Conditional Distribution

describes the values of that variable among individuals who have a specific value of another variable
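
The difference between a marginal and a conditional distribution can be sketched with a small two-way table. The counts and category names below are made up for illustration.

```python
# Hypothetical two-way table of counts: rows are class year, columns are favorite fruit.
table = {
    "junior": {"apple": 30, "banana": 20},
    "senior": {"apple": 10, "banana": 40},
}
fruits = ["apple", "banana"]
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of fruit: column totals divided by the grand total.
marginal_fruit = {
    f: sum(row[f] for row in table.values()) / grand_total for f in fruits
}

# Conditional distribution of fruit among seniors: one row divided by its own total.
senior_total = sum(table["senior"].values())
fruit_given_senior = {f: table["senior"][f] / senior_total for f in fruits}

print(marginal_fruit)
print(fruit_given_senior)
```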

Outlier in a Scatterplot

doesn't follow trend, large residual

Roundoff error

effect of rounding off results

skewed right distribution

tail on the right; has a majority of data values on the left; mean greater than the median; center best described by the median

association

knowing the value of one variable helps predict the value of the other

Relationship between residual plot and linear model

A linear model is appropriate if the residual plot shows no clear pattern. If the residual plot shows a pattern, such as a curve (quadratic), the linear model is NOT appropriate.

What is the form of the relationship?

linear or nonlinear

Center

mean, median, mode

Mean/Standard Deviation

nonresistant to outliers

residual equation

observed y - predicted y ; y-y hat

standard deviation formula

the square root of the variance
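
A minimal sketch of the calculation on made-up scores, using the sample formulas (dividing by n - 1). The standard library's `statistics.variance` and `statistics.stdev` use the same divisor and serve as a cross-check.

```python
import math
import statistics

# Hypothetical sample of quiz scores (made-up values).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)

# Standard deviation: the square root of the variance.
sd = math.sqrt(variance)

print(round(variance, 3), round(sd, 3))
print(round(statistics.variance(scores), 3), round(statistics.stdev(scores), 3))
```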

SD of residuals/Typical prediction error

The residuals from an LSRL sum to zero, so s is used to measure the typical prediction error. Interpretation: the actual y-values are typically about s units away from the values predicted by the LSRL using x. The smaller s is, the better the predictions will be.

Mosaic plot (segmented bar charts)

a segmented bar chart in which the width of each bar is proportional to the size of its group, so it also shows the size of the sample

Predicted value

what does the hat above the y represent?

What is a distribution?

what values a variable takes and how often it takes these values

Simpson's Paradox

when averages are compared within separate groups, the comparison can appear to contradict the comparison of the overall (combined) averages
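
The reversal is easier to believe with numbers. The counts below are entirely made up: option A has the higher success rate in both the "easy" and the "hard" group, yet option B has the higher rate overall, because A was mostly used on hard cases.

```python
# Hypothetical (successes, trials) for two options in two groups.
results = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}

def rate(successes, trials):
    return successes / trials

# Success rate within each group.
group_rates = {
    option: {group: rate(*counts) for group, counts in groups.items()}
    for option, groups in results.items()
}

# Overall success rate after combining the groups.
overall_rates = {
    option: rate(
        sum(s for s, _ in groups.values()),
        sum(t for _, t in groups.values()),
    )
    for option, groups in results.items()
}

print(group_rates)
print(overall_rates)
```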

Mean of a sample

x̄ (x-bar)

LSRL (Least Squares Regression Line)

y-hat = a + bx; a is y-intercept and b is slope
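
A rough sketch of fitting y-hat = a + bx by least squares on made-up data. The function name `fit_lsrl` is hypothetical; it uses the standard least-squares formulas for the slope and intercept, and the residuals it produces sum to zero (up to rounding).

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept: the line passes through (x-bar, y-bar)
    return a, b

# Hypothetical data (made-up values).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_lsrl(x, y)
predicted = [a + b * xi for xi in x]

# Residual = observed y - predicted y.
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

print(round(a, 2), round(b, 2), round(sum(residuals), 10))
```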

Mean of population

μ

Standard Deviation of population

σ

