AP Statistics S.1 Final Review
how to find values from areas in any Normal distribution (2.2)
1. State the distribution and the values of interest. 2. Perform calculations, showing your work: either use Table A or technology to find the value of z and then "unstandardize" (x = μ + zσ), or use the invNorm command, labeling each input. 3. Answer the question.
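A minimal Python sketch of the "unstandardize" step, assuming the standard library's `statistics.NormalDist` as a stand-in for Table A or invNorm. The helper name `value_from_area` and the N(100, 15) example are hypothetical.

```python
from statistics import NormalDist

def value_from_area(area_left, mu, sigma):
    """Return the value x with `area_left` of the N(mu, sigma) curve to its left (hypothetical helper)."""
    # Step 1: find z with that area to its left under the standard Normal curve (Table A in reverse)
    z = NormalDist(0, 1).inv_cdf(area_left)
    # Step 2: "unstandardize" with x = mu + z * sigma
    return mu + z * sigma

# Hypothetical example: the 90th percentile of N(100, 15)
x_90 = value_from_area(0.90, mu=100, sigma=15)
direct = NormalDist(mu=100, sigma=15).inv_cdf(0.90)  # same idea as invNorm(0.90, 100, 15)
print(round(x_90, 2), round(direct, 2))
```

Both routes agree, which is why either Table A plus unstandardizing or a single invNorm call is acceptable work.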
how to find areas in any Normal distribution (2.2)
1. State the distribution and the values of interest. 2. Perform calculations, showing your work: either compute the z-score(s), z = (x - μ)/σ, and use Table A, or use the normalcdf command, labeling each input. 3. Answer the question.
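A minimal Python sketch of the area calculation, assuming `statistics.NormalDist.cdf` as a stand-in for Table A (area to the left of z). The helper name `area_between` and the N(100, 15) example are hypothetical.

```python
from statistics import NormalDist

def area_between(lower, upper, mu, sigma):
    """Area under the N(mu, sigma) curve between lower and upper (hypothetical helper)."""
    # Standardize each boundary: z = (x - mu) / sigma
    z_low = (lower - mu) / sigma
    z_high = (upper - mu) / sigma
    std = NormalDist(0, 1)
    # Table A gives the area to the LEFT of z, so subtract the two left-tail areas
    return std.cdf(z_high) - std.cdf(z_low)

# Hypothetical example: proportion of N(100, 15) between 85 and 130 (z from -1 to 2)
p = area_between(85, 130, mu=100, sigma=15)
print(round(p, 4))  # same idea as normalcdf(85, 130, 100, 15)
```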
interquartile range, IQR (1.3)
IQR = Q3 - Q1
multimodal (1.2)
a distribution that has more than two clear peaks
bimodal (1.2)
a distribution that has two clear peaks
cumulative relative frequency graph (2.1)
a graph used to examine location within a distribution. To make a cumulative relative frequency graph, begin by grouping the observations into equal-width classes. The completed graph shows the accumulating percent of observations as you move through the classes in increasing order
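The table behind the graph can be sketched in Python as below, assuming classes of the form [lower, upper) and a hypothetical helper `cumulative_relative_frequency`; the ages are made-up data.

```python
def cumulative_relative_frequency(data, start, width, n_classes):
    """Return (lower, upper, cumulative percent) for equal-width classes [lower, upper)."""
    n = len(data)
    rows = []
    running = 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        # Add this class's count to the running total of observations so far
        running += sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, 100 * running / n))
    return rows

# Hypothetical data (e.g., ages), classes of width 10 starting at 40
ages = [42, 47, 51, 53, 55, 58, 61, 62, 64, 68, 71, 77]
table = cumulative_relative_frequency(ages, 40, 10, 4)
for lower, upper, cum_pct in table:
    print(f"[{lower}, {upper}): {cum_pct:.1f}%")
```

Plotting each cumulative percent at the upper end of its class and connecting the points gives the graph.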
regression line (3.2)
a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x
splitting stems (1.2)
a method for spreading out a stemplot that has too few stems
scatterplot (3.1)
a plot that shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point in the graph
standard deviation (1.3)
a statistic that measures the typical distance of the values in a distribution from the mean. It is calculated by finding an "average" of the squared distances and then taking the square root
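In AP Statistics the "average" of the squared distances divides by n - 1 rather than n. A minimal Python sketch of that calculation, with a hypothetical helper `sample_std` and made-up data:

```python
import math

def sample_std(data):
    """Sample standard deviation s_x: square root of (sum of squared deviations / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    # Squared distance of each value from the mean
    squared_devs = [(x - mean) ** 2 for x in data]
    variance = sum(squared_devs) / (n - 1)  # the "average" squared deviation (the variance)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; the mean is 5
print(sample_std(data))
```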
experiment (4.2)
a study in which researchers deliberately impose treatments on individuals to measure their responses
census (4.1)
a study that attempts to collect data from every individual in the population
observational study (4.2)
a study that observes individuals and measures variables of interest but does not attempt to influence the responses
sample survey (4.1)
a study that uses an organized plan to choose a sample that represents some specific population. We base conclusions about the population on data from the sample
sample (4.1)
a subset of individuals in the population from which we actually collected data
frequency table (1.1)
a table that displays the count (frequency) of observations in each category or class
outlier (1.2)
an individual value that falls outside the overall pattern
subjects (4.2)
experimental units that are human beings
factors (4.2)
explanatory variables in an experiment
first quartile, Q1 (1.3)
if the observations in a data set are ordered from lowest to highest, the first quartile Q1 is the median of the observations whose position is to the left of the median
standard deviation of the residuals (3.2)
if we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals is $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$. This value gives the approximate size of a "typical" prediction error (residual)
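A minimal Python sketch of s, assuming the predictions are already available. The helper `residual_std` and the data are hypothetical; the arithmetic works for any predictions, but s is meant to summarize residuals from the least-squares line.

```python
import math

def residual_std(observed, predicted):
    """s = sqrt(sum of squared residuals / (n - 2)), where residual = y - y_hat."""
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    n = len(residuals)
    return math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Hypothetical observed y values and predicted values y_hat
y = [3.0, 5.0, 7.5, 8.0]
y_hat = [3.5, 5.0, 6.5, 8.5]
print(residual_std(y, y_hat))
```

Dividing by n - 2 (not n - 1) reflects that two quantities, the slope and intercept, were estimated from the data.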
standardized score (z-score) (2.1)
if x is an observation from a distribution that has known mean and standard deviation, the standardized score (z-score) of x is $z = \frac{x - \text{mean}}{\text{standard deviation}}$
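A one-line Python sketch of the z-score, z = (x - mean) / standard deviation; the helper name `z_score` and the test-score numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: a score of 86 on a test with mean 80 and standard deviation 4
print(z_score(86, 80, 4))  # 1.5
```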
third quartile, Q3 (1.3)
in a data set in which the observations are ordered from lowest to highest, the median of the observations whose position is to the right of the median
population (4.1)
in a statistical study, the entire group of individuals we want information about
the coefficient of determination (r^2) (3.2)
the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x
Least-squares regression line (LSRL) (3.2)
the line that makes the sum of the squared vertical distances of the data points from the line as small as possible
range (1.3)
the maximum value minus the minimum value for a set of quantitative data
variance (1.3)
"average" squared deviation of the observations in a data set from their mean
conditional distribution (1.1)
(of a variable) describes the values of that variable among individuals who have a specific value of another variable; there is a separate conditional distribution for each value of the other variable
marginal distribution (1.1)
(of one of the categorical variables in the two-way table of counts) is the distribution of values of that variable among all individuals described by the table
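The difference between the marginal and conditional distributions can be sketched in Python using a hypothetical two-way table stored as a dictionary (rows = grade level, columns = preferred snack). The helper names and counts are made up.

```python
# Hypothetical two-way table of counts
table = {
    "9th":  {"chips": 20, "fruit": 10, "candy": 10},
    "12th": {"chips": 15, "fruit": 25, "candy": 10},
}

def marginal_distribution(table):
    """Percent of ALL individuals in each column category (row variable ignored)."""
    totals = {}
    for row in table.values():
        for col, count in row.items():
            totals[col] = totals.get(col, 0) + count
    grand_total = sum(totals.values())
    return {col: 100 * c / grand_total for col, c in totals.items()}

def conditional_distribution(table, row_value):
    """Percent in each column category among ONLY individuals with the given row value."""
    row = table[row_value]
    row_total = sum(row.values())
    return {col: 100 * c / row_total for col, c in row.items()}

print(marginal_distribution(table))
print(conditional_distribution(table, "9th"))
print(conditional_distribution(table, "12th"))
```

Different conditional distributions for 9th and 12th graders suggest an association between grade level and snack preference.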
How do you explore data? (1.1)
1) begin by examining each variable by itself 2) study relationships among the variables 3) start with a graph or graphs 4) add numerical summaries
pie chart (1.1)
a chart that shows the distribution of a categorical variable as a circle (a "pie") whose slices are sized by the counts or percents for the categories; must include all the categories that make up the whole
density curve (2.2)
a curve that a) is always on or above the horizontal axis and b) has area exactly 1 underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval
Normal distribution (2.2)
a distribution described by a Normal density curve. Any particular Normal distribution is completely specified by two numbers, its mean μ (mu) and standard deviation σ (sigma). The mean of a Normal distribution is at the center of the symmetric Normal curve. The standard deviation is the distance from the center to the change-of-curvature points on either side. We abbreviate the Normal distribution with mean μ and standard deviation σ as N(μ, σ)
boxplot (1.3)
a graph of the five-number summary. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the smallest and largest observations that are not outliers. Outliers are marked with a special symbol such as an asterisk (*)
histogram (1.2)
a graph that displays the distribution of a quantitative variable. The horizontal axis is marked in the units of measurement for the variable. The vertical axis contains the scale of counts or percents. Each bar in the graph represents an equal-width class. The base of the bar covers the class, and the bar height is the class frequency or relative frequency
segmented bar graph (1.1)
a graph used to compare the distribution of a categorical variable in each of several groups. For each group, there is a single bar with "segments" that correspond to the different values of the categorical variable. The height of each segment is determined by the percent of individuals in the group with that value. Each bar has a total height of 100%.
bar graph (1.1)
a graph used to display the distribution of a categorical variable or to compare the sizes of different quantities; horizontal axis identifies the categories or quantities being compared; drawn with blank spaces between the bars to separate the items being compared
Normal probability plot (2.2)
a plot used to assess whether a data set follows a Normal distribution. To make a Normal probability plot, 1) arrange the data values from smallest to largest and record the percentile of each observation, 2) use the standard Normal distribution to find the z-scores at these same percentiles, and 3) plot each observation x against the corresponding z. If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are approximately Normal
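A minimal Python sketch of the three steps, producing the (x, z) pairs to plot. It assumes the i-th smallest of n values is assigned the percentile (i - 0.5)/n; this convention is an assumption, and textbooks and software differ slightly. The helper name and data are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Pair each observation x with the standard Normal z-score at the same percentile.

    Assumption: the i-th smallest of n values (i = 1..n) gets percentile (i - 0.5) / n.
    """
    ordered = sorted(data)                 # 1) arrange from smallest to largest
    n = len(ordered)
    std = NormalDist(0, 1)
    points = []
    for i, x in enumerate(ordered, start=1):
        pct = (i - 0.5) / n                # percentile assigned to this observation
        z = std.inv_cdf(pct)               # 2) z-score at that same percentile
        points.append((x, z))              # 3) the (x, z) pair to plot
    return points

pts = normal_probability_points([12, 15, 9, 14, 11])  # hypothetical data
for x, z in pts:
    print(f"x = {x:>3}   z = {z:6.2f}")
```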
Simple Random Sample (SRS) (4.1)
a sample chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample
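Python's `random.sample` draws without replacement so that every group of k items is equally likely, which matches this definition. A minimal sketch with a hypothetical population of 30 labeled students:

```python
import random

# Hypothetical population of 30 labeled students
population = [f"student_{i}" for i in range(1, 31)]

rng = random.Random(2024)            # fixed seed only so the example is reproducible
srs = rng.sample(population, k=5)    # SRS of size 5: no repeats, every group of 5 equally likely
print(srs)
```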
convenience sample (4.1)
a sample collected by taking from the population individuals that are easy to reach
cluster sample (4.1)
a sample obtained by classifying the population into groups of individuals that are located near each other, called clusters, and then choosing an SRS of the clusters. All individuals in the chosen clusters are included in the sample
stratified random sample (4.1)
a sample obtained by classifying the population into groups of similar individuals, called strata, then choosing a separate SRS in each stratum and combining these SRSs to form the sample
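The contrast between a stratified random sample and a cluster sample can be sketched in Python. The strata (grade levels), clusters (homerooms), and sizes below are all hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed for a reproducible example

# Hypothetical school: 100 students in each grade-level stratum
strata = {
    "9th":  [f"9-{i}" for i in range(1, 101)],
    "10th": [f"10-{i}" for i in range(1, 101)],
    "11th": [f"11-{i}" for i in range(1, 101)],
    "12th": [f"12-{i}" for i in range(1, 101)],
}

# Stratified random sample: a separate SRS inside EVERY stratum, then combine
stratified = []
for students in strata.values():
    stratified.extend(rng.sample(students, k=5))

# Cluster sample: an SRS of whole clusters, keeping EVERYONE in the chosen clusters
homerooms = {f"room_{r}": [f"r{r}-s{s}" for s in range(1, 26)] for r in range(1, 17)}
chosen_rooms = rng.sample(list(homerooms), k=2)
cluster = [s for room in chosen_rooms for s in homerooms[room]]

print(len(stratified), len(cluster))  # 20 50
```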
residual plot (3.2)
a scatterplot of the residuals against the explanatory variable. Residual plots help us assess whether a linear model is appropriate
dotplot (1.2)
a simple graph that shows each data value as a dot above a location on a number line
stemplot or stem-and-leaf plot (1.2)
a simple graphical display for fairly small data sets that gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. Each observation is separated into a stem, consisting of all but the final digit, and a leaf, the final digit. The stems are arranged in a vertical column with the smallest at the top. Each leaf is written in the row to the right of its stem, with the leaves arranged in increasing order out from the stem
treatment (4.2)
a specific condition applied to the individuals in an experiment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables
resistant measure (1.3)
a statistic that is not affected very much by extreme observations
effect of adding/subtracting a constant (2.1)
adds a to (subtracts a from) measures of center and location (mean, median, quartiles, percentiles), but does not change the shape of the distribution or measures of spread (range, IQR, standard deviation)
random assignment (4.2)
an experimental design principle. Use chance to assign experimental units to treatments. Doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variables among the treatment groups
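One simple way to use chance for assignment is to shuffle the units and split the list into equal groups. A minimal Python sketch with 20 hypothetical units and two hypothetical treatments:

```python
import random

# Hypothetical experimental units and two treatments
units = [f"unit_{i}" for i in range(1, 21)]
rng = random.Random(11)      # fixed seed for a reproducible example

shuffled = units[:]          # copy so the original list is untouched
rng.shuffle(shuffled)        # chance alone decides the order
half = len(shuffled) // 2
groups = {"treatment A": shuffled[:half], "treatment B": shuffled[half:]}

for name, members in groups.items():
    print(name, members)
```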
replication (4.2)
an experimental design principle. Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups
Normal curves (2.2)
an important class of density curves that are symmetric, single-peaked, and bell-shaped
outlier (1.3)
an individual value that falls outside the overall pattern of a distribution
variable (intro)
any characteristic of an individual; can take different values for different individuals
mean (x bar) (1.3)
arithmetic average. To find the mean of a set of observations, add their values and divide by the number of observations
two-way table (1.1)
describes two categorical variables with a row variable and a column variable
inference (4.1)
drawing conclusions that go beyond the data at hand
describing a scatterplot (3.1)
in any graph of the data, look for the overall pattern and for striking departures from that pattern. Direction, form, and strength describe the overall pattern of a scatterplot
the 68-95-99.7 rule (empirical rule) (2.2)
in the Normal distribution with mean μ (mu) and standard deviation σ (sigma), (a) approximately 68% of the observations fall within σ of μ, (b) approximately 95% of the observations fall within 2σ of μ, and (c) approximately 99.7% of the observations fall within 3σ of μ.
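The three percentages can be checked with the standard Normal curve; because any Normal distribution can be standardized, the same areas hold for every N(μ, σ). A short Python sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

std = NormalDist(0, 1)
# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} sigma: {area:.4f}")
```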
association (1.1)
knowing the value of one variable helps predict the value of the other. If knowing the value of one variable does not help predict the value of the other, there is no association between the variables
center (1.2)
mean, median
correlation r (3.1)
measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r.
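A minimal Python sketch of r using the formula r = (1/(n - 1)) Σ z_x z_y, the sum of products of paired z-scores divided by n - 1. The helper `correlation` and the paired data are hypothetical.

```python
import math

def correlation(x, y):
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Multiply each pair of z-scores, add them up, and divide by n - 1
    return sum(((xi - mean_x) / s_x) * ((yi - mean_y) / s_y) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical paired data
hours = [1, 2, 3, 4, 5]
score = [60, 65, 70, 80, 85]
print(round(correlation(hours, score), 3))
```

Because r is built from z-scores, converting hours to minutes leaves r unchanged, and swapping x and y gives the same value.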
effect of multiplying/dividing by a constant (2.1)
multiplies (divides) measures of center and location (mean, median, quartiles, percentiles) by b, multiplies (divides) measures of spread (range, IQR, standard deviation) by |b|, but does not change the shape of the distribution
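Both transformation rules (adding a constant a, multiplying by a constant b) can be checked numerically with Python's `statistics` module. The data and constants below are hypothetical; a negative b is used to show that center is multiplied by b while spread is multiplied by |b|, since spread can't be negative.

```python
import statistics

data = [10, 12, 15, 18, 25]  # hypothetical measurements

def summarize(values):
    """Two measures of center and two measures of spread."""
    return {"mean": statistics.mean(values), "median": statistics.median(values),
            "sd": statistics.stdev(values), "range": max(values) - min(values)}

a, b = 5, -2                       # hypothetical constants: new value = a + b * old value
transformed = [a + b * x for x in data]

before, after = summarize(data), summarize(transformed)
print(before)
print(after)
```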
outliers and influential observations in regression (3.2)
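an outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers may not have large residuals. An observation is influential if removing it would markedly change the result of a calculation; points that are outliers in the x direction are often influential for the least-squares regression line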
nonresponse (4.1)
occurs when an individual chosen for the sample can't be contacted or refuses to participate
undercoverage (4.1)
occurs when some members of the population cannot be chosen in a sample
pictograph (1.1)
a graph that uses pictures to represent quantities; one of the worst ways to represent data, because our eyes respond to the area of the pictures rather than to the scale
voluntary response sample (4.1)
people decide whether to join a sample by responding to a general invitation
categorical variable (intro)
places an individual into one of several groups or categories
experimental units (4.2)
smallest collection of individuals to which treatments are applied
five-number summary (1.3)
smallest observation, first quartile, median, third quartile, and largest observation, written in order from smallest to largest. In symbols: Minimum Q1 Median Q3 Maximum
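A minimal Python sketch of the five-number summary, finding Q1 and Q3 as the medians of the observations to the left and right of the median's position (the median itself is excluded when n is odd). Other software may compute quartiles slightly differently. The helper names and data are hypothetical; the sketch assumes n is at least 2.

```python
def median(values):
    """Median: center value (n odd) or average of the two center values (n even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) for a list of at least 2 numbers."""
    s = sorted(values)
    n = len(s)
    lower_half = s[: n // 2]          # observations left of the median's position
    upper_half = s[(n + 1) // 2 :]    # observations right of the median's position
    return (s[0], median(lower_half), median(s), median(upper_half), s[-1])

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]   # hypothetical data, n = 9
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx, "IQR =", q3 - q1)
```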
spread (1.2)
standard deviation, IQR, range
slope (b) (3.2)
suppose that y is a response variable and x is an explanatory variable. A regression line relating y to x has an equation of the form y hat = a + bx. In this equation, b is the slope, the amount by which y is predicted to change when x increases by one unit
y-intercept (a) (3.2)
suppose that y is a response variable and x is an explanatory variable. A regression line relating y to x has an equation of the form y hat = a + bx. In this equation, the number a is the y-intercept, the predicted value of y when x = 0
shape (1.2)
symmetric, skewed right, skewed left
the standard normal table (Table A) (2.2)
table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z
quantitative variable (intro)
takes numerical values for which it makes sense to find an average
distribution of a variable (intro)
tells us what values the variable takes and how often it takes these values
predicted value (y hat) (3.2)
the value of the response variable y that the regression line predicts for a given value of the explanatory variable x
bias (4.1)
the design of a statistical study shows bias if it would consistently underestimate or consistently overestimate the value you want to know
residual (3.2)
the difference between an observed value of the response variable and the value predicted by the regression line: residual = observed y - predicted y = y - y hat
median (1.3)
the midpoint of a distribution; the number such that about half of the observations are smaller and about half are larger. To find the median of a distribution: 1) arrange all observations in order of size, from smallest to largest, 2) if the number of observations n is odd, the median is the center observation in the ordered list, 3) if the number of observations n is even, the median is the average of the two center observations in the ordered list
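The three steps translate directly into a short Python function; the name `median` and the example lists are hypothetical.

```python
def median(values):
    """Median following the three steps: sort, then take the center value (n odd)
    or the average of the two center values (n even)."""
    ordered = sorted(values)                      # 1) arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # 2) n odd: the center observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # 3) n even: average of the two centers

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 10]))  # 6.0
```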
individuals (intro)
the objects described by a set of data; may be people, animals, or things
mean of a density curve (2.2)
the point at which a density curve would balance if made of solid material
median of a density curve (2.2)
the point with half the area under the curve to its left and the remaining half of the area to its right
percentile (2.1)
the pth percentile of a distribution is the value with p percent of the observations less than it
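Following this card's definition (percent of observations strictly less than the value), a minimal Python sketch is below; some books use "less than or equal to" instead, which can shift the answer. The helper name and scores are hypothetical.

```python
def percentile_of(value, data):
    """Percent of observations strictly less than `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

scores = [55, 62, 68, 70, 74, 79, 81, 85, 90, 96]  # hypothetical test scores
print(percentile_of(81, scores))  # 60.0, so 81 is at the 60th percentile
```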
variability (of a statistic) (1.3)
the spread of a statistic's sampling distribution. Statistics from larger samples have less variability
explanatory variable (3.1)
the variable that may help explain or predict changes in a response variable
response variable (3.1)
the variable that measures the outcome of a study
extrapolation (3.2)
use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate
random sampling (4.1)
using a chance process to determine which members of a population are included in the sample
how to calculate the LSRL (3.2)
we have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x bar and y bar and the standard deviations s sub x and s sub y of the two variables and their correlation r. The least-squares regression line is y hat = a + bx with slope $b = r \frac{s_y}{s_x}$ and y-intercept $a = \bar{y} - b\bar{x}$.
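A minimal Python sketch of the LSRL from summary statistics, using slope b = r·s_y/s_x and intercept a = ȳ - b·x̄. The helper `lsrl` and the data are hypothetical; r² is printed as well.

```python
import math

def lsrl(x, y):
    """Least-squares line y_hat = a + b*x from the means, standard deviations, and r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # y-intercept: the line passes through (x_bar, y_bar)
    return a, b, r

hours = [1, 2, 3, 4, 5]         # hypothetical explanatory variable
score = [60, 65, 70, 80, 85]    # hypothetical response variable
a, b, r = lsrl(hours, score)
print(f"y_hat = {a:.1f} + {b:.1f}x, r^2 = {r ** 2:.3f}")
```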
negative association (3.1)
when above-average values of one variable tend to accompany below-average values of the other
positive association (3.1)
when above-average values of one variable tend to accompany above-average values of the other, and below-average values also tend to occur together
confounding (4.2)
when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other
facts about correlation (3.1)
1. correlation makes no distinction between explanatory and response variables 2. r does not change when we change the units of measurement of x, y, or both 3. the correlation r itself has no unit of measurement 4. correlation doesn't imply causation 5. correlation requires that both variables be quantitative 6. correlation doesn't describe curved relationships between variables, no matter how strong the relationship is 7. a value of r close to 1 or -1 doesn't guarantee a linear relationship between two variables 8. like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations 9. correlation isn't a complete summary of two-variable data
What acronym do you use to describe the distribution of a quantitative variable? (1.2)
SOCS: Shape, Outliers, Center, Spread