AP Statistics S.1 Final Review

how to find values from areas in any Normal distribution (2.2)

1. State the distribution and the values of interest. 2. Perform calculations and show your work: either use Table A or technology to find the value of z and then "unstandardize," or use the invNorm command, labeling each input. 3. Answer the question.
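
As a rough sketch of step 2, Python's standard-library `statistics.NormalDist` can stand in for the calculator's invNorm command. The distribution N(100, 15) and the area 0.90 are made-up illustration values, not taken from this review.

```python
from statistics import NormalDist

# Hypothetical example: scores follow N(mean=100, sd=15).
mean, sd = 100, 15
area_to_left = 0.90  # we want the value with 90% of the area to its left

# Route 1: find z on the standard Normal curve, then "unstandardize".
z = NormalDist().inv_cdf(area_to_left)
value_from_z = mean + z * sd

# Route 2: the analogue of invNorm(area, mean, sd).
value_direct = NormalDist(mu=mean, sigma=sd).inv_cdf(area_to_left)

print(round(z, 2), round(value_from_z, 1), round(value_direct, 1))
```

Both routes should agree, since unstandardizing is just x = mean + z(sd).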

how to find areas in any Normal distribution (2.2)

1. State the distribution and the values of interest. 2. Perform calculations and show your work: either compute the z-score and use Table A, or use the normalcdf command, labeling each input. 3. Answer the question.
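
A minimal sketch of step 2, again using `statistics.NormalDist` as a stand-in for normalcdf. The distribution N(64, 2.5) and the boundaries 62 and 68 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical example: heights follow N(mean=64, sd=2.5).
mean, sd = 64, 2.5
lower, upper = 62, 68

# Route 1: standardize each boundary, then use the standard Normal curve
# (the same areas that Table A tabulates).
z_lower = (lower - mean) / sd
z_upper = (upper - mean) / sd
area_from_z = NormalDist().cdf(z_upper) - NormalDist().cdf(z_lower)

# Route 2: the analogue of normalcdf(lower, upper, mean, sd).
dist = NormalDist(mu=mean, sigma=sd)
area_direct = dist.cdf(upper) - dist.cdf(lower)

print(round(z_lower, 2), round(z_upper, 2), round(area_direct, 4))
```

The area between two values is the area to the left of the upper value minus the area to the left of the lower value.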

interquartile range, IQR (1.3)

_____ = Q3-Q1

multimodal (1.2)

a distribution that has more than two clear peaks

bimodal (1.2)

a distribution that has two clear peaks

cumulative relative frequency graph (2.1)

a graph used to examine location within a distribution. _____________________ begin by grouping the observations into equal-width classes. The completed graph shows the accumulating percent of observations as you move through the classes in increasing order
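
The accumulating percents behind such a graph can be sketched as follows. The class labels and counts are made-up data.

```python
from itertools import accumulate

# Hypothetical equal-width classes and their counts.
classes = ["40-44", "45-49", "50-54", "55-59", "60-64"]
counts = [2, 7, 13, 12, 6]

total = sum(counts)
cumulative_counts = list(accumulate(counts))  # running total through the classes
cumulative_percent = [100 * c / total for c in cumulative_counts]

for label, pct in zip(classes, cumulative_percent):
    print(f"{label}: {pct:.1f}%")
```

The last class always reaches 100%, because by then every observation has been counted.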

regression line (3.2)

a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x

splitting stems (1.2)

a method for spreading out a stemplot that has too few stems

scatterplot (3.1)

a plot that shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point in the graph

standard deviation (1.3)

a statistic that measures the typical distance of the values in a distribution from the mean. It is calculated by finding an "average" of the squared distances and then taking the square root
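
A short sketch of that calculation on made-up data. It assumes the "average" divides by n - 1 (the usual sample formula), which this card does not state explicitly.

```python
import math
import statistics

data = [2, 4, 4, 5, 7, 8]  # made-up observations

mean = sum(data) / len(data)
squared_distances = [(x - mean) ** 2 for x in data]

# Assumption: "average" here means dividing by n - 1, not n.
variance = sum(squared_distances) / (len(data) - 1)
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 3))
print(round(statistics.stdev(data), 3))  # library check of the same quantity
```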

experiment (4.2)

a study in which researchers deliberately impose treatments on individuals to measure their responses

census (4.1)

a study that attempts to collect data from every individual in the population

observational study (4.2)

a study that observes individuals and measures variables of interest but does not attempt to influence the responses

sample survey (4.1)

a study that uses an organized plan to choose a sample that represents some specific population. We base conclusions about the population on data from the sample

sample (4.1)

a subset of individuals in the population from which we actually collected data

frequency table (1.1)

a table that displays the count (frequency) of observations in each category or class

outlier (1.2)

an individual value that falls outside the overall pattern

subjects (4.2)

experimental units that are human beings

factors (4.2)

explanatory variables in an experiment

first quartile, Q1 (1.3)

if the observations in a data set are ordered from lowest to highest, the ______________ is the median of the observations whose position is to the left of the median

standard deviation of the residuals (3.2)

if we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the ________________________ (s) is given by s = sqrt( sum of (y - y hat)^2 / (n - 2) ). This value gives the approximate size of a "typical" prediction error (residual)
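
A minimal sketch of computing s, using the standard formula s = sqrt(sum of squared residuals / (n - 2)). The data are made up, and the line y hat = 2.2 + 0.6x is the least-squares line for this particular data set.

```python
import math

# Made-up data and its least-squares line y hat = a + b*x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# residual = observed y - predicted y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Divide the sum of squared residuals by n - 2, then take the square root.
s = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))

print([round(e, 1) for e in residuals], round(s, 3))
```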

standardized score (z-score) (2.1)

if x is an observation from a distribution that has known mean and standard deviation, the ______________ of x is z = (x - mean) / standard deviation
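
As a quick illustration, the standard formula z = (x - mean) / standard deviation can be written as a one-line function. The function name `z_score` and the numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: a score of 86 on a test with mean 80 and sd 4.
print(z_score(86, 80, 4))   # 1.5 standard deviations above the mean
print(z_score(74, 80, 4))   # 1.5 standard deviations below the mean
```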

third quartile, Q3 (1.3)

in a data set in which the observations are ordered from lowest to highest, the median of the observations whose position is to the right of the median

population (4.1)

in a statistical study, the entire group of individuals we want information about

the coefficient of determination (r^2) (3.2)

the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x

Least-squares regression line (LSRL) (3.2)

the line that makes the sum of the squared vertical distances of the data points from the line as small as possible

range (1.3)

the maximum value minus the minimum value for a set of quantitative data

variance (1.3)

"average" squared deviation of the observations in a data set from their mean

conditional distribution (1.1)

(of a variable) describes the values of that variable among individuals who have a specific value of another variable; there is a separate ___________ __________________ for each value of the other variable

marginal distribution (1.1)

(of one of the categorical variables in the two-way table of counts) is the distribution of values of that variable among all individuals described by the table
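
The difference between a marginal and a conditional distribution can be sketched with a small two-way table. The table and its labels are made up.

```python
# Hypothetical two-way table of counts: rows = opinion, columns = gender.
table = {
    "yes": {"female": 30, "male": 20},
    "no":  {"female": 10, "male": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of opinion: row totals as a share of all individuals.
marginal_opinion = {
    opinion: sum(row.values()) / grand_total for opinion, row in table.items()
}

# Conditional distribution of opinion among females only.
female_total = sum(row["female"] for row in table.values())
conditional_opinion_given_female = {
    opinion: row["female"] / female_total for opinion, row in table.items()
}

print(marginal_opinion)
print(conditional_opinion_given_female)
```

Here the marginal distribution is 50% yes and 50% no, while among females it is 75% yes and 25% no.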

How do you explore data? (1.1)

1) begin by examining each variable by itself 2) study relationships among the variables 3) start with a graph or graphs 4) add numerical summaries

pie chart (1.1)

a chart that shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories; must include all the categories that make up the whole

density curve (2.2)

a curve that a) is always on or above the horizontal axis and b) has area exactly 1 underneath it. A ______________ describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval

Normal distribution (2.2)

a distribution described by a Normal density curve. Any particular ______________ is completely specified by two numbers, its mean μ and standard deviation σ. The mean of a ______________________ is at the center of the symmetric Normal curve. The standard deviation is the distance from the center to the change-of-curvature points on either side. We abbreviate the _______________ with mean μ and standard deviation σ as N(μ, σ)

boxplot (1.3)

a graph of the five-number summary. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the smallest and largest observations that are not outliers. Outliers are marked with a special symbol such as an asterisk (*)

histogram (1.2)

a graph that displays the distribution of a quantitative variable. The horizontal axis is marked in the units of measurement for the variable. The vertical axis contains the scale of counts or percents. Each bar in the graph represents an equal-width class. The base of the bar covers the class, and the bar height is the class frequency or relative frequency

segmented bar graph (1.1)

a graph used to compare the distribution of a categorical variable in each of several groups. For each group, there is a single bar with "segments" that correspond to the different values of the categorical variable. The height of each segment is determined by the percent of individuals in the group with that value. Each bar has a total height of 100%.

bar graph (1.1)

a graph used to display the distribution of a categorical variable or to compare the sizes of different quantities; horizontal axis identifies the categories or quantities being compared; drawn with blank spaces between the bars to separate the items being compared

Normal probability plot (2.2)

a plot used to assess whether a data set follows a Normal distribution. To make a _________________, 1) arrange the data values from smallest to largest and record the percentile of each observation, 2) use the standard Normal distribution to find the z-scores at these same percentiles, and 3) plot each observation x against the corresponding z. If the points on a _______________ lie close to a straight line, the plot indicates that the data are approx. Normal
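
The three steps can be sketched as below, stopping just short of drawing the plot. The data are made up, and the percentile convention (i - 0.5) / n is an assumption; software packages use slightly different conventions.

```python
from statistics import NormalDist

data = [12.1, 9.8, 10.5, 11.2, 10.0, 13.4, 10.9]  # made-up observations

# Step 1: order the data and assign each observation a percentile.
# Assumption: the i-th of n ordered values sits at percentile (i - 0.5) / n.
sorted_data = sorted(data)
n = len(sorted_data)
percentiles = [(i - 0.5) / n for i in range(1, n + 1)]

# Step 2: z-scores at those same percentiles on the standard Normal curve.
z_scores = [NormalDist().inv_cdf(p) for p in percentiles]

# Step 3: pair each observation x with its z; these are the points to plot.
points = list(zip(sorted_data, z_scores))

for x_value, z in points:
    print(f"{x_value:5.1f}  {z:6.2f}")
```

If the resulting points fall close to a straight line when graphed, the data are approximately Normal.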

Simple Random Sample (SRS) (4.1)

a sample chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample
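
One way to sketch choosing an SRS is with `random.sample`, which draws without replacement so that every group of n individuals is equally likely. The population labels are hypothetical, and the seed is fixed only to make the sketch repeatable.

```python
import random

# Hypothetical population of 30 labeled individuals.
population = [f"student_{i:02d}" for i in range(1, 31)]
n = 5

rng = random.Random(2024)        # fixed seed only so the sketch is repeatable
srs = rng.sample(population, n)  # draw n distinct individuals by chance

print(srs)
```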

convenience sample (4.1)

a sample collected by taking from the population individuals that are easy to reach

cluster sample (4.1)

a sample obtained by classifying the population into groups of individuals that are located near each other, called clusters, and then choosing an SRS of the clusters. All individuals in the chosen clusters are included in the sample

stratified random sample (4.1)

a sample obtained by classifying the population into groups of similar individuals, called strata, then choosing a separate SRS in each stratum and combining these SRSs to form the sample
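
A minimal sketch of a stratified random sample: a separate SRS is drawn within each stratum, and the results are combined. The strata, labels, and the choice of the same sample size per stratum are all assumptions made for illustration.

```python
import random

# Hypothetical population already classified into strata (grade levels).
strata = {
    "freshmen":   [f"F{i}" for i in range(1, 41)],
    "sophomores": [f"S{i}" for i in range(1, 31)],
    "juniors":    [f"J{i}" for i in range(1, 21)],
}
per_stratum = 4  # assumption: same sample size from every stratum

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
sample = []
for members in strata.values():
    # Separate SRS within each stratum, then combine.
    sample.extend(rng.sample(members, per_stratum))

print(sample)
```

A cluster sample would instead take an SRS of whole groups and keep every individual in the chosen groups.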

residual plot (3.2)

a scatterplot of the residuals against the explanatory variable. ________________ help us assess whether a linear model is appropriate

dotplot (1.2)

a simple graph that shows each data value as a dot above a location on a number line

stemplot or stem-and-leaf plot (1.2)

a simple graphical display for fairly small data sets that gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. Each observation is separated into a stem, consisting of all but the final digit, and a leaf, the final digit. The stems are arranged in a vertical column with the smallest at the top. Each leaf is written in the row to the right of its stem, with the leaves arranged in increasing order out from the stem

treatment (4.2)

a specific condition applied to the individuals in an experiment. If an experiment has several explanatory variables, a _________ is a combination of specific values of these variables

resistant measure (1.3)

a statistic that is not affected very much by extreme observations

effect of adding/subtracting a constant (2.1)

adds a to (subtracts a from) measures of center and location (mean, median, quartiles, percentiles), but does not change the shape of the distribution or measures of spread (range, IQR, standard deviation)
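
A quick numerical check of this effect on made-up data, with a hypothetical constant a = 5:

```python
import statistics

scores = [70, 75, 80, 90, 95]   # made-up data
a = 5                           # constant added to every observation
shifted = [x + a for x in scores]

def data_range(values):
    return max(values) - min(values)

# Center moves by a ...
print(statistics.mean(scores), statistics.mean(shifted))
print(statistics.median(scores), statistics.median(shifted))
# ... but spread stays the same.
print(data_range(scores), data_range(shifted))
print(statistics.stdev(scores), statistics.stdev(shifted))
```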

random assignment (4.2)

an experimental design principle. Use chance to assign experimental units to treatments. Doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variables among the treatment groups
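
Random assignment can be sketched by shuffling the experimental units and splitting the shuffled list. The unit labels and the two treatment names "A" and "B" are hypothetical.

```python
import random

# 20 hypothetical experimental units and two hypothetical treatments.
units = [f"unit_{i}" for i in range(1, 21)]

rng = random.Random(42)   # fixed seed only so the sketch is repeatable
shuffled = units[:]       # copy so the original list is left untouched
rng.shuffle(shuffled)     # chance decides the order

half = len(shuffled) // 2
groups = {"A": shuffled[:half], "B": shuffled[half:]}

print(groups["A"])
print(groups["B"])
```

Every unit lands in exactly one group, and chance alone decides which one.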

replication (4.2)

an experimental design principle. Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups

Normal curves (2.2)

an important class of density curves that are symmetric, single-peaked, and bell-shaped

outlier (1.3)

an individual value that falls outside the overall pattern of a distribution

variable (intro)

any characteristic of an individual; can take different values for different individuals

mean (x bar) (1.3)

arithmetic average. To find the _________ of a set of observations, add their values and divide by the number of observations

two-way table (1.1)

describes two categorical variables with a row variable and a column variable

inference (4.1)

drawing conclusions that go beyond the data at hand

describing a scatterplot (3.1)

in any graph of the data, look for the overall pattern and for striking departures from that pattern. Direction, form, and strength ________ the overall pattern of a ______________

the 68-95-99.7 rule (empirical rule) (2.2)

in the Normal distribution with mean μ and standard deviation σ, (a) approx. ____% of the observations fall within 1σ of μ, (b) approx. ____% of the observations fall within 2σ of μ, and (c) approx. ____% of the observations fall within 3σ of μ.
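
The three percents in the rule's name can be checked against the standard Normal curve. This sketch uses `statistics.NormalDist`; the same areas hold for any mean and standard deviation.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
areas = [std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)]

for k, area in zip((1, 2, 3), areas):
    print(f"within {k} standard deviation(s): {100 * area:.1f}%")
```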

association (1.1)

knowing the value of one variable helps predict the value of the other. If knowing the value of one variable does not help predict the value of the other, there is no ____________ between the variables

center (1.2)

mean, median

correlation r (3.1)

measures the direction and strength of the linear relationship between two quantitative variables. __________ is usually written as r.
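
This card gives no formula, so the sketch below assumes the standard one: r is the sum of the products of the z-scores of x and y, divided by n - 1. The data and the helper name `correlation` are made up.

```python
import statistics

def correlation(x, y):
    """Sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    return sum(
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)
    ) / (n - 1)

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_swapped = correlation(y, x)                          # swap the roles of x and y
r_rescaled = correlation([2.54 * xi for xi in x], y)   # change the units of x

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```

Swapping the two variables or changing units leaves r unchanged, because z-scores have no units.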

effect of multiplying/dividing by a constant (2.1)

multiplies (divides) measures of center and location (mean, median, quartiles, percentiles) by b, multiplies (divides) measures of spread (range, IQR, standard deviation) by b, but does not change the shape of the distribution
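
A quick numerical check on made-up data, using the hypothetical constant b = 2.54 (converting inches to centimeters):

```python
import statistics

heights_in = [60, 62, 65, 68, 70]   # made-up heights in inches
b = 2.54                            # multiply every observation by b
heights_cm = [b * x for x in heights_in]

def data_range(values):
    return max(values) - min(values)

# Center and spread are both multiplied by b.
print(statistics.mean(heights_in), statistics.mean(heights_cm))
print(statistics.median(heights_in), statistics.median(heights_cm))
print(data_range(heights_in), data_range(heights_cm))
print(statistics.stdev(heights_in), statistics.stdev(heights_cm))
```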

outliers and influential observations in regression (3.2)

an outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers may not have large residuals

nonresponse (4.1)

occurs when an individual chosen for the sample can't be contacted or refuses to participate

undercoverage (4.1)

occurs when some members of the population cannot be chosen in a sample

pictograph (1.1)

one of the worst ways to represent data, because our eyes respond to the area of the pictures rather than to the scale

voluntary response sample (4.1)

people decide whether to join a sample by responding to a general invitation

categorical variable (intro)

places an individual into one of several groups or categories

experimental units (4.2)

smallest collection of individuals to which treatments are applied

five-number summary (1.3)

smallest observation, first quartile, median, third quartile, and largest observation, written in order from smallest to largest. In symbols: Minimum Q1 Median Q3 Maximum
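
A sketch of the five-number summary, with quartiles taken as the medians of the observations to the left and right of the overall median, as in the quartile cards of this set. The function names and data are hypothetical, and statistical software may use other quartile rules.

```python
def median(sorted_values):
    """Median of a list that is already in increasing order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]        # observations left of the median
    upper_half = values[(n + 1) // 2:]   # observations right of the median
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])

data = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]  # made-up observations
summary = five_number_summary(data)
iqr = summary[3] - summary[1]                  # IQR = Q3 - Q1

print(summary, iqr)
```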

spread (1.2)

standard deviation, IQR, range

slope (b) (3.2)

suppose that y is a response variable and x is an explanatory variable. A regression line relating y to x has an equation of the form y hat = a + bx. In this equation, b is the _________, the amount by which y is predicted to change when x increases by one unit

y-intercept (a) (3.2)

suppose that y is a response variable and x is an explanatory variable. A regression line relating y to x has an equation of the form y hat = a + bx. In this equation, the number a is the _____________, the predicted value of y when x = 0

shape (1.2)

symmetric, skewed right, skewed left

the standard normal table (Table A) (2.2)

table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z

quantitative variable (intro)

takes numerical values for which it makes sense to find an average

distribution of a variable (intro)

tells us what values the variable takes and how often it takes these values

predicted value (y hat) (3.2)

the __________________ of the response variable y for a given value of the explanatory variable x

bias (4.1)

the design of a statistical study shows ______ if it would consistently underestimate or consistently overestimate the value you want to know

residual (3.2)

the difference between an observed value of the response variable and the value predicted by the regression line: residual = observed y - predicted y = y - y hat

median (1.3)

the midpoint of a distribution; the number such that about half of the observations are smaller and about half are larger. To find the _______ of a distribution: 1) arrange all observations in order of size, from smallest to largest, 2) if the number of observations n is odd, the _______ is the center observation in the ordered list, 3) if the number of observations n is even, the ________ is the average of the two center observations in the ordered list

individuals (intro)

the objects described by a set of data; may be people, animals, or things

mean of a density curve (2.2)

the point at which a density curve would balance if made of solid material

median of a density curve (2.2)

the point with half the area under the curve to its left and the remaining half of the area to its right

percentile (2.1)

the pth ____________ of a distribution is the value with p percent of the observations less than it
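
Following this definition, the percentile of a given value is the percent of observations below it. The function name `percentile_of` and the data are made up.

```python
def percentile_of(value, data):
    """Percent of observations in data that are less than value."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical class of 20 test scores: 61, 62, ..., 80.
scores = list(range(61, 81))

print(percentile_of(76, scores))  # 15 of the 20 scores are below 76
```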

variability (of a statistic) (1.3)

the spread of a statistic's sampling distribution. Statistics from larger samples have less ___________

explanatory variable (3.1)

the variable that may help explain or predict changes in a response variable

response variable (3.1)

the variable that measures the outcome of a study

extrapolation (3.2)

use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate

random sampling (4.1)

using a chance process to determine which members of a population are included in the sample

how to calculate the LSRL (3.2)

we have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x bar and y bar and the standard deviations s sub x and s sub y of the two variables and their correlation r. To find the slope, use b = r(s sub y / s sub x). To find the y-intercept, use a = y bar - b(x bar).
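
A minimal sketch of this calculation, using the standard formulas b = r(s sub y / s sub x) for the slope and a = y bar - b(x bar) for the y-intercept. The data are made up.

```python
import statistics

x = [1, 2, 3, 4, 5]   # made-up explanatory values
y = [2, 4, 5, 4, 5]   # made-up response values
n = len(x)

# Summary statistics: means, standard deviations, and correlation.
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x       # slope
a = y_bar - b * x_bar   # y-intercept

print(f"y hat = {a:.2f} + {b:.2f}x")
```

For this data set the line works out to y hat = 2.20 + 0.60x.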

negative association (3.1)

when above-average values of one variable tend to accompany below-average values of the other

positive association (3.1)

when above-average values of one variable tend to accompany above-average values of the other, and below-average values also tend to occur together

confounding (4.2)

when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other

facts about correlation (3.1)

1. correlation makes no distinction between explanatory and response variables 2. r does not change when we change the units of measurement of x, y, or both 3. the correlation r itself has no unit of measurement 4. correlation doesn't imply causation 5. correlation requires that both variables be quantitative 6. correlation doesn't describe curved relationships between variables, no matter how strong the relationship is 7. a value of r close to 1 or -1 doesn't guarantee a linear relationship between two variables 8. like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations 9. correlation isn't a complete summary of two-variable data

What acronym do you use to describe the distribution of a quantitative variable? (1.2)

SOCS (Shape, Outliers, Center, Spread)
