Algebra 2 Statistics Unit
Density curve
A curve that (a) is always on or above the horizontal axis, and (b) has a total area of exactly 1 underneath it.
Block
A group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.
Regression line
A line that describes how a response variable y changes as an explanatory variable x changes; also known as "line of best fit"
Sampling Frame
A list of individuals from whom the sample is drawn
Stratified random sample
A method of sampling that involves dividing the population into homogeneous subgroups (strata) and taking a simple random sample in each subgroup. Strata are internally homogeneous and externally heterogeneous.
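A minimal Python sketch of stratified random sampling, assuming the strata are already known. The grade-level strata, the names, and the function `stratified_sample` are hypothetical, made up for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum individuals from each stratum."""
    rng = random.Random(seed)
    chosen = {}
    for name, members in strata.items():
        # rng.sample draws without replacement, so no one is picked twice
        chosen[name] = rng.sample(members, n_per_stratum)
    return chosen

# Hypothetical strata: students grouped by grade level
strata = {
    "grade_9": [f"g9_{i}" for i in range(40)],
    "grade_10": [f"g10_{i}" for i in range(35)],
    "grade_11": [f"g11_{i}" for i in range(25)],
}
sample = stratified_sample(strata, n_per_stratum=5, seed=0)
```

Every stratum contributes to the sample, which is the point of stratifying.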
Sample
A relatively small proportion of people who are chosen in a survey so as to be representative of the whole.
Cluster Sample
A sample in which a simple random sample of heterogeneous subgroups (clusters) of a population is selected. Clusters are internally heterogeneous and externally homogeneous.
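A minimal Python sketch of cluster sampling. It assumes the common convention that every individual in a chosen cluster is included, which the definition above does not spell out. The homeroom clusters and the function `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick a simple random sample of whole clusters and keep everyone in them."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return {name: list(clusters[name]) for name in picked}

# Hypothetical clusters: homerooms, each a mix of different kinds of students
clusters = {
    "room_101": ["Ana", "Ben", "Cy"],
    "room_102": ["Dee", "Eli", "Fay"],
    "room_103": ["Gus", "Hal", "Ivy"],
    "room_104": ["Jo", "Kai", "Lee"],
}
chosen = cluster_sample(clusters, n_clusters=2, seed=0)
```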
Simple Random Sample (SRS)
A sample of size n selected from the population in such a way that each possible sample of size n has an equal chance of being selected.
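A minimal Python sketch of drawing an SRS from a sampling frame. The list of 30 students and the function name `simple_random_sample` are hypothetical. The standard library's `random.sample` draws without replacement, so every group of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals; each possible group of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame of 30 students
students = [f"student_{i}" for i in range(1, 31)]
srs = simple_random_sample(students, 5, seed=1)
```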
Voluntary Response Samples
A sample that consists of people who choose themselves by responding. Such samples often overrepresent people with strong opinions, which leads to bias.
Random sampling
A sampling method that fairly represents a population because each member has an equal chance of inclusion
Convenience Sample
A sample which consists of members of a population that are easily accessed. Generally leads to bias.
Census
A study that attempts to collect data from every individual in the population.
Observational study
A study that merely observes conditions of individuals in a population and records information; the population is disturbed as little as possible.
Lurking variable
A variable that has an important effect on the relationship among the variables in a study but is not one of the explanatory variables studied.
Explanatory variable
A variable that may help explain or influence changes in a response variable.
Response variable
A variable that measures an outcome of a study.
Positive association
Above-average values of one variable tend to accompany above-average values of the other, and below-average values also tend to occur together.
Negative association
Above-average values of one variable tend to accompany below-average values of the other, and vice versa.
Completely Randomized Design
All experimental units have an equal chance of receiving each of the treatments
The 68-95-99.7 Rule
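Also known as the "Empirical Rule." In a Normal distribution, about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.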
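The three percentages in the rule's name can be checked with a short Python sketch. It uses `statistics.NormalDist` from the standard library; the variable names are just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {area:.1%}")
```

The same areas hold for any Normal distribution, because standardizing does not change them.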
Sampling error
An error that occurs when a sample somehow does not represent the target population due to bad sampling methods and/or undercoverage
Factor
An explanatory variable in an experiment
Influential Point
An observation that, if removed, would markedly change the result of the calculation.
Outlier
An observation that lies outside the overall pattern of the other observations.
Confidentiality
Any information gathered about a participant must not be revealed without the participant's consent.
Random Assignment
Assigning participants to experimental and control conditions by chance, thus minimizing the effects of preexisting differences among those assigned to the different groups.
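A minimal Python sketch of random assignment, assuming two treatments and 20 numbered subjects. The treatment labels and the function `randomly_assign` are hypothetical. Shuffling first and then dealing subjects out in turn lets chance alone decide who lands in each group.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

subjects = list(range(1, 21))  # hypothetical subject ID numbers
groups = randomly_assign(subjects, ["treatment", "control"], seed=3)
```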
Mean of a density curve
Balance point
Response Bias
Bias that occurs when the behavior of the respondent or of the interviewer causes inaccurate results
Elements of Experimental Design
CONTROL, RANDOM ASSIGNMENT, AND REPLICATION
Describing a scatterplot
Can be described by the direction, form, and strength of the relationship.
Experiment
Deliberately imposes some treatment on individuals to measure their responses. Causality can be inferred if carried out well.
Replication
Enough units in each group so that any difference in the effects of the treatments can be distinguished from chance differences between the groups. Reduces sample variability
Median of a density curve
Equal areas point
Anonymity
Even the researcher cannot link participants to their data
Placebo effect
Experimental results are caused by expectations alone; double blindness is intended to mitigate this effect.
Randomized block design
Form blocks consisting of individuals that are similar in some way that is important to the response. Random assignment of treatments is then carried out separately within each block.
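A minimal Python sketch of a randomized block design, assuming the blocks have already been formed. The class-period blocks, the treatment labels, and the function `randomized_block_assign` are hypothetical. The shuffle happens separately inside each block.

```python
import random

def randomized_block_assign(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomize only inside this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks: class period is expected to affect the response
blocks = {
    "morning": ["m1", "m2", "m3", "m4"],
    "afternoon": ["a1", "a2", "a3", "a4", "a5", "a6"],
}
treatments = ["new method", "standard method"]
assignment = randomized_block_assign(blocks, treatments, seed=2)
```

Each block ends up split evenly between the treatments, so the blocking variable cannot favor one treatment.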
Standard Normal distribution
Has mean 0 and standard deviation 1
Control group
In an experiment, the group that is administered a placebo treatment (an inactive treatment) or no treatment; results are compared to the treatment group
Control
In an experiment, the standard that is used for comparison. Reduces the effects of lurking variables.
Normal Distribution
Is completely specified by two numbers, mean μ and standard deviation σ.
Least-squares regression line
Line that makes the sum of the squared vertical distances of the data points from the line as small as possible.
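A minimal Python sketch of finding the least-squares line ŷ = a + bx with the standard formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The data and the function name `least_squares_line` are hypothetical.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing squared vertical distances."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.1f} + {b:.1f}x")
```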
describing a distribution of quantitative data
SOCS (Shape-Outlier-Center-Spread)
Double Blind
This term describes an experiment in which neither the subjects nor the experimenter knows whether a subject is a member of the experimental group or the control group.
Cumulative relative frequency graph
Used to examine location within a distribution. Completed graph shows the accumulating percent of observations
Inference about cause and effect
Using experimental results to draw conclusions about causality
Inference about the population
Using sample data to draw conclusions about the population
Nonresponse
Occurs when selected subjects refuse to cooperate or cannot be reached. This is a source of nonsampling bias.
μ (mu)
a population mean
statistical question
a question that can be answered by collecting data and where there will be variability in that data
Single Blind
a study in which the participants are unaware of whether they are in the control group or the experimental group
Explanatory Variable
a variable that we think explains or causes changes in the response variable
mean
arithmetic average, measure of center, NOT RESISTANT MEASURE OF CENTER, average value
boxplot
based on 5 number summary, useful for comparing distributions, shows spread of central half of distribution
conditional distributions
describes the values of one variable among individuals who have a specific value of another variable. Can be displayed with a SIDE-BY-SIDE BAR GRAPH or a SEGMENTED BAR GRAPH
distribution
describes what values the variable takes and how often it takes them
skewed to the right
if the right side of the graph with larger values is longer than the left
outlier
individual value that falls outside the overall pattern; by the 1.5 × IQR rule, a value is an outlier if it is more than 1.5 × IQR above the third quartile or below the first quartile
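A minimal Python sketch of the 1.5 × IQR check, assuming the quartiles have already been found. The data set, its quartiles (Q1 = 3, Q3 = 8), and the function `find_outliers` are hypothetical.

```python
def find_outliers(data, q1, q3):
    """Return values more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data; Q1 = 3 and Q3 = 8 are the medians of the lower and upper halves
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
outliers = find_outliers(data, q1=3, q3=8)
print(outliers)
```

Here IQR = 5, so the fences sit at 3 − 7.5 = −4.5 and 8 + 7.5 = 15.5, and only 30 lies outside them.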
dotplot
individual values on a number line; show distribution of a quantitative variable
standard deviation (s sub-x)
measures the typical distance of the observations from their mean; measures spread about the mean, always greater than or equal to 0, not resistant, use for reasonably symmetric distributions
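A minimal Python sketch of the sample standard deviation, using the usual formula s_x = √(Σ(x − x̄)² / (n − 1)). The data and the function name `sample_sd` are hypothetical. The standard library's `statistics.stdev` is computed alongside as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(data):
    """Square root of the sum of squared deviations divided by n - 1."""
    x_bar = mean(data)
    return sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
s_x = sample_sd(data)
check = stdev(data)  # library value for comparison
print(round(s_x, 3), round(check, 3))
```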
IQR
measures the range of the middle 50% of the data; IQR=Q3-Q1; resistant
median (M)
midpoint of a distribution, typical value; in a skewed distribution, the mean is usually farther out in the long tail than the median
mode; modes
most frequent; major peaks
multimodal
multiple peaks
association
specific values of one variable tend to occur in common with specific values of the other
First Quartile (Q1)
one quarter up the list; resistant
two-way table
organizes data about two categorical variables; often used to summarize the large amounts of information by grouping outcomes into categories
histogram
plots the counts (frequencies) or percents (relative frequencies) of values in equal-width classes; shows the distribution of a quantitative variable
stemplot
separate each observation into a stem and a one-digit leaf; show distribution of a quantitative variable
numerical summary
should report at least its center and spread, or variability
unimodal
single peak
range
subtract the smallest value from largest value
mean
the average
relative frequency table
the distribution of a categorical variable lists the categories and gives the percent of individuals that fall in each category
x-bar
the mean of a set of observations/sample (add their values and divide by the number of observations), use for reasonably symmetric distributions
symmetric
the right and left sides of the graph are symmetric
marginal distributions
the row totals and column totals
Experimental units
the smallest collection of individuals to which treatments are applied
Third Quartile (Q3)
three-quarters up the list; resistant
bimodal
two clear peaks
Correlation
Measures the direction and strength of the linear relationship between two quantitative variables.
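A minimal Python sketch of the correlation r, using the standard formula r = (1 / (n − 1)) Σ z_x · z_y, the average product of standardized values. The data and the function name `correlation` are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Average product of z-scores, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))
```

The result is always between −1 and 1; the sign gives the direction and the size gives the strength of the linear relationship.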
five-number summary; summary of spread and center
Minimum, Q1, M, Q3, Maximum
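A minimal Python sketch of the five-number summary. Quartile conventions differ between textbooks and software; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when n is odd. The data and the function name `five_number_summary` are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])            # median of the lower half
    q3 = median(xs[len(xs) - half:])  # median of the upper half
    return xs[0], q1, median(xs), q3, xs[-1]

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```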
Bias
Occurs when a study design favors some outcomes over others
Undercoverage
Occurs when some groups in the population are left out of the process of choosing the sample
Scatterplot
Plot that shows the relationship between two quantitative variables measured on the same individuals.
Statistically significant
Referring to a correlation, or a difference between two groups, that is larger than would be expected by chance alone.
Standardized values (z-scores)
Tells how many standard deviations a data point is from mean
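A minimal Python sketch of standardizing a value with z = (x − mean) / standard deviation. The test score, mean, standard deviation, and the function name `z_score` are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical test: mean 70, standard deviation 10
z = z_score(85, mean=70, sd=10)
print(z)
```

A score of 85 is 15 points above the mean, which is 1.5 standard deviations.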
Slope
The amount by which y is predicted to change when x increases by one unit.
Residual
The difference between an observed value of the response variable and the value predicted by the regression line.
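A minimal Python sketch of computing residuals as observed y minus predicted y. It assumes a fitted line ŷ = 2.2 + 0.6x for the hypothetical data shown; the function name `residuals` is also made up for illustration.

```python
def residuals(xs, ys, intercept, slope):
    """Residual = observed y - predicted y, for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data and fitted line y-hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, intercept=2.2, slope=0.6)
print([round(e, 1) for e in res])
```

A positive residual means the point lies above the line; a negative residual means it lies below.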
Residual plot
A scatterplot of the residuals against the explanatory variable; helps us assess how well a regression line fits the data.
Confounding
The effect of some variable on the response variable cannot be separated from the effect of the explanatory variable.
Population
The entire aggregation of individuals from which samples can be drawn
Matched Pair
The most extreme form of blocking. Subjects are matched in pairs as closely as possible and each subject in a pair is randomly assigned to receive one of the treatments.
y intercept
The predicted value of y when x = 0.
Extrapolation
The use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line.
Predicted value
The value of y predicted by the regression model for a given x; written ŷ and read as "y hat"
Pth percentile
The value with P percent of the observations less than it.
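A minimal Python sketch of finding the percentile of a value by counting the observations less than it, following the definition above. The scores and the function name `percentile_of` are hypothetical.

```python
def percentile_of(value, data):
    """Percent of observations in data that are less than value."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical test scores
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
p = percentile_of(85, scores)
print(p)
```

Six of the ten scores are below 85, so 85 sits at the 60th percentile.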
pie charts, bar graphs
display the distribution of a categorical variable
frequency table
distribution of a categorical variable lists the categories and gives the count of individuals that fall in each category
