AP Stats Uths vocab
Geometric random variable
Used to find the number of trials required to reach the first success in a random event. A geometric random variable counts how many times a random event is repeated (in independent trials with the same probability of success) until the first success occurs.
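A minimal sketch, assuming independent trials that each succeed with the same probability p (the function names are illustrative). The first success lands on trial k only if the first k - 1 trials fail, so P(X = k) = (1 - p)^(k-1) * p, and the mean number of trials is 1/p.

```python
# Geometric distribution: X = number of trials until the first success,
# where each independent trial succeeds with the same probability p.

def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): k - 1 failures followed by one success."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k: int, p: float) -> float:
    """P(X <= k) = 1 - P(the first k trials are all failures)."""
    if k < 1:
        return 0.0
    return 1 - (1 - p) ** k

# Example: rolling a fair die until the first 6 appears (p = 1/6).
p = 1 / 6
print(geometric_pmf(3, p))   # first 6 on exactly the 3rd roll
print(geometric_cdf(3, p))   # first 6 somewhere in the first 3 rolls
print(1 / p)                 # expected number of rolls, E(X) = 1/p
```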
Independent events
Two or more random events do not affect each other's probabilities. Knowing that one event happened does not affect the likelihood of the other event happening.
Confounding variable
A confounding variable is some third variable whose effect on the response variable cannot be separated from the effects of the explanatory variable.
Continuous variable
A continuous variable can take on an INFINITE number of values between a minimum and maximum (any value in an interval).
Discrete variable
A discrete variable can only take on a FINITE (or countable) number of separate values between a minimum and maximum.
Skewed
A distribution that is asymmetric. If there is a longer tail to the right of the center, the distribution is said to be skewed right and if there is a longer tail to the left of the center, the distribution is said to be skewed left.
Symmetric
A distribution with data values distributed roughly equally above and below the center, so the left and right sides are approximately mirror images.
Placebo group
A group in an experiment that receives a "placebo" rather than an actual treatment.
Control group
A group that does not receive treatment but is still measured.
Lurking variable
A lurking variable is one that is not included in a study, but may still have some effect on the other variables. A lurking variable might be completely unknown and its effects unsuspected by the experimenters.
Statistic
A measure that describes a sample. A statistic is usually not denoted by a Greek letter (for example, x̄ for a sample mean or s for a sample standard deviation).
Parameter
A measure that describes an entire population. Parameters are often denoted by a Greek letter.
Correlation
A mutual relationship between two variables. Note that just knowing that two variables have a strong correlation does not mean that one caused the other.
Cluster sampling
The population is divided into naturally occurring heterogeneous groups (clusters). One or more clusters are randomly selected, and either the entire cluster or randomly selected members of it are used in the sample.
Non-resistant
A non-resistant measure is a statistic that is strongly influenced by extreme values. The mean, standard deviation, and range are non-resistant.
Probability
A number between 0 and 1 that describes the long-run proportion of times a particular outcome should occur. This is typically written as a decimal, a reduced fraction, or a %. The probability of some event X is usually written as P(X).
Random variable
A variable whose value is a numerical outcome of some random behavior. It is usually denoted with a capital letter such as X.
Binomial random variable
A random variable that counts the number of successes in a fixed number of independent trials of a random event, where each trial has the same probability of success.
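A minimal sketch, assuming n independent trials with success probability p (the function name is illustrative). The binomial formula is P(X = k) = nCk * p^k * (1 - p)^(n-k): count the ways to place k successes among n trials, then multiply by the probability of any one such arrangement.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): choose which k of the n trials
    are successes, then multiply the probabilities of that arrangement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: number of heads in 10 flips of a fair coin.
n, p = 10, 0.5
print(binomial_pmf(5, n, p))                              # P(X = 5)
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))   # all outcomes together, should be 1
print(n * p)                                              # mean of X is n * p
```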
Continuous random variable
A random variable whose values are measured rather than counted. It can take on all values in a given interval, so its values cannot be counted.
Discrete random variable
A random variable whose possible values can be counted (listed), such as 0, 1, 2, ...
Resistant
A resistant measure is a statistic that is not strongly influenced by extreme values. The median and IQR are resistant.
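To see the difference between resistant and non-resistant measures, here is a small sketch on made-up data: replacing the largest value with an extreme one drags the mean (non-resistant) upward, while the median (resistant) stays put.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5, 6, 7]
with_outlier = data[:-1] + [70]   # replace the largest value with an extreme one

# The mean (non-resistant) is pulled toward the extreme value;
# the median (resistant) does not change.
print(mean(data), mean(with_outlier))
print(median(data), median(with_outlier))
```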
Representative sample
A sample that matches the population's characteristics.
Systematic sampling
A starting point is chosen (usually at random), and then subjects are selected using a jump number or specific interval, such as every 10th person on a list.
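A minimal sketch of systematic sampling, assuming the population is an ordered list and the jump number k is chosen in advance (the list of students and the function name are hypothetical). The random start is picked among the first k positions, then every k-th unit is taken.

```python
import random

def systematic_sample(population: list, k: int, rng: random.Random) -> list:
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)          # random starting position 0..k-1
    return population[start::k]

students = [f"student_{i}" for i in range(1, 101)]   # hypothetical list of 100 names
rng = random.Random(1)                                # seeded so the run is repeatable
sample = systematic_sample(students, 10, rng)         # every 10th student
print(sample)
```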
Event
A subset of the outcomes in a sample space. The probability of some event X is usually written as P(X).
Double-blind
A test or experiment in which information that may bias the results is concealed from both the tester and subject.
Single-blind
A test or experiment in which information that may bias the results is concealed from either the tester or subject.
Five number summary
For a dataset, the minimum, the lower quartile (Q1), the median (Q2), the upper quartile (Q3), and the maximum values.
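A sketch of the five-number summary, assuming the "median of each half" convention for quartiles (the overall median is left out of both halves when n is odd). This is one common convention; other software, such as Python's `statistics.quantiles`, may use a different rule and give slightly different quartiles.

```python
def median(values: list[float]) -> float:
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values: list[float]) -> tuple:
    """(min, Q1, median, Q3, max), with Q1 and Q3 taken as the medians of the
    lower and upper halves; the middle value is excluded when n is odd."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return (s[0], median(lower), median(s), median(upper), s[-1])

print(five_number_summary([7, 1, 3, 9, 5, 2, 8, 4, 6]))   # (1, 2.5, 5, 7.5, 9)
```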
Expected value
For a random variable, this is the average of all possible values, weighted by their probabilities
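The weighted average can be written as E(X) = sum of x * P(X = x). A short sketch (the function name is illustrative); exact fractions are used so the example avoids floating-point rounding.

```python
from fractions import Fraction

def expected_value(values, probs):
    """E(X) = sum of x * P(X = x) over all possible values x."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    return sum(x * p for x, p in zip(values, probs))

# A fair six-sided die: each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
print(expected_value(faces, [Fraction(1, 6)] * 6))   # 7/2, i.e. 3.5
```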
Dot plot
Graph showing each data value as a dot placed above its corresponding value on a number line
Multivariate
Having to do with two or more variables
Bivariate
Having to do with two variables
Mean
Mean of a set of numbers is calculated by finding the sum of the set of numbers, then dividing by the number of values in the set.
Spread
Measures that indicate how closely (or not) the data values are clustered together. Examples include standard deviation, variance, range, and IQR.
Response variable
Measures the outcome of a study. Also known as the dependent variable
Median
Median is the number in the middle of a set, when the set is listed in ascending (or descending) order. If the set has an even number of values, the median is the mean of the two middle numbers.
Mode
Mode is the number that occurs the most often.
Sampling bias
Not all the members of the population are equally likely to be selected. A sample with bias is one where certain groups (in the population) are overrepresented or underrepresented (in the sample).
Nonresponse bias
Occurs when people who were selected to participate in the survey cannot be reached or refuse to participate, and those who do not respond may differ from those who do.
Intersection
For two random events, the set of all outcomes that satisfy BOTH (written A ∩ B, or "A and B").
Block design
Procedure by which experimental units are put into homogeneous groups (blocks) in an attempt to control for the effects of the group on the response variable
Categorical variable
Qualitative variables are also referred to as categorical variables because they describe data that fits into categories. Qualitative variables are usually not numeric but sometimes they can be.
Quantitative variable
Quantitative variables are also referred to as numeric variables because they describe data that can be measured numerically.
Quartiles
Quartiles are three values that separate a set into four subsets of equal size. Q1 represents the 25th percentile, Q2 represents the 50th percentile (median), and Q3 represents the 75th percentile.
Randomization
Random assignment of experimental units to treatments
Relative frequency
Ratio of the number of times a data value occurs to the total number of outcomes
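A short sketch on made-up categorical data: count each value's frequency, then divide by the total number of observations to get its relative frequency. The relative frequencies always add up to 1.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]  # made-up data

counts = Counter(colors)                 # frequency of each value
total = len(colors)
rel_freq = {color: count / total for color, count in counts.items()}

for color, count in counts.most_common():
    print(color, count, rel_freq[color])
```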
Convenience sample
Sample chosen without any random mechanism. Often, data is collected merely based on ease of selection.
Sample
A subset of the population from which data are actually collected. Samples are chosen at random so that the sample is likely to be representative of the population from which it comes.
Simple random sample
Sampling such that all possible samples of the same size are equally likely to be chosen
Blocking
See block design
Mean (for a random variable)
See expected value
Independent variable
See explanatory variable
Line of best fit
See least squares regression line
Regression line
See least squares regression line
Dependent variable
See response variable
Bias
See sampling bias
Blinding
See single-blind and double-blind
Back to back stemplot
See stemplot. In a back-to-back stemplot, the stems are located in the middle, with the leaves for one dataset to the left and the leaves for a second dataset to the right.
Proportional sampling
The population is first divided into strata, and then a simple random sample of a size that is proportional to the size of the stratum is selected from each stratum.
Residuals
The actual value (y-coordinate on the scatterplot) minus the predicted value (y-coordinate on the regression line). This represents the error of predictions made by using the LSRL
Stratified random sampling
The population is first divided into homogeneous groups or 'strata' ('stratum' being the singular term) that have some meaningful relationship with the variable we are trying to study. A simple random sample is then selected from each stratum.
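A minimal sketch of stratified random sampling, assuming the strata are already known and that the same fraction is sampled from each one (proportional allocation; equal-size samples per stratum are another option). The school, grade-level strata, and function name are hypothetical.

```python
import random

def stratified_sample(strata: dict[str, list], fraction: float,
                      rng: random.Random) -> dict[str, list]:
    """Take a simple random sample of the same fraction from every stratum,
    so each stratum's share of the sample matches its share of the population."""
    return {
        name: rng.sample(members, round(fraction * len(members)))
        for name, members in strata.items()
    }

# Hypothetical school: strata are grade levels of different sizes.
strata = {
    "9th": [f"9-{i}" for i in range(200)],
    "10th": [f"10-{i}" for i in range(150)],
    "11th": [f"11-{i}" for i in range(100)],
    "12th": [f"12-{i}" for i in range(50)],
}
rng = random.Random(0)                      # seeded so the run is repeatable
sample = stratified_sample(strata, 0.10, rng)
print({name: len(chosen) for name, chosen in sample.items()})
```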
Conditional probability
The probability that an event occurs given that we know another event to be true (or has already happened).
Replication
The process of giving a certain treatment numerous times in an experiment, or applying it to a number of different experimental units, to try to reproduce the same results
Range
The range is the difference between the largest value and smallest value in a data set.
Sampling frame
The sampling frame is the list of subjects or units in a population from which the sample is chosen.
Experimental unit
The smallest unit of a population that will receive a treatment
Coefficient of determination
The square of the correlation coefficient. This represents the percent of variation in the dependent variable that can be explained by variation in the explanatory variable using the regression line
Standard deviation
The square root of the variance. The standard deviation is denoted s for a sample (the variance is s²) or σ for a population, and it is in the same units as the data set from which it is calculated.
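A sketch of the sample standard deviation, s = sqrt( sum((x - x̄)²) / (n - 1) ), checked against Python's standard library. The population version σ divides by n instead of n - 1. The data are made up.

```python
import math
import statistics

def sample_sd(values: list[float]) -> float:
    """s = sqrt( sum((x - mean)^2) / (n - 1) )"""
    n = len(values)
    xbar = sum(values) / n
    variance = sum((x - xbar) ** 2 for x in values) / (n - 1)   # this is s squared
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))          # sample standard deviation s
print(statistics.stdev(data))   # standard library, same n - 1 formula
print(statistics.pstdev(data))  # population version (divides by n), here 2.0
```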
Observational study
The variables of interest are simply observed and recorded. The people conducting the study apply no treatment or influence in any way
Explanatory variable
This explains the changes in the response variable. Also known as the independent variable or the treatment variable.
Mutually exclusive events
Two events that cannot happen simultaneously, so P(A and B) = 0.
Sample survey
Using a sample from a population to obtain responses to questions from individuals; not all members of the population are studied.
Sample space
All of the possible outcomes of some random behavior
Stemplot
Also known as a stem-and-leaf plot, a stemplot separates each data value into a leaf (the final significant digit) and a stem (the remaining digits).
Census
An attempt to contact and collect information from every member of a population
Treatment
An experiment is used when a researcher wants to show that a change in one variable causes a change in another variable. A treatment is the change in the independent variable that we believe causes a change in the dependent variable.
Experiment
An experiment is a study in which the researcher actively imposes a change (a treatment) on one variable and then measures the effect of that change on other variables.
Judgement sampling
An expert or group of experts hand-selects the individuals to be included in the study, thus causing bias due to their subjective choice.
Influential observation
An observation, usually one that is extreme in the x-direction, whose removal would significantly change the slope of the regression line
Law of large numbers
As the number of trials of a random behavior increases, the proportion of times a specific outcome occurs gets closer and closer to a single value (the probability of that outcome).
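A small simulation sketch of the law of large numbers, assuming a fair coin simulated with Python's `random` module (the seed is arbitrary). The running proportion of heads wanders for small n but settles near 0.5 as n grows.

```python
import random

rng = random.Random(42)   # seeded so the run is repeatable

heads = 0
for n in range(1, 100_001):
    heads += rng.random() < 0.5          # one fair coin flip; True counts as 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, heads / n)              # running proportion of heads
```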
Response bias
Bias that stems from an inaccurate or untruthful response from the respondent
Discrete data
Data which can be counted.
Shape
Describes if the distribution is uniform, symmetric, skewed left, or skewed right
Scatterplot
Displays a set of ordered pairs
Histogram
Displays the frequencies of numerical data with bars
Multimodal
Distribution with three or more peaks
Bimodal
Distribution with two peaks
Center
Either the mean or the median, whichever best describes the "middle"
Outliers
Extreme values that differ greatly from the other observations. By the common 1.5 × IQR rule, a value is an outlier if it is more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1.
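A sketch of the 1.5 × IQR rule, assuming Q1 and Q3 have already been computed (here with the "median of each half" convention on made-up data). The fences are Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; anything outside them is flagged.

```python
def outlier_fences(q1: float, q3: float) -> tuple[float, float]:
    """Lower and upper fences for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values: list[float], q1: float, q3: float) -> list[float]:
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Made-up data: lower half [-8, 9, 11, 12] gives Q1 = 10,
# upper half [18, 19, 21, 40] gives Q3 = 20, so IQR = 10.
data = [-8, 9, 11, 12, 15, 18, 19, 21, 40]
print(outlier_fences(10, 20))        # (-5.0, 35.0)
print(find_outliers(data, 10, 20))   # [-8, 40]
```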
Simulation
Imitating a random behavior by identifying the probability of a specific outcome happening, then generating random digits to determine the result
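A sketch of a random-digit simulation under a hypothetical scenario: a player who makes 70% of free throws. Digits 0 to 6 stand for "make" and 7 to 9 for "miss"; repeating the two-shot trial many times estimates the probability of making both shots (about 0.7 × 0.7 = 0.49 if the shots are independent).

```python
import random

# Hypothetical: a basketball player makes 70% of free throws.
# Digits 0-6 represent "make" and 7-9 represent "miss".
rng = random.Random(7)   # seeded so the run is repeatable

def shoot() -> bool:
    digit = rng.randint(0, 9)   # one random digit, 0 through 9 inclusive
    return digit <= 6           # 7 of the 10 digits count as a make

trials = 10_000
successes = 0
for _ in range(trials):
    first, second = shoot(), shoot()   # two free throws per trial
    if first and second:
        successes += 1

print(successes / trials)   # estimate of P(makes both); should be near 0.49
```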
Matched pairs design
In characteristic matched pairs design, experimental units are paired according to some common characteristic, and then each unit in the pair receives a different treatment (usually assigned at random). In before and after matched pairs design, the experimenter may apply both treatments to each experimental unit.
Positive association
Larger values of one variable are associated with larger values of another variable (and smaller with smaller)
Negative association
Larger values of one variable are associated with smaller values of another variable
Complement
The collection of outcomes in a sample space that are not part of a certain event.
Correlation coefficient
The correlation coefficient, r, is a measure of the strength and direction of a linear relationship between two quantitative variables. As a rough guideline, values between -0.5 and 0.5 show a weak linear correlation and values between -1 and -0.8 or between 0.8 and 1 show a strong linear correlation. When r = 0 there is either no relationship, or there may be a relationship (e.g. quadratic, exponential) that is not at all linear.
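A sketch computing r as the sum of products of z-scores divided by n - 1, using sample standard deviations (the data are made up). The second dataset is an exact parabola, y = (x - 3)², yet its r comes out as (essentially) 0, which illustrates the point above that r only measures linear association. Squaring r gives the coefficient of determination.

```python
import math

def correlation(xs: list[float], ys: list[float]) -> float:
    """r = (1 / (n - 1)) * sum(z_x * z_y), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
linear = [2, 4, 5, 4, 5]
curved = [4, 1, 0, 1, 4]      # perfect parabola, y = (x - 3)^2

print(correlation(xs, linear))        # positive r
print(correlation(xs, curved))        # about 0 even though y depends exactly on x
print(correlation(xs, linear) ** 2)   # r^2, the coefficient of determination
```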
Probability model
The description of a random behavior's sample space and the probability of each outcome.
Normal distribution
The distribution is mound-shaped and symmetric. Also referred to as the "bell curve". Distributed according to the empirical rule, such that approximately 68% of the data falls within one standard deviation of the mean, 95% within two, and 99.7% within three.
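The empirical-rule percentages are rounded values. A quick check with Python's `statistics.NormalDist` computes the exact share of a standard normal distribution within k standard deviations of the mean.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal distribution

# P(-k < Z < k) for k = 1, 2, 3 standard deviations
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} SD: {share:.4f}")
```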
IQR
The interquartile range (IQR) is the difference between the third quartile and the first quartile (IQR = Q3 - Q1). It measures the spread of the middle 50% of the data.
Least squares regression line (LSRL)
The line that has the least possible sum of squared errors (residuals).
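A sketch of the least squares regression line using the standard closed-form formulas: slope b = Sxy / Sxx (equivalently r × s_y / s_x) and intercept a = ȳ - b × x̄, so the line passes through (x̄, ȳ). The data are made up; residuals are computed as actual minus predicted, and for the LSRL they sum to (about) zero.

```python
def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (a, b) for y-hat = a + b*x, the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx                 # slope (equivalently r * s_y / s_x)
    a = y_bar - b * x_bar         # the line passes through (x-bar, y-bar)
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # actual minus predicted

print(a, b)              # intercept and slope
print(residuals)
print(sum(residuals))    # residuals of the LSRL sum to (about) 0
```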
Binomial coefficient
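The number of possible combinations of n trials with k successes, that is, the number of ways to choose which k of the n trials are successes. It is written nCk = n! / (k!(n - k)!).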
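A short check of the formula with Python's `math.comb`, compared against a brute-force listing from `itertools.combinations` of every way to pick which 2 of 5 trials are the successes.

```python
from itertools import combinations
from math import comb

n, k = 5, 2
# Formula: n! / (k! (n - k)!)
print(comb(n, k))   # 10

# Check by listing every way to choose which 2 of the 5 trials are successes.
arrangements = list(combinations(range(1, n + 1), k))
print(len(arrangements))   # 10
print(arrangements[:3])    # [(1, 2), (1, 3), (1, 4)]
```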
Frequency
The number of times that a data value for a variable occurs. A frequency table is a table that shows the frequency of each value (or category) of a variable.
