AP Stats Uths vocab


Geometric random variable

Used to find the number of trials required to reach the first success in a random event. The process of repeating a random event until your first success occurs is called a geometric random variable.
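
As a rough illustration, the chance that the first success lands on a given trial can be sketched in Python. The function name `geometric_pmf` and the example numbers are assumptions made for illustration; the sketch uses the standard formula P(X = k) = (1 - p)^(k - 1) * p, which assumes independent trials with the same success probability p.

```python
def geometric_pmf(k: int, p: float) -> float:
    """Probability that the first success occurs on trial k (k = 1, 2, 3, ...)."""
    if k < 1 or not 0 < p <= 1:
        raise ValueError("k must be at least 1 and p must be in (0, 1]")
    # k - 1 failures in a row, followed by one success
    return (1 - p) ** (k - 1) * p


# Hypothetical example: a fair coin, first head on the third flip
print(geometric_pmf(3, 0.5))  # 0.125
```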

Independent events

Two or more random events do not affect each other's probabilities. Knowing one event happened does not affect the likelihood of the other event happening.

Confounding variable

A confounding variable is some third variable whose effect on the response variable cannot be separated from the effects of the explanatory variable.

Continuous variable

A continuous variable can take on an INFINITE number of values between a minimum and maximum.

Discrete variable

A discrete variable can take on only a countable number of separate values, typically a FINITE number between a minimum and maximum.

Skewed

A distribution that is asymmetric. If there is a longer tail to the right of the center, the distribution is said to be skewed right and if there is a longer tail to the left of the center, the distribution is said to be skewed left.

Symmetric

A distribution with data values distributed equally above and below the center

Placebo group

A group in an experiment that receives a "placebo" rather than an actual treatment.

Control group

A group that does not receive treatment but is still measured.

Lurking variable

A lurking variable is one that is not included in a study, but may still have some effect on the other variables. A lurking variable might be completely unknown and its effects unsuspected by the experimenters.

Statistic

A measure that describes a sample. A statistic is usually not denoted by a Greek letter.

Parameter

A measure that describes an entire population. Parameters are often denoted by a Greek letter.

Correlation

A mutual relationship between two variables. Note that just knowing that two variables have a strong correlation does not mean that one caused the other.

Cluster sampling

A naturally occurring heterogeneous group is selected, and either the entire group or randomly selected members of it are used in the sample.

Non-resistant

A non-resistant measure is a statistic that cannot resist the influence of an extreme value.

Probability

A number between 0 and 1 that describes the proportion of times a particular outcome should occur. This is typically written as a reduced fraction or a %. The probability of some event X is usually written as P(X)

Random variable

A number that describes some outcomes in a random behavior. It is usually denoted with a capital letter such as X

Binomial random variable

A random variable that counts the number of successes in a fixed number of repeated, independent trials of a random event.

Continuous random variable

A random variable that takes on data that can be measured, can take on all values in a given interval, and the values cannot be counted.

Discrete random variable

A random variable whose outcomes are (able to be) counted

Resistant

A resistant measure is a statistic that can resist the influence of an extreme value.
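
A small sketch may help show the difference between resistant and non-resistant measures. The two data sets below are made up for illustration: the second replaces the largest value with an extreme one, which moves the mean a lot while the median stays put.

```python
from statistics import mean, median

# Hypothetical data: the second list swaps the 5 for an extreme value
typical = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 100]

# The mean is non-resistant: it moves from 3 to 22
print(mean(typical), mean(with_extreme))

# The median is resistant: it stays at 3
print(median(typical), median(with_extreme))
```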

Representative sample

A sample that matches the population's characteristics.

Systematic sampling

A starting point is chosen, and then subjects are selected using a jump number or specific interval
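
The selection rule can be sketched in a few lines of Python. The function name `systematic_sample`, its parameters, and the list of 20 subjects are hypothetical; the sketch assumes the starting point is picked at random from the first interval when none is given.

```python
import random


def systematic_sample(population, step, start=None):
    """Select every `step`-th subject, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    if start is None:
        # Random starting point somewhere in the first interval
        start = random.randrange(step)
    return population[start::step]


subjects = list(range(1, 21))  # hypothetical subjects numbered 1 to 20
print(systematic_sample(subjects, step=5, start=2))  # [3, 8, 13, 18]
```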

Event

A subset of the outcomes in a sample space. The probability of some event X is usually written as P(X).

Double-blind

A test or experiment in which information that may bias the results is concealed from both the tester and subject.

Single-blind

A test or experiment in which information that may bias the results is concealed from either the tester or subject.

Five number summary

For a dataset, the minimum, the lower quartile (Q1), the median (Q2), the upper quartile (Q3), and the maximum values.
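
One way to compute the five values is sketched below. Quartile conventions differ between textbooks and software; this sketch assumes the common classroom convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd. The function name and sample data are illustrative only.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd so it belongs to neither half
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 2.5, 5, 7.5, 9)
```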

Expected value

For a random variable, this is the average of all possible values, weighted by their probabilities
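
The weighted average can be written out directly. In this sketch the function name `expected_value` and the fair six-sided die are assumptions for illustration; exact fractions are used so the result is not affected by rounding.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of value * probability over a dict mapping value -> probability."""
    return sum(value * prob for value, prob in distribution.items())


# Hypothetical example: a fair six-sided die
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(fair_die))  # 7/2
```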

Dot plot

Graph showing individual data values placed as a dot above their corresponding value on a number line

Multivariate

Having to do with two or more variables

Bivariate

Having to do with two variables

Mean

The mean of a set of numbers is calculated by finding the sum of the set of numbers, then dividing by the number of values in the set.

Spread

Measures that indicate how closely (or not) the data values are distributed to one another. Examples include standard deviation, variance, range, and IQR.

Response variable

Measures the outcome of a study. Also known as the dependent variable

Median

The median is the number in the middle of a set, when the set is listed in ascending (or descending) order. If the set has an even number of values, the median is the mean of the two middle numbers.

Mode

Mode is the number that occurs the most often.

Sampling bias

Not all the members of the population are equally likely to be selected. A sample with bias is one where certain groups (in the population) are overrepresented or underrepresented (in the sample).

Nonresponse bias

Occurs when people who were selected to participate in the survey cannot or do not participate in the survey.

Intersection

Out of two random events, the set of all outcomes that satisfy BOTH.

Block design

Procedure by which experimental units are put into homogeneous groups in an attempt to control for the effects of the group on the response variable

Categorical variable

Qualitative variables are also referred to as categorical variables because they describe data that fits into categories. Qualitative variables are usually not numeric but sometimes they can be.

Quantitative variable

Quantitative variables are also referred to as numeric variables because they describe data that can be measured numerically.

Quartiles

Quartiles are three values that separate a set into four subsets of equal size. Q1 represents the 25th percentile, Q2 represents the 50th percentile (median), and Q3 represents the 75th percentile.

Randomization

Random assignment of experimental units to treatments

Relative frequency

Ratio of the number of times a data value occurs to the total number of outcomes

Convenience sample

Sample chosen without any random mechanism. Often, data is collected merely based on ease of selection.

Sample

A subset of a population that is selected for study. Samples are chosen at random to help ensure the sample is representative of the population from which it comes.

Simple random sample

Sampling such that all possible samples of the same size are equally likely to be chosen

Blocking

See block design

Mean (for a random variable)

See expected value

Independent variable

See explanatory variable

Line of best fit

See least squares regression line

Regression line

See least squares regression line

Dependent variable

See response variable

Bias

See sampling bias

Blinding

See single-blind and double-blind

Back to back stemplot

See stemplot. In a back-to-back stemplot, the stems are located in the middle, with the leaves for one dataset to the left and the leaves for a second dataset to the right.

Proportional sampling

The population is first divided into strata, and then a simple random sample of a size that is proportional to the size of the stratum is selected from each stratum.

Residuals

The actual value (y-coordinate on the scatterplot) minus the predicted value (y-coordinate on the regression line). This represents the error of predictions made by using the LSRL.
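
A residual is a one-line subtraction once a line is chosen. The sketch below assumes a made-up regression line y = 2 + 3x and a made-up observed point (4, 15); the function names are illustrative only.

```python
def predict(x, intercept, slope):
    """Predicted y-value on the line y = intercept + slope * x."""
    return intercept + slope * x


def residual(x, y, intercept, slope):
    """Actual value minus predicted value."""
    return y - predict(x, intercept, slope)


# Hypothetical line y = 2 + 3x and observed point (4, 15): predicted 14, residual 1
print(residual(4, 15, intercept=2, slope=3))  # 1
```

A positive residual means the point sits above the line, and a negative one means it sits below.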

Stratified random sampling

The population is first divided into homogeneous groups or 'strata' ('stratum' being the singular term) that have some meaningful relationship with the variable we are trying to study.

Conditional probability

The probability that an event occurs given that we know another event to be true (or has already happened).
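
For equally likely outcomes, the usual formula P(A | B) = P(A and B) / P(B) can be sketched with sets. The die example, the event names, and the helper functions below are assumptions for illustration, and the sketch only applies when every outcome in the sample space is equally likely.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}


def probability(event):
    """P(event) when all outcomes in the sample space are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def conditional_probability(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return probability(a & b) / probability(b)


low = {1, 2, 3}
even = {2, 4, 6}
print(conditional_probability(low, even))  # 1/3
```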

Replication

The process of giving a certain treatment numerous times in an experiment or applying it to a number of different experimental units to try to reproduce the same results

Range

The range is the difference between the largest value and smallest value in a data set.

Sampling frame

The sampling frame is the list of subjects or units in a population from which the sample is chosen.

Experimental unit

The smallest unit of a population that will receive a treatment

Coefficient of determination

The square of the correlation coefficient. This represents the percent of variation in the dependent variable that can be explained by variation in the explanatory variable using the regression line

Standard deviation

The square root of the variance. Denoted as s instead of s², the standard deviation is in the same units as the data set from which it is calculated.
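
The relationship between the two quantities can be checked with Python's standard library. The data values below are made up; `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1), which is an assumption about which version is intended here.

```python
from statistics import stdev, variance

# Hypothetical data set with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

s_squared = variance(data)  # sample variance: 32 / 7
s = stdev(data)             # sample standard deviation: square root of the variance

print(round(s_squared, 3), round(s, 3))  # 4.571 2.138
```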

Observational study

The variables of interest are simply observed and recorded. The people conducting the study apply no treatment or influence in any way

Explanatory variable

This explains the changes in the response variable. Also known as the independent variable or the treatment variable.

Mutually exclusive events

Two events that cannot happen simultaneously

Sample survey

Using a sample from a population to obtain responses to questions from individuals; not all members of the population are studied

Sample space

All of the possible outcomes of some random behavior

Stemplot

Also known as a stem-and-leaf plot, stemplots separate each data value into a leaf, which consists of the final significant digit, and a stem, which consists of the remaining digits

Census

An attempt to contact and collect information from every member of a population

Treatment

An experiment is used when a researcher wants to show that assigning a treatment to a variable causes a change in another variable. A treatment is the change to the independent variable that we believe causes a change in the dependent variable.

Experiment

An experiment is when the researcher measures the relationship between some variables, and then actively creates some change in one of those variables to examine its effect on the other variables.

Judgement sampling

An expert or group of experts hand-selects the individuals to be included in the study, thus causing bias due to their subjective choice.

Influential observation

An observation, usually extreme in the x-direction, whose removal would have a significant impact on the slope of the regression line

Law of large numbers

As the number of trials of a random behavior increases, the proportion of a specific outcome should approach a single value.
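
A quick simulation can make this visible. The sketch below flips a simulated coin and records the running proportion of heads at a few checkpoints; the function name, the checkpoints, and the seed are arbitrary choices for illustration.

```python
import random


def running_proportion(num_flips, p=0.5, seed=0):
    """Proportion of successes observed so far, recorded at a few checkpoints."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = 0
    checkpoints = {}
    for flip in range(1, num_flips + 1):
        successes += rng.random() < p  # True counts as 1
        if flip in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[flip] = successes / flip
    return checkpoints


for flips, proportion in running_proportion(100_000).items():
    print(flips, proportion)
```

The early proportions can wander noticeably, while the later ones should settle close to 0.5.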

Response bias

Bias that stems from an inaccurate or untruthful response from the respondent

Discrete data

Data which can be counted.

Shape

Describes if the distribution is uniform, symmetric, skewed left, or skewed right

Scatterplot

Displays a set of ordered pairs

Histogram

Displays the frequencies of numerical data with bars

Multimodal

Distribution with three or more peaks

Bimodal

Distribution with two peaks

Center

Either the mean or the median, whichever best describes the "middle"

Outliers

Extreme values that differ greatly from the other observations. Typically, outliers are above Q3 by at least 1.5 times the IQR or below Q1 by at least 1.5 times the IQR.
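
The 1.5 times IQR rule can be sketched as two cutoffs, often called fences. The quartiles and data values below are made up, the function names are illustrative, and the treatment of values landing exactly on a fence varies by textbook; this sketch flags only values strictly beyond the fences.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs: 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Values strictly beyond the fences (boundary handling is an assumption)."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Hypothetical quartiles Q1 = 10 and Q3 = 20, so IQR = 10
print(outlier_fences(10, 20))                             # (-5.0, 35.0)
print(find_outliers([-7, 12, 15, 18, 40], q1=10, q3=20))  # [-7, 40]
```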

Simulation

Imitating a random behavior by identifying the probability of a specific outcome happening, then generating random digits to determine the result
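
As a sketch of the random-digit approach, suppose a hypothetical basketball player makes 70% of free throws. Seven of the ten digits (0 through 6) can stand for a made shot and the rest for a miss. The scenario, function name, and seed below are all assumptions for illustration.

```python
import random


def simulate_free_throws(num_shots, make_digits=7, seed=1):
    """Generate one random digit per shot; digits below `make_digits` count as makes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    digits = [rng.randrange(10) for _ in range(num_shots)]
    makes = sum(digit < make_digits for digit in digits)
    return digits, makes


digits, makes = simulate_free_throws(10)
print(digits)
print(f"{makes} makes out of {len(digits)} shots")
```

Repeating the simulation many times and averaging the results gives an estimate of the long-run behavior.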

Matched pairs design

In characteristic matched pairs design, experimental units are paired according to some common characteristic, and then a treatment is applied to each unit in the pair. In before-and-after matched pairs design, the experimenter may apply both treatments to each experimental unit.

Positive association

Larger values of one variable are associated with larger values of another variable (and smaller with smaller)

Negative association

Larger values of one variable are associated with smaller values of another variable

Complement

The collection of outcomes in a sample space that are not part of a certain event.

Correlation coefficient

The correlation coefficient, r, is a measure of the strength and direction of a linear relationship between two variables. Values between -0.5 and 0.5 show a weak linear correlation and values between -1 and -0.8 or between 0.8 and 1 show a strong linear correlation. When r=0 there is either no relationship or there may be a relationship (e.g. quadratic, exponential) that is not at all linear.
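
A hand-rolled computation of r is sketched below, using the standard formula based on deviations from the means. The function name and both data sets are made up for illustration. The second data set follows y = x squared exactly, yet r comes out to 0 because the relationship is not linear.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Perfectly linear data: r is 1
print(correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))

# Perfectly quadratic data (y = x squared): r is 0
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```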

Probability model

The description of a random behavior's sample space and the probability of each outcome.

Normal distribution

The distribution is mound shaped and symmetrical. Also referred to as the "bell curve". Distributed according to the empirical rule such that approximately 68% of the data falls within one standard deviation of the mean, 95% within two, and 99.7% within three.
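
The empirical rule percentages can be checked against the standard normal curve using `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later). The helper name `proportion_within` is an assumption for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))  # 0.6827, 0.9545, 0.9973
```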

IQR

The interquartile range (IQR) is the difference between the third quartile and the first quartile. The IQR represents the middle 50% of the data.

Least squares regression line (LSRL)

The line that has the least possible sum of squared errors (residuals).
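
The slope and intercept that minimize the sum of squared residuals have a closed form, sketched below. The function name and the four data points are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares_line([1, 2, 3, 4], [2, 4, 5, 7])
print(round(intercept, 3), round(slope, 3))  # 0.5 1.6
```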

Binomial coefficient

The number of possible combinations of n trials with k successes.
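
Python's `math.comb` (available in Python 3.8 and later) computes this count directly. The sketch also shows how the coefficient feeds into the standard binomial probability formula; the function name `binomial_pmf` and the example numbers are assumptions for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Number of ways to place 2 successes among 5 trials
print(comb(5, 2))  # 10

# Hypothetical example: exactly 2 heads in 5 flips of a fair coin
print(binomial_pmf(2, 5, 0.5))  # 0.3125
```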

Frequency

The number of times that a data value for a variable occurs. A frequency table is a table that shows the total count for each value of the variable.

