AP Stats summer work vocab
frequency table
When a table shows frequency counts for a categorical variable, it is called a frequency table. Below, the bar chart and the frequency table display the same data.
relative frequency table
When a table shows relative frequencies for different categories of a categorical variable, it is called a relative frequency table.
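As a minimal sketch of how a frequency table and a relative frequency table relate, the Python below counts a small, made-up list of colors and then divides each count by the total. The data and the variable names are illustrative assumptions, not part of the definitions above.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (favorite colors).
colors = ["red", "blue", "red", "green", "blue", "red"]

# Frequency table: category -> count.
frequency = Counter(colors)

# Relative frequency table: category -> count divided by the total count.
total = sum(frequency.values())
relative_frequency = {
    category: count / total for category, count in frequency.items()
}

# Print category, frequency, and relative frequency side by side.
for category in frequency:
    print(f"{category:<6} {frequency[category]:>2} {relative_frequency[category]:.3f}")
```

The relative frequencies always sum to 1 (up to rounding), which is a quick check that the table was built correctly.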
dot plot
A dotplot is a type of graphic display used to compare frequency counts within categories or groups. As you might guess, a dotplot is made up of dots plotted on a graph. Here is how to interpret a dotplot. Each dot can represent a single observation from a set of data, or a specified number of observations from a set of data. The dots are stacked in a column over a category, so that the height of the column represents the relative or absolute frequency of observations in the category.
segmented bar graph
A graph of the frequency distribution of a categorical data set. Each category is represented by a segment of the bar, and the size of each segment is proportional to the corresponding frequency or relative frequency. Example sentence: Segmented bar graphs are used to show the frequency distribution of categorical data sets.
resistant
A statistic which is relatively unaffected by unusual observations. The median and inter-quartile range are examples of resistant statistics, while the mean, standard deviation, and range are not.
symmetric distribution
A symmetric distribution is a type of distribution where the left side of the distribution mirrors the right side.
two way table
A two-way table (also called a contingency table) is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies (just like a one-way table).
back to back stem plot
Back-to-back stemplots are a graphic option for comparing data from two populations. The center of a back-to-back stemplot consists of a column of stems, with a vertical line on each side. Leaves representing one data set extend from the right, and leaves representing the other data set extend from the left. The back-to-back stemplot on the right shows the amount of cash (in dollars) carried by a random sample of teenage boys and girls. The boys carried more cash than the girls - a median of $42 for the boys versus $36 for the girls. Both distributions were roughly bell-shaped, although there was more variation among the boys. And finally, there were neither gaps nor outliers in either group.
unimodal
Distributions of data can have few or many peaks. Distributions with one clear peak are called unimodal.
multimodal
Distributions with several clear peaks are called multimodal.
marginal distribution
Entries in the "Total" row and "Total" column are called marginal frequencies or the marginal distribution. Entries in the body of the table are called joint frequencies.
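The sketch below shows one way to compute marginal frequencies from the joint frequencies of a two-way table. The table (grade level by preferred sport) and all names in it are hypothetical and exist only for illustration.

```python
# Hypothetical two-way table of joint frequencies:
# rows are grade levels, columns are preferred sports.
joint = {
    "junior": {"soccer": 10, "tennis": 5},
    "senior": {"soccer": 8, "tennis": 7},
}

# Row totals: marginal frequencies of the row variable.
row_totals = {row: sum(cells.values()) for row, cells in joint.items()}

# Column totals: marginal frequencies of the column variable.
column_totals = {}
for cells in joint.values():
    for column, count in cells.items():
        column_totals[column] = column_totals.get(column, 0) + count

# The grand total is the same whether rows or columns are summed.
grand_total = sum(row_totals.values())
print(row_totals, column_totals, grand_total)
```

Dividing each marginal frequency by the grand total would give the marginal distribution as relative frequencies.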
continuous variable
If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable.
categorical variables
Categorical (qualitative). Categorical variables take on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of categorical variables.
quantitative variable
Quantitative. Quantitative variables are numerical. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city - a measurable attribute of the city. Therefore, population would be a quantitative variable.
quartiles
Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part are called the first, second, and third quartiles; and they are denoted by Q1, Q2, and Q3, respectively. Note the relationship between quartiles and percentiles. Q1 corresponds to P25, Q2 corresponds to P50, Q3 corresponds to P75. Q2 is the median value in a set of data.
round off error
Roundoff error is the difference between an approximation of a number used in computation and its exact (correct) value. In certain types of computation, roundoff error can be magnified as any initial errors are carried through one or more intermediate steps.
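A small illustration of roundoff error carried through repeated steps, assuming ordinary binary floating-point arithmetic as used by Python's `float`. The choice of 0.1 and ten additions is just a convenient example.

```python
import math

# 0.1 has no exact binary floating-point representation, so every
# addition carries a tiny error that accumulates across steps.
naive_total = 0.0
for _ in range(10):
    naive_total += 0.1

# math.fsum tracks the lost low-order bits and returns an accurately
# rounded sum of the same ten values.
accurate_total = math.fsum([0.1] * 10)

print(naive_total == 1.0)      # False: accumulated roundoff error
print(accurate_total == 1.0)   # True
print(abs(naive_total - 1.0))  # size of the roundoff error
```

The error here is tiny, but the same effect can grow when a rounded intermediate result is reused many times.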
side by side bar graph
Side-by-side bar charts are used to display two categorical variables. In the example, the two categorical variables are cylinders and gears.
splitting stems
Split stems is a term used to describe stem-and-leaf plots that have more than one space on the stem for the same interval. An example would be a stem of 1 with leaves 0-4, and a second stem of 1 with leaves 5-9. This is done to help avoid "bunched" data.
symmetric
Symmetry is an attribute used to describe the shape of a data distribution. When it is graphed, a symmetric distribution can be divided at the center so that each half is a mirror image of the other. A non-symmetric distribution cannot.
center
The center of a distribution is the middle of a distribution. For example, the center of 1, 2, 3, 4, 5 is the number 3.
distribution
The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur. When a distribution of categorical data is organized, you see the number or percentage of individuals in each group.
first quartile (Q1)
The first quartile, denoted by Q1, is the median of the lower half of the data set. This means that about 25% of the numbers in the data set lie below Q1 and about 75% lie above Q1.
five number summary
The five number summary includes five items: the minimum; Q1 (the first quartile, or the 25% mark); the median; Q3 (the third quartile, or the 75% mark); and the maximum. The five number summary gives you a rough idea about what your data set looks like. For example, you'll have your lowest value (the minimum) and the highest value (the maximum). Although it's useful in itself, the main reason you'll want to find a five-number summary is to find more useful statistics, like the interquartile range, sometimes called the middle fifty.
Interquartile Range (IQR)
The interquartile range (IQR) is a measure of variability, based on dividing a data set into quartiles. Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part are called the first, second, and third quartiles; and they are denoted by Q1, Q2, and Q3, respectively. Q1 is the "middle" value in the first half of the rank-ordered data set. Q2 is the median value in the set. Q3 is the "middle" value in the second half of the rank-ordered data set. The interquartile range is equal to Q3 minus Q1. For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. Q1 is the middle value in the first half of the data set. Since there are an even number of data points in the first half of the data set, the middle value is the average of the two middle values; that is, Q1 = (3 + 4)/2 or Q1 = 3.5. Q3 is the middle value in the second half of the data set. Again, since the second half of the data set has an even number of observations, the middle value is the average of the two middle values; that is, Q3 = (6 + 7)/2 or Q3 = 6.5. The interquartile range is Q3 minus Q1, so IQR = 6.5 - 3.5 = 3.
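The worked example above can be checked with a short Python sketch. The `quartiles` helper is a hypothetical name, and its handling of odd-sized data sets (leaving the median out of both halves) is an assumption, since the example only covers an even number of observations.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: with an odd number of observations, the median itself
    is excluded from both the lower and the upper half.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # first half of the rank-ordered data
    upper = ordered[-half:]   # second half of the rank-ordered data
    return median(lower), median(ordered), median(upper)

data = [1, 3, 4, 5, 5, 6, 7, 11]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 3.5 5.0 6.5 3.0
```

Other quartile conventions exist (calculators and software packages do not all agree), so results for small data sets can differ slightly from this method.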
median
The median is a simple measure of central tendency. To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values. Thus, in a sample of four families, we might want to compute the median annual income. Suppose the incomes are $30,000 for the first family; $50,000, for the second; $90,000, for the third; and $110,000, for the fourth. The two middle values are $50,000 and $90,000. Therefore, the median annual income is ($50,000 + $90,000)/2 or $70,000.
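The procedure above (sort, then take the middle value or the average of the two middle values) might be sketched as follows. The function name is illustrative; Python's standard library also offers `statistics.median`, which does the same job.

```python
def median(values):
    """Median: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return ordered[middle]
    # Even number of observations: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

incomes = [30_000, 50_000, 90_000, 110_000]
print(median(incomes))  # 70000.0
```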
shape (mode)
The mode is the most frequently appearing value in a population or sample. Suppose we draw a sample of five women and measure their weights. They weigh 100 pounds, 100 pounds, 130 pounds, 140 pounds, and 150 pounds. Since more women weigh 100 pounds than any other weight, the sample mode would equal 100 pounds.
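A minimal sketch of finding the mode for the weights in the example above. It returns a list because a data set can have more than one most frequent value; that design choice is an assumption, not something the definition requires.

```python
from collections import Counter

weights = [100, 100, 130, 140, 150]

# Count how often each value appears, then keep the value(s)
# that reach the highest count.
counts = Counter(weights)
highest = max(counts.values())
modes = [value for value, count in counts.items() if count == highest]

print(modes)  # [100]
```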
mosaic plot
The mosaic plot is a graphical representation of the two-way frequency table or Contingency Table. A mosaic plot is divided into rectangles; the vertical length of each rectangle is proportional to the proportions of the Y variable in each level of the X variable.
conditional probability
The probability that event A occurs, given that event B has occurred, is called a conditional probability. The conditional probability of A, given B, is denoted by the symbol P(A|B).
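One common way to compute a conditional probability uses the relationship P(A|B) = P(A and B) / P(B). The sketch below applies it to hypothetical counts of 30 students classified by grade and sport; the data, the events, and the variable names are all made up for illustration.

```python
from fractions import Fraction

# Hypothetical counts: 30 students classified by grade and preferred sport.
counts = {
    ("junior", "soccer"): 10,
    ("junior", "tennis"): 5,
    ("senior", "soccer"): 8,
    ("senior", "tennis"): 7,
}
total = sum(counts.values())

# Event A: the student is a junior. Event B: the student prefers soccer.
soccer_count = sum(n for (grade, sport), n in counts.items() if sport == "soccer")
p_b = Fraction(soccer_count, total)
p_a_and_b = Fraction(counts[("junior", "soccer")], total)

# P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 5/9
```

Equivalently, restrict attention to the 18 soccer students and ask what fraction of them are juniors: 10/18, which reduces to 5/9.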
spread (range)
The range is a simple measure of variation in a set of values. It is the difference between the largest and smallest values. Range = Maximum value - Minimum value. Therefore, the range of the four values {3, 5, 5, 7} would be 7 - 3 or 4.
range
The range is a simple measure of variation in a set of values. It is the difference between the largest and smallest values. Range = Maximum value - Minimum value. Therefore, the range of the four values {3, 5, 5, 7} would be 7 - 3 or 4.
standard deviation
The standard deviation is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the standard deviation is big; and vice versa. It is important to distinguish between the standard deviation of a population and the standard deviation of a sample. They have different notation, and they are computed differently. The standard deviation of a population is denoted by $\sigma$ and the standard deviation of a sample, by $s$. The standard deviation of a population is defined by the following formula: $\sigma = \sqrt{\sum (X_i - \bar{X})^2 / N}$ where $\sigma$ is the population standard deviation, $\bar{X}$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population. The standard deviation of a sample is defined by a slightly different formula: $s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$ where $s$ is the sample standard deviation, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample. And finally, the standard deviation is equal to the square root of the variance.
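To see the difference between the population and sample versions, the sketch below uses Python's standard `statistics` module on a small, hypothetical data set: `pstdev` divides by N and `stdev` divides by n - 1.

```python
import math
import statistics

# Hypothetical data, treated first as a whole population and then as a sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(data)  # population standard deviation (divides by N)
s = statistics.stdev(data)       # sample standard deviation (divides by n - 1)

print(sigma)  # population version
print(s)      # sample version, slightly larger for the same data

# The standard deviation is the square root of the variance.
print(math.isclose(sigma, math.sqrt(statistics.pvariance(data))))
```

For the same numbers the sample version is always at least as large as the population version, because it divides by a smaller number.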
third quartile (Q3)
The third quartile (Q3) is the middle value between the median and the highest value (maximum) of the data set.
Variance
The variance is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is big; and vice versa. It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently. The variance of a population is denoted by $\sigma^2$; and the variance of a sample, by $s^2$. The variance of a population is defined by the following formula: $\sigma^2 = \sum (X_i - \bar{X})^2 / N$ where $\sigma^2$ is the population variance, $\bar{X}$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population. The variance of a sample is defined by a slightly different formula: $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$ where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample. Using this formula, the variance of the sample is an unbiased estimate of the variance of the population. And finally, the variance is equal to the square of the standard deviation.
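The two variance formulas can be written out directly, as in the sketch below. The function names and the data are illustrative assumptions; the only difference between the two functions is the divisor.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32 / 7, about 4.571
```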
skewed distribution
When they are displayed graphically, some distributions of data have many more observations on one side of the graph than the other. Distributions with fewer observations on the right (toward higher values) are said to be skewed right; and distributions with fewer observations on the left (toward lower values) are said to be skewed left.
bimodal
Distributions with two clear peaks are called bimodal.
bar graph
A bar graph is a chart that plots data using rectangular bars or columns that represent the total number of observations in the data for each category. Bar charts can be displayed with vertical columns, horizontal bars, comparative bars (multiple bars to show a comparison between values), or stacked bars (bars containing multiple types of information).
box plot
A boxplot, sometimes called a box and whisker plot, is a type of graph used to display patterns of quantitative data. A boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" (hence, the name), which goes from the first quartile (Q1) to the third quartile (Q3). Within the box, a vertical line is drawn at Q2, the median of the data set. Two horizontal lines, called whiskers, extend from the front and back of the box. The front whisker goes from Q1 to the smallest non-outlier in the data set, and the back whisker goes from Q3 to the largest non-outlier. [Figure: boxplot with the smallest non-outlier, Q1, Q2, and Q3 labeled.] If the data set includes one or more outliers, they are plotted separately as points on the chart. In the boxplot above, two outliers precede the first whisker (on the left side of the plot).
census
A census is a study that obtains data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required.
histogram
A histogram is made up of columns plotted on a graph. Here is how to read a histogram. The columns are positioned over a label that represents a continuous, quantitative variable. The height of the column indicates the size of the group defined by the column label. The histogram below shows per capita income for five age groups.
mean
A mean score is an average score, often denoted by $\bar{X}$. It is the sum of individual scores divided by the number of individuals. Thus, if you have a set of $N$ numbers ($X_1, X_2, X_3, \dots, X_N$), the mean of those numbers would be defined as: $\bar{X} = (X_1 + X_2 + X_3 + \dots + X_N) / N = \left[\sum X_i\right] / N$. For example, the mean of the numbers 1, 2, and 3 would be (1 + 2 + 3)/3 or 2. Note: The mean score of a random variable (also called the expected value) is defined somewhat differently.
parameter
A parameter is a measurable characteristic of a population, such as a mean or a standard deviation.
pie chart
A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the quantity it represents.
discrete variable
If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable. Some examples will clarify the difference between discrete and continuous variables. Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable; since a fire fighter's weight could take on any value between 150 and 250 pounds. Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be any number between 0 and plus infinity. We could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.
association
In Statistics, association tells you whether two variables are related. The direction of the association is always symbolized by a sign either positive (+) or negative (-). There are two directions of association: positive association and negative association.
stem plot
In a stemplot, the entries on the left are called stems; and the entries on the right are called leaves. In the example, stems and leaves are explicitly labeled for educational purposes. In the real world, however, stemplots usually do not include explicit labels for the stems and leaves. Some stemplots include a key to help the user interpret the display correctly.
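A rough sketch of how a stemplot could be built as text, assuming non-negative whole numbers where the stem is everything except the last digit and the leaf is the last digit. The `stemplot` function and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a text stemplot for non-negative integers.

    Assumption: the stem is the value with its last digit removed,
    and the leaf is the last digit.
    """
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    lines = []
    # Include every stem between the smallest and largest, even if empty,
    # so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

data = [12, 15, 21, 23, 23, 38, 40]
for line in stemplot(data):
    print(line)
print("Key: 1 | 2 means 12")
```

The final line plays the role of the key mentioned above: it tells the reader how to combine a stem and a leaf back into a data value.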
outlier
In regression analysis, a data point that diverges greatly from the overall pattern of data is called an outlier. In more general usage, an outlier is an extreme value that differs greatly from other values in a set of values. As a "rule of thumb", an extreme value is considered to be an outlier if it is at least 1.5 interquartile ranges below the first quartile (Q1), or at least 1.5 interquartile ranges above the third quartile (Q3). To illustrate, consider the following example. Suppose we sample 10 households and note the annual income of each household. Suppose we find that nine of the households have incomes between $20,000 and $100,000; but the tenth household has an annual income of $1,000,000,000. That tenth household is an outlier. The figure below shows a distribution with an outlier. Except for one lonely observation (the outlier on the extreme right), all of the other observations appear on the left side of the distribution.
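The 1.5 × IQR rule of thumb can be sketched in Python as follows. The incomes are hypothetical values in thousands of dollars chosen to match the story above, the `outlier_fences` name is illustrative, and quartiles are found with the median-of-halves method, which is an assumption because other quartile conventions exist.

```python
from statistics import median

def outlier_fences(values):
    """Return (lower fence, upper fence) for the 1.5 * IQR rule.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, with the overall median left out when n is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical household incomes in thousands of dollars;
# the last household earns $1,000,000,000.
incomes = [20, 35, 40, 48, 55, 60, 72, 85, 100, 1_000_000]

low, high = outlier_fences(incomes)
# "At least 1.5 interquartile ranges" away, so the fences themselves count.
outliers = [x for x in incomes if x <= low or x >= high]

print(low, high)  # -27.5 152.5
print(outliers)   # [1000000]
```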
Variable
In statistics, a variable has two defining characteristics: A variable is an attribute that describes a person, place, thing, or idea. The value of the variable can "vary" from one entity to another.
inference
Inference, in statistics, is the process of drawing conclusions about a parameter one is seeking to measure or estimate.
Individuals
The people or objects included in the study. A variable is the characteristic of the individual to be measured or observed.