Chapter 3 - Stat 1
Unimodal
A data set that has only one value that occurs with the greatest frequency.
Symmetric Distribution
A distribution in which the data values are evenly distributed on both sides of the mean, so the left and right halves are approximately mirror images.
Population Variance
Symbol: σ² (lowercase sigma squared). The average of the squared distances of each value from the population mean: σ² = ∑(X - µ)² / N.
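A minimal Python sketch of this definition, assuming the list holds an entire population (so the sum of squared deviations is divided by N). The function name `population_variance` is illustrative, and the data reuse the values from the Mean card.

```python
import math

def population_variance(values):
    """Average of the squared deviations from the population mean (sigma squared)."""
    n = len(values)
    mu = sum(values) / n                              # population mean
    return sum((x - mu) ** 2 for x in values) / n     # divide by N for a population

data = [3, 2, 6, 5, 4]
var = population_variance(data)
print(var)               # 2.0
print(math.sqrt(var))    # population standard deviation (square root of the variance)
```

Here the deviations from µ = 4 are -1, -2, 2, 1, 0; their squares sum to 10, and 10 / 5 = 2.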
Modal Class
The class with the largest frequency.
Mean
The sum of the values divided by the total number of values; also known as the arithmetic average. Denoted by X̄ for a sample and µ for a population. Ex: The mean of 3, 2, 6, 5, and 4 is (3 + 2 + 6 + 5 + 4) / 5 = 20 / 5 = 4.
Mode
The value that occurs most often in a data set.
Outlier
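An extremely high or extremely low data value when compared with the rest of the data set. In a modified boxplot, outliers are plotted as separate points beyond the whiskers rather than being included in them.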
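A common convention, not stated in the card above, flags a value as an outlier when it lies below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR). The sketch below assumes that rule and finds Q1 and Q3 as the medians of the lower and upper halves of the data array; other quartile methods (such as the interpolation many software packages use) can give slightly different fences. The function names and sample data are hypothetical.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def find_outliers(values):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed 1.5*IQR rule)."""
    data = sorted(values)                   # build the data array
    n = len(data)
    q1 = median(data[: n // 2])             # median of the lower half
    q3 = median(data[(n + 1) // 2 :])       # median of the upper half (excludes the median if n is odd)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

print(find_outliers([5, 6, 12, 13, 15, 18, 22, 50]))   # [50]
```

For this data Q1 = 9 and Q3 = 20, so IQR = 11 and the fences are -7.5 and 36.5; only 50 falls outside them.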
Sample Mean
Denoted by X̄ (pronounced "X bar"), it is calculated by using sample data. The sample mean is a statistic.
Parameter
A characteristic or measure obtained by using all the data values for a specific population.
Statistic
A characteristic or measure obtained by using the data values from a sample.
Data Array
A data set that has been ordered.
Multimodal
A data set with three or more modes
Range Rule of Thumb
Dividing the range by 4 gives an approximation of the standard deviation (s ≈ range / 4). The range rule of thumb is only an approximation and should be used only when the distribution of data values is unimodal and roughly symmetric. - The range rule of thumb can also be used to estimate the smallest and largest data values of a data set: the smallest data value will be approximately 2 standard deviations below the mean, and the largest data value approximately 2 standard deviations above the mean.
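Both uses of the rule can be sketched in a few lines of Python. The function names are illustrative, and the data set, mean of 50, and standard deviation of 5 are hypothetical values chosen only to show the arithmetic.

```python
def approx_std_from_range(values):
    """Range rule of thumb: s is roughly (highest - lowest) / 4."""
    return (max(values) - min(values)) / 4

def estimated_extremes(mean, std):
    """Estimate the smallest and largest values as mean - 2s and mean + 2s."""
    return mean - 2 * std, mean + 2 * std

data = [12, 15, 16, 18, 19, 20, 22, 24, 28]
print(approx_std_from_range(data))   # (28 - 12) / 4 = 4.0
print(estimated_extremes(50, 5))     # (40, 60)
```

Both results are rough estimates and, as the card says, are only reasonable for unimodal, roughly symmetric data.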
Five-Number Summary
Five specific values for a data set that consist of the lowest and highest values, Q1 and Q3, and the median.
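A minimal sketch that computes the five-number summary, assuming Q1 and Q3 are taken as the medians of the lower and upper halves of the data array (one common textbook convention; software packages may interpolate and give slightly different quartiles). The names and data are hypothetical.

```python
def _median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, highest)."""
    data = sorted(values)
    n = len(data)
    q1 = _median(data[: n // 2])            # lower half, excluding the median when n is odd
    q3 = _median(data[(n + 1) // 2 :])      # upper half, excluding the median when n is odd
    return data[0], q1, _median(data), q3, data[-1]

print(five_number_summary([3, 2, 6, 5, 4, 9, 1]))   # (1, 2, 4, 6, 9)
```

These five values are exactly what a boxplot draws: the whisker ends, the box edges, and the median line.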
∑X
Means to find the sum of the X values in the data set.
Interquartile Range (IQR)
Q3 - Q1 (i.e. the distance between the first and third quartiles)
The spread or variability of data is shown commonly by what measures?
Range, variance, and standard deviation.
Bimodal
A data set with two modes
Positively Skewed or Right-Skewed Distribution
A distribution in which the majority of the data values fall to the left of the mean; the tail of the distribution extends to the right.
Negatively Skewed or Left-Skewed Distribution
A distribution in which the majority of the data values fall to the right of the mean; the tail of the distribution extends to the left.
Boxplot
A graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the box passing through the median or Q2.
Decile
A location measure of a data value; it divides the distribution into 10 groups.
Percentile
A location measure of a data value; it divides the distribution into 100 groups.
Coefficient of Variation
Denoted by CVar, the standard deviation divided by the mean, with the result expressed as a percentage. It is used to compare the variability of data sets measured in different units or with very different means.
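A small sketch of the formula, assuming sample data so that the sample standard deviation from Python's `statistics.stdev` is used (for a population, σ and µ would replace s and X̄). The function name and both data sets are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """CVar = s / mean * 100, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

small = [10, 20, 30]       # s = 10, mean = 20
large = [100, 200, 300]    # s = 100, mean = 200
print(coefficient_of_variation(small))   # 50.0
print(coefficient_of_variation(large))   # 50.0
```

The two standard deviations differ by a factor of 10, but relative to their means the spread is the same, which is what CVar captures.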
Empirical Rule
A rule that states that when a distribution is bell-shaped (normal), approximately 68% of the data values will fall within 1 standard deviation of the mean; approximately 95% of the data values will fall within 2 standard deviations of the mean; and approximately 99.7% of the data values will fall within 3 standard deviations of the mean.
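The rule amounts to the intervals mean ± 1, 2, and 3 standard deviations. A minimal sketch, with a hypothetical bell-shaped distribution whose mean is 100 and standard deviation is 15:

```python
def empirical_rule_intervals(mean, std):
    """Map each approximate percentage to its interval mean - k*std .. mean + k*std."""
    return {pct: (mean - k * std, mean + k * std)
            for k, pct in [(1, 68), (2, 95), (3, 99.7)]}

print(empirical_rule_intervals(100, 15))
# {68: (85, 115), 95: (70, 130), 99.7: (55, 145)}
```

The percentages are only approximate and apply only when the distribution is bell-shaped.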
Resistant Statistic
A statistic that is relatively unaffected by outliers or by extremely skewed data; the median and the interquartile range are examples.
Nonresistant Statistic
A statistic that is strongly affected by outliers; the mean and the standard deviation are examples.
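A quick illustration of the difference, using Python's `statistics` module. The data reuse the values from the Mean card, and the outlier 100 is a hypothetical value added for the demonstration.

```python
import statistics

values = [3, 2, 6, 5, 4]
with_outlier = values + [100]          # hypothetical extreme value

# Without the outlier: mean = 4, median = 4
print(statistics.mean(values), statistics.median(values))
# With the outlier: mean = 20 (nonresistant), median = 4.5 (resistant)
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

A single extreme value multiplies the mean by five, while the median barely moves.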
Chebyshev's Theorem
A theorem that states that the proportion of values from a data set that fall within k standard deviations of the mean will be at least 1 - 1/k², where k is a number greater than 1. For example, at least 75% of the values fall within 2 standard deviations of the mean (k = 2), and at least 88.89% fall within 3 standard deviations (k = 3). - Chebyshev's theorem applies to any distribution regardless of its shape.
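The bound is easy to compute directly. A minimal sketch (the function name is illustrative):

```python
def chebyshev_min_proportion(k):
    """Lower bound on the fraction of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, f"{chebyshev_min_proportion(k):.2%}")
# 2 75.00%
# 3 88.89%
```

Because the theorem makes no assumption about shape, these bounds are weaker than the Empirical Rule's 95% and 99.7% for bell-shaped data.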
Population Mean
Denoted by µ (pronounced "mew"), is calculated by using all the values in the population. The population mean is a parameter.
Exploratory Data Analysis (EDA)
The act of analyzing data to determine what information can be obtained by using stem and leaf plots, medians, interquartile ranges, and box-plots. - The purpose of exploratory data analysis is to examine data to find out what information can be discovered about the data, such as the center and the spread.
z Score or Standard Score
The difference between a data value and the mean, divided by the standard deviation: z = (X - X̄) / s for samples, or z = (X - µ) / σ for populations.
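A one-line sketch of the formula; the function name and the values (a score of 65 on a test with mean 50 and standard deviation 10) are hypothetical.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

print(z_score(65, 50, 10))   # 1.5
print(z_score(40, 50, 10))   # -1.0
```

A positive z score means the value is above the mean; a negative one means it is below.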
Deviation
The difference or distance each data value is from the mean.
Range
The highest data value minus the lowest data value.
Weighted Mean
The mean found by multiplying each value by its corresponding weight and dividing by the sum of the weights.
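A minimal sketch of the weighted mean, illustrated with a hypothetical grade-point average where grade points (A = 4, B = 3, C = 2) are weighted by credit hours. The function name and data are assumptions for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

grades = [4, 3, 2]     # hypothetical grade points: A, B, C
credits = [3, 4, 2]    # hypothetical credit hours (the weights)
print(round(weighted_mean(grades, credits), 2))   # (12 + 12 + 4) / 9 = 3.11
```

With equal weights the result reduces to the ordinary mean.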
Median
The midpoint of a data array.
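A minimal sketch of finding the median: order the data first, then take the middle value when n is odd, or the mean of the two middle values when n is even. The function name and data are illustrative.

```python
def median(values):
    """Midpoint of the data array."""
    data = sorted(values)            # build the data array first
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]             # odd n: the single middle value
    return (data[mid - 1] + data[mid]) / 2   # even n: mean of the two middle values

print(median([3, 2, 6, 5, 4]))   # 4
print(median([3, 2, 6, 5]))      # 4.0
```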
Population Standard Deviation
The square root of the population variance; denoted by σ.
Midrange
The sum of the lowest and highest data values, divided by 2.
Quartile
Values that separate a data set into four approximately equal groups, denoted by Q1, Q2, and Q3. - Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile, or the median; Q3 corresponds to the 75th percentile.