Ch. 2 Descriptive Statistics
Variable
A characteristic or a quantity of interest that can take on different values.
Histogram
A common graphical presentation of quantitative data. Provides information about the shape, or form, of a distribution.
Coefficient of variation
A descriptive statistic that indicates how large the standard deviation is relative to the mean. Expressed as a percentage.
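A minimal sketch of the calculation, assuming the sample standard deviation (n - 1 divisor) is the one intended. The function name and the data are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # assumption: sample form, n - 1 divisor
    return stdev / mean * 100

commute_minutes = [20, 25, 30, 35, 40]  # hypothetical data
cv = coefficient_of_variation(commute_minutes)
print(f"Coefficient of variation: {cv:.1f}%")
```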
Geometric Mean
A measure of location that is calculated by finding the nth root of the product of n values. Used in analyzing growth rates in financial data.
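A short sketch of the nth-root calculation applied to growth factors. The yearly growth factors are made up for illustration, and the function name is hypothetical.

```python
import math

def geometric_mean(growth_factors):
    """Return the nth root of the product of n positive values."""
    n = len(growth_factors)
    return math.prod(growth_factors) ** (1 / n)

# Hypothetical yearly growth factors: +10%, -5%, +20%
growth_factors = [1.10, 0.95, 1.20]
gm = geometric_mean(growth_factors)

# Growing at the geometric mean rate every year gives the same ending value
# as the actual sequence of yearly growth factors.
print(f"Mean growth factor per year: {gm:.4f}")
```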
Random Variable/Uncertain Variable
A quantity whose values are not known with certainty.
Random Sampling
A sampling method to gather a representative sample of the population data
Observation
A set of values corresponding to a set of variables.
Frequency Distribution
A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes. The classes are typically referred to as "bins" when dealing with distributions.
Relative frequency distribution
A tabular summary of data showing the relative frequency for each bin
Experimental Study
A variable of interest is first identified. Then one or more other variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest.
Cumulative Frequency Distribution
A variation of the frequency distribution that provides another tabular summary of quantitative data. Uses the number of classes, class widths, and class limits developed for the frequency distribution. Shows the number of data items with values less than or equal to the upper class limit of each class.
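The frequency, relative frequency, and cumulative frequency distributions can be built from the same bins. The sketch below is one possible way to do it. The data, the bin limits, and the function name are hypothetical.

```python
from itertools import accumulate

def frequency_distribution(values, upper_limits):
    """Count the values falling in each bin.

    Bin i holds values greater than the previous upper limit and less than
    or equal to upper_limits[i]. upper_limits must be sorted ascending.
    In this sketch the first bin has no lower bound.
    """
    counts = [0] * len(upper_limits)
    for v in values:
        for i, upper in enumerate(upper_limits):
            if v <= upper:
                counts[i] += 1
                break
        else:
            raise ValueError(f"{v} exceeds the largest upper class limit")
    return counts

audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27]  # hypothetical data
upper_limits = [14, 19, 24, 29]  # bins 10-14, 15-19, 20-24, 25-29

frequency = frequency_distribution(audit_days, upper_limits)
relative = [f / len(audit_days) for f in frequency]  # share of observations per bin
cumulative = list(accumulate(frequency))             # running total up to each upper limit

for upper, f, r, c in zip(upper_limits, frequency, relative, cumulative):
    print(f"<= {upper}: frequency={f}, relative={r:.2f}, cumulative={c}")
```

Multiplying each relative frequency by 100 gives the percent frequency distribution.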
Population
All elements of interest.
Mean/Arithmetic Mean
Average value of a variable
Cross-sectional Data
Data collected from several entities at the same, or approximately the same, point in time
Time Series Data
Data collected over several time periods
Categorical Data
Data on which arithmetic operations cannot be performed
Quantitative Data
Data on which numeric and arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed.
Legitimately Missing Data
Missing values that occur naturally in a data set, for example when a survey question does not apply to a respondent. Generally, no remedial action is taken for legitimately missing data.
Covariance
Descriptive measure of the linear association between two variables.
Outliers
Extreme values in a data set. Can be identified using standardized values (z-scores)
Box Plots
Graphical summary of the distribution of data. Developed from the quartiles for a data set.
Skewness
Lack of symmetry. Important characteristic of the shape of a distribution
Non-experimental Study or Observational Study
Makes no attempt to control the variables of interest. A survey is perhaps the most common type of observational study
Variance
Measure of variability that utilizes all the data. Based on the deviation about the mean, which is the difference between the value of each observation (xi) and the mean.
Correlation Coefficient
Measures the strength of the linear relationship between two variables. Not affected by the units of measurement for x and y.
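A sketch of how covariance and the correlation coefficient relate, assuming the sample forms (n - 1 divisor). The data and function names are hypothetical. Rescaling x changes the covariance but leaves the correlation unchanged.

```python
import math

def sample_covariance(x, y):
    """Average product of paired deviations from the means (n - 1 divisor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

ads = [1, 2, 3, 4, 5]         # hypothetical: number of ads run
sales = [12, 15, 19, 22, 27]  # hypothetical: units sold

cov = sample_covariance(ads, sales)
r = correlation(ads, sales)

# Measuring x in different units rescales the covariance but not the correlation.
ads_rescaled = [100 * a for a in ads]
cov_rescaled = sample_covariance(ads_rescaled, sales)
r_rescaled = correlation(ads_rescaled, sales)
print(cov, cov_rescaled, round(r, 4), round(r_rescaled, 4))
```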
Z-score
Measures the relative location of a value in the data set. Helps to determine how far a particular value is from the mean relative to the data set's standard deviation. Often called the standardized value.
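A minimal sketch of standardizing values and flagging outliers. It assumes the sample standard deviation and the common rule of thumb that a z-score beyond +/-3 marks an outlier. The cutoff, the data, and the function name are assumptions, not part of the definition above.

```python
import statistics

def z_scores(values):
    """Return (x - mean) / standard deviation for every value."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # assumption: sample form, n - 1 divisor
    return [(x - mean) / stdev for x in values]

# Hypothetical data with one extreme value at the end
wait_times = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51, 49, 52, 48, 50, 95]

scores = z_scores(wait_times)
# Assumed cutoff: |z| > 3 flags an outlier
outliers = [x for x, z in zip(wait_times, scores) if abs(z) > 3]
print(outliers)
```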
Quick Analysis
Provides shortcuts for Conditional Formatting, adding Data Bars, and other operations
Sample
Subset of the population
Range
Found by subtracting the smallest value from the largest value in a data set. Drawback: range is based on only two of the observations and thus is highly influenced by extreme values.
Percent frequency distribution
Summarizes the percent frequency of the data for each bin. Used to provide estimates of the relative likelihoods of different values of a random variable.
Variation
The difference in a variable measured over observations
Data
The facts and figures collected, analyzed, and summarized for presentation and interpretation.
Standard Deviation
The positive square root of the variance. Measured in the same units as the original data.
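A short worked sketch of the variance and standard deviation, assuming the sample forms (sum of squared deviations divided by n - 1). The values and function name are illustrative only.

```python
import math

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

class_sizes = [46, 54, 42, 46, 32]  # illustrative data, mean is 44

variance = sample_variance(class_sizes)  # in squared units of the data
std_dev = math.sqrt(variance)            # back in the original units
print(variance, std_dev)
```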
Dimension Reduction
The process of removing variables from the analysis without losing crucial information.
Imputation
The systematic replacement of missing values with values that seem reasonable
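One possible imputation strategy, sketched under the assumption that missing entries are marked with None and that the median of the observed values is a reasonable replacement. Other choices (mean, mode, model-based estimates) may fit a given data set better. The function name and data are hypothetical.

```python
import statistics

def impute_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

ages = [34, None, 29, 41, None, 38]  # hypothetical data, None marks a missing value
imputed = impute_with_median(ages)
print(imputed)
```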
Missing at Random (MAR)
The tendency for an observation to be missing a value for some variable is related to the value of some other variable(s) in the data
Missing completely at random (MCAR)
The tendency for an observation to be missing the value for some variable is entirely random; whether data are missing does not depend on either the value of the missing data or the value of any other variable in the data.
Missing not at random (MNAR)
The tendency for the value of a variable to be missing is related to the value that is missing
Empirical rule
Used to determine the percentage of data values that are within a specified number of standard deviations of the mean. Only used when the distribution of data exhibits a symmetric bell-shaped distribution.
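The percentages usually quoted for the empirical rule (approximately 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations) come from the normal distribution. The sketch below checks them with Python's `statistics.NormalDist`. The helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"Within {k} standard deviation(s): {share:.1%}")
```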
Scatter Chart
Useful graph for analyzing the relationship between two variables.
Median
Value in the middle when the data are arranged in ascending order. Middle value, for an odd number of observations. Average of two middle values, for an even number of observations.
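The odd and even cases can be sketched as follows. The data and function name are illustrative only.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the two middle values

odd_case = median([46, 54, 42, 46, 32])        # sorted: 32, 42, 46, 46, 54
even_case = median([46, 54, 42, 46, 32, 27])   # sorted: 27, 32, 42, 46, 46, 54
print(odd_case, even_case)
```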
Percentile
Value of a variable at which a specified (approximate) percentage of observations are below that value.
Mode
Value that occurs most frequently in a data set
Quartiles
The division points that split the data into four equal parts. Each part contains approximately 25% of the observations.
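Several conventions exist for locating quartiles, and they can give slightly different answers on small data sets. The sketch below uses `statistics.quantiles` with `method="exclusive"`, which interpolates at position p(n + 1). Whether that matches the convention intended in this chapter is an assumption. The data are hypothetical.

```python
import statistics

# Hypothetical data: 12 home selling prices in thousands of dollars
prices = [108, 138, 138, 142, 186, 199, 208, 228, 245, 289, 298, 456]

# n=4 asks for the three cut points that divide the data into four parts
q1, q2, q3 = statistics.quantiles(prices, n=4, method="exclusive")

print(f"Q1 (25th percentile): {q1}")
print(f"Q2 (50th percentile, the median): {q2}")
print(f"Q3 (75th percentile): {q3}")
```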