Ch. 2 Descriptive Statistics


Variable

A characteristic or a quantity of interest that can take on different values.

Histogram

A common graphical presentation of quantitative data. Provides information about the shape, or form, of a distribution.

Coefficient of variation

A descriptive statistic that indicates how large the standard deviation is relative to the mean. Expressed as a percentage.
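
As a rough illustration of the definition above, the sketch below divides the standard deviation by the mean and multiplies by 100. The function name `coefficient_of_variation`, the sample values, and the use of the sample (n - 1) standard deviation are assumptions made for this example.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100

# Hypothetical data, chosen only for illustration.
data = [10, 12, 14, 16, 18]
cv = coefficient_of_variation(data)
print(f"Coefficient of variation: {cv:.1f}%")
```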

Geometric Mean

A measure of location that is calculated by finding the nth root of the product of n values. Used in analyzing growth rates in financial data.
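
A minimal sketch of the nth-root calculation, assuming three hypothetical yearly growth factors (for example, 1.10 means 10% growth). The function name `geometric_mean` is chosen for this example; the standard library's `statistics.geometric_mean` is used only as a cross-check.

```python
import math
import statistics

def geometric_mean(values):
    """Return the nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical growth factors for three periods.
growth_factors = [1.10, 1.20, 0.95]
gm = geometric_mean(growth_factors)

# The standard library computes the same quantity.
gm_stdlib = statistics.geometric_mean(growth_factors)
print(f"Mean growth factor per period: {gm:.4f}")
```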

Random Variable/Uncertain Variable

A quantity whose values are not known with certainty.

Random Sampling

A sampling method to gather a representative sample of the population data

Observation

A set of values corresponding to a set of variables.

Frequency Distribution

A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes. The classes are typically referred to as "bins" when dealing with distributions.
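
One way to build such a summary is sketched below. The data, the bin width of 10, and the starting value of 10 are assumptions for illustration; each bin is written as a (lower, upper) pair where the lower limit is included and the upper limit is excluded, so the bins do not overlap.

```python
from collections import Counter

# Hypothetical observations and bin layout.
data = [12, 15, 21, 22, 27, 31, 33, 38, 41, 45]
lower_limit = 10
bin_width = 10

def bin_index(value):
    """Return the zero-based index of the bin that contains value."""
    return (value - lower_limit) // bin_width

counts = Counter(bin_index(value) for value in data)

# Map each bin (lower, upper) to the number of observations it holds.
frequency = {
    (lower_limit + i * bin_width, lower_limit + (i + 1) * bin_width): counts[i]
    for i in sorted(counts)
}

for (low, high), count in frequency.items():
    print(f"{low} to under {high}: {count}")
```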

Relative frequency distribution

A tabular summary of data showing the relative frequency for each bin

Experimental Study

A variable of interest is first identified. Then one or more other variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest.

Cumulative Frequency Distribution

A variation of the frequency distribution that provides another tabular summary of quantitative data. Uses the number of classes, class widths, and class limits developed for the frequency distribution. Shows the number of data items with values less than or equal to the upper class limit of each class.
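
The running totals described above can be sketched with `itertools.accumulate`. The upper class limits and frequencies here are hypothetical values, not taken from any particular data set.

```python
from itertools import accumulate

# Hypothetical classes: upper class limits and their frequencies.
upper_limits = [19, 29, 39, 49]
frequencies = [2, 3, 3, 2]

# Each entry counts the items less than or equal to that upper limit.
cumulative = list(accumulate(frequencies))

for limit, total in zip(upper_limits, cumulative):
    print(f"Less than or equal to {limit}: {total}")
```

The last cumulative value always equals the total number of observations.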

Population

All elements of interest.

Mean/Arithmetic Mean

Average value of a variable

Cross-sectional Data

Data collected from several entities at the same, or approximately the same, point in time

Time Series Data

Data collected over several time periods

Categorical Data

Data on which arithmetic operations cannot be performed

Quantitative Data

Data on which numeric and arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed.

Legitimately Missing Data

Data sets that contain observations with missing values for one or more variables. Generally, no remedial action is taken for legitimately missing data.

Covariance

Descriptive measure of the linear association between two variables.
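
A small sketch of the sample covariance, assuming the n - 1 divisor and two hypothetical variables `x` and `y`. A positive result suggests that the variables tend to move in the same direction, a negative result that they move in opposite directions.

```python
import statistics

def sample_covariance(x, y):
    """Return the sample covariance of two equal-length sequences."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
cov_xy = sample_covariance(x, y)
print(f"Sample covariance: {cov_xy}")
```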

Outliers

Extreme values in a data set. Can be identified using standardized values (z-scores)

Box Plots

Graphical summary of the distribution of data. Developed from the quartiles for a data set.

Skewness

Lack of symmetry. Important characteristic of the shape of a distribution

Non-experimental Study or Observational Study

Makes no attempt to control the variables of interest. A survey is perhaps the most common type of observational study

Variance

Measure of variability that utilizes all the data. Based on the deviation about the mean, which is the difference between the value of each observation (xi) and the mean.
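
The deviation-about-the-mean idea can be sketched as follows, assuming the sample variance (divisor n - 1) and hypothetical data. `statistics.variance` from the standard library is used only to cross-check the manual calculation.

```python
import statistics

def sample_variance(values):
    """Average the squared deviations about the mean, using n - 1."""
    mean = statistics.mean(values)
    squared_deviations = [(xi - mean) ** 2 for xi in values]
    return sum(squared_deviations) / (len(values) - 1)

# Hypothetical data.
data = [10, 12, 14, 16, 18]
var_manual = sample_variance(data)
var_stdlib = statistics.variance(data)
print(f"Sample variance: {var_manual}")
```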

Correlation Coefficient

Measures the strength of the linear relationship between two variables. Takes values between -1 and +1 and is not affected by the units of measurement for x and y.
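
To illustrate that the units of measurement do not matter, the sketch below computes the Pearson correlation coefficient for hypothetical data and again after rescaling x (for example, converting meters to centimeters). The function name `pearson_r` is an assumption for this example.

```python
import math
import statistics

def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two sequences."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_original = pearson_r(x, y)
# Rescaling x changes its units but not the correlation.
r_rescaled = pearson_r([xi * 100 for xi in x], y)
print(f"r = {r_original:.4f}, after rescaling x: {r_rescaled:.4f}")
```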

Z-score

Measures the relative location of a value in the data set. Helps to determine how far a particular value is from the mean relative to the data set's standard deviation. Often called the standardized value.
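
A short sketch of standardizing values, assuming hypothetical data and the sample standard deviation. Each z-score is the distance of a value from the mean, measured in standard deviations.

```python
import statistics

def z_scores(values):
    """Return the standardized value of every observation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(xi - mean) / stdev for xi in values]

# Hypothetical data.
data = [10, 12, 14, 16, 18]
z = z_scores(data)

for value, score in zip(data, z):
    print(f"{value}: z = {score:+.2f}")
```

A value equal to the mean has a z-score of 0, and the z-scores of a data set always sum to 0.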

Quick Analysis

Provides shortcuts for Conditional Formatting, adding Data Bars, and other operations

Sample

Subset of the population

Range

Found by subtracting the smallest value from the largest value in a data set. Drawback: the range is based on only two of the observations and thus is highly influenced by extreme values.

Percent frequency distribution

Summarizes the percent frequency of the data for each bin. Used to provide estimates of the relative likelihoods of different values of a random variable.

Variation

The difference in a variable measured over observations

Data

The facts and figures collected, analyzed, and summarized for presentation and interpretation.

Standard Deviation

The positive square root of the variance. Measured in the same units as the original data.

Dimension Reduction

The process of removing variables from the analysis without losing crucial information.

Imputation

The systematic replacement of missing values with values that seem reasonable
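
As one possible illustration, the sketch below fills missing entries (represented here as `None`) with the mean of the observed values. Mean imputation is only one choice among several, and whether it is reasonable depends on why the values are missing; the data and the function name `impute_with_mean` are assumptions for this example.

```python
import statistics

def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill_value = statistics.mean(observed)
    return [fill_value if v is None else v for v in values]

# Hypothetical variable with two missing values.
raw = [4.0, None, 6.0, 8.0, None]
imputed = impute_with_mean(raw)
print(imputed)
```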

Missing at Random (MAR)

The tendency for an observation to be missing a value for some variable is related to the value of some other variable(s) in the data

Missing completely at random (MCAR)

The tendency for an observation to be missing the value for some variable is entirely random; whether data are missing does not depend on either the value of the missing data or the value of any other variable in the data.

Missing not at random (MNAR)

The tendency for the value of a variable to be missing is related to the value that is missing

Empirical rule

Used to determine the percentage of data values that are within a specified number of standard deviations of the mean. Applies only when the data exhibit a symmetric, bell-shaped distribution.
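
The familiar percentages of roughly 68%, 95%, and 99.7% can be checked against a normal distribution, which is the usual model for bell-shaped data. The sketch below assumes a standard normal distribution and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Return the share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"Within {k} standard deviation(s): {share:.1%}")
```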

Scatter Chart

Useful graph for analyzing the relationship between two variables.

Median

Value in the middle when the data are arranged in ascending order. Middle value, for an odd number of observations. Average of two middle values, for an even number of observations.
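
The odd and even cases can be sketched as below. The function name `median` and the sample values are chosen for illustration.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: take the single middle value.
        return ordered[mid]
    # Even number of observations: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median([7, 3, 5])
even_result = median([7, 3, 5, 9])
print(odd_result, even_result)
```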

Percentile

Value of a variable at which a specified (approximate) percentage of observations are below that value.

Mode

Value that occurs most frequently in a data set

Quartiles

The division points obtained when the data are divided into four equal parts, each containing approximately 25% of the observations.
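
A sketch of finding the three division points with `statistics.quantiles` from the standard library, using hypothetical data. Several conventions for locating quartiles exist and can give slightly different values; this example relies on the function's default "exclusive" method, which may not match the method used in a given textbook or spreadsheet.

```python
import statistics

# Hypothetical data: the integers 1 through 11.
data = list(range(1, 12))

# n=4 requests the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
```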

