Chapter 2 terms Ba 201

histogram

A common graphical presentation of quantitative data

cumulative frequency distribution

A variation of the frequency distribution that provides another tabular summary of quantitative data

legitimately missing data

Data sets commonly include observations with missing values for one or more variables. In some cases missing data naturally occur; these are called

Missing at random (MAR)

If the tendency for an observation to be missing a value for some variable is related to the value of some other variable(s) in the data, the missing value is called

Missing completely at random (MCAR)

If the tendency for an observation to be missing the value for some variable is entirely random, then whether data are missing does not depend on either the value of the missing data or the value of any other variable in the data. In such cases the missing value is called

illegitimately missing data

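In other cases data are missing even though a value should exist, for example because a respondent declined to answer a question or data collection equipment failed; these are called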

Quartiles

It is often desirable to divide data into four parts, with each part containing approximately one-fourth, or 25 percent, of the observations. These division points are referred to as the

Outliers

Sometimes a data set will have one or more observations with unusually large or unusually small values. These extreme values are called

Interquartile Range (IQR)

The difference between the third and first quartiles is often referred to as
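A minimal sketch of how quartiles, the IQR, and outliers fit together, using Python's standard `statistics` module. The sample data, the `inclusive` quantile method, and the 1.5 * IQR cutoff for flagging outliers are assumptions for illustration; the cards above do not prescribe a particular rule.

```python
import statistics

# Hypothetical sample data; 30 is deliberately extreme.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Three cut points that divide the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

# Assumed convention: values more than 1.5 * IQR beyond the quartiles
# are flagged as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q2, q3, iqr, outliers)
```

Other quantile methods can give slightly different quartiles for the same data, so the flagged outliers may differ between tools.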

random variable, uncertain variable

a quantity whose values are not known with certainty

Observation

a set of values corresponding to a set of variables

population

all elements of interest

z-score

allows us to measure the relative location of a value in the data set
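As a rough illustration, a z-score is usually computed as the value minus the mean, divided by the standard deviation. The sketch below assumes the sample standard deviation (n - 1 divisor) and uses made-up data; the function name `z_score` is hypothetical.

```python
import statistics

def z_score(x, data):
    """Number of sample standard deviations that x lies from the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return (x - mean) / s

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

print(z_score(9, data))  # positive, because 9 is above the mean
print(z_score(5, data))  # zero, because 5 equals the mean
```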

median

another measure of central location; the value in the middle when the data are arranged in ascending order (smallest to largest value).
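A short sketch of the median calculation, with hypothetical data. For an even number of observations there is no single middle value, so the usual convention (assumed here) is to average the two middle values.

```python
def median(values):
    """Middle value of the data after sorting in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([46, 54, 42, 46, 32]))  # odd number of observations
print(median([3, 1, 4, 2]))          # even number of observations
```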

cross sectional data

are collected from several entities at the same, or approximately the same, point in time.

Data

are the facts and figures collected, analyzed, and summarized for presentation and interpretation.

range

can be found by subtracting the smallest value from the largest value in a data set.

Empirical Rule

can be used to determine the percentage of data values that are within a specified number of standard deviations of the mean.
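The empirical rule applies to data with a bell-shaped distribution: roughly 68%, 95%, and 99.7% of values fall within one, two, and three standard deviations of the mean. The sketch below checks those figures against a standard normal distribution using `statistics.NormalDist`; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```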

time series data

collected over several time periods.

sample

data from a subset of the population

categorical data

data on which arithmetic operations cannot be performed.

quantitative data

data that are numeric and on which arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed.

Missing Not at Random (MNAR)

if the tendency for the value of a variable to be missing is related to the value that is missing.

coefficient of variation

indicates how large the standard deviation is relative to the mean.
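A minimal sketch of the calculation, assuming the common form (standard deviation / mean) * 100 with the sample standard deviation; the data are made up.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(round(coefficient_of_variation(data), 2))
```

Because the result is a percentage of the mean, it can be used to compare variability across variables measured in different units.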

Covariance

is a descriptive measure of the linear association between two variables.

box plot

is a graphical summary of the distribution of data.

geometric mean

is a measure of location that is calculated by finding the nth root of the product of n values.
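The definition above translates directly into code. This sketch assumes all values are positive; the growth factors are hypothetical.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    n = len(values)
    return math.prod(values) ** (1 / n)

# Hypothetical yearly growth factors: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]
print(geometric_mean(growth_factors))
```

For long lists the product can overflow or underflow; `statistics.geometric_mean` in the standard library handles that case more robustly.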

variance

is a measure of variability that utilizes all the data.

frequency distribution

is a summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes, typically referred to as bins.

relative frequency distribution

is a tabular summary of data showing the relative frequency for each bin.
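The frequency, relative frequency, percent frequency, and cumulative frequency distributions can all be derived from the same bin counts. The sketch below uses hypothetical data and equal-width bins; the function name and its parameters are assumptions for illustration.

```python
from itertools import accumulate

def frequency_distribution(data, start, width, n_bins):
    """Count observations in n_bins nonoverlapping bins of equal width."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the bins
            counts[index] += 1
    return counts

data = [12, 15, 17, 21, 22, 24, 28, 31, 33, 38]  # hypothetical values

# Bins: 10 to under 20, 20 to under 30, 30 to under 40.
counts = frequency_distribution(data, start=10, width=10, n_bins=3)
relative = [c / len(data) for c in counts]       # proportion per bin
percent = [c * 100 / len(data) for c in counts]  # percentage per bin
cumulative = list(accumulate(counts))            # running total of counts

print(counts, relative, percent, cumulative)
```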

scatter chart

is a useful graph for analyzing the relationship between two variables.

standard deviation

is defined to be the positive square root of the variance.
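A small sketch relating the two measures. It assumes the sample versions (dividing by n - 1); a population variance would divide by n instead. The data are hypothetical.

```python
import math

def sample_variance(data):
    """Average squared deviation from the mean, using an n - 1 divisor."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_standard_deviation(data):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(sample_variance(data), sample_standard_deviation(data))
```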

Variation

is the difference in a variable measured over observations (time, customers, items, etc.).

dimension reduction

is the process of removing variables from the analysis without losing crucial information

Percentile

is the value of a variable at which a specified (approximate) percentage of observations are below that value.

mode

is the value that occurs most frequently in a data set.

approximate bin width

(largest data value - smallest data value) / number of bins
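The subtraction happens before the division. A minimal sketch with hypothetical data and a hypothetical function name:

```python
def approximate_bin_width(data, n_bins):
    """(largest value - smallest value) / number of bins."""
    return (max(data) - min(data)) / n_bins

data = [12, 15, 17, 21, 22, 24, 28, 31, 33, 38]  # hypothetical values
print(approximate_bin_width(data, n_bins=3))
```

In practice the result is often rounded to a more convenient width before building the bins.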

correlation coefficient

measures the strength of the linear relationship between two variables and, unlike covariance, is not affected by the units of measurement for x and y.
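The effect of units can be seen by rescaling one variable. This sketch assumes the sample covariance (n - 1 divisor) and the Pearson correlation coefficient; the data and function names are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance of paired observations (n - 1 divisor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_scaled = [v * 100 for v in x]  # same data in a different unit

# Covariance changes with the units; correlation does not.
print(sample_covariance(x, y), sample_covariance(x_scaled, y))
print(correlation(x, y), correlation(x_scaled, y))
```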

Mean

the most commonly used measure of central location; also called the arithmetic mean or the average.

percent frequency distribution

summarizes the percent frequency of the data for each bin.

imputation

systematic replacement of missing values with values that seem reasonable
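One simple strategy, shown here only as an assumed example, is to replace each missing value with the median of the observed values. Missing entries are represented as `None`, and the function name is hypothetical; other choices (mean, mode, model-based estimates) are equally possible.

```python
import statistics

def impute_with_median(values):
    """Replace None entries with the median of the non-missing values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

ratings = [4, None, 7, 5, None, 10]  # hypothetical data with two missing values
print(impute_with_median(ratings))
```

Whether this is reasonable depends on why the values are missing; it is least risky when the data are missing completely at random.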

variable

A characteristic or a quantity of interest that can take on different values is known as a

