Descriptive Statistics

Cumulative Frequency Distributions

A tabular display of the number of observations in a batch of data that have a value less than or equal to each value of the measurement.
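
A minimal Python sketch of a cumulative frequency distribution, assuming the batch is a plain list of values. The function name and the sample batch are hypothetical. Sorting the distinct values is the step that needs at least ordinal data.

```python
from collections import Counter
from itertools import accumulate


def cumulative_frequency_distribution(batch):
    """Map each distinct value to the number of observations <= that value."""
    counts = Counter(batch)
    values = sorted(counts)  # ordering the values requires at least ordinal data
    running_totals = accumulate(counts[v] for v in values)
    return dict(zip(values, running_totals))


batch = [3, 1, 2, 3, 3, 2, 4]  # hypothetical sample batch
print(cumulative_frequency_distribution(batch))  # {1: 1, 2: 3, 3: 6, 4: 7}
```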

Limitations of Cumulative Relative Frequency Distribution

Requires that the data be at least ordinal so that the values can be arranged from smallest to largest.

Limitations of Skewness

Requires the measurement to be at least interval.

Frequency Polygons

Similar to a histogram, except that the frequency or relative frequency is represented by the height of a line (joining points plotted at each value) rather than by bars.

Sensitivity to extremes criterion:

The mean is the most sensitive to extreme values; the mode is the least.
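
A small Python illustration of this criterion using the standard `statistics` module. The batch and the added extreme value of 100 are hypothetical; the point is only to compare how far each statistic moves when one extreme observation is added.

```python
import statistics

batch = [2, 3, 3, 4, 5]        # hypothetical batch
with_extreme = batch + [100]   # same batch plus one hypothetical extreme value

for label, data in (("original", batch), ("with extreme", with_extreme)):
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data))
# original mean: 3.4 median: 3 mode: 3
# with extreme mean: 19.5 median: 3.5 mode: 3
```

The mean jumps from 3.4 to 19.5, the median moves only from 3 to 3.5, and the mode does not move at all.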

Information content criterion:

The mean uses the most information in the data; the mode uses the least.

The Objective of Descriptive Statistics

The objective of descriptive statistics is to summarize data, providing a method to convey impressions about the data.

The Purpose of Skewness

To describe whether values on one side of the central tendency are more or less common than values on the other side. If the tail on the right is longer, the distribution is positively skewed, or skewed to the right.
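
One common way to put a number on skewness is the moment coefficient, the third central moment divided by the second central moment raised to the 1.5 power. The sketch below assumes that definition with divide-by-n moments (other variants exist); the function name and batches are hypothetical. Subtracting the mean and cubing deviations is why skewness needs at least interval data.

```python
def moment_skewness(batch):
    """Moment coefficient of skewness m3 / m2**1.5, using divide-by-n moments."""
    n = len(batch)
    mean = sum(batch) / n
    m2 = sum((x - mean) ** 2 for x in batch) / n
    m3 = sum((x - mean) ** 3 for x in batch) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 10]     # hypothetical batch with a long right tail
left_tailed = [-x for x in right_tailed]  # mirror image: long left tail
print(moment_skewness(right_tailed) > 0)  # True: positively skewed
print(moment_skewness(left_tailed) < 0)   # True: negatively skewed
```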

Purpose of Dispersion Statistics

To describe something about the atypical values or the spread of values.

Purpose of Univariate Graphical Summaries

To provide a rapid way to summarize tabular information

Purpose of Univariate Numerical Descriptions

To provide an even more compact summary of the data that yields the same impressions one could get from a graphical summary.

Purpose of Central Tendency Statistics

To provide some description of the common, typical, or representative values of the measurement.

Data Summarizing Problems

Univariate problems, bivariate problems, multivariate problems.

Benefits of Cumulative Frequency Distribution

Compared with a frequency distribution, it provides positional information about each value. It works even if the measurement is continuous!

The loss function criterion:

When we use a measure of central tendency to represent the entire batch of data, there are errors associated with this process. We should represent the data with the descriptive statistic that has the lowest possible error of the type we consider most important.
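
A brute-force Python illustration of this criterion. It relies on three standard results: the mean minimizes the sum of squared errors, the median minimizes the sum of absolute errors, and the mode minimizes the number of mismatches (zero-one loss). The batch and the integer grid of candidate summaries are hypothetical, chosen so each loss has a unique minimizer.

```python
def squared_error_loss(c, batch):
    return sum((x - c) ** 2 for x in batch)


def absolute_error_loss(c, batch):
    return sum(abs(x - c) for x in batch)


def mismatch_loss(c, batch):
    """Zero-one loss: count of observations not exactly equal to c."""
    return sum(1 for x in batch if x != c)


batch = [1, 1, 2, 5, 6]   # hypothetical batch: mean 3, median 2, mode 1
candidates = range(0, 8)  # hypothetical grid of integer candidate summaries

for loss in (squared_error_loss, absolute_error_loss, mismatch_loss):
    best = min(candidates, key=lambda c: loss(c, batch))
    print(loss.__name__, best)
# squared_error_loss 3   (the mean)
# absolute_error_loss 2  (the median)
# mismatch_loss 1        (the mode)
```

Which statistic is "best" therefore depends on which kind of error we consider most important.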

What is the issue with grouping?

Grouping introduces a lot of arbitrary features, such as the choice of interval width and where the interval boundaries fall.

The computation criterion:

Does the computation make sense for the type of data?

Measures of dispersion

Range, interquartile range, mean absolute deviation, mean squared deviation (MSD), standard deviation.
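
A Python sketch computing each measure in this list for one numeric batch. Several choices here are assumptions, since textbooks differ: the mean absolute deviation is taken around the mean, the MSD divides by n, the standard deviation is the square root of that MSD, and the quartiles come from `statistics.quantiles` with its default "exclusive" method. The function name and batch are hypothetical.

```python
import math
import statistics


def dispersion_summary(batch):
    """Compute the listed dispersion measures for a numeric (interval) batch."""
    n = len(batch)
    mean = sum(batch) / n
    q1, _, q3 = statistics.quantiles(batch, n=4)   # default 'exclusive' method
    msd = sum((x - mean) ** 2 for x in batch) / n  # mean squared deviation, divide by n
    return {
        "range": max(batch) - min(batch),
        "interquartile_range": q3 - q1,
        "mean_absolute_deviation": sum(abs(x - mean) for x in batch) / n,
        "mean_squared_deviation": msd,
        "standard_deviation": math.sqrt(msd),
    }


print(dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical batch
```

For this batch the sketch reports a range of 7, an interquartile range of 2.5, a mean absolute deviation of 1.5, an MSD of 4.0 and a standard deviation of 2.0. Note that the MSD is in squared units, while the standard deviation is back in the units of the measurement.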

Cumulative Relative Frequency Distribution

A tabular display of the fraction of observations in a batch of data that have a value less than or equal to each value of the measurement.
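
A minimal Python sketch that returns the cumulative relative frequency at any value x, using exact fractions. The function name and the readings are hypothetical. Because it only counts how many sorted observations are at or below x, it works for continuous measurements without any grouping.

```python
from bisect import bisect_right
from fractions import Fraction


def cumulative_relative_frequency(batch, x):
    """Fraction of observations in the batch with a value less than or equal to x."""
    ordered = sorted(batch)  # ordering requires at least ordinal data
    return Fraction(bisect_right(ordered, x), len(ordered))


readings = [2.31, 1.07, 3.90, 2.31, 0.55]  # hypothetical continuous measurements
print(cumulative_relative_frequency(readings, 2.31))  # 4/5
print(cumulative_relative_frequency(readings, 2.5))   # 4/5
print(cumulative_relative_frequency(readings, 0.0))   # 0
```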

Relative Frequency Distribution

A tabular display of the fraction of observations in a batch of data that is associated with each value of the measurement.
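
A minimal Python sketch of a relative frequency distribution, using exact fractions so the values visibly sum to 1. The function name and the nominal batch of colors are hypothetical; no ordering of values is needed, so this works for nominal data.

```python
from collections import Counter
from fractions import Fraction


def relative_frequency_distribution(batch):
    """Map each value to the fraction of observations that take that value."""
    n = len(batch)
    return {value: Fraction(count, n) for value, count in Counter(batch).items()}


colors = ["red", "blue", "red", "green", "red", "blue"]  # hypothetical nominal batch
dist = relative_frequency_distribution(colors)
print(dist["red"], dist["blue"], dist["green"])  # 1/2 1/3 1/6
print(sum(dist.values()))                        # 1
```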

Limitations of Cumulative Frequency Distribution

1. The measurement must be discrete; no commonness information is provided for individual values.
2. Requires that the data be at least ordinal so that the values can be arranged from smallest to largest.

Limitations of Frequency Distribution

1. Provides no useful summary if the measurement is continuous unless the values are grouped.
2. Grouping data introduces both error and arbitrariness.
3. The frequency in isolation provides no information about the commonness of values.

Limitations of Relative Frequency Distributions

1. Provides no useful summary if the measurement is continuous unless the values are grouped.
2. Grouping data introduces both error and arbitrariness.

Benefits of Cumulative Relative Frequency Distribution

1. Provides both commonness and positional information about each value.
2. Works even if the measurement is continuous!

Relative Frequency Density Curve

A curve that represents relative frequency per unit width, obtained as a limiting process in which the width of each interval (the amount of grouping) gets smaller and smaller. This approach is designed to combat the arbitrary aspects of grouping when the measurement is continuous. It is not used for discrete measurements.
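
A Python sketch of the quantity being approximated: for equal-width intervals, the relative frequency in each interval divided by the interval width. The function name, the bin layout (half-open intervals starting at `start`) and the batch are hypothetical. Dividing by the width keeps the heights on a comparable scale as the width shrinks, and the total area under the bars stays 1.

```python
import math


def relative_frequency_density(batch, start, width, bins):
    """Relative frequency per unit width for equal-width bins
    [start + k*width, start + (k + 1)*width), k = 0 .. bins - 1."""
    counts = [0] * bins
    for x in batch:
        k = math.floor((x - start) / width)
        if 0 <= k < bins:  # values outside the chosen bins are ignored
            counts[k] += 1
    n = len(batch)
    return [c / (n * width) for c in counts]


batch = [0.1, 0.4, 0.6, 0.9]  # hypothetical continuous measurements
for width, bins in ((0.5, 2), (0.25, 4)):
    density = relative_frequency_density(batch, 0.0, width, bins)
    print(width, density, sum(d * width for d in density))
```

Halving the width here leaves every density height at 1.0 and the total area at 1.0, whereas plain relative frequencies would have halved.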

Ogives

A polygon based on the cumulative relative frequency. Commonness is portrayed by steepness; positional information is portrayed by the height of the curve at each value.
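
A Python sketch of the ogive's vertices and the slope of each segment between them. It assumes one common convention for ungrouped data, a vertex at each distinct observed value (for grouped data the vertices are usually placed at interval boundaries). The function names and the batch are hypothetical.

```python
from fractions import Fraction


def ogive_points(batch):
    """Vertices (value, cumulative relative frequency) of an ogive for ungrouped data."""
    ordered = sorted(batch)
    n = len(ordered)
    points = []
    for i, value in enumerate(ordered, start=1):
        if points and points[-1][0] == value:
            points[-1] = (value, Fraction(i, n))  # tied value: keep the highest cumulative fraction
        else:
            points.append((value, Fraction(i, n)))
    return points


def segment_steepness(points):
    """Slope of each segment: steeper segments mean values in that range are more common."""
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


pts = ogive_points([3, 1, 2, 3, 3, 2, 4])  # hypothetical batch
print(pts)
print(segment_steepness(pts))
```

The steepest segment ends at 3, the most common value, while the height of each vertex gives its position in the batch.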

Frequency Distribution

A tabular display of how often each value of a measurement occurs in a batch of data.
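
A minimal Python sketch of a frequency distribution built with `collections.Counter`. The function name and batch are hypothetical. Sorting the output is done only for display and assumes ordinal data; for nominal data the counts can be listed in any order.

```python
from collections import Counter


def frequency_distribution(batch):
    """Map each observed value to the number of times it occurs in the batch."""
    return dict(Counter(batch))


batch = [3, 1, 2, 3, 3, 2, 4]  # hypothetical batch
for value, frequency in sorted(frequency_distribution(batch).items()):
    print(value, frequency)
```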

Limitations to measures of dispersion

Almost all dispersion statistics require that the measurement be at least interval.

Limitation of the Mode statistic

Because the mode is based on frequency, frequency must be meaningful. This requires discrete data.

Limitation of the Median Statistic

Because its construction relies on the greater-than-or-equal-to ordering of values, the measurement must be at least ordinal.

Limitation of the mean statistic

Because the construction requires the values to be added together, sums and differences must make sense. So, the measurements must be at least interval for the mean to be a useful statistic.

Mean

Calculated as the sum of all values divided by the number of observations in the batch of data.

Types of Univariate Numerical Descriptions

Central tendency, dispersion, skewness, kurtosis, locational information

The 5 criteria to guide the choice of statistic

Computation, information content, purpose, sensitivity to extremes, and the loss function.

Mode

Describes the most frequently occurring value of the measurement.
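
A short Python illustration using `statistics.mode` and `statistics.multimode` (Python 3.8 or later). The batches are hypothetical. Only equality and counting are used, so the mode works for nominal data; the last line shows why frequency must be meaningful, since continuous readings rarely repeat and every value ties.

```python
import statistics

# Works for nominal data: only equality and counting are needed.
print(statistics.mode(["red", "blue", "red", "green"]))  # red

# A batch can have several equally common values; multimode returns all of them.
print(statistics.multimode([1, 2, 2, 3, 3]))  # [2, 3]

# Continuous readings rarely repeat, so every value ties and the mode says little.
print(statistics.multimode([0.12, 0.57, 0.98]))  # [0.12, 0.57, 0.98]
```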

Median

Describes the value such that at least 50% of the data is less than or equal to that value and at least 50% is greater than or equal to that value.
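
A Python sketch that checks the definition above directly, using only comparisons (which is why the median needs at least ordinal data). The function name and batches are hypothetical. For an even-sized batch, both middle values, and anything between them, satisfy the definition; `statistics.median` follows the usual convention of reporting their midpoint.

```python
import statistics


def satisfies_median_definition(batch, m):
    """True if at least 50% of observations are <= m and at least 50% are >= m."""
    n = len(batch)
    at_or_below = sum(1 for x in batch if x <= m)
    at_or_above = sum(1 for x in batch if x >= m)
    return 2 * at_or_below >= n and 2 * at_or_above >= n


odd_batch = [1, 2, 9]
even_batch = [1, 3, 5, 7]
print([m for m in odd_batch if satisfies_median_definition(odd_batch, m)])    # [2]
print([m for m in even_batch if satisfies_median_definition(even_batch, m)])  # [3, 5]
print(statistics.median(even_batch))  # 4.0, the usual midpoint convention
```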

The two key properties for choosing a descriptive approach

1. Is the measurement continuous or discrete?
2. What is the level of the measurement?

Summarizing Strategy

List-oriented descriptions, graphical descriptions, numerical descriptions.

Purpose criterion:

The mode is best for defining a typical value, the median is best for describing a typical individual, and the mean is best for representing the entire batch of data.

Types of central tendency statistics

Mode, median, arithmetic mean (mean), geometric mean, trimmed mean, weighted mean, etc.
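
A Python sketch of three of the less familiar means in this list. `statistics.geometric_mean` is from the standard library (Python 3.8 or later); `trimmed_mean` and `weighted_mean` are hypothetical helpers written for illustration, and the trimming rule (drop floor(proportion × n) values from each end) is one common convention, not the only one. The batches are hypothetical.

```python
import statistics


def trimmed_mean(batch, proportion):
    """Mean after dropping floor(proportion * n) values from each end (hypothetical helper)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(batch)
    k = int(proportion * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights (hypothetical helper)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


print(statistics.mean([1, 2, 3, 4, 100]))   # 22, pulled up by the extreme value
print(trimmed_mean([1, 2, 3, 4, 100], 0.2))  # 3.0
print(weighted_mean([1, 2, 3], [1, 1, 2]))   # 2.25
print(statistics.geometric_mean([1, 4]))     # about 2.0
```

Trimming is one way to reduce the mean's sensitivity to extremes while still using more of the data than the median does.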

Choice between alternatives of dispersion:

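Most dispersion measures are also loss functions. Sensitivity to extremes is a common rationale for the choice, although we do want a dispersion statistic to be somewhat sensitive to extremes. Another rationale is the naturalness or interpretability of the statistic (the MSD and variance are unnatural because they are expressed in squared units).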

Relative Frequency Polygons

The picture conveys the same impression as a frequency polygon, except that the vertical scale shows relative frequency instead of frequency.

Benefits of Relative Frequency Distribution

Provides useful information about the commonness of each value.

