DESCRIPTIVE STATISTICS CHAPTER 2: MEASURES OF CENTRAL TENDENCY
Standard deviation
is a number used to tell how measurements for a group are SPREAD OUT FROM THE AVERAGE (mean or expected value).
CV or the coefficient of variation
is a statistical measure of the relative DISPERSION (spread) of data points in a data set around the mean.
Median
it is the positional average
Mean, Median and the Mode
The important measures used to describe a whole set of data with a single value that represents the middle or center of its distribution are the ____.
MEDIAN
______ is the positional average. The data must first be organized from lowest to highest before the "middle score or value" can be found.
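The "order first, then take the middle" procedure can be sketched in Python. This is a minimal illustration, not a reference implementation; the function name `positional_median` and the sample scores are hypothetical.

```python
def positional_median(values):
    """Return the middle value of the data after ordering it."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)      # organize from lowest to highest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: a single middle value exists
        return ordered[mid]
    # even count: take the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [7, 3, 9, 5, 1]          # made-up scores, unordered on purpose
print(positional_median(scores))
```

With an even number of observations there is no single middle position, so the sketch averages the two middle values, which is the usual convention.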
Outliers
are EXTREME, or atypical data value(s) that are notably different from the rest of the data.
Negative
A kurtosis value that is NEGATIVE (excess kurtosis, i.e. K - 3 below zero) shows LIGHT TAILS, which means that there is little data in the tails of the distribution.
positive
A kurtosis value that is ____ (excess kurtosis, i.e. K - 3 above zero) shows HEAVY TAILS, that is, there is a lot of data in the tails of the distribution.
Standard Deviation
A quantity calculated to indicate the extent of DEVIATION for a group as a whole. Spreadsheet formula: =STDEV(highlight the data set)
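For readers working outside a spreadsheet, the same quantity can be computed with Python's standard `statistics` module. This is a sketch on made-up data; it assumes the spreadsheet `=stdev(...)` formula is the sample version (dividing by n - 1), which corresponds to `statistics.stdev`, while `statistics.pstdev` gives the population version (dividing by n).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements, mean = 5

# Sample standard deviation: squared deviations are divided by n - 1.
sample_sd = statistics.stdev(data)

# Population standard deviation: squared deviations are divided by n.
population_sd = statistics.pstdev(data)

print(sample_sd, population_sd)
```

The sample version is always a little larger than the population version on the same data, so it matters which one a given tool reports.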
MEAN
Also known as the ARITHMETIC MEAN, it is the most reliable measure of central tendency. Operationally, it is derived by adding up all the observed values and dividing the sum by the total number of observed values.
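The operational definition (sum of the observed values divided by their count) translates directly into code. A minimal Python sketch, with a hypothetical function name and made-up values:

```python
def arithmetic_mean(values):
    """Add up all observed values and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


observed = [2, 4, 6, 8]           # made-up observations: sum 20, count 4
print(arithmetic_mean(observed))
```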
mean
Because the ____ includes every value in the distribution, it is influenced by outliers and skewed distributions.
good
CV is 10-20
Acceptable
CV is 20-30
very good
CV is <10
not acceptable
CV is >30
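The four CV bands on these cards can be written as a small lookup. This is only a sketch: the function name `rate_cv` is hypothetical, it assumes the CV is expressed in percent, and because the cards list 20 in both the "good" and "acceptable" bands, it assumes that a CV of exactly 20 counts as "good".

```python
def rate_cv(cv_percent):
    """Map a coefficient of variation (assumed to be in percent) to a rating."""
    if cv_percent < 0:
        raise ValueError("this sketch expects a non-negative CV")
    if cv_percent < 10:
        return "very good"
    if cv_percent <= 20:          # assumption: exactly 20 is rated "good"
        return "good"
    if cv_percent <= 30:
        return "acceptable"
    return "not acceptable"


for cv in (5, 15, 25, 35):
    print(cv, rate_cv(cv))
```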
relative dispersion of data
What is the coefficient of variation sometimes called?
LEPTOKURTIC
More values in the distribution tails and more values close to the mean (i.e., sharply peaked with heavy tails). It has greater kurtosis than the normal distribution and is more concentrated about the mean. Coefficient of kurtosis K > 3
discrete variable
_____ is a numeric variable. Observations can take a value based on a count from a set of distinct WHOLE VALUES.
coefficient of variation
represents the ratio of the standard deviation to the mean: CV = standard deviation / mean (often multiplied by 100 and reported as a percentage).
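The ratio can be computed in a few lines of Python. This sketch assumes the sample standard deviation is wanted and that the result is reported in percent; the function name and data are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


weights = [10, 12, 14]            # made-up data: mean 12, sample SD 2
print(round(coefficient_of_variation(weights), 2))
```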
SD
Remember: when asked which data set has the greater ABSOLUTE dispersion, which do you compare, the mean or the standard deviation?
Mean
Remember: when asked which data set has the greater RELATIVE dispersion, the spread is judged relative to which value (as in the CV), the mean or the standard deviation?
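The difference between absolute and relative dispersion is easiest to see when the two comparisons disagree. In this Python sketch the two data sets are invented for illustration: the set with the larger standard deviation has the smaller spread relative to its mean.

```python
import statistics


def sd(values):
    """Absolute dispersion: the sample standard deviation."""
    return statistics.stdev(values)


def cv(values):
    """Relative dispersion: the standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


set_a = [10, 12, 14]              # made-up data: mean 12, SD 2
set_b = [100, 105, 110]           # made-up data: mean 105, SD 5

greater_absolute = "A" if sd(set_a) > sd(set_b) else "B"
greater_relative = "A" if cv(set_a) > cv(set_b) else "B"

print("greater absolute dispersion:", greater_absolute)
print("greater relative dispersion:", greater_relative)
```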
median
The _____ cannot be identified for categorical NOMINAL data, as such data cannot be logically ordered.
median
the preferred measure of central tendency when the distribution is NOT SYMMETRICAL.
continuous
A ____ variable is a NUMERIC VARIABLE. Observations can take any value within a certain range of REAL NUMBERS.
Skewed to the left (Negatively skewed)
when the tail on the left side of the distribution is longer than the right side.
Skewed to the right (Positively skewed)
when the tail on the right side of the distribution is longer than the left side
CV
-volatility -variability
symmetrical = normal; skewed to the right = positively skewed; skewed to the left = negatively skewed
REMEMBER:
continuous and discrete
The MEAN can be used for which two types of numeric variables?
MESOKURTIC
A distribution that is moderate in breadth, with a curve of medium peak height. It is described as normal. K = 3
Variance
It measures how far a set of numbers is SPREAD out from their average value. Variance = (standard deviation)^2
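The relationship between variance and standard deviation can be checked by hand. This Python sketch computes the population variance as the average of the squared distances from the mean, then takes its square root; the function name and data are hypothetical.

```python
import math


def population_variance(values):
    """Average of the squared distances from each point to the mean."""
    if not values:
        raise ValueError("the variance of an empty data set is undefined")
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data with mean 5
variance = population_variance(data)
std_dev = math.sqrt(variance)     # the standard deviation is the square root

print(variance, std_dev)
```

A sample variance would divide by n - 1 instead of n; the squared relationship to the standard deviation holds either way.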
Skewed to the right (Positively skewed)
In this case, the mean is higher than the median and the mode, and the median tends to be higher than the mode: MEAN > MEDIAN > MODE
Skewed to the right (Positively skewed) (2)
In this case, the mean is higher than the median and the mode, and the median tends to be higher than the mode. The mean is 'pulled' toward the right tail of the distribution. COEFFICIENT OF SKEWNESS > 0
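The sign of the coefficient of skewness can be checked on a small example. The cards do not say which skewness formula is meant, so this Python sketch assumes the moment coefficient (third central moment divided by the 1.5 power of the second); the function name and data are hypothetical.

```python
import statistics


def skewness_coefficient(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed formula)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 4, 5, 18]        # one large value forms the right tail
left_skewed = [-x for x in right_skewed]     # mirror image: the tail is on the left

for name, data in (("right", right_skewed), ("left", left_skewed)):
    print(name,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data),
          "skewness:", round(skewness_coefficient(data), 3))
```

In the right-skewed data the mean exceeds the median, which exceeds the mode, and the coefficient is positive; mirroring the data reverses both the ordering and the sign.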
Skewed to the left (Negatively skewed) (2)
In this case, the mean is less than the median and the mode, and the mode is higher than the median. The mean is 'PULLED' toward the LEFT tail of the distribution. Coefficient of skewness < 0
Kurtosis
It describes the SHARPNESS of the peak of a frequency distribution curve.
CV or the coefficient of variation
It is useful for COMPARING the degree of variation in one data series with that in another, even if the means are drastically different from one another.
A low standard deviation means that most of the numbers are close to the average, while a high standard deviation means that the numbers are more spread out.
REMEMBER:
A small variance indicates that the data points tend to be very close to the mean, and to each other. A high variance indicates that the data points are very spread out from the mean, and from one another. Variance is the average of the SQUARED distances from each point to the mean.
REMEMBER:
median
The ____ is LESS affected by OUTLIERS and SKEWED data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.
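The effect of a single outlier on each measure can be seen in a short Python sketch. The data are invented for illustration: one extreme value is appended to an otherwise tight data set.

```python
import statistics

typical = [30, 32, 34, 36, 38]    # made-up values, mean and median both 34
with_outlier = typical + [300]    # the same data plus one extreme value

# How far each measure moves once the outlier is included.
mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)

print("mean moved by:", round(mean_shift, 2))
print("median moved by:", median_shift)
```

The mean moves by dozens of units while the median moves by one, which is why the median is the safer summary for skewed data.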
Skewed to the left (Negatively skewed)
The mean is 'pulled' toward the left tail of the distribution.
Range (2)
The ___ is the simplest measure of variation. Operationally, it measures the distance between the highest and the lowest observed values.
PLATYKURTIC
This describes distributions that have NEGATIVE excess kurtosis. The tails are very thin compared to the normal distribution, resulting in fewer extreme positive or negative events. Coefficient of kurtosis K < 3
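The three kurtosis classes (K > 3, K = 3, K < 3) can be sketched in Python. This assumes K is the moment coefficient of kurtosis (fourth central moment divided by the square of the second), for which a normal curve gives 3. The function names, the `tolerance` parameter, and the data are hypothetical; the tolerance is needed because real data almost never gives exactly 3.

```python
def kurtosis_coefficient(values):
    """Moment coefficient of kurtosis: K = m4 / m2**2 (an assumed formula)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


def classify_kurtosis(k, tolerance=0.1):
    """Label K relative to 3; values within the tolerance count as mesokurtic."""
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


flat = [1, 2, 3, 4, 5]                      # evenly spread, thin tails
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]  # mostly at the mean, two extremes

for data in (flat, peaked):
    k = kurtosis_coefficient(data)
    print(round(k, 2), classify_kurtosis(k))
```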
Range
This value indicates how far apart the 2 extreme values are. Range = H - L ("Max minus Min")
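The "Max minus Min" rule is a one-liner in Python. A minimal sketch; the function name `data_range` and the values are hypothetical.

```python
def data_range(values):
    """Distance between the highest (H) and lowest (L) observed values."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


observed = [12, 7, 3, 18, 9]      # made-up observations: H = 18, L = 3
print(data_range(observed))
```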
SYMMETRICAL DISTRIBUTION
When the mean, median, and mode are equal, the distribution is assumed to be normal. It is said to be symmetrical or balanced. COEFFICIENT OF SKEWNESS = 0
normality
When the mean, median, and mode are equal, the distribution is assumed to have what property? It is said to be symmetrical or balanced.
Measures of Variation
give us information on how the data in a given distribution are DISPERSED or SPREAD. They also describe how similar or varied the set of observed values is.
Categorical variables
have values that describe a 'QUALITY' or 'CHARACTERISTIC' of a data unit, like 'what type' or 'which category'.
Numeric variables
have values that describe a measurable quantity as a number, like 'HOW MANY' or 'HOW MUCH'.
numbers are more spread out
high standard deviation = ?
MODE
is a measure of central tendency that refers to the MOST FREQUENT value in a given distribution.
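Finding the most frequent value can be sketched with Python's standard library. A data set can have more than one mode, so this example uses `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest frequency; the data are made up.

```python
import statistics

ratings = [2, 3, 3, 5, 7, 7, 9]   # made-up data: 3 and 7 both appear twice

# multimode returns all most-frequent values, in order of first appearance.
modes = statistics.multimode(ratings)

print(modes)
```

Unlike the mean and median, the mode also works for categorical data, since it only counts how often each value occurs.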
Kurtosis
is a measure of the "tailedness" of the probability distribution of a real-valued random variable. It describes the sharpness of the peak of a frequency distribution curve.
Skewness
is the tendency for the values to be more frequent around the high or low ends of the x-axis.
close to average or close to mean
low standard deviation = ?