GS ECO 302 CH 3 Describing Data: Numerical Measures


How is the median determined for an even number of observations?

1. As before, the observations are ordered. 2. Then by convention to obtain a unique value we calculate the mean of the two middle observations. So for an even number of observations, the median may not be one of the given values.
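The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `median` and the small data sets are assumptions for the example, not part of the course material.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)      # step 1: order the observations
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: a single middle observation
        return ordered[mid]
    # even count: by convention, average the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


even_result = median([3, 8, 4, 6])   # hypothetical data; the middle pair is 4 and 6
odd_result = median([3, 8, 4])       # after ordering, the middle value is 4
```

With the even-sized set, the result is 5.0, which is not one of the four given values.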

Properties of the Arithmetic Mean

1. Every set of interval and ratio-level data has a mean. 2. All the values in the data set are included in computing the mean. 3. The mean is unique. 4. The sum of the deviations of each value from the mean is zero. ∑(X - X̅ ) = 0 Example: the mean of 3, 8, and 4 is 5. Then: ∑(X - X̅ ) = (3-5)+(8-5)+ (4-5) = -2+3-1 = 0
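Property 4 can be checked numerically with the same three values. The variable names are chosen for this sketch only.

```python
values = [3, 8, 4]

mean = sum(values) / len(values)           # 15 / 3 = 5.0
deviations = [x - mean for x in values]    # [-2.0, 3.0, -1.0]
total_deviation = sum(deviations)          # the deviations cancel out
```

For data whose mean is not exactly representable in floating point, the sum may come out as a tiny nonzero number rather than exactly zero.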

Standard Deviation (SD)

A measure of variability that indicates the average difference between the scores and their mean. The square root of the variance. 2 versions: ○ Population Standard Deviation ○ Sample Standard Deviation

If a distribution is nonsymmetrical, or skewed, the relationship among the three measures changes.

In a positively skewed distribution (skewed right: the long tail trails off to the right), the arithmetic mean is the largest of the three measures. Why? Because the mean is influenced more than the median or mode by a few extremely high values. Mean > Median > Mode. If the distribution is highly skewed, such as the weekly incomes in Chart 3-3, the mean would not be a good measure to use. The median and mode would be more representative.

Measures of Location

A statistic that describes a location within a data set. Measures of central tendency describe the center of the distribution. Measures of location are often referred to as averages. The purpose of a measure of location is to pinpoint the center of a distribution of data.

Parameter

Any measurable characteristic of a population

Dispersion: the spread or variation in a set of data. A small value for a measure of dispersion indicates that the data are clustered closely, say, around the arithmetic mean. The mean is therefore considered representative of the data.

Conversely, a large measure of dispersion indicates that the mean is not reliable. Refer to Chart 3-5. The 100 employees of Hammond Iron Works Inc., a steel fabricating company, are organized into a histogram based on the number of years of employment with the company. The mean is 4.9 years, but the spread of the data is from 6 months to 16.8 years. The mean of 4.9 years is not very representative of all the employees. 1. The first reason for studying dispersion: you would want to know something about the variation in the data.

If a distribution is nonsymmetrical, or skewed, the relationship among the three measures changes. (cont)

Conversely, if a distribution is negatively skewed (skewed left: the long tail trails off to the left), the mean is the lowest of the three measures. Mode > Median > Mean. Again, if the distribution is highly skewed, such as the distribution of tensile strengths shown in Chart 3-4, the mean should not be used to represent the data.

range

Distance between highest and lowest scores in a set of data. = Largest value - smallest value The range is widely used in statistical process control (SPC) applications because it is very easy to calculate and understand. A defect of the range is that it is based on only two values, the highest and the lowest; it does not take into consideration all of the values.

Mean Deviation (example) Step 1. Find the average of the values provided. Step 2. Subtract that number from each of the individual values, list the absolute value (positive) of the result. Step 3. Add those results together. Step 4. Divide the total by the # of observations (count of, not sum).

Example: 5 days
Day 1: 20, Day 2: 49, Day 3: 50, Day 4: 51, Day 5: 80
Step 1. Find the average: (20 + 49 + 50 + 51 + 80) / 5 = 50
Step 2. Subtract the average from each value and take the absolute value:
Day 1: |20 - 50| = 30
Day 2: |49 - 50| = 1
Day 3: |50 - 50| = 0
Day 4: |51 - 50| = 1
Day 5: |80 - 50| = 30
Step 3. Add those results together: 30 + 1 + 0 + 1 + 30 = 62
Step 4. Divide by the number of observations: 62 / 5 days = 12.4 (Mean Deviation)
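The four steps can be sketched as a short Python function and applied to the five daily values from the example. The names `mean_deviation` and `daily_values` are assumptions made for this sketch.

```python
def mean_deviation(values):
    """Average absolute distance of the values from their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n                              # step 1: find the average
    absolute_devs = [abs(x - mean) for x in values]     # step 2: absolute deviations
    return sum(absolute_devs) / n                       # steps 3 and 4: add, then divide by the count


daily_values = [20, 49, 50, 51, 80]
md = mean_deviation(daily_values)   # about 12.4, matching the hand calculation
```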

The Relative Positions of the Mean, Median, and Mode

For any symmetric, mound-shaped (single-peaked) distribution, the mode, median, and mean are located at the center and are always equal.

Chebyshev's Theorem (applies to any shape of distribution). The Russian mathematician P. L. Chebyshev (1821-1894) developed a theorem that allows us to determine the minimum proportion of the values that lie within a specified number of standard deviations of the mean. For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least 1 - (1/k^2), where k is any constant greater than 1.

For example, according to Chebyshev's theorem, at least three of four values, or 75 percent, must lie between the mean plus two standard deviations and the mean minus two standard deviations. This relationship applies regardless of the shape of the distribution. σ = standard deviation ○ within ±2σ: at least 75% or 3/4 ○ within ±3σ: at least 88.9% or 8/9 ○ within ±5σ: at least 96% or 24/25 of values. Here k is the number of standard deviations from the mean, not the standard deviation itself. You may first need to calculate the standard deviation to express a distance from the mean as k; then plug k into 1 - (1/k^2). You can also reverse the formula and solve for k if you know the proportion of the population or sample.
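A minimal sketch of the formula and its reverse, assuming k is the number of standard deviations from the mean. Both function names are hypothetical, chosen for this example.

```python
import math


def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def chebyshev_k_for_proportion(p):
    """Solve p = 1 - 1/k^2 for k, given a proportion p between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1 / math.sqrt(1 - p)


within_2 = chebyshev_min_proportion(2)       # 0.75
within_3 = chebyshev_min_proportion(3)       # about 0.889
within_5 = chebyshev_min_proportion(5)       # about 0.96
k_for_75 = chebyshev_k_for_proportion(0.75)  # 2.0
```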

Geometric Mean (example) Suppose you receive a 5 percent increase in salary this year and a 15 percent increase next year. The average annual percent increase is 9.886, not 10.0. Why is this so?

\( GM = \sqrt[n]{(1.05)(1.15)} = 1.09886 \), with n = 2 because there are 2 raises (periods). This can be verified by assuming that your monthly earnings were $3,000 to start and you received two increases of 5 percent and 15 percent.
Actual raises: Raise 1 = $3,000 * 5% = $150.00; Raise 2 = $3,150 * 15% = $472.50; Total = $622.50
Equivalent at 9.886% each period: $3,000 * .09886 = $296.58; $3,296.58 * .09886 = $325.90; Total = $622.48
At a flat 10%: $3,000 * 10% = $300; $3,300 * 10% = $330; Total = $630.00, which overstates the actual total raise.
See image for additional example. Note that a 40 percent loss is entered as .6 (the fraction remaining) instead of -.4.
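The salary check can be reproduced in Python. This sketch uses `math.prod` (Python 3.8 or later); the function name `geometric_mean` and the variable names are assumptions for the example.

```python
import math


def geometric_mean(factors):
    """nth root of the product of n positive growth factors."""
    return math.prod(factors) ** (1 / len(factors))


growth_factors = [1.05, 1.15]          # a 5% raise, then a 15% raise
gm = geometric_mean(growth_factors)    # about 1.09886

start = 3000
actual_final = start * 1.05 * 1.15     # salary after both raises: 3622.50
gm_final = start * gm**2               # same result using the geometric mean twice
flat_final = start * 1.10**2           # 3630.00, the arithmetic mean overstates growth
```

Compounding at the geometric mean rate reproduces the actual ending salary, while compounding at a flat 10 percent does not.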

Geometric Mean is useful in finding the average change of percentages, ratios, indexes, or growth rates over time. It has a wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the Gross Domestic Product, which compound or build on each other. How to enter on a calculator: https://www.youtube.com/watch?v=LaezBiaqg0M

\( GM = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} \) ○ All values must be positive. ○ The geometric mean will always be less than or equal to (never more than) the arithmetic mean. ○ n is the number of values, and GM is the nth root of their product. I.e., if there are 2 values it is the square root; if there are 5 values it is the 5th root.

Example of table format for Arithmetic Mean of Grouped Data Sample Variance of Grouped Data Standard Deviation of Grouped Data

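Mean Formula, Grouped Data: \( \bar{X} = \dfrac{\sum fM}{n} \)
Sample Variance, Grouped Data: \( s^2 = \dfrac{\sum f(M - \bar{X})^2}{n - 1} \)
Standard Deviation, Grouped Data: \( s = \sqrt{\dfrac{\sum f(M - \bar{X})^2}{n - 1}} \)
Note the use of M (class midpoint) instead of X (individual value).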
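A sketch of the three grouped-data formulas in Python. The class midpoints and frequencies below are hypothetical numbers invented for illustration; they do not come from the textbook table.

```python
import math

midpoints = [5, 15, 25]     # hypothetical class midpoints (M)
frequencies = [2, 5, 3]     # hypothetical class frequencies (f)

n = sum(frequencies)        # total number of observations: 10

# Mean: sum of f*M divided by n
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Sample variance: sum of f*(M - mean)^2 divided by n - 1
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)

# Standard deviation: square root of the variance
grouped_sd = math.sqrt(grouped_variance)
```

Here the sum of fM is 160, so the mean is 16, and the sum of f(M - 16)^2 is 490, so the variance is 490 / 9.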

The mean does have a weakness. Recall that the mean uses the value of every item in a sample, or population, in its computation.

Mean unduly affected by unusually large or small values. If one or two of these values are either extremely large or extremely small compared to the majority of data, the mean might not be an appropriate average to represent the data. For example, suppose the annual incomes of a small group of stockbrokers at Merrill Lynch are $62,900, $61,600, $62,500, $60,800, and $1,200,000. The mean income is $289,560. Obviously, it is not representative of this group, because all but one broker has an income in the $60,000 to $63,000 range. One income ($1.2 million) is unduly affecting the mean.
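The effect of the single large income can be seen by comparing the mean with the median, here using Python's standard `statistics` module on the five stockbroker incomes. The variable names are chosen for this sketch.

```python
import statistics

incomes = [62_900, 61_600, 62_500, 60_800, 1_200_000]

mean_income = statistics.mean(incomes)      # pulled upward by the $1.2 million income
median_income = statistics.median(incomes)  # middle value after ordering
```

The mean is $289,560, while the median is $62,500, which sits inside the range where four of the five incomes fall.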

2. A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions. Suppose, for example, that the new Vision Quest LCD computer monitor is assembled in Baton Rouge and also in Tucson. The arithmetic mean hourly output in both the Baton Rouge plant and the Tucson plant is 50. Based on the two means, you might conclude that the distributions of the hourly outputs are identical.

Production records for 9 hours at the two plants, however, reveal that this conclusion is not correct (see Chart 3-6). Baton Rouge production varies from 48 to 52 assemblies per hour. Production at the Tucson plant is more erratic, ranging from 40 to 60 per hour. Therefore, the hourly output for Baton Rouge is clustered near the mean of 50; the hourly output for Tucson is more dispersed.

A second application of the geometric mean is to find an average percentage change over a period of time.

RATE OF INCREASE OVER TIME: \( GM = \sqrt[n]{\dfrac{\text{Value at end of period}}{\text{Value at start of period}}} - 1 \), where n = number of periods

Normal Distribution Example to calculate Range and Standard deviation Range Example: If average is 100 & standard deviation (s) is 10 then High = 100 + 3s = 100 + 3*10 = 130 Low = 100 - 3s = 100 - 3*10 = 70 Range = 130 (H) - 70 (L) Range = 60

Standard Deviation Example: s = Range / 6; s = 60 / 6; s = 10

Mean as a balance point

Suppose three bars of equal weight were placed on the board at numbers 3, 4, and 8, and the balance point was set at 5, the mean of the three numbers. We would find that the board is balanced perfectly! The deviations below the mean (3) are equal to the deviations above the mean (3). Shown schematically:

Variance based on the deviations from the mean. However, instead of using the absolute value of the deviations, the variance and the standard deviation square the deviations. (gets rid of the negatives)

The arithmetic mean of the squared deviations from the mean. The variance is non-negative and is zero only if all observations are the same. 2 versions: ○ Population Variance ○ Sample Variance

Measures of Dispersion

The general term for any measure of the spread or variation in a set of data. To describe the dispersion, we will consider the range, the mean deviation, the variance, and the standard deviation.

Median (example) However, checking the prices of the individual units might change your mind. They are $60,000, $65,000, $70,000, and $80,000, and a super deluxe penthouse costs $275,000. The arithmetic mean price is $110,000, as the real estate agent reported, but one price ($275,000) is pulling the arithmetic mean upward, causing it to be an unrepresentative average.

The median price of the units available is $70,000. To determine this, we order the prices from low ($60,000) to high ($275,000) and select the middle value ($70,000). For the median, the data must be at least an ordinal level of measurement.

Population Mean Formula

The population mean is the sum of all the values in the population divided by the number of values in the population. μ = (ΣX)/ N where: μ represents the population mean. It is the Greek lowercase letter "mu." N is the number of values in the population. X represents any particular value. Σ is the Greek capital letter "sigma" and indicates the operation of adding; it means "the sum of." ΣX is the sum of the X values in the population.

Interpretation and Uses of the Standard Deviation The standard deviation is commonly used as a measure to compare the spread in two or more sets of observations.

The smaller the deviation (in the comparison) the less dispersed the data and the more reliable the mean (average). [closer clustering]

Mode The mode does have disadvantages, however, that cause it to be used less frequently than the mean or median. For many sets of data, there is no mode because no value appears more than once. Conversely, for some data sets there is more than one mode.

The value of the observation that appears most frequently ○ The mode is especially useful in summarizing nominal-level data. In summary, we can determine the mode for all levels of data— nominal, ordinal, interval, and ratio. The mode also has the advantage of not being affected by extremely high or low values.

Example : Geometric Mean of change over a period of time. During the decade of the 1990s, and into the 2000s, Las Vegas, Nevada, was the fastest-growing city in the United States. The population increased from 258,295 in 1990 to 607,876 in 2009. This is an increase of 349,581 people, or a 135.3 percent increase over the period. The population has more than doubled. What is the average annual percent increase?

There are 19 years between 1990 and 2009, so n = 19. Then formula (3-5) for the geometric mean as applied to this problem is: \( GM = \sqrt[19]{\dfrac{607{,}876}{258{,}295}} - 1 = 0.0461 \). The value of .0461 indicates that the average annual growth over the period was 4.61 percent. To put it another way, the population of Las Vegas increased at a rate of 4.61 percent per year from 1990 to 2009.
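The Las Vegas calculation can be reproduced with a small helper. The function name `average_annual_rate` is an assumption for this sketch; the population figures and years are the ones given in the example.

```python
def average_annual_rate(start_value, end_value, periods):
    """Average rate of change per period: nth root of (end / start), minus 1."""
    return (end_value / start_value) ** (1 / periods) - 1


periods = 2009 - 1990                                   # 19 years
rate = average_annual_rate(258_295, 607_876, periods)   # about 0.0461
total_increase = 607_876 / 258_295 - 1                  # about 1.353, i.e. 135.3 percent
```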

Arithmetic Mean of Grouped Data

\( \bar{X} = \dfrac{\sum fM}{n} \) where: X̅ is the designation for the sample mean. M is the midpoint of each class. f is the frequency in each class. fM is the frequency in each class times the midpoint of the class. ∑fM is the sum of these products. n is the total number of frequencies.

Weighted Mean Formula

\( \bar{X}_w = \dfrac{\sum (wX)}{\sum w} \) The weighted mean is a special case of the arithmetic mean. It occurs when there are several observations of the same value. We multiply each observation by the number of times it occurs. We refer to the weighted mean as X̅w, which is read "X bar sub w." Note that the denominator of a weighted mean is always the sum of the weights.
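A minimal sketch of the weighted mean in Python. The prices and counts are hypothetical values made up for illustration, and the function name `weighted_mean` is an assumption.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


prices = [0.90, 1.25, 1.50]   # hypothetical values (X)
units_sold = [3, 4, 3]        # hypothetical weights (w): how often each value occurs

wm = weighted_mean(prices, units_sold)   # (2.70 + 5.00 + 4.50) / 10, about 1.22
```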

Statistic

a numerical measurement describing some characteristic of a sample The mean of a sample, or any other measure based on sample data

measures of dispersion (variability)

range, variance, standard deviation

Sample Standard Deviation Formula

\( s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}} \) The sample standard deviation is the square root of the sample variance. Not to be used for grouped data.

Sample Variance: compared with the population variance, it requires a change in the denominator. Instead of substituting n (number in the sample) for N (number in the population), the denominator is n - 1. Although the use of n is logical since X̅ is used to estimate µ, it tends to underestimate the population variance, σ^2. The use of (n - 1) in the denominator provides the appropriate correction for this tendency. Because the primary use of sample statistics like s^2 is to estimate population parameters like σ^2, (n - 1) is preferred to n in defining the sample variance. We will also use this convention when computing the sample standard deviation.

\( s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1} \) where: s^2 is the sample variance. X is the value of each observation in the sample. X̅ is the mean of the sample. n is the number of observations in the sample.
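The n - 1 denominator can be compared with the n denominator using Python's standard `statistics` module, where `variance` and `stdev` divide by n - 1 and `pvariance` divides by n. As an assumption for this sketch, the five values from the mean deviation example are reused and treated as a sample.

```python
import statistics

sample = [20, 49, 50, 51, 80]   # reused illustrative values, treated here as a sample

s2 = statistics.variance(sample)           # sum of squared deviations / (n - 1) = 1802 / 4
s = statistics.stdev(sample)               # square root of the sample variance
divided_by_n = statistics.pvariance(sample)  # same sum divided by n = 1802 / 5
```

Dividing by n gives 360.4, smaller than the 450.5 obtained with n - 1, which illustrates the downward tendency the correction addresses.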

Empirical Rule (68-95-99.7 Rule), sometimes called the "Normal Rule." Chebyshev's theorem applies to any distribution. However, for a symmetrical, bell-shaped distribution such as the one in Chart 3-7, we can be more precise in explaining the dispersion about the mean.

States that, in a normal distribution, ○ about 68% of the observations are within one standard deviation of the mean, ○ about 95% are within two standard deviations, and ○ about 99.7% are within three standard deviations.
Using this rule you can estimate the range and the standard deviation.
To estimate the range: you must know the average (X̅) and the standard deviation (s, provided). High value = X̅ + 3s; low value = X̅ - 3s; Range = high - low. Three standard deviations on each side of the mean capture about 99.7% of values.
To estimate the standard deviation: you must know the range. s ≈ Range ÷ 6. The "6" is used because the range spans about +/- 3 standard deviations (99.7% of values).
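Both estimates can be sketched in Python, using the numbers from the worked example (average 100, standard deviation 10). The function names are assumptions, and the results are approximations that rely on the data being roughly bell-shaped.

```python
def range_from_sd(mean, s):
    """Approximate low value, high value, and range as mean +/- 3 standard deviations."""
    low = mean - 3 * s
    high = mean + 3 * s
    return low, high, high - low


def sd_from_range(value_range):
    """Approximate standard deviation: the range spans about 6 standard deviations."""
    return value_range / 6


low, high, estimated_range = range_from_sd(100, 10)   # 70, 130, 60
estimated_sd = sd_from_range(estimated_range)         # 10.0
```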

Mean Deviation Why do we ignore the signs of the deviations from the mean? If we didn't, the positive and negative deviations from the mean would exactly offset each other, and the mean deviation would always be zero. Such a measure (zero) would be a useless statistic. The mean deviation has two advantages. ○ First, it uses all the values in the computation. ○ Second, it is easy to understand—it is the average amount by which values deviate from the mean. However, its drawback is the use of absolute values. Generally, absolute values are difficult to work with and to explain, so the mean deviation is not used as frequently as other measures of dispersion, such as the standard deviation.

The arithmetic mean of the absolute values of the deviations from the arithmetic mean. \( MD = \dfrac{\sum |X - \bar{X}|}{n} \) where: X is the value of each observation. X̅ is the arithmetic mean of the values. n is the number of observations in the sample. | | indicates the absolute value.

Median

the middle score in a distribution; half the scores are above it and half are below it The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest. 1. It is not affected by extremely large or small values. Therefore, the median is a valuable measure of location when such values do occur. 2. It can be computed for ordinal-level data or higher. Recall from Chapter 1 that ordinal-level data can be ranked from low to high.

Sample Mean Formula

x̄ = ∑X / n Where x̄ represents the sample mean. It is read "X bar." n is the number of values in the sample X represents any particular value ∑ is the Greek capital letter "sigma" and indicates the operation of adding. ∑x is the sum of the x values in the sample.

Population Standard Deviation Formula The variance is difficult to interpret for a single set of observations. The variance of 124 for the number of citations issued is not in terms of citations, but citations squared.

\( \sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} \) There is a way out of this difficulty. By taking the square root of the population variance, we can transform it to the same unit of measurement used for the original data. The square root of 124 citations-squared is 11.14 citations. The units are now simply citations. The square root of the population variance is the population standard deviation.

Population Variance Formula Steps: 1. Begin by finding the mean. 2. Find the difference between each observation and the mean, and square that difference. 3. Sum all the squared differences. 4. Divide the sum of the squared differences by the number of items in the population. For populations whose values are near the mean, the variance will be small. Like the range and the mean deviation, the variance can be used to compare dispersion in two or more sets of observations.

\( \sigma^2 = \dfrac{\sum (x - \mu)^2}{N} \) where: σ^2 is the population variance. x is the value of a particular observation in the population. µ is the arithmetic mean of the population. N is the number of observations in the population.
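The four steps for the population variance can be written out directly in Python. The function name and the five data values are assumptions for this sketch; the values are treated as a complete population purely for illustration.

```python
import math


def population_variance(values):
    """Follow the four steps: mean, squared differences, sum, divide by N."""
    n_pop = len(values)
    mu = sum(values) / n_pop                          # step 1: find the mean
    squared_diffs = [(x - mu) ** 2 for x in values]   # step 2: square each difference
    total = sum(squared_diffs)                        # step 3: sum the squared differences
    return total / n_pop                              # step 4: divide by the population size


population = [20, 49, 50, 51, 80]            # illustrative values treated as a population
sigma2 = population_variance(population)     # 1802 / 5, about 360.4
sigma = math.sqrt(sigma2)                    # population standard deviation, in original units
```

Taking the square root returns the measure to the original unit of measurement, as the standard deviation card describes.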

