Stats Chapter 3
The Sample Mean
(Statistic) We often select a sample from a population to estimate a specific characteristic of the population. Sample mean = Sum of all the values in the sample / Number of values in the sample
Range
(Simplest measure of dispersion) Range = Maximum value - Minimum value *It is influenced by extreme values
Rate of increase over time
*Formula: GM = (value at end of period / value at start of period)^(1/n) - 1
Example: What is the average annual % increase?
1990 population: 258,295
2014 population: 613,599 (an increase of 355,304 people)
*There are 24 years between 1990 and 2014, so n = 24
GM = (613,599 / 258,295)^(1/24) - 1 = 0.0367, or about 3.67% per year
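The population example can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the notes.

```python
# Average annual rate of increase between two points in time.
start_population = 258_295   # 1990 value from the notes
end_population = 613_599     # 2014 value from the notes
years = 2014 - 1990          # n = 24

# GM = (end / start) ** (1 / n) - 1
annual_rate = (end_population / start_population) ** (1 / years) - 1

print(f"Average annual increase: {annual_rate:.4f}")  # 0.0367
```

Taking the 24th root is the same as raising the ratio to the power 1/24, which is how the root is written in code.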
Property 4) The sum of the deviations of each value from the mean is zero
Example: The mean of 3, 8, 4 is 5. Then (3-5) + (8-5) + (4-5) = -2 + 3 - 1 = 0
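The same check can be written as a short Python sketch using the three values from the example. The variable names are illustrative.

```python
values = [3, 8, 4]
mean = sum(values) / len(values)           # 15 / 3 = 5.0

# Deviation of each value from the mean: -2.0, 3.0, -1.0
deviations = [x - mean for x in values]

print(sum(deviations))  # 0.0
```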
The Empirical Rule
For a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations will lie within plus and minus one standard deviation of the mean, about 95% of the observations will lie within plus or minus 2 standard deviations of the mean, and practically all (99.7%) will lie within 3 standard deviations of the mean.
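The three percentages can be reproduced from a normal curve. The sketch below assumes the bell-shaped distribution is exactly normal, which is an idealization of the rule, and uses `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (an idealized bell shape).
standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Proportion of the curve between -k and +k standard deviations.
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.1%}")
```

Real data that is only roughly bell-shaped will match these percentages only approximately.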
Chebyshev's Theorem
For any set of observations (sample or population), the proportion of the values that lie within K standard deviations of the mean is always at least 1 - 1/K^2, where K is any constant greater than 1.
A Mean's weakness
The mean is unduly affected when one or two values are extremely large or small compared to the majority of the data. Example: $62,900, $61,600, $62,500, $60,800, $1,200,000. The average ($289,560) is pulled far above the typical value by the $1,200,000 observation
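A quick way to see the effect is to compare the mean with the median for the five values in the example. This sketch uses Python's standard `statistics` module; the comparison with the median is added here for illustration.

```python
from statistics import mean, median

salaries = [62_900, 61_600, 62_500, 60_800, 1_200_000]

print(mean(salaries))    # 289560
print(median(salaries))  # 62500
```

The median stays close to the four typical values, while the mean is far above all of them.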
Sample Variance and Standard Deviation
*Formula: Sample variance = Sum of (x - sample mean)^2 / (n - 1) *the denominator is n-1, not n
- The sample standard deviation is the square root of the sample variance
- Cannot be negative
- Most widely used measure of dispersion
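As a rough illustration of the n-1 denominator, the sketch below computes the sample variance and standard deviation by hand and compares them with Python's `statistics` functions. The data (3, 8, 4) is borrowed from the deviations example elsewhere in these notes purely as an illustrative choice.

```python
from math import sqrt
from statistics import stdev, variance

sample = [3, 8, 4]                      # illustrative data, mean is 5
n = len(sample)
sample_mean = sum(sample) / n

# Squared deviations from the mean: 4.0, 9.0, 1.0
squared_deviations = [(x - sample_mean) ** 2 for x in sample]

sample_variance = sum(squared_deviations) / (n - 1)   # 14 / 2 = 7.0
sample_sd = sqrt(sample_variance)                     # about 2.6458

# The standard library uses the same n-1 denominator.
print(sample_variance, variance(sample))
print(round(sample_sd, 4), round(stdev(sample), 4))
```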
Characteristics of a Median
- At least the ordinal scale of measurement is required (ranking/rating)
- It is not influenced by extreme values
- It is unique to a set of data
The arithmetic mean consists of...
- Population mean
- Sample mean
*Most widely used
Geometric Mean
- The geometric mean of a set of n positive numbers is defined as the nth root of the product of the n values
- Useful for finding the average change of percentages, ratios, indexes, or growth rates over time
*Formula: GM = (x1 * x2 * ... * xn)^(1/n)
- Will always be less than or equal to the arithmetic mean
- All data values must be positive
Ethics and Reporting Results
- Useful to know the advantages and disadvantages of the mean, median, and mode
- Important to maintain an independent and principled point of view
- Statistical reporting requires objective and honest communication of any results
Mode Characteristics
- Can be found for nominal-level data
- A set of data can have more than one mode (two modes: bimodal)
Example: 1 2 5 2 6 7 Mode: 2
Weakness: a set of data may have no mode
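The three situations (one mode, two modes, no single mode) can be sketched with `multimode` from Python's standard `statistics` module (Python 3.8 or later). Only the first data set comes from the notes; the other two are hypothetical.

```python
from statistics import multimode

notes_example = [1, 2, 5, 2, 6, 7]     # from the notes
bimodal_example = [1, 2, 2, 5, 5, 7]   # hypothetical: 2 and 5 both appear twice
no_mode_example = [1, 2, 3]            # hypothetical: every value appears once

print(multimode(notes_example))    # [2]
print(multimode(bimodal_example))  # [2, 5]
print(multimode(no_mode_example))  # [1, 2, 3]
```

When every value ties, `multimode` returns all of them, which corresponds to the "no mode" case in the notes.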
Symmetric Distribution
-has same shape on either side of the center
Positively Skewed Distribution
The mean is the largest of the 3 measures (mean, median, mode) because it is influenced more than the median or mode by a few extremely high values. Example: distribution of weekly incomes
Negatively Skewed Distribution
The mean is the lowest of the 3 measures (mean, median, mode)
Measures of Location
Referred to as averages; their purpose is to pinpoint the center of a distribution of data. Example: the average US home changes ownership every 11.8 years
If the distribution is non-symmetrical or skewed...
the relationship among the 3 measures changes (mean, median, mode)
The Weighted Mean
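The weighted mean is found by multiplying each observation, x, by its corresponding weight, w, summing the products, and dividing by the sum of the weights
*Formula: Weighted mean = (w1*x1 + w2*x2 + ... + wn*xn) / (w1 + w2 + ... + wn)
Example: Medium, large, and Biggie-sized soft drinks sell for $1.84, $2.07, and $2.40, respectively. Of the last 10 drinks sold, 3 were medium, 4 were large, and 3 were Biggie-sized
= (3($1.84) + 4($2.07) + 3($2.40)) / 10 = $21.00 / 10 = $2.10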
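The soft drink calculation can be sketched in Python as follows. The prices and counts come from the example; the variable names are illustrative.

```python
prices = [1.84, 2.07, 2.40]   # medium, large, Biggie-sized
weights = [3, 4, 3]           # number of each size sold

# Sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(weights, prices)) / sum(weights)

print(f"${weighted_mean:.2f}")  # $2.10
```

A plain average of the three prices would ignore that the large size sold more often than the other two.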
Population mean
(Parameter) Involves all the values in a population. Example: The closing price for Johnson and Johnson stock for the last 5 days is $95.47. Population mean = Sum of all the values in the population / Number of values in the population
Properties of the Arithmetic Mean
1) The data must be measured at the interval or ratio level
2) All data values are used in the calculation
3) It is unique: there is only one mean in a set of data
4) The sum of the deviations from the mean equals zero
Population Variance
1) Find the mean
2) Find the difference between each observation and the mean, and square that difference
3) Sum all the squared deviations
4) Divide the sum by the number of items in the population
*Formula: Population variance = Sum of (x - population mean)^2 / N
- Units are somewhat difficult to work with because they are the original units squared
Geometric Mean Example
5% increase in salary in 2020, 15% increase in salary in 2021
5% increase = 105% = 1.05
15% increase = 115% = 1.15
GM = Square root of (1.05)(1.15) = 1.09886, an average increase of about 9.886% per year
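The salary example can be verified with `geometric_mean` from Python's standard `statistics` module (Python 3.8 or later). The comparison with the arithmetic mean is added here to illustrate that the geometric mean is never larger.

```python
from statistics import geometric_mean, mean

growth_factors = [1.05, 1.15]   # 5% and 15% raises expressed as multipliers

gm = geometric_mean(growth_factors)
am = mean(growth_factors)

print(f"{gm:.5f}")  # 1.09886
print(f"{am:.2f}")  # 1.10
```

Subtracting 1 from the geometric mean gives the average rate of increase as a decimal.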
Arithmetic mean of grouped data Example: Applewood Auto
Profit: $200 up to $600 Midpoint (M): 400 Frequency (f): 8 fM: 3,200
fM is found by 400 x 8 = 3,200
Arithmetic mean = total fM / total f = 333,200 / 180 = $1,851.11
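A minimal sketch of the grouped-data mean follows. The function name `grouped_mean` and the three-class table are hypothetical; only the first class (midpoint 400, frequency 8) and the totals (333,200 and 180) come from the notes.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: total of f*M divided by total of f."""
    total_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    return total_fm / sum(frequencies)

# First class from the notes: midpoint 400, frequency 8.
first_class_fm = 8 * 400
print(first_class_fm)                   # 3200

# Totals from the notes.
notes_mean = 333_200 / 180
print(round(notes_mean, 2))             # 1851.11

# Hypothetical three-class table, for illustration only.
example_mean = grouped_mean([400, 800, 1200], [2, 5, 3])
print(example_mean)                     # 840.0
```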
To find Median for EVEN number set of data:
Sort the observations and calculate the average of the 2 middle values. Example: 1 3 3 5 5 7 9 9 10 17 The 2 middle values are 5 and 7: (5 + 7) / 2 = 6 Median = 6
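The even-count rule can be sketched by hand and checked against `median` from Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import median

data = [1, 3, 3, 5, 5, 7, 9, 9, 10, 17]

ordered = sorted(data)
mid = len(ordered) // 2                 # index of the upper middle value

# Average of the two middle values: (5 + 7) / 2
manual_median = (ordered[mid - 1] + ordered[mid]) / 2

print(manual_median, median(data))  # 6.0 6.0
```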
Step 1 for Standard Deviation of Grouped Data
Step 1) Midpoint - Arithmetic mean (M - X) 400 - 1,851 = -$1,451 800 - 1,851 = -$1,051 etc.
Step 2 for Standard Deviation of Grouped Data
Step 2) (Midpoint-Arithmetic mean)^2 (M-X)^2 (-1451)^2=2,105,401 (-1051)^2=1,104,601
Step 3 for Standard Deviation of Grouped Data
Step 3) Frequency(Midpoint-Arithmetic mean)^2 F (M-X)^2 8(400-1851)^2=16,843,208
Step 4 for Standard Deviation of Grouped Data
Step 4) Find the sum of F (M-X)^2 Total = 76,169,920 Standard Deviation = square root of (76,169,920 / (180 - 1)) = 652.33 *the denominator is n-1
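The four steps can be collected into one small function. This is a sketch: the function name `grouped_sample_sd` and the three-class table are hypothetical, while the figures 1,851, 76,169,920, and 180 are taken from the notes.

```python
from math import sqrt

def grouped_sample_sd(midpoints, frequencies):
    """Sample standard deviation of grouped data (denominator n - 1)."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    # Steps 1 to 3: deviation, squared deviation, weighted by frequency.
    weighted_squares = sum(f * (m - mean) ** 2
                           for f, m in zip(frequencies, midpoints))
    # Step 4: divide by n - 1 and take the square root.
    return sqrt(weighted_squares / (n - 1))

# Step 3 for the first class in the notes (mean rounded to 1,851).
first_class = 8 * (400 - 1851) ** 2
print(first_class)                          # 16843208

# Step 4 using the totals in the notes.
notes_sd = sqrt(76_169_920 / (180 - 1))
print(round(notes_sd, 2))                   # 652.33

# Hypothetical three-class table, for illustration only.
example_sd = grouped_sample_sd([400, 800, 1200], [2, 5, 3])
```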
Population Standard Deviation
The square root of the population variance
Chebyshev's Theorem Example
The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company's profit-sharing plan is $51.54, and the standard deviation is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean? 1 - 1/(3.5)^2 = 1 - 1/12.25 = 0.92, so at least about 92%
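The Dupree Paint calculation can be sketched as a small function. The function name `chebyshev_lower_bound` is hypothetical, and the dollar interval is an extra step not shown in the notes.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

mean_contribution = 51.54
standard_deviation = 7.51
k = 3.5

bound = chebyshev_lower_bound(k)
low = mean_contribution - k * standard_deviation    # lower end of the interval
high = mean_contribution + k * standard_deviation   # upper end of the interval

print(f"at least {bound:.0%} of contributions")  # at least 92% of contributions
```

Here `low` and `high` work out to roughly $25.26 and $77.83, the range that must contain at least about 92% of the contributions.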
Variance
The arithmetic mean of the squared deviations from the mean
Example: Number of coffees sold: 20 Value - Mean: 20 - 50 = -30 Squared deviation: (-30)^2 = 900
*Total the squared deviations (2,000) and divide by the number of items (5): 2,000 / 5 = 400
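The calculation can be traced in Python. The notes give only one of the five values (20), the mean (50), and the total of the squared deviations (2,000), so the full data set below is an assumption chosen to be consistent with those figures.

```python
from statistics import pvariance

# ASSUMED data: only the value 20, the mean 50, and the total 2,000 are in the notes.
coffees_sold = [20, 40, 50, 60, 80]

mean_sold = sum(coffees_sold) / len(coffees_sold)             # 50.0
squared_deviations = [(x - mean_sold) ** 2 for x in coffees_sold]

total_squared = sum(squared_deviations)                        # 2000.0
population_variance = total_squared / len(coffees_sold)       # 400.0
population_sd = population_variance ** 0.5                     # 20.0

print(population_variance, pvariance(coffees_sold))
```

Because the total is divided by the number of items (5) rather than by n - 1, this is the population form of the variance.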
The Median
The midpoint of the values after they have been ordered from the minimum to the maximum values. Example: 60,000 65,000 70,000 80,000 275,000 Median: 70,000
Mode
The value of the observation that appears most frequently
Dispersion
Also called variation or the spread in the data. Example: Salaries for executives at internet firms: $70,000-$90,000 (avg $80,000). Salaries for marketing executives: $40,000-$120,000 (avg $80,000). The averages are equal, but the marketing salaries are far more spread out
5 measures of location
- Arithmetic mean
- Median
- Mode
- Weighted mean
- Geometric mean