Elementary Statistics Exam 1
(Clu/Conv/Sim/Ran/Str/Sys?): Look in the newspaper and consider the first 10 apartment units that list rent per month.
Convenience sampling
(Clu/Conv/Sim/Ran/Str/Sys?): Divide apartment units according to the number of bedrooms and then sample from each group.
Stratified sampling
Weighted Average= ??? (formula)
WA= ΣXW/ΣW ("x" is a data value and "w" is the weight assigned to that data value.)
(Ex. of Weighted Average): Suppose your midterm test score is 83 and your final exam score is 95. Using weights of 40% for the midterm and 60% for the final exam, compute the weighted average of your scores. If the minimum average for an A is 90, will you earn an A?
WA= ΣXW/ΣW= [(83 × 0.4) + (95 × 0.6)] / (0.4 + 0.6)= 90.2/1= 90.2 Answer: 90.2%, so yes, you'll get an A.
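The weighted-average calculation above can be checked with a short Python sketch. The function name `weighted_average` is made up for this illustration; the scores and weights are the ones from the example.

```python
def weighted_average(values, weights):
    """Return sum(x * w) / sum(w) for paired data values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total_weight

scores = [83, 95]       # midterm, final exam
weights = [0.4, 0.6]    # 40% and 60%

wa = weighted_average(scores, weights)
print(round(wa, 1))     # 90.2
print(wa >= 90)         # True, so the average clears the cutoff for an A
```

Dividing by the sum of the weights means the same function also works when the weights do not add up to 1 (for example, weights of 2 and 3).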
Chebyshev's Theorem
Used to determine how data is spread out around the mean for ANY distribution.
Σx²= ???
Square each data value, then add them together.
Examples of "Ordinal"
course letter grades, type of car you drive (size), HS class rank, etc.
Examples of "Interval"
calendar dates, time of day, temperature in °F or °C, etc.
Examples of "Nominal"
eye color, SSN, degree/major, favorite color, state ID #, residence, gender, etc.
Relative Frequency= ???
frequency/total # of data values
Location formula=
loc= (n+1)/2
Sample Mean formula
x̄ =Σx/n (Σx= add up all the data values, "n" is the # of data values in the set.)
Sample Mean symbol
x̅
Population Mean symbol
μ (mu) -We will not have to calculate this.
Population Standard Deviation symbol
σ (sigma)
Population Variance symbol
σ² (sigma squared)
(Clu/Conv/Sim/Ran/Str/Sys?): Select 5 zip codes at random and include every apartment unit in the selected zip codes.
Cluster sampling
Cumulative Frequency= ???
# of data values in that class AND the previous classes (running total)
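Relative and cumulative frequency can both be computed from a list of class frequencies. The sketch below uses made-up frequencies for four classes; only the two definitions come from these notes.

```python
from itertools import accumulate

class_frequencies = [3, 7, 6, 4]    # hypothetical counts for four classes
n = sum(class_frequencies)           # total number of data values (20)

# Relative frequency = frequency / total number of data values
relative = [f / n for f in class_frequencies]

# Cumulative frequency = running total of this class and all previous classes
cumulative = list(accumulate(class_frequencies))

print(relative)      # [0.15, 0.35, 0.3, 0.2]
print(cumulative)    # [3, 10, 16, 20]
```

The relative frequencies add up to 1, and the last cumulative frequency equals n.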
the p^th percentile
-A value such that p% of the data fall AT or BELOW that value. -(1 ≤ p ≤ 99)
Chebyshev's Theorem tells us that:
-At least 75% of the data values fall within 2 standard deviations of the mean: x̅+/-2s --> (x̅-2s, x̅+2s) -At least 88.9% of the data values fall within 3 standard deviations of the mean: x̅+/-3s--> (x̅-3s, x̅+3s) -At least 93.8% of the data values fall within 4 standard deviations of the mean: x̅+/-4s--> (x̅-4s, x̅+4s)
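The three percentages above all come from the general Chebyshev bound: at least 1 - 1/k² of the data lie within k standard deviations of the mean, for any k > 1. That general form is not written out in these notes; the sketch below (with the made-up name `chebyshev_lower_bound`) shows that it reproduces the values for k = 2, 3, and 4.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    # Prints the bound as a percentage rounded to one decimal place
    print(k, f"{chebyshev_lower_bound(k):.1%}")
```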
Stratified Sampling
-Divide the entire population into distinct subgroups called strata. Draw random samples from each stratum. -sampling SOME from ALL of the groups. (kinda the opposite of cluster sampling)
Cluster Sampling
-Divide the entire population into pre-existing segments or clusters (usually geographic). Make a random selection of clusters. Include every member of each selected cluster in the sample. -sampling ALL from SOME of the groups. (kinda the opposite of stratified sampling)
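The "SOME from ALL" versus "ALL from SOME" contrast can be sketched in Python. The population below (apartment IDs grouped by zip code) is invented for illustration, and the fixed seed is only there to make the run repeatable.

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Hypothetical population: apartment IDs grouped by zip code
groups = {
    "zip_A": ["A1", "A2", "A3", "A4"],
    "zip_B": ["B1", "B2", "B3", "B4"],
    "zip_C": ["C1", "C2", "C3", "C4"],
}

# Stratified: draw SOME members (here 2) at random from ALL groups
stratified = [unit for members in groups.values()
              for unit in rng.sample(members, 2)]

# Cluster: pick SOME groups (here 1) at random, then take ALL of their members
chosen = rng.sample(list(groups), 1)
cluster = [unit for name in chosen for unit in groups[name]]

print(stratified)   # two IDs from each of the three groups
print(cluster)      # all four IDs from the one chosen group
```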
Box and Whisker Plot
-Know how to draw one.
Ordinal
-Qualitative. -Data that can be arranged in order, differences between data values either cannot be determined or are meaningless.
Nominal
-Qualitative. -Data that consists of names, labels, or categories. There are no implied criteria by which the data can be ordered from smallest to largest.
Interval
-Quantitative. -Data that can be arranged in order, difference between data values are meaningful. -No natural 0 starting point.
Ratio
-Quantitative. -Data that can be arranged in order. Both differences between data values and ratios of data values are meaningful. Data at the ratio level have a true zero.
Class Limits:
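-Start with the smallest data value as the first lower class limit (LCL) and add the class width (CW) to get each following lower class limit. For whole-number data, each upper class limit is LCL + CW - 1.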
Population Data
-The complete collection of all elements (scores, people, measurements, and so on) to be studied; the collection is complete in that it includes data from EVERY individual or object of interest. -The "bigger picture" that samples are drawn from.
Coefficient of Variation
-The ratio of the standard deviation to the mean. This calculation gives us a way to describe the distribution of the variable in a way that doesn't depend on the measurement unit. -CV=standard deviation/mean
Sample standard deviation...
-is based on the difference between each data value and the mean of the data set. -gives an average of the data spread about the mean. -The larger the standard deviation, the more spread out the data values are from the mean. -A smaller standard deviation indicates that the data tend to be closer to the mean.
Range tells us...
-the difference between the highest value and the lowest value. -about the spread of the data, but it does NOT tell us if most of the data is or is not closer to the mean.
How to compute a 5% trimmed mean:
1. Order the data from smallest to largest. 2. Delete the bottom 5% of the data and the top 5% of the data. (ex. for 100 data values....100(0.05)=5....remove the 5 smallest #'s and the 5 largest #'s.) 3. Compute the mean of the remaining 90% of the data.
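The three steps above can be sketched as a small Python function. The name `trimmed_mean` and the sample data are made up; when 5% of n is not a whole number, this sketch simply drops the fractional part, which is an assumption (some textbooks round to the nearest whole number instead).

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent`% of the values from each end of the sorted data."""
    ordered = sorted(data)                     # step 1: order the data
    cut = int(len(ordered) * percent / 100)    # how many values to drop from each end
    kept = ordered[cut:len(ordered) - cut]     # step 2: delete bottom and top
    return sum(kept) / len(kept)               # step 3: mean of what remains

data = list(range(1, 101))       # hypothetical data: 1, 2, ..., 100
print(trimmed_mean(data, 5))     # 50.5 (mean of the 90 values 6 through 95)

with_outlier = list(range(1, 20)) + [1000]   # 20 values, one extreme
print(trimmed_mean(with_outlier, 5))         # 10.5, the value 1000 is trimmed away
```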
Procedure to Compute Quartiles:
1. Order the data smallest to largest. 2. Find the median. This is the 2nd quartile. 3. The first quartile (Q1) is the median of the lower half of the data. 4. The third quartile (Q3) is the median of the upper half of the data.
Ex. You scored a 90% on the 1st statistics test. This was the 82nd percentile. What percentage of the scores are at or below yours?
82%
Population Parameter
A numerical measure that describes an aspect of a population.
Sample Statistic
A numerical measure that describes an aspect of a sample.
Random Sample
A subset of a population is selected such that each individual from the population has an equal chance of being selected.
Simple Random Sample
A subset of the population is selected such that every sample of size n from the population has an equal chance of being selected.
Outlier
A value that is very different from other measurements in the data set.
(Σx)²= ???
Add the data values, then square the answer.
Weighted Average
Assigns levels of importance, or weight, to some numbers (ex: to calculate grades)
Coefficient of Variation formula
CV= standard deviation/mean - (on HW, not on test.)
Class Width(CW)= ??? (give formula)
CW= (largest-smallest)/# of classes
Convenience Sampling
Create a sample by using data from population members that are readily available.
Class Boundaries:
Lower class limit (LCL) -0.5, Upper class limit (UCL) +0.5. Continue for each class limit set.
5-Number Summary
Lowest value, Q1, Q2, Q3, Highest Value
Midpoint= ??? (give formula)
Midpoint= (lower class limit + upper class limit)/2
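Class width, class limits, class boundaries, and midpoints fit together as in the Python sketch below. The smallest value, largest value, and number of classes are invented, and rounding the class width up to a whole number is an assumption that is common in textbooks but not stated on these cards.

```python
import math

smallest, largest, num_classes = 1, 47, 6    # hypothetical whole-number data

# CW = (largest - smallest) / # of classes; assumption: round up to a whole number
class_width = math.ceil((largest - smallest) / num_classes)

classes = []
for i in range(num_classes):
    lcl = smallest + i * class_width         # lower class limit
    ucl = lcl + class_width - 1              # upper class limit (LCL + CW - 1)
    classes.append({
        "limits": (lcl, ucl),
        "boundaries": (lcl - 0.5, ucl + 0.5),   # LCL - 0.5, UCL + 0.5
        "midpoint": (lcl + ucl) / 2,            # (LCL + UCL) / 2
    })

print(class_width)    # 8
for c in classes:
    print(c)
```

With these numbers the first class has limits 1 to 8, boundaries 0.5 to 8.5, and midpoint 4.5.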
Does the median change when you trim the mean?
No.
For a set population, does a parameter ever change?
No.
(Nom/Ord/Int/Rat?): The subjects in which college students major.
Nominal
Systematic Sampling
Number all members of the population sequentially. Then, from a starting point selected at random, include every kth member of the population in the sample.
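A minimal sketch of systematic sampling in Python, using a group of 200 students and every 18th member (k = 18). The numbering scheme and the fixed seed are assumptions made for the illustration.

```python
import random

rng = random.Random(1)              # fixed seed so the sketch is repeatable

population = list(range(1, 201))    # students numbered 1 through 200
k = 18                              # take every 18th member

start = rng.randrange(k)            # random starting index among the first k members
sample = population[start::k]       # then every kth member after that

print(sample)
```

Depending on the random start, the sample has 11 or 12 students, and consecutive members are always 18 apart.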
(Nom/Ord/Int/Rat?): Survey responses of "satisfied, unsatisfied, undecided."
Ordinal
Frequency Distribution/Table
Partitions data into classes or intervals and shows how many data values are in each class. The classes or intervals are constructed so that each data value falls into exactly one class. The frequency table displays each data class along with the number (frequency) of data in that class.
Which quartile is the median of the lower half of the data?
Q1
Which quartile is the median?
Q2
Which quartile is the median of the upper half of the data?
Q3
(Nom/Ord/Int/Rat?): Salaries of employees at the Humane Society.
Ratio
Sample Standard Deviation symbol
s - (a larger standard deviation means more variability in the data.)
(Clu/Conv/Sim/Ran/Str/Sys?): A sample consists of every 18th student from a group of 200 students.
Systematic
(Clu/Conv/Sim/Ran/Str/Sys?): Call every 5th apartment complex listed in the yellow pages and record the rent of the unit.
Systematic sampling
Sample Data
The collection of elements (scores, people, measurements, and so on) from ONLY SOME of the individuals of interest.
Population Parameters: (different symbols for population data)...
The formulas for Population Mean, Population Variance, & Population Standard Deviation are the same as above, except... -N is used instead of n-1 in the denominator of σ² -The notation (symbols) is different.
Trimmed Mean
The mean of the data values left after "trimming" a specified percentage of the smallest and largest data values from the data set.
Variance and Standard Deviation
Measures of the spread of data around the mean.
n= ???
The number of data values.
Quartiles
The percentiles that divide the data into fourths.
Interquartile Range (IQR)
The spread of the middle half of the data: IQR = Q3 - Q1. (middle 50%)
Statistics
The study of how to collect, organize, analyze, and interpret numerical information from data.
If there are 3 different samples of the same size from a set population, is it possible to get 3 different values for the same statistic? Explain.
Yes. Different samples from the same population usually contain different individuals, so the value of a statistic can change from sample to sample.
Ex. Consider the following cotinine levels of 40 smokers: 0 1 1 3 17 32 35 44 48 86 87 103 112 121 123 130 131 149 164 167 173 173 198 208 210 222 227 234 245 250 253 265 266 277 284 289 290 313 477 491. a) Find the quartiles (Q1, Q2, & Q3) b) What is the 25th percentile? c) What is the median? d) What is the 75th percentile? e) What does the 75th percentile mean?
a) Q1= (86+87)/2= 86.5; Q2= (167+173)/2= 170; Q3= (250+253)/2= 251.5 b) Q1= 86.5 c) Q2= 170 d) Q3= 251.5 e) 75% of the smokers have cotinine levels at or below 251.5.
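The quartile answers for the cotinine data can be checked with a short Python sketch of the median-of-halves procedure. The function name `quartiles` is made up; for an odd number of values this sketch leaves the median out of both halves, which is an assumption about the convention in use.

```python
from statistics import median

cotinine = [0, 1, 1, 3, 17, 32, 35, 44, 48, 86,
            87, 103, 112, 121, 123, 130, 131, 149, 164, 167,
            173, 173, 198, 208, 210, 222, 227, 234, 245, 250,
            253, 265, 266, 277, 284, 289, 290, 313, 477, 491]

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                    # lower half of the data
    upper = ordered[len(ordered) - half:]     # upper half of the data
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles(cotinine)
iqr = q3 - q1                                 # interquartile range
five_number = (min(cotinine), q1, q2, q3, max(cotinine))

print(q1, q2, q3)     # 86.5 170.0 251.5
print(iqr)            # 165.0
print(five_number)    # (0, 86.5, 170.0, 251.5, 491)
```

The same three quartiles, together with the lowest and highest values, are what a box-and-whisker plot is drawn from.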
Ex. The table below represents the calorie count of 22 ice cream bars. [Table 3-6] Calories in Vanilla-Flavored Ice Cream Bars [Table 3-7] Ordered Data
n=22 1. Arrange the data values in order. 2. Find the quartiles. 3. Determine the interquartile range.
(Statistic or Parameter?) The average score on the GRE for all Rutgers applicants.
population parameter
(Statistic or Parameter?) The average score on the GRE for all U.S. students.
population parameter
Examples of "Ratio"
price of a textbook, weight, age, cumulative GPA, amount of fat (in grams) in cookies, etc.
Sample Standard Deviation formula
s= √(s²) (the square root of the sample variance)
(Statistic or Parameter?) The average score on the GRE for a random sample of California residents.
sample statistic
Sample Variance symbol
s²
Sample Variance formula
s²= [Σx² - (Σx)²/n] / (n-1)
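A short Python sketch of the sample variance formula, with Σx² and (Σx)² computed separately since the two are easy to mix up. The data set is made up for illustration; the standard deviation and coefficient of variation follow from the definitions elsewhere in these notes.

```python
import math

def sample_variance(data):
    """s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)                      # Σx: add up the data values
    sum_x_sq = sum(x * x for x in data)    # Σx²: square each value, then add
    return (sum_x_sq - sum_x**2 / n) / (n - 1)

data = [2, 4, 4, 6, 9]                     # hypothetical sample, n = 5

print(sum(x * x for x in data))            # 153  (Σx²)
print(sum(data) ** 2)                      # 625  ((Σx)²: add first, then square)

s_squared = sample_variance(data)          # (153 - 625/5) / 4
s = math.sqrt(s_squared)                   # sample standard deviation
mean = sum(data) / len(data)               # sample mean
cv = s / mean                              # coefficient of variation

print(s_squared)                           # 7.0
print(s, cv)
```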
Frequency= ??? (give formula)
the number of data values that fall in that class.