Statistics
Q1
- find median of lower half - 25% of observations smaller, 75% larger
numerical data organized
- ordered array - frequency distribution - cumulative distribution
numerical data with two variables
- scatter plot - time series plot
to find scores within 1 SD above and below the mean
- add the SD to the mean for the upper bound - subtract the SD from the mean for the lower bound (mean ± 1 SD)
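As a rough illustration of this card, the band within one standard deviation of the mean could be computed as below. The scores are made up, and `pstdev` treats them as a whole population, which is an assumption; `stdev` would give the sample version.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up scores for illustration

m = mean(scores)       # centre of the distribution
sd = pstdev(scores)    # population SD (an assumption; stdev() is the sample SD)

low, high = m - sd, m + sd                        # mean - 1 SD, mean + 1 SD
within = [x for x in scores if low <= x <= high]  # scores inside the band
print(low, high, within)
```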
categorical data is organized
- summary table (1 variable) - contingency table (2 variables)
five number summary
- used in boxplots - min, Q1, Q2 (median), Q3, max
frequency distribution
1. sort data in ascending order 2. find the range 3. select the number of classes 4. find the class interval width (range / number of classes, then round up)
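The four steps can be sketched roughly as follows. The function name `frequency_distribution` and the data are hypothetical, the values are assumed to be integers that are not all equal, and placing the maximum in the last class is a design choice made here, not something the card specifies.

```python
import math

def frequency_distribution(values, num_classes):
    data = sorted(values)                          # step 1: ascending order
    data_range = data[-1] - data[0]                # step 2: range
    width = math.ceil(data_range / num_classes)    # step 4: interval width, rounded up
    counts = [0] * num_classes                     # step 3: one counter per class
    for v in data:
        # Clamp so the maximum lands in the last class (a choice made for this sketch).
        index = min((v - data[0]) // width, num_classes - 1)
        counts[index] += 1
    low = data[0]
    # Each row is (lower bound, upper bound, frequency).
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(num_classes)]

table = frequency_distribution([12, 15, 21, 24, 28, 33, 35, 41, 47, 50], 4)
for lower, upper, count in table:
    print(f"{lower} to under {upper}: {count}")
```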
z score
1. subtract the mean from each data point in the distribution 2. divide the result by the standard deviation 3. a value is an outlier if its z score is greater than 3 or less than -3
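A minimal sketch of the z score steps, assuming the data are a sample (so the sample standard deviation is used). The helper name `z_scores` and the data set are made up for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    m = mean(values)
    sd = stdev(values)   # sample SD (assumption); pstdev() would suit a population
    # Steps 1 and 2: subtract the mean, then divide by the standard deviation.
    return [(x - m) / sd for x in values]

data = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51, 49, 50, 52, 48, 50, 90]

# Step 3: flag values whose z score is above 3 or below -3.
outliers = [x for x, z in zip(data, z_scores(data)) if z > 3 or z < -3]
print(outliers)   # [90]
```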
simple random sample
every member of the population has a known and equal chance of selection
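One possible way to draw a simple random sample in Python is `random.sample`, which picks without replacement so that every member is equally likely. The numbered frame of 100 members and the sample size of 10 are assumptions for the example.

```python
import random

population = list(range(1, 101))        # hypothetical frame: members numbered 1..100
sample = random.sample(population, 10)  # equal chance for every member, no repeats
print(sorted(sample))
```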
secondary data
information that already exists somewhere, having been collected for another purpose
stem
leading digits
primary data sources
Data that you gather yourself - Telephone surveys, online surveys, mail surveys, and personal interviews are the most common forms
DCOVA
Define, Collect, Organize, Visualize, Analyze
measures of central tendency
mean, median, mode
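For a quick check of the three measures, Python's standard `statistics` module can be used. The scores below are made up; `multimode` is used rather than `mode` because a distribution may have more than one mode.

```python
from statistics import mean, median, multimode

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up scores

avg = mean(scores)          # sum of the scores divided by how many there are
middle = median(scores)     # midpoint of the two central scores, 4 and 5
modes = multimode(scores)   # list of the most frequent score(s)
print(avg, middle, modes)   # 5 4.5 [4]
```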
sample
portion of the population, "small group"
categorical data
qualitative - answers are categories rather than numbers, e.g. do you have a Facebook account? (yes, no), or a color (blue, green, brown)
numerical
quantitative, values that represent a counted or measured quantity
Correlation Coefficient (from covariance)
r = covariance / (square root of variance of X times square root of variance of Y) 1. input the covariance 2. divide by the product of the square roots of the variances of X and Y (that is, the two standard deviations)
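The formula for r could be sketched like this. The function name `correlation` and the paired data are hypothetical, and the sample versions (dividing by n-1) of the covariance and variances are assumed; the n-1 factors cancel, so the population versions give the same r.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Sample covariance and sample variances (n - 1 in the denominator).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    # r = covariance / (sqrt(var X) * sqrt(var Y))
    return cov / (sqrt(var_x) * sqrt(var_y))

r = correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(r, 6))   # 1.0, since Y is exactly 2 * X
```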
measures of variation
range, variance, standard deviation, coefficient of variation
Interquartile range
Q3-Q1
range
the difference between the highest and lowest scores in a distribution
Median
the middle score in a distribution; half the scores are above it and half are below it
Mode
the most frequently occurring score(s) in a distribution
leaf
trailing digits
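A small sketch of how stems and leaves split a value, assuming non-negative integers where the last digit is the leaf and everything before it is the stem. The function name `stem_and_leaf` and the data are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = v // 10, v % 10   # leading digits, trailing digit
        display[stem].append(leaf)
    return dict(display)

rows = stem_and_leaf([12, 15, 21, 24, 28, 33])
for stem, leaves in rows.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```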
Q2
- Find median of distribution - 50% smaller, 50% bigger
Q3
- Median of upper half of distribution - 25% of observations are greater
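The Q1, Q2 and Q3 cards can be sketched together as below. This version leaves the middle value out of both halves when the number of observations is odd, which is one common convention and an assumption here; other textbooks and software use slightly different rules. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                # lower half
    upper = data[half + 1:] if n % 2 else data[half:]  # upper half (skip the middle if n is odd)
    return median(lower), median(data), median(upper)  # Q1, Q2, Q3

q1, q2, q3 = quartiles([7, 1, 3, 9, 5, 11, 13])
print(q1, q2, q3)   # 3 7 11
print(q3 - q1)      # interquartile range: 8
```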
systematic sample
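- a sample drawn by selecting individuals systematically from a sampling frame - k = N/n, where N is the frame size and n is the sample size - pick a random start among the first k items, then take every kth item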
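A rough sketch of systematic sampling, assuming the frame is a list, the sample size is no larger than the frame, and k is rounded down to a whole number. The function name `systematic_sample` and the frame of 100 numbered members are hypothetical.

```python
import random

def systematic_sample(frame, n):
    k = len(frame) // n          # k = N / n, rounded down
    start = random.randrange(k)  # random start among the first k items
    # Take every kth item from the random start.
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))      # hypothetical frame: members numbered 1..100
picked = systematic_sample(frame, 10)
print(picked)
```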
variance
1. calculate the mean of the distribution 2. subtract the mean from each number in the distribution 3. square each difference 4. add the squares together, then divide by n-1 for a sample or by n for a population
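The variance steps translate fairly directly into code. This is a minimal sketch; the function name `variance` and its `sample` flag are made up for illustration, and the standard library's `statistics.variance` and `statistics.pvariance` do the same job.

```python
def variance(values, sample=True):
    n = len(values)
    m = sum(values) / n                   # step 1: mean
    diffs = [x - m for x in values]       # step 2: subtract the mean
    squares = [d ** 2 for d in diffs]     # step 3: square each difference
    divisor = n - 1 if sample else n      # step 4: n - 1 for a sample, n for a population
    return sum(squares) / divisor

scores = [2, 4, 4, 4, 5, 5, 7, 9]         # made-up scores
print(variance(scores, sample=False))     # 4.0
print(variance(scores))                   # 32 / 7, about 4.57
```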
coefficient of variation
1. divide the standard deviation by the mean 2. multiply by 100 to get %
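As a sketch of the two steps, assuming the data are a sample (so `stdev` is used) and the mean is not zero. The function name `coefficient_of_variation` and the scores are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    # Step 1: SD divided by the mean; step 2: times 100 to get a percentage.
    return stdev(values) / mean(values) * 100

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up scores
cv = coefficient_of_variation(scores)
print(f"{cv:.1f}%")
```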
covariance
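1. find the mean of X and of Y (the variances are only needed later, for the correlation coefficient) 2. multiply each X by its paired Y 3. add the XY products together and divide by n 4. subtract (mean X)(mean Y) - this gives the population covariance; for a sample, add up (X - mean X)(Y - mean Y) over all pairs and divide by n-1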
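The steps on this card divide by n, which yields the population covariance. The sketch below follows them and, for comparison, also shows a sample version that divides by n-1 in the same way the variance card does. Both function names and the paired data are made up.

```python
from statistics import mean

def population_covariance(xs, ys):
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n   # steps 2 and 3: average of the XY products
    return mean_xy - mean(xs) * mean(ys)               # step 4: subtract (mean X)(mean Y)

def sample_covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    # Sum of paired deviations, divided by n - 1.
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

xs, ys = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
print(population_covariance(xs, ys))   # 4.0
print(sample_covariance(xs, ys))       # 5.0
```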
population
all the items about which you want to draw a conclusion
numerical discrete data
arise from a counting process, e.g. number of kids
numerical continuous data
arise from a measuring process, e.g. how long something will take, weight
mean
the arithmetic average: the sum of all values divided by the number of values
Standard deviation
take the square root of the variance
no skew
when median and mean are equal to each other
right skew
when the mean is greater than the mode/median
Left Skew
when the mean is lower than the mode/median
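The three skew cards can be turned into a simple check that compares the mean with the median. This is only a rule-of-thumb sketch: it ignores the mode, it tests for exact equality, and the function name `skew_direction` and the data are made up.

```python
from statistics import mean, median

def skew_direction(values):
    m, med = mean(values), median(values)
    if m > med:
        return "right skew"   # mean pulled up by large values
    if m < med:
        return "left skew"    # mean pulled down by small values
    return "no skew"          # mean and median are equal

print(skew_direction([1, 2, 3, 4, 100]))     # right skew
print(skew_direction([1, 97, 98, 99, 100]))  # left skew
print(skew_direction([1, 2, 3, 4, 5]))       # no skew
```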