Ch 10: Descriptive Statistics
Data Transformation
-collapsing/combining similar categories -used to make the data easier to understand ex: "agree" and "strongly agree" get combined into one category
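A minimal sketch of collapsing categories, assuming a 5-point agreement scale. The responses and the `COLLAPSE` mapping are made up for illustration.

```python
from collections import Counter

# Hypothetical survey answers on a 5-point agreement scale (made-up data).
responses = [
    "strongly agree", "agree", "neutral", "disagree",
    "agree", "strongly disagree", "strongly agree", "agree",
]

# Illustrative mapping: each original category -> its collapsed category.
COLLAPSE = {
    "strongly agree": "agree",
    "agree": "agree",
    "neutral": "neutral",
    "disagree": "disagree",
    "strongly disagree": "disagree",
}

# Tally answers after collapsing similar categories together.
collapsed = Counter(COLLAPSE[answer] for answer in responses)
print(collapsed)  # agree: 5, disagree: 2, neutral: 1
```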
Descriptive statistics: step two
frequency distribution -reported in percentages, not raw counts -determine which values each attribute takes -allows you to draw meaning from data ex: values: male and female; raw counts: male (100) and female (300); translated to percentages: male (25% of sample) and female (75% of sample)
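The male/female example above can be reproduced in a few lines. This is only a sketch; the variable names are illustrative.

```python
# Raw counts from the example: 100 male and 300 female respondents.
counts = {"male": 100, "female": 300}

total = sum(counts.values())  # 400 respondents in the sample

# Translate each raw count into a percentage of the whole sample.
percentages = {value: 100 * n / total for value, n in counts.items()}
print(percentages)  # {'male': 25.0, 'female': 75.0}
```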
frequency distribution
the percentage of responses that fall in each category
Nominal level of measurement (data)
only mode is meaningful
Measures of Central Tendency: mode
(can get from ALL measurement level data - nominal, ordinal, interval AND ratio) ADV: HELPS describe data by providing... -most frequently occurring number/answer (= mode) -isn't affected by outliers -shows the most typical answer, which can describe skewed data better than the mean DISADV: -weak form of descriptive data (used mainly for nominal data) -doesn't tell you about outliers, variance (distribution), mean, or standard deviation ex: do you think online news sources are credible or NOT credible? (nominal) -25% credible; 75% not credible - doesn't give you much info... if it were a SCALE question (e.g. 1-10) you'd get more info.
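A small sketch of finding the mode for nominal data, using the credible / not credible split from the example. The sample size of 100 is an assumption made so the percentages come out evenly.

```python
from statistics import mode

# Assumed sample of 100 answers matching the 25% / 75% split in the example.
answers = ["credible"] * 25 + ["not credible"] * 75

# The mode works on nominal (non-numeric) data: it is just the most common answer.
most_common_answer = mode(answers)
print(most_common_answer)  # not credible
```

Note that the mode alone says nothing about how lopsided the split is, which is the weakness listed above.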
Measures of Dispersion/Variability: variance
(can get from only INTERVAL and RATIO measurement level data) = how spread out the scores are around the mean (the average of the squared differences between each score and the mean) ADV: HELPS describe data by... -large variance = a lot of different answers/opinions and widely scattered scores -small variance = similar answers that sit close to the average.
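To make "small vs. large variance" concrete, here is a sketch that computes the variance by hand for two made-up sets of scores with the same mean. It uses the population form (divide by n); a sample variance would divide by n - 1 instead.

```python
from statistics import mean, pvariance

def population_variance(scores):
    """Average of the squared differences between each score and the mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

# Two made-up sets of scores, both with a mean of 5.
similar = [4, 5, 5, 5, 6]    # answers close to the average
scattered = [1, 3, 5, 7, 9]  # answers widely scattered

var_similar = population_variance(similar)      # 0.4 (small variance)
var_scattered = population_variance(scattered)  # 8.0 (large variance)

# The standard library's pvariance gives the same population variance.
print(var_similar, pvariance(similar))
print(var_scattered, pvariance(scattered))
```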
Measures of Dispersion/Variability: standard deviation
(can get from only INTERVAL and RATIO measurement level data) = square root of the variance ADV: HELPS describe data by... -summarizing the variance in the same units as the original scores - the higher the standard deviation, the more varied the data. -allows for easy interpretation
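A quick sketch of how standard deviation relates to variance: it is the square root of the variance, which puts it back in the same units as the scores. The scores are made up, and the population form is used.

```python
import math
from statistics import pstdev, pvariance

# Made-up scores for illustration.
scores = [1, 3, 5, 7, 9]

variance = pvariance(scores)   # population variance: 8
std_dev = math.sqrt(variance)  # about 2.83, in the same units as the scores

# pstdev computes the same population standard deviation directly.
print(std_dev, pstdev(scores))
```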
Measures of Central Tendency: median
(can get from ORDINAL, INTERVAL and RATIO measurement level data) ADV: HELPS describe data by providing... -mid point (= median) -extreme outliers don't affect the median as much as they do the mean. ex: ages of freshmen entering BU -child prodigy (age 14) would pull the mean down; they are an OUTLIER
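The freshman-age example can be sketched with made-up ages to show how one outlier moves the mean but not the median.

```python
from statistics import mean, median

# Made-up ages of entering freshmen, without and with a 14-year-old prodigy.
typical_ages = [18, 18, 18, 18, 19, 19]
with_prodigy = typical_ages + [14]

# The outlier pulls the mean down but leaves the median where it was.
print(mean(typical_ages), median(typical_ages))  # about 18.33, then 18.0
print(mean(with_prodigy), median(with_prodigy))  # about 17.71, then 18
```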
Measures of Central Tendency: mean
(can get from only INTERVAL and RATIO measurement level data) ADV: HELPS describe data by... -only measure of central tendency that can be defined algebraically (sum of scores divided by number of scores) DISADV: -outliers pull the mean in their direction (skewing data) -avg. alone doesn't provide much information (doesn't display data distribution) -doesn't tell you the sample size. ex: average test score = 80. unknown: -how many people took the test -distribution (outliers)
Measures of Dispersion/Variability: range
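(can get from only INTERVAL and RATIO measurement level data) range = highest score - lowest score ADV: HELPS describe data by... -giving the overall spread -marking the outer bounds of the data (acts like bookends) DISADV: -the range alone doesn't tell you what the highest and lowest answers were -unknown distribution WITHIN the range -set entirely by the two most extreme scores, so a single outlier inflates it -range tends to increase with sample size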
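A minimal sketch of the range formula with made-up test scores. `score_range` is an illustrative helper name, not a standard function.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

# Made-up test scores.
test_scores = [70, 75, 80, 85, 90]
with_low_outlier = test_scores + [10]

print(score_range(test_scores))       # 20
print(score_range(with_low_outlier))  # 80
```

Because only the two extreme scores enter the formula, one unusual score changes the range a lot, and the range says nothing about how the scores in between are distributed.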
Inferential Statistics
(describes relationships between variables!) -based on probability theory (statistical significance) -formally tests hypotheses to see if relationships truly exist and if they're statistically significant. -does so using 2 types of analysis: 1. correlation analysis 2. multi-variate analysis
Descriptive Statistics (does what)
(helps describe variables!) -focus is on understanding MEANING BEHIND data, not just data itself. -describes and organizes numerical data -provides frequency distribution and graphical presentation of data (e.g. histograms; pie charts) -2 steps: 1. tabulation of data 2. frequency distribution
Statistical processing software
1. SAS 2. SPSS (easy; common for social science) 3. Excel
Measures of Central Tendency (know how they help describe data and with what level(s) of measurement can you calculate.)
1. mean 2. median 3. mode (all!!!)
Measures of Dispersion/Variability (know how they help describe data and with what level(s) of measurement can you calculate.)
1. range 2. variance 3. standard deviation
How to cross tab
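1. segment the entire sample population (the larger the sample, the more finely it can be segmented) 2. look for variation between 2 variables. 3. remember the rule of thumb: a 5 point difference is treated as statistically significant! (whether a difference is truly significant also depends on sample size)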
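The cross-tab steps above can be sketched as follows. The survey rows are made up, and `row_percent` is an illustrative helper; the 5-point comparison applies the rule of thumb from these notes, not a formal significance test.

```python
from collections import Counter

# Made-up survey rows: (gender, opinion) for 200 respondents.
rows = (
    [("male", "favorable")] * 30 + [("male", "unfavorable")] * 70
    + [("female", "favorable")] * 45 + [("female", "unfavorable")] * 55
)

# Step 1: segment the sample by the independent variable (gender).
group_totals = Counter(gender for gender, _ in rows)
cell_counts = Counter(rows)

def row_percent(gender, opinion):
    """Percentage of one gender segment that gave the given opinion."""
    return 100 * cell_counts[(gender, opinion)] / group_totals[gender]

# Step 2: look for variation between the two variables.
gap = row_percent("female", "favorable") - row_percent("male", "favorable")

# Step 3: apply the 5-point rule of thumb from the notes.
print(gap, abs(gap) >= 5)  # 15.0 True
```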
Elaboration analysis
a correlation analysis (regarding cross-tabs) -looks at a 3rd variable! ex: favorability by age AND gender
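A sketch of bringing in a third variable, following the "favorability by age AND gender" example. All rows are made up, and the two age groups are an assumption.

```python
from collections import Counter

# Made-up survey rows: (age_group, gender, opinion).
rows = (
    [("18-34", "male", "favorable")] * 20 + [("18-34", "male", "unfavorable")] * 20
    + [("18-34", "female", "favorable")] * 30 + [("18-34", "female", "unfavorable")] * 10
    + [("35+", "male", "favorable")] * 10 + [("35+", "male", "unfavorable")] * 30
    + [("35+", "female", "favorable")] * 12 + [("35+", "female", "unfavorable")] * 28
)

cells = Counter(rows)
# Segment by BOTH age group and gender instead of gender alone.
segment_totals = Counter((age, gender) for age, gender, _ in rows)

# Percent favorable inside each (age_group, gender) segment.
favorable_pct = {
    segment: 100 * cells[segment + ("favorable",)] / n
    for segment, n in segment_totals.items()
}

for segment, pct in favorable_pct.items():
    print(segment, pct)
```

In this made-up data the gender gap is 25 points among 18-34 year olds but only 5 points among those 35 and older, which is the kind of pattern a two-variable cross tab would hide.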
Cross Tabs
a correlation analysis and multi-variate analysis -every survey question is cross-tabbed -make a hypothesis about what you want to test, then choose an INDEP. VARIABLE to hold constant. ***rule of thumb: a 5 point difference is treated as statistically significant.
Interval and Ratio levels of measurement (data)
mean, median AND mode are all meaningful.
Multi-Variate analysis
method of Inferential Statistics -holding 1 or more variables constant, see which variable has the most impact on the others. (which independent variable is most strongly correlated with your dependent variable) ex: cross tabs
Correlation analysis
method of Inferential Statistics -studies how variables are correlated; relationships between them ex: cross tabs
Ordinal level of measurement (data)
mode AND median are meaningful (mean is not)
Descriptive statistics: step one
tabulation of data -organize data (in table, etc.) -tallying = when this is done by hand