chapter 11 - basic data analysis for quantitative research
chi-square value
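-chi-square = sum over all cells of (observed - expected)^2 / expected
-degrees of freedom (df): number of categories minus 1; for a cross-tabulation, (rows - 1) x (columns - 1)
-p-value: probability of obtaining a chi-square at least this large if H0 is true, i.e., the risk of incorrectly rejecting H0
-the larger the chi-square value, the more likely it is that the two variables are related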
cross-tabulation (cont.)
survey questions:
-do you think that the top speed of a motorcycle is an important consideration when purchasing?
-do you think that motorcycle manufacturers should continue to increase the top speed of their motorcycles?
-do you think that motorcycle manufacturers should provide safe riding education to their customers?
respondents:
-300 buyers (male and female)
-50 motorcycle dealers
-78 insurance companies
gender of survey respondents and "top speed is important":
                 M     F
important        292   190
not important    8     110
-males are more likely than females to think that top speed is important
buyers (B), dealers (D), and insurance companies (IC) and "safety education is important":
                 B     D     IC
important        197   10    60
not important    103   40    18
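worked example (a sketch, not part of the original notes): the chi-square value for the gender and top-speed counts above, computed by hand in plain Python. The variable names are illustrative, and df uses the cross-tabulation rule (rows - 1) x (columns - 1).

```python
# Observed counts from the gender / top-speed cross-tabulation above.
# Rows: male, female. Columns: "important", "not important".
observed = [
    [292, 8],
    [190, 110],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for a cell if gender and opinion were unrelated (H0):
# row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for a cross-tabulation: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

the result (about 109.75 with 1 df) is far above 3.84, the critical value at the 0.05 level for 1 df, which is consistent with the note that males and females differ on this question.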
measures of central tendency and dispersion
-central tendency: mean, median, mode
-dispersion: range, variance, standard deviation
mean
-the arithmetic average of the sample
-all values of a distribution of responses are summed and divided by the number of valid responses
standard deviation
-the average distance of the distribution values from the mean
-a quantity calculated to indicate the extent of deviation for a group as a whole
-tells you how tightly the values in a set of data are clustered around the mean
-affects the shape of a distribution curve
-square root of the variance
median
-the middle value of a rank-ordered distribution
-exactly half of the responses are above and half are below the median value
mode
-the most common value in the set of responses to a question
-the response most often given to a question
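quick illustration (a sketch with made-up data): mean, median, and mode as defined on the cards above, using Python's standard `statistics` module. The ratings are hypothetical responses on a 1-7 scale.

```python
import statistics

# Hypothetical responses on a 1-7 rating scale (illustrative data only).
responses = [5, 7, 3, 5, 6, 2, 5, 4, 6]

mean = statistics.mean(responses)      # sum of values / number of valid responses
median = statistics.median(responses)  # middle value after rank-ordering
mode = statistics.mode(responses)      # response given most often

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```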
analysis of variance (ANOVA)
-a statistical technique that determines whether three or more means are statistically different from one another
-the null hypothesis for ANOVA always states that there is no difference between the group means of the dependent variable: m1 = m2 = m3
t-test
-a statistical test to determine whether there is a statistically significant difference between the means of two groups
-uses H0 and H1
  -H0: standard assumption that there is no difference between variables (groups)
  -H1: a difference exists between variables (groups)
-assumptions:
  -randomly sampled from a defined population
  -interval or ratio scale
n-way ANOVA
-a type of ANOVA that can analyze several independent variables (groups) at the same time
-multiple independent variables (groups) in an ANOVA can act together to affect dependent variable group means
variance
average of the squared differences from the mean
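worked example (a sketch with made-up values): variance as the average squared difference from the mean, and standard deviation as its square root. This divides by n, matching the definition on the card; the sample variance reported by most software divides by n - 1 instead, which is what `statistics.variance` does.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = sum(values) / len(values)
squared_diffs = [(x - mean) ** 2 for x in values]

# Average of the squared differences from the mean (divides by n).
variance = sum(squared_diffs) / len(values)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# For comparison: the sample variance divides by n - 1.
sample_variance = statistics.variance(values)

print(f"variance = {variance}, standard deviation = {std_dev}")
print(f"sample variance (n - 1) = {sample_variance:.3f}")
```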
preparation of charts
charts and other visual communication approaches should be used whenever practical
-help information users quickly grasp the essence of the results developed in the data analysis
-can be an effective visual aid to enhance the communication process
-add clarity and impact to research reports and presentations
choosing the appropriate statistical technique
considerations that influence the choice of a particular technique:
-number of variables
-scale of measurement (nominal, ordinal, interval, ratio)
-parametric versus nonparametric statistics
  -parametric: normally distributed data assumed
  -nonparametric: normal distribution not assumed
chi-square analysis
-enables researchers to test for statistical significance between the frequency distributions of two or more nominally scaled variables in a cross-tabulation table, to determine if there is any association between the variables
-compares the observed frequencies of the responses with the expected frequencies
-referred to as a "goodness-of-fit" test
-uses H0 and H1
  -H0: standard assumption that there is no association between the variables
  -H1: an association exists between the variables
statistical analysis
every set of data collected needs some summary information developed that describes the numbers it contains
-central tendency and dispersion
-relationships of the sample data
-hypothesis testing
ANOVA
follow-up (post-hoc) tests:
-flag the means that are statistically different from each other
-performed after an ANOVA determines there are differences between means
choosing the appropriate statistical technique: interval or ratio
-measure of central tendency: mean
-measure of dispersion: standard deviation
-statistic: t-test, ANOVA (parametric)
choosing the appropriate statistical technique: ordinal
-measure of central tendency: median
-measure of dispersion: percentile
-statistic: chi-square (non-parametric)
choosing the appropriate statistical technique: nominal
-measure of central tendency: mode
-measure of dispersion: none
-statistic: chi-square (non-parametric)
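the three scale-of-measurement cards above can be read as a lookup table. A minimal sketch follows; the dictionary layout and the `recommend` helper are illustrative names, not part of the original notes.

```python
# Lookup table transcribed from the "choosing the appropriate statistical
# technique" cards; the structure and key names are illustrative.
INTERVAL_OR_RATIO = {
    "central_tendency": "mean",
    "dispersion": "standard deviation",
    "tests": ["t-test", "ANOVA"],
    "parametric": True,
}

TECHNIQUES = {
    "nominal": {
        "central_tendency": "mode",
        "dispersion": None,
        "tests": ["chi-square"],
        "parametric": False,
    },
    "ordinal": {
        "central_tendency": "median",
        "dispersion": "percentile",
        "tests": ["chi-square"],
        "parametric": False,
    },
    "interval": INTERVAL_OR_RATIO,
    "ratio": INTERVAL_OR_RATIO,
}


def recommend(scale):
    """Return the suggested summary measures and tests for a measurement scale."""
    return TECHNIQUES[scale.lower()]


print(recommend("ordinal"))
```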
types of t-tests
-one sample: compare one group to a fictional group with a hypothesized mean
-independent samples: two groups of responses that are tested as though they may come from different populations
-paired samples: two groups of responses that originated from the same population
1-tailed vs 2-tailed tests
one-tailed
-use if hypothesizing that one mean is higher than the other (hypothesizing direction)
-ex: mean IQ for females > mean IQ for males, or mean IQ for females < mean IQ for males
two-tailed
-use if hypothesizing that two means are different, without specifying a direction
-ex: mean IQ for females ≠ mean IQ for males
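illustration (a sketch under a stated assumption): how the same test statistic gives different p-values for one-tailed and two-tailed tests. This assumes a large sample, so the standard normal distribution stands in for the t distribution; the statistic 1.8 is a made-up value.

```python
from statistics import NormalDist

z = 1.8  # hypothetical test statistic (large-sample normal approximation)
std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed, H1: first mean > second mean (area in the upper tail only).
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed, H1: means are different (area in both tails).
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-tailed p = {p_one_tailed:.3f}")
print(f"two-tailed p = {p_two_tailed:.3f}")
```

with this statistic the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the hypothesis should be fixed before looking at the data.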
how to develop hypothesis
researchers have preliminary ideas regarding data relationships based on research objectives
-hypotheses: ideas derived by researchers from previous research, theory, and/or the current business situation
-developed prior to data collection, as a part of the research plan
sample statistics and population parameters
-sample statistics are useful in making inferences regarding the population's parameters
-population parameter: a variable or measured characteristic of the entire population
-ex: gender, language, marriage/divorce, education, average length of hospital stay for all infants born in the US
bivariate statistical tests
test hypotheses that compare the characteristics of two groups or two variables
types of bivariate hypothesis tests:
-cross-tabulation
-chi-square
-t-test: independent and paired samples
-analysis of variance (ANOVA)
-regression
range
the distance between the smallest and largest values in a set of responses
one sample t-test (univariate)
-used to compare one sample to a fictional population with a hypothesized mean
-want to ascertain whether the sample's mean is any different from the hypothesized mean
-ex: IQ of students at an elite university
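worked example (a sketch with made-up numbers): the one-sample t statistic, computed as (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n)). The IQ scores and the hypothesized mean of 100 are hypothetical.

```python
import math
import statistics

# Hypothetical IQ scores for a sample of students (illustrative data only).
sample = [118, 125, 130, 122, 127, 133, 121, 124]
hypothesized_mean = 100  # assumed comparison value

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Standard error of the mean, then the t statistic.
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - hypothesized_mean) / standard_error
df = n - 1

print(f"mean = {sample_mean}, t = {t_stat:.2f}, df = {df}")
```

the t statistic is then compared with a critical value from a t table for the given df to decide whether to reject H0.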
independent samples t-test
-used to compare two samples (groups) to determine if they came from the same population
-sample means are expected to deviate slightly from the population mean
-are the means so different that we can say that they are not from the same population?
-H0: males and females are from the same population
-H1: males and females are not from the same population
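worked example (a sketch, assuming equal variances in the two groups): the pooled-variance form of the independent samples t statistic. The function name `pooled_t` and the ratings are hypothetical; other variants (such as Welch's t-test) drop the equal-variance assumption.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Return (t statistic, df) for two independent samples, pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical importance ratings (illustrative data only).
males = [6, 7, 5, 6, 7, 6]
females = [4, 5, 3, 5, 4, 3]

t_stat, df = pooled_t(males, females)
print(f"t = {t_stat:.2f}, df = {df}")
```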
f-test
-used to evaluate the differences between the group means for statistical significance in ANOVA
-F-ratio = variance between groups / variance within groups
-larger F-ratios imply significant differences between the groups; the larger the F-ratio, the more likely it is that the null hypothesis will be rejected
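worked example (a sketch with made-up data): the F-ratio for a one-way ANOVA, computed as the between-groups variance divided by the within-groups variance. The function name `f_ratio` and the three groups are hypothetical.

```python
def f_ratio(groups):
    """Return the one-way ANOVA F-ratio for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between groups: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within groups: how far each value is from its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    variance_between = ss_between / (k - 1)
    variance_within = ss_within / (n_total - k)
    return variance_between / variance_within


# Three hypothetical groups of ratings (illustrative data only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_ratio(groups)
print(f"F = {f_value:.2f}")
```

here the between-groups variance is 13 and the within-groups variance is 1, so the F-ratio is 13; the group means (2, 3, 6) differ much more than the values inside each group do.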
univariate statistical tests
-used to test hypotheses when the researcher wishes to test a proposition about a sample characteristic against a known or given standard
-test used: one-sample t-test
-ex:
  -the new product or service will be preferred by 80% of our current customers
  -the average monthly electric bill in Miami exceeds $250
  -the market share for Community Coffee in south Louisiana is at least 70%
  -more than 50% of current Diet Coke customers will prefer the new Diet Coke that includes a lime taste
paired samples t-test
-used when testing for differences in two means for variables in the same sample
-used in "before-after" studies, or when the samples are matched pairs
-ex: employee training, math proficiency tests
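worked example (a sketch with made-up scores): the paired samples t statistic for a before-after study, computed on the per-person differences. The test scores are hypothetical.

```python
import math
import statistics

# Hypothetical proficiency scores for the same five employees,
# before and after training (illustrative data only).
before = [60, 65, 70, 58, 62]
after = [66, 70, 71, 65, 67]

# Work with one difference per matched pair.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation (n - 1)

# t = mean difference / standard error of the mean difference.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(f"mean difference = {mean_diff}, t = {t_stat:.2f}, df = {df}")
```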
cross-tabulation
-useful for examining relationships and reporting the findings for two variables: are two different variables related to each other?
-purpose is to determine if differences exist between subgroups of the total sample
-a frequency distribution of responses on two or more sets of variables
-responses for each of the groups are tabulated and compared
-used to summarize data
-categorical data only: categories are mutually exclusive, so each response belongs to only ONE group
-also called a contingency table
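illustration (a sketch with made-up responses): building a cross-tabulation by counting how many responses fall into each combination of two categorical variables. The respondent types and answers are hypothetical.

```python
from collections import Counter

# Hypothetical raw responses: (respondent type, answer). Each response
# belongs to exactly one cell of the table.
responses = [
    ("buyer", "important"), ("buyer", "not important"),
    ("dealer", "not important"), ("buyer", "important"),
    ("insurer", "important"), ("dealer", "important"),
    ("insurer", "important"), ("dealer", "not important"),
]

cell_counts = Counter(responses)
groups = sorted({group for group, _ in responses})
answers = sorted({answer for _, answer in responses})

# Rows are respondent types, columns are answers; missing cells count as 0.
table = {g: {a: cell_counts[(g, a)] for a in answers} for g in groups}

for g in groups:
    print(g, table[g])
```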