Marketing Research OSU
The exact probability of getting a computed test statistic that was largely due to chance is referred to as the ______________. a. p-value b. chi-square value c. z-test value d. median value e. none of the above
a. p-value
In a regression, we can test the predictive value of each variable in the model. If there are three variables in the model, what would be the null hypotheses? a. t1, t2, t3 = 0 b. t1 not= t2 not= t3 c. t1 - t2 - t3 = 0 d. t1 = t2 = t3
a. t1, t2, t3 = 0
Which of the following techniques is not appropriate when using categorical independent variables? a. regression analysis b. ANOVA c. independent sample t-test d. all are appropriate
d. all are appropriate
The change in the dependent variable given a one-unit change in the independent variable is expressed by the ________________. a. coefficient of determination b. correlation coefficient c. coefficient of covariation d. regression coefficient e. none of the above
d. regression coefficient
Y
dependent variable
What is the procedure to code open-ended questions?
generate a lengthy list of possible responses before coding
Coding
grouping and assigning numeric codes to various responses in a question
_________________ variables need to be dummy coded for regression
independent
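A minimal sketch of dummy coding, assuming a hypothetical three-level "region" variable. The function name, the data, and the choice of reference level are illustrative only; each remaining level becomes its own 0/1 column, and the reference level is the row of all zeros.

```python
def dummy_code(values, reference):
    """Return (levels, rows) of 0/1 dummy variables, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical categorical independent variable
regions = ["east", "west", "north", "west"]
levels, dummies = dummy_code(regions, reference="east")

for region, row in zip(regions, dummies):
    print(region, dict(zip(levels, row)))
```

A variable with k categories needs only k - 1 dummy columns, because the omitted category is already identified by zeros in every column.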
continuous DV and categorical IV
-independent sample t-test (if two groups) -ANOVA (if two or more groups) -regression (if dummy coded)
X
independent variable; used to model changes in Y
categorical DV and continuous IV
logistic regression
correlation coefficient
measure of association between two continuous variables
What is null/alternative hypothesis?
-null: the hypothesis to be tested; usually the one we do NOT want to be true; includes equality -alternative: what is believed to be true if the null hypothesis is false; does not include equality
one continuous variable
one-sample t-test
What is statistical significance? What is a p-value?
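-a result is statistically significant when its p-value falls below the chosen significance level (commonly 0.05) -the p-value is the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true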
two continuous variables
-paired t-test (if comparing means) -correlation (if testing association)
What is a hypothesis?
unproven proposition that explains a certain phenomenon AND is empirically testable
simple linear regression model
using one continuous variable to predict another continuous variable
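A minimal sketch of fitting a simple linear regression by ordinary least squares, assuming made-up price and unit-sales data. The function and variable names are hypothetical. The slope is the regression coefficient: the change in Y for a one-unit change in X.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: units sold fall as price rises
price = [1, 2, 3, 4]
units = [9, 7, 5, 3]

intercept, slope = fit_line(price, units)
predicted = intercept + slope * 5  # prediction for a price of 5

print(f"intercept={intercept:.2f}, slope={slope:.2f}, predicted={predicted:.2f}")
```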
multicollinearity
when your Xs (independent variables) are highly correlated with each other
range of correlation
from -1.0 to +1.0; +1 = perfect positive relation; -1 = perfect negative relation; 0 = NO linear relation
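A minimal sketch of the Pearson correlation coefficient computed by hand, assuming small made-up data sets chosen so that the two extremes of the range show up. The function and variable names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data
ad_spend = [1, 2, 3, 4, 5]
sales_up = [2, 4, 6, 8, 10]    # rises in lockstep with ad_spend
sales_down = [10, 8, 6, 4, 2]  # falls in lockstep with ad_spend

r_pos = pearson_r(ad_spend, sales_up)
r_neg = pearson_r(ad_spend, sales_down)
print(r_pos, r_neg)
```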
Chi-Square test
-two categorical variables -compares the frequency of one option across different groups
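A minimal sketch of the chi-square statistic for a table of counts, assuming a made-up 2x2 table (two groups, two answer options). The names are hypothetical, and the p-value shortcut below is valid only for 1 degree of freedom.

```python
import math

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are groups, columns are answer options
observed = [[30, 20],
            [20, 30]]

chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut that holds only when dof == 1
p_value = math.erfc(math.sqrt(chi2 / 2))
print(chi2, dof, round(p_value, 4))
```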
What are the steps in preparing the data?
-Validating the accuracy, clarity, and details of data is necessary to mitigate any project defects -Without validating data, you run the risk of basing decisions on data with imperfections that are not accurately representative of the situation at hand
When do we reject / fail to reject the null hypothesis? What does that mean?
-fail to reject the null when there is not enough evidence against it, p>0.05 (strictly speaking, the null is never "accepted") -reject the null when there is enough evidence against it, p<0.05
What is a normal distribution? What are its properties?
-all 3 measures of central tendency (mean, median, mode) are equal -bell shaped and symmetric
correlation
-analysis of the degree to which changes in one variable are associated with changes in another -statistical measure of covariation between two variables
adjusted R squared
-corrects for the number of independent variables -prefer using it over R squared if you have more than 4-5 variables, or if there is a discrepancy between R squared and adjusted R squared
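A minimal sketch of the usual adjusted R squared formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The function name and the input values are made up for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R squared for the number of independent variables in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Same hypothetical R squared and sample size, different model sizes
few = adjusted_r_squared(0.5, n_obs=21, n_predictors=4)
many = adjusted_r_squared(0.5, n_obs=21, n_predictors=10)
print(few, many)
```

With the same R squared, the model that uses more independent variables gets the lower adjusted value.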
Differences between correlation and regression
-correlation summarizes the DEGREE and DIRECTION of association with a single number -regression generates a mathematical function linking the variables -regression has INDEPENDENT and DEPENDENT variables -correlation does not make this distinction -regression can be used for PREDICTIONS -correlations can NOT BE used for predictions
How do we code closed-ended variables?
-for each question, assign different codes to different answers, including missing answers -across questions, assign the same code to similar answers
What are the types of graphical representations? When is a good time to use each?
-frequency distribution -histogram: rectangles with width proportional to range of value and height proportional to frequency -charts (pie, line, bar)
What is a frequency table? When is it used? How is it useful?
-frequency table shows the distribution of observations based on the options in a variable. -helpful to understand which options occur more or less often in the dataset
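A minimal sketch of a frequency table, assuming made-up answers to a single closed-ended question. The variable names are hypothetical.

```python
from collections import Counter

# Hypothetical answers to one survey question
answers = ["yes", "no", "yes", "yes", "no answer", "no"]

freq = Counter(answers)
n = len(answers)

# Map each option to (count, percent of all responses), most frequent first
table = {option: (count, round(100 * count / n, 1))
         for option, count in freq.most_common()}

for option, (count, pct) in table.items():
    print(f"{option:10s} {count:3d} {pct:5.1f}%")
```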
Why is validation and coding important? How do they help the later steps in data analysis?
-identify interviewer fraud, omissions, ambiguities, and errors in response -identify mistakes in the way data is written out by software
properties of R squared
-it increases (or at least stays the same) as more IVs are added -it can NOT decrease as more IVs are added
What are the measures of central tendency? What are the different types? How are they different from each other? When are they used? What types of scales allow for which measure? How are they useful?
-mean, median, mode -mean: simply the average of all the items in a sample -mode: the value that occurs most often -median: midpoint of the distribution -mean is susceptible to outliers -if all 3 are similar, the distribution is roughly symmetric (as in a normal distribution)
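A minimal sketch using Python's standard `statistics` module, assuming made-up 7-point ratings. It shows the three measures and how a single outlier pulls the mean while leaving the median in place.

```python
import statistics

# Hypothetical ratings, then the same ratings plus one extreme value
ratings = [3, 4, 4, 5, 4]
with_outlier = ratings + [40]

mean_before = statistics.mean(ratings)
median_before = statistics.median(ratings)
mode_before = statistics.mode(ratings)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before, mode_before)
print(mean_after, median_after)
```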
What is the difference between sample and population?
-population: entire group you want to draw conclusions about -sample: specific group that you will collect data from
multiple linear regression
-procedure for predicting the level or magnitude of a single dependent variable based on the levels of MULTIPLE independent variables
What is the relationship between coding and choice of data analyses?
Coding is a qualitative data analysis strategy in which some aspect of the data is assigned a descriptive label that allows the researcher to identify related content across the data
categorical DV and categorical IV
-logistic regression -chi-square
Simple (Bivariate) Regression
Analyzing the strength of relationship between the dependent and the independent variable (both continuous)
You are interested in whether your product is more positively evaluated by different ethnic groups (Caucasian, African American, Asian American, Latin American, Other). You ask your respondents to indicate their ethnicity and to indicate their evaluation of the product (on a 7-point scale). What test would you conduct to determine whether different ethnicities evaluate your product equally? a. paired t-test b. independent samples t-test c. Chi-square d. ANOVA e. regression
d. ANOVA
managerial implications arise from the size of the ______ coefficient
slope
How do we choose the test to use?
the choice of analyses method is directly influenced by the type of variables of interest
What are the measures of dispersion? What are the different types? How are they different from each other? When are they used? What types of scales allow for which measure? How are they useful?
-range, mean absolute deviation, standard deviation, variance -range: distance between the smallest and largest value in the set -deviation score: the difference between each observation value and the mean -mean absolute deviation: average of the absolute deviation scores -variance: average squared deviation from the mean -standard deviation: square root of variance -they describe the tightness of the distribution -even if the measures of central tendency are the same, the spread might vary
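A minimal sketch of the dispersion measures, assuming two made-up samples that share the same mean but differ in spread. The function name is hypothetical. Note that `statistics.variance` and `statistics.stdev` are the sample versions, which divide by n - 1.

```python
import statistics

def describe_spread(values):
    """Range, mean absolute deviation, sample variance and sample standard deviation."""
    mean = statistics.mean(values)
    return {
        "range": max(values) - min(values),
        "mean_abs_dev": sum(abs(v - mean) for v in values) / len(values),
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }

# Hypothetical samples: both have a mean of 5
tight = describe_spread([4, 5, 6])
wide = describe_spread([1, 5, 9])

print(tight)
print(wide)
```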
What are cross-tabs? How do we create them?
-simply data tables that present the results of the entire group of respondents, as well as results from subgroups of survey respondents -used to examine relationships within the data that might not be readily apparent when only looking at total survey responses
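A minimal sketch of building a cross-tab by counting pairs of answers, assuming made-up responses with a hypothetical subgroup variable (gender) and a hypothetical purchase-intent answer.

```python
from collections import Counter

# Hypothetical (subgroup, answer) pairs, one per respondent
responses = [
    ("male", "buy"), ("male", "pass"), ("female", "buy"),
    ("female", "buy"), ("male", "pass"), ("female", "pass"),
]

cells = Counter(responses)
rows = sorted({group for group, _ in responses})
cols = sorted({answer for _, answer in responses})

# Nested dict: crosstab[subgroup][answer] -> count
crosstab = {g: {c: cells[(g, c)] for c in cols} for g in rows}

print("        " + "  ".join(f"{c:>5s}" for c in cols))
for g in rows:
    print(f"{g:8s}" + "  ".join(f"{crosstab[g][c]:5d}" for c in cols))
```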
T-tests
-single group (one continuous variable) & compare mean to a number -multiple groups (one continuous and one categorical variable) & compare mean of one group to another -repeated variables (two continuous variables) & compare mean of two variables asked to the SAME person
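A minimal sketch of the t statistic for the single-group and repeated-measures cases, assuming made-up ratings. The function names are hypothetical. The paired test is treated here as a one-sample test on the within-person differences against zero. Only the statistic is computed; turning it into a p-value needs a t distribution with n - 1 degrees of freedom, which the standard library does not provide.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic comparing the sample mean to the fixed number mu0."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / standard_error

def paired_t(first, second):
    """t statistic for two measures taken from the SAME respondents."""
    differences = [b - a for a, b in zip(first, second)]
    return one_sample_t(differences, 0)

# Hypothetical data
ratings = [5, 6, 7, 6, 6]
t_single = one_sample_t(ratings, mu0=5)

before = [4, 5, 6, 5]
after = [5, 7, 7, 7]
t_paired = paired_t(before, after)

print(round(t_single, 3), round(t_paired, 3))
```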
ANOVA
-test for the difference among the means of two or more groups -test for interaction between the variables
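A minimal sketch of the one-way ANOVA F statistic (between-group variance divided by within-group variance), assuming three made-up groups of ratings. The function name is hypothetical, and the interaction test mentioned above is not covered because it needs a second factor.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical ratings from three groups
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

A large F means the group means differ by more than the spread inside the groups would suggest.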
What is descriptive statistics?
-used to describe the basic features of the data in a study -provide simple summaries about the sample and the measures -together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data
Stages in hypothesis testing
1. Formulate the null and alternative hypotheses 2. Choose the significance level 3. Compute the test-statistic 4. Prepare a statistical decision (p-value) 5. Make a statistical decision: reject or not reject 6. Make a managerial decision/interpretation
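The stages above can be sketched with a two-sided one-sample z-test. A z-test is swapped in for the t-test here because the standard library offers the normal distribution but not the t distribution, so the sketch assumes the population standard deviation is known. All numbers are made up.

```python
import math
from statistics import NormalDist

# 1. Hypotheses: H0 says mean rating = 4.0, H1 says mean rating != 4.0
mu0 = 4.0

# 2. Significance level
alpha = 0.05

# 3. Test statistic from hypothetical sample results
sample_mean = 4.3
sigma = 1.0   # assumed known population standard deviation
n = 100
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# 4. Two-sided p-value under the null hypothesis
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 5. Statistical decision
reject_null = p_value < alpha

# 6. The managerial interpretation is left to the analyst
print(f"z={z:.2f}, p={p_value:.4f}, reject H0: {reject_null}")
```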
Given an arch plot, which of the following provides the best description? a. perfect linear association b. strong non-linear association c. u-shape association d. no association e. weak negative association
b. strong non-linear association
You are interested in understanding the factors that contribute to your sales. You have measures of advertisement spending, whether a promotion was present or not and the price charged for your product. You want to see whether sales is affected by these marketing variables. Which analyses method would you use? a. ANOVA b. paired t-test c. multiple regression d. correlation e. none of the above
c. multiple regression
one categorical variable
chi-square
continuous DV and continuous IV
regression