Data and Analytics Chapter 12
In using pie charts to illustrate data in written reports, it is advisable not to exceed
5-6 slices: It's best to stick to a maximum of 5 or 6 slices or the pie chart becomes confusing and difficult to read.
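When a category list runs past five or six slices, one common workaround is to keep the largest categories and merge the remainder into a single "Other" slice. The Python sketch below assumes the data arrive as a dictionary of category shares. The function name `collapse_for_pie` and the brand figures are hypothetical.

```python
def collapse_for_pie(shares, max_slices=6):
    """Keep the (max_slices - 1) largest categories and merge the rest into 'Other'.

    Assumes `shares` maps category name -> share and has no existing 'Other' key.
    """
    if len(shares) <= max_slices:
        return dict(shares)
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[: max_slices - 1])
    kept["Other"] = sum(value for _, value in ranked[max_slices - 1:])
    return kept


# Hypothetical market shares (percent) for nine brands.
brand_share = {"A": 30, "B": 22, "C": 15, "D": 10, "E": 8,
               "F": 6, "G": 4, "H": 3, "I": 2}
print(collapse_for_pie(brand_share))
# {'A': 30, 'B': 22, 'C': 15, 'D': 10, 'E': 8, 'Other': 15}
```

The total is preserved (the six slices still sum to 100), so the chart becomes readable without misrepresenting the data.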
The result of the estimation process through rigorous guessing procedures is
A distribution of the various values of each parameter: The Markov chain Monte Carlo process produces a posterior distribution of the values for each parameter.
_______________ is a data-based selection bias that occurs when data are deemed "bad" using arbitrary criteria, often as they come in, instead of using explicitly stated or previously agreed-upon standards.
Ad hoc rejection: With ad hoc rejection, the researcher decides arbitrarily that a piece of data looks bad and therefore rejects it.
NOTE: The following results from the Application Case Study presented in the text show means for selection of movies (Selection Distribution) and how they were rated (Prediction Distribution). Refer to this table to answer the following questions.
Hierarchical Bayes Sample Selection Model for Movie Ratings Predictions
                        Selection           Prediction
                        Distribution        Distribution
                          Mean   StdDev       Mean   StdDev
Coefficients Heterogeneous across Customers
  Action                 0.854    0.346      0.772    0.445
  Animation              0.809    0.345      2.110    0.389
  Art                   -0.523    0.362      1.462    0.809
  Classic               -2.070    1.118      1.375    0.666
  Comedy                 0.233    0.263      0.531    0.415
  Drama                  0.117    0.279      0.506    0.698
  Family                -0.585    0.888     -0.389    0.263
  Horror                 0.395    0.398      0.312    0.905
  Romance                0.594    0.232      0.562    0.510
  Thriller              -0.607    0.602      0.253    0.188
Coefficients Heterogeneous across Movies
  Age                   -0.046    0.122      0.035    0.176
  Gender (M = 1; F = 0) -0.022    0.170      0.123    0.176
Error Correlation: 0.105
Which type of movie is rated the highest?
Animation: Animation had a mean rating of 2.110, the highest.
Refer to the movie-ratings table from the Application Case Study above. In terms of ratings, which movies have the highest heterogeneity?
Art and Horror: The standard deviation for the ratings is highest for Art (0.809) and Horror (0.905).
After using a Bayesian estimation process, researchers determine the best value for a parameter by looking
At the highest point of the posterior distribution: The "best" value is the highest point of the posterior distribution for that parameter.
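In practice the posterior is known only through the sampled values, so its highest point has to be approximated. One simple approach, sketched below in Python for a single continuous parameter, is to bin the draws and take the center of the fullest bin. The normal draws are a stand-in for real MCMC output, and `posterior_mode` and the bin width are illustrative choices, not a prescribed method.

```python
import random
from collections import Counter


def posterior_mode(draws, bin_width=0.05):
    """Approximate the highest point of a posterior from sampled draws:
    histogram the draws and return the center of the most populated bin."""
    bins = Counter(int(d // bin_width) for d in draws)
    top_bin, _ = bins.most_common(1)[0]
    return (top_bin + 0.5) * bin_width


random.seed(1)
# Stand-in for MCMC output: draws from a normal posterior centered at 2.0.
draws = [random.gauss(2.0, 0.5) for _ in range(50_000)]
print(round(posterior_mode(draws), 2))  # close to 2.0; exact value depends on the draws
```

The bin width trades resolution against noise: narrower bins locate the peak more precisely but need more draws to fill reliably.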
To illustrate multiple comparisons and complex relationships, the best illustrative aid to use is
Bar charts: Bar charts are good for multiple comparisons and complex relationships.
Heterogeneity models go by all of the following names except
Bayesian estimation model: Bayesian estimation is a way of estimating a model, not a type of heterogeneity model. Heterogeneity models come in discrete (also called latent class or finite mixture) and continuous types.
With _______________, there is no search for the best parameter values, but instead guesses are made about the value of each parameter repetitively to arrive at a solution.
Bayesian estimation: By definition, the Bayesian estimation process guesses the value of the parameters in an iterative process until a solution is found.
Name the two main types of heterogeneity models.
Discrete and continuous: Discrete heterogeneity models are also called latent class or finite mixture models.
Correcting for selection bias relies on the _______________ for selection
Binary probit model: For the selection equation, the Heckman model uses the binary probit model because its normally distributed error is somewhat easier to work with.
A discussion of the methodology used in the research would be found in which section of the written report?
Body: The methodology is part of the body of the report.
Identify the 4 components of the "body" of the written report.
Check syllabus
Refer to the movie-ratings table from the Application Case Study above. Which types of movies are not selected often, yet are rated highly when selected?
Classic and Art: Classic movies have a selection mean of -2.070, but a rating mean of 1.375; Art movies have a selection mean of -0.523, but a rating mean of 1.462.
Refer to the movie-ratings table from the Application Case Study above. Which kind of movies are selected the least often?
Classic and Thriller: Classic (-2.070) and Thriller (-0.607) have the most negative selection means.
What is the Heckman model, how does it work, and what is it used for?
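The Heckman model is the most common selection bias correction model. It works with two equations: a selection model (a binary probit model) that describes whether an outcome is observed, and a prediction model (any regression model with a normally distributed error) that predicts the outcome itself. Estimating the two equations jointly corrects for selection bias, and the correlation between their error terms indicates how much selection bias is present.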
Within marketing research, "heterogeneity" is generally used to refer to
Differences in sensitivity: In marketing research, it is primarily differences in sensitivity or responsiveness that are called heterogeneity. This is beta, or coefficient, heterogeneity.
Suppose that the distribution of the value of b for a sample of respondents were as follows: 23% have b = 2; 17% have b = 3; 30% have b = 4; 30% have b = 5. Which type of heterogeneity model does this example fit?
Discrete: Because there are distinct, discrete values for the beta, it fits the discrete heterogeneity model.
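The discrete model in this example can be written down directly: four segments, each with its own share of respondents and its own value of b. The Python sketch below takes the segment list from the example; `draw_slope` is a hypothetical helper that assigns simulated respondents to segments, and the share-weighted average slope is computed alongside.

```python
import random

# Segments from the example: (share of respondents, value of b).
segments = [(0.23, 2), (0.17, 3), (0.30, 4), (0.30, 5)]

# Population-average slope: the share-weighted mean of the segment slopes.
mean_b = sum(share * b for share, b in segments)
print(round(mean_b, 2))  # 3.67


def draw_slope(rng):
    """Assign one simulated respondent to a segment; return that segment's b."""
    shares = [share for share, _ in segments]
    slopes = [b for _, b in segments]
    return rng.choices(slopes, weights=shares, k=1)[0]


rng = random.Random(42)
sample = [draw_slope(rng) for _ in range(10)]
print(sample)  # every value is 2, 3, 4, or 5; nothing in between
```

Under a continuous model, by contrast, b could take any value along a smooth distribution rather than only these four.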
Since many managers only read the _______________ of a written research report, it is important for it to be accurate, concise, and well-written
Executive summary: The executive summary should be 1-2 pages in length and provide a brief summary of the crucial findings of the report.
When writing a report, use plenty of clichés and marketing jargon, because this is the language that management understands.
False: Avoid clichés; they convey lazy, hackneyed thinking.
The selection equation of the Heckman model is an ordered logit model.
False: It is a binary probit model.
When respondents can decide whether or not they want to participate, it is called unequal selection bias.
False: It is called self-selection bias.
The Markov chain Monte Carlo process will yield a single optimal value for each parameter.
False: It produces a distribution for each parameter, called a posterior distribution.
It is important to remember that a sample that is collected non-randomly will always have a bias.
False: Non-random selection does not necessarily mean that the sample is biased, but it is likely to be biased.
In terms of presenting the research results in the written paper, objectivity refers to the lack of bias while framing the results so as to make them palatable to management.
False: Objectivity refers to the lack of bias while stating in clear, unequivocal terms what the research implies.
Although hierarchical models are excellent for allowing for individual differences among respondents and measuring that difference in a meaningful way, they cannot provide information about where it comes from.
False: One of the strongest benefits of hierarchical models is that they do provide information about where the heterogeneity comes from.
The predictions from homogeneous models are far superior to those from hierarchical models.
False: Predictions arising from hierarchical models have been found to be far superior to those from simple homogeneous models, in terms of providing a better fit to the actual data.
When there is little data for an individual respondent, nonhierarchical methods permit the borrowing of information from other respondents to obtain better estimates.
False: Borrowing information from other respondents is a feature of hierarchical methods, not nonhierarchical ones, and it is a major reason hierarchical methods are popular.
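The borrowing works by shrinking each respondent's own estimate toward the population average, with respondents who supply little data shrunk the most. The Python sketch below assumes a normal-normal hierarchical model with a known within-person variance (`sigma2`) and a known between-person variance (`tau2`); `pooled_estimate` and the numbers are hypothetical, and real hierarchical Bayes software estimates these variances rather than fixing them.

```python
def pooled_estimate(person_ratings, population_mean, sigma2, tau2):
    """Hierarchical (partial-pooling) estimate of one respondent's mean rating,
    assuming a normal-normal model with known variances.

    sigma2: variance of one rating around the respondent's own mean
    tau2:   variance of respondents' means around the population mean
    """
    n = len(person_ratings)
    if n == 0:
        return population_mean  # no data of their own: rely fully on others
    own_mean = sum(person_ratings) / n
    weight = n / (n + sigma2 / tau2)  # more own data -> more weight on it
    return weight * own_mean + (1 - weight) * population_mean


# Hypothetical setting: population mean 3.0, sigma2 = 1.0, tau2 = 0.5.
sparse = pooled_estimate([5.0], 3.0, sigma2=1.0, tau2=0.5)     # one rating
rich = pooled_estimate([5.0] * 20, 3.0, sigma2=1.0, tau2=0.5)  # twenty ratings
print(round(sparse, 2), round(rich, 2))  # 3.67 4.82
```

Both respondents average 5.0 on their own, but the one with a single rating is pulled much closer to the population mean of 3.0.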
With the Markov chain Monte Carlo method of estimating parameters, the process finds the best fit for each parameter before it moves on to the next parameter.
False: This process immediately goes to the next parameter, assuming the previous guess was the right value. It continues to cycle through the parameters thousands of times.
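This cycling behavior is what a Gibbs sampler, one widely used form of MCMC, does: draw one parameter given the current values of the others, move on immediately, and repeat. The Python sketch below assumes normally distributed data with unknown mean `mu` and variance `s2` under the standard noninformative prior p(mu, s2) proportional to 1/s2. The ratings and the function name `gibbs_normal` are hypothetical.

```python
import random
import statistics


def gibbs_normal(data, n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler for the mean (mu) and variance (s2) of normally distributed
    data, assuming the noninformative prior p(mu, s2) proportional to 1/s2."""
    rng = random.Random(seed)
    n = len(data)
    ybar = statistics.fmean(data)
    s2 = 1.0  # arbitrary starting guess; mu is drawn first, so it needs none
    mu_draws, s2_draws = [], []
    for it in range(n_iter):
        # Draw mu given the current s2, treating that guess as if it were right.
        mu = rng.gauss(ybar, (s2 / n) ** 0.5)
        # Move straight on: draw s2 given the mu just drawn.
        ss = sum((y - mu) ** 2 for y in data)
        s2 = 1.0 / rng.gammavariate(n / 2, 2.0 / ss)  # inverse-gamma(n/2, ss/2)
        if it >= burn_in:  # keep only draws after the chain has settled
            mu_draws.append(mu)
            s2_draws.append(s2)
    return mu_draws, s2_draws


# Hypothetical ratings on a 1-7 scale.
ratings = [4, 5, 6, 5, 3, 4, 5, 6, 7, 5]
mu_draws, s2_draws = gibbs_normal(ratings)
print(round(statistics.fmean(mu_draws), 2), round(statistics.fmean(s2_draws), 2))
```

The saved draws form the posterior distribution for each parameter. Discarding the first 1,000 as burn-in is a common but tunable practice.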
In the Heckman model, if the selection model and the prediction model are estimated separately, the two errors remain uncorrelated, and selection bias is automatically controlled for.
False: When the two models are estimated separately, their errors are treated as uncorrelated, so the selection bias remains uncorrected.
Refer to the movie-ratings table from the Application Case Study above. In terms of selection, which two movies have the highest heterogeneity?
Family and Classic: The standard deviation for the selection distribution is highest for Family (0.888) and Classic (1.118) movies.
Refer to the movie-ratings table from the Application Case Study above. Which type of movie is rated the lowest?
Family: Family was rated the lowest (-0.389 mean).
Performing multiple, related studies and reporting only those that support the favored hypotheses is called
File drawer effect: Reporting only a study that supports a favored hypothesis or repeating a study until the desired effects are obtained is called the file drawer effect.
An example of a study-based selection bias is
File drawer effect: The two types of study-based selection bias are file drawer effect and fishing (data mining).
The most general form of heterogeneity model in current use is called
Finite normal mixture heterogeneity: Finite normal mixture heterogeneity is the most general form of heterogeneity model.
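A finite normal mixture treats the population as a few segments, each a normal "hump" with its own mean and spread. If every spread shrinks to zero it reduces to the discrete model, and with a single component it is the ordinary continuous (normal) model, which is why it is the more general form. The Python sketch below draws coefficients from a hypothetical two-component mixture; the component values and `draw_b` are illustrative assumptions.

```python
import random

# Hypothetical two-component mixture for a price-sensitivity coefficient b.
# Each component: (share of customers, mean of b, standard deviation of b).
components = [(0.6, -2.0, 0.3), (0.4, -0.5, 0.2)]


def draw_b(rng):
    """Pick a component with probability equal to its share, then draw b from it."""
    shares = [share for share, _, _ in components]
    _, mean, sd = rng.choices(components, weights=shares, k=1)[0]
    return rng.gauss(mean, sd)


rng = random.Random(7)
draws = [draw_b(rng) for _ in range(20_000)]
mixture_mean = sum(share * mean for share, mean, _ in components)
print(round(mixture_mean, 2), round(sum(draws) / len(draws), 2))
# The first number is -1.4; the simulated average should be close to it.
```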
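What increasingly common web feature combines a Heckman model (to correct for massive amounts of missing data), parameter heterogeneity, and Bayesian estimation methods?
Product recommendation systems: They use heterogeneity, hierarchical Bayes modeling, and sample selection bias correction in a single application to predict products a customer will like.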
It is rare that an overt selection bias will
Have no effect on the outcome of a study: An overt selection bias is extremely likely to have some type of effect on a study's results.
Selection bias had long been recognized by statisticians as a problem, but it lacked a practical correction until the introduction of the
Heckman model: In 2000, James Heckman was awarded the Nobel Prize in Economics for his development of the Heckman model to deal with selection bias.
The most common type of selection bias correction model used by researchers is the
Heckman model: The Heckman model is the most frequently used selection bias correction model.
The fact that different people respond differently to the same sale, advertisement or other marketing variables is
Heterogeneity: Heterogeneity refers to the fact that not everything is identical; in marketing research, it refers to differences in individuals' responses.
A hierarchical model that is estimated by Bayesian methods is called a
Hierarchical Bayes model: By definition, the hierarchical Bayes model is a hierarchical model that is estimated using a Bayesian method.
In real applications, all of the following complexities of heterogeneity are common except
Homogeneous solutions are more accurate: In real applications, it is not uncommon to have a large number of humps for each coefficient, to have many coefficients, and also to find that the coefficients may be correlated (e.g., people who care more about price may care less about styling).
Running a separate regression model for each person would yield the best results and effectively deal with heterogeneity. We do not do this for all of the following reasons except
Individuals tend to give the same answers anyway: Heterogeneity implies that people do not tend to give the same answers.
In examining the differences between men's and women's weight, it can be said that women have a lower baseline weight than men. This is an example of
Intercept heterogeneity: The intercept heterogeneity in this example refers to the different points where the regression line would intercept the Y-axis for men and for women.
The hierarchical Bayes model is called "hierarchical" because
It consists of a number of levels: It is called "hierarchical" because it consists of a number of levels.
The hierarchical model allows for all of the following except
It includes individual regressions for each respondent: The purpose of the hierarchical model is to be able to get away from individual regressions for each person, yet capture the heterogeneity that is present.
The binary probit model is used for selection in the Heckman model, instead of other models, because
Its normally distributed error is somewhat easier to work with: Because the binary probit model assumes a normally distributed error, which is easier to work with, it is used as the selection model in the Heckman model.
Refer to the movie-ratings table from the Application Case Study above. Which gender is less likely to select a movie?
Males: The -0.022 indicates males are slightly less likely to select a movie.
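The table questions above all reduce to sorting one column. The Python sketch below re-enters the genre rows from the case-study table; `top` is a hypothetical helper, and the cutoff of 1.0 for "rated highly" is an assumption made only for the last query.

```python
# Genre rows from the case-study table:
# genre: (selection mean, selection sd, prediction mean, prediction sd)
table = {
    "Action":    (0.854, 0.346, 0.772, 0.445),
    "Animation": (0.809, 0.345, 2.110, 0.389),
    "Art":       (-0.523, 0.362, 1.462, 0.809),
    "Classic":   (-2.070, 1.118, 1.375, 0.666),
    "Comedy":    (0.233, 0.263, 0.531, 0.415),
    "Drama":     (0.117, 0.279, 0.506, 0.698),
    "Family":    (-0.585, 0.888, -0.389, 0.263),
    "Horror":    (0.395, 0.398, 0.312, 0.905),
    "Romance":   (0.594, 0.232, 0.562, 0.510),
    "Thriller":  (-0.607, 0.602, 0.253, 0.188),
}
SEL_MEAN, SEL_SD, PRED_MEAN, PRED_SD = range(4)  # column positions


def top(column, k=1, largest=True):
    """Return the k genres with the largest (or smallest) value in a column."""
    ranked = sorted(table, key=lambda genre: table[genre][column], reverse=largest)
    return ranked[:k]


print(top(PRED_MEAN))                     # ['Animation']  highest rated
print(top(PRED_MEAN, largest=False))      # ['Family']  lowest rated
print(top(PRED_SD, k=2))                  # ['Horror', 'Art']  most heterogeneous ratings
print(top(SEL_SD, k=2))                   # ['Classic', 'Family']  most heterogeneous selection
print(top(SEL_MEAN, k=2, largest=False))  # ['Classic', 'Thriller']  selected least often
# Rarely selected (negative selection mean) yet rated above an assumed cutoff of 1.0:
rare_but_liked = [g for g, (sm, _, pm, _) in table.items() if sm < 0 and pm > 1.0]
print(rare_but_liked)                     # ['Art', 'Classic']
```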
In Bayesian analysis, guesses follow a rigorous process called
Markov chain Monte Carlo: The guessing process used is Markov chain Monte Carlo, which is a set of rigorous guidelines on how the guessing should progress.
Explain the general procedure of how the Markov chain Monte Carlo works.
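Markov chain Monte Carlo makes a guess for one parameter, then immediately moves on to the next parameter, treating the previous guesses as if they were correct. Following rigorous rules about how the guessing should progress, it cycles through all of the parameters thousands of times. The accumulated guesses form a posterior distribution for each parameter, and the highest point of that distribution is taken as the parameter's best value.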
All of the following are participant-based selection biases except
Premature termination: Premature termination is a data-based selection bias.
The methodology component of the written report should contain all of the following information except
Research findings: The research findings should be presented in the results component of the report.
In the equation Yi = Aj + Bj + Ej how do the various variables and coefficients relate to heterogeneity?
SOS
Identify and describe the 8 types of selection bias.
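Self-selection, unequal selection criteria, ad hoc rejection, end-point selection, premature termination, file drawer effect, data mining (fishing), ...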
In examining the differences between men's and women's weight, it can be said that each inch of height adds more to men's weight than to women's. This is an example of
Slope heterogeneity: In this example, the difference refers to the difference in the slope of the regression line.
For most reports, what is the typical format (of 7 sections)?
Syllabus
Discuss 5 general guidelines for a written marketing research report.
Syllabus?
Identify the 4 items that should be summarized in the executive summary section of a written research report.
The executive summary should be 1-2 pages in length and provide a brief summary of the crucial findings of the report. (check syllabus)
The coefficients of selection bias correction models will automatically correct for any selection bias if the
The selection and prediction models are estimated jointly: By estimating the two equations jointly, the Heckman model automatically corrects for any bias.
In the Heckman model, the more correlated the errors of the selection model and the prediction model are, the greater the evidence of a selection bias.
True: A high correlation between the errors indicates the presence of a selection bias.
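Why error correlation signals bias can be seen in a small simulation. The Python sketch below makes simplifying assumptions: no covariates, a standard-normal selection error, an outcome whose true mean is 0, and a selection index fixed at `z`, with an outcome observed only when z plus the selection error is positive. When the errors are correlated at level rho, the observed outcomes drift away from the true mean by rho times the inverse Mills ratio phi(z)/Phi(z), the correction term used in the Heckman model; with rho = 0 there is no drift. `observed_mean_bias` and `inverse_mills` are hypothetical names, and the code illustrates the bias rather than estimating a Heckman model.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def observed_mean_bias(rho, z=0.0, n=100_000, seed=0):
    """Return the mean of the outcomes that survive selection.

    The outcome's true mean is 0, so any nonzero result is selection bias.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        e_sel = rng.gauss(0, 1)  # selection-equation error
        # Outcome error with correlation rho to the selection error (sd 1).
        e_out = rho * e_sel + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        if z + e_sel > 0:  # selection: is this outcome observed at all?
            observed.append(e_out)
    return fmean(observed)


def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z), the correction term in the Heckman model."""
    return STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


for rho in (0.0, 0.5):
    simulated = observed_mean_bias(rho)
    predicted = rho * inverse_mills(0.0)
    print(f"rho={rho}: simulated bias {simulated:.3f}, predicted {predicted:.3f}")
```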
Product recommendation systems use heterogeneity, hierarchical Bayes modeling, and sample selection bias correction in a single application to predict products a customer will like.
True: A product recommendation system is a statistical model that predicts which products a target customer may like by examining the preferences and purchases of similar consumers.
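The "similar consumers" idea can be illustrated with plain user-based collaborative filtering: weight other customers' ratings by how similar their rating histories are to the target customer's. This Python sketch is deliberately simpler than the system described above (no heterogeneity model, no selection correction, no Bayesian estimation). The ratings, `cosine`, and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical 1-5 ratings; a missing key means the customer has not rated that movie.
ratings = {
    "ann":  {"Action": 5, "Animation": 4, "Drama": 1},
    "ben":  {"Action": 4, "Animation": 5, "Drama": 2, "Horror": 5, "Romance": 2},
    "cara": {"Action": 1, "Animation": 2, "Drama": 5, "Horror": 1, "Romance": 5},
}


def cosine(a, b):
    """Cosine similarity computed over the movies both customers have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def recommend(target, k=1):
    """Rank movies the target has not rated by similarity-weighted average rating."""
    totals, weights = {}, {}
    for other, their in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their)
        if sim <= 0:
            continue
        for movie, r in their.items():
            if movie not in ratings[target]:
                totals[movie] = totals.get(movie, 0.0) + sim * r
                weights[movie] = weights.get(movie, 0.0) + sim
    ranked = sorted(totals, key=lambda m: totals[m] / weights[m], reverse=True)
    return ranked[:k]


print(recommend("ann"))  # ['Horror']: ann's tastes resemble ben's, who loved Horror
```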
The vast majority of statistical models used in marketing yield only one value for each "b (slope)," which marketers assume is valid for everyone; yet this concept is in direct contrast to the idea of heterogeneity.
True: Although heterogeneity implies people react differently, most marketing models assume everyone will react the same way when a particular situation is faced or a variable is modified.
The executive summary is a condensed, accurate statement of the report's most crucial findings that is the essential element of nearly all research reports.
True: Because many managers read only the executive summary, it is crucial that this section be accurate, concise, and well written.
Generally speaking, selection biases are the most problematic when items are selected based on values of the outcome variables.
True: When items are selected based on an outcome variable, the selection is driven by the very thing researchers wish to study.
The concept of heterogeneity is very important to marketing because people respond differently to the same marketing variable.
True: Heterogeneity implies people are different and will, therefore, react differently to the same stimuli.
If a sample is collected through a non-random means, it is likely to have a selection bias.
True: It is rare that an overt selection bias has no effect on a study.
The Heckman model can be thought of in terms of two equations: a "selection model" and a "prediction model."
True: It is a two-part model. The selection model describes whether the outcome is observed at all, and the prediction model predicts the value of that outcome.
In preparing a written report, a simple graph can convey far more than a table with the same information.
True: Most individuals reading a written report can quickly gather the message presented in a graph.
When writing a research report, it is important to remember that the report is designed to provide information to decision makers, not to showcase intriguing findings of the research.
True: Research reports are written for decision makers so they can make intelligent decisions.
For prediction, the Heckman model can use any regression-based model.
True: The type of regression model does not matter as long as the error term is normally distributed.
One of the most important analytical advances in marketing research in recent years has been the advent of hierarchical models and Bayesian estimation.
True: This advance has been especially important when conjoined in the form of Hierarchical Bayes, or simply HB, models.
With end-point selection bias, data collection begins as soon as data fall above or below an acceptable threshold.
True: This is a data-based selection bias that occurs when the researcher makes a judgment that values are looking good or bad before beginning to record them.
The selection bias of "data mining" occurs when researchers perform a huge number of analyses and present the most significant results as if they were the only a priori hypotheses.
True: This is the definition of "data mining" or "fishing" among the study-based selection biases.
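The danger of fishing is easy to quantify. If a researcher runs 20 independent tests at the 0.05 level on data with no real effects, the chance that at least one comes out "significant" is 1 - 0.95^20, about 0.64. The Python sketch below checks this by simulation; `fishing_expedition` is a hypothetical name, and the tests are assumed to be independent two-sided z-tests.

```python
import random
from statistics import NormalDist


def fishing_expedition(n_tests=20, alpha=0.05, n_studies=10_000, seed=0):
    """Share of simulated studies that report at least one 'significant' result
    when all n_tests independent two-sided z-tests are run on pure noise."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    hits = 0
    for _ in range(n_studies):
        # Under the null hypothesis every z statistic is standard normal.
        if any(abs(rng.gauss(0, 1)) > crit for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


analytic = 1 - (1 - 0.05) ** 20  # P(at least one false positive in 20 tests)
print(round(analytic, 3), round(fishing_expedition(), 3))
# The first number is 0.642; the simulated share should land close to it.
```

Reporting only the test that "worked" hides the other 19 and makes a roughly 64% chance of a false finding look like a 5% one.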
All of the following are data-based selection biases except
Unequal selection criteria: Unequal selection criteria is a participant-based selection bias.
_______________ occurs when researchers pre-screen potential participants on an outcome variable.
Unequal selection criteria: With unequal selection criteria, a researcher uses an outcome variable to determine who should participate in the study or in what capacity.