MKTG 434 - Adv MKTG Analytics EXAM 2
marketing mix modeling is used to
-A marketing mix model can help you identify the most effective future marketing strategies based on the analysis of history sales and marketing costs information -evaluate different components of marketing plans such as advertising, promotion, packaging, media, sales force numbers, etc.
stepwise regression advantages:
-A researcher can manage large numbers of potential independent variables to choose the best-fitting regression models -It provides the order in which variables are removed or added.
you will run a logistic regression if you have these research questions:
-If you want to predict whether a person will subscribe to a magazine or not -If you want to predict whether a person will respond to a direct mail campaign or not -If you want to predict whether a person will purchase a new product or not -If you want to predict whether a cell phone customer will "churn" by the end of year and switch to another service or not
when to use time series technique:
-Sales quantities/revenues -Airline passenger volume -Traffic volumes -Economic metrics such as interest rates, unemployment rates
adjusted r-squared
-a modified version of R2 that has been adjusted for the number of independent variables in the regression model -increases only if a new independent variable improves the model more than would be expected by chance -decreases if the new independent variable improves the model less than would be expected by chance
actual sales vs. predicted sales from regression model
-actual value is the value that is obtained by observation or by measuring the available data. -predicted value is the value of the variable predicted based on the regression analysis.
Review 1. Look at the coefficients table and the line graph comparing actual Levi's sales and predicted Levi's sales and discuss how to improve the overlap between actual sales and predicted sales.
-create a dummy variable to capture seasonality -lagged variable to capture a carry-over effect of marketing -if you have additional list of variables and want to check the best regression model in a systematic way, then do stepwise regression
R-squared
-indicates what proportion of the variances of your dependent variable is explained by your independent variable(s) -shows the overall strength of your regression model. -The range of this R2 is between 0 and 1 -The closer R2 is to 1, the higher the proportion of the variable your dependent variable explains, so we might say the better the regression model you set up.
Review 4 (1a). What is the difference between R-squared and Adjusted R-squared?
-r-squared increases but never decreased -adjusted r-squared increases only if new IV improves the model and only decreases if new IV did not improve the model
use adjusted r-squared when
-to compensate for the addition of variables and only increases if the new predictor enhances the model above what would be obtained by probability. Conversely, it will decrease when a predictor improves the model less than what is predicted by chance. -if you are building Linear regression on multiple variable, it is always suggested that you use Adjusted R-squared to judge goodness of model.
stepwise regression disadvantages
1. Multicollinearity is usually a major issue.(VIF) 2. Some variables may be removed from the model even though they are important to include. -these variables can be manually added back in
marketing mix modeling... (4 things)
1. identifies which marketing inputs should be considered 2. identify the best weights for marketing inputs-marketing objectives 3. understand how a change in the marketing budget will affect future sales 4. determine how to optimally spend current marketing budget
interpret unstandardized coefficients of independent variables using Levi's example:
Here we see 312.393 as an unstandardized coefficient (B); this means that as you increase one billboard advertising, the Levi's sales increases on average across the three years by $312.39.
interpretation of r-squared if number is 0.601
If r-squared is 0.601 then the regression model explains 60.1% of the variances in the dependent variable.
standardized coefficient (beta)
If you have several marketing variables with different scales of units, you want to ignore the differences in scales of units and compare across different variables
Review 4 (3c). At the 95% confidence level, based on the p-values for the three independent variables, which independent variable(s) do you think significantly decrease(s) a customer's willingness to patronize an upscale restaurant?
Prefer simple décor (negative value)
when to use stepwise regression:
Stepwise regression is an appropriate analysis when you have many variables and you're interested in identifying a useful subset of the predictors. -Stepwise regression is used to see how the variance explained, r-squared, changes by adding (or removing) each predictor to the model one at a time
Review 2a. Explain the concept of the carry-over effect of marketing and which variable can be used to explain it?
The carryover effect of advertising states that time lag between the consumers being exposed to the advertisement and their response to the same advertisement -If the coefficient of the lagged sales is higher, then we can expect a longer carry-over effect of marketing - which means marketing efforts are persistent. If the coefficient of the lagged sales is smaller, then we can expect a shorter carry-over effect of marketing.
moving average forecasting
The idea of a simple moving average is that the pattern of random components will continue to happen in the future. -Assumption: future observations will be similar to the recent past.
correct mathematical model for the predicted sales of Levi's:
The predicted weekly sales of Levi's = 751.862 + 0.268(the previous-week sales of Levi's) + 9.432(news ad inches for Levi's) + 124.897(number of billboards up during that week) +79.455(news ad inches for cheaper hats) + 0.726(seconds of radio advertising) + 2813.587(the first week of school) + 5918.145(the weeks of Christmas season)
multiple regression analysis example (identify the independent and the dependent variable): Among the store interior, taste of food, variety of options and price, which variables will influence a customer's willingness to visit the restaurant?
independent: store interior, taste of food, variety of options and price dependent: a customer's willingness to visit the restaurant
Unstandardized coefficients (B)
indicate how much the value of your dependent variable increases as your independent variable increases by one unit.
interpret the results from Cox & Snell R. Square
is 0.489, which indicates that the logistic model explains 48.9% probability of customer retention.
Using age, driving license years, and annual kilometers, segment the customers into three groups. Who are they?
k-means cluster
the logistic model solves these problems:
log[p/(1-p)] = a + B1X1 + B2X2 + e -p is the probability that event Y occurs, p(Y=1) -p/(1-p) is the "odds"
Analysis: Among customer type, fuel, and driving license years, which factor influences annual kilometers the most?
multiple regression
Analysis: What model can be used to predict annual kilometers and to what extent?
multiple regression
Review 4 (2a). What is a null hypothesis for this research question.
no linear relationship between IV and DV
simple regression
one IV and one continuous DV
ANOVA
one categorical variable with three or more groups, one continuous variable
independent samples t-test
one categorical variable with two groups, one continuous variable
logistic regression
one or more than one IV, one categorical DV -Binary logistic regression if DV is a binary response (e.g., choice or no choice) -Multinomial logistic regression if DV is a range of finite options (e.g., Sprint, AT&T and T-Mobile)
k-means cluster
only continuous variables
time-series analysis: seasonality component
pattern of regular fluctuations in the data over time
prediction vs. forecast
prediction: used to uncover and understand relationships between variables (IV, DV) -example: which marketing promotion will most increase the sales of Levi's jeans? forecast: an estimation of the value of one variable in the future. -example: revenue forecasting for next year
Review 4 (3a). Based on the coefficients for preference for waterfront view, preference for unusual desserts, and preference for simple decor, which variable has the strongest effect on a customer's willingness to patronize an upscale restaurant?
prefer simple decorations (check with standardized coefficients)
triple exponential smoothing technique
provides a means for decomposing data that have both trend and seasonality. The Holt-Winters method is commonly used for triple exponential smoothing.
Be able to interpret R-squared, p-value with specific examples and cases
r-squared shows how well the regression model fits the observed data -ex: r-squared is 60% then 60% of the data fir the regression model If p-value ≤ 0.05 then it is statistically significant and rejects the null hypothesis. If p-value > 0.05 then the results are not statistically significant and supports the null hypothesis.
regression-based forecasting is
studying the relationships between data points, which can help you -Predict sales in the near and long term -Understand inventory levels -Understand supply and demand.
Review 4 (2c). What is the p-value from the ANOVA table, and how do you interpret the p-value?
table says p-values is > 0.05 so we're 95% confident there is a linear relationship between at least one IV and DV in model
Key difference between a logistic regression model and a linear regression model is
the Dependent Variable Simple & Multiple Linear regression: continuous DV (e.g., sales) Logistic regression: categorical DV -binary logistic regression if DV is binary response -Multinomial logistic regression if DV is a range of finite options (e.g., Sprint, AT&T and T-Mobile)
definition of cross-elasticity
the change of sales/the change of other department's advertising.
find forecasted dependent if the regression model is given:
the dependent is the Y variable. The regression equation is Y = bX + a where Y is the dependent variable.
basic assumption in multiple regression: independence assumption
the independent variables must be statistically independent and uncorrelated with one another
Multicollinearity
the presence of strong correlations among independent variables
significance (p-value)
the probability that the null hypothesis (the population coefficient of the variable is not significantly different from zero) is rejected although it is true
multiple regression analysis uses...
the same concepts as bivariate analysis but uses more than one independent variable
Review 4 (2b). What is the alternative hypothesis for this research question.
there is a linear relationship between IV and DV
Chi-square test
two categorical variables
correlation
two continuous variables
multiple regression
two or more than two IVs, one continuous DV
Analysis: Among customer type, fuel, driving license years, car category, which factor influences whether customers claim an insurance or not?
two-step cluster
time-series analysis: random
unpredictable residuals
Review 4 (1b). Between R-squared and Adjusted R-squared, which one should be used to interpret the model summary? Select the more appropriate value and interpret it.
use adjusted r-squared for this model. This model explains 60.1% variation of DV (likelihood visiting the restaurant)
single exponential smoothing technique
uses a data smoothing parameter referred to as α (alpha). This parameter α represents smoothing of the time series. -Effective for data with purely random component -No trend or seasonality
how to develop a mathematical equation and how to interpret the equation using a coefficient table from Levi's:
weekly sales of Levi's = constant + lagged_sallevis + newspaper ad inches for Levis + number of billboards + news ad inches for cheaper hats + seconds of radio + first week of school + weeks of Christmas season
multiple regression equation: y= a + b1x1 + b2x2 + b3x3 +... +bxxm
y = the dependent, or predicted variable xi= independent variable a = the intercept bi= the slope of independent variable m = the number of independent variables in the equation
example of interpretation of unstandardized coefficients:
you can explain how much sales will be generated by each promotion plan and how much sales are generated due to seasonality (Christmas and school seasons) or due to the carry-over effect of advertising.
Analysis: Is there a significant difference between males and females in the years of holding a driving license?
independent sample t-test
Analysis: Is there a significant relationship (=association) between driving license years and annual kilometers?
Pearson correlation analysis
How to set up the best model:
1. select a subset of marketing variables -three criteria: 2. check seasonality and create dummy variable: -how to detect seasonality: what weeks are included such as the start of school season or Christmas -how to create dummy variables: (values 0 or 1) indicates the absence or presence of some categorical effect. If the dummy variable is 0, the variable will not affect the dependent variable, and vice versa if the dummy variable is 1. [Transform] to [Compute Variable] and Type [XMAS] in the Target Variable and type "0" in Numeric Expression to Click [OK] -interpret the unstandardized coefficient of dummy variables: 3. consider the "carry-over effect of advertising" and create a lagged variable -carry-over effect of advertising: "the portion of advertising that retains its effect and affects consumers even beyond the period of its exposure". -how to create a lagged variable: Go to [Analyze] [Regression] to [Linear] and Include "lag_sallevis" into your existing regression model.
multiple regression analysis tells us... (3 things)
1. which factors predict the dependent variable 2. which way each factor influences the dependent variable 3. how much each factor influences the dependent variable
bivariate regression
=simple regression -this analysis uses only one independent variable
stepwise regression definition:
A method of fitting regression models by using an automatic procedure to choose the optimal sets of independent variables.
Is there a significant difference among different categories of car owners in the annual driving kilometers?
ANOVA
how to identify multicollinearity
One commonly used method is the variance inflation factor (VIF). The VIF is a single number, and a rule of thumb is that as long as the VIF is less than 10, multicollinearity is not a concern.
explain the significance levels for unstandardized and standardized coefficients:
Unstandardized coefficient represents the amount of change in a dependent variable Y due to a change of 1 unit of independent variable X. -are produced by the linear regression model after its training using the independent variables which are measured in their original scales i.e, in the same units in which we are taken the dataset from the source to train the model Standardized coefficient compares the strength of the effect of each individual independent variable to the dependent variable. The higher the absolute value of the beta coefficient, the stronger the effect -If your regression model contains independent variables that are statistically significant, a reasonably high R-squared value makes sense. The statistical significance indicates that changes in the independent variables correlate with shifts in the dependent variable.
basic assumption in multiple regression: variance inflation factor (VIF)
VIF can be used to assess and eliminate multicollinearity -VIF is a statistical value that identifies what independent variable(s) contribute to multicollinearity and should be removed -Any variable with VIF of greater than 10 should be removed
Mean absolute deviation (MAD) / Forecast Accuracy
Where Ar= actual value of the time series at time Fr=forecast value for time n= number of forecast values Lower mad means better forecasting model
r-squared definition
a measure of goodness of fit of the regression model
marketing mix modeling definition
a statistical analysis to: 1. estimate the impact of various marketing mix strategies on sales 2. forecast the impact of marketing mix strategies on future sales
Time-series analysis
a technique that analysts use to (a) uncover any implicit structure (patterns or trends) in the data and (b) model that structure to make forecasts. -The assumption is that the future, at least in the short term, will continue the structure of the past.
double exponential smoothing technique
adds a second parameter β (beta) to α. The β parameter is called the trend smoothing factor. -For data that exhibit trends but not seasonality
Review 4 (3b). Is there any multicollinearity problem in this regression model? Explain this.
as all VIF > 10, there is no multicollinearity issue in this model
how you got the predicted sales
ask SPSS to save the unstandardized predicted dependent variables -SPSS calculates predicted sales: go to [Analyze] to [Regression] to [Linear] and Select the same set of the independent variables in the seventh regression as in the stepwise regression procedure. Click [Save] and Check "Unstandardized" under Predicted values to Click [Continue] and Click [OK]
Analysis: Using age, car category, and annual kilometers and claims, segment the customers. How many clusters did you find? Who are they?
binary logistic regression
interpret the odds ratio
can be interpreted as the effect of one unit of change in X in the predicted odds ratio with the other independent variables in the model held constant.
two-step cluster
categorical and continuous variables/without pre-determined number of clusters
Analysis: Is there a significant relationship (=association) between gender and making an insurance claim?
chi-square test
Review 3b. explain the concept of Constant. Interpret the unstandardized coefficients of Constant.
constant = average $ earned given regression and are not explained by IV
time-series analysis: trend component
direction of the data changing over time
how to interpret the cross-elasticity of advertising in the regression model:
for the Levi's equation, the cross-elasticity is 79.743 x news ad inches for cheaper hats
Interpret the overall percentage of cases
from the classification table, the overall percentage of cases that are correctly predicted by the model. In this model, 94.1% cases are correctly predicted.(18939+4180)/(4180+857+586+18939) =0.941 94.1%