Chapters 4 and 5: Quantitative Methods of Business
Regression analysis
A very valuable tool for a manager. Used to understand the relationship between variables and to predict the value of one variable based on another variable.
Forecast error
= Actual value - Forecast value
more responsive
A high value of β makes the forecast ----- to changes in trend
Forecasting - Trend and Random
A more complex model can be used. The basic approach: develop an exponential smoothing forecast, then adjust it for the trend.
(errors)
A plot of the residuals ----- often highlights glaring violations of assumptions
Cautions and Pitfalls
A t-test for the intercept (b0) may be ignored, as this point is often outside the range of the model. A linear relationship may not be the best relationship, even if the F test returns an acceptable value. A nonlinear relationship can exist even if a linear relationship does not. Even though a relationship is statistically significant, it may not have any practical value.
Exponential smoothing
A type of moving average. Easy to use. Requires little record keeping of data.
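A minimal sketch of basic exponential smoothing, assuming the usual update rule F(t+1) = F(t) + alpha * (Y(t) - F(t)). The function name, the demand figures, and the starting forecast are made up for illustration.

```python
def exponential_smoothing(actuals, alpha, initial_forecast):
    """Return the forecasts for periods 1..n+1, given n actual values."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    forecasts = [initial_forecast]
    for y in actuals:
        previous = forecasts[-1]
        # New forecast = old forecast + alpha * (last forecast error)
        forecasts.append(previous + alpha * (y - previous))
    return forecasts


demand = [100, 110, 120]  # hypothetical demand for periods 1-3
forecasts = exponential_smoothing(demand, alpha=0.5, initial_forecast=100)
print(forecasts)
```

Only the previous forecast and the latest actual value are needed at each step, which is why the method requires so little record keeping.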
low value
A ----- of β gives less weight to the recent trend and tends to smooth out the trend
Sales Force Composite
Uses individual salespersons' estimates. Estimates are reviewed for reasonableness. Data is compiled at a district or national level.
Correlation Coefficient
An expression of the strength of the linear relationship. Always between +1 and -1.
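As a rough illustration, the correlation coefficient can be computed from deviations about the means. The function name and the two tiny data sets are hypothetical; they are chosen so that one relationship is perfectly positive and the other perfectly negative.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])    # Y rises with X
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # Y falls as X rises
print(r_up, r_down)
```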
average season; higher than average; lower than average
An index of 1 indicates an -----. An index > 1 indicates the season is -----. An index < 1 indicates a season -----.
model building
As the number of variables increases, the adjusted r2 gets smaller unless the increase due to the new variable is large enough to offset the change in k
Jury of Executive Opinion
Collects opinions of a small group of high-level managers. May use statistical models for analysis.
Measures of Forecast Accuracy
Compare forecasted values with actual values to see how well one model works or to compare models.
Common qualitative techniques
Delphi method; jury of executive opinion; sales force composite; consumer market surveys
Assumptions of the Regression Model
Errors are independent. Errors are normally distributed. Errors have a mean of zero. Errors have a constant variance.
variance
Estimated using the mean squared error (MSE), s2
independent
Explanatory or predictor variable
Multiple Regression Analysis
Extensions to the simple linear model
Time-Series Models
Extrapolations of past values of a series
Trend Projections
Fits a trend line to a series of historical data points
Components of a Time Series
Four possible components: Trend (T), Seasonal (S), Cyclical (C), Random (R)
low
If the F statistic is large, the significance level (p-value) will be -----, indicating it is unlikely the result would have occurred by chance
Cautions and Pitfalls
If the assumptions are not met, the statistical test may not be valid. Correlation does not necessarily mean causation. Multicollinearity makes interpreting coefficients problematic, but the model may still be good. Using a regression model beyond the range of X is questionable, as the relationship may not hold outside the sample data.
rejected
If the null hypothesis can be -----, we have statistical evidence that there is a relationship
useful
If there is very little error, the MSE would be small and the F statistic would be large, so the model is -----
Time-Series Models
Ignore factors such as the economy, competition, and selling price
collinear
In some cases variables contain duplicate information. When two independent variables are correlated, they are said to be -----
nonlinear regression
In some situations, the relationship between variables is not linear. Transformations may be used to turn a nonlinear model into a linear model.
Qualitative Models
Incorporate judgmental or subjective factors. Useful when subjective factors are important or accurate quantitative data is difficult to obtain.
Consumer Market Survey
Information on purchasing plans is solicited from customers or potential customers. Used in forecasting, product design, and new product planning.
Delphi Method
Iterative group process. Respondents provide input to decision makers. Repeated until consensus is reached.
Trend Projections
Linear model developed using regression analysis is simplest
Measure of accuracy
Mean absolute deviation (MAD): MAD = Σ|forecast error| / n
Measures of Forecast Accuracy
Mean squared error (MSE). Mean absolute percent error (MAPE). Bias is the average error.
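The accuracy measures can be sketched together in a few lines, taking forecast error as actual value minus forecast value. The function name and the data are invented for illustration; the errors are chosen so that positive and negative errors cancel in the bias while MAD stays clearly above zero.

```python
def forecast_accuracy(actuals, forecasts):
    """Return bias, MAD, MSE and MAPE for paired actual/forecast values."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    n = len(errors)
    return {
        "bias": sum(errors) / n,                    # average error
        "MAD": sum(abs(e) for e in errors) / n,     # mean absolute deviation
        "MSE": sum(e ** 2 for e in errors) / n,     # mean squared error
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actuals)) / n,
    }


actual = [100, 200, 100, 200]     # hypothetical actual demand
forecast = [110, 190, 90, 210]    # hypothetical forecasts
measures = forecast_accuracy(actual, forecast)
print(measures)
```

Here the bias is zero even though every forecast misses by 10 units, which is why bias alone is a poor measure of accuracy.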
Multiple Regression Analysis
Models with more than one independent variable
Forecasting Random Variations
No other components are present. Averaging techniques smooth out forecasts: moving averages, weighted moving averages, exponential smoothing.
Testing the Model for Significance
Performing a statistical hypothesis test
Time-Series Models
Predict the future based on the past
Trend Projections
Projected into the future for medium- to long-range forecasts
simple linear regression
Random error
Seasonal Variations
Recurring variations over time may indicate the need for seasonal adjustments in the trend line
Main purpose of forecasting
Reduce uncertainty and make better estimates of what will happen in the future
simple linear regression
Regression models used to test relationships between variables
test statistic
Reject the null hypothesis if the ----- is greater than the F value from the table in Appendix D. Otherwise, do not reject the null hypothesis.
level of significance
Reject the null hypothesis if the observed significance level, or p-value, is less than the ----- (α). Otherwise, do not reject the null hypothesis.
Three measures of variability
SST: total variability about the mean. SSE: variability about the regression line. SSR: total variability that is explained by the model.
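A short sketch of the three sums of squares for a tiny made-up data set. The predictions come from the line Y = 1 + 0.5X, which is the least-squares line for these three points; for a least-squares fit, SST = SSR + SSE, and r2 = SSR / SST. The function and variable names are illustrative only.

```python
def sums_of_squares(ys, predictions):
    """Return (SST, SSE, SSR) for observed values and fitted values."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)                 # about the mean
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions)) # about the line
    ssr = sum((p - mean_y) ** 2 for p in predictions)        # explained
    return sst, sse, ssr


xs = [1, 2, 3]
ys = [1, 3, 2]
predictions = [1 + 0.5 * x for x in xs]  # least-squares line for this data

sst, sse, ssr = sums_of_squares(ys, predictions)
r2 = ssr / sst  # coefficient of determination
print(sst, sse, ssr, r2)
```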
Subjective methods
Seat-of-the-pants methods, intuition, experience
Selecting the Smoothing Constant
Selecting the appropriate value for alpha is key to obtaining a good forecast. The objective is always to generate an accurate forecast. The general approach is to develop trial forecasts with different values of alpha and select the alpha that results in the lowest MAD.
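The trial approach described above might look like the following sketch: build a forecast series for each candidate alpha, compute its MAD, and keep the alpha with the lowest MAD. The helper names, candidate values, and demand data are all assumptions for illustration.

```python
def smooth(actuals, alpha, initial_forecast):
    """One exponential smoothing forecast per actual value."""
    forecasts = [initial_forecast]
    for y in actuals[:-1]:
        forecasts.append(forecasts[-1] + alpha * (y - forecasts[-1]))
    return forecasts


def mad(actuals, forecasts):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def best_alpha(actuals, candidates, initial_forecast):
    """Return the candidate alpha whose forecasts give the lowest MAD."""
    return min(
        candidates,
        key=lambda a: mad(actuals, smooth(actuals, a, initial_forecast)),
    )


demand = [100, 120, 140, 160]  # hypothetical, steadily rising demand
best = best_alpha(demand, candidates=[0.1, 0.5, 0.9], initial_forecast=100)
print(best)
```

With steadily rising demand like this, the largest alpha wins because it reacts fastest to recent values; on noisier data a smaller alpha may give the lower MAD.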
Components of a Time Series
Sequence of values recorded at successive intervals of time
Multiple Regression Models
Similar to simple linear regression models. The p-value for the F test and r2 are interpreted the same. The hypothesis is different because there is more than one independent variable. The F test is investigating whether all the coefficients are equal to 0 at the same time.
significance
Testing the model for ----- helps determine if the values are meaningful
b1 ≠ 0
The alternate hypothesis is that there is a linear relationship
model building
The best model is a statistically significant model with a high r2 and few variables. As more variables are added to the model, the r2 value increases. For this reason, the adjusted r2 value is often used to determine the usefulness of an additional variable. The adjusted r2 takes into account the number of independent variables in the model.
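A small sketch of why the adjusted r2 is useful, using the standard formula adjusted r2 = 1 - (1 - r2)(n - 1) / (n - k - 1). The r2 values, sample size, and variable counts below are invented to show a new variable that helps too little and one that helps enough.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


base = adjusted_r2(0.80, n=20, k=2)        # model with two variables
tiny_gain = adjusted_r2(0.805, n=20, k=3)  # third variable barely raises r2
big_gain = adjusted_r2(0.90, n=20, k=3)    # third variable raises r2 a lot
print(base, tiny_gain, big_gain)
```

The plain r2 goes up in both cases, but the adjusted r2 drops when the added variable contributes very little.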
r2.
The coefficient of determination is
new variable
The easiest approach: develop a -----
Exponential Smoothing with Trend
The equation for the trend correction uses a new smoothing constant β (beta). Ft and Tt must be given or estimated. Three steps in developing FITt: 1. Compute the smoothed forecast Ft+1. 2. Update the trend Tt+1. 3. Calculate the trend-adjusted exponential smoothing forecast (FITt+1).
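The three steps can be sketched as follows. The notes do not give the equations, so this assumes one common textbook form: F(t+1) = FIT(t) + alpha * (Y(t) - FIT(t)), T(t+1) = T(t) + beta * (F(t+1) - FIT(t)), and FIT(t+1) = F(t+1) + T(t+1). Other texts use a slightly different trend update. The function name, starting values, and demand figures are illustrative.

```python
def trend_adjusted_step(fit_t, trend_t, actual_t, alpha, beta):
    """One period of trend-adjusted exponential smoothing (assumed form)."""
    f_next = fit_t + alpha * (actual_t - fit_t)   # step 1: smoothed forecast
    t_next = trend_t + beta * (f_next - fit_t)    # step 2: update the trend
    fit_next = f_next + t_next                    # step 3: add the trend
    return f_next, t_next, fit_next


fit, trend = 74.0, 0.0   # starting forecast and trend, given or estimated
for actual in [74, 79]:  # illustrative demand for periods 1 and 2
    f_next, trend, fit = trend_adjusted_step(fit, trend, actual, 0.3, 0.4)

print(f_next, trend, fit)  # forecast components for period 3
```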
quadratic model
The nonlinear model is a
dummy variables
The number of ---- must equal one less than the number of categories of the qualitative variable
The coefficient of determination
The proportion of the variability in Y explained by the regression equation
Multiple Regression Models
The test statistic is calculated, and if the p-value is lower than the level of significance (α), the null hypothesis is rejected
Trend Projections
Trend equations can be developed based on exponential or quadratic models
simple linear regression
True values for the slope and intercept are not known. Estimated using sample data.
Time-Series Models
Two basic forms. Multiplicative: Demand = T x S x C x R. Additive: Demand = T + S + C + R. Combinations are possible.
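A brief sketch contrasting the two forms. In the multiplicative form the non-trend components act as ratios around 1; in the additive form they are amounts in demand units. The function names, parameter names, and numbers are hypothetical.

```python
def multiplicative_demand(trend, seasonal=1.0, cyclical=1.0, random_part=1.0):
    """Demand = T x S x C x R, with S, C and R expressed as ratios."""
    return trend * seasonal * cyclical * random_part


def additive_demand(trend, seasonal=0.0, cyclical=0.0, random_part=0.0):
    """Demand = T + S + C + R, with S, C and R expressed in demand units."""
    return trend + seasonal + cyclical + random_part


# A season that runs 25% above a trend value of 200 units
mult = multiplicative_demand(trend=200, seasonal=1.25)
# The same season expressed as 50 extra units
add = additive_demand(trend=200, seasonal=50)
print(mult, add)
```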
Moving Averages
Used when demand is relatively steady over time. The next forecast is the average of the most recent n data values from the time series. Smooths out short-term irregularities in the data series.
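A minimal sketch of an n-period moving average forecast. The function name and sales figures are made up for illustration.

```python
def moving_average_forecast(series, n):
    """Forecast the next period as the mean of the last n values."""
    if n <= 0 or len(series) < n:
        raise ValueError("need at least n data values, with n > 0")
    return sum(series[-n:]) / n


sales = [10, 12, 13, 16, 19]  # hypothetical sales history
next_forecast = moving_average_forecast(sales, n=3)  # uses 13, 16, 19
print(next_forecast)
```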
Time-Series Models
Uses only historical data on one variable
dependent
Value depends on the value of the independent variable(s)
trial-and-error approach
Values are often selected using a ----- based on the value of the MAD for different values of the smoothing constant
dependent variable or response variable
Variable to be predicted is called the
Testing the Model for Significance
We use the F statistic
multicollinearity
When ----- is present, hypothesis tests for the individual coefficients are not valid, but the model may still be useful
multicollinearity
When more than two independent variables are correlated, ----- exists
reject the null hypothesis
When the F value is large, we can ----- and accept that there is a linear relationship between X and Y and the values of the MSE and r2 are meaningful
MSE and r2
When the sample size is too small, you can get good values for ---- even if there is no relationship between the variables
ANOVA table
With software models, an ----- is typically created that shows the observed significance level (p-value) for the calculated F value. This can be compared to the level of significance (α) to make a decision.
Errors
are assumed to have a constant variance (σ²), usually unknown
Binary (or dummy or indicator) variables
are special variables created for qualitative data
Backward stepwise regression
begins with all the independent variables and deletes the least helpful
Regression models
can be developed for any variables X and Y
Describes an F distribution with
Degrees of freedom for the numerator = df1 = k. Degrees of freedom for the denominator = df2 = n - k - 1.
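For illustration, the F statistic can be computed from the sums of squares using these degrees of freedom: MSR = SSR / k, MSE = SSE / (n - k - 1), and F = MSR / MSE. The function name and input numbers are assumptions, not values from the course material.

```python
def f_statistic(ssr, sse, n, k):
    """Return (F, df1, df2) for a regression with k independent variables."""
    df1 = k              # numerator degrees of freedom
    df2 = n - k - 1      # denominator degrees of freedom
    msr = ssr / df1      # mean square regression
    mse = sse / df2      # mean squared error, the estimate of the variance
    return msr / mse, df1, df2


f_value, df1, df2 = f_statistic(ssr=80, sse=20, n=12, k=1)
print(f_value, df1, df2)
```

The resulting F value would then be compared with the table value for df1 and df2, or its p-value compared with the level of significance.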
Exponential smoothing
does not respond to trends
Multiple regression models
have more than one independent variable
A seasonal index
indicates how a particular season compares with an average season
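One simple way to compute seasonal indices, assuming no trend in the data, is to divide each season's average demand by the average across all seasons. The function name and quarterly figures below are hypothetical.

```python
def seasonal_indices(demand_by_season):
    """Map each season to (its average demand) / (average over all seasons)."""
    season_avgs = {
        season: sum(values) / len(values)
        for season, values in demand_by_season.items()
    }
    overall_avg = sum(season_avgs.values()) / len(season_avgs)
    return {season: avg / overall_avg for season, avg in season_avgs.items()}


history = {            # two years of made-up quarterly demand
    "Q1": [80, 80],
    "Q2": [100, 100],
    "Q3": [130, 110],
    "Q4": [100, 100],
}
indices = seasonal_indices(history)
print(indices)
```

In this example Q3 has an index above 1 (a stronger than average season) and Q1 has an index below 1.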
b0
intercept (value of Y when X = 0)
alpha
is a weight (or smoothing constant) with a value 0 ≤ α ≤ 1
A dummy variable
is assigned a value of 1 if a particular condition is met and a value of 0 otherwise
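A small sketch of dummy coding for a qualitative variable with three categories. The first category is treated as the baseline, so only two 0/1 columns are needed (one less than the number of categories). The function name and the category labels are invented for illustration.

```python
def dummy_encode(values, categories):
    """Return one row of 0/1 dummies per value, skipping the baseline."""
    baseline, *others = categories  # first category gets no column
    return [[1 if value == cat else 0 for cat in others] for value in values]


categories = ["good", "excellent", "mint"]     # hypothetical condition levels
observed = ["good", "excellent", "mint"]
rows = dummy_encode(observed, categories)
print(rows)  # columns are: is_excellent, is_mint
```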
Regression analysis
minimizes the sum of squared errors
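A minimal sketch of the least-squares estimates for simple linear regression, using b1 = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and b0 = Ȳ - b1·X̄. The function name and the three data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope of the regression line
    b0 = mean_y - b1 * mean_x    # intercept (value of Y when X = 0)
    return b0, b1


b0, b1 = fit_line([1, 2, 3], [1, 3, 2])
print(b0, b1)
```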
Simple linear regression
models have only two variables
Independent variable
normally plotted on X axis
Dependent variable
normally plotted on Y axis
k =
number of independent variables
n =
number of observations in the sample
Scatter diagram or scatter plot
often used to investigate the relationship between variables
With average error
positive and negative errors cancel each other out
A forward stepwise procedure
puts the most significant variable in first, then adds the next variable that will improve the model the most
e
random error
b1
slope of the regression line
Stepwise regression
systematically adds or deletes independent variables
b1=0
the null hypothesis is that there is no relationship between X and Y
Weighted moving averages
use weights to put more emphasis on recent periods. Often used when a trend or other pattern is emerging.
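A short sketch of a weighted moving average in which the weights are listed from oldest to newest, so the largest weight falls on the latest period. The function name, weights, and sales figures are hypothetical.

```python
def weighted_moving_average(series, weights):
    """Forecast the next period; weights are ordered oldest to newest."""
    if len(series) < len(weights):
        raise ValueError("need at least as many data values as weights")
    recent = series[-len(weights):]
    weighted_sum = sum(w * y for w, y in zip(weights, recent))
    return weighted_sum / sum(weights)


sales = [10, 12, 13]   # made-up sales, oldest first
wma = weighted_moving_average(sales, weights=[1, 2, 3])
print(wma)  # (1*10 + 2*12 + 3*13) / 6
```

Dividing by the sum of the weights means the weights do not have to add up to 1.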