Lecture 7 - Business Analytics - MIS171 (Predictive analytics part 1: Simple Linear Regression and Prediction)
Explain the *analytics process* (6 stages)
- Business understanding (business problem -> analytics problem)
- Data understanding (strengths and limitations, biases, cost, reliability, availability)
- Data preparation (converting raw data into a form suitable for modelling)
- Modelling (build a predictive model to answer the analytics problem)
- Evaluation (assess the developed model(s) rigorously to gain confidence that they are valid and reliable)
- Deployment (implementing a predictive model in an information system or business process)
Most common predictive modelling tasks
- Estimation (predicting a continuous value)
- Classification (assigning a *categorical* outcome to one of two or more categories based on various data attributes)
- Clustering (finding natural groups or clusters in data)
- Association (identifying attributes that frequently occur together)
- Text mining (extracting information from unstructured data)
- Anomaly detection (finding changes or outliers)
What are the two purposes for Regression analysis?
- Explanation - Prediction
What are three things you try to look for on a scatter diagram?
- no relationships - linear relationships - non-linear relationships
Interpreting b0
1. Geometrically
• On the graph, b0 is where the line cuts the vertical axis.
• Our example: The line cuts the Y axis at 8.48%.
2. Algebraically
• b0 is the value of Y when X = 0.
• Our example: Y = 8.48 when X = 0 workers.
3. Practically
• b0 will not always have a useful interpretation, as X = 0 may be well outside the range of X values used for the regression equation. Sometimes it is useful.
• Our example: "Percentage (8.48) of customers who complained, on average, when there are no workers" is nonsensical.
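The three readings of b0 can be checked with a small sketch. The coefficients 8.48 and -0.144 come from the example in these notes; the observed range of X (10 to 40 workers) is a made-up assumption, since the notes do not state the real range, and the helper names are hypothetical.

```python
# Coefficients from the lecture example (percentage of complaints vs. workers on a shift).
B0 = 8.48    # Y-intercept
B1 = -0.144  # slope

# ASSUMPTION: hypothetical range of X values used to fit the line.
X_MIN, X_MAX = 10, 40


def predict(workers):
    """Predicted percentage of customers who complain for a given number of workers."""
    return B0 + B1 * workers


def is_extrapolation(workers):
    """True when the X value lies outside the range the line was fitted on."""
    return not (X_MIN <= workers <= X_MAX)


# Algebraically: the prediction at X = 0 is b0 itself.
print(predict(0))

# Practically: X = 0 is outside the assumed data range, so b0 is not meaningful here.
print(is_extrapolation(0))
```

Under the assumed range, the check flags X = 0 as an extrapolation, which is why the intercept has no useful practical meaning in this example.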
Briefly describe the *analytics process* (generally)
Analytics is the process of converting a business problem into an analytics problem, i.e. one that analytics can solve.
Interpreting b1
1. Geometrically
• On the graph, b1 is the slope of the line.
• Our example: The slope is -0.144.
2. Algebraically
• b1 is the change in the value of Y when X increases by 1.
• Our example: If X increases by 1, Y decreases by 0.144 percentage points.
3. Practically
• b1 indicates the impact on Y from a change in X.
• Our example: Very useful, as it implies that "for each extra worker employed on a shift, on average, the percentage of complaints decreases by 0.144 percentage points".
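A quick way to see the algebraic meaning of b1 is to compare two predictions one unit of X apart. This is a minimal sketch using the coefficients from the example in these notes; the worker counts (20, 21, 25) are arbitrary values chosen for illustration.

```python
# Coefficients from the lecture example.
B0 = 8.48    # Y-intercept
B1 = -0.144  # slope


def predict(workers):
    """Predicted percentage of customers who complain for a given number of workers."""
    return B0 + B1 * workers


# One extra worker changes the prediction by b1 (up to floating-point rounding).
change_per_worker = predict(21) - predict(20)
print(round(change_per_worker, 3))

# Five extra workers change the prediction by 5 * b1.
change_for_five = predict(25) - predict(20)
print(round(change_for_five, 3))
```

The starting point does not matter: on a straight line, the change in the prediction depends only on how far X moves.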
Describe Regression
Gives the mathematical model of a relationship between two variables
Describe a Scatter Diagram
Graphical representation of a possible relationship between two variables
Describe Correlation
Measures the strength and direction of a linear relationship between two variables
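As a sketch of how strength and direction are measured, the function below computes the sample correlation coefficient (Pearson's r, assumed here to be the measure meant in these notes). The data values are hypothetical and only illustrate a negative relationship.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# HYPOTHETICAL data: workers on a shift and percentage of customers who complained.
workers = [10, 20, 30, 40]
complaints_pct = [7.0, 5.5, 4.5, 2.5]

r = pearson_r(workers, complaints_pct)
print(round(r, 3))
```

The sign of r gives the direction (negative here, since complaints fall as workers rise) and its distance from 0 gives the strength; values near -1 or +1 indicate a strong linear relationship.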
How should the dependent and independent variable be plotted on a scatter diagram in terms of axis?
The dependent variable should always be on the vertical (Y) axis. The independent variable should be on the horizontal (X) axis.
*Exam* Learn how to interpret b0 and b1 from a mathematical equation
The estimated simple linear regression equation is given by: Ŷ = b0 + b1X
Where:
• Ŷ is the predicted value of the dependent variable
• X is the independent variable
• b0 is the Y-intercept (i.e. where the line cuts the vertical axis)
• b1 is the slope of the line
The estimated linear equation
Ŷ = b0 + b1X
Where:
• Ŷ is the predicted value of the dependent variable
• X is the independent variable
• b0 is the Y-intercept (i.e. where the line cuts the vertical axis)
• b1 is the slope of the line
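These notes do not say how b0 and b1 are obtained. A common choice is ordinary least squares, which is assumed in the sketch below: b1 = Sxy / Sxx and b0 = mean(Y) - b1 * mean(X). The function name and the data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all X values are identical, so the slope is undefined")
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# HYPOTHETICAL data, chosen only so the arithmetic is easy to follow by hand.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_linear_regression(x_values, y_values)
print(round(b0, 2), round(b1, 2))

# Use the fitted line to predict Y for a new X inside the observed range.
y_hat = b0 + b1 * 3.5
print(round(y_hat, 2))
```

For this data the means are 3 and 4, Sxy = 6 and Sxx = 10, so the fitted line is Ŷ = 2.2 + 0.6X.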
Explain the types of variables which are used for regression analysis?
There is always exactly one dependent variable and at least one independent variable; you cannot have more than one dependent variable. E.g. investigating a possible relationship between sales and price data.
What type of diagram is best used to show the nature of the relationship between two numerical variables?
A *scatter diagram*
The regression coefficients "b0" and "b1" can be interpreted in three ways...
• Geometrically (i.e., graphically) • Algebraically (i.e., in equation form) • Practically (i.e., practical interpretation)