BANA 2 - Exam 1

utility theory

Assigns values to outcomes based on the decision maker's attitude toward risk, loss, and other factors

Prescriptive Analytics

Indicates a best course of action to take

Regression Model

The equation that describes how y is related to x and an error term. Simple linear regression model: y = β0 + β1x + ε, where β0 is the intercept and β1 is the slope on the independent variable x. Parameters: the characteristics of the population, β0 and β1. Random variable: the error term, ε. The error term accounts for the variability in y that cannot be explained by the linear relationship between x and y.

Data security

The protection of stored data from destructive forces or unauthorized users

Experimental region

The range of values of the independent variables in the data used to estimate the model. The regression model is valid only over this region.

Coefficient of Determination

The ratio SSR/SST used to evaluate the goodness of fit for the estimated regression equation. Interpreted as the percentage of the total sum of squares that can be explained by using the estimated regression equation: R² = SSR / SST.


Use a table when: the reader needs to refer to specific numerical values; the reader needs to make precise comparisons between different values and not just relative comparisons; the values being displayed have different units or very different magnitudes. Table design principles: avoid using vertical lines in a table unless they are necessary for clarity; horizontal lines are generally necessary only for separating column titles from data values or when indicating that a calculation has taken place.


Useful for visualizing hierarchical data along multiple dimensions

Dependent variable

Variable being predicted y = dependent variable x = independent variable

Independent variable

Variables being used to predict the value of the dependent variable y = dependent variable x = independent variable


Visual methods of displaying data

The geographic information system

_____ merges maps and statistics to present data collected over different geographies.


involves the use of probability and statistics to construct a computer model to study the impact of uncertainty on a decision

scatter chart

is a graphical presentation of the relationship between two quantitative variables


is a line that provides an approximation of the relationship between the variables

Which one of the following statements is not true concerning PivotTables in Excel?

PivotTables can only be used if one variable is categorical and the other is quantitative data


Prediction of the value of the dependent variable outside the experimental region. It is risky.

advanced analytics

Predictive and prescriptive analytics are sometimes referred to as

Deleting the grid lines in a table and the horizontal lines in a chart

increases the data-ink ratio

Data-ink is the ink used in a table or chart that

is necessary to convey the meaning of the data to the audience.

A disadvantage of stacked-column charts and stacked-bar charts is that

it can be difficult to perceive small differences in areas.


(Slide 149) Refers to the correlation among the independent variables in multiple regression analysis. In t tests for the significance of individual parameters, the difficulty caused by multicollinearity is that it is possible to conclude that a parameter associated with one of the multicollinear independent variables is not significantly different from zero when the independent variable actually has a strong relationship with the dependent variable.

Prescriptive analytics

use techniques that take input data and yield a best course of action.

Strategic Decisions

Involve higher-level issues concerned with the overall direction of the organization. Define the organization's overall goals and aspirations for the future

Dummy variable

A variable used to model the effect of categorical independent variables in a regression model which generally takes only the value zero or one is called
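
A minimal sketch of coding a two-level categorical variable as a zero/one dummy variable. The function name `to_dummy` and the shift data are hypothetical, chosen only for illustration.

```python
def to_dummy(values, level_coded_one):
    """Return 1 where the category equals level_coded_one, else 0."""
    return [1 if v == level_coded_one else 0 for v in values]


# Hypothetical categorical data: the shift on which each observation was recorded.
shifts = ["day", "night", "night", "day"]
night_dummy = to_dummy(shifts, "night")
print(night_dummy)  # [0, 1, 1, 0]
```

The resulting column of zeros and ones can then be used as an independent variable in a regression model.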

Operational decisions

Affect how the firm is run from day to day. Are the domain of operations managers, who are the closest to the customer.

Combined PPT Chapter Start Dates

Chapter 3: 38 Chapter 7: 96

Parallel-coordinates plot

Chart for examining data with more than two variables. Includes a different vertical axis for each variable. Each observation is represented by drawing a line on the parallel-coordinates plot connecting each vertical axis. The height of the line on each vertical axis represents the value taken by that observation for the variable corresponding to the vertical axis.

An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a

Clustered column chart

Optimization models

Models that give the best decision subject to constraints of the situation

Marketing Analytics

Fastest growing, compared with financial and HR analytics. A better understanding of consumer behavior through the use of scanner data and data generated from social media has led to an increased interest in marketing analytics.


Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population is termed as

Bubble chart

Graphical means of visualizing three variables in a two-dimensional graph. Sometimes a preferred alternative to a 3-D graph

Scatter chart

Graphical presentation of the relationship between two quantitative variables

Bubble Chart

In order to visualize three variables in a two-dimensional graph, we use a


A line chart that has no axes but is used to provide information on overall trends for time series data is called a

Trend line

A line that provides an approximation of the relationship between the variables

Data dashboards

Collections of tables, charts, maps, and summary statistics that are updated as new data become available. Uses of dashboards: to help management monitor specific aspects of the company's performance related to their decision-making responsibilities. For corporate-level managers, daily data dashboards might summarize sales by region, current inventory levels, and other company-wide metrics. Front-line managers may view dashboards that contain metrics related to staffing levels, local inventory levels, and short-term sales forecasts.

Simulation optimization:

Combines the use of probability and statistics to model uncertainty with optimization techniques to find good decisions in highly complex and highly uncertain

Pie charts

Common form of chart used to compare categorical data

Tactical decisions

Concern how the organization should achieve the goals and objectives set by its strategy. Are usually the responsibility of midlevel management

Predictive analytics

Consists of techniques that use models constructed from past data to predict the future or ascertain the impact of one variable on another. Example: survey data and past purchase behavior may be used to help predict the market share of a new product. Techniques used: linear regression; time series analysis; data mining, which is used to find patterns or relationships among elements of the data in a large database and is often used in predictive analytics.

Data Visualization

Creating a summary table for the data. Generating charts to help interpret, analyze, and learn from the data. Uses of data visualization: helpful for identifying data errors; reduces the size of your data set by highlighting important relationships and trends in the data.

Line Chart

DJ needs to display data over time. Which of the following charts should he use?

Descriptive analytics

Data dashboards are a type of _________ analytics

Descriptive analytics

Encompasses the set of techniques that describes what has happened in the past. Examples: data queries; reports; descriptive statistics; data visualization (including data dashboards); data-mining techniques; basic what-if spreadsheet models.

quadratic regression

Process: 1. Square the independent variable in question (typically x1). 2. Run the regression with x1 and x1² (the squared data) as independent variables, giving coefficients b1 and b2. 3. Draw conclusions. The estimated regression equation is ŷ = b0 + b1x1 + b2x1². In the Reynolds example, to account for the curvilinear relationship between months employed and scales sold, we could include the square of the number of months the salesperson has been employed in the model as a second independent variable.
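
The process above could be sketched in plain Python as follows. This is an illustration under stated assumptions, not the course's Excel workflow: the helper names (`solve_linear_system`, `fit_quadratic`) are hypothetical, and the data are made up so that they follow y = 2 + 3x − 0.5x² exactly (they are not the Reynolds data).

```python
def solve_linear_system(a, b):
    """Solve a*z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(x, y):
    """Least squares estimates (b0, b1, b2) for y-hat = b0 + b1*x + b2*x^2."""
    # Step 1: add the squared independent variable as a second predictor.
    rows = [[1.0, xi, xi ** 2] for xi in x]
    # Step 2: solve the normal equations (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up data that follow y = 2 + 3x - 0.5x^2 exactly.
months = [1, 2, 3, 4, 5, 6]
sales = [2 + 3 * m - 0.5 * m ** 2 for m in months]
b0, b1, b2 = fit_quadratic(months, sales)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

Step 3 (drawing conclusions) is left to the analyst: a negative b2, as in this made-up data, indicates a curve that flattens and eventually turns down as x grows.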

The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table and chart is known as data-ink ratio. Using additional ink that is not necessary to convey information has what effect on the data-ink ratio?


Multiple regression

Regression analysis involving one dependent variable and more than one independent variable is known as

Multiple linear regression

Regression analysis involving two or more independent variables


Results from creating an overly complex model to explain idiosyncrasies in the sample data. Results from the use of complex functional forms or independent variables that do not have meaningful relationships with the dependent variable. If a model is overfit to the sample data, it will perform better on the sample data used to fit the model than it will on other data from the population. Thus, an overfit model can be misleading about its predictive capability and its interpretation.

Grouping in Excel

Right-click on the first data set, choose Group, then enter the custom directions.

Sum of Squares

SST = SSR + SSE. SSR measures how much the ŷ values on the estimated regression line deviate from ȳ.
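
A small sketch that computes the three sums of squares for a simple linear regression and checks the identity SST = SSR + SSE numerically. The function name `sums_of_squares` and the five data points are made up for illustration.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line fitted to x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least squares slope and intercept.
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # due to regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # due to error
    return sst, ssr, sse


# Made-up sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = sums_of_squares(x, y)
print(round(sst, 4), round(ssr, 4), round(sse, 4))
```

For this made-up sample, SST is 6, SSR is 3.6, and SSE is 2.4, so the identity holds.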

Stepwise procedure

The analyst establishes both a criterion for allowing independent variables to enter the model and a criterion for allowing independent variables to remain in the model. In the first step of the procedure, the independent variable that best satisfies the criterion for entering the model is added. In each subsequent step, the remaining independent variables not in the current model are first evaluated, and the one that best satisfies the criterion for entering is added to the model. Then the independent variables in the current model are evaluated, and the one that violates the criterion for remaining in the model to the greatest degree is removed. The procedure stops when no independent variables not currently in the model meet the criterion for being added to the regression model, and no independent variables currently in the model violate the criterion for remaining in the regression model.


The degree of correlation among independent variables in a regression model

straight line regression

The graph of the simple linear regression equation is a(n)

Bar Charts

Use horizontal bars to display the magnitude of the quantitative variable. Bar and column charts are very helpful in making comparisons between categorical variables.

Column Charts

Use vertical bars to display the magnitude of the quantitative variable. Bar and column charts are very helpful in making comparisons between categorical variables.

Decision analysis

Used to develop an optimal strategy when a decision maker is faced with several decision alternatives and an uncertain set of future events

Scatter chart matrix

Useful chart for displaying multiple variables

The charts that are helpful in making comparisons between categorical variables are

bar charts and column charts

Making visual comparisons between categorical variables is difficult in a

pie chart

In many cases, white space in a chart can improve


What would be the Coefficient of determination if the total sum of squares (SST) is 23.29 and the sum of squares due to regression (SSR) is 10.03

0.43 (SSR / SST = 10.03 / 23.29 ≈ 0.4307)
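
The arithmetic can be checked with a short sketch; the function name `coefficient_of_determination` is only illustrative.

```python
def coefficient_of_determination(ssr, sst):
    """R^2 = SSR / SST."""
    return ssr / sst


r_squared = coefficient_of_determination(10.03, 23.29)
print(round(r_squared, 2))  # 0.43
```

The unrounded ratio is about 0.4307, so it rounds to 0.43 at two decimal places.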


(Quiz) refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable. (PPT) This occurs when the relationship between the dependent variable and one independent variable is different at various values of a second independent variable


(quiz) Assessing the regression model on data other than the sample data that was used to generate the model is known as. (PPT) If you have access to a sufficient quantity of data, assess your model on data other than the sample data that were used to generate the model. It is recommended to divide the original sample data into training and validation sets.

Training set

(quiz) is the data set used to build the candidate models. (PPT) The data set used to build the candidate models that appear to make practical sense

Validation set

(quiz) refers to the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable. (PPT) The set of data used to compare model performances and ultimately pick a model for predicting values of the dependent variable
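
One way to divide sample data into training and validation sets is sketched below. The 25% validation share, the fixed seed, and the function name `split_training_validation` are assumptions made for illustration; the cards do not prescribe a particular split.

```python
import random


def split_training_validation(observations, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into (training, validation)."""
    shuffled = observations[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical observation IDs standing in for rows of a data set.
observations = list(range(8))
training, validation = split_training_validation(observations)
print(len(training), len(validation))  # 6 2
```

Candidate models would be built on `training` only, and their forecasts compared on `validation`.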

What would be the value of the sum of squares due to regression (SSR) if the total sum of squares (SST) is 25.32 and the sum of squares due to error (SSE) is 6.89?
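
Because SST = SSR + SSE, the requested value follows from SSR = SST − SSE. A short worked check:

```python
sst = 25.32  # total sum of squares, from the question
sse = 6.89   # sum of squares due to error, from the question

# Rearranging SST = SSR + SSE gives SSR = SST - SSE.
ssr = sst - sse
print(round(ssr, 2))  # 18.43
```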


Based on the assigned web article, in the Dodgers Front Office how many people have the term analyst or research in their titles?


Never use a ________ chart when a __________ chart will suffice.

3-D; 2-D

Data dashboard

(Quiz) A data visualization tool that updates in real time and gives multiple outputs is called a data dashboard. Data visualization tool that illustrates multiple metrics and automatically updates these metrics as new data become available. Examples of key performance indicators (KPIs): automobile dashboard: current speed, fuel level, and oil pressure; business dashboard: financial position, inventory on hand, customer service metrics.

Line chart

A line connects the points in the chart. Useful for time series data collected over a period of time (minutes, hours, days, years, etc.).


A programming model used within Hadoop that performs two major steps: the map step and the reduce step
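
The two steps can be illustrated with a word count in plain Python. This is a single-machine sketch of the idea only; it does not use Hadoop or any Hadoop API, and the function names and documents are hypothetical.

```python
from collections import defaultdict


def map_step(document):
    """Map: emit a (key, value) pair of (word, 1) for every word."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_step(pairs):
    """Reduce: combine all values that share a key by summing them."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical documents; in a real cluster each could sit on a different computer.
documents = ["big data big models", "data dashboards"]
mapped = [pair for doc in documents for pair in map_step(doc)]
word_counts = reduce_step(mapped)
print(word_counts)
```

In a distributed setting the map step runs independently on each piece of the data, which is what lets the work spread across machines.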

Simple linear regression

A regression analysis for which any one unit change in the independent variable, x, is assumed to result in the same change in the dependent variable, y

simple linear regression

A regression analysis involving one independent variable and one dependent variable is referred to as a

Data query

A request for information with certain characteristics from a database

Big data

A set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time. Represents opportunities. Presents challenges in terms of data storage and processing, security, and available analytical talent. More companies are hiring data scientists who know how to process and analyze massive amounts of data.

Geographic Information Systems (GIS):

A system that merges maps and statistics to present data collected over different geographies. Helps in interpreting data and observing patterns.

Heat map

A two-dimensional graphical representation of data that uses different shades of color to indicate magnitude. Typically should not use red and green together, since color-blind readers may not be able to distinguish them.

Crosstabulation (Pivot Table)

A useful type of table for describing data of two variables

(1) The company identified in Chapter 7, Analytics in Action: (2) The Analytics in Action example in Chapter 7 concerned: (3) Chapter 7 focuses on:

(1) Alliance Data Systems (2) Predicting the effect of advertising (3) Linear regression

Stacked column chart

Allows the reader to compare the relative values of quantitative variables for the same category in a bar chart

Clustered column (or bar) chart

An alternative chart to stacked column chart for comparing quantitative variables

Confidence interval

An estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence


An open-source programming environment that supports big data processing through distributed storage and processing over multiple computers

Predictive analytics

Are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.

The software package most commonly used for creating simple charts is


key performance indicators.

In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as

Independent variable

In a linear regression model, the variable (or variables) used for predicting or explaining values of the response variable are known as the ________________. It(they) is(are) denoted by x.

Confidence level

Indicates how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating

Data ink ratio

Measures the proportion of what Tufte terms "data-ink" to the total amount of ink used in a table or chart. Edward R. Tufte first described the data-ink ratio. Helpful for creating effective tables and charts for data visualization. Data-ink: ink used in a table or chart that is necessary to convey the meaning of the data to the audience. Non-data-ink: ink used in a table or chart that serves no useful purpose in conveying the data to the audience.

Statistical inference

Process of making estimates and drawing conclusions about one or more characteristics of a population (the value of one or more parameters) through the analysis of sample data drawn from the population. Consider both hypothesis testing and interval estimation.

Business analytics

Scientific process of transforming data into insight for making better decisions. Used for data-driven or fact-based decision making, which is often seen as more objective than other alternatives for decision making

Best-subsets procedure

Simple linear regressions for each of the independent variables under consideration are generated, and then the multiple regressions with all combinations of two independent variables under consideration are generated, and so on. Once a regression has been generated for every possible subset of the independent variables under consideration, an output that provides some criteria for selecting regression models is produced for all models generated.

Least squares method

(Slides 106-125) A procedure for using sample data to find the estimated regression equation. Determines the values of b0 and b1. Interpretation of b0 and b1:
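
A minimal sketch of how b0 and b1 can be computed for simple linear regression, using the standard least squares formulas b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄. The function name `least_squares` and the data are made up for illustration and are not taken from the slides.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

Here b1 = 0.6 is the estimated change in y for a one-unit increase in x, and b0 = 2.2 is the estimated value of y when x = 0.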


The value of the independent variable at which the relationship between dependent variable and independent variable changes

Tables should be used instead of charts when

The values being displayed have different units or very different magnitudes

Testing Individual Regression Parameters:

To determine whether statistically significant relationships exist between the dependent variable y and each of the independent variables x1, x2, . . . , xq individually. If βj = 0, there is no linear relationship between the dependent variable y and the independent variable xj. If βj ≠ 0, there is a linear relationship between y and xj.

Pivot Chart

To summarize and analyze data with both a crosstabulation and charting, Excel pairs PivotCharts with PivotTables

quadratic regression model

Which of the following regression models is used to model a nonlinear relationship between the independent and dependent variables by including the independent variable and the square of the independent variable in the model?

Fields may be chosen to represent all of the following except ____________ in the body of a PivotTable


An effective display of trend and magnitude is achieved by using a combination of a

heat map and sparklines

The difference between the observed value of the dependent variable and the predicted using the estimated regression equation is known as the


F Test

Used to test the hypothesis that the regression parameters β1, β2, . . . , βq are all zero. F = (SSR/q) / (SSE/(n − q − 1)), where SSR = sum of squares due to regression, SSE = sum of squares due to error, q = the number of independent variables in the regression model, and n = the number of observations in the sample. Larger values of F provide stronger evidence of an overall regression relationship.
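
The F statistic formula can be written as a short function. The name `f_statistic` and the input values are hypothetical, chosen only to show the calculation.

```python
def f_statistic(ssr, sse, n, q):
    """F = (SSR / q) / (SSE / (n - q - 1))."""
    return (ssr / q) / (sse / (n - q - 1))


# Hypothetical values: one independent variable, five observations.
f_value = f_statistic(ssr=3.6, sse=2.4, n=5, q=1)
print(round(f_value, 2))  # 4.5
```

Deciding whether this value is large enough to reject the hypothesis requires comparing it with an F distribution with q and n − q − 1 degrees of freedom, which the sketch does not do.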

