BIT 3434 EXAM 1
In the article "THE WORLD OF BUSINESS ANALYTICS", what was the estimated amount of savings per year at United Airlines resulting from the installation of a Paragon scheduling system?
$1 million
If X is a normally distributed random variable with a mean of 50 and standard deviation of 5, what is the probability of X being equal to 40?
0.0
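A quick check of why the answer is zero: for a continuous random variable, the probability of any single exact value is the area over one point. The sketch below uses Python's standard-library `statistics.NormalDist`; the interval 39.5 to 40.5 is an illustrative assumption and is not part of the question.

```python
from statistics import NormalDist

# X ~ Normal(mean=50, sd=5), as stated in the question
x = NormalDist(mu=50, sigma=5)

# P(X = 40) is the area over a single point: cdf(40) - cdf(40)
p_exact = x.cdf(40) - x.cdf(40)

# A non-zero probability needs an interval; 39.5 to 40.5 is an arbitrary choice
p_interval = x.cdf(40.5) - x.cdf(39.5)

print(p_exact)     # 0.0
print(p_interval)  # a small positive number, roughly 0.011
```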
The following functional relationship must be represented in a spreadsheet. How many output (or target) cells are there? Y = f(X1, X2, ..., Xk)
1
The educational version of Analytic Solver Platform (ASP) allows for ______ trials per simulation.
10,000
Suppose a data set contains 2,000 observations with 20% "successes" and 80% "failures". If Analytic Solver's Data Mining tool is used to partition the data set with oversampling with a 50% success rate in the training sample, how many total observations will there be in the validation sample?
1000
In the article "THE WORLD OF BUSINESS ANALYTICS", approximately how many decision variables were used to model the yield management system at American Airlines?
250
When Tim Brownson was talking about gut-level decision making he equated chicken vindaloo with
6 alarm chili
Suppose that cell H15 is an output cell in a spreadsheet for which we have run a simulation. How could you compute the probability of that cell's value exceeding 500?
=1-PsiTarget(H15, 500)
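The same idea can be sketched outside the spreadsheet: the probability of exceeding a threshold is one minus the share of simulated trials at or below it. The Python below is only an illustration; the normal distribution with mean 480 and standard deviation 50 is a made-up stand-in for the values recorded in cell H15, and `prob_exceeds` is a hypothetical helper name.

```python
import random

def prob_exceeds(trials, threshold):
    """Fraction of simulated output values strictly greater than threshold."""
    at_or_below = sum(1 for value in trials if value <= threshold) / len(trials)
    return 1 - at_or_below  # mirrors the 1 - (cumulative probability) form

# Hypothetical simulation output: 10,000 trials of an output cell
rng = random.Random(42)
trials = [rng.gauss(480, 50) for _ in range(10_000)]

p_exceed = prob_exceeds(trials, 500)
print(round(p_exceed, 3))
```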
A CVaR constraint should be applied to _____.
A PsiOutput() cell
According to Dan Ariely, if you go bar hopping who should you take with you?
A slightly uglier version of yourself.
A home's appraised value is likely ____ amount a bank would lend a person buying the property.
Above the max
What are the two common types of errors in human judgment?
Anchoring and framing
In a classification problem, the dependent (Y) variable is:
Categorical
The set of means for all the independent variables for a particular group is called a _______.
Centroid
In this lecture, decision making was described as
Choosing your favorite future
When a credit manager of a mortgage company identifies the loans as those resulting in default and those that are current, he/she uses:
Classification
When faced with a classification problem, careful consideration should be given to:
Composition of the training sample
As the number of replications (n) increases, the width of the confidence interval:
Decreases
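A small sketch of why the interval narrows, assuming a normal-approximation confidence interval for the mean whose half-width is z * s / sqrt(n). The sample standard deviation of 10 and the z value of 1.96 (about 95% confidence) are illustrative assumptions.

```python
import math

def ci_half_width(sample_sd, n, z=1.96):
    """Half-width of an approximate confidence interval for a mean."""
    return z * sample_sd / math.sqrt(n)

# Quadrupling the number of replications halves the half-width
for n in (100, 400, 1600):
    print(n, round(ci_half_width(10.0, n), 3))
```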
Virginia Tech recently created a dashboard to track and report the number of COVID-19 cases present on campus. This is an example of:
Descriptive Analytics
Classification techniques differ from most other predictive statistical methods, such as regression analysis, because the dependent variable is:
Discrete
Prior to registration, the number of students in BIT 3434 next semester would be an example of a _____.
Discrete random variable
Which classification technique is provably optimal for data that is multivariate normal?
Discriminant Analysis
Which of the following is not a benefit of using a model to make a decision?
Economy; Timeliness; Feasibility; All of the above are potential benefits of using a model**
According to George Box, wrong models are not useful.
False
Best case / worst case analysis provides information about the distribution of possible values between the minimum and maximum values of an output cell.
False
In general, plugging in the expected values of uncertain cells in a spreadsheet will give you the expected value of the output cell(s).
False
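A minimal sketch of why this is false whenever the output is a nonlinear function of the inputs. The model here is hypothetical: units sold equal the smaller of demand and a capacity of 100, with demand uniform between 0 and 200 (so expected demand is 100).

```python
import random

CAPACITY = 100  # hypothetical number of units that can be sold per period

def units_sold(demand):
    """Output cell: sales are capped by capacity, so the model is nonlinear."""
    return min(demand, CAPACITY)

rng = random.Random(1)
demands = [rng.uniform(0, 200) for _ in range(10_000)]

# Plugging the expected demand (100) into the model
output_at_mean_input = units_sold(100)

# Averaging the model's output over the simulated demands
mean_of_outputs = sum(units_sold(d) for d in demands) / len(demands)

print(output_at_mean_input)       # 100
print(round(mean_of_outputs, 1))  # noticeably below 100 (about 75 in theory)
```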
Large positive correlation between X & Y implies X causes Y whereas large negative correlation between X & Y implies Y causes X.
False
Regression can only be used if the scatter plot of X and Y indicates a linear relationship.
False
The Average( ) function excludes values of zero from its calculations.
False
The TREND() function reports the value of the intercept (b0) and slope (b1).
False
The regression model with the highest adjusted R2 value will also have the highest R2 value.
False
The scatter plot of two variables with zero correlation will form a straight horizontal line.
False
To be useful, models must not be a simplified version of what they represent.
False
When building multiple regression models you should include all the available independent (X) variables in the model
False
When using ASP, you can only have one PsiOutput( ) cell in your model.
False
___________ refers to how decision-makers view a problem from a win-loss perspective.
Framing effects
In this lecture, I described an example I often use in live classes involving biological males and biological females. In my description of this example, what were the biological females wearing that distinguished them from the biological males?
Glasses
How does ASP know what cell(s) to track when it runs a simulation?
It collects data on any cells containing the PsiOutput( ) function.
What is a computer model?
It is a set of mathematical relationships and logical assumptions implemented in a computer.
What country did I mention W. Edwards Deming helping after WWII?
Japan
Regression analysis finds the values of the parameter estimates that minimize the sum of squared estimation errors. This approach is referred to as:
Method of least squares
According to Tricia Wang, ____________________ is the unconscious belief in valuing the measurable over the immeasurable.
Nokia; quantification bias
The overall classification error rate on the training data ___________.
None
In the equation: Yi = 𝛽0 + 𝛽1X1i + 𝜀i, 𝛽0 and 𝛽1 are known as:
Population Parameters
Which statistic summarizes a classification technique's accuracy when it predicts "positive"?
Precision
Your company produces 10 different products. Each product uses resources, such as money, material and labor that have limited availability. You are responsible for developing a mathematical model to determine how many units of each product to produce to minimize the total production cost. What type of model is this?
Prescriptive
Who came up with the 3-legged stool as a metaphor for decision making?
Prof. Ron Howard
What RNG function can be specified with user supplied values for the minimum, maximum, and most likely values?
PsiTriangular( )
If a nominal variable contains Q possible values, how many binary variables are needed to represent it in a linear model with an intercept?
Q-1
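One way to picture the Q-1 rule is dummy coding with a baseline level, sketched below. The function name `dummy_encode` and the color levels are hypothetical; the first level is treated as the baseline, which the intercept absorbs.

```python
def dummy_encode(value, levels):
    """Encode value as len(levels) - 1 binary indicators; levels[0] is the baseline."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]

levels = ["red", "green", "blue"]     # Q = 3 possible values
print(dummy_encode("red", levels))    # [0, 0]  (baseline)
print(dummy_encode("green", levels))  # [1, 0]
print(dummy_encode("blue", levels))   # [0, 1]
```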
In the model: Y = f(X1, X2, ..., Xk) + ε, the term ε is called:
Random disturbance, or error
Which statistic summarizes a classification technique's ability to accurately recognize "positives" when they appear?
Recall
In addition to Solver, Excel provides another tool for solving regression problems that is easier to use and provides more information about a regression problem. This code is available in Excel under:
Regression
Recalculating the spreadsheet several hundred or several thousand times and recording the resulting values generated for the output cell(s), or bottom-line performance measure(s) is called:
Replication
When such a decision is made, some chance exists that the decision will not produce the intended results. This chance, or uncertainty, represents:
Risk
In a simple linear regression model, the estimated values of 𝛽0 and 𝛽1 are denoted b0 and b1, respectively. The values of b0 and b1 are called:
Sample statistics
If a classification technique always predicts "positive" what will be zero?
Specificity
Which statistic summarizes a classification technique's ability to accurately recognize "negatives" when they appear?
Specificity
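The precision, recall, and specificity cards can be tied together with a small sketch based on confusion-matrix counts. The counts are hypothetical (30 actual positives, 70 actual negatives), and the convention of returning 0.0 for an empty denominator is an assumption made here for convenience.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return safe_ratio(tp, tp + fp)

def recall(tp, fn):
    """Share of actual positives that were recognized."""
    return safe_ratio(tp, tp + fn)

def specificity(tn, fp):
    """Share of actual negatives that were recognized."""
    return safe_ratio(tn, tn + fp)

# A classifier that always predicts "positive" on the hypothetical data:
# every positive is caught (fn = 0) and every negative is missed (tn = 0).
tp, fn, fp, tn = 30, 0, 70, 0

print(precision(tp, fp))    # 0.3
print(recall(tp, fn))       # 1.0
print(specificity(tn, fp))  # 0.0
```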
In the following total cost (TC) equation, "F" is the fixed cost, "c" is the variable cost per unit and "Q" is the quantity produced. Which is (are) the dependent variable(s)? TC = F + cQ
TC
According to the Iceberg model
Thoughts cause feelings
A first step in applying regression analysis to a set of data is to prepare a scatter plot of the X and Y variables to help you understand the relationship between the variables.
True
According to Dr. Ragsdale, you should not trade what you want most for what you want now.
True
For a given set of data, the TREND() function reports the estimated value of Y at a specified value of X.
True
Including all the available independent variables in a multiple regression model will result in the highest R2 value.
True
Prof. Dan Ariely demonstrated that eliminating an alternative a decision maker doesn't want can change his or her rankings of the remaining alternatives.
True
The regression model with the highest adjusted R2 value will also result in the most precise predictions (i.e., have the smallest prediction interval widths).
True
Which leg of the 3-legged stool is most critical?
Values
In the article "THE WORLD OF BUSINESS ANALYTICS", when did the field of business analytics (called Operations Research at the time) begin?
WWII
In order to indicate the output cell (or cells) that we want Analytic Solver Platform (ASP) to track during the simulation we can use the:
a. "Add Output" button on the Analytic Solver Platform (ASP) menu.
Affinity analysis is:
a. A data mining technique aimed at discovering what goes with what.
In what-if analysis:
a. A manager changes the values of the uncertain input variables to see what happens to the bottom-line performance measure.
Simulation is:
a. A technique that measures various characteristics of the model bottom-line performance measure. b. A technique that describes various characteristics of the bottom-line performance measure. c. A technique that is useful when one or more values for the independent variables are uncertain. d. All of the above answers are correct.**
Analytic Solver Platform (ASP) has the ability to:
a. Automatically identify probability distributions that fit your historical data reasonably well.
A listing of all the available Analytic Solver Platform (ASP) RNG functions is:
a. Available in Excel, after ASP has been installed.
Decision makers expect that good decisions will always lead to good outcomes. Why is this not always the case?
a. Because unforeseeable circumstances beyond your control sometimes lead to bad outcomes even though the decision was sound.
Data mining tasks fall into the following potential categories:
a. Classification. b. Prediction. c. Association/Segmentation. d. All of the above are categories of data mining tasks.**
Analytic Solver Platform (ASP) provides several "Psi" functions that can be used to:
a. Create the RNGs required for simulating a model.
The Fisher's Linear Discriminant Function (FLDF):
a. Identifies a linear function for each of the m groups in a classification problem. b. Measures the strength of an observation's "membership" to the corresponding group. c. Is used to compute m classification scores c1(i), c2(i), ..., cm(i). d. All of the above are characteristics of FLDF.**
What is the step immediately following completion of the problem formulation step?
a. Implementing this formulation as a spreadsheet model.
A classification tree:
a. Is a graphical representation of a set of rules for classifying observations into two or more groups. b. Uses a hierarchical sorting process consisting of splitting nodes and terminal nodes to group records from a data set into increasingly homogeneous groups. c. Is popular because the resulting classification rules are very apparent and easy to interpret. d. All of the above are characteristics of classification trees.**
The purpose of discriminant analysis (DA) is to:
a. Provide theoretically optimal or good classification results.
The "R Square" statistic in regression:
a. Provides a goodness-of-fit measure. b. Is referred to as the coefficient of determination. c. Ranges in value from 0 to 1. d. All of the above are correct.**
When faced with uncertainty, people do the following:
a. React with paralysis. b. Do exhaustive research. c. Avoid making a decision. d. All of the above answers are correct.**
Neural networks:
a. Simulate human learning. b. Are computer programs modeled after computing architecture of the human brain. c. Are a pattern recognition technique that attempts to learn what relationship exists between a set of input and output variables. d. All of the above characterize neural networks**
One advantage of calculating a confidence interval for a prediction, or prediction interval, of a new value of Y for a given value of X is:
a. That a prediction interval provides a lower bound on the fitted point value. b. That a prediction interval provides an upper bound on the fitted point value. c. That a prediction interval provides a confidence level (e.g. 95%) associated with the lower and upper bounds on the fitted point value. d. All of the above are advantages of constructing a prediction interval.**
Classification tree algorithms typically make choices in a way that minimizes the average weighted impurity of the resulting partitions. The two most common ways of measuring impurity are:
a. The Gini Index and the Entropy Measure.
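A sketch of the two impurity measures using their common textbook definitions (Gini = 1 minus the sum of squared class proportions; entropy = minus the sum of p * log2(p)). Whether a particular software package scales them the same way is an assumption.

```python
import math

def gini(proportions):
    """Gini index of a node given its class proportions (which sum to 1)."""
    return 1 - sum(p ** 2 for p in proportions)

def entropy(proportions):
    """Entropy (base 2) of a node; classes with zero proportion contribute nothing."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# A pure node has zero impurity; a 50/50 node is maximally impure for two classes
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))
```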
Consider a model Ai = 𝛽0 + 𝛽1X1i + 𝜀i. You estimated the fitted line as Fi = b0 + b1X1i and want to test whether the regression is significant using 𝛼=0.05 significance level and found the appropriate critical value of the t-statistic, t*=1.645. The value of the t-statistic found in the regression output, t(b1)=4.345. You may conclude that:
a. The regression is significant at 𝛼=0.05 significance level.
The benefit(s) of simulation include(s):
a. The results of simulation do give us greater insight into the problem. b. It gives a decision maker some idea of the best- and worst-case total outcomes for the problem. c. It provides an idea of the distribution and variability of the possible outcomes. d. All of the above answers are correct.**
A measure of the accuracy of the prediction obtained from a regression model is given by:
a. The standard deviation of the estimation errors, also known as the standard error, Se.
Multicollinearity is the term used to describe the situation when:
b. The independent variables in a regression model are correlated among themselves.
What is the end result of the problem identification step?
b. A well-defined statement of the problem.
In the article "THE WORLD OF BUSINESS ANALYTICS" it is noted that most business processes are by nature cross-functional. What type of expertise should be represented on the interdisciplinary team that tackles business process reengineering projects?
b. An operations research/management science (OR/MS) analyst.
Overfitting refers to a situation when the tree algorithm:
b. Classifies new observations less accurately than trees that do not overfit the training data.
What is a good decision?
b. It is one that uses a structured, data-driven, and model-based process to make decisions.
Clicking the "Fit" icon found within the "Tools" group on the Analytic Solver Platform (ASP) tab:
b. Launches the Fit Options dialog.
What is the major benefit of modeling?
b. Models allow us to gain insight and understanding about the object or decision problem under investigation.
What is the purpose of implementation step in the problem solving process?
b. Preparing a message that is understood by various stakeholders in an organization and persuading them to take a particular course of action.
The manager hopes that using the expected, or most likely, values for all the uncertain variables will:
b. Provide the most likely value for the bottom-line performance measure (Y).
A random variable is a:
b. Variable whose value cannot be predicted or set with certainty.
Logistic regression is:
c. A classification technique that estimates the probability of an observation belonging to a particular group.
Cluster analysis is:
c. A data mining technique used to identify meaningful groupings of records within a data set.
Regression analysis is:
c. A modeling technique for analyzing the relationship between a continuous (real-valued) dependent variable Y and one or more independent variables X1, X2, ...Xn.
The Mahalanobis distance measure in Discriminant Analysis (DA):
c. Accounts for differences in the covariances between all possible pairings of the independent variables.
When using Solver to estimate the parameters of a simple linear regression model, the objective function is to minimize ESS = Σ(Yi − Ŷi)² = Σ[Yi − (b0 + b1X1i)]² with no constraints. This is an example of:
c. An unconstrained nonlinear optimization problem.
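To make the objective concrete, the sketch below computes the error sum of squares (the sum of squared differences between actual and estimated Y) and checks that the least-squares estimates give a smaller value than nearby alternatives. It uses the closed-form least-squares formulas rather than a numerical optimizer such as Solver, and the five data points are made up for illustration.

```python
def fit_simple_regression(xs, ys):
    """Closed-form least-squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ess(xs, ys, b0, b1):
    """Error sum of squares: squared estimation errors summed over all observations."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_regression(xs, ys)
print(round(b0, 4), round(b1, 4))
print(ess(xs, ys, b0, b1) < ess(xs, ys, b0 + 0.1, b1))  # True
```

The objective is quadratic in b0 and b1, which is why it counts as nonlinear even though the fitted line itself is linear.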
In Discriminant Analysis, the F1 score:
c. Combines the precision and recall measures to provide an overall measure of a classifier's accuracy.
The k-nearest neighbor (k-NN) technique:
c. Identifies the k observations in the training data that are most similar (or nearest) to a new observation we want to classify.
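A minimal k-NN sketch, assuming Euclidean distance and a simple majority vote among the k nearest training records. The training data, labels, and the name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, new_point, k=3):
    """training is a list of (features, label) pairs; returns the majority label
    among the k training records nearest to new_point."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data with two well-separated groups
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

print(knn_classify(training, (2, 2)))      # A
print(knn_classify(training, (6.5, 6.5)))  # B
```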
What is a mathematical model?
c. It is a model that uses mathematical relationships to describe or represent an object or decision problem.
What is a physical model?
c. It is a real representation of an object, such as a car, that is built to a smaller scale for evaluation and testing purposes.
What is business analytics?
c. It is synonymous with management science.
When attempting to optimize a simulation model (also known as simulation optimization) we typically want to:
c. Maximize or minimize the average value of (or some other statistic describing) the cell representing the objective or bottom-line performance measure.
The regression sum of squares (RSS) represents:
c. The amount of variation in Y around its mean that the regression function can account for.
The difference (Yi - Ŷi) is referred to as:
c. The estimation error, or residual, for observation i.
The "multiple R" statistic of the regression output represents:
c. The strength of the linear relationship between actual and estimated values for the dependent variable.
Oversampling tends to ________.
create models that better delineate between groups.
The analysis of variance (ANOVA) provides an efficient way to statistically test whether:
d. A regression model is significant overall.
The TREND( ) function in Excel can be used to:
d. Calculate the estimated values for linear regression model.
The first step to create the training and validation data set using XLMiner Platform is:
d. Click the Partition icon in the Data Mining section.
The term "data mining":
d. Encompasses a variety of analytic techniques that can be used to help managers analyze, understand, and extract value from large sets of data.
In Discriminant Analysis (DA), precision is a measure of:
d. How accurate the classifier is when it predicts a "success".
Adjusted R2 statistic is recommended in multiple regression because:
d. It accounts for the number of independent variables included in a regression model.
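A sketch of the usual adjusted R2 formula, 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The R2 values and sample size below are made up to show that a tiny gain in R2 from an extra variable can lower adjusted R2.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a model with k independent variables fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical models fit to the same 21 observations
two_vars = adjusted_r2(0.900, n=21, k=2)
three_vars = adjusted_r2(0.901, n=21, k=3)  # R2 rose slightly, adjusted R2 fell

print(round(two_vars, 4), round(three_vars, 4))
```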
The first step in performing a simulation in a spreadsheet is:
d. Placing a random number generator (RNG) formula in each cell that represents a random, or uncertain, independent variable.
If we don't know what value a particular cell in a spreadsheet will assume and enter a number that we think is the most likely value for the uncertain cell, we can calculate the most likely value of the bottom-line performance measure. This is called:
d. The base-case scenario.
The challenge with data availability today is getting:
d. The right data in the right amount for the problem at hand.
In multiple linear regression models:
d. There can be more than one independent variable and all terms in the regression model must be linear.
When selecting an RNG for an uncertain variable in a model, it is important to consider:
d. Whether the variable can assume discrete or continuous values.
Sensitivity analysis is useful when:
d. You may be interested in examining how sensitive the simulation output results are to various uncertain input cells in the model.
What made the oracle at the Temple of Apollo woozy?
ethylene gas
Priming occurs when
exposure to one stimulus influences one's reaction to another unrelated stimulus
The approach to variable selection I described in this lecture is called _____.
forward step-wise
In simulation, we assume the ________________ known with certainty.
function f()
What was the point of Deming's red bead experiment?
in order to manage something well & fairly you need to understand its inherent uncertainty
When conducting a simulation, the final scenario shown on the spreadsheet is _______.
just the last one ASP happened to generate
In the following functional relationship, how many independent variables are there? Y = f(X1, X2, ..., Xk)
k
The following functional relationship must be represented in a spreadsheet. How many input cells are there? Y = f(X1, X2, ..., Xk)
k
The secret of making good decisions is
knowing what you want before you choose
The model coefficients in logistic regression are estimated using ____.
maximum likelihood estimation
A higher reorder point will ______ . (Select all that apply.)
tend to increase average inventory; tend to reduce the chance of a stock out; tend to increase service level
According to Dr. Ragsdale "shaking the ladder" is a metaphor for
testing the results of a model
We defined service level as _______.
total demand met / total demand
ASP's Fit command _______ .
will suggest probability distributions for a given set of data
An element of unsystematic or random variation in the dependent variable is expressed by ___________ in the equation Y = f(X1, X2, ..., Xk) + ε.
ε
