SOM 307 Final Exam
A manager of a fast food restaurant wants the drive-thru employee to ask every fifth customer if he or she is satisfied with the service. Who makes up the population?
all customers who use the drive-thru window of this fast food restaurant
_____ refers to the technology that allows data, collected from sensors in all types of machines, to be sent over the Internet to repositories where it can be stored and analyzed.
Internet of Things (IoT)
_____ attempts to estimate the quantitative outcome as a linear function of explanatory variables.
Linear regression
A dashboard is a collection of tables, charts, and maps to help management _____ selected aspects of the company's performance.
Monitor
A time series plot of time (in quarters) versus quarterly sales (in $1,000s) is shown below. What type of time series data is this?
Non-stationary
The __________ probability distribution can be used to estimate the number of vehicles that go through an intersection during the lunch hour.
Poisson
Suppose we have a set of emails labeled as spam or not spam. Using this data, your task is to determine which future emails are potentially spam. What kind of analytics is this?
Predictive Analytics
In the spectrum of business analytics, which is the most complex?
Prescriptive
Which of the following analytical techniques helps us arrive at the best decision?
Prescriptive Analytics
What is the standard method to compute the lower bound for detecting outliers using a box plot? (IQR: interquartile range)
Q1-1.5*IQR
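The fence rule above can be sketched in Python. This is a minimal illustration, assuming the "inclusive" quartile convention (other conventions give slightly different quartiles) and a made-up list of car ages; the name outlier_bounds is hypothetical.

```python
import statistics

def outlier_bounds(values):
    """Return the box-plot fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    # Assumption: "inclusive" quartiles; other methods shift Q1 and Q3 slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

ages = [12, 15, 18, 20, 22, 25, 27, 30, 96]  # hypothetical car ages in months
lower, upper = outlier_bounds(ages)
# Any value outside the fences is flagged as a potential outlier.
outliers = [a for a in ages if a < lower or a > upper]
```

With this sample, Q1 = 18 and Q3 = 27, so the fences are 4.5 and 40.5 and only the value 96 is flagged.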
A characteristic or quantity of interest that can take on different values is a(n) _____.
Variable
One of the 4 Vs of big data that refers to uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations is _____.
Veracity
A mathematical model that gives the best decision, subject to the situation's constraints, is a(n) _____.
optimization model
A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of _____.
predictive analytics
Estimation methods are also referred to as _____.
prediction methods
A forecast is defined as a(n) ______.
prediction of future values of a time series
_____ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.
predictive
_____ analytics use techniques that take input data and yield a best course of action.
prescriptive
A type of analytics that uses machine learning to decide the best course of action based on a computer program's predictions.
prescriptive analytics
Nonnegativity constraints ensure that _____.
the solution to the problem will contain only nonnegative values for the decision variables
If the correlation between two variables is near 0, it implies that ______.
the variables are not linearly related
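A small Python sketch of why a correlation near 0 rules out only a linear relationship. The data are made up: y is fully determined by x, yet the correlation coefficient is zero. The helper pearson_r is a hand-written illustration, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # y depends on x exactly, but not linearly
r = pearson_r(x, y)
```

Here r comes out as 0 even though the two variables are strongly (quadratically) related.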
If a time series plot exhibits a horizontal pattern, then _____.
there is still not enough evidence to conclude that the time series is stationary
Which of the following is an example of a continuous random variable?
time
A set of observations on a variable measured at successive points in time or over successive periods of time constitutes a _____.
time series
Data collected from several entities over a period of time (minutes, hours, days, etc.) are called _____.
time series data
While building predictive models, why do we split observations into groups of training and testing data?
to avoid overfitting
Data used to build a data mining model is called _____.
training data
The moving averages method refers to a forecasting method that _____.
uses the average of the most recent data values in the time series as the forecast for the next period
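The moving averages idea can be sketched as follows. The sales figures and the window length k = 3 are made-up values for illustration, and moving_average_forecast is a hypothetical helper name.

```python
def moving_average_forecast(series, k):
    """Forecast the next period as the mean of the k most recent values."""
    if k <= 0 or k > len(series):
        raise ValueError("k must be between 1 and len(series)")
    return sum(series[-k:]) / k

sales = [17, 21, 19, 23, 18, 16]  # hypothetical weekly sales
forecast = moving_average_forecast(sales, 3)  # mean of 23, 18, 16
```

A larger k smooths out more noise but reacts more slowly to recent changes in the series.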
In the graph of the simple linear regression equation, the parameter β0 represents the _____ of the true regression line.
y-intercept
A _____ determines how far a particular value is from the mean relative to the data set's standard deviation.
z-score
A health-conscious student faithfully wears a device that tracks his steps. Suppose that the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. What is the probability that fewer than 8,000 steps were taken?
.091
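A quick way to check this answer, sketched with Python's standard library: convert 8,000 to a z-score, then read the lower-tail probability from the normal CDF. In Excel, the same value should come from =NORM.DIST(8000, 10000, 1500, TRUE).

```python
from statistics import NormalDist

steps = NormalDist(mu=10_000, sigma=1_500)

# z-score: how many standard deviations 8,000 lies from the mean.
z = (8_000 - steps.mean) / steps.stdev

# Lower-tail probability P(X < 8,000).
p_below_8000 = steps.cdf(8_000)
```

The z-score is about -1.33, and the lower-tail probability rounds to .091.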
Let X be a random variable with a uniform distribution between 8 and 20. Find the probability that X is less than 10.
.167
A nickel and a dime are tossed. If an event is defined as a single toss of both coins where at least one head appears, what is the probability of the complement of that event?
.25
Reviews of call center representatives over the last three years showed that 10% of all call center representatives were rated as outstanding, 75% were rated as excellent/good, 10% were rated as satisfactory, and 5% were considered unsatisfactory. For a sample of 10 reps selected at random, what is the probability that 1 will be rated as unsatisfactory?
.315
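This is a binomial probability with n = 10 trials and success probability p = 0.05 (a rep being unsatisfactory). A minimal sketch of the calculation, with binom_pmf as a hand-written helper:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 1 unsatisfactory rep out of 10, each with probability 0.05.
p_one_unsatisfactory = binom_pmf(1, 10, 0.05)
```

The result is 10 * 0.05 * 0.95^9, which rounds to .315.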
Fast food restaurants pride themselves on being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive-thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability that it takes less than two minutes to fill an order? (You may use Excel.)
.736
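For an exponential distribution with mean m, the cumulative probability is P(X < x) = 1 - e^(-x/m). A short sketch of that formula follows; in Excel, =EXPON.DIST(2, 1/1.5, TRUE) should give the same value, since its second argument is the rate 1/m.

```python
import math

def exponential_cdf(x, mean):
    """P(X < x) for an exponential distribution with the given mean."""
    return 1 - math.exp(-x / mean)

# Order fill time has a mean of 1.5 minutes; probability it takes under 2 minutes.
p_under_two_minutes = exponential_cdf(2, 1.5)
```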
A bucket contains 3 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and is not replaced. Another ball is taken from the bucket. What is the probability that the first ball is red and the second ball is yellow?
1/11
A survey of 100 random high school students finds that 85 students watched the Super Bowl, 25 students watched the Stanley Cup Finals, and 20 students watched both games. How many students did not watch either game?
10
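The count follows from inclusion-exclusion: students who watched at least one game = Super Bowl + Stanley Cup - both. A minimal arithmetic check:

```python
total_students = 100
super_bowl = 85
stanley_cup = 25
both = 20

# Subtract the overlap once so students who watched both are not double-counted.
watched_at_least_one = super_bowl + stanley_cup - both
watched_neither = total_students - watched_at_least_one
```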
A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. What percent of the days does he exceed 13,000 steps?
2.28%
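Because 13,000 is exactly two standard deviations above the mean, this is an upper-tail probability found with the complement rule, P(X > x) = 1 - P(X <= x). A brief sketch:

```python
from statistics import NormalDist

steps = NormalDist(mu=10_000, sigma=1_500)

# Upper tail: complement of the cumulative probability at 13,000 (z = 2).
p_above_13000 = 1 - steps.cdf(13_000)
percent_of_days = 100 * p_above_13000
```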
The number of minutes that Samantha waits to catch the bus is uniformly distributed between 0 and 15 minutes. What is the probability that Samantha has to wait less than 4.5 minutes to catch the bus?
30%
The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What is the probability that, if driven normally, the car will get 75 miles per gallon or better?
50%
The random variable X is known to be uniformly distributed between 2 and 12. Compute E(X), the expected value of the distribution.
7
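The uniform-distribution cards in this set all use two formulas: P(X < x) = (x - a) / (b - a) for a <= x <= b, and E(X) = (a + b) / 2. A minimal sketch that reproduces the three cases, with hypothetical helper names:

```python
def uniform_cdf(x, a, b):
    """P(X < x) for X uniformly distributed on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_mean(a, b):
    """Expected value of a uniform distribution on [a, b]."""
    return (a + b) / 2

p_x_below_10 = uniform_cdf(10, 8, 20)      # 2/12
p_wait_under_4_5 = uniform_cdf(4.5, 0, 15) # 4.5/15
expected_x = uniform_mean(2, 12)
```

The first probability is 2/12, or about 0.167; the bus wait works out to 0.30; and the expected value is 7.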
A bucket contains 2 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and then replaced. Another ball is taken from the bucket. What is the probability that the first ball is red and the second ball is yellow?
8/121
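The two bucket cards differ in whether the first ball is returned before the second draw. A short sketch using exact fractions shows how the denominator of the second draw changes:

```python
from fractions import Fraction

# Without replacement: 3 red, 4 yellow, 5 purple (12 balls).
# After a red ball is removed, 11 balls remain for the second draw.
p_without_replacement = Fraction(3, 12) * Fraction(4, 11)

# With replacement: 2 red, 4 yellow, 5 purple (11 balls).
# The draws are independent, so both use all 11 balls.
p_with_replacement = Fraction(2, 11) * Fraction(4, 11)
```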
According to company records, 5% of all automobiles brought to Geoff's Garage last year for a state-mandated annual inspection did not pass. Of the next 10 automobiles entering the inspection station, what is the probability that more than 5 will not pass inspection?
=1-BINOM.DIST(5, 10, 0.05, TRUE)
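The Excel formula applies the complement rule to the cumulative binomial probability: P(X > 5) = 1 - P(X <= 5). A Python sketch of the same calculation, where binom_cdf is a hand-written stand-in for the cumulative form of BINOM.DIST:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# More than 5 of the next 10 cars fail inspection, each failing with probability 0.05.
p_more_than_5_fail = 1 - binom_cdf(5, 10, 0.05)
```

The probability is tiny (on the order of a few in a million), which is expected when only 5% of cars fail.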
Prescriptive analytics is a process of generating instant recommendations by _____.
Analyzing data
Data that are too large or too complex to be handled by standard data-processing techniques and typical desktop software are called _____.
Big data
You want to detect whether there are any outliers in the variable "Age of car (months)". Which of the following is the best visualization technique to do so?
Boxplot
Which of the following best exemplifies big data?
Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.
A retail store owner offers a discount on product A and predicts that the customers would purchase products B and C in addition to product A. Identify the technique used to make such a prediction.
Data mining
The use of analytical techniques for better understanding patterns and relationships that exist in large data sets is _____.
Data mining
_____ are analytical tools that describe what has happened.
Descriptive analytics
Which of the following is not an approach to making decisions?
Guess and Check
You want to visualize the distribution of annual maintenance cost. Which type of plot will help to achieve the task?
Histogram
_____ is the most critical step of the decision-making process.
Identifying and defining the problem
The coefficient of determination (R² or R-squared) is 0.82. What can you infer about the quality of the regression fit?
R-squared is close to 1; therefore, the fit is good
_____ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.
Supervised learning
In a linear programming model, do the nonnegativity constraints mean that decision variables can take on any value greater than or equal to zero?
Yes
A student willing to participate in a debate competition is required to fill out a registration form. State whether each of the following information about the participant provides categorical or quantitative data. a. What is your birth month? b. Have you participated in any debate competition previously? c. If yes, in how many debate competitions have you participated so far? d. Have you won any of the competitions? e. If yes, how many have you won?
a. Categorical, b. Categorical, c. Quantitative, d. Categorical, e. Quantitative
In problem formulation, the _____.
objective is expressed in terms of the decision variables.
Autoregressive models _____.
occur whenever all the independent variables are previous values of the time series
A(n) _____ matrix displays a model's correct and incorrect classification.
confusion
Rob is a financial manager with Sharez, an investment advisory company. He must select specific investments (for example, stocks and bonds) from a variety of investment alternatives. Restrictions on the type of permissible investments would be a _____ in this case.
constraint
An experiment consists of determining the speed of automobiles on a highway by the use of radar equipment. The random variable in this experiment is a _____.
continuous random variable
_____ are collected from several entities at the same point in time.
cross-sectional data
Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen.
data dashboards
Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of _____.
data exploration
The U.S. Internal Revenue Service uses _____ to identify patterns that distinguish questionable annual personal income tax filings.
data mining
_____ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling by addressing missing and erroneous data.
data preparation
_____ is a method of extracting data relevant to the business problem under consideration. It is the first step in the data mining process.
data sampling
A controllable input for a linear programming model is known as a _____.
decision variable
A multiple regression model for predicted heart rate is as follows: heart rate = 10 - 0.5*(run speed) + 12*(body weight). As the run speed increases by 1 unit (holding body weight constant), heart rate is expected to _____.
decrease by .5
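The coefficient interpretation can be verified by evaluating the model twice with body weight held fixed. The input values below (run speeds of 6 and 7, body weight of 70) are arbitrary illustrations; only the equation comes from the card.

```python
def predicted_heart_rate(run_speed, body_weight):
    """Regression equation from the card: 10 - 0.5*(run speed) + 12*(body weight)."""
    return 10 - 0.5 * run_speed + 12 * body_weight

# Increase run speed by one unit while holding body weight constant.
change = predicted_heart_rate(7, 70) - predicted_heart_rate(6, 70)
```

The change equals the run speed coefficient, -0.5, regardless of which starting values are chosen.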
In a linear regression model, the variable that is being predicted or explained is known as _____.
dependent variable
Data dashboards are a type of _____ analytics.
descriptive
The mean absolute error, mean squared error, and mean absolute percentage error are all methods to measure the accuracy of a forecast. These methods measure forecast accuracy by _____.
determining how well a particular forecasting method is able to reproduce the time series data that are already available
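A minimal sketch of the three accuracy measures, computed by comparing forecasts against values that are already known. The actual and forecast numbers are made up, and the function name is hypothetical.

```python
def forecast_accuracy(actual, forecast):
    """Return (MAE, MSE, MAPE) for paired actual and forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e**2 for e in errors) / n
    # MAPE expresses each absolute error as a percentage of the actual value.
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return mae, mse, mape

actual = [20, 25, 30, 40]    # hypothetical observed values
forecast = [22, 24, 27, 44]  # hypothetical forecasts for the same periods
mae, mse, mape = forecast_accuracy(actual, forecast)
```

For this sample, MAE is 2.5, MSE is 7.5, and MAPE is 8.5%. MSE penalizes large errors more heavily, while MAPE is unit-free and easier to compare across series.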
A variable used to model the effect of categorical independent variables in a regression model is known as a _____.
dummy variable
Classifying a record as belonging to one class when it belongs to another class is referred to as a(n) _____.
error
In the simple linear regression model, the _____ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between the variables.
error term
A test set is the data set used to ______.
estimate performance of the final model on unseen data
Determine a freshman's likely first-year grade point average from the student's Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of _____.
estimation of a continuous outcome
Prediction of the mean value of the dependent variable y for values of the independent variables x1, x2, . . . , xq that are outside the experimental range is called _____.
extrapolation
An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed a(n) _____.
false positive
A(n) _____ solution satisfies all the constraint expressions simultaneously.
feasible
A summary of data that shows the number of observations in each of several nonoverlapping bins is called a(n) _____.
frequency distribution
A _____ is a graphical summary of data previously summarized in a frequency distribution.
histogram
The value of an independent variable from the prior period is referred to as a _____.
lagged variable
_____ refers to the degree of correlation among independent variables in a regression model.
multicollinearity
In a normal distribution, which is greater, the mean or the median?
neither; the mean and the median are equal
Constraints are _____.
restrictions that limit settings of the decision variables.
Data-driven decision making tends to decrease a firm's _____.
risk
_____ acts as a representative of the population.
sample
A _____ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between two quantitative variables.
scatter chart
A time series that shows a recurring pattern over one year or less is said to follow a _____.
seasonal pattern
Identify the shape of the distribution in the figure below.
skewed right
_____ is one minus the Class 0 error rate.
specificity
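A sketch tying together the confusion matrix, false positives, and specificity. It assumes the common convention that Class 1 is the "positive" class and Class 0 the "negative" class; the labels below are made up.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for binary labels, treating 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

actual = [1, 0, 0, 1, 0, 0, 1, 0]     # hypothetical true classes
predicted = [1, 0, 1, 1, 0, 0, 0, 0]  # hypothetical model output
tp, fp, tn, fn = confusion_counts(actual, predicted)

# Class 0 error rate: share of actual Class 0 records that were misclassified.
class0_error_rate = fp / (fp + tn)
specificity = tn / (tn + fp)  # equivalently, 1 - class0_error_rate
```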
A __________ describes the range and relative likelihood of all possible values for a random variable.
statistical distribution
_____ is NOT a step of the data mining process.
supervised learning
A mathematical procedure for using sample data to estimate regression parameters is _____.
the least squares method
In a simple linear regression model, y = β0 + β1x + ε, the parameter β1 represents the _____.
the slope of the regression line
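A minimal sketch of the least squares method for simple linear regression, using the closed-form estimates b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, followed by R-squared as 1 - SSE/SST. The data points are made up, and fit_simple_regression is a hypothetical helper name.

```python
def fit_simple_regression(xs, ys):
    """Return (b0, b1, r_squared) from the least squares fit of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx            # estimated slope
    b0 = mean_y - b1 * mean_x # estimated y-intercept
    # SSE: variation left unexplained; SST: total variation in y.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1 - sse / sst

x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable
b0, b1, r_squared = fit_simple_regression(x, y)
```

For this sample, the fitted line is ŷ = 2.2 + 0.6x with R-squared of 0.6, so the line explains 60% of the variation in y.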