QBA Final Exam Study Guide
In a random sample of 400 registered voters, 120 indicated they plan to vote for Trump for President. Determine a 95% confidence interval for the proportion of all the registered voters who will vote for Trump.
(0.255, 0.345)
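A minimal Python sketch of this interval, assuming the usual normal-approximation (Wald) formula with z = 1.96 for 95% confidence. The function name `proportion_ci` is only illustrative.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n                            # sample proportion
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(120, 400)
print(f"({low:.3f}, {high:.3f})")  # (0.255, 0.345)
```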
The correlation coefficient always takes values
Between -1 and +1, inclusive
Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.)
-1.33
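The conversion can be sketched in Python as z = (x - mean) / standard deviation. The helper name `z_score` is an assumption for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

z = z_score(52, 64, 9)
print(round(z, 2))  # -1.33
```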
The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the point estimate of the proportion of the population that logged onto Facebook that day?
0.35
The number of minutes that Samantha waits to catch the bus is uniformly distributed between 0 and 15 minutes. What is the probability that Samantha has to wait less than 4.5 minutes to catch the bus?
30%
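For a uniform distribution on [a, b], P(X < x) = (x - a) / (b - a). A short sketch, with `uniform_cdf` as an illustrative name:

```python
def uniform_cdf(x, low, high):
    """P(X <= x) for X uniformly distributed on [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

p_wait = uniform_cdf(4.5, 0, 15)
print(f"{p_wait:.0%}")  # 30%
```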
Use technology to compute the standard deviation for the following sample data.
6.75
The distribution of hourly sales for a local family owned store is normally distributed with a mean of $225 per hour and a standard deviation of $75 per hour. Which of the following intervals contains the middle 95% of hourly sales?
$75 to $375
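This answer follows the empirical rule (about 95% of a normal distribution lies within 2 standard deviations of the mean). The sketch below computes that interval and, as a check, the exact normal coverage using Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

mean, sd = 225, 75
low, high = mean - 2 * sd, mean + 2 * sd  # empirical rule: mean +/- 2 sd

# Exact share of a normal distribution inside the interval (slightly over 95%).
dist = NormalDist(mean, sd)
coverage = dist.cdf(high) - dist.cdf(low)
print(low, high, round(coverage, 4))
```

Using the more precise multiplier 1.96 instead of 2 would give a slightly narrower interval.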
Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer who spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer who spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.
75.39
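The calculation is sqrt((53 - 25)^2 + (420 - 350)^2). A short check using the standard-library `math.dist`, with the coordinates taken from the vectors u and v as given:

```python
import math

u = (25, 350)  # (age, dollars spent)
v = (53, 420)

distance = math.dist(u, v)  # sqrt(28**2 + 70**2)
print(round(distance, 2))   # 75.39
```

Because dollars spent is on a much larger scale than age, it dominates the distance. In practice, variables are often standardized before clustering.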
Compute the IQR for the following data.
9.50
One minus the overall error rate is often referred to as the __________ of the model.
Accuracy
The SUM function in Excel
Adds up all the numbers in a range of cells
The population parameters that describe the y-intercept and slope of the line relating y & x, are
β0 and β1
A chart that is recommended as an alternative to a pie chart is a
Bar chart
A better understanding of consumer behavior through analytics leads to
Better pricing strategies
A __________ refers to a constraint that can be expressed as an equality at the optimal solution.
Binding constraint
Which of the following graphs provides information on the outliers and IQR of a data set?
Box plot
Within a given range of cells, the number of times a particular condition is satisfied is computed by using the __________ function.
COUNTIF
__________ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters.
Centroid Linkage
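A minimal sketch of centroid linkage, assuming Euclidean distance. The two clusters below are hypothetical data invented for illustration.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def centroid_linkage(cluster_a, cluster_b):
    """Distance between the centroids of two clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

cluster_a = [(0, 0), (1, 0)]  # hypothetical observations
cluster_b = [(4, 0), (6, 0)]  # hypothetical observations
d_centroid = centroid_linkage(cluster_a, cluster_b)
print(d_centroid)  # 4.5
```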
A _____ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.
Classification tree
The modeling process begins with the framing of a _______ model that shows the relationships between the various parts of the problem being modeled.
Conceptual
An initial estimate of the probabilities of events is a __________ probability.
Prior
The __________ is an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.
Confidence level
An experiment consists of determining the speed of automobiles on a highway by the use of radar equipment. The random variable in this experiment is a
Continuous random variable
The data dashboard for a marketing manager may have KPIs related to
Current sales measures and sales by region
__________ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.
Data Partitioning
When a decision maker is faced with several alternatives and an uncertain set of future events, s/he uses __________ to develop an optimal strategy.
Decision analysis
A _______ refers to a model input that can be controlled in a spreadsheet model.
Decision variable
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a
Dendrogram
A cluster's __________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.
Durability
Prediction of the value of the dependent variable outside the experimental region is called
Extrapolation
The __________ function in Excel is used to compute the statistics required to create a histogram.
Frequency
The _______ the lift ratio, the _______ the association rule.
Higher; stronger
Consider a clustered bar chart of the dashboard developed to monitor the performance of a call center. This chart allows the IT manager to
Identify the frequency of a particular type of problem by location
______ is a measure of the heterogeneity of observations in a classification tree.
Impurity
A one-tailed test is a hypothesis test in which the rejection region is
In one tail of the sampling distribution
In a linear regression model, the variables used for predicting or explaining values of the response variable are known as the _____
Independent variables
A(n) __________ is a visual representation that shows which entities affect others in a model.
Influence diagram
__________ refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.
Interaction
In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as
Key performance indicators
The value of an independent variable from the prior period is referred to as a
Lagged Variable
The strength of the association rule is known as __________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.
Lift
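A small sketch of the lift calculation. Here confidence is the share of antecedent transactions that also contain the consequent, and benchmark confidence is the share of all transactions that contain the consequent. The transaction counts are hypothetical.

```python
def lift_ratio(n_total, n_antecedent, n_consequent, n_both):
    """Lift = confidence / benchmark confidence for the rule antecedent -> consequent."""
    confidence = n_both / n_antecedent
    benchmark_confidence = n_consequent / n_total
    return confidence / benchmark_confidence

# Hypothetical counts: 1000 transactions, 200 with the antecedent,
# 250 with the consequent, 100 with both.
lift = lift_ratio(1000, 200, 250, 100)
print(lift)  # 2.0
```

A lift ratio above 1 suggests the rule identifies the consequent better than chance would.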
_________ attempts to classify a categorical outcome as a linear function of explanatory variables.
Logistic regression
When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the
Matching coefficient
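The matching coefficient is the number of variables on which two observations have the same value, divided by the total number of variables. A sketch with two hypothetical observations described by 0/1 dummy variables:

```python
def matching_coefficient(obs_a, obs_b):
    """Share of variables on which the two observations agree."""
    matches = sum(1 for a, b in zip(obs_a, obs_b) if a == b)
    return matches / len(obs_a)

obs_a = [1, 0, 1, 1, 0]  # hypothetical dummy-variable values
obs_b = [1, 1, 1, 0, 0]
similarity = matching_coefficient(obs_a, obs_b)
print(similarity)  # 0.6
```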
Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another
Mean Forecast Error
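The offsetting problem can be seen in a small sketch with hypothetical actual and forecast values: every forecast misses by 10, yet the mean forecast error is 0, while the mean absolute error reports the true size of the misses.

```python
actual = [100, 110, 120, 130]    # hypothetical observed values
forecast = [110, 100, 130, 120]  # hypothetical forecasts

errors = [a - f for a, f in zip(actual, forecast)]
mfe = sum(errors) / len(errors)                  # positive and negative errors cancel
mae = sum(abs(e) for e in errors) / len(errors)  # absolute values cannot cancel
print(mfe, mae)  # 0.0 10.0
```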
In a normal distribution, which is greater, the mean or the median?
Neither; the mean and the median are equal
Euclidean distance can be used to measure the distance between __________ in cluster analysis.
Observations
______ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.
Overfitting
Which one of the following statements is not true concerning PivotTables in Excel?
PivotTables can be built using data arrayed in rows.
A simple random sample of 31 observations was taken from a large population. The sample mean equals 5. Five is a
Point estimate
The __________ probability distribution can be used to estimate the number of vehicles that go through an intersection during the lunch hour.
Poisson
_________ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.
Predictive
A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of
Predictive Analytics
In the spectrum of business analytics, which is the most complex?
Prescriptive
Which of the following analytical techniques helps us arrive at the best decision?
Prescriptive
An initial estimate of the probabilities of events is a ______ probability.
Prior
The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table and chart is known as data-ink ratio. Using additional ink that is not necessary to convey information has what effect on the data-ink ratio
Reduces the data-ink ratio
__________ is a statistical procedure used to develop an equation showing how two variables are related.
Regression analysis
The impurity of a group of observations is based on the variance of the outcome value for the observations in the group for
Regression trees
Which of the following gives the proportion of items in each bin?
Relative frequency
Determine whether the alternative hypothesis is left-tailed, right-tailed, or two-tailed: H0: μ = 11, Ha: μ > 11.
Right tailed
The value of the ___________ is used to estimate the value of the population parameter.
Sample statistic
________ are used in the pharmaceutical industry to assess the risk of introducing a new drug
Simulations
β1 represents
Slope of the true regression line
Using a large value for order k in the moving averages method is effective in
Smoothing out random fluctuations
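A brief sketch of the effect of the order k, using a hypothetical series: the k = 4 averages vary far less than the k = 2 averages. The function name `moving_averages` is illustrative.

```python
def moving_averages(series, k):
    """Average of each run of k consecutive values in the series."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

series = [10, 14, 9, 15, 11, 13]  # hypothetical time series

short_window = moving_averages(series, 2)
long_window = moving_averages(series, 4)
print(short_window)  # [12.0, 11.5, 12.0, 13.0, 12.0]
print(long_window)   # [12.0, 12.25, 12.0]
```

The trade-off is that a larger k also reacts more slowly to genuine shifts in the level of the series.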
When working with large spreadsheets with many rows of data, it can be helpful to _______ the data to better find, view, or manage subsets of data.
Sort and filter
A line chart that has no axes but is used to provide information on overall trends for time series data is called a
Sparkline
The graph of the simple linear regression equation is a(n)
Straight line
__________ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.
Supervised Learning
The VLOOKUP function with the range argument set to ______ takes the first argument and searches the first column of the table for the last row whose value is less than or equal to the first argument.
TRUE
A ________ decision is concerned with how the organization should achieve the goals and objectives set by its strategy.
Tactical
The process of extracting useful information from text data is known as
Text mining
An exponential trend pattern occurs when
The percentage change between periods in the value of the variable is relatively constant
Which of the following statements is correct?
The binomial distribution is a discrete probability distribution and the normal distribution is a continuous probability distribution.
A procedure for using sample data to find the estimated regression equation is
The least squares method
Which of the following is a discrete random variable?
The number of times a student guesses the answers to questions on a certain test
Which of the following is not true of a stationary time series
The time series plot is a straight line
If covariance between two variables is near 0, it implies that
The variables are not linearly related
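Note that "not linearly related" does not mean "unrelated". In the hypothetical data below, y is exactly x squared, yet the sample covariance is 0 because the relationship is not linear.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

xs = [-2, -1, 0, 1, 2]     # hypothetical data
ys = [x ** 2 for x in xs]  # perfectly related to x, but not linearly

cov_xy = sample_covariance(xs, ys)
print(cov_xy)  # 0.0
```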
All of the following are examples of discrete random variables except
Time
A light bulb manufacturer uses descriptive analytics
To present supply chain to managers visually
Which of the following statements is the objective of the moving averages and exponential smoothing methods?
To smooth out random fluctuations in the time series
In the text mining process, the text is first preprocessed by deriving a smaller set of ________ from the larger set of words contained in a collection of documents.
Tokens
Arrows pointing from the selected cell to cells that depend on the selected cell are generated by using the __________ button of the Formula Auditing group.
Trace dependents
__________ is the data set used to build the candidate models.
Training set
Larger values of α have the disadvantage of increasing the probability of making a
Type I error
When the expected value of the point estimator is equal to the population parameter it estimates, it is said to be
Unbiased
A characteristic or quantity of interest that can take on different values
Variable
The goal regarding using an appropriate number of bins is to show the
Variation in the data
Spreadsheet models are referred to as what-if models because they
allow easy instantaneous recalculation for a change in model inputs.
The population parameters that describe the y-intercept and slope of the line relating y and x, respectively, are
β0 and β1
Single linkage is a measure of calculating dissimilarity between clusters by
considering only the two most similar observations in the two clusters.
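A minimal sketch of single linkage, assuming Euclidean distance: the dissimilarity between two clusters is the smallest distance over all pairs with one observation from each cluster. The clusters are hypothetical.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Smallest pairwise distance between observations in the two clusters."""
    return min(math.dist(a, b) for a in cluster_a for b in cluster_b)

cluster_a = [(0, 0), (1, 0)]  # hypothetical observations
cluster_b = [(4, 0), (6, 0)]  # hypothetical observations
d_single = single_linkage(cluster_a, cluster_b)
print(d_single)  # 3.0
```

Here the closest pair is (1, 0) and (4, 0), so the other observations do not affect the result.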
Optimization models can be used to
decide on how to invest cash received from insurance policies.
The variance is based on the
deviation about the mean
A test set is the data set used to
estimate performance of the final model on unseen data.
Bar charts use
horizontal bars to display the magnitude of the quantitative variable.
Tactical decisions are concerned with
how the organization should achieve the goals and objectives set by its strategy
A disadvantage of stacked-column charts and stacked-bar charts is that
it can be difficult to perceive small differences in areas.
Which of the following is a commonly used supervised learning method?
k-nearest neighbors
The endpoint of a k-means clustering algorithm occurs when
no further changes are observed in cluster structure and number.
Autoregressive models
occur whenever all the independent variables are previous values of the time series.
Advanced analytics generally refers to
predictive and prescriptive analytics
With reference to time series data patterns, a cyclical pattern is the component of the time series that
shows a periodic pattern lasting more than one year.
The least squares regression line minimizes the sum of the
squared differences between actual and predicted y values.
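For simple linear regression, the least squares estimates have the closed form b1 = Sxy / Sxx and b0 = y-bar - b1 * x-bar. A sketch on hypothetical data, with `least_squares` as an illustrative name:

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated y-intercept
    return b0, b1

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)

# Sum of squared differences between actual and predicted y values.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(round(b0, 2), round(b1, 2), round(sse, 2))
```

Any other line through these points would give a larger sum of squared differences than `sse`.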
Trend refers to
the long-run shift or movement in the time series observable over several periods of time.
In a base-case scenario, the output is determined by assuming
the most likely values for the random variables of a model.
Fast food restaurant question
the one with 2/3
Tables should be used instead of charts when
the values being displayed have different units or very different magnitudes
Problems with infeasible solutions arise in practice because
too many restrictions have been placed on the problem.
A set of values for the random variables is called a(n)
trial