QBA Final Exam Study Guide

In a random sample of 400 registered voters, 120 indicated they plan to vote for Trump for President. Determine a 95% confidence interval for the proportion of all the registered voters who will vote for Trump.

(0.255, 0.345)
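
As a check, the interval can be reproduced with the normal-approximation (Wald) formula, p-hat ± z * sqrt(p-hat * (1 - p-hat) / n). The card does not name a method, so the formula, the z-value of 1.96, and the function name `proportion_ci` are assumptions made for this sketch.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Wald confidence interval for a population proportion (assumed method)."""
    p_hat = successes / n                              # sample proportion
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(120, 400)
print(round(low, 3), round(high, 3))
```

With 120 of 400 voters, the point estimate is 0.30 and the margin of error is about 0.045.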

The correlation coefficient will always take values

-1 to +1

Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.)

-1.33
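
A minimal sketch of the conversion, using z = (x - mean) / standard deviation. The function name `z_score` is only illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z = z_score(52, 64, 9)
print(round(z, 2))
```

A negative z-score means the score is below the mean.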

The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the point estimate of the proportion of the population that logged onto Facebook that day?

0.35

The number of minutes that Samantha waits to catch the bus is uniformly distributed between 0 and 15 minutes. What is the probability that Samantha has to wait less than 4.5 minutes to catch the bus?

30%
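
For a uniform distribution on [a, b], P(X < x) = (x - a) / (b - a). The sketch below applies that formula; the function name `uniform_cdf` is hypothetical.

```python
def uniform_cdf(x, a, b):
    """P(X < x) for X uniformly distributed on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

p_wait = uniform_cdf(4.5, 0, 15)
print(p_wait)
```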

Use technology to compute the standard deviation for the following sample data.

6.75

The distribution of hourly sales for a local family owned store is normally distributed with a mean of $225 per hour and a standard deviation of $75 per hour. Which of the following intervals contains the middle 95% of hourly sales?

75-375
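
This answer appears to rely on the empirical rule (about 95% of a normal distribution lies within 2 standard deviations of the mean); that reading is an assumption. Using the more precise multiplier 1.96 would give roughly 78 to 372 instead.

```python
mean_sales = 225   # dollars per hour
sd_sales = 75      # dollars per hour

# Empirical rule: about 95% of values fall within 2 standard deviations.
lower = mean_sales - 2 * sd_sales
upper = mean_sales + 2 * sd_sales
print(lower, upper)
```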

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.

75.39
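
The calculation can be sketched as the square root of the summed squared differences, using the vectors u = (25, 350) and v = (53, 420) as given. The function name `euclidean_distance` is illustrative.

```python
import math

def euclidean_distance(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u = (25, 350)   # (age, dollars spent)
v = (53, 420)
d = euclidean_distance(u, v)
print(round(d, 2))
```

Here the differences are 28 years and 70 dollars, so the distance is sqrt(784 + 4900) = sqrt(5684). Note that the dollar difference dominates because the variables are not standardized.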

Compute the IQR for the following data.

9.50

One minus the overall error rate is often referred to as the __________ of the model.

Accuracy

The SUM function in Excel

Adds up all the numbers in a range of cells

The population parameters that describe the y-intercept and slope of the line relating y and x, respectively, are

β0 and β1

A chart that is recommended as an alternative to a pie chart is a

Bar chart

A better understanding of consumer behavior through analytics leads to

Better pricing strategies

A __________ refers to a constraint that can be expressed as an equality at the optimal solution.

Binding constraint

Which of the following graphs provides information on the outliers and IQR of a data set?

Box plot

Within a given range of cells, the number of times a particular condition is satisfied is computed by using the __________ function.

COUNTIF

__________ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters.

Centroid Linkage

A __________ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.

Classification tree

The modeling process begins with the framing of a __________ model that shows the relationships between the various parts of the problem being modeled.

Conceptual

An initial estimate of the probabilities of events is a __________ probability.

Prior

The __________ is an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.

Confidence level

An experiment consists of determining the speed of automobiles on a highway by the use of radar equipment. The random variable in this experiment is a

Continuous random variable

The data dashboard for a marketing manager may have KPIs related to

Current sales measures and sales by region

__________ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.

Data Partitioning

When a decision maker is faced with several alternatives and an uncertain set of future events, s/he uses __________ to develop an optimal strategy.

Decision analysis

A __________ refers to a model input that can be controlled in a spreadsheet model.

Decision variable

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a

Dendrogram

A cluster's __________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.

Durability

Prediction of the value of the dependent variable outside the experimental region is called

Extrapolation

The __________ function in Excel is used to compute the statistics required to create a histogram.

Frequency

The __________ the lift ratio, the __________ the association rule.

Higher; stronger

Consider a clustered bar chart of the dashboard developed to monitor the performance of a call center. This chart allows the IT manager to

Identify the frequency of a particular type of problem by location

__________ is a measure of the heterogeneity of observations in a classification tree.

Impurity

A one-tailed test is a hypothesis test in which the rejection region is

In one tail of the sampling distribution

In a linear regression model, the variables used for predicting or explaining values of the response variable are known as the __________.

Independent variables

A(n) __________ is a visual representation that shows which entities affect others in a model.

Influence diagram

__________ refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.

Interaction

In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as

Key performance indicators

The value of an independent variable from the prior period is referred to as a

Lagged Variable

The strength of the association rule is known as __________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.

Lift
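
A small sketch of the lift ratio for a rule "if antecedent, then consequent": confidence is the share of antecedent transactions that also contain the consequent, and benchmark confidence is the share of all transactions that contain the consequent. The transactions and the function name `lift_ratio` are made up for illustration.

```python
# Hypothetical market-basket data; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly"},
    {"bread", "milk", "jelly"},
    {"milk"},
    {"bread", "milk"},
]

def lift_ratio(antecedent, consequent, transactions):
    """Confidence of the rule divided by its benchmark confidence."""
    n = len(transactions)
    n_antecedent = sum(antecedent <= t for t in transactions)          # subset test
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    n_consequent = sum(consequent <= t for t in transactions)
    confidence = n_both / n_antecedent
    benchmark_confidence = n_consequent / n
    return confidence / benchmark_confidence

ratio = lift_ratio({"bread"}, {"jelly"}, transactions)
print(ratio)
```

In this toy data the confidence is 2/4 and the benchmark confidence is 2/5, so the lift ratio is above 1, which suggests the rule does better than guessing the consequent at random.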

_________ attempts to classify a categorical outcome as a linear function of explanatory variables.

Logistic regression

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

Matching coefficient
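
The matching coefficient is the number of variables on which two observations agree divided by the total number of variables. A minimal sketch, with invented dummy-variable observations and an illustrative function name:

```python
def matching_coefficient(a, b):
    """Share of variables with matching values in two equal-length observations."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

# Hypothetical observations described by five 0/1 dummy variables.
obs_1 = [1, 0, 1, 1, 0]
obs_2 = [1, 1, 1, 0, 0]
similarity = matching_coefficient(obs_1, obs_2)
print(similarity)
```

The two observations agree on three of the five variables.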

Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another?

Mean Forecast Error
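
The offsetting problem can be seen in a toy example that compares the mean forecast error with the mean absolute error. The data and function names are invented for illustration, and errors are taken as actual minus forecast.

```python
def mean_forecast_error(actual, forecast):
    """Average of signed errors; positives and negatives can cancel."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)

def mean_absolute_error(actual, forecast):
    """Average of absolute errors; no cancellation."""
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)

actual = [10, 12, 14, 16]
forecast = [12, 10, 16, 14]   # every forecast is off by 2
mfe = mean_forecast_error(actual, forecast)
mae = mean_absolute_error(actual, forecast)
print(mfe, mae)
```

Every forecast misses by 2 units, yet the signed errors cancel, so the mean forecast error alone would make the forecasts look perfect.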

In a normal distribution, which is greater, the mean or the median?

Neither; the mean and the median are equal

Euclidean distance can be used to measure the distance between __________ in cluster analysis.

Observations

__________ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.

Overfitting

Which one of the following statements is not true concerning PivotTables in Excel?

PivotTables can be built using data arrayed in rows.

A simple random sample of 31 observations was taken from a large population. The sample mean equals 5. Five is a

Point estimate

The __________ probability distribution can be used to estimate the number of vehicles that go through an intersection during the lunch hour.

Poisson

_________ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another.

Predictive

A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of

Predictive Analytics

In the spectrum of business analytics, which is the most complex?

Prescriptive

Which of the following analytical techniques helps us arrive at the best decision?

Prescriptive

An initial estimate of the probabilities of events is a __________ probability.

Prior

The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table or chart is known as the data-ink ratio. Using additional ink that is not necessary to convey information has what effect on the data-ink ratio?

Reduces the data-ink ratio

__________ is a statistical procedure used to develop an equation showing how two variables are related.

Regression analysis

The impurity of a group of observations is based on the variance of the outcome value for the observations in the group for

Regression trees

Which of the following gives the proportion of items in each bin?

Relative frequency

Determine whether the alternative hypothesis is left-tailed, right-tailed, or two-tailed: H0: μ = 11, Ha: μ > 11.

Right tailed

The value of the ___________ is used to estimate the value of the population parameter.

Sample statistic

________ are used in the pharmaceutical industry to assess the risk of introducing a new drug

Simulations

β1 represents

Slope of the true regression line

Using a large value for order k in the moving averages method is effective in

Smoothing out random fluctuations
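
A short sketch of the effect: a moving average of order k forecasts the next period as the mean of the most recent k values, so a larger k averages over more observations. The series and the function name `moving_average_forecasts` are hypothetical.

```python
def moving_average_forecasts(series, k):
    """Forecast for each period from k+1 onward: the mean of the previous k values."""
    return [sum(series[i - k:i]) / k for i in range(k, len(series) + 1)]

series = [10, 14, 9, 15, 11, 13]   # made-up time series
forecasts_k2 = moving_average_forecasts(series, 2)
forecasts_k4 = moving_average_forecasts(series, 4)
print(forecasts_k2)
print(forecasts_k4)
```

On this series the k = 4 forecasts vary far less than the k = 2 forecasts. The trade-off is that a larger k also reacts more slowly to a genuine shift in the level of the series.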

When working with large spreadsheets with many rows of data, it can be helpful to __________ the data to better find, view, or manage subsets of data.

Sort and filter

A line chart that has no axes but is used to provide information on overall trends for time series data is called a

Sparkline

The graph of the simple linear regression equation is a(n)

Straight line

__________ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.

Supervised Learning

VLOOKUP with the range argument set to __________ takes the first argument and searches the first column of the table for the last row whose value is less than or equal to the first argument.

TRUE

A __________ decision is concerned with how the organization should achieve the goals and objectives set by its strategy.

Tactical

The process of extracting useful information from text data is known as

Text mining

An exponential trend pattern occurs when

The percentage change between periods in the value of the variable is relatively constant

Which of the following statements is correct?

The binomial distribution is a discrete probability distribution and the normal distribution is a continuous probability distribution.

A procedure for using sample data to find the estimated regression equation is

The least squares method

Which of the following is a discrete random variable?

The number of times a student guesses the answers to questions on a certain test

Which of the following is not true of a stationary time series?

The time series plot is a straight line

If covariance between two variables is near 0, it implies that

The variables are not linearly related

All of the following are examples of discrete random variables except

Time

A light bulb manufacturer uses descriptive analytics

To present supply chain to managers visually

Which of the following statements is the objective of the moving averages and exponential smoothing methods?

To smooth out random fluctuations in the time series

In the text mining process, the text is first preprocessed by deriving a smaller set of __________ from the larger set of words contained in a collection of documents.

Tokens

Arrows pointing from the selected cell to cells that depend on the selected cell are generated by using the __________ button of the Formula Auditing group.

Trace dependents

__________ is the data set used to build the candidate models.

Training set

Larger values of α have the disadvantage of increasing the probability of making a

Type I error

When the expected value of the point estimator is equal to the population parameter it estimates, it is said to be

Unbiased

A characteristic or quantity of interest that can take on different values

Variable

The goal regarding using an appropriate number of bins is to show the

Variation in the data

Spreadsheet models are referred to as what-if models because they

allow easy instantaneous recalculation for a change in model inputs.

The population parameters that describe the y-intercept and slope of the line relating y and x, respectively, are

β0 and β1

Single linkage is a measure of calculating dissimilarity between clusters by

considering only the two most similar observations in the two clusters.
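
In other words, single linkage takes the minimum distance over all pairs of observations with one observation from each cluster. A minimal sketch, assuming Euclidean distance and two invented clusters:

```python
import math
from itertools import product

def single_linkage(cluster_a, cluster_b):
    """Smallest Euclidean distance between any pair drawn from the two clusters."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

# Hypothetical two-dimensional observations.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]
linkage = single_linkage(cluster_a, cluster_b)
print(linkage)
```

Here the closest pair is (1, 0) and (4, 0), so only that pair determines the dissimilarity between the clusters.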

Optimization models can be used to

decide on how to invest cash received from insurance policies.

The variance is based on the

deviation about the mean

A test set is the data set used to

estimate performance of the final model on unseen data.

Bar charts use

horizontal bars to display the magnitude of the quantitative variable.

Tactical decisions are concerned with

how the organization should achieve the goals and objectives set by its strategy

A disadvantage of stacked-column charts and stacked-bar charts is that

it can be difficult to perceive small differences in areas.

Which of the following is a commonly used supervised learning method?

k-nearest neighbors

The endpoint of a k-means clustering algorithm occurs when

no further changes are observed in cluster structure and number.

Autoregressive models

occur whenever all the independent variables are previous values of the time series.

Advanced analytics generally refers to

predictive and prescriptive analytics

With reference to time series data patterns, a cyclical pattern is the component of the time series that

shows a periodic pattern lasting more than one year.

The least squares regression line minimizes the sum of the

squared differences between actual and predicted y values.
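
For simple linear regression, the minimizing slope and intercept have closed forms: b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2) and b0 = y_bar - b1 * x_bar. The sketch below applies them to made-up data; the function name `least_squares_line` is illustrative.

```python
def least_squares_line(x, y):
    """Return (b0, b1), the intercept and slope that minimize the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)

# Sum of squared differences between actual and predicted y values.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(b0, b1, sse)
```

Any other line through these points would give a larger sum of squared errors than the fitted one.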

Trend refers to

the long-run shift or movement in the time series observable over several periods of time.

In a base-case scenario, the output is determined by assuming

the most likely values for the random variables of a model.

Fast food restaurant question

the one with 2/3

Tables should be used instead of charts when

the values being displayed have different units or very different magnitudes

Problems with infeasible solutions arise in practice because

too many restrictions have been placed on the problem.

A set of values for the random variables is called a(n)

trial
