Business Analytics Study Guide

A one-way data table summarizes: a. a single input's impact on the output of interest. b. multiple inputs' impact on a single output of interest. c. values of the input cells that will cause the single output value to equal zero. d. values of cells when not all of the model is observable on the screen.

A

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____. a. dendrogram b. scatter chart c. decision tree d. box-plot

A

A _____ refers to the number of times that a collection of items occurs together in a transaction data set. a. test set b. validation count c. support count d. training set

C
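
As a rough illustration of a support count, the sketch below counts how many transactions contain every item of an itemset. The function name `support_count` and the transactions are hypothetical, made up for this example.

```python
def support_count(transactions, itemset):
    """Count the transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

# Hypothetical transaction data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"milk", "eggs"},
    {"bread", "jelly"},
]

bread_milk_count = support_count(transactions, {"bread", "milk"})
print(bread_milk_count)  # 2
```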

Single linkage measures dissimilarity between two clusters by considering: a. the two most distant observations in these clusters. b. the average dissimilarity over all pairs of observations between these clusters. c. only the two closest observations in these clusters. d. the distance between the cluster centroids

C
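
The linkage definitions in the options can be compared side by side. This is a minimal sketch assuming Euclidean distance and two small made-up clusters; the function names are illustrative, not from any particular library.

```python
from itertools import product
from math import dist

def pairwise_distances(cluster_a, cluster_b):
    """Distances over all pairs with one observation from each cluster."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    # Only the two closest observations matter.
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    # Only the two most distant observations matter.
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    # Average over all between-cluster pairs.
    d = pairwise_distances(cluster_a, cluster_b)
    return sum(d) / len(d)

# Hypothetical clusters on a line, chosen so the arithmetic is easy to check.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 6.0
print(average_linkage(cluster_a, cluster_b))   # 4.5
```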

Test set is the data set used to: a. build the data mining model. b. estimate accuracy of candidate models on unseen data. c. estimate accuracy of final model on unseen data. d. show counts of actual versus predicted class values.

C

The procedure of using sample data to find the estimated regression equation is better known as _____. a. point estimation b. interval estimation c. the least squares method d. extrapolation

C

The endpoint of a k-means clustering algorithm occurs when: a. Euclidean distance between clusters is minimum. b. Euclidean distance between observations in a cluster is maximum. c. no further changes are observed in cluster structure and number. d. all of the observations are encompassed within a single large cluster with mean k.

C

A variable that takes on the values of 0 or 1 and is used to incorporate the effect of categorical variables in a regression model is called a. an interaction b. a constant variable c. a dummy variable d. None of these alternatives is correct.

C

An estimated regression equation has the form: $\hat{y} = 7 + 2x_1 + 9x_2$. As $x_1$ increases by 1 unit (holding $x_2$ constant), y is expected to a. increase by 9 units b. decrease by 9 units c. increase by 2 units d. decrease by 2 units

C
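
A quick numerical check of the coefficient interpretation: the sketch evaluates the estimated equation from the question at two values of x1 that differ by one unit while x2 stays fixed. The specific input values (3, 4, and 5) are arbitrary assumptions.

```python
def predict(x1, x2):
    """Estimated regression equation from the question: 7 + 2*x1 + 9*x2."""
    return 7 + 2 * x1 + 9 * x2

# Raise x1 from 3 to 4 while holding x2 at 5 (arbitrary illustrative values).
change = predict(4, 5) - predict(3, 5)
print(change)  # 2
```

The difference equals the coefficient on x1 regardless of the values chosen, because the x2 term cancels.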

In multiple regression analysis, the correlation among the independent variables is termed a. homoscedasticity b. linearity c. multicollinearity d. adjusted coefficient of determination

C

In regression analysis, $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_q x_q$ developed from sample data is known as a. covariance equation b. correlation equation c. estimated regression equation d. sample equation

C

In the theory of association rules in data mining, by confidence we mean an estimated probability that a. the antecedent and consequent occur b. the antecedent occurs given that the consequent occurs c. the consequent occurs given that the antecedent occurs d. the antecedent or the consequent occur

C
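
Confidence can be estimated from support counts as support(antecedent and consequent) divided by support(antecedent). The sketch below assumes a small invented transaction set; `support_count` and `confidence` are hypothetical helper names.

```python
def support_count(transactions, itemset):
    """Count the transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from support counts."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / support_count(transactions, antecedent)

# Hypothetical transaction data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"milk", "eggs"},
    {"bread", "jelly"},
]

# Rule: if bread, then milk. Bread appears 3 times, bread with milk 2 times.
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(round(rule_confidence, 3))  # 0.667
```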

One measure of the accuracy of a forecasting model is a. the smoothing constant b. a deseasonalized time series c. the mean square error d. the seasonal index

C

Prior probabilities are the probabilities of the states of nature that are estimated a. after obtaining sample information b. before obtaining perfect information c. before obtaining sample information d. after obtaining perfect information

C

The simplest measure of similarity between observations consisting solely of categorical variables is given by _____. a. the Euclidean distance b. the standardized Euclidean distance c. matching coefficient d. Jaccard's coefficient

C

The time series pattern which reflects a multi-year pattern of being above and below the trend line is a. a trend b. seasonal c. cyclical d. irregular

C

Which of the following is true of bottom-up hierarchical clustering? a. All observations are put in a mega-cluster to begin with. b. Each of the large clusters is broken down iteratively. c. It starts with each observation in its own cluster and then iteratively combines the two most similar clusters. d. At the end of the process, observations in the same cluster have maximum distance.

C

Which of the following reasons is responsible for the increase in the use of data-mining techniques in business? a. The lack of methods to electronically track data b. The dearth of information to analyze and interpret c. The ability to electronically warehouse data d. The ability to manually analyze all data

C

With reference to a what-if model, an uncontrollable model input is known as a(n) _____. a. decision variable b. dummy variable c. parameter d. outlier

C

_____ is the process of estimating the value of a categorical outcome variable. a. Sampling b. Prediction c. Classification d. Validation

C

A method that uses a weighted average of all past values is known as a. a smoothing average b. a moving average c. an exponential average d. an exponential smoothing

D
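
To see why exponential smoothing is a weighted average of all past values, here is a minimal sketch of the usual recursion, forecast = alpha * actual + (1 - alpha) * previous forecast. It assumes the common convention that the first forecast equals the first observation; the series and the smoothing constant 0.5 are made up.

```python
def exponential_smoothing_forecasts(values, alpha):
    """Return forecasts for periods 2..n+1, assuming the first forecast = values[0]."""
    forecast = values[0]
    forecasts = [forecast]
    for y in values[1:]:
        # New forecast blends the latest actual value with the previous forecast.
        forecast = alpha * y + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts

# Hypothetical series and smoothing constant.
forecasts = exponential_smoothing_forecasts([10, 20, 30], alpha=0.5)
print(forecasts)  # [10, 15.0, 22.5]
```

Unrolling the last forecast gives 0.5*30 + 0.25*20 + 0.25*10 = 22.5, so every past value carries some weight, with older values weighted less.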

_____ measures dissimilarity between two clusters by considering only the two most distant observations in these clusters. a. Single linkage b. Complete linkage c. Centroid linkage d. Average group linkage

B

A(n) _____ refers to a model input that the decision maker can control in a what-if model. a. decision variable b. outlier c. parameter d. dummy variable

A

In a regression model, which of the following tests is used in order to determine whether an individual independent variable is significant? a. t test b. z test c. Either z test or F test can be used d. chi-square test

A

In regression analysis, the variable that is being predicted is the a. dependent variable b. independent variable c. intervening variable d. is usually x

A

In the k-nearest neighbor method, when the value of k is set to 1, a. the classification or prediction of a new observation is based solely on the single most similar observation from the training set. b. the new observation's class is naïvely assigned to the most common class in the training set. c. the new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set. d. the classification or prediction of a new observation is subject to the smallest possible classification error

A
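
With k = 1 the method reduces to a lookup of the single closest training observation. A minimal sketch, assuming Euclidean distance and a tiny invented training set; `nearest_neighbor_class` is a hypothetical name.

```python
from math import dist

def nearest_neighbor_class(training, new_point):
    """Return the label of the single most similar training observation (k = 1)."""
    _features, label = min(training, key=lambda row: dist(row[0], new_point))
    return label

# Hypothetical training set: (features, class label).
training = [
    ((1.0, 1.0), "A"),
    ((5.0, 5.0), "B"),
    ((1.5, 2.0), "A"),
]

predicted = nearest_neighbor_class(training, (4.0, 4.5))
print(predicted)  # B
```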

Nodes of a decision tree indicating points where a decision maker chooses a decision alternative are known as a. decision nodes b. chance nodes c. marginal nodes d. conditional nodes

A

The arguments supplied to the IF function are: a. the condition for execution, the result if condition is true, and the result if condition is false. b. the range of cells and the condition for execution. c. the array1 of data cells, the array2 of data cells, and the condition for execution. d. the condition for execution only.

A

The time series pattern that reflects gradual increase or decrease in values over a long time period is called a. a trend b. seasonal c. cyclical d. irregular

A

_____ is a category of data-mining techniques that detect patterns and relationships in the data. a. Descriptive data-mining b. Predictive data-mining c. Machine learning d. Artificial intelligence

A

A graphic presentation of the expected gain from the various options open to the decision maker is called a. a payoff table b. a decision tree c. the expected opportunity loss d. the expected value of perfect information

B

A group of observations measured at successive time intervals is known as a. a random variable b. a time series c. a forecast d. a cross-sectional data

B

A monthly time series has a seasonal pattern. If a linear regression model is used, how many dummy variables must be used to represent this seasonality? a. 10 b. 11 c. 12 d. 13

B
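
The count of 11 follows from using one fewer dummy variable than there are seasons, with the omitted season acting as the baseline. The sketch below assumes December (month 12) is the baseline; that choice and the function name are illustrative.

```python
def month_dummies(month, n_seasons=12):
    """Return n_seasons - 1 dummy values; the last season is the assumed baseline."""
    if not 1 <= month <= n_seasons:
        raise ValueError("month must be between 1 and n_seasons")
    return [1 if month == m else 0 for m in range(1, n_seasons)]

march = month_dummies(3)
december = month_dummies(12)
print(len(march))     # 11
print(march)          # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(sum(december))  # 0 (baseline month: all dummies are zero)
```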

A tabular representation of gains or losses for a decision problem is called a a. decision tree b. payoff table c. sequential matrix d. probability table

B

A(n) _____ is a visual representation that shows which entities influence others in a model. a. decision tree diagram b. influence diagram c. entity chart d. time series plot

B

An analysis of items frequently co-occurring in transactions (such as purchases) is known as _____. a. market segmentation b. market basket analysis c. regression analysis d. cluster analysis

B

An observation is classified as Class 1 if a. the predicted probability of this observation to be in Class 1 is less than the cutoff value b. the predicted probability of this observation to be in Class 1 is greater than or equal to the cutoff value c. the allowable probability of making Class 1 error is less than the test p-value d. the allowable probability of making Class 1 error is greater than or equal to the test p-value.

B
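
The cutoff rule is a one-line comparison. In this sketch the default cutoff of 0.5 is an assumption for illustration; the question itself does not fix a value.

```python
def classify(probability_class_1, cutoff=0.5):
    """Assign Class 1 when the predicted probability meets or exceeds the cutoff."""
    return 1 if probability_class_1 >= cutoff else 0

print(classify(0.72))              # 1
print(classify(0.50))              # 1 (equal to the cutoff still counts)
print(classify(0.72, cutoff=0.8))  # 0
```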

Average group linkage measures dissimilarity between two clusters by considering: a. only the two most dissimilar observations in these clusters. b. the average distance over all pairs of observations between these clusters. c. only the two closest observations in these clusters. d. the distance between the cluster centroids.

B

For a quarterly time series over the last 4 years, the following linear trend expression was estimated: $\hat{y}_t = 120 + 2t$. The forecast for the second quarter of Year 5 is a. 124 b. 156 c. 154 d. 160

B
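
The key step is converting the quarter to a period index: four years of quarterly data cover t = 1 to 16, so the second quarter of Year 5 is t = 18. A small sketch of that arithmetic, with hypothetical helper names:

```python
def period_index(year, quarter, quarters_per_year=4):
    """Period number t, counting from t = 1 in the first quarter of Year 1."""
    return (year - 1) * quarters_per_year + quarter

def trend_forecast(t):
    """Linear trend from the question: 120 + 2t."""
    return 120 + 2 * t

t = period_index(year=5, quarter=2)
forecast = trend_forecast(t)
print(t, forecast)  # 18 156
```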

If data for a time series analysis is collected on an annual basis only, which pattern may be ignored? a. trend b. seasonal c. cyclical d. irregular

B

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations? a. the average of the sum of both legs b. the hypotenuse c. the small leg d. the long leg

B
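
For two variables, the differences along each variable form the legs of a right triangle and the Euclidean distance is its hypotenuse. A minimal sketch with two invented observations:

```python
from math import dist, hypot

# Hypothetical observations measured on two variables.
u = (1.0, 2.0)
v = (4.0, 6.0)

leg_1 = abs(v[0] - u[0])  # difference on the first variable: 3.0
leg_2 = abs(v[1] - u[1])  # difference on the second variable: 4.0

hypotenuse = hypot(leg_1, leg_2)
print(hypotenuse)  # 5.0
print(dist(u, v))  # 5.0, the same value computed directly
```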

If the coefficient of determination $R^2$ is 0.95, then a. 95% of correlation between independent variables is explained b. 95% of the observed variation in y is explained by the estimated regression equation c. one can be 95% confident that the predicted y is within 2s of the regression mean d. z = 1.96 is used to find the 95% confidence interval for a population mean

B

In a regression model involving more than one independent variable, which of the following tests is used to examine the overall significance of the model? a. t test b. F test c. Either a t test or a chi-square test can be used d. z test

B

In classification, which of the following would be considered as a categorical variable for a credit approval decision for a requester? a. marital status of the requester b. reject or accept credit approval c. income of the requester d. gender of the requester

B

In regression analysis, if the coefficient of determination $R^2 = 1$, then a. SSE must also be equal to one b. SSE must be equal to zero c. SSE can be any positive value d. SSE must be negative

B

In regression analysis, independent variables are a. used to predict other independent variables b. used to predict the dependent variable c. called the intervening variables d. the variable that is being predicted

B

Jaccard's coefficient is different from the matching coefficient in that the former: a. measures overlap while the latter measures dissimilarity. b. does not count matching zero entries while the latter does. c. deals with categorical variable while the latter deals with continuous variables. d. is affected by the scale used to measure variables while the latter is not.

B
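
The difference shows up clearly on binary (0/1) data. In this sketch the matching coefficient counts every agreement, while Jaccard's coefficient drops the positions where both observations are 0. The two observations are invented, and returning 0.0 when every position is a shared zero is an assumption made here to avoid dividing by zero.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    return sum(1 for a, b in zip(u, v) if a == b) / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among positions that are not 0-0 matches."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    not_both_zero = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    # Assumption: report 0.0 when every position is a matching zero.
    return both_one / not_both_zero if not_both_zero else 0.0

# Hypothetical binary observations (for example, 1 = item purchased).
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]

print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```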

Nodes of a decision tree indicating points where the outcomes do not depend on a decision maker are known as a. decision nodes b. chance nodes c. marginal nodes d. conditional nodes

B

Observation refers to: a. estimated continuous outcome variable b. set of recorded values of variables associated with a single entity c. goal of predicting a categorical outcome based on a set of variables d. mean of all variable values associated with one particular entity

B

One of the measures of the accuracy (goodness of fit) of the regression model is called the coefficient of a. regression b. determination c. statistical variability d. statistical importance

B

Spreadsheet models are referred to as what-if models because they a. are mathematical and logic-based models. b. allow easy instantaneous recalculation for a change in model inputs. c. come preloaded on computers. d. have specialized functions to perform detailed analysis.

B

The _____ function pairs each element of the first array with its counterpart in the second array, multiplies the elements of the pairs together, and adds the results. a. SUM b. SUMPRODUCT c. SUMIF d. IF

B
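
The same pair, multiply, and add behavior can be mimicked outside a spreadsheet. This Python sketch is only an analogy for the spreadsheet function, with made-up quantities and prices.

```python
def sumproduct(array1, array2):
    """Multiply corresponding elements of two equal-length arrays and add the results."""
    if len(array1) != len(array2):
        raise ValueError("arrays must have the same length")
    return sum(a * b for a, b in zip(array1, array2))

# Hypothetical quantities and unit prices.
quantities = [2, 3, 4]
prices = [10, 20, 30]

total = sumproduct(quantities, prices)
print(total)  # 200, from 2*10 + 3*20 + 4*30
```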

The data-mining method that can be used in market segmentation to divide consumers into different homogeneous groups is _____. a. data visualization b. cluster analysis c. market analysis d. supervised learning

B

The k-means clustering is the process of a. agglomerating observations into a series of nested groups based on a measure of similarity. b. organizing observations into one of k groups based on a measure of similarity. c. reducing the number of variables to consider in a data-mining approach. d. estimating the value of a continuous outcome variable.

B
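
A compact sketch of the k-means idea: assign each observation to the nearest centroid, recompute the centroids, and stop once no assignment changes. It assumes Euclidean distance and caller-supplied starting centroids; the data, the names, and the rule of keeping an empty cluster's old centroid are illustrative choices.

```python
from math import dist

def centroid(points):
    """Average of each variable across the given observations."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def k_means(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assign every observation to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no observation changed cluster, so the algorithm ends
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: keep the old centroid if a cluster is empty
                centroids[j] = centroid(members)
    return assignment, centroids

# Hypothetical observations with k = 2 starting centroids.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
assignment, centroids = k_means(points, [(1, 1), (9, 8)])
print(assignment)  # [0, 0, 1, 1]
print(centroids)   # [(1.0, 1.5), (8.5, 8.0)]
```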

The modeling process begins with the framing of the _____ that shows the relationships between the various parts of the problem being modeled. a. mathematical model b. conceptual model c. circular model d. correlation model

B

The probabilities of states of nature after revising the prior probabilities based on given sample information are called a. the expected probabilities b. the posterior probabilities c. the prior probabilities d. the unconditional probabilities

B

The time series pattern that reflects variability during a single year is called a. a trend b. seasonal c. cyclical d. irregular

B

What would be the coefficient of determination $R^2$ if the total sum of squares (SST) is 23.29 and the sum of squares due to error (SSE) is 13.26? a. 2.32 b. 0.43 c. 13.26 d. 0.89

B
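
The answer comes from the identity R-squared = 1 - SSE/SST (equivalently SSR/SST, with SSR = SST - SSE). A short check using the numbers from the question:

```python
sst = 23.29  # total sum of squares, from the question
sse = 13.26  # sum of squares due to error, from the question

ssr = sst - sse            # sum of squares due to regression
r_squared = 1 - sse / sst  # same value as ssr / sst

print(round(ssr, 2))        # 10.03
print(round(r_squared, 2))  # 0.43
```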

Which of the following statements is the objective of the moving averages and exponential smoothing methods? a. To characterize the variable fluctuations by a smooth curve b. To smooth out random fluctuations in the time series c. To characterize the variable fluctuations by an exponential equation d. To transform a nonstationary time series into a stationary series

B

_____ is a generalization of linear regression for predicting an outcome of a binary variable. a. Multiple linear regression b. Logistic regression c. The k-nearest neighbors' method d. Cluster analysis

B

A regression analysis between sales (y in $1000) and advertising (x in $100) resulted in the following equation: $\hat{y} = 30 + 4x$. The above equation implies that an a. increase of $4 in advertising leads to an increase of $4,000 in predicted sales b. increase of $1 in advertising leads to an increase of $4 in predicted sales c. increase of $100 in advertising leads to an increase of $34,000 in predicted sales d. increase of $100 in advertising leads to an increase of $4,000 in predicted sales

D
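
The trap in this question is the units: x is measured in hundreds of dollars and y in thousands. The sketch below converts both to plain dollars before comparing; the advertising levels of $500 and $600 are arbitrary assumptions.

```python
def predicted_sales_dollars(advertising_dollars):
    """Apply the estimated equation 30 + 4x with x in $100 units and y in $1000 units."""
    x = advertising_dollars / 100   # convert dollars to $100 units
    y_hat = 30 + 4 * x              # predicted sales in $1000 units
    return y_hat * 1000             # convert back to dollars

# Raise advertising by $100 (from a hypothetical $500 to $600).
change = predicted_sales_dollars(600) - predicted_sales_dollars(500)
print(change)  # 4000.0
```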

In _____ decision making companies have to decide whether they should manufacture a product or outsource its production to another firm. a. goal seek b. two-way c. voting-based d. make-versus-buy

D

In multiple regression analysis, a. there can be any number of dependent variables but only one independent variable b. there must be only one independent variable c. the coefficient of determination must be larger than 1 d. there can be several independent variables, but only one dependent variable

D

In regression analysis, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q + \varepsilon$ is called a. covariance equation b. correlation equation c. estimated regression equation d. linear regression model

D

In the regression model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q + \varepsilon$, the standard deviation of the error term $\varepsilon$ is estimated by a. the mean square error (MSE) b. the adjusted $R^2$ c. the coefficient of determination $R^2$ d. the standard error of the estimate s

D

Regression analysis was applied between demand for a product (y) and the price of the product (x) that may vary between 1 and 5, and the following estimated regression equation was obtained: $\hat{y} = 120 - 10x$. Based on the above equation, if price is 2 units, the predicted demand a. increases by 120 units b. decreases by 100 units c. is 120 units d. is 100 units

D

The effectiveness of a classification method can be judged by computing the misclassification errors and summarizing them in a a. pivot table b. payoff table c. dendrogram d. confusion matrix

D
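
A confusion matrix is a table of counts of actual versus predicted classes. This minimal sketch tallies those counts for a made-up set of labels and derives the overall misclassification rate; the names and data are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count each (actual class, predicted class) combination."""
    return Counter(zip(actual, predicted))

# Hypothetical actual and predicted class labels for six observations.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
misclassified = sum(count for (a, p), count in cm.items() if a != p)
error_rate = misclassified / len(actual)

print(cm[(1, 1)], cm[(1, 0)])  # 2 1  (actual 1: predicted 1, predicted 0)
print(cm[(0, 1)], cm[(0, 0)])  # 1 2  (actual 0: predicted 1, predicted 0)
print(round(error_rate, 3))    # 0.333
```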

The impact of two inputs on the output of interest can be examined by a _____. a. Goal Seek b. Watch Window c. one-way data table d. two-way data table

D

The lift ratio of an association rule with a confidence value of 0.88 and in which the consequent occurs in 60 out of 100 cases is: a. 1.30 b. 0.54 c. 1.00 d. 1.47

D
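
The lift ratio divides the rule's confidence by the share of transactions that contain the consequent. Worked through with the numbers in the question:

```python
confidence = 0.88              # given in the question
consequent_share = 60 / 100    # consequent occurs in 60 of 100 cases

lift_ratio = confidence / consequent_share
print(round(lift_ratio, 2))  # 1.47
```

A lift ratio above 1 suggests the antecedent makes the consequent more likely than it is across all transactions.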

The pattern of a time series in business forecasting that is most difficult to predict is a. trend and seasonal pattern b. seasonal pattern c. trend pattern d. cyclical pattern

D

The purpose of regression analysis is to a. verify a statistical hypothesis concerning the unknown population parameter b. check the correlation between the mean and the variance c. prove that the mean depends on the standard deviation d. identify the relationship between a dependent variable and one or more independent variables

D

The standard error of the estimate s is the a. square root of the sum of squares due to regression (SSR) b. square root of sum of squares due to errors (SSE) c. square root of total sum of squares (SST) d. square root of mean square error (MSE)

D
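
As a small illustration, the standard error of the estimate can be computed from SSE using MSE = SSE / (n - q - 1), where n is the number of observations and q the number of independent variables. The SSE, n, and q values below are invented for the example.

```python
from math import sqrt

def standard_error_of_estimate(sse, n, q):
    """s = sqrt(MSE), with MSE = SSE / (n - q - 1)."""
    mse = sse / (n - q - 1)
    return sqrt(mse)

# Hypothetical values: SSE = 80 from 23 observations and 2 independent variables.
s = standard_error_of_estimate(sse=80, n=23, q=2)
print(s)  # 2.0
```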

The uncontrollable future events that affect the outcome of a decision are known as a. alternatives b. decision outcomes c. payoffs d. states of nature

D

Which of the following approaches is a good way to proceed with the influence diagram building for a problem? a. The influence diagram for the entire problem is built first and then separate portions are clustered to form separate models. b. The influence diagrams for all the model parts at the same level are built in parallel to reduce the likelihood of error. c. The influence diagram is reverse engineered: the diagram is developed in the opposite direction, starting with the model output. d. The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled.

D

Within a given range of cells, the number of times a particular condition is satisfied is computed by using the _____ function. a. SUMIF b. IF c. VLOOKUP d. COUNTIF

D
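
The counting behavior can be mirrored in a few lines. This Python sketch is only an analogy for the spreadsheet function; the cell values and the condition (greater than 10) are made up.

```python
def countif(cells, condition):
    """Count the cells in a range for which the condition holds."""
    return sum(1 for value in cells if condition(value))

# Hypothetical range of cell values.
cells = [12, 5, 30, 8, 21]

count_over_10 = countif(cells, lambda value: value > 10)
print(count_over_10)  # 3
```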

_____ measures dissimilarity between two clusters by using the distance between the two cluster centroids. a. Single linkage b. Complete linkage c. Group average linkage d. Centroid linkage

D

______ is the vector of the averages computed for each variable across all cluster observations. a. Euclidean distance b. Matching coefficient c. Jaccard's coefficient d. Centroid

D
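
A centroid is computed variable by variable. The sketch below averages each variable across a cluster's observations and then, as an aside, measures the distance between two centroids, which is the quantity centroid linkage uses. The clusters are invented for illustration.

```python
from math import dist

def centroid(observations):
    """Vector of per-variable averages across all observations in a cluster."""
    n = len(observations)
    return tuple(sum(column) / n for column in zip(*observations))

# Hypothetical clusters with two variables per observation.
cluster_a = [(1, 2), (3, 4), (5, 9)]
cluster_b = [(6, 9)]

centroid_a = centroid(cluster_a)
centroid_b = centroid(cluster_b)

print(centroid_a)                    # (3.0, 5.0)
print(dist(centroid_a, centroid_b))  # 5.0
```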

