Risk analysis Smart Book 3

अब Quizwiz के साथ अपने होमवर्क और परीक्षाओं को एस करें!

(Blank) analysis is the analysis of the degree of association between products that might be purchased together in order to better identify "what goes with what."

Affinity

The naïve (Blank) algorithm focuses mainly on classification and then ends with an innovative prediction step that expresses the prediction as a probability.

Bayes

A numerical predictor variable can be used with naïve Bayes if it is first converted into a categorical variable by (Blank)

Binning

Which of the following statements about "binning" is accurate?

Binning reduces the noise in the data. Bins must be consecutive.

The values in the Gini Index range from (Blank 1) to (Blank 2), where a value of (Blank 3) indicates that the region is completely homogeneous, whereas a value of (Blank 4) indicates that the region is completely lacking in homogeneity. (Enter numerical values only.)

Blank 1: 0 Blank 2: 0.5 or .5 Blank 3: 0 Blank 4: 0.5 or .5

(Blank 1) (Blank 2) is a computer functionality that can be used to do calculations for algorithms when the problems are of significant size.

Blank 1: Analytic or Analytics Blank 2: Solver

The matching set is the set of historical records whose (Blank 1) values for the predictor variables perfectly match the corresponding (Blank 2) values for the predictor record is called the matching set.

Blank 1: categorical Blank 2: categorical

The analysis of which products should be promoted and/or packaged together is called (Blank 1) (Blank 2) analysis.

Blank 1: market Blank 2: basket

The prediction model is a model for predicting the predictor record's (Blank 1) (Blank 2).

Blank 1: numerical Blank 2: outcome

Complete Bayes often breaks down unless there are a (small/large) number of predictor variables and a (small/large) number of categories for each predictor variable.

Blank 1: small Blank 2: small

A (Blank 1)-or-(Blank 2) outcome is a situation where only two outcomes are possible.

Blank 1: yes Blank 2: no

An even number for k is preferred with yes-or-no outcomes, so that there will be no ties when making the prediction.

False

Slimming a decision tree is pruning is done by removing some of the branches that were added at the bottom of the tree.

False

The best way to preserve the environment is to become a vegan.

False

In the matching set, determine which outcome is an outlier and then predict that the new predictor record also is likely to have this same outcome.

False In the matching set, determine which outcome is most prevalent. We then predict that the new predictor record also is likely to have this same outcome.

When the calculations for algorithms are fairly complex, it is recommended to use the Algorithm Solution functionality on a spreadsheet.

False the functionality is Analytic Solver.

The (Blank) Index measures the lack of homogeneity of a region.

Gini

Which of the following is not one of the ways that the regression tree algorithm diverges from the classification trees algorithm?

How data are collected?

The regression tree algorithm diverges from the classification tree algorithm in just two ways. Which of the following is one of those ways?

How homogeneity is measured?

Which of the following is the correct formula to standardize a value?

(Original value - Mean)/Standard deviation

Which of the following is the correct formula for a normalized value?

(Original value - Minimum)/(Maximum - Minimum)

The challenges and issues associated with classification tree and regression tree algorithms include all but which of the following?

They are not suitable for the analysis of large amounts of data.

If not rescaled, given to variables with different scales, one variable is likely to dominate the other when plotted.

True

A regression tree is used when:

dealing with numerical outcomes.

By using the validation partition to test the accuracy of a model on new data for different values of k, the performance of the algorithm can be improved by:

determining the best value for k.

To (Blank) the data, one should rescale all data so that the values range from 0 to 1.

normalize

A numerical outcome is a situation where the outcome can have any numerical value:

over some range.

A danger with continuing to split data after obtaining an excellent set of regions is that you may be (Blank) the data.

overfitting , over fitting, or over-fitting

The stopping rule is a rule for terminating an algorithm before it starts (blanks) the data.

overfitting, over fitting, or over-fitting

The minimum error tree may be (Blank) to the validation data, so the additional pruning of the best pruned tree may actually improve the performance of the new data.

overfitting, over-fitting, or over fitting

When using the classification tree model, to (Blank) the tree, remove some of the branches that were added at the bottom of the tree.

prune

Which of the following is not a way to help preserve the environment?

purchase produce shipped directly from the international counties that specialize in the crops

A regression tree is a decision tree that predicts a ______ variable.

quantitative response

An index that measures the lack of homogeneity of a region when dealing with numerical outcomes is the (Blank) index.

regression

Whenever the predictor variables have different units of measurement, it is essential to (Blank) these variables.

rescale

An example of a response variable in a regression tree is:

salary.

Which of the following are true of market basket analysis? (Check all that apply.)

It can be helpful to guide product placement, weekly promotional offers, and the bundling of products. It can be applied on past customer transactions to recognize customer buying habits.

Which of the following is not an advantage of multiple linear regression?

It generalizes the impact of each predictor variable.

Which of the following is not an example from the text of a company that uses affinity analysis and recommendation systems?

Lays

After calculating the mean of the output variable values of the outcomes for the historical records in that region, the ______ is the sum of the square of the deviations of these output variable values from this mean.

Regression Index

When you interact with Netflix to stream content, you are engaging in which of the following?

affinity analysis and recommendation systems

In the naïve Bayes algorithm, all of the predictor variables are required to be:

categorical.

A (Blank) node represents a split based upon a dividing line for one predictor variable.

circular

When using classification models, each possible outcome is called a(n):

class.

If a researcher is doing research on a subject with two classes of outcomes, which model would be recommended for the researcher?

classification

The (Blank) model is used for predicting the class of the predictor record's outcome.

classification

Which of the following can be used to create simple rules to classify new predictor records?

classification trees and classification tree algorithms

When using a classification model, you want to identify which of the predictor record's possible outcomes is relatively likely to occur. You will be:

classifying the outcome.

When overfitting the data, researchers are making a classification based on such a small amount of data, that they may be:

fitting to the noise (the randomness) in the data while ignoring intuitive information about the more typical outcomes of historical records in the various areas of a scatter plot.

The complete Bayes can only work if there are some (Blank) records in the matching set.

historical

Even if the predictor variables are not (Blank), naïve Bayes can still perform well.

independent

Taken to the extreme, choosing ______ would include every piece of data, so it would simply predict the outcome for the predictor record based on the outcomes for the majority of the overall data set.

k = n

It is typically advantageous to choose k > 1 to:

smooth the results.

A (Blank) node represents a region that has not been split further.

square

To (Blank) the data, one should rescale each type of data so that it measures the number of standard deviations above or below the mean for a standard normal distribution.

standardize

One popular (Blank) rule is to impose a lower bound on the number of historical records in a region, so a splitting would be prohibited if it would create a new region with too few historical records.

stopping

A good classification tree algorithm will make the best split first followed by decision rules that are made up with ______.

successively smaller and smaller numbers of training records.

If you are asked whether you would like coffee with your dessert, this represents which type of outcome?

yes-or-no


संबंधित स्टडी सेट्स

303 Hinkle PrepU Chapter 10: Principles and Practices of Rehabilitation

View Set

Analyzing Financing Activities: FAR340 Exercise 4

View Set

Computer Organization and Architecture Chapter #1

View Set

Sociology of Self in Modern Society EXAM 1

View Set

My Brother Sam is Dead Chapters 1-4 Questions (Including Literature Circle packet, Pro-Americans, and Pro-British Questions) : )

View Set