Qm323

True or false? If we take the average of all the individuals' estimated partworths, we always get an accurate and complete representation of the group's preferences for each attribute.

False

You have a data set which includes a person's age, weight and income. You conduct a cluster analysis on all the variables that have been normalized and create two clusters. Which of these describes the centroid of Cluster 1? a) 1 value - the average of the normalized age, weight, and income in cluster 1 b) 1 value - the standard deviation of normalized age, weight, and income in cluster 1 c) 3 values - the average of each of the normalized age, weight, and income in cluster 1 d) 3 values - the standard deviation of each of the normalized age, weight, and income in cluster 1 e) 6 values - the average and standard deviation for each of the normalized age, weight, and income in cluster 1

3 values - the average of each of the normalized age, weight, and income in cluster 1
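A minimal sketch of why the centroid has three values, one per variable. The normalized observations and cluster labels below are hypothetical, made up only for illustration; in Excel the same per-variable averages could be taken with AVERAGEIF on the cluster label.

```python
from statistics import mean

# Hypothetical normalized observations: ((age, weight, income), cluster label).
rows = [
    ((-1.0, -0.5, -1.2), 1),
    ((-0.6, -0.9, -0.8), 1),
    ((0.9, 1.1, 1.0), 2),
    ((0.7, 0.3, 1.0), 2),
]

def centroid(rows, cluster):
    """Return the per-variable average of the observations in one cluster."""
    members = [values for values, label in rows if label == cluster]
    # zip(*members) regroups the data by variable: all ages, all weights, all incomes.
    return tuple(mean(column) for column in zip(*members))

centroid_1 = centroid(rows, 1)  # one average each for age, weight, income
```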

Normalization is also called Standardizing

=STANDARDIZE(X,Mean,StandardDeviation)
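For comparison, a small Python sketch of the same z-score calculation that STANDARDIZE performs, (x - mean) / standard deviation. The ages are hypothetical, and the choice of population standard deviation is an assumption; use whichever one the course specifies.

```python
from statistics import mean, pstdev

def standardize(x, mu, sigma):
    """Z-score of x, mirroring Excel's STANDARDIZE(x, mean, standard_dev)."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Hypothetical ages, for illustration only.
ages = [20, 30, 40, 50, 60]
mu = mean(ages)
sigma = pstdev(ages)  # population SD; statistics.stdev gives the sample SD instead
z_scores = [standardize(age, mu, sigma) for age in ages]
```

After standardizing, the values have mean 0 and standard deviation 1, so no single variable dominates a distance calculation just because of its units.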

3. The graph that shows cumulative % of buyers vs customer rank is designed to: a) See how good our model is at predicting customer purchase b) See how many people purchased c) See which people purchased d) Identify the people that are most and least likely to purchase e) Compare our model to random targeting

A

Which of these best describes the "purchdum" variable used in our purchase probability regression? a) A variable equal to the number of people who purchased b) A variable that is equal to the probability of purchase c) A measure of a person's willingness-to-pay d) A variable indicating whether a person showed strong intent to purchase or not e) A variable for how much a person would pay for the product

A variable indicating whether a person showed strong intent to purchase or not

In a workbook, when you identify and isolate input parameters in a single location, the advantage(s) is/are (select all that apply): a) Makes it easier to document assumptions behind parameters if they are in a single location b) Helps avoid mistakes c) Makes it easy to change the spreadsheet for sensitivity analysis d) All of the above

All of the above

Which statement is true about the K-Means algorithm? a) All variables must be categorical b) The output attribute must be categorical c) All variables must be numeric and continuous d) Variables may be either categorical or numeric e) None of the above

All variables must be numeric and continuous

Which of these is NOT a step in or part of an ABC analysis? a) Count how many people would prefer each of the 3 products b) Determine the market share between the 3 products c) Calculate the value each individual puts on each of the 3 products d) Average the ratings for each product across all individuals in the survey e) Determine 3 unique product profiles based on factors like cost-effectiveness or manufacturing capability

Average the ratings for each product across all individuals in the survey

1. An analysis of items frequently co-occurring in transactions is known as _. a. market segmentation b. market basket analysis c. regression analysis d. cluster analysis

B

1. The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is _____. a. data visualization b. cluster analysis c. market analysis d. supervised learning

B

1. k-means clustering is the process of: a. agglomerating observations into a series of nested groups based on a measure of similarity. b. organizing observations into one of a number of groups based on a measure of similarity. c. reducing the number of variables to consider in a data-mining approach. d. estimating the value of a continuous outcome variable.

B

1. A _____ refers to the number of times that an item or a collection of items occur together in a transaction data set. a. test set b. validation count c. support d. training set

C

1. In an Excel workbook, a number in a cell is calculated using 8 other cells. Something is not correct with this calculated number, and you want to see which of the 8 cells may be causing the problem. The best way that will help identify the wrong cell is: a) Linking between sheets b) Put a comment on each cell c) Trace precedents d) Trace dependents e) Start all over again

C

1. The endpoint of a k-means clustering algorithm occurs when: a. Euclidean distance between clusters is minimum. b. Euclidean distance between observations in a cluster is maximum. c. no further changes are observed in cluster structure and number. d. all of the observations are encompassed within a single large cluster with mean k.

C

1. The lift ratio of an association rule with a confidence value of 0.75 and in which the consequent occurs in 5 out of 10 cases is: a. 0.375 b. 0.500 c. 1.000 d. 1.500 e. 2.000

D
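The arithmetic behind answer D, as a short sketch. The function name is an assumption for illustration; the formula is lift = confidence / support of the consequent.

```python
def lift_ratio(confidence, consequent_support):
    """Lift of a rule: its confidence divided by the consequent's overall support."""
    return confidence / consequent_support

# Confidence 0.75; the consequent occurs in 5 out of 10 cases.
lift = lift_ratio(0.75, 5 / 10)  # 0.75 / 0.50 = 1.5
```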

1. What is the purpose of using an error trap in your Excel workbook? a) To identify a cell where you may have entered a wrong number b) To identify a cell that contains an implausible number c) To draw attention to a cell whose value may be of concern d) All of the above

D

1. Which of the following is true of Euclidean distance? a. It is used to measure dissimilarity between categorical variable observations. b. It is not affected by the scale on which variables are measured. c. It increases with the increase in similarity between variable values. d. It is susceptible to distortions from outlier measurements.

D
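A small sketch of why option b is false: Euclidean distance does depend on the measurement scale. The two people and their values are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two observations with the same variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical people described by (age in years, income in dollars).
d_dollars = euclidean((25, 50_000), (60, 51_000))

# The same two people with income expressed in thousands of dollars.
d_thousands = euclidean((25, 50.0), (60, 51.0))
```

With income in dollars the distance is driven almost entirely by the 1,000-dollar income gap; in thousands of dollars the 35-year age gap dominates. This is why variables are normalized before clustering.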

1. True or False: K-means clustering is always conducted only on "k" number of data variables.

F

True or false? When clustering, it is essential to calculate z-scores of all variables in Excel BEFORE running XLMiner to ensure results are not biased by differences in variable scale. a) True b) False c) Not sure

False

The following elements of an influence chart can be preceded by other elements: a) Intermediate Calculations and Output b) Intermediate Calculations and Inputs c) Decision Variables and Inputs d) Decision Variables and Output e) Intermediate Calculations and Decision Variables

Intermediate Calculations and Output

Which of these is true of k-means clustering? a) It allows you to use both numerical and categorical variables for clustering b) It tells you how many clusters to use c) It tells you which clusters should be targeted d) We can only have 2, 3 or 4 clusters e) It doesn't tell you which clusters are the most valuable

It doesn't tell you which clusters are the most valuable

Lift less than 1

Lift of 0.88 means that if a person purchased a Red Sox shirt, they were 12% (1-0.88) less likely (negative association) to buy a Yankees shirt

Which of these is required to get meaningful regression coefficients in a conjoint analysis? a) Survey more than 30-40 people b) Remove one level from each attribute in the regression data c) Provide the same number of levels for each attribute d) Include all possible product profiles in the survey e) None of the above

Remove one level from each attribute in the regression data

Out of 1000 people who came into a store and bought something: 550 bought only Red Sox (RS) shirts, 100 bought only Yankees (Y) shirts, 300 bought both Yankees (Y) & Red Sox (RS) shirts, and 50 did not buy any shirts. Exercise: Calculate the lift of "If RS, then Y" (RS → Y).

Support of RS & Y combined = 300/1000 = 0.300. Confidence = 300/850 = 0.353. Lift = 0.353/(400/1000) = 0.353/0.400 = 0.88
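The same exercise as a short sketch, using only the counts given in the card. Variable names are chosen for illustration.

```python
total = 1000
only_rs, only_y, both = 550, 100, 300

support_rs = (only_rs + both) / total   # antecedent: bought a Red Sox shirt (850/1000)
support_y = (only_y + both) / total     # consequent: bought a Yankees shirt (400/1000)
support_both = both / total             # bought both shirts (300/1000)

confidence = support_both / support_rs  # share of RS buyers who also bought Y
lift = confidence / support_y           # below 1 means a negative association

print(f"confidence = {confidence:.3f}, lift = {lift:.2f}")
```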

Which of these is the most accurate description of a cluster centroid? a) The average value across all variables and clusters b) The central observation in the data set c) The central observation in each cluster d) The average for each variable for each cluster e) The median value of each variable across all clusters

The average for each variable for each cluster

Which of these is the MAIN reason to use comments for a cell? a) To make the spreadsheet more engaging b) To document data and assumptions c) To understand which cells may be referencing that cell d) To link back to a different sheet

To document data and assumptions

What is the purpose of an influence chart? a) To help define our objective b) To calculate the numbers in our problem c) To help analyze our model d) To help design the spreadsheets / workbook

To help design the spreadsheets / workbook

Which of these best describes the purpose of a sensitivity analysis? a) To understand which variables are likely to change over time b) To understand the impact changing one variable will have on the outcome of interest c) To calculate the values of each variable that will maximize the outcome of interest d) To determine how many units will be sold based on segment size e) All of the above

To understand the impact changing one variable will have on the outcome of interest

You have a data set with age, weight and income. You run cluster analysis to create 3 distinct clusters. When you plot income and weight by cluster, you find that many data points are overlapping and not clustered cleanly. Which of these is likely true? a) You did not run enough iterations to identify 3 distinct clusters b) The data requires you to run a different number of clusters c) The inter-cluster distances must be too high d) You don't necessarily expect to see clear clustering on 2 variables e) Income and weight are not correlated

You don't necessarily expect to see clear clustering on 2 variables

1. A bank wants to understand better the details of existing customers who are likely to open a new brokerage account. In order to analyze this, they analyze data from a random sample of 200 customers on whether they opened a new account in the past, their average balance, age, family size, and dummies for gender and marital status. Of these customers, 31 had opened new accounts in the past. The bank analyzes the data using the pivot table below which shows the average of opening of a bank account. a) Which group has the highest probability of opening an account? b) What are the confidence and lift associated with targeting this group?

a) Married females, since this group has the highest confidence (0.204). b) Confidence = 0.204 = 20.4%. Lift = Confidence/(31/200) = 0.204/0.155 ≈ 1.32

Question 2: In a spreadsheet, the "trace precedents" command can be usefully applied to which of these elements (Select all that apply): a) Input parameters b) The final objective c) Decision variables d) Intermediate variables e) All of the above

b,d

Question 3: Conjoint Analysis helps identify which of the following: a) Good customers vs. bad customers b) Number of customers in a group c) Attributes that customers prefer d) Market share of the company e) All of the above

c

Question 5: Which of the following is true about Conjoint Analysis (Select all that apply)? a) It can allow us to identify the box that will generate the largest number of sales b) Adding up all of the partworths in the regression will equal the base intercept term c) The value of any product profile is equal to the intercept plus any included attribute level partworths d) Each attribute should include at least 3 levels e) The attribute with the largest range will have the largest importance

c,e
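Options c and e can be illustrated with a small sketch. The intercept, attributes, and partworths below are hypothetical; each attribute's baseline level has partworth 0 because it is the level dropped from the regression data.

```python
# Hypothetical conjoint regression output, for illustration only.
intercept = 3.0
partworths = {
    "price": {"low": 1.5, "high": 0.0},                 # "high" is the dropped baseline
    "color": {"red": 0.4, "blue": 0.0, "green": -0.2},  # "blue" is the dropped baseline
}

def profile_value(profile):
    """Value of a product profile: intercept plus the partworth of each included level."""
    return intercept + sum(partworths[attr][level] for attr, level in profile.items())

def attribute_importance():
    """Each attribute's partworth range as a share of the sum of all ranges."""
    ranges = {
        attr: max(levels.values()) - min(levels.values())
        for attr, levels in partworths.items()
    }
    total = sum(ranges.values())
    return {attr: r / total for attr, r in ranges.items()}

value = profile_value({"price": "low", "color": "red"})  # 3.0 + 1.5 + 0.4
importance = attribute_importance()                      # price has the larger range
```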

According to the article "Operation Everything," Operations Research (O.R.), of which linear programming is an example, got started where: a. General Electric's research department in the 1950s b. Harvard Business School in the 1980's c. World War II logistics d. The UPS team that developed the ORION algorithm e. None of the above

c. World War II logistics

averageif

cluster centroid

1. The predicted purchase probability calculated from a regression is equal to which of the following values calculated using pivot tables? a) Antecedent b) Consequent c) Support d) Confidence e) Lift

D. The confidence gives the probability of the consequent, given the antecedent.

2. Which of these best describes a residual? a) It must be a number between 0 and 1 b) It must be positive c) It must be negative d) It must have both positive and negative values e) It represents a distance between observed and predicted values

e

Edward Tufte once wrote "the only design worse than a _______ is several of them." a) Histogram b) Pie Chart c) Bar Chart d) Scatterplot e) 3-D figure

pie chart

The K-Means algorithm terminates when... a) a user-defined minimum value for the summation of squared error differences between instances and their corresponding cluster center is seen b) the number of instances in each cluster for the current iteration is identical to the number of instances in each cluster of the previous iteration c) there are three clusters d) the number of clusters formed for the current iteration is identical to the number of clusters formed in the previous iteration e) the cluster centers for the current iteration are identical to the cluster centers for the previous iteration

the cluster centers for the current iteration are identical to the cluster centers for the previous iteration
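A minimal sketch of that stopping rule, assuming a basic assign-then-average k-means loop. The data points and starting centers are hypothetical, and tools such as XLMiner may add other stopping criteria (for example an iteration limit).

```python
import math

def kmeans(points, centers, max_iter=100):
    """Basic k-means: stop when the centers are identical to the previous iteration's."""
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center becomes the per-variable average of its members.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        if new_centers == centers:  # centers unchanged, so the algorithm terminates
            break
        centers = new_centers
    return centers, clusters

# Hypothetical 2-variable data with two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers, final_clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```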

1) Select the row you want to format. 2) Choose Conditional Formatting > Top/Bottom Rules > More Rules. 3) In the dialog that opens, change the number to 3 and set the format you want.

To highlight the top 3 values in the row

Ex: You have a data set with age, weight and income. You run cluster analysis to create 3 distinct clusters. When you plot income & weight by cluster, you find that many data points are overlapping and not clustered cleanly. Which of these is likely true? Answer: You don't necessarily expect to see clear clustering on 2 variables

Only two variables is too few (the clusters were formed on three).
