QM323 Midterm
Problems in Age of Analytics video
- There aren't enough individuals fluent in data science. - Managers are not knowledgeable enough about analytics, so they are unable to move past small-scale experiments.
Parameters
- Numbers in an equation, such as your objective equation - Parameters do not change when you change the choice variables
Optimization models
- Used to make the best choice - Businesses build models to simulate what would happen as a result of each potential choice
Steps in k-Means clustering
1. Randomly partition the observations into k clusters 2. Normalize the data (i.e., convert into z-scores) 3. Sort the data by cluster 4. Calculate the cluster centroids - Centroid = average value of each (normalized) variable across the observations in the cluster 5. For each observation, calculate the distance to each cluster centroid 6. Reassign each observation to the cluster with the closest centroid 7. Repeat steps 4-6 until there is no change in the clusters (or a specified maximum number of iterations is reached)
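The steps above can be sketched in plain Python. This is a minimal illustration, not the XLMiner implementation: the function names, the `seed` parameter, and the way an empty cluster is handled are all assumptions made for the example. It normalizes before partitioning (the order of those two steps does not affect the result) and skips the "sort by cluster" step, which is only a spreadsheet convenience.

```python
import math
import random
import statistics


def z_scores(column):
    """Normalize one variable: (value - mean) / sample standard deviation.
    Assumes the column has at least two values and is not constant."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def centroid(rows):
    """Average of each variable across the rows: one value per variable."""
    return [statistics.mean(col) for col in zip(*rows)]


def distance(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Normalize every variable (column) into z-scores.
    columns = [z_scores(col) for col in zip(*data)]
    rows = [list(r) for r in zip(*columns)]
    # Randomly partition the observations into k clusters.
    labels = [i % k for i in range(len(rows))]
    rng.shuffle(labels)
    centroids = []
    for _ in range(max_iter):
        # Calculate the centroid of each cluster.
        centroids = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded at a random observation.
            centroids.append(centroid(members) if members else rng.choice(rows))
        # Reassign each observation to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(r, centroids[c])) for r in rows
        ]
        # Stop when no observation changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical data: (age, weight) for six people in two obvious groups.
data = [(25, 30), (27, 32), (26, 31), (60, 90), (62, 95), (61, 92)]
labels, centroids = k_means(data, k=2)
print(labels)
```

The returned centroids are in normalized (z-score) units, one value per variable per cluster.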
Decision making framework
1. find objective 2. analyze data 3. identify risks 4. communicate findings
You have a data set which includes a person's age, weight and income. You conduct a cluster analysis on all the variables that have been normalized and create two clusters. Which of these describes the centroid of Cluster 1? • 1 value - the average of the normalized age, weight, and income in cluster 1 • 1 value - the standard deviation of normalized age, weight, and income in cluster 1 • 3 values - the average of each of the normalized age, weight, and income in cluster 1 • 3 values - the standard deviation each of the normalized age, weight, and income in cluster 1 • 6 values - the average and standard deviations for each of the normalized age, weight, and income in cluster 1
3 values - the average of each of the normalized age, weight, and income in cluster 1
Standardize excel function
=STANDARDIZE(X,Mean,StandardDeviation)
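As a rough illustration of what the formula computes, the same z-score calculation can be written in Python. The function name `standardize` is just a stand-in chosen to mirror the Excel function.

```python
def standardize(x, mean, standard_deviation):
    """Return the z-score of x: how many standard deviations x lies from the mean."""
    return (x - mean) / standard_deviation


# A value of 70 when the mean is 60 and the standard deviation is 5
print(standardize(70, 60, 5))  # 2.0
```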
Absolute Reference
A cell reference that does not change when a formula is copied to a new location.
What is an influence chart?
A diagram that shows which outcomes will be generated and which inputs influence them. Ex. Revenue and costs affecting profits
Conditional formatting
An Excel feature that enables you to specify how cells that meet one or more given conditions should be displayed.
Cluster centroid
Average value of the objects contained in the cluster on all the variables in the cluster variate.
COUNTIFS
Counts the number of cells specified by a given set of conditions or criteria
COUNTIF
Counts the number of cells within a range that meet the given condition
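To make the difference between the two functions concrete, here is a small Python analogue. The helper names `countif` and `countifs` and the sample data are hypothetical. Conditions are passed as functions rather than Excel criteria strings.

```python
def countif(values, condition):
    """Count the values that meet a single condition (like COUNTIF)."""
    return sum(1 for v in values if condition(v))


def countifs(*range_condition_pairs):
    """Count the rows where every (values, condition) pair is satisfied (like COUNTIFS)."""
    ranges = [values for values, _ in range_condition_pairs]
    conditions = [condition for _, condition in range_condition_pairs]
    return sum(
        1
        for row in zip(*ranges)
        if all(condition(v) for condition, v in zip(conditions, row))
    )


# Hypothetical columns of equal length
ages = [22, 35, 41, 29, 53]
incomes = [30000, 62000, 58000, 45000, 90000]

print(countif(ages, lambda a: a > 30))  # 3
print(countifs((ages, lambda a: a > 30), (incomes, lambda i: i > 60000)))  # 2
```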
Descriptive analytics
Describes and summarizes an organization's past data
If we take the average of all the individuals' estimated partworths, we always get an accurate and complete representation of the group's preferences for each attribute • True • False
False
True or false? When clustering, it is essential to calculate z-scores of all variables in Excel BEFORE running XLMiner to ensure results are not biased by differences in variable scale • True • False • Not sure
False
Which of these is true of k-means clustering? • It allows you to use both numerical and categorical variables for clustering • It tells you how many clusters to use • It tells you which clusters should be targeted • We can only have 2, 3 or 4 clusters • It doesn't tell you which clusters are the most valuable
It doesn't tell you which clusters are the most valuable
An association rule has a confidence value of 0.43, and its consequent occurs in 6 out of 10 total transactions. Its lift ratio is: • 0.072 • 0.258 • 0.600 • 0.717 • 1.600
Lift = 0.43/(6/10) = 0.717 or 0.72
Break-even analysis
Finds the point at which NPV equals zero
Steps in building a good spreadsheet
1. Plan the structure 2. Build the spreadsheet 3. Review and test (and, if necessary, re-do) 4. Communicate findings (display results in an easy-to-understand manner)
Steps in the decision making process
1. Problem recognition 2. Information search 3. Evaluation of alternatives 4. Product choice
Importance
Range of the attribute's partworths / (sum of ranges across all attributes), expressed as a percentage
The following elements of an influence chart can be preceded by other elements: a) Intermediate Calculations and Output b) Intermediate Calculations and Inputs c) Decision Variables and Inputs d) Decision Variables and Output e) Intermediate Calculations and Decision Variables
Recall that Inputs and Decision Variables are at the beginning of the Influence Chart and the Objective (Output) is at the end. Intermediate Calculations are in the middle. Nothing can precede Inputs and Decision Variables, so only option a) Intermediate Calculations and Output is valid
Objective for optimization
The objective is what we are trying to maximize or minimize
Which of these is the MAIN reason to use comments for a cell: • To make the spreadsheet more engaging • To document data and assumptions • To understand which cells may be referencing that cell • To link back to a different sheet
To document data and assumptions
OkCupid Article
Used k-means clustering to group women with similar answers to questions, in order to be able to match with women of different interests and personalities
Facebook dark ads
When does ad targeting become discrimination? - Article discusses how groups were excluded from seeing particular ads for jobs or listings, depending on gender, race, etc.
You have a data set with age, weight, and income. You run cluster analysis to create 3 distinct clusters. When you plot income and weight by cluster, you find that many data points are overlapping and not clustered cleanly. Which of these is likely true: • You did not run enough iterations to identify 3 distinct clusters • The data requires you to run a different number of clusters • The inter-cluster distances must be too high • You don't necessarily expect to see clear clustering on 2 variables • Income and weight are not correlated
You don't necessarily expect to see clear clustering on 2 variables
Sensitivity Analysis
A capital budgeting tool that determines how the NPV varies as a single underlying assumption is changed
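A one-way sensitivity analysis can be sketched as recomputing NPV while changing only one assumption. In this hypothetical example the cash flows are made up, the discount rate is the single assumption being varied, and the `npv` helper treats the first cash flow as occurring today (t = 0), which is an assumption of the sketch.

```python
def npv(rate, cash_flows):
    """Net present value, with cash_flows[0] occurring today (t = 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: pay 1000 today, receive 400 a year for three years.
cash_flows = [-1000, 400, 400, 400]

# Change only the discount rate and hold everything else fixed.
for rate in (0.05, 0.10, 0.15):
    print(f"rate={rate:.0%}  NPV={npv(rate, cash_flows):.2f}")
```

Reading down the printed column shows how strongly the outcome responds to that one input.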
market basket analysis
a statistical technique that reveals customer behavior patterns as they purchase multiple items
In an Excel workbook, a number in a cell is calculated using 8 other cells. Something is not correct with this calculated number, and I want to see which of the 8 cells may be causing the problem. The best way that will help identify the wrong cell is: a) Linking between sheets b) Put a comment on each cell c) Trace precedents d) Trace dependents e) Start all over again
c) Trace precedents
Lift Ratio
confidence / (support of consequent / total # of transactions)
Formula to decide who to target
Break-even response rate = cost per mailing / profit per customer
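One way to read this ratio is as a break-even response rate: a segment is worth mailing when its expected response rate is higher than the ratio. That interpretation, the function names, and the numbers below are assumptions made for illustration.

```python
def break_even_response_rate(cost_per_mailing, profit_per_customer):
    """Response rate at which the expected profit from a mailing just covers its cost."""
    return cost_per_mailing / profit_per_customer


def should_target(expected_response_rate, cost_per_mailing, profit_per_customer):
    """Assumed decision rule: target a segment if it responds above the break-even rate."""
    threshold = break_even_response_rate(cost_per_mailing, profit_per_customer)
    return expected_response_rate > threshold


# Hypothetical numbers: $0.50 per mailing, $20 profit per responding customer.
print(break_even_response_rate(0.50, 20))
print(should_target(0.04, 0.50, 20))  # True
print(should_target(0.01, 0.50, 20))  # False
```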
You have stored Days Per Month (value 20) worked in cell B13 and Hours per Day (value 8) in cell C13 and Hourly Wage (value $15) in cell D13. Which formula would be the BEST to calculate the total labor expense for one employee a) =20*8*15 b) =20*8*D13 c) =20*C13*D13 d) =B13*C13*D13 e) None of the above
d) =B13*C13*D13
=RANDBETWEEN(X,Y)
Generates a random integer that is equally likely to be any whole number from X to Y (inclusive)
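For comparison, a rough Python equivalent uses the standard library's `random.randint`, which also returns an integer with both endpoints included. The wrapper name `randbetween` is only for illustration.

```python
import random


def randbetween(low, high):
    """Return a random integer from low to high, both endpoints included."""
    return random.randint(low, high)


# Simulate five rolls of a six-sided die.
print([randbetween(1, 6) for _ in range(5)])
```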
Prescriptive analytics
help us identify the best course of action given a set of alternatives
Range
max partworth - min partworth
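The range feeds directly into the attribute importance formula (range / sum of ranges across all attributes). A small sketch with made-up partworths shows the two calculations together. The attribute names, levels, and values are hypothetical.

```python
# Hypothetical estimated partworths for each level of each attribute.
partworths = {
    "price": {"$10": 2.0, "$15": 0.5, "$20": -2.5},
    "brand": {"A": 1.0, "B": -1.0},
    "color": {"red": 0.75, "blue": -0.75},
}


def attribute_ranges(partworths):
    """Range of each attribute: max partworth - min partworth."""
    return {
        attribute: max(levels.values()) - min(levels.values())
        for attribute, levels in partworths.items()
    }


def attribute_importance(partworths):
    """Importance of each attribute as a percentage of the total range."""
    ranges = attribute_ranges(partworths)
    total = sum(ranges.values())
    return {attribute: 100 * r / total for attribute, r in ranges.items()}


print(attribute_ranges(partworths))
print(attribute_importance(partworths))
```

With these numbers, price has the widest range and therefore the highest importance.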
Euclidean distance
Measures dissimilarity between observations on continuous variables; it is common to normalize the data into z-scores first
A decision variable in an influence chart is a _________
oval
An objective in an influence chart is a ________
rectangle
An intermediate variable/quantity is a ________
rounded rectangle
Cluster analysis is used to
segment observations into similar groups, make data more manageable, and identify outliers
Euclidean distance formula
sqrt ( (x2-x1)^2+ (y2-y1)^2 )
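The two-variable formula extends to any number of variables by summing one squared difference per variable. A minimal sketch (the function name is illustrative):

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences, one term per variable."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```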
Confidence
support of (antecedent and consequent together) / support of antecedent
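Confidence and lift can both be computed from raw transaction counts. The sketch below uses a tiny made-up basket data set. The function names are illustrative, and "support" here means a count of transactions, matching the formulas in this set.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together / support of antecedent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence / (support of consequent / total number of transactions)."""
    consequent_share = support_count(consequent, transactions) / len(transactions)
    return confidence(antecedent, consequent, transactions) / consequent_share


# Rule: if bread, then milk.
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
print(lift({"bread"}, {"milk"}, transactions))

# The lift question in this set: confidence 0.43, consequent in 6 of 10 transactions.
print(round(0.43 / (6 / 10), 3))  # 0.717
```

A lift below 1, as in both examples here, means the antecedent makes the consequent less likely than its baseline rate.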
Conjoint Analysis
technique used to develop an understanding of the attributes that guide consumer preferences by having consumers compare product preferences across varying levels of evaluative criteria and expected utility
ABC analysis
the ranking of all items of inventory according to importance
Predictive analytics
Uses models constructed from past data to make predictions. Ex. Using survey data to help predict the market share of a new product
The technique "Association Rules" is also called (Select ALL that apply): • Go to Market Analysis • Affinity Analysis • Basket of Goods Market • Market Basket Analysis • Building Lift
• Affinity Analysis • Market Basket Analysis
What is the purpose of using an error trap in your Excel workbook? • To identify a cell where you may have entered a wrong number • To identify a cell that contains an implausible number • To draw attention to a cell whose value may be of concern • All of the above
• All of the above
In a workbook, when you identify and isolate input parameters in a single location, the advantage(s) is/are (select all that apply): • Makes it easier to document assumptions behind parameters if they are in a single location • Helps avoid mistakes • Makes it easy to change the spreadsheet for sensitivity analysis • All of the above
• All of the above
Which statement is true about the K-Means algorithm? • All variables must be categorical • The output attribute must be categorical • All variables must be numeric and continuous • Variables may be either categorical or numeric • None of the above
• All variables must be numeric and continuous
Which of these is required to get meaningful regression coefficients in a conjoint analysis? • Survey more than 30-40 people • Remove one level from each attribute in the regression data • Provide the same number of levels for each attribute • Include all possible product profiles in the survey • None of the above
• Remove one level from each attribute in the regression data
Which of these is the most accurate description of a cluster centroid? • The average value across all variables and clusters • The central observation in the data set • The central observation in each cluster • The average for each variable for each cluster • The median value of each variable across all clusters
• The average for each variable for each cluster
What is the purpose of an influence chart? • To help define our objective • To calculate the numbers in our problem • To help analyze our model • To help design the spreadsheets / workbook
• To help design the spreadsheets / workbook
Which of these best describes the purpose of a sensitivity analysis? • To understand which variables are likely to change over time • To understand the impact changing one variable will have on the outcome of interest • To calculate the values of each variable that will maximize the outcome of interest • To determine how many units will be sold based on segment size • All of the above
• To understand the impact changing one variable will have on the outcome of interest
The K-Means algorithm terminates when... • a user-defined minimum value for the summation of squared error differences between instances and their corresponding cluster center is seen • the number of instances in each cluster for the current iteration is identical to the number of instances in each cluster of the previous iteration • there are three clusters • the number of clusters formed for the current iteration is identical to the number of clusters formed in the previous iteration • the cluster centers for the current iteration are identical to the cluster centers for the previous iteration
• the cluster centers for the current iteration are identical to the cluster centers for the previous iteration