marketing research final
SPSS regression
Analyze > Regression > Linear for both SLR and MLR. use the Coefficients box: B is the slope/intercept, Beta is the standardized coefficient
2 way chi squared test in SPSS
Analyze, Descriptive Stats, Crosstabs. put the variables in rows/cols; it does not matter which is which. hit Statistics and check Chi-square, then hit Continue and OK. the output shows descriptive stats; look at the Pearson chi-square value. for residuals, go back: Analyze, Descriptive Stats, Crosstabs, keep everything the same, then hit Cells and check Standardized (residuals) and row/col percentages. this shows more info with the residuals and the % in each category. to report: x^2(df) = val, p <
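(aside, not from the notes) a rough Python equivalent for sanity-checking the same two-way test outside SPSS; the counts are made up, and correction=False matches the Pearson chi-square line:
```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[40, 10],    # men: satisfied, dissatisfied
                     [25, 25]])   # women: satisfied, dissatisfied
chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(f"x^2({df}) = {chi2:.2f}, p = {p:.4f}")
```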
one way chi squared in SPSS
Analyze, Nonparametric Tests, Legacy Dialogs, Chi-square; click over the variable. gives descriptive stats and x^2 values
paired samples t-test in SPSS
Analyze, Compare Means, Paired-Samples T Test. put the 2 variables into the same pair and click OK. use the Paired Samples Test significance and the two-sided p. talk about averages for t-tests. report: the variables, means, t(df) = t-stat, p
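(aside, assuming Python/scipy is available) a minimal paired-samples sketch with invented before/after ratings:
```python
from scipy.stats import ttest_rel

before = [3.1, 2.8, 3.5, 4.0, 3.3]
after  = [3.9, 3.2, 3.6, 4.4, 3.8]

t, p = ttest_rel(before, after)   # two-sided p by default
df = len(before) - 1              # paired df = number of pairs - 1
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
```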
example of type i and ii errors
COVID testing. we want to know if someone has COVID.
null hypothesis: the person does not have COVID. alternate hypothesis: the person has COVID.
reject -> I say they do have COVID. fail to reject -> I say they do not have COVID.
-type I error: the null hyp is true but you reject, so you say they have COVID but they are healthy
-correct (they are sick): false null hyp, reject
-correct (they are healthy): true null hyp, fail to reject
-type II error (miss): false null hyp, fail to reject — you say they are healthy but they actually have COVID
motivation for evaluating patterns in statistical analysis
-many decisions are based on data
-data is analyzed and patterns emerge (ex. more people buy after seeing an ad, December launches are failures). but these are not experiments, so we can't make causal claims
-before acting on a pattern we should ask: is it meaningful or due to chance? is it statistically significant?
SPSS factor analysis for identifying if factor analysis is acceptable
SPSS: Analyze, Data Reduction, Factor
-Descriptives: check KMO and Bartlett's, anti-image matrix
-Extraction: check scree plot
output: Kaiser-Meyer-Olkin is KMO (want above 0.6). Bartlett's: look at the sig level. anti-image: look at the diagonal of the correlation matrix (variables above 0.5 should be included in the analysis)
bonferroni correction
a ballpark correction for "something like that" hypotheses: multiply the baseline probability by the number of tests N; estimate N by the number of other options they could have studied
standardized beta (Bi)
Beta_i = b_i * (standard deviation of x_i / standard deviation of y). used because unstandardized b's depend on the unit of measurement, so a change of units changes the value. interpretation: a one standard deviation increase in x leads to a Beta_i standard deviation change in y, holding the other variables in the model constant. Betas and p values do not change when the units do, but b will change. the larger the magnitude of Beta_i, the better the predictor. also note that beta can denote the type II error rate as well
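a tiny worked sketch of the formula (numbers invented):
```python
sd_x, sd_y = 12.0, 30.0  # standard deviations of x_i and y
b_i = 1.5                # unstandardized slope

beta_i = b_i * (sd_x / sd_y)
print(beta_i)  # 0.6 -> a 1 SD increase in x predicts a 0.6 SD change in y
```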
attribute conjoint
a clearly defined feature/characteristic; the independent variable (factor). should be unambiguous and useful for determining choice/preference. # should be kept low (6 on average); use exploratory research to decide. each attribute has levels. to get the attributes right, best practice is to cover them in exploratory research and pilot studies to determine the right attributes to include, and to use the empirical range in the product category to determine the range
difference between cluster and factor analysis
cluster analysis: takes the responses (people) and sees which are similar to each other. factor analysis: focuses on the variables to see which groups of variables move similarly
output for regression
coefficients for intercept and slope. the Sig. under the x variable is the p value for the slope; the null hypothesis is b = 0, where there is no change to predict. plug the coefficients into y = mx + b
chi squared tests
compare distributions across groups with categorical variables.
one-way chi squared: how people distribute across the levels of one variable.
two-way chi squared: how people distribute across a group of 2 variables.
null hypothesis is an even distribution; alternate hypothesis is an uneven distribution.
example: customer satisfaction at a Colorado resort, with info on satisfaction (yes/no), demographics, and customer intensity (ski usage frequency). looked at a chart of distributions across satisfaction based on demographics like gender. null hyp: no association — men and women have an even distribution between the 2 levels of satisfaction. alternate hypothesis: yes there's an association — men and women have an uneven distribution
simple t-test
compare the mean difference between 2 groups. most common type of t-test.
-independent variable: a categorical variable with 2 levels
-dependent variable: a continuous variable
interactions
consider 3+ variables at once. important when the relationship between two variables depends on another variable. hard to predict in advance and need more data.
2-way interaction types:
-reversal: the other predictor reverses the original pattern. the gaps between the lines are significant on both sides, but the pattern flips direction.
-moderating: the other predictor weakens the original pattern (attenuation; if it disappears entirely, or is significant on one side but not the other).
-no interaction: parallel lines, no change across bars, same extent of change.
the interaction coefficient quantifies the change
statistical tests for groups different on other scales (not means)
cross tab analysis
logistic regression
for binary outcomes. predicts the probability of ending up in a certain group. the coefficients represent increases in the log odds of y per 1-unit increase in x. log odds are hard to think about, so we convert them into odds ratios.
odds ratio: the odds of being in group 1 at x divided by the odds of being in group 1 at x-1. odds ratio - 1 = the % change in odds.
if the odds ratio < 1 there is a negative relationship between x and y (more likely to be in group 0).
if the odds ratio = 1, there is no relationship between x and y.
if the odds ratio > 1 there is a positive relationship between x and y (more likely to be in group 1).
ex interpretation: age odds ratio 0.9 < 1, so a negative relationship with second arrest. for every year older, the odds of a second arrest are 1 - 0.9 = 10% lower
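(aside, not in the notes) a hedged Python sketch of the same idea with statsmodels; the data and variable names are invented, and np.exp(params) gives the Exp(B)/odds-ratio column:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "age":      [19, 23, 30, 35, 41, 22, 27, 33, 45, 50],
    "rearrest": [1,  1,  0,  0,  0,  1,  0,  1,  0,  0],
})
X = sm.add_constant(df[["age"]])           # intercept + age
model = sm.Logit(df["rearrest"], X).fit(disp=0)

odds_ratios = np.exp(model.params)         # SPSS's Exp(B)
print(odds_ratios)  # age below 1 -> each extra year lowers the odds
```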
MLR analysis
have to make sure you specify that you are controlling for the other variables
purpose for factor analysis
helps us generate:
1. interpretable factors: a smaller # of interpretable, managerially relevant factors or composite variables (easier to explain fewer vars)
2. shorter surveys: reduce the number of questions for future surveys
3. uncorrelated predictors: create a small uncorrelated set of factors to use in predictions and clustering. no multicollinearity concerns
pitfalls with MLR
a higher R^2 does not necessarily mean the model is better at predicting the future, because you can overfit the data by adding too many polynomial degrees. fix this by splitting the data: hold out a sample and use the remainder to fit/refine the model (see the sketch below).
multicollinearity: bad things happen when predictors are strongly related to each other. r > 0.8 is strongly related, and SPSS will sometimes warn you. this is the reason we leave one group out with dummy variables (m-1, not all m groups) — otherwise they are perfectly correlated.
regression intrinsically assumes a linear model is a good approximation, but we can transform predictors (log) or add polynomials.
U-shaped relationship: a change in direction with an inflection point. if b11 in y = b0 + b1x + b11x^2 is significant, that is a sign of a U shape — but it can be significant without a true U shape, because regression only cares about making predictions better (lowering SSE); there could be enough curve without the full U. use scatterplots to confirm, or fit 2 linear regressions split at a point to see if the slope changes on both sides. webstimate.org will give suggestions for inflection points
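a sketch of the holdout idea (simulated data, sklearn assumed): fit on one split, judge R^2 on the held-out split so an overfit model can't hide behind in-sample fit:
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("train R^2:  ", model.score(X_train, y_train))
print("holdout R^2:", model.score(X_test, y_test))  # the honest number
```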
exploratory data before regression
histogram, summary statistics, simple plots. use these to catch coding errors and identify outliers
multivariate regression
how 2 or more factors impact the dependent variable: y = b0 + b1x1 + b2x2 + ... + bkxk + e.
with multiple independent variables, b1 is a partial regression coefficient: a one-unit change in x1 changes y by b1 units, holding all other variables constant.
adding/removing predictors will sometimes change p values. this could be from suppressor variables (not common) or multicollinearity (more common — when variables overlap)
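(aside) a minimal MLR sketch in Python with invented data; each slope is a partial coefficient, holding the other predictor constant:
```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "sales": [10, 12, 15, 11, 18, 20, 14, 17],
    "ads":   [1, 2, 3, 1, 4, 5, 2, 4],
    "price": [9, 9, 8, 10, 7, 7, 9, 8],
})
model = smf.ols("sales ~ ads + price", data=df).fit()
print(model.params)   # b0, b_ads, b_price (unstandardized b's)
print(model.pvalues)  # watch these shift if predictors are added/removed
```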
categorical vs continuous interactions
if significant, then the relationship between the continuous and dependent variable depends on the categorical variable, and the effect is the interaction coefficient: every unit increase in the continuous variable changes the dependent variable by [coefficient] more/less when the categorical variable is present. to graph, make 2 lines: first set everything to 0 except the continuous variable, then set the categorical variable to 1
how to figure out which cluster it is in?
in SPSS click Save and check Cluster Membership. this creates a new membership variable showing which cluster each case is in
paired samples t-test
within subjects: each participant is in both groups and has data for both variables. trying to determine if 2 variables' means are different from each other. the null hypothesis is that the averages of the 2 scales are the same
report findings with residuals
include the pattern of the data with both variables; report the test stat, sample size, and p value. if there are too many differences and the description gets long, try making a bar chart of the counts in Excel
continuous variable
interval or ratio. uncountable categories; the numbering matters. ex: willingness to pay, a 5-pt agree/disagree scale, age
striking data patterns
intuitively, you look at data and a result seems too strong to be due to chance. examples: great performance by a particular product, a bounce in sales during an economic slowdown, punctuality in branch A but not branch B
which attribute is most important? conjoint
look at the range (max part worth - min part worth). the importance weight is the attribute's range / the sum of all ranges across attributes. the attribute with the highest weight is most important
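the arithmetic, sketched with made-up part worths:
```python
part_worths = {
    "price":   {"$399": 0.8, "$499": 0.0},   # range = 0.8
    "battery": {"8 hr": 0.3, "12 hr": 0.5},  # range = 0.2
}
ranges = {a: max(v.values()) - min(v.values()) for a, v in part_worths.items()}
total = sum(ranges.values())
importance = {a: r / total for a, r in ranges.items()}
print(importance)  # price 0.8, battery 0.2 -> price is most important
```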
SPSS output for t-test
mean = group sample average. to see if the groups are reliably different, look at Sig. (2-tailed). if it shows 0.000, that means p < 0.001
type II error
miss. the null hypothesis is false and there is something there, but you fail to reject because p > 0.05, so you say there is nothing there. high statistical power = low type II error = less likely to miss the effect
null hypothesis
no change in experiment, no relationship between variables. boring
categorical variables
nominal. individual categories; not numbered, and numbering does not matter. ex: demographics, favorite pizza topping, gender
residuals
observed cell count - expected cell count (expected if no association). standardized residuals bigger than 2 or less than -2 indicate deviation from the no-association model. positive residuals mean a lot more people there than expected; negative residuals, a lot fewer. ex: -3.6 means there are a lot fewer dissatisfied skiers than you would expect under the no-association model
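the calculation, with a made-up cell:
```python
import math

observed, expected = 12, 40.0  # expected under the no-association model
std_resid = (observed - expected) / math.sqrt(expected)
print(round(std_resid, 1))     # -4.4: far fewer people here than expected
```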
partial coefficient
predictive power over and above the other predictors. if a wholly better predictor comes along, regression abandons the first. keep theoretically relevant predictors in your model even if they are null
p value
the probability of finding a pattern at least as strong by chance (if the null is true). the cutoff is p < 0.05
cross tab analysis
quick, easy tool for analyzing the association between 2 categorical variables (for chi squared). easy to understand. all we can find out is whether there is an uneven distribution — somewhere in the grid there are differences, but we don't know where (if you want to know where the associations are, look at the residuals). association is NOT causation (an observed association may be driven by a 3rd variable, which we can't control for; need regression for that). be careful when a cell count is low, because the test won't work well — but SPSS will tell you
clusters - standardization
range and variation can matter, so you may need to adjust the data first (a variable on a bigger scale gets more weight, drowning out the other ranges). standardize the variables using z scores, which are standardized against the averages and expressed in standard deviations from the mean: negative is below the mean, positive is above, centered around 0. to do this in SPSS: Analyze, Descriptive Stats, Descriptives; check Save standardized values as variables, which creates a new variable with the z scores for you
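(aside) the same standardization in Python, with invented values; ddof=1 matches the sample SD SPSS uses:
```python
from scipy.stats import zscore

spend = [20, 35, 50, 65, 80]
print(zscore(spend, ddof=1))  # centered at 0; negative = below the mean
```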
forms of conjoint analysis
ratings based: linear regression. easier to interpret; can get away with smaller samples due to higher statistical power. choice based: logistic regression. more realistic (external validity) but inefficient — needs more profiles for the same precision. adaptive: changes the options based on responses. static: everyone sees the same options
statistical tests for multiple categories
regression, ANOVA (analysis of variance). or you can use a subset of 2 groups
degrees of freedom
sample size - 2
statistical power
the probability that a true effect will be detected when the effect exists — correctly rejecting the null hypothesis in a statistical test and finding a significant result. 1 - power = type II error rate (beta) = the probability that we miss the effect if it is there. usually we aim for 80% power (20% probability of a type II error/miss); this means we care more about avoiding type I errors (false alarms) than type II errors. to maximize power, we can increase the sample size, study big effect sizes (differences between groups), or increase the significance threshold alpha (but do not do that). to figure out how big a sample is big enough, run t-tests to see when you first get significance. interactions between factors require 8x to 16x more sample size. with a big effect size you don't need as big a sample to find significance
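(aside, statsmodels assumed) a standard power calculation — sample size per group for 80% power at alpha = .05, assuming a medium effect size (d = 0.5):
```python
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, power=0.8, alpha=0.05)
print(round(n_per_group))  # ~64 per group; bigger effects need far fewer
```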
binary regression
there are 2 groups, so it is just like an independent-samples t-test and you'll get the same values. when x = 0 the predicted value is the y intercept; when x = 1 you add the slope to it
coeff changes when using MLR
this might be due to multicollinearity: both are strong predictors, but the effect might be more attributable to one, so it gets the higher weight
continuous vs continuous interactions
to graph this you need to plug in numbers for a few different levels of each variable and create multiple graphs. make sure the range is representative
SPSS interactions
Transform, Compute Variable. then enter a numeric expression with the 2 variables where you multiply them together
regression when categorical variables non binary
transform into dummy variables: a categorical variable with m levels needs m-1 dummy variables, coded 0 and 1. name each dummy variable for whatever is coded 1. the reference level is whichever level gets no dummy variable; interpret with respect to the reference level, and choose a reference that makes sense to compare to (e.g. the control level). all the dummy variables should go into the model: y = b0 + b1x1 + b2x2 + e. ex: if b = 200, that group is 200 more than the reference group; the reference group's mean will always be the b0 value
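(aside) the same m-1 dummy coding in Python; data invented, and the dropped level becomes the reference:
```python
import pandas as pd

df = pd.DataFrame({"region": ["east", "west", "south", "east", "west"]})
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies)  # region_south, region_west; "east" is the reference level
```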
dummy variables in SPSS
Transform, Recode into Different Variables. pick the old variable from the side, put in the new name, and hit Change. then hit Old and New Values; put in the old/new value and click Add. you can also click All other values to add a value for everything else. if you use system-or-user-missing -> system-missing, you keep all missing values denoted as missing. click Continue. to do multiple variables, run through the steps again
partial regression coefficient
unstandardized b (slopes)
part worths conjoint
the utility for a specific level of a particular attribute. how much that part of the product is worth to a consumer. the building block of conjoint.
questions to ask about p values
we need p values to tell us if a pattern is due to chance, but we also need to ask:
-think about the sampling process: was the hypothesis defined before or after the analysis? most times we don't know
-if it was defined after (or assumed to have been), seek replication — the replication hypothesis will be defined before
comparing groups
we often need to understand the difference between groups. we need a systematic approach that allows us to say when 2 or more groups of customers, companies, markets, etc. are really different
-segments (differences in behavior/attitude)
-experiments (does the treatment work or not)
can we find patterns in data designed to be random?
yes. examples:
-PINs should be approximately equal in frequency, but people do not choose randomly. popular PINs include 1234, 1111, 0000, 1212, 7777, and 12345 or 1234567. we can create stories behind any of the PINs/graphs because it's hard to imagine what randomness looks like — e.g. lower digits, choosing the same first 2 digits as the second 2 (a diagonal line), second digits as a birth year (a vertical line, more concentrated at the top), month-and-day or day-and-month combinations
-Dutch lotto: people pick on the diagonal, multiples of 7, 123456, the magic numbers from Lost, the lotto advertisement logo. we wouldn't guess most of these in advance, but can tell a story once we see them
how to know how many clusters?
are the clusters feasible to target? are they of sufficient size to be considered a segment? is there differentiation between segments?
hypothesis test examples
articles about medical stories to see if there is a statistically significant difference
conjoint analysis
attribute-based approach: total utility = the sum of the utility of each attribute. attributes have different levels, and each attribute level is given its own weight, so the valuation of the product is its total utility — as long as the product is a combo of the existing attributes.
process:
1. develop a set of attributes
2. select levels
3. obtain evaluations (ratings or choices) of product profiles
4. estimate part-worth values for each level of each attribute
5. compute importance weights for each attribute
6. aggregate results across consumers
7. evaluate tradeoffs among attributes
8. market simulations
9. evaluate accuracy of results
used when people do not know the answer or it is hard to operationalize. can be used to find market share, guide pricing strategy, or figure out brand name equity (how much a brand is worth).
ex: the importance of screen design for the iPhone, considering the phone as a bundle of attributes. if you like phone A over phone B, then you like the attributes of phone A more than the attributes of phone B. people can rate these attributes and you can see which phone they like better
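(aside, not the course's exact procedure) a hedged sketch of the ratings-based estimation step: dummy-code the attribute levels and regress ratings on them, so the coefficients act as part-worth estimates. profiles and ratings are invented:
```python
import pandas as pd
import statsmodels.formula.api as smf

profiles = pd.DataFrame({
    "rating":  [7, 5, 6, 3, 8, 4],
    "price":   ["$399", "$499", "$399", "$499", "$399", "$499"],
    "battery": ["12hr", "12hr", "8hr", "8hr", "12hr", "8hr"],
})
model = smf.ols("rating ~ C(price) + C(battery)", data=profiles).fit()
print(model.params)  # part worths relative to each attribute's reference level
```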
one sampled t-test
the average of a variable compared to a fixed number. ex: is the average age older than 25? is WTP > $40? the null hyp is that the average is the same as the chosen value
interactions with categorical vs categorical
-the effect of one variable goes up/down by the interaction coefficient when the other is present, holding all else constant. you can test for moderation/reversal by writing one equation with 0 plugged in for everything, then a second equation with 0 for everything except 1 for the second variable; compare the two coefficients on the first variable to see how it changed. you can also plot these as bar graphs by plugging 0s and 1s in for the variables
p values assume you predicted this in advance
-the basic point: when people assess whether something is significant, they can't attend to how the hypothesis was arrived at. significance depends on whether the hypothesis was made before or after the data were analyzed
-p values are meaningless if the hypothesis is defined after the analysis (yet most hypotheses occur after)
-defining it before asks: what is the probability of THAT specific thing happening by chance? defining it after asks about "something like that" — and there are so many "something like that"s that one of them is likely to happen
-these examples make it easy to identify random chance but not cause. when you get your data back, you want to tell a story, since coincidences are disappointing. you need to make the hypothesis in advance, otherwise the "something like that"s inflate the false positive rate
ex: 10 heads in a row. he said he could get ten heads in a row, but it turns out they filmed for 10 hours to get the clip — the impossible became inevitable. if you record 1000 times, it is likely to appear in one of the runs. he defined the hypothesis after (if it were before, he would've said this EXACT time I will get ten in a row)
ex: tossing a coin 10 times, HHHHHTTTTT. if you don't say heads comes first before tails, there are 2 possibilities, so it's more likely
ex: monkeys typing at keyboards will eventually write something from Shakespeare, but you have to specify which passage
ex: trying to prove jelly beans cause acne, you keep trying different colors until green "works" and post that — but the p value is meaningless because you didn't set out to test green in advance
reporting findings for t-test
-describe the pattern of the data: include both variables, means, and standard deviations
-report the test statistic: degrees of freedom and t-stat
-report the p value
report outcomes for chi squared
-pattern of the data with both variables
-test statistic (df, sometimes sample size)
-p value
t-test in SPSS
1. calculate the p value of the test in SPSS. a smaller p is stronger evidence against the null: when p is small, either the null is false or something really unlikely has happened
2. choose a significance level alpha, e.g. 0.05
3. if p < alpha, reject the null hyp. otherwise, fail to reject the null
how to replicate
1. new data set: test the same prediction
2. same data set: test a new prediction from the same hypothesis
3. same data set: try to falsify the original hypothesis
ex: a veto letter from Arnold Schwarzenegger where the first letters of each line spelled out "F you"; he said it was just chance. it's "something like that" because the hypothesis was not that day, that spot in the letter, that word, a veto rather than some other statement. to replicate: can't use a new dataset because it has to be that letter, but you could look for other patterns in that letter that suggest intentionality, or look for bad words in his other correspondence. unlikely due to chance, but "something like that" is more likely
ex: the Bible code "predicted" Bill Clinton's victory with Hebrew letters. THAT (Clinton president, in the Bible) vs. something like that (another president, other words). can't collect a new data set because there is no other Bible, but you could check whether other historical events are predicted correctly, or make sure the code is not present in other books
ex: 911 was the winning lottery number on 9/11/2002 in NYC. baseline is a 1/(10x10x10) chance, but "something like that" changes it: 2 NY state lotteries makes it 2/1000, plus other numbers that match a date, other states where a plane went down, or any state at all because everyone was affected
ex: Bush/Gore — Bush won the electoral college, Gore won the popular vote. FL was the deciding state; if 269 people had voted differently, there would have been a different president. very marginal. in Palm Beach County, seniors complained about a confusing butterfly ballot where you punch a hole for your candidate but the holes don't line up, so Buchanan (a Reform party candidate not normally popular) got disproportionately many votes. but this is an after-the-fact hypothesis, because nobody predicted that that ballot in that county would matter. replications: a new data set is hard because butterfly ballots are uncommon; a new prediction from the same hypothesis is to look at other candidates on the right side of the ballot for unusual counts, or look at non-butterfly ballots in Palm Beach
key points about patterns in statistical analysis
1. we naturally find patterns, even in randomness 2. we need p values to tell us if things are patterns by chance 3. p values assume we predicted in advance 4. if you run enough tests, something is bound to be significant by chance
independent samples t-test
2 independent groups; a person can only be in one of the groups. between-subjects design. determining whether the 2 groups' means are different from each other
tailed tests
2-tailed test: non-directional. ex: do web shoppers pay different prices? 1-tailed: directional — effectively doubles the alpha in the predicted direction (0.05*2 = 0.1). ex: web shoppers pay less. in practice, only 2-tailed tests are used, and 0.05 is always the conventional alpha cutoff for statistical significance
levels conjoint
2-4 levels per attribute. levels should be specific (good: price = $499, or 8-hr battery life; bad: price $399-499, or "long battery life" — don't use ranges or anything subjective that can be interpreted differently). use evenly spaced numerical values spanning a reasonable range. levels must combine freely and make sense together; don't include 2 levels that could not exist together
independent samples t-test in SPSS
Analyze, Compare Means, Independent-Samples T Test. the test variables are the continuous variables; the grouping variable is the categorical variable. then hit Define Groups and put in your coding (ex: group 1 = 0, group 2 = 1). this gives you the descriptives box. in the Independent Samples Test, use the t-test for equality of means with the two-sided p. report as t(df) = t, p = ..., and also report the means of each group. ex: fail to reject the null hyp — then the average liking of Luke Skywalker is not different between men and women. but for Yoda, reject the null hypothesis because one group likes Yoda more on average; from the descriptives box, the female average is higher, so women like Yoda more
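(aside) a minimal Python version with made-up ratings:
```python
from scipy.stats import ttest_ind

men   = [4.1, 3.8, 4.5, 3.9, 4.2]
women = [4.8, 5.0, 4.6, 4.9, 5.2]

t, p = ttest_ind(men, women)      # equal-variances t, two-sided p
df = len(men) + len(women) - 2
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```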
data mining
abundance of data, lack of ex ante hypotheses. algorithms or smart people are thrown at the data and patterns arise (ex: soccer moms buying light drinks at Starbucks are 2x as likely to use the drive-through). are they significant, and what were the odds?
factor analysis SPSS for actual factors
Analyze, Data Reduction, Factor. Extraction: specify the # of factors or set an eigenvalue cutoff. Rotation: check Rotated solution and Loading plot(s). Scores: check Save as variables
logistic regression SPSS
Analyze, Regression, Binary Logistic. use the last output box; the final column Exp(B) is the odds ratio
SLR in SPSS
Analyze, Regression, Linear
SPSS cluster analysis if you don't know the number of clusters
Analyze, Classify, Hierarchical Cluster. add all your variables, with cluster cases and the statistics/plots boxes checked. then click Plots and check Dendrogram. the dendrogram is a graph with the participant numbers down the side, grouped together. horizontal boxed lines form the groups: look for long horizontal lines, because a long line means very different responses being merged, while a short line means similar ones. draw a vertical line through the large horizontal lines; the number of horizontal lines it crosses is the number of clusters, and each crossed line leads into one group
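(aside) the same idea in Python — invented points; cut where the branches are longest to pick the cluster count:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.array([[1, 2], [1.2, 1.9], [5, 5], [5.1, 4.8], [9, 1], [8.8, 1.2]])
Z = linkage(X, method="ward")  # merge the closest cases/clusters step by step

dendrogram(Z)
plt.show()  # long branches = very different groups being merged
```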
one sampled t-test in SPSS
Analyze, Compare Means, One-Sample T Test. change the test value to the # you are comparing to, and click OK. use the two-sided p in the significance output
correlation in SPSS
analyze, correlate, bivariate
factor analysis
data -> factor analysis (what hangs together in the correlation matrix) -> a few "super variables" (factors). multiple x variables can be summarized by a smaller set of underlying factors.
1. identify the set of variables appropriate for factor analysis (they have to be correlated). include only variables that are correlated; with little correlation, factor analysis is not appropriate. make sure the KMO value is > 0.6 and the significance for Bartlett's is below 0.05. diagonal elements in the anti-image correlation matrix above 0.5 are acceptable (if below, remove those variables and run the analysis again, since they are not related to the rest of the survey)
2. identify how many factors, looking at total variance explained and the scree plot. eigenvalue rule: keep all factors with eigenvalues > 1 (a bigger eigenvalue is better at capturing the data). scree plot: % variance explained/eigenvalues on the y axis, # of factors on the x axis; look for the kink/elbow — all the points above it give the # of factors. after the kink there is a smaller marginal difference between factors, so adding a factor does not add much worth, and you want as few variables as possible. total variance explained (least recommended because there is no hard and fast percentage): accept a factor analysis where the cumulative % of variance explained is > 70%. the max # of factors is the # of variables in the original survey, but that is not a good solution
3. identify what the factors are, looking at the rotated component matrix. use the rotated solution so each variable loads highly on only one factor (always look at the rotated one! otherwise all variables load highest on the 1st factor). label the factors based on the highest loadings (in absolute value) in each row — no hard cutoffs, just what is relatively high or low; higher means more related. factor loadings are the weights that determine how each factor relates to each observed variable, and they help you pick the name that best describes the factor
factor scores: inferred values on each factor for each respondent — a standardized, weighted average of the variables linked to that factor, based on what we did ask. the factors are perfectly uncorrelated. flexible, with guidelines to interpret, but still subjective. then run cluster analysis on these saved factor variables.
ex: finding the key personality trait variables to define the personalities of people who can represent brands. find the correlations between the personality traits; some are operationalizations of the same construct — they answer the same question, and people respond similarly on them. highly correlated variables have similar responses, so they can be combined into one super variable
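(aside, assumption: the third-party factor_analyzer package, pip install factor_analyzer) a hedged sketch of the KMO / Bartlett's / rotated-loadings workflow on simulated two-factor survey data:
```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                             calculate_kmo)

rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 300))            # two latent factors
noise = lambda: rng.normal(scale=0.5, size=300)
df = pd.DataFrame({"q1": f1 + noise(), "q2": f1 + noise(), "q3": f1 + noise(),
                   "q4": f2 + noise(), "q5": f2 + noise(), "q6": f2 + noise()})

_, kmo_total = calculate_kmo(df)                   # want > 0.6
_, bartlett_p = calculate_bartlett_sphericity(df)  # want p < 0.05
print(kmo_total, bartlett_p)

fa = FactorAnalyzer(n_factors=2, rotation="varimax")  # rotated solution
fa.fit(df)
print(fa.loadings_)        # label factors by highest absolute loadings
scores = fa.transform(df)  # standardized factor scores, usable for clustering
```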
correlation
degree of association between 2 continuous variables. -1 to 1, where 0 is no association. symbol is r
regression
describe and predict one dependent variable (continuous outcome) from a set of predictor variables (independent variables — can be continuous, nominal, or both; can be run on 3+ variables). any # of variables, not necessarily categorical; harder to understand.
yi is the actual observed data; yi-hat is the predicted value (the line); the errors are yi - yi-hat. the line minimizes SSE = sum over i of (yi - yi-hat)^2.
y = b0 + b1x + e, where y is the outcome (dependent, continuous), x is the independent/predictor (any type), e is the error term, b0 is the intercept (where the line crosses the y axis), and b1 is the coefficient/slope: for every unit increase in x, y increases by b1
SPSS output for chi squared
descriptives boxes with the #s in each group. use the Pearson chi-square value; Asymp. Sig. (2-sided) is the p value
summary descriptive statistics SPSS
descriptive stats, descriptives
histograms SPSS
descriptive stats, frequencies, charts, click histogram
alternate hypothesis
difference between conditions, relationship between variables
one way chi squared
distribution of one variable. null hyp: participants are evenly distributed across all groups. ex: is there a gender imbalance across participants?
2 way chi squared
evaluates the distribution across two different variables. ex: fan of Star Wars (yes/no) and fan of Star Trek (yes/no)
t-tests for 2 groups example
ex: do web shoppers pay a different price than dealership shoppers? the null hypothesis is that people pay the same; the alternate hypothesis is that web shoppers do not pay the same amount (with no direction)
type I error
false positive: the null hypothesis is true, but you say it is not and reject it. the lower the significance level alpha, the lower the probability of a type I error occurring. 0.05 is the conventional alpha level, the cutoff for statistical significance — in 5% of all samples we will reject a true null!
cluster analysis
seeks to group objects so that the segments created are as homogeneous as possible given the variables. works on the principle of maximizing between-cluster variance while minimizing within-cluster variance. every object is allocated to one cluster.
not regression based: there are no p values pointing to one right interpretation. it is subjective and requires you to decide which outcome is best for your data. can only use continuous variables (usually interval or ratio).
this is the research behind segmentation — how to find the segments. cluster analysis is computationally intensive. it is strategically advantageous to study the segments so you can make a strategic decision for each one. include psychographic variables, but note that people do not use outcome measures (sales, satisfaction) to determine clusters — though they may use them to link to clusters. after fitting clusters, see which variables differ between clusters.
ex: positioning fielders in baseball — clustering can find where the ball is typically hit to decide where to stand. but bring expertise and don't blindly follow, because there could be gaps.
ex 2: shopping attitudes — can find different segments of shoppers with similar attitudes based on their answers to all the survey questions
SPSS cluster analysis when you know the number of clusters
shows you what the actual clusters are. Analyze, Classify, K-Means Cluster; input the # of clusters; click Options and check Initial cluster centers and ANOVA table.
output: ignore the first 2 tables and work bottom-up.
-# of cases in each cluster: an even-ish distribution is ideal; make sure there are no clusters of one
-ANOVA: are all variables important? check that each sig is less than 0.05 (it should be); if not, that variable is not distinguishing between clusters, so get rid of it
-final cluster centers: each cluster's average responses to each variable. to describe the clusters, identify which variables are very positively or very negatively associated with each cluster (highlight the high or low numbers) and figure out labels for each cluster
if a cluster is 1 person or the variables are not significant, probably rule that solution out
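(aside) a minimal k-means parallel in Python with invented data:
```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1.2, 1.9], [5, 5], [5.1, 4.8], [9, 1], [8.8, 1.2]])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.labels_)           # cluster membership for each case
print(km.cluster_centers_)  # "final cluster centers": average responses
```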
profile conjoint
structured summary of product in terms of set of attributes. conjoint analysis presents products/services as bundles of attributes
SPSS value label
tells you what a variable means and what you encoded it as