CS 201 Consumer Research & Analysis
Two methods of data collection (descriptive)
1. Observation (watching): watching and capturing the relevant facts, actions, or behaviors What they "actually do" 2. Communication (asking): Surveying respondents about desired information using questionnaire What they "say they will do"
The report summary is the most important
"Executive summary" -sometimes this will be the only part that is read -think about the "elevator speech" --"60 seconds"
Cross Tabulation
-"two way" frequency analysis. Also known as contingency tables -Examines the relationship between two or more CATEGORICAL variables -reveals relationship that otherwise may not be apparent
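A two-way frequency table can be sketched with nothing more than a counter. The records and the variable names (gender, preferred brand) below are made up for illustration and are not from the course material.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred_brand), both categorical.
responses = [
    ("female", "A"), ("female", "B"), ("male", "A"),
    ("male", "A"), ("female", "B"), ("male", "B"),
]

# Counting each (row, column) pair gives the cells of the cross tab.
cell_counts = Counter(responses)
row_labels = sorted({row for row, _ in responses})
col_labels = sorted({col for _, col in responses})

# Print the table: one row per gender, one column per brand.
print(" " * 8 + "".join(f"{col:>4}" for col in col_labels))
for row in row_labels:
    cells = "".join(f"{cell_counts[(row, col)]:>4}" for col in col_labels)
    print(f"{row:<8}{cells}")
```

Reading across a row shows how one group splits over the second variable, which is the relationship a single-variable frequency count would hide.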
Design content around the audience
-Clear beginning and end --state your purpose and call to action -Ensure content can be seen by entire audience --back row check -Visuals understood at a glance -Use a variety of visual aids, avoid monotonous content --avoid too many charts --avoid all bullets and all text
Key Considerations for Communication research
-Degree of structure -Degree of disguise -Method of administration
Presentations and reports have distinct roles
-Different context --presentations are in the moment and guided --reports are stand alone -Different audience --presentations have live audiences --reports often serve broader and more diverse audiences -Different purposes --presentations aim for high level understanding and quick impact --Reports are detailed archives that live on
Build slides that are simple and focused
-Err on the side of more slides rather than fewer --fewer points per slide --better to stretch than cram -Call attention to your talking points --highlight and emphasize significant points --use arrows, circles/boxes, or color highlighting and so on -Plan your slides --max of one minute per slide --leave room for Q&A
Special consideration needs to be given to the response options
-Include a "don't know" if applies to a sizable portion (>/=20%) -Responses must be exhaustive, may need to include an "other" option -Responses must be mutually exclusive --include "check all that apply" --include "most important" -Response order bias occurs when responses to a question are influenced by the order --options earlier in a list tend to be selected more often --randomize or "split ballot"
R^2 problems
-Is sensitive to the number of I.V's included --can increase R^2 simply by adding more I.V's regardless of the quality of their contribution (adjusted R^2 tries to account for this ) -Does not account for degrees of freedom -What constitutes a "good" R^2 varies widely by subject area and is difficult to have a general "rule of thumb"
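The adjustment mentioned above can be sketched with the standard textbook formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of I.V.'s. The function name and the example numbers are illustrative assumptions.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for each added predictor."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_obs - 1) / residual_df


# Same raw R^2 of 0.50 from 30 observations, with 2 vs. 10 predictors.
with_few = adjusted_r_squared(0.50, n_obs=30, n_predictors=2)
with_many = adjusted_r_squared(0.50, n_obs=30, n_predictors=10)
print(round(with_few, 3), round(with_many, 3))
```

The same raw R^2 is worth less when it took more I.V.'s to reach it, which is the point of the adjustment.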
The online survey
-Keep it as short as possible, including instructions -Optimize the survey for use with mobile phones -Begin with a question likely to engage the respondent's attention -Keep questions as simple as possible -Use visuals/graphics if they help, but don't make the survey complex or difficult to navigate -Remind respondents about incentives, and explain how to obtain them -For long surveys, let respondents see progression through the survey and how much remains -Pretest the online survey
Questionnaire Development Process: optimize the appearance of the questionnaire
-Short as possible -beware of clutter -use graphics and other visuals to improve appearance -Build in other interactive features **opportunity to engage
Questionnaire Development Process: Begin to "work" the question areas
-Start by stating the issues, specific wording comes later -Idea is to capture needed data using as FEW questions as possible Is the question necessary? Are several questions needed instead of one? Do respondents have the necessary information? Will respondents give the information?
Sample size involves making tradeoffs
-The greater the precision, the larger the sample needed -The bigger the sample, the more confident that the true value falls within the range -The greater the variability in the sample the larger the sample size needed Increases in desired precision, confidence or variation lead to increases in necessary sample size
Recruiting Messages on Email
-Use a personal "from" name -Keep the subject line simple, but interesting -Avoid language that will get caught in spam filters -Personalize the message by using recipient's name -Include short, effective message to capture attention containing information about (a) who you are, (b) purpose (c) request for help (d) length/time of survey (e) confidentiality and (f) incentives -If offering incentives, make them meaningful -Consider the timing of the email- know your audience -Send reminder emails, but no more than 2 -Pretest the recruiting message
Sum of Squares in regression
-a measure of how the data varies around the mean --variance is the average of the squared deviations (the sum of squares divided by n, or by n - 1 for a sample) -A high sum of squares indicates that values tend to lie far from the mean --indicates there is a large variability in the data
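A small worked example of the sum of squares and the sample variance derived from it; the data values are made up for illustration.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, not from the course

mean = sum(values) / len(values)

# Sum of squares: squared distance of every value from the mean, added up.
sum_of_squares = sum((x - mean) ** 2 for x in values)

# Sample variance averages the sum of squares over n - 1.
sample_variance = sum_of_squares / (len(values) - 1)

print(mean, sum_of_squares, round(sample_variance, 3))
# Cross-check against the standard library's sample variance.
print(round(statistics.variance(values), 3))
```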
Write like you talk
-aim for shorter words -avoid vague modifiers -use specific, concrete language -delete words you don't need
Chi-square goodness-of-fit test
-applies when you have one CATEGORICAL variable from a SINGLE population -tests how likely it is that an observed value (your sample estimate) is due to chance -goodness of fit statistic shows if your sample data represents the data you would expect to find in the actual population - an "expected value" is the same conceptually as a mean or average
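The test statistic can be sketched as the sum of (observed - expected)^2 / expected over the categories. The counts, the expected shares, and the hard-coded critical value (5.991 for alpha = 0.05 with 2 degrees of freedom, from a standard chi-square table) are assumptions for this example.

```python
# Hypothetical sample: 100 respondents choosing among three brands.
observed = [50, 30, 20]
# Shares we would expect if the sample mirrored the population (assumed).
expected_share = [0.4, 0.4, 0.2]

n = sum(observed)
expected = [share * n for share in expected_share]

# Chi-square statistic: squared gaps between observed and expected counts,
# each scaled by the expected count.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

CRITICAL_VALUE_05_DF2 = 5.991  # table value for alpha = 0.05, df = 2
reject_null = chi_square > CRITICAL_VALUE_05_DF2

print(round(chi_square, 3), degrees_of_freedom, reject_null)
```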
Handling nonresponse
-eliminate cases with a significant amount of item nonresponse - eliminate the case with the missing item(s) from all further analysis -Substitute values for the missing items -Contact the respondent again
Pivot tables used for cross tabs in excel
-shows the relationship between two variables without changing your dataset
Two forms of hypothesis testing are used in decision making
1. Null hypothesis -proposed result is not true for the population -difference is caused by random chance 2. Alternative Hypothesis -proposed result is true for the population -Difference is "real" We are always looking to REJECT the NULL hypothesis, not to "accept" the alternative. Fail to reject the null hypothesis = there is insufficient statistical evidence to reject it
Written reports are all about the detail
1. Completeness: include what really matters and place the rest in the appendix (or leave out) -be sure every question has been addressed! 2. Accuracy: reasoning is sound and accurate -ideas are well supported from quotes to statistics 3. Clarity: outlines key points for the reader -tell the reader what the report covers and then cover it
Understanding nonresponse error
1. Contact a SAMPLE of nonresponders -compare responses to see if they are different 2. Compare respondents demographics against population -see if certain groups are over or underrepresented 3. Conduct an analysis of late responders vs. early responders -looking to see where they are different
Key considerations for bivariate analyses
1. Cross tabulations most used tool for categorical variables 2. Difference in means, independent, and pairwise t-test 3. Correlation coefficient, r, to see if two continuous variables are linearly related *cross tabs **t-tests ***correlation
Key Steps in interpreting multiple regression results
1. Does the set of predictors explain a statistically significant portion of variation in the dependent variable?(look at F-statistic) 2. How much of the variation in the dependent variable does our set of predictors explain? (look at the coefficient of multiple determination) 3. Which of the individual predictors explain variation in the dependent variable, and what is the direction of the relationship (positive or negative)? (Look at the t-values and p-values of the individual predictors)
2 General uses of Cross tabulation
1. Examine the cause and effect relationship between variables -IV is the cause or the predictor variable -DV is the effect or the outcome variable 2. Understand the joint distributions of two variables
General report structure includes 3-5 distinct sections
1. Introduction -sets up project -provides background and specifies decision problem and research problem(s) 2. Results -organize to provide answers to the research problems -include tables and figures as needed 3. Conclusions -rule of thumb is one for each research problem that motivated the study 4. Recommendations &Call to action -what to do next based on learning -should follow the conclusions 5. APPENDIX
Nonprobability Samples
A sample that relies on personal judgement in the selection process Sampling error cannot be estimated, and we cannot make inferences about the population Several kinds -convenience -judgement -quota
Four main regression statistics aid in interpretation
1. R (multiple R): The correlation between the predicted value and observed value on the D.V. 2. R^2: the proportion of variance which is "explained" by the regression equation --simply the square of multiple R --"goodness of fit" measure 3. F: the statistic used to test whether the model fits the data well 4. T: a measure of whether the I.V. has a significant relationship with the D.V.
Three common examples of inaccuracy
1. Simple errors in addition or subtraction 2. Confusion between percentages and percentage points 3. Inaccuracy caused by grammatical errors
Data analysis hinges on the variable(s) in question
1. Univariate Analysis: one variable at a time (or "describing data") 2. Bivariate analysis: two variables at a time (how X is related to Y) and the focus shifts to analyzing the relationships between the variables 3. Multivariate Analysis: more than two variables at a time
regression results can be expressed in two forms
1. Unstandardized: provides coefficients in the original metric of the I.V. Has a Y-intercept Does not allow you to compare the contribution of I.V.'s 2. Standardized: Provides standardized coefficients (somewhat like the correlation coefficients we studied) Does NOT have a Y-intercept allows you to compare the contribution of I.V's
Questionnaire Development Process: determine the wording of each question
1. avoid ambiguous words & questions 2. Avoid leading questions 3. Avoid assumed consequences 4. Avoid generalizations 5. Avoid double-barrel questions
Key Considerations for Univariate analyses
1. categorical measures simplest form, frequency analysis most common 2. Continuous measures add greater depth: mean and standard deviation 3. Confidence Intervals integral to univariate statistics **Frequency analysis ***mean & variance
Three MUST HAVES for effective communication
1. know your audience 2. Use a "goldilocks" level of detail--not too much, not too little 3. Always end with a very clear call to action
Communication Considerations: Method Considerations
1. sample control -ability to project or be representative 2. Information control -managing interview bias -level of anonymity 3. Administrative control -costs of sending out survey, such as paper and stamps
Likert-Summated Ratings Scale
A form of an interval scale where respondents indicate their degree of agreement or disagreement with each of a number of statements. Developed in the early 1900s
Simple Random Sample
A probability sampling plan in which each unit included in the population has a known and equal chance of being selected for the sample -relatively easy implementing a digital version
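A digital version can be sketched in a few lines. The sampling frame of customer IDs and the fixed seed are assumptions made so the example is reproducible.

```python
import random

# Hypothetical sampling frame: 200 customer IDs.
sampling_frame = [f"customer_{i:03d}" for i in range(1, 201)]

# A seeded generator keeps the draw reproducible for this illustration.
rng = random.Random(42)

# Draw 20 units without replacement; every unit has the same chance.
simple_random_sample = rng.sample(sampling_frame, k=20)

print(simple_random_sample[:5])
```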
Systematic Sample
A probability sampling plan in which every kth element in the population is selected from the sample pool after a random start
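A minimal sketch of the "random start, then every kth element" rule, assuming the interval k is the frame size divided by the desired sample size (rounded down). The function name and the frame are hypothetical.

```python
import random


def systematic_sample(frame, sample_size, rng=None):
    """Pick a random start within the first interval, then every kth unit."""
    rng = rng or random.Random()
    interval = len(frame) // sample_size  # k
    if interval < 1:
        raise ValueError("sample size cannot exceed frame size")
    start = rng.randrange(interval)
    return frame[start::interval][:sample_size]


frame = list(range(1, 101))  # hypothetical frame of 100 numbered units
chosen = systematic_sample(frame, sample_size=10, rng=random.Random(7))
print(chosen)
```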
Comparative-Rating Scales
A rating scale based on a series of relative judgements or comparisons rather than as independent assessments -Constant-sum method A comparative-ratings scale in which an individual divides some given sum among two or more attributes on a basis such as importance or favorability
Probability Samples
A sample in which each target population element has a known, nonzero chance of being included in the sample -RANDOM selection that makes the task OBJECTIVE -simple random -systematic -stratified -cluster (including area)
Highly Standardized Questions have distinct advantages and disadvantages
ADVANTAGES -Ease of administration -Ease of coding and analysis -Measure reliability DISADVANTAGES -forced choice -omitted response -precision of response
an effective message stems from knowing your audience and revealing what you want them to do
AIM: audience, intent, message
Descriptive statistics are at the core of the most analyses for continuous variables
Aim is to describe properties of data based on measures central tendency and of dispersion -Central tendency: central point of distribution (mean, median, mode) -Dispersion: variability or how the scores are scattered around the central point (variance, standard deviation, range)
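The measures listed above can be computed with Python's standard `statistics` module; the data values are made up for illustration.

```python
import statistics

scores = [3, 5, 5, 6, 8, 9, 13]  # illustrative data

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Dispersion (sample variance and sample standard deviation)
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)
value_range = max(scores) - min(scores)

print(mean, median, mode)
print(variance, round(std_dev, 3), value_range)
```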
Sampling Plan: Step 1 Define the Target Population
All individuals and entities that meet the criteria to qualify for a research study -must be very clear and precise in defining the population Looking to determine what is true about the population based on the sample
Error in Measurement
All measurement includes error Random error: error in measurement due to temporary aspects of the person or situation Systematic error: error in measurement that is constant, affects the measurement in a constant way Observed response = TRUTH + SYSTEMATIC ERROR + RANDOM ERROR
Convenience Sample
Being included in a sample as a matter of convenience, right place, right time -easy to conduct -no way to know if sample is representative of the population -mall intercepts, surveys on websites
Questionnaire Development Process: determine the sequence of the questions
Best practice is funnel approach -start with broad questions and progressively narrow the scope Considerations -question order bias -branching questions -classification & sensitive information last
Closed-ended and Open-ended questions work together
CE questions gather the facts and OE questions provide the context
Observation and communication have distinct advantages
Communication: Versatility, speed, cost Observation: Objectivity, accuracy
Questionnaire Development Process: Ensure respondents are able to answer the questions
Consumers may not recall what is being asked 2 types of error 1. Telescoping error: thinking something occurred more recently/further out than reality -tend to include purchases from broader time frames -gets worse as time frame asked about is shorter 2. Recall loss: forgetting that an event happened at all -is reduced as time period being asked about is shorter best time frame when asking consumers is 2-4 weeks
Two-Box Technique
Converting interval-level ratings into categorical measures by only showing the top two positions on a rating scale
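A sketch of a top-two-box calculation on a 5-point scale, where the top two positions are assumed to be 4 and 5; the ratings are invented.

```python
# Hypothetical 5-point satisfaction ratings from ten respondents.
ratings = [5, 4, 3, 5, 2, 4, 1, 5, 4, 3]

TOP_TWO = {4, 5}  # the two highest positions on the scale (assumed)

# Recode each rating as a category: in the top two boxes or not.
in_top_two = [rating in TOP_TWO for rating in ratings]
top_two_share = sum(in_top_two) / len(ratings)

print(f"Top-two-box: {top_two_share:.0%}")
```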
Used when both variables are CONTINUOUS
Correlation is the most useful -Pearson Product-moment correlation coefficient --indicates the degree of linear association between two continuous variables --tells us whether there is any linear relationship between the two variables --Sample correlation coefficient (r) can range from -1 to +1; the closer to -1 or +1, the stronger the association, and the sign gives the direction
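The sample correlation coefficient can be computed by hand as the sum of cross-products of deviations divided by the square root of the product of the two sums of squares. The function name and data are illustrative.

```python
import math


def pearson_r(x_values, y_values):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    cross = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(x_values, y_values))
    ss_x = sum((x - mean_x) ** 2 for x in x_values)
    ss_y = sum((y - mean_y) ** 2 for y in y_values)
    return cross / math.sqrt(ss_x * ss_y)


ad_spend = [1, 2, 3, 4, 5]   # hypothetical continuous variable
purchases = [2, 4, 5, 4, 5]  # hypothetical continuous variable
r = pearson_r(ad_spend, purchases)
print(round(r, 3))
```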
Communication Considerations: Method
Covers how the research will be carried out -traditional forms of data collection --personal interviews: remain the most versatile --telephone interviewing: becoming increasingly difficult --Mail (paper) surveys: shift to paperless among reasons for decline --Online surveys: today's predominant method
Error goal
Decrease TOTAL ERROR not any one source of error
Mode
Defines the most frequently occurring value in a dataset. The mode exists as a data point and is unaffected by extreme values
What is the most prevalent type of Primary Research?
Descriptive Analysis
Questionnaire Development Process: Confirm the method of administration
Desired information guides choice of method and nature of the questions Consider the advantages and disadvantages along with degree of structure and disguise
2 forms of observation
Direct- observing the actual behavior or activity Indirect- observing the effect or result of a behavior or activity
Most measurement relies on self-report, including less tangible such as attitudes scales
Easy to measure facts: age, income, preference typically use lower level scales Other qualities are harder to measure, feelings. Attitude scales are relevant Several self-report rating scales used to measure "unobservable" concepts 1. Itemized-rating scales 2. Graphic-ratings scales 3. Comparative-rating scales
Types of Error: Nonresponse error
Error from failing to obtain information from some of the sample elements of the population Only an issue when those that did not respond are systematically different in a relevant way
Types of Error: Noncoverage error
Error that arises because of failure to include qualified elements of the defined population in the sampling frame -this is a sampling frame issue and can be mitigated by enhancing quality of the sampling frame
Types of Error: Response Error
Error that occurs when an individual provides an inaccurate response, consciously or subconsciously Key considerations -Did they understand the question? -Do they know the answer to the question? -Are they willing to provide the truth? -Is the wording of the question likely to bias the response? the best way to mitigate response error is PRETESTing
Personal Interviews: Distinct and Useful
Face to face conversations between interviewer and respondent -can be conducted in many different locations and handle a variety of information, either open end or fixed -generally strong sampling control with higher response rates -Great flexibility, but higher levels of interviewer bias -Time and cost intensive
Questionnaire Development Process: Ensure respondents are able to answer the questions
Filter question: a question used to determine if a respondent has the knowledge or "qualifies" for the study
Histogram
Form of a bar chart that shows values of the variable on the x-axis and the frequency of the value on the y-axis -Differs from a bar graph that relates two variables while a histogram shows only one
What level of measurement to use with data analysis
Four levels of measurement based on the data Scales each lend themselves to different analyses
More benefits on observation research
Gathering insights on behaviors is a near certainty, watching what people are "doing" -Don't rely on memory -Don't depend on willingness -DO often see the unexpected Lessening the expectations of consumers
Converting continuous measures to categorical measures can be useful
Higher levels of measures have all the properties of lower levels of measurement -can be easier to interpret, facilitates charting like the histogram
Observation Considerations: Administration
How the data are collected -Human: researchers SEE a behavior and record specific activity and events that take place (people watching) -Mechanical Observation: an electrical or mechanical device CAPTURES activity
Continuous Measures add greater depth
Interval and ratio-level data are both considered "continuous" variables and can accommodate numerous types of statistics
Data Preparation: Coding
Involves transforming raw data into symbols (usually numbers) -Given descriptive data are largely closed-ended, task involves assigning a number to each of the response categories --Single responses are straight-forward whether nominal or interval --Multiple responses can be more complex, often turn the one question into "many"
90% of the work is below the surface, collecting, organizing, scrubbing, analyzing
Just 10% is the story
Need to know your audience AND "your stuff"
Know your stuff -knowledge and clarity of the data & analysis -enables effective communication -ready to address questions Know your audience -comfort level with technical/analytical content -Involvement &interest in the project -history with the project -relationship with the team
Decision rules for hypothesis testing start with the null hypothesis assumption
Looking to reject the null hypothesis if the observed value is in the critical region -fail to reject ("accept") the hypothesis otherwise
Online Surveys have many capabilities due to technology
Many ways to administer a survey online, such as through a website or over email -explosion in use over the past decade -email lists and panels are readily available, but response rates are often very low -Flexibility: visuals and complex material possible -Usually quick and inexpensive
Types of Error: Office & Recording Errors
Mistakes made by people or machines Errors due to data editing, coding, or analysis
t-tests for means
Mix of categorical & continuous variables - a statistical technique that tests the difference between two means --follows a bell shaped curve, like the normal distribution -three main types of t-tests involving 2 variables, independent and paired samples --consistent with the one sample t-test applied in univariate analysis
Categorical Measures are the simplest form of univariate analysis
Nominal and Ordinal scales often referred to as "categorical measures" because they are used to categorize respondents
Other Considerations in Designing Scales
Number of items in the overall measure -overall vs. individual measure -# of attributes or dimensions needed to capture the overall measure Including "Don't know" or "Not Applicable" -If the question is not relevant to some or if some do not have an opinion about the topic -Avoid forced response Number of scale positions -5 pt (usually minimum number used) -10 pt (usually maximum number used) -5 to 7 pt most popular Odd or even number -odd is more common, provides a midpoint/neutral point
Ordinal Scales
Numbers are assigned to data based on some order, more than or greater than. Example is income (higher # is meaningful) -Only the order matters (6= $100K, 1=$20k) -Yet, the numbers we assign themselves don't matter, as long as the order is preserved -Differences between each is not known
Interval Scales (rating scales)
Numbers represent meaningful differences in both the order and the value between them. ex: 7 point liking scale -differences can be compared: rating of "7" is higher than "1" ALSO the difference between 1 vs. 2 is the same as 6 vs. 7 -no "true" zero, just another number on the scale
Key considerations in observation research
Observation is behavior in the moment -can't watch past behavior or future intentions -Hard to "see" attitudes and values Observation on their terms -must "wait" for behaviors to happen -Scanners are an exception, rapid and ongoing Observation is the "real deal" -highly accurate -less subjective
univariate analysis
One variable is analyzed at a time. The major purpose here is to describe
Questionnaire Development Process: Determine the form of response to each question
Open ended questions begin with: what, why, how Closed-ended questions begin with: is/are, do/did, would/will, could/can, was/were, have/has, which, who, when, where
Precision vs. Confidence
Precision: the degree of error in an estimate of a population parameter Confidence: how confident we can feel that an estimate approximates the true value Precision and confidence inversely related = As one increases, the other decreases, all else equal
Two Primary Sources of Nonresponse Error
REFUSALS -overcoming refusals is important in any method -Making multiple requests helps to mitigate -Personal interviews naturally have far lower refusals NOT-AT-HOMES -Those who do not answer calls or contacts -aim for a higher response rate from a smaller sample -a smaller sample will also help decrease the Total Sample Error
Semantic-Differential (Rating) Scale
Respondents check the position between a set of bipolar adjectives or other words that best describes their feelings toward the object ***inconvenient vs. convenient
Obtaining the highest possible response rates
Response rate serves as an indicator of the overall quality -low response could indicate poor questionnaire design -Provides insight into influence of nonresponse error Ways to improve -shorter surveys -guarantee of confidentiality or anonymity -tighten interviewer characteristics and training -enhance personalization -make it more interesting -offer incentives -implement follow-up surveys
Questionnaire Development Process: Decide what information is needed
Revisit the reason this project was initiated in the first place -what is the business problem? -what have you learned so far, such as from secondary research? what are general issues and questions? ultimately need to specify the research objective
Sampling Plan: Step 2 Identify the Sampling Frame
Sample frame: the list of population elements from which a sample will be drawn; could be geographic areas, institutions, individuals, or other units Commonly used sampling frames: -customer database -Member directories -Lists developed by data compilers -Others
Observation Considerations: Disguise
The amount of knowledge people have about a study -disguised studies= people are not aware --captures more authentic behavior, "can become part of the scene" --ethical considerations--> debriefing: respondents are made aware after the fact -undisguised studies= people know they are being watched --likely to forget after the first few minutes unless reminded --can collect additional background information
Observation Considerations: Structure
The degree of standardization used in research Structured: "looking for" Sometimes we have something specific we want to observe, such as the nature or levels of participation. We use a preset guide of what we want to observe or even a checklist Unstructured: "looking at" Sometimes we want to observe what is happening naturally. We have an open-ended approach to observation and record all that we can see
Sampling error
The difference between sample results and the results that would have been obtained from the whole population -can be estimated (assuming probability sample) -due to chance -usually less troublesome than other kinds of error -decreased by increasing sample size
Median
The middle value in a dataset that is arranged in ascending order and divides the dataset into halves. May or may not exist as a data point depending on whether there is an odd or even number
Mean
The most stable measure of central tendency. However, it can be affected by extreme values, such as anomalies. It may or may not exist as a data point and is the arithmetic mean, or sum of all of the values divided by the total number of values
Stratified Sample
The population is divided into subsets and a random sample of elements is chosen from each subset Most appropriate when strata are similar within but different between with respect to key variable(s). Decreased variance within strata on key variable(s) means increased precision
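A sketch of a proportionate stratified sample, where the same fraction is drawn at random from every stratum. The strata, the 10% fraction, and the function name are assumptions for illustration; disproportionate designs are also possible.

```python
import random


def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        n_to_draw = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, n_to_draw)
    return sample


# Hypothetical population of 100 customers split by loyalty tier.
strata = {
    "basic": list(range(0, 60)),
    "premium": list(range(60, 100)),
}
drawn = stratified_sample(strata, fraction=0.10)
print({name: len(units) for name, units in drawn.items()})
```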
Cluster Sample
The population is divided into subsets and a random sample of one or more subsets (clusters) is selected Clusters should be as varied (heterogeneous) as possible within and similar between, so that each cluster resembles the population. Area sample is a form of cluster sampling in which areas serve as the primary sampling units
Hypothesis testing
The process for how we determine if the sample result is true -uses confidence intervals to provide a set of standards for making decisions -Decisions about whether (to accept) the results as a true measure of the population
Sampling Plan
The process of selecting people to be interviewed Six steps for "drawing a sample"
Significance level
The rejection region corresponds to alpha or the probability of making the wrong decision -the acceptable level of error is usually set at 0.05 -The level of error refers to the probability of rejecting the null hypothesis when it is actually true for the population
Key considerations in observation studies
There are several factors that need to be considered. -Degree of structure -Degree of disguise -Setting -Method of administration
Sampling Plan: Step 4 Determine the Sample Size
Three pieces of information are needed 1. How much PRECISION is desired in the estimate 2. How CONFIDENT we need to be the true value falls within the precision range established 3. How HOMOGENEOUS (similar) the population is on the characteristic is to be estimated the size of the population generally does not impact the size of the sample
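One common textbook formula for estimating a mean combines the three inputs as n = (z * sigma / H)^2, where z reflects the confidence level, sigma the variability, and H the desired precision (half-width). Whether the course uses exactly this formula is an assumption; the function name and numbers are illustrative.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev, precision, confidence=0.95):
    """n = (z * sigma / H)^2, rounded up to a whole respondent."""
    # Two-sided z value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * std_dev / precision) ** 2)


baseline = required_sample_size(std_dev=10, precision=2)
tighter_precision = required_sample_size(std_dev=10, precision=1)
higher_confidence = required_sample_size(std_dev=10, precision=2,
                                         confidence=0.99)
more_variability = required_sample_size(std_dev=20, precision=2)
print(baseline, tighter_precision, higher_confidence, more_variability)
```

Population size does not appear anywhere in the formula, which matches the note that it generally does not affect the sample size.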
Data Preparation: Inspecting the data to ensure quality standards are met
Top 5 Tasks (editing or "cleaning" of the data): 1. Convert all responses to consistent units 2. Assess degree of nonresponse, delete record if >/=50% are missing 3. Check for consistency across responses 4. Look for evidence that respondent wasn't thinking about answers 5. Verify that branching questions were followed correctly
Sampling Plan: Step 3 Select a Sampling Procedure
Two categories of sampling techniques -probability samples -non probability samples
Disguise helps to get at the truth but can also raise concerns
Use of disguise can be a violation of the respondent's right to know -respondent may wish to "opt out" Can lessen concerns by letting respondents know the study is "blind" and why In addition, "debriefing" provides respondents with information afterwards, including reason for the study or sponsor
Validity and Reliability of Measures
Validity: accuracy is another term for validity. A test is valid if it measures what it is supposed to measure. Reliability: reliability is another term for consistency. How well the measure obtains consistent scores across time or situations
Observation Considerations: Setting
Where the study takes place -Contrived: observed in an environment that has been specially designed for recording their behavior --control over extraneous influences --"imitation" store, computer simulation -Natural: observed where behavior normally takes place --realistic behaviors without prompting --shopping in a store, using or consuming a product at home
Outlier
a data point that differs significantly from other observations due to variability in the measurement or it may indicate error
Sample Standard Deviation
a measure of the variation of responses on a variable. The standard deviation is the square root of the calculated variance on a variable
Mechanical Observation
aided observation utilizes technological advances Video cameras- ubiquitous application People meters- capturing media behavior Bar code scanners- the first revolution in the retail industry
Other considerations with Data preparation
build a codebook with all the details about how data from data collection forms are coded in the data file -variable name -variable description -source of data -Process for handling missing data Identify blunders that are administrative errors that arise during editing, coding, and data entry -run frequencies on all the variables -check questionnaires against data -Double entry of data -scanning devices the way of the future
Cumulative Percentage Breakdown
categories are formed based on the cumulative percentages obtained in a frequency analysis
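A sketch of the cumulative percentages that such a breakdown starts from, using invented 5-point ratings.

```python
from collections import Counter

ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical responses

frequency = Counter(ratings)
n = len(ratings)

# Walk the values in order, accumulating the share of respondents so far.
running_count = 0
cumulative_percent = {}
for value in sorted(frequency):
    running_count += frequency[value]
    cumulative_percent[value] = 100 * running_count / n

for value, percent in cumulative_percent.items():
    print(value, frequency[value], percent)
```

Cut points for categories can then be placed where the cumulative percentage crosses chosen thresholds, for example near 33% and 67% for three roughly equal groups.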
One sample t-test
compares the mean of your sample data to a known value, such as a known population mean
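The test statistic can be sketched as t = (sample mean - known value) / (s / sqrt(n)). The sample and the comparison value of 5 are made up; the resulting t would be compared against a t table with n - 1 degrees of freedom.

```python
import math
import statistics

sample = [3, 5, 5, 6, 8, 9, 13]  # illustrative data
known_mean = 5                   # hypothetical population value

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# How many standard errors the sample mean sits from the known value.
t_statistic = (sample_mean - known_mean) / standard_error
degrees_of_freedom = n - 1

print(round(t_statistic, 3), degrees_of_freedom)
```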
Categorical Variables
contain a finite number of categories or distinct groups (nominal and ordinal)
Questionnaire Development Process: re-examine
developing a survey normally requires several revisions of the data collection form. It is an iterative process
Types of Error: Sampling Error
difference between sample results and population results. Often an issue with the sampling frame -easy way to reduce (increase sample size) and there are easy ways to account for it (calculate confidence intervals and the margin of error or standard deviation) Highly dependent on the size of the sample- and can only be calculated with probability samples
Communication Considerations: Disguise
disguise is essential when knowing the purpose or sponsor is likely to bias respondents' answers -may cause respondent to change their answer -interrupt true spontaneous answers Disguise is also useful when recreating the natural environment is necessary, particularly in experimental research
Snake Diagram Scale (variation of semantic differential)
display of multiple semantic-differential ratings Lines represent average scores
Four Types of Scales used in Primary Research
each represents a different level of measurement; the higher the level of the scale, the more we can do with the data HIGH TO LOW -Ratio -Interval -Ordinal -Nominal
Questionnaire Development Process: Ensure respondents are able to answer the questions
even when respondents do remember, they may not wish to answer What do you do when someone asks you a difficult or personal question? -guarantee anonymity -place sensitive questions near the end -Include statement showing the situation is NOT unusual -ask in terms of "other people" -Ask for general rather than specifics -consider a randomized response option -Don't ask unless absolutely necessary
Coding Open-end items
factual open-ended items seeking concrete responses are relatively easy to code -numeric answers are typically recorded as given in the survey -other types of responses are given a specific code number Open-ended, less structured responses are more difficult to code 1. Identify the themes and patterns in various responses 2. Develop categories for responses 3. Sort responses and give the categories codes 4. Ensure consistency in codes
Goal is to "respect the analysis" while "looking for the stories"
focus on how to craft the story that causes your audience to take action -craft--illustrate--share
Research reports often serve as an archive
include all relevant content and support in the appendix --data collection forms and questionnaire --data tables and statistical output --additional exhibits not included in results (if needed) --data file --references and bibliography
Confidence Intervals
integral to univariate statistics -a range of values around the estimate that is believed to contain the true value -Indicates that one can be 90%, 95%, or 99% confident that the population mean lies somewhere in the range --one of the main reasons for drawing a probability sample -The wider the confidence interval, the less precise the estimate -takes sampling error into account
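A minimal sketch of a confidence interval for a mean, assuming a sample large enough for the normal (z) approximation. The function name and the summary figures (mean 3.8, SD 1.2, n = 400) are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (low, high) bounds for the population mean (z approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin_of_error = z * sd / math.sqrt(n)
    return mean - margin_of_error, mean + margin_of_error

# Higher confidence levels give wider (less precise) intervals.
for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(3.8, 1.2, 400, level)
    print(f"{level:.0%} CI: {low:.3f} to {high:.3f}")
```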
Questionnaire Development Process: Pre-testing
involves "testing the survey" before actual data collection begins saves rework and ensures you get the data that you want
Most common type of categorical univariate analysis: Frequency Analysis
involves a count of the number of cases that fall into each of the possible response categories -use of percentages to interpret the results of categorical analyses -Identify blunders -Identify outliers (or the anomalies) -Identify the median
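A small sketch of a frequency table with counts, percentages, and cumulative percentages. The responses are hypothetical answers on a 1-5 scale.

```python
from collections import Counter

responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # hypothetical coded responses

counts = Counter(responses)
total = len(responses)

table = []          # rows of (value, count, percent, cumulative percent)
cumulative = 0.0
for value in sorted(counts):
    percent = 100 * counts[value] / total
    cumulative += percent
    table.append((value, counts[value], percent, cumulative))

for value, count, percent, cum in table:
    print(f"{value}: n={count} pct={percent:.1f} cum={cum:.1f}")
```

A value outside the valid 1-5 range would show up as its own row, which is how running frequencies helps flag blunders.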
Median Split
involves identifying the value that is at the 50th percentile, which is done by looking at the cumulative percent in a frequency analysis, and the values up to and including this value will form one group
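The split can be sketched as follows, assuming coded numeric responses. The function name and the data are hypothetical.

```python
from collections import Counter

def median_split_value(responses):
    """Return the first value whose cumulative percent reaches 50."""
    counts = Counter(responses)
    total = len(responses)
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        if 100 * cumulative / total >= 50:
            return value

responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # hypothetical coded responses
split = median_split_value(responses)

# Values up to and including the split value form one group.
low_group = [r for r in responses if r <= split]
high_group = [r for r in responses if r > split]
print(split, len(low_group), len(high_group))
```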
Communication Considerations: Degree of Structure
involves the extent to which questions are standardized With surveys, questions are either fixed alternative or open ended Fixed alternative questions: questions and responses are highly structured, everyone sees the same questions and responses, commonly ask about "evaluations" Open-ended questions: standardized question, but the response is "open", tend to ask about "feelings"
Snowball Sampling
judgement sample that is used to sample special, hard-to-find populations, in which an initial set of respondents is located and asked to identify others with the same special characteristics
Nominal Scales
numbers are assigned for the sole purpose of identification or to "label" ex: gender, 1=female 2=male Used to categorize objects, just "labels" Categories don't overlap, mutually exclusive Numbers don't mean anything, just for coding purposes
Continuous Variables
numeric variables that have an infinite number of values between any two values (interval and ratio)
Random or Sampling Error
occurs as a result of sampling variability, when sample mean(s) differ from the true population mean because of random error. Some below the true value, some above it Variability occurs around the true value in a random way
Question Format
questions should be suited to the desired type of response High structure questions -Most useful when possible replies are known, limited in number and clear cut -Obtaining factual information and assessing opinions about issues -Used to collect rating on attitudes, perceptions, and awareness Low structure questions -Best when the goal is hearing from consumers in their own words -appropriate when responses are not clear cut -More suited to emotional based questions
Graphic Ratings Scale
ratings of an attribute represented by a point on a line, vs. fixed number, that runs from one extreme of the attribute to the other -Similar to itemized rating scale except a larger/undefined number of categories are used (vs. limited)
Systematic error or bias
refers to the tendency to consistently underestimate or overestimate a true value. Unlike random error, results are shifted in one direction rather than varying around the true value in a random way
Randomness and unpredictability are essential to regression
residuals (e) = the difference between the observed value of the dependent variable (y) and the predicted value (ŷ). A residual plot shows the residuals on the vertical axis and the independent variable on the horizontal axis --a random pattern in the residuals suggests the model fits the data
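A minimal sketch of computing residuals from a simple least-squares line, done by hand with no plotting library. The x and y values are hypothetical.

```python
# Hypothetical data: x = independent variable, y = dependent variable.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for y = intercept + slope * x.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Residual = observed y minus predicted y.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# These (x, residual) pairs are the points a residual plot would show.
for xi, ei in zip(x, residuals):
    print(f"x={xi} residual={ei:+.2f}")
```

Here the residuals alternate in sign with no obvious trend, which is the kind of pattern a well-fitting line tends to produce.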
Judgement sample
sample elements are handpicked because they are expected to serve the research purpose -Sample elements may be representative -Or they can offer the information needed -Snowball -Onsite panels
Quota Sample
sample is constructed with certain characteristics that reflect the target population -goal is to build a sample that looks like the population -Sample elements at discretion of researcher -online panels
Itemized-rating Scales
scales that indicate ratings of an attribute or object Respondent selects the category that best describes their position or feeling Most common: 1. Likert Summated-Ratings Scales 2. Semantic Differential Scales
Measurement
simply "rules" for how to interpret data or outcomes; the process of assigning numbers to objects (like people) to represent quantities of attributes OR memberships to groups Two key considerations -measure characteristics of a person -the way we measure these characteristics varies, attributes have different qualities
Ratio Scales
tell us about the order, the exact value between units, AND have an absolute zero. ex: Height & weight -"zero" is meaningful, can compare the differences between the numbers -Can compare intervals, rank the numbers, use numbers to identify -Numbers can be added, subtracted, multiplied, divided (ratios)
Independent samples t-test for means
test the difference in two means (different samples) ex: satisfaction ratings among males v. females
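A rough sketch of the test statistic, assuming the pooled-variance (equal variances) form of the test. The two groups of satisfaction ratings are hypothetical.

```python
import math
import statistics

# Hypothetical satisfaction ratings from two separate groups.
males = [3, 4, 4, 5, 4]
females = [4, 5, 5, 4, 5, 5]

n1, n2 = len(males), len(females)
mean1, mean2 = statistics.mean(males), statistics.mean(females)
var1, var2 = statistics.variance(males), statistics.variance(females)

# Pooled variance weights each group's variance by its degrees of freedom.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (mean1 - mean2) / standard_error
df = n1 + n2 - 2
print(f"t={t_stat:.3f} df={df}")
```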
Paired Sample (dependent samples) t-test for means
test the mean of pairwise differences (same samples) ex: before and after measures, comparing ratings on different attributes with same sample
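A minimal sketch of the paired test statistic: take each respondent's difference, then test whether the mean difference is zero. The before and after ratings are hypothetical.

```python
import math
import statistics

# Hypothetical ratings from the same six respondents, before and after.
before = [3, 4, 2, 5, 3, 4]
after = [4, 5, 3, 5, 4, 5]

# Pairwise differences, one per respondent.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)         # sample SD of the differences
standard_error = sd_diff / math.sqrt(n)

t_stat = mean_diff / standard_error
df = n - 1
print(f"mean difference={mean_diff:.3f} t={t_stat:.3f} df={df}")
```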
Systematic Sample Interval Formula
the number of population elements to count (k) when selecting the sample members in a systematic sample k = (# of elements in the sampling frame) / (# of elements in the sample)
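A short sketch of drawing a systematic sample with this interval, assuming a random start within the first k elements and a sample size no larger than the frame. The function name and the frame of 100 IDs are hypothetical.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Return (k, sample): the sampling interval and every k-th element."""
    k = len(frame) // sample_size          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)               # random start within the first interval
    sample = frame[start::k][:sample_size]
    return k, sample

frame = list(range(1, 101))                # hypothetical sampling frame of 100 IDs
k, sample = systematic_sample(frame, 10, seed=1)
print(k, sample)
```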
"Paper" surveys once the norm are taking on new forms
traditionally sent by mail, respondents complete and return to organization -lower degree of sampling control -no interviewer bias and can offer anonymity, but less flexibility -lower cost than personal or telephone interviews
Bivariate or Multivariate analysis
two or more variables analyzed together. The purpose is to understand relationships
P-value
used to interpret results in hypothesis testing, as in many statistical tests -probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true in the population -a result is regarded as statistically significant if the p-value is less than or equal to the chosen significance level (alpha) of the test, commonly .05 --If p-value > alpha, fail to reject the null hypothesis --If p-value <= alpha, reject the null (significant result)
Where do Chi-square tests come in?
used to test whether two variables in a cross tabulation are independent
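A minimal sketch of the chi-square test of independence on a 2x2 cross tabulation. The counts are hypothetical. The p-value shortcut via math.erfc is an assumption that holds only for 1 degree of freedom; larger tables need a chi-square table or a statistics package.

```python
import math

# Hypothetical cross tab: rows = gender, columns = purchased (yes, no).
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only: chi-square is a squared standard normal variable.
p_value = math.erfc(math.sqrt(chi_square / 2))

alpha = 0.05
reject_null = p_value <= alpha     # True means the variables look related
print(f"chi-square={chi_square:.2f} df={df} p={p_value:.4f} reject={reject_null}")
```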
median
value separating the upper half from the lower half of a data sample. Often thought of as the "middle" value