MKT310 Concept Check 2
drop a case completely if...
half or more of the responses are missing
Precision
the degree of error in an estimate of a population parameter
Output viewer
•shows you tables of statistical output and any graphs you create.
response rate
# of completed interviews with responding units / # of eligible responding units in the sample
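A minimal sketch of the response rate calculation above. The function name and the counts (240 completed interviews, 600 eligible units) are hypothetical and chosen only for illustration.

```python
def response_rate(completed_interviews: int, eligible_units: int) -> float:
    """Completed interviews divided by eligible responding units in the sample."""
    if eligible_units <= 0:
        raise ValueError("eligible_units must be positive")
    return completed_interviews / eligible_units


# Hypothetical project: 240 completed interviews out of 600 eligible units.
rate = response_rate(240, 600)
print(f"Response rate: {rate:.1%}")
```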
examples of paired sample t-test
- before and after measures - applying the same measure to different objects. Good for comparing 2 means when the scores for both variables are provided by the same sample.
examples of independent sample t-test for means
- satisfaction ratings: gen Z vs Millennials - age in years: customers vs non-customers - two groups of customers
uses of frequency analysis
- univariate categorical analysis - identify blunders and cases with excessive nonresponse - identify outliers - identify the median
A correct crosstab analysis has...
1 -- cleaned up cross tab table 2 -- test for statistical significance 3 -- measure of strength of association 4 -- interpret the results
3 methods for diagnosing nonresponse error
1) contact a sample of nonrespondents 2) compare respondent demographics against known demographics of population 3) conduct an analysis of late responders vs early responders
Primary tasks in the editing process
1) convert all responses to consistent units 2) assess degree of nonresponse 3) check consistency across responses 4) look for evidence that the respondent wasn't really thinking about his or her answers 5) verify that branching questions were followed correctly
Developing the Sampling Plan 6 Steps
1) determine the target population 2) identify the sampling frame 3) select a sampling procedure 4) determine the sample size 5) select the sample elements 6) collect the data from the designated elements
To determine how big of a sample size you need, what 3 pieces of information do you need?
1) how homogenous (similar) the population is on the characteristic to be estimated 2) how much precision is needed in the estimate 3) how confident we need to be that the true value falls within the precision range established
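One common textbook way to combine these three inputs when estimating a mean is n = z^2 * s^2 / H^2, where s is the estimated standard deviation (how varied the population is), H is the desired precision (half-width of the interval), and z is the z-value for the chosen confidence level. The sketch below assumes that formula and a normal approximation. The function name and the numbers are illustrative, not taken from the course materials.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev: float, precision: float, confidence: float = 0.95) -> int:
    """Sample size for estimating a mean, assuming n = (z * s / H) ** 2."""
    # Two-sided z-value for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * std_dev / precision) ** 2)


# Hypothetical inputs: standard deviation of 10, precision of +/- 2, 95% confidence.
n = required_sample_size(std_dev=10, precision=2, confidence=0.95)
print(n)
```

A more varied population, a tighter precision, or a higher confidence level each pushes the required sample size up.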
converting a continuous measure to a categorical measure
1) median split 2) cumulative % breakdowns 3) two box technique
2 types of sample designs
1) nonprobability samples 2) probability samples
10 Steps for Developing Questionnaire
1) specify what information will be sought 2) determine method of administration 3) determine content of individual questions 4) determine form of response to each question 5) determine wording of each question 6) prepare dummy tables 7) determine question sequence 8) determine appearance of questionnaire 9) develop recruiting message 10) reexamine & pretest
Parameter
A characteristic or measure of a POPULATION - P & P
Statistics
A characteristic or measure of a SAMPLE - We calculate statistics from sample data to estimate the population's parameters - S & S
Constant-sum method
A comparative-ratings scale in which an individual divides some given sum among two or more attributes on a basis such as importance or favorability.
histogram
A form of bar chart on which the values of the variable are placed along the x-axis and the absolute or relative frequency of the values is shown on the y-axis.
Snowball sample
A judgement sample that relies on the researcher's ability to locate an initial set of respondents with the desired characteristics. E.g., for a new product that only a few people may have used yet: reach out to those users, interview them, and ask if they know anyone else who has used the product.
Self-report
A method of assessing attitudes in which individuals are asked directly for their beliefs about or feelings toward an object or class of objects.
Filter questions
A question used to determine if a respondent is likely to possess the knowledge being sought. It is also used to determine if an individual qualifies as a member of the defined population.
Nonprobability samples
A sample that relies on personal judgement in the element selection process. It is IMPOSSIBLE to assess the degree of sampling error. Types: A) Convenience aka accidental B) Judgement C) Quota
Graphic rating scale
A scale in which individuals indicate their ratings of an attribute typically by placing a check at the appropriate point on a line that runs from one extreme of the attribute to the other.
Itemized-ratings scale
A scale on which individuals must indicate their ratings of an attribute or object by selecting the response category that best describes their position on the attribute or object. 1) summated ratings 2) semantic differential
Comparative-ratings scale
A scale requiring subjects to make their ratings as a series of relative judgments or comparisons rather than as independent assessments. 1) Constant-sum method
Summated-ratings scale
A self-report technique for attitude measurement in which respondents indicate their degree of agreement or disagreement with each of a number of statements.
Dummy table
A table (or figure) used to show how the results of an analysis will be presented.
Recall loss
A type of error caused by a respondent's forgetting that an event happened at all.
Telescoping error
Error that results from the fact that most people remember an event as having occurred more recently than it did.
What type of data analysis should we use for this question: Does being referred by a doctor to AFC lead to greater usage of the therapy pool?
Crosstabs. Why? - 2 categorical variables: (1) doctor referral (yes, no) and (2) pool usage (yes, no) - doctor referral = independent/causal variable - pool usage = dependent/outcome variable
Sampling error
Difference between results for the sample and what would be true for the population. Usually not the biggest problem. Alleviate: - increase sample size
Random error
Error in measurement due to temporary aspects of the person or measurement situation that affects the measurement in irregular ways
Systematic error
Error in measurement that is also known as constant error since it affects the measurement in a constant way.
Response order bias
Error that occurs when the response to a question is influenced by the order in which the alternatives are presented.
Classification Information
Information used to classify respondents, typically for demographic breakdowns
Any question where the options are: not at all important, somewhat important, very important, etc.
Interval
What is your degree of satisfaction with UW-Madison gym offerings? Not at all satisfied Slightly satisfied Moderately satisfied Very satisfied
Interval
Where there is a list of items and you circle the number from unfavorable -> favorable for each item on the list
Interval
Ordinal Scale
Measurement in which numbers are assigned to data on the basis of some order (for example, more than, greater than) of the objects.
Nominal Scale
Measurement in which numbers are assigned to objects or classes of objects solely for the purpose of identification.
Interval Scale
Measurement in which the assigned numbers legitimately allow the comparison of the size of the differences among and between members, but absolute magnitudes cannot be compared because there is no natural zero point.
Ratio Scale
Measurement that has a natural, or absolute, zero and therefore allows the comparison of absolute magnitudes of the numbers.
recording error
Mistakes made by humans or machines in the process of recording respondents' communication- or observation-based data.
Open ended questions
Nominal
What region of the US are you from?
Nominal
Which of the drinks on the following list do you like? Check all that apply
Nominal
How many fitness platforms do you currently subscribe to? 0 1 2 3 4 5+
Ordinal
How much are you willing to spend on a subscription-based fitness platform? $0-$9.99 $10-$19.99
Ordinal
How would you describe your family income level? Lower class Lower-middle class Upper-middle class Upper class
Ordinal
Rank the drinks according to your degree of liking. 1 = most preferred and 6 = least preferred
Ordinal
On average, how many episodes of a TV show do you watch in one sitting? 1-2 3-4 5-6 7+
Ordinal NOT Interval
Please select your age 18 or under 19 20
Ordinal NOT Nominal
Measurement
Rules for assigning numbers to objects in such a way as to represent quantities of attributes.
What type of data analysis would you use for this question: Is there a correlation between age and fees paid?
Pearson product-moment correlation coefficient. Why? - age = continuous - fees paid = continuous
How often do you do your own laundry? ______ time(s) per month
Ratio
In the past 7 days how many 12oz servings of each soda have you consumed?
Ratio
Semantic-differential scale
Self-report technique for attitude measurement in which the subjects are asked to check which cell between a set of bipolar adjectives or phrases best describes their feelings toward the object.
Reliability
The ability of a measure to obtain similar scores for the same object, trait, or construct across time, across different evaluators, or across the items forming the measure.
Target Information
The basic information that addresses the subject of the study
Validity
The extent to which differences in scores on a measuring instrument reflect true differences among individuals, groups, or situations in the characteristic that it seeks to measure or true differences in the same individual, group, or situation from one occasion to another, rather than systematic or random errors.
*Target Population vs Sampling Frame
The target population is all cases that meet designated specifications for membership in the group. The sampling frame is the list of population elements from which a sample will be drawn; the list could consist of geographic areas, institutions, individuals, or other units.
Question Order Bias
The tendency for earlier questions on a questionnaire to influence respondents' answers to later questions.
AFC finds data that nationwide the average age of people who visit fitness centers is 40 years old compared to their average age of 68.6. •What test should AFC use to determine if the means are truly different?
This is a continuous measure on a single variable, so a one-sample t-test should be used. The nationwide average age of people who attend fitness centers (40 years) serves as the external standard, and AFC's average age (68.6) is the sample mean. A one-sample t-test then determines whether the means are truly different.
Pretest
Use of a questionnaire (or observation form) on a trial basis in a small pilot study to determine how well the questionnaire (or observation form) works.
Split-ballot technique
A technique for combating response bias in which researchers use multiple versions of a survey, with different wordings of an item or different orders of response options.
cross tabs step 2: testing for statistical significance with the pearson chi-square (x^2) test of independence
a commonly used statistic for testing the null hypothesis that categorical variables are independent of one another
codebook
a document that contains explicit directions about how the data from data collection forms are coded in the data file
cramer's v
a measure of the strength of association between categorical variables; values range from 0 to 1, and the closer to 1, the stronger the relationship
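A minimal sketch of crosstab steps 2 and 3 (the Pearson chi-square statistic and Cramer's V), computed by hand with the standard library. The counts are hypothetical, loosely modeled on the doctor referral vs. pool usage question. The closed-form p-value holds only for a 2x2 table (1 degree of freedom) and applies no continuity correction; both are assumptions of this sketch.

```python
import math


def chi_square_and_cramers_v(table):
    """Return (chi-square, Cramer's V) for a crosstab given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Cramer's V scales chi-square by n and the smaller table dimension minus 1.
    min_dim = min(len(table), len(table[0])) - 1
    return chi2, math.sqrt(chi2 / (n * min_dim))


# Hypothetical counts. Rows: doctor referral (yes, no); columns: pool usage (yes, no).
table = [[30, 20],
         [20, 30]]
chi2, v = chi_square_and_cramers_v(table)

# With 1 degree of freedom the chi-square p-value has a closed form.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}, Cramer's V = {v:.2f}")
```

Here the p-value falls just under 0.05, so independence would be rejected at that level, while a Cramer's V of 0.2 indicates the association is fairly weak.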
sample stdev
a measure of the variation of responses on a variable
coefficient of multiple determination (R^2)
a measure representing the relative proportion of the total variation in the dependent variable that can be explained or accounted for by the fitted regression equation. When there is only 1 predictor variable, this is the coefficient of determination.
cross tabulation
a multivariate technique used for studying the relationship between 2+ categorical variables; considers the joint distribution of sample elements across variables
confidence interval
a projection of the range within which a population parameter will lie at a given level of confidence, based on a statistic obtained from a probabilistic sample; only accounts for sampling error
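A minimal sketch of a confidence interval for a mean, assuming the large-sample normal approximation (mean plus or minus z times the standard error). The ages are hypothetical. With a sample this small a t-value would normally replace z, so treat the numbers as illustration only.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    center = mean(sample)
    return center - z * standard_error, center + z * standard_error


# Hypothetical member ages.
ages = [70, 65, 72, 68, 66, 71, 69, 67]
low, high = confidence_interval(ages, confidence=0.95)
print(f"95% CI for mean age: {low:.1f} to {high:.1f}")
```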
Probability: C) Stratified sample
a sample in which 1) the population is divided into mutually exclusive and exhaustive subgroups and 2) a probabilistic sample of elements is chosen independently from each subgroup. Subgroups are homogenous within, heterogenous between.
Probability samples
a sample in which each target population element has a nonzero chance of being included in the sample. Sampling error CAN be estimated. Types: A) simple random B) systematic C) stratified D) cluster (including area)
regression analysis
a statistical technique used to derive an equation representing the influence of a single or multiple independent variable(s) on a continuous dependent/outcome variable
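A minimal sketch of a one-predictor regression fitted by ordinary least squares, together with the coefficient of determination (R^2) defined earlier. The data and variable names are hypothetical. A multiple regression like the AFC revenue example would normally be run in a statistics package rather than by hand.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares and return (intercept, slope, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation / total variation).
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - sse / sst


# Hypothetical data: x = visits per week, y = monthly revenue (in $10s).
visits = [1, 2, 3, 4, 5]
revenue = [2, 4, 5, 4, 5]
intercept, slope, r_squared = simple_regression(visits, revenue)
print(f"revenue = {intercept:.1f} + {slope:.1f} * visits, R^2 = {r_squared:.2f}")
```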
chi square goodness of fit test
a statistical test to determine whether some observed pattern of frequencies corresponds to an expected pattern; used with categorical variables
independent sample t-test for means
a technique commonly used to determine whether two groups differ on some characteristic assessed on a continuous measure. Analysis of a continuous measure with a categorical measure as the grouping variable.
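A minimal sketch of the independent sample t statistic, assuming equal variances in the two groups (the pooled-variance version). The visit counts are hypothetical. The standard library has no t distribution, so the sketch stops at the t statistic and degrees of freedom, which would then be compared against a t table or software output.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical monthly visits: exercise circuit users vs. non-users.
users = [12, 15, 11, 14, 13]
non_users = [8, 10, 9, 7, 11]
t_stat, df = independent_t(users, non_users)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```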
paired sample t-test
a technique for comparing two means when scores for both variables are provided by the same sample
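A minimal sketch of the paired sample t statistic: because both scores come from the same respondents, the test works on the per-respondent differences. The before/after ratings are hypothetical, and the p-value lookup is left to a t table or software.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """t statistic and degrees of freedom for paired scores from the same respondents."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical ratings from the same five respondents, before and after.
before = [3, 4, 2, 5, 3]
after = [4, 6, 3, 5, 5]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```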
Census
a type of sampling plan in which data are collected from or about each member of a population
significance
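acceptable level of error (α): the probability the researcher is willing to accept of rejecting a null hypothesis that is actually true, e.g., 0.05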
Population
all cases that meet designated specifications for membership - Requirements = population ELEMENTS
why use multivariate analysis?
allows researchers a closer look at their data than is possible with univariate analyses
response error
although the individual participates in the study, he or she provides an inaccurate response, consciously or subconsciously Alleviate: - match the background characteristics of interviewer and respondent - avoid ambiguous words and questions - avoid leading questions - avoid unstated alternatives
outlier
an observation so different in magnitude from the rest of the observations that the analyst chooses to treat it as a special case
why use probability sampling?
because you can assess the level of sampling error BUT this is not the only kind of error that can occur
Area sample
cluster sampling where areas serve as the primary sampling units -- using maps, the population is divided
always calculate percentages in the direction of the ______
causal variable
two box technique
converting an INTERVAL-LEVEL RATING SCALE into a categorical measure: the percentage of respondents choosing one of the top two positions on a rating scale is reported
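A minimal sketch of the two box technique, assuming ratings are coded as integers where higher means more favorable. The ratings and the 5-point scale are hypothetical.

```python
def top_two_box(ratings, scale_max):
    """Share of respondents choosing one of the top two positions on the scale."""
    in_top_two = sum(1 for r in ratings if r >= scale_max - 1)
    return in_top_two / len(ratings)


# Hypothetical satisfaction ratings on a 5-point scale.
ratings = [5, 4, 3, 5, 2, 4, 1, 5, 3, 4]
share = top_two_box(ratings, scale_max=5)
print(f"Top-two-box: {share:.0%}")
```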
cumulative % breakdown
converting a continuous measure to a categorical measure: categories are formed based on cumulative percentages obtained from frequency analysis
median split
converting a continuous measure to a categorical measure by splitting the continuous measure at its median value. REMEMBER: in a frequency table, go to the cumulative % column and find the first row that contains 50%. If consecutive rows show 40% and 64%, pick the 64% row because it contains the 50th percentile.
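A minimal sketch of finding the median from cumulative percentages, as described above. The frequency table is hypothetical and is built so that the cumulative percentages run 15, 40, 64, 100, matching the 40% / 64% example.

```python
def median_from_frequencies(freq_table):
    """freq_table: (value, count) pairs sorted by value.

    Returns the first value whose cumulative percentage reaches 50%.
    """
    total = sum(count for _, count in freq_table)
    cumulative = 0
    for value, count in freq_table:
        cumulative += count
        if cumulative / total >= 0.5:
            return value
    raise ValueError("frequency table is empty")


# Hypothetical counts; cumulative % is 15, 40, 64, 100.
freq_table = [(1, 15), (2, 25), (3, 24), (4, 36)]
median_value = median_from_frequencies(freq_table)
print(median_value)  # 3, the row where cumulative % first reaches 50
```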
frequency analysis
counting the number of cases that fall into each of the possible response categories - Univariate analysis - Categorical
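A minimal sketch of a frequency analysis: count, percent, and cumulative percent for each response category. The responses are hypothetical answers on a 1 to 4 scale, with one 9 included to show how a frequency table exposes a likely blunder.

```python
from collections import Counter


def frequency_table(responses):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(responses)
    total = len(responses)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows


# Hypothetical responses on a 1-4 scale; the 9 is outside the valid range.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 9, 3]
for value, count, percent, cumulative in frequency_table(responses):
    print(f"{value}: n={count}, {percent:.0f}%, cumulative {cumulative:.0f}%")
```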
double entry
data entry procedure in which data are entered separately by two people in two data files, and the two files are then compared for discrepancies
Confidence
degree to which one can feel confident that an estimate approximates the true value
Probability: A) Simple random sample
each unit included in the population has a known and equal chance of being selected for the sample
coding FACTUAL open ended questions/items
easier to code because it is a concrete/factual response. E.g., the question "what year were you born?" is coded as the year; for the question "how many times have you eaten at Wendy's?" you just code the number.
coding close ended questions/items
easier, usually yes (1), no (0) -- works like this with check all that apply
Non-coverage error
error that arises because of failure to include qualified elements of the defined population in the sampling frame. Alleviate: - improve the sampling frame using other sources - adjust the results by appropriately weighting the subsample results
item nonresponse
error that arises when a respondent agrees to an interview, but refuses or is unable to answer specific questions
office errors
errors that arise when coding, tabulating, or analyzing the data Alleviate:
Probability: B) Systematic sample
every kth element in the population is selected, following a randomly determined start
how to find the median
find the cumulative percentage that includes 50%
null hypothesis
hypothesis that a proposed result is NOT true for the population
alternative hypothesis
hypothesis that proposed result is true for the population
What type of data analysis would you use for this question: Does utilizing the exercise circuit lead to increased number of visits to the center?
independent sample t-test for means. Why? - exercise circuit = categorical (yes, no) - # of visits = continuous
continuous measures
interval & ratio
if the p value is _____ we reject the null hypothesis that the values are independent of one another
less than the significance level, e.g., if p-value = 0.002 and the significance level is 0.05, then p-value < 0.05 and we reject the null
coding EXPLORATORY open ended questions/items
much more difficult: Steps to code 1) identify usable responses 2) develop categories for responses 3) sort responses into categories 4) assess degree of agreement btwn coders
categorical measures
nominal & ordinal NO Cats
total sampling elements (TSE)
number of population elements that must be drawn from the pop and included in the initial sample pool in order to end up with the desired sample size
blunders
office errors that occur during editing, coding, or data entry
What type of data analysis would we use for this question: Do the mean attribute importance levels, provided by the same respondents, differ from one another?
paired sample t-test
Nonprobability: A) Convenience aka accidental sample
population elements are included in the sample because they are readily available (we don't know if they are actually representative of the pop); right place @ right time
p-value
probability of OBTAINING A GIVEN RESULT if the null is true, NOT the probability that the null is true
Data editor
purpose: shows you a portion of the data values you are working with. It can also be used to redefine the characteristics of variables (change the type, add labels), create new variables, and enter data by hand. Gives you 2 views: 1) data view 2) variable view
primary sources of nonresponse error
refusals and not-at-homes. It is better to work hard at generating responses from a smaller sampling pool than to start with a much larger sampling pool and ignore potential nonresponse error. The response rate on a project serves as an indicator of the overall quality of a data collection effort.
What type of data analysis would you use to answer this question: What are some of the factors that drive revenues at AFC?
regress revenues on 1) member age and the importance of 2) health, 3) social aspects, 4) physical enjoyment, and 5) specific medical concerns
If p < α in a paired sample t-test what do we do?
reject the null hypothesis that the means are the same and tentatively accept that the means are different
Nonprobability: C) Quota sample
sample chosen so that the proportion of sample elements with certain characteristics is about the same proportion of the elements with the characteristics in the target population (most online panels) a "quota" representing these characteristics is established (e.g., 25 Wisconsin residents between the ages of 20 and 29; 25 Illinois residents between the ages of 20 and 29; 35 Minnesota residents between the ages of 30 and 39; etc.) so that when the sample is complete it will mirror the population on the key characteristics
Nonprobability: B) Judgement sample
sample elements are handpicked because they are expected to serve the research purpose
Sample results equation
sample results = truth + (sampling, noncoverage, nonresponse, response, recording, & office errors)
Probability: D) Cluster (area) sample
sampling plan in which (1) the parent population is divided into mutually exclusive and exhaustive subgroups and (2) a random sample of one or more subgroups (clusters) is selected. Subgroups are heterogenous within, homogenous between.
Sample
selection of a subset of elements from a larger group of objects - The simpler the definition of the target population, the easier & less costly it will be to find the sample
pearson product moment correlation coefficient
statistic that indicates the degree of linear association between two continuous variables; range: -1 to 1
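A minimal sketch of the Pearson product-moment correlation coefficient computed from its definition. The age and fee values are hypothetical, echoing the age vs. fees paid question.

```python
import math


def pearson_r(x, y):
    """Degree of linear association between two continuous variables, from -1 to 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: member age and fees paid.
age = [25, 35, 45, 55, 65]
fees = [20, 30, 35, 50, 65]
r = pearson_r(age, fees)
print(f"r = {r:.3f}")
```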
descriptive statistics
statistics that describe the distribution of responses on a variable
Sampling error
the difference between the results obtained from a sample and the results that would have been obtained had the info been gathered about every member in the pop - decreased by increasing sample size - can be estimated - usually less troublesome than other kinds of error
Nonresponse error
the failure to obtain data from some elements of the population that were selected for the sample Alleviate: - convince respondent of the importance of their participation - frame the study to enhance interest - keep survey as short as possible - guarantee anonymity - train interviewers well - personalize recruiting message - use an incentive - send follow-up surveys
editing
the inspection and correction of data received from each element of the sample (or census)
Sampling frame
the list of population elements from which the sample will be drawn. Ex: customer database, member directories, & lists developed by data compilers. Perfect sampling frames do not exist.
Sampling interval (k)
the number of population elements to count (k) when selecting the sample members in a systematic sample
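A minimal sketch of drawing a systematic sample. It assumes the sampling interval is computed as k = frame size / desired sample size (rounded down) and that selection begins at a random position within the first k elements. The frame of 100 member IDs and the function name are hypothetical.

```python
import random


def systematic_sample(frame, sample_size, rng=None):
    """Select every kth element of the sampling frame after a random start."""
    rng = rng or random.Random()
    k = len(frame) // sample_size  # sampling interval (assumed: frame size / sample size)
    if k < 1:
        raise ValueError("sample_size cannot exceed the size of the frame")
    start = rng.randrange(k)  # random start within the first k elements
    return frame[start::k][:sample_size]


# Hypothetical sampling frame: member IDs 1 through 100, desired sample of 10 (k = 10).
frame = list(range(1, 101))
sample = systematic_sample(frame, sample_size=10, rng=random.Random(42))
print(sample)
```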
data aggregation
the process of creating summary data
coding
the process of transforming raw data into symbols (usually numbers)
optical scanning
the use of scanner technology to "read" responses on paper surveys and to store these responses in a data file
hypothesis testing
to tell if a particular result in the sample represents the true situation in the population
one sample t-test
used to compare a sample mean against an external standard
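A minimal sketch of the one-sample t statistic, t = (sample mean - standard) / (sample stdev / sqrt(n)). The ages are hypothetical and only loosely echo the AFC example, where the external standard is a nationwide average age of 40. The p-value lookup is left to a t table or software.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, external_standard):
    """t statistic and degrees of freedom for a sample mean vs. an external standard."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - external_standard) / standard_error, n - 1


# Hypothetical member ages compared against a standard of 40 years.
ages = [70, 65, 72, 68, 66, 71, 69, 67]
t_stat, df = one_sample_t(ages, external_standard=40)
print(f"t = {t_stat:.1f} with {df} degrees of freedom")
```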
sample mean
x̅ mean value of the responses on a variable
Key Considerations with Response Error
•Does the respondent understand the question? •Does the respondent know the answer to the question? •Is the respondent willing to provide the true answer to the question? •Is the wording of the question or the situation in which it is asked likely to bias the response?
Data view
•Each row represents a unit of observation, sometimes also referred to as a "record" or in SPSS as a "case." •Each column represents a variable. All of the data in a column must be of the same "type," either numeric or string (also called "character").
Variable view
•In the Variable View you can see and edit the information that defines each variable in your data set: each column of the Data View is described by a row of the Variable View.