Correlation
A and B are related to each other
Inference
A conclusion reached on the basis of evidence and reasoning/facts
When developing a list of variables for a questionnaire, you should include
1. Variables of primary interest 2. Control and descriptive variables
Content Analysis Process
1. Select a topic 2. Identify scoring units 3. Create a sampling plan/sample 4. Create operational definitions 5. Assess inter-coder reliability 6. Code entire sample
Process of conducting a survey
1. Specify research problem 2. Select survey design 3. Select sampling strategy 4. Generate questionnaire 5. Generate data 6. Analyze data
quota sampling
A nonprobability sampling technique in which researchers divide the population into groups and then arbitrarily choose participants from each group
research question
A question that can be answered by an experiment or series of experiments
survey
A study, generally in the form of an interview or questionnaire, that provides researchers with information about how people think and act.
Machine Learning
A subset of AI: the extraction of knowledge from data using algorithms built from training data.
survey vs. questionnaire
A survey is the method of data collection whereas a questionnaire is the instrument containing the questions
Hypothesis
A testable prediction, often implied by a theory
ratio variable
A variable that meets the criteria for interval variables but also has a meaningful zero point. Ex. Distance: either zero inches apart or 1200 inches apart
social listening
A way for companies to aggregate and analyze online posts about a specific key term.
Validity
Ability of a data collection tool to capture and measure the construct or phenomenon that we are interested in measuring.
What do surveys do?
Allow for data collection from a large number of people; allow for assessment of self-reported traits; when properly deployed, they are a reliable means of information gathering
nominal
Also known as categorical. Numbers serve as tags or labels. Higher or lower numbers don't mean anything. Ex. Numbers on sports jerseys, or male=1 and female=0
Causation vs. association
Causation means A causes B. Association states that A and B are correlated.
Time order
Changes in A come first and result in changes in B
Common A/B metrics include:
Click-through rate, time on page, bounce rate
Best way to assess differences between groups
Compare the mean scores for each group
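A minimal sketch of comparing group means, using made-up scores. The group names and values are illustrative assumptions, not data from any study; a real analysis would also test whether the difference is statistically meaningful.

```python
from statistics import mean

# Hypothetical scores for two groups (illustration only).
scores = {
    "control": [3, 4, 5, 4],
    "treatment": [5, 6, 7, 6],
}


def group_means(groups):
    """Return the mean score for each group."""
    return {name: mean(values) for name, values in groups.items()}


means = group_means(scores)
# Difference between the treatment mean and the control mean.
difference = means["treatment"] - means["control"]
print(means, difference)
```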
Manifest content
Content that is observable (not inferred or assumed)
Advantages of cross sectional research
Convenient, inexpensive, quick
Ordinal
Data can be ordered but the distance between values is not fixed
trend studies
Data collected from different people (all drawn from the same population) at multiple collection points
Panel designs
Data is collected from the same people at multiple collection points
Longitudinal
Data is collected multiple times
Ratio
Data is ordered, distance between values is fixed, and there is a meaningful zero point
interval
Data is ordered, distance between values is fixed, but there is not a meaningful zero point
Self-Report Surveys
Data provided solely by the respondent without interference from the researcher
experiment
Demonstrates the truth of something; examines the validity of a hypothesis or theory; attempts to discover new information
Questions appropriate for A/B testing
Does changing location of a design element increase website clicks? Does changing our website font increase time on page? Does changing the color of a design element increase clicks? Does adding an interactive element decrease bounce rate?
General rule of thumb for the intercoder reliability subset
Equals 10% of overall sample; coders must agree on 70% or more of the coded cases to claim intercoder reliability
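The rule of thumb above can be sketched as follows. This assumes simple percent agreement between two coders; the function names, the seed, and the codes are hypothetical, and other reliability statistics exist that this sketch does not cover.

```python
import random


def reliability_subset(case_ids, fraction=0.10, seed=0):
    """Draw a random subset (default 10%) of cases for double coding."""
    rng = random.Random(seed)
    size = max(1, round(len(case_ids) * fraction))
    return rng.sample(case_ids, size)


def percent_agreement(coder_a, coder_b):
    """Share of cases on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


# Hypothetical codes (1 = present, 0 = absent) for a 10-case subset.
coder_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
coder_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

agreement = percent_agreement(coder_a, coder_b)
meets_threshold = agreement >= 0.70  # 70% rule of thumb from the card above
print(agreement, meets_threshold)
```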
systematic measurement error
Error in measurement in which the tool does not accurately measure the concept and is perceived incorrectly by most or all participants. Ex. Confusing question that everyone misreads
Disadvantages of longitudinal research
Expensive, time consuming, data can be difficult to interpret
quasi-experiment
Experiment that does not use random assignment
Research has a variety of purposes
Exploratory, descriptive, explanatory
Disadvantages of content analysis research
Finding a representative sample can be difficult; obtaining reliability in coding can be difficult; defining terms operationally can be difficult
Explanatory Studies
Focus on explaining the reasons behind a phenomenon, relationship, or event
Histograms
Frequencies shown using a bar chart-type plot are called this
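A rough text-only sketch of a frequency plot, assuming whole-number responses that need no binning. The responses are made up, and `text_histogram` is a hypothetical helper name.

```python
from collections import Counter


def text_histogram(values):
    """Return one line per distinct value, with a bar of '#' per occurrence."""
    counts = Counter(values)
    return [f"{value}: {'#' * counts[value]}" for value in sorted(counts)]


# Hypothetical survey responses on a 1-4 scale.
responses = [1, 2, 2, 3, 3, 3, 4]
lines = text_histogram(responses)
print("\n".join(lines))
```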
Examples of important descriptive statistics
Frequency distributions, measures of central tendency(mean, median, mode), measures of dispersion (range, standard deviation)
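The statistics listed above can be computed with Python's standard library. This is a small sketch on made-up data; it assumes the population form of the standard deviation (`statistics.pstdev`), whereas a sample-based estimate would use `statistics.stdev`.

```python
import statistics
from collections import Counter

# Hypothetical data set (illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

frequency = Counter(data)  # frequency distribution: value -> count

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "range": max(data) - min(data),
    "std_dev": statistics.pstdev(data),  # population standard deviation
}
print(frequency, summary)
```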
Population
The group of people that is the focus of the study
Most straightforward experiment
Has a control group and a treatment group; sometimes called an RCT (randomized controlled trial)
Advantages of longitudinal research
Helps address some types of error inherent in cross-sectional research; flexible; can help researchers identify time-based trends and changes
Ways of developing questionnaires
In person, telephonically, manual, computer-assisted, online
Descriptive statistics
Information that characterizes or summarizes the whole set of data
Why do we sample?
It is too expensive and time-consuming to survey everyone, but we still want to estimate what is true of the entire population
Applications of A/B testing
Marketing/marketing communications, web design, user experience, human factors
dependent variable
Measured by researchers
posttest
Measurements taken after delivery of experimental (manipulated) stimuli
pretest
Measurements taken before delivery of the experimental (manipulated) stimuli
Metrics
Measurements that evaluate results to determine whether a project is meeting its goals
Measurement
Most straightforward means of coding content involves assessing the degree to which something is present or absent
non-probability sample
Not all elements of a population have an opportunity to be included; such samples don't allow us to make inferences about a population
nominal
Numeric values serve as labels
Content analyses are:
Objective, systematic, focused on manifest content
Personalization
Occurs when a company knows enough about a customer's likes and dislikes that it can fashion offers more likely to appeal to that person, e.g., Netflix curating watch lists
weaknesses of experimental research
Often the study context is artificial; cross-sectional designs don't speak to long-term effects; in some scenarios, experiments can raise ethical questions
When conducting A/B tests there should be...
One thing different across versions A and B
strengths of experimental research
The only method that can show causality; can be replicated
semantic differential measures
Participants indicate feelings and beliefs based on a bipolar format
likert-type measurement
Participants indicate measure of agreement to a prompt
Single selection measures
Participants make a single selection from a list of options
Multiple selection measures
Participants make more than one selection from a list of options
Ranking measures
Participants rank a set of elements by preference
experimental group
Participants who receive the experimental treatment, e.g., the research pill
control group
Participants who take part but do not receive the treatment, e.g., a sugar pill
stratified random sampling
Population divided into subgroups and random samples taken from each subgroup
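A minimal sketch of the idea, assuming the proportional variant in which the same fraction is drawn from every subgroup. The population, the `stratum_of` callback, and the seed are hypothetical; a disproportionate design would instead set a separate sample size for each subgroup.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Randomly sample the same fraction of members from each subgroup."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population: 60 first-year students and 40 seniors.
population = [("first-year", i) for i in range(60)]
population += [("senior", i) for i in range(40)]

sample = stratified_sample(population, stratum_of=lambda m: m[0], fraction=0.1)
print(sample)
```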
Disadvantages of cross-sectional research
Prone to various types of error, no going back
What do all true experiments require
Random assignment
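One way random assignment might look in practice is to shuffle the participant list and split it in half. This is a sketch under that assumption; `randomly_assign` and the participant IDs are hypothetical.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into control and treatment groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"control": shuffled[:midpoint], "treatment": shuffled[midpoint:]}


# Hypothetical participant IDs 0-9.
groups = randomly_assign(range(10), seed=1)
print(groups)
```

Because chance alone decides who lands in each group, pre-existing differences between participants should balance out on average.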
correlation coefficient
Range is from -1 to 1; -1 represents perfect negative association between variables, +1 represents perfect positive association between 2 variables, 0 indicates no association
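A short sketch of computing a correlation coefficient by hand. It assumes the Pearson coefficient, which the card does not name explicitly, and uses made-up data chosen to hit the three reference points.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariation / math.sqrt(spread_x * spread_y)


xs = [1, 2, 3, 4]
r_positive = pearson_r(xs, [2, 4, 6, 8])   # y rises with x
r_negative = pearson_r(xs, [8, 6, 4, 2])   # y falls as x rises
r_none = pearson_r(xs, [1, -1, -1, 1])     # no linear association
print(r_positive, r_negative, r_none)
```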
Standard deviation
a measure of variability that describes an average distance of every score from the mean
These types of measurement approaches (i.e., ordinal-level measurement)
Rarely result in intercoder reliability
Rating measures
Participants rate their thoughts or beliefs about a prompt on a numeric scale
Non-spuriousness
Relationship between a and b must not be explained by a third variable
Spuriousness
Relationship between variables seems real but is explained by presence of another variable
Values greater than +/- 0.70
Represent a strong association between the variables
Values from +/- 0.2 to +/- 0.4
Represent a weak association between the variables
Values from 0 to +/- 0.2
Represent general lack of association between variables
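The strength bands in this set (0 to 0.2, 0.2 to 0.4, 0.4 to 0.7, and above 0.7) can be turned into a small lookup. This is a sketch: how to treat values that fall exactly on a boundary is an assumption made here, since the cards do not say, and `association_strength` is a hypothetical name.

```python
def association_strength(r):
    """Label a correlation coefficient using the rule-of-thumb bands."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # the sign gives direction, not strength
    if size < 0.2:
        return "general lack of association"
    if size < 0.4:
        return "weak"
    if size <= 0.7:
        return "moderate"
    return "strong"


labels = [association_strength(r) for r in (0.1, -0.3, 0.55, -0.85)]
print(labels)
```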
Quantitative research
Research based on systematic calculation of data
Qualitative Research
Research that seeks to gain insight and depth on a topic
purposive sampling
Researchers purposefully select from a group of people of theoretical interest
Best Survey Practices
Response categories go from less to more; use an odd number of response categories; 7 is most commonly used
convenience sampling
Sample is drawn from those that are easily available to collect data from
Disproportionate random sampling
Similar to proportional random sampling, except that sample proportions are not equivalent to population proportions
random measurement error
Small, non-systematic measurement errors that do not threaten the overall validity of our data. Ex. A small number of survey participants misread a question
Inferential
Statistics that allow us to generalize from the data collected to the general populations they were taken from
Constitutive definitions
The definitions you find in dictionaries; they define words in terms of other words and concepts; general and abstract
tabular format
The presentation of text and numbers in tables - essentially organized in labeled columns and numbered rows.
A/B Testing
This is the process of comparing two variations of a single variable to determine which performs best in order to help improve marketing efforts
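A bare-bones sketch of comparing two versions on one metric, here click-through rate. The counts are made up, and a real test would also check whether the gap is larger than chance would produce before declaring a winner.

```python
def click_through_rate(clicks, impressions):
    """Clicks divided by the number of times the version was shown."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical results: version A is the control, version B changes one element.
version_a = {"clicks": 40, "impressions": 1000}
version_b = {"clicks": 55, "impressions": 1000}

ctr_a = click_through_rate(**version_a)
ctr_b = click_through_rate(**version_b)
better = "B" if ctr_b > ctr_a else "A"
print(ctr_a, ctr_b, better)
```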
inferential statistics
Trying to reach conclusions that extend beyond the immediate data alone. Ex. What the population might think
snowball sampling
Type of non-probability sampling; generate convenience sample from respondents and ask them to recommend others to take the survey
Advantages of Content Analysis
Unobtrusive; relatively inexpensive; deals with current events and topics of present-day interest; uses material that is relatively easy to obtain and work with; yields data that can be quantified
independent variable
Varied by researchers
Why is A/B testing essentially an RCT
Version A is the control stimulus; version B is the manipulated stimulus
3 V's of big data
Volume: big data is large; Velocity: big data occurs at an unprecedented speed; Variety: big data comes in multiple formats and takes on multiple forms
measurement error
When the data we collect does not represent reality
When to use hypotheses
When we are testing the relationship between two or more variables and when we have an educated/ informed guess as to what is likely to occur
When to use a research question
When we're exploring a new area and aren't clear about the relationships between variables in our study
Measurable scoring units
Words, phrases, minutes, images, entire documents (newspaper articles, TV commercials, TV show episodes, social media posts, etc.)
operational definitions
a carefully worded statement of the exact procedures used in a research study; an important part of any quantitative research project
probability sample
a sample in which every element in the population has a known, non-zero (though not necessarily equal) chance of being selected. Such samples allow us to make inferences about a population
Sample
a small part of something intended as representative of the whole
content analysis
a systematic analysis of the content rather than the structure of a communication, such as a written work, speech, or film
factorial design
an experiment or quasi-experiment that includes more than one independent variable
Reliability
consistency of measurement: a tool is reliable if it yields the same results on repeated use, even with different subjects
cross-sectional
data is collected at one point in time
interval variable
data measured on a scale along the whole of which intervals are equally spaced apart. Ex. 81 Fahrenheit is exactly 1 degree greater than 80
Common metrics include:
engagement, click through rate, conversions, followers/ fans, leads, reach, loyalty
simple random sampling
every member of the population has an equal probability of being selected for the sample: everyone selected at random
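A minimal sketch using the standard library's `random.sample`, which draws without replacement so that each member has the same chance of selection. The population of ID numbers and the seed are hypothetical.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Draw `size` members at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(population), size)


# Hypothetical population of 100 ID numbers.
chosen = simple_random_sample(range(100), 10, seed=42)
print(chosen)
```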
intercoder reliability
in content analysis, the degree of agreement between or among independent coders
Four levels of measurement
nominal, ordinal, interval, ratio
Solomon four-group design
pretest-posttest design with two sets of randomly assigned groups, each with a treatment and a control group: one set takes the pretest and posttest, and the other set takes only the posttest
Values ranging from +/- 0.4 to +/- 0.7
represent a moderate association between variables
Big Data
the huge and complex data sets generated by today's sophisticated information generation, collection, storage, and analysis technologies
ordinal variable
the term "ordinal" can be applied to a variable whose categorical values possess some kind of order. Ex. 5 excellent 4 good 3 average etc.