Analytics final
multivariate testing
enables companies to test whether changing several different variables on their website at the same time leads to a higher conversion rate
digital marketing analytics
enables marketers to monitor, understand, and evaluate the performance of digital marketing initiatives
lift
enables us to evaluate the strength of the association. = confidence/expected confidence
The purpose of building models in the AutoML process is to
extract insights from data
sentiment polarity
feedback that consists of contradictory or different opinions
A supermarket is trying to mimic the "Target Effect" to boost its sales. It creates a special, fast checkout line for shoppers with toddlers and babies. Which of the following products should this checkout line display prominently?
formula and diapers
The behavior analysis measure of ________ refers to the rate of visitors returning to a website within a certain time frame.
frequency of engagement
Latent Dirichlet Allocation (LDA)
goal is to maximize the separation between the estimated topics and minimize the variance within each projected topic
Identify a true statement about an unsupervised model.
has no target variable
unsupervised model
has no target variable
association rule
helps define relationships in a transaction dataset using if-then statements
apriori algorithm
identifies combinations of items in datasets that are associated with each other
social network analysis
identifies relationships, influencers, information dissemination patterns, and behaviors among connections in a network
Which of the following statements is most likely to be true of a grocery store transaction?
if SODA then MILK
To explore patterns between two or more products, market basket analysis uses association rules that employ ________.
if-then statements
Individuals on social media, who initiate or actively engage others in conversation and are often well-connected to others in the network, are referred to as ________.
influencers
The common adage that people use when referring to ________ data is "garbage in, garbage out."
invalid and unreliable
stemming
is the process of removing prefixes or suffixes of words, thus reducing words to a simple or root form.
The two most common techniques of cluster analysis discussed in the chapter are ________ and ________.
k-means clustering; hierarchical clustering
In the preprocessing step of text analytics, the words "best" and "writing" were reduced to "good" and "write." This is an example of ________.
lemmatization
A lift value of ________ indicates a negative relationship in which two or more products in an itemset are unlikely to be purchased together.
less than 1
Automated machine learning (AutoML)
mainly supervised approach that explores and selects models using different algorithms and compares their predictive performance
The Jaccard's coefficient approach of measuring similarity between observations
makes calculations based on how dissimilar two observations are from each other.
Louvain communities
measure non-overlapping communities, or groups of closely connected nodes, in the network
conversion analysis
measures conversion rate and conversion by traffic source
behavior analysis
measures pageviews, frequency of engagement, site speed, bounce rate, click through rate, site content, and site search
Audience analysis
measures quantity of impressions/visitors, user demographics, and geography
betweenness centrality
measures the centrality based on the number of times a node is on the shortest path between other nodes
confidence
measures the conditional probability of the consequent actually occurring given that the antecedent occurs. = support of transactions that includes both A and C / support of transactions that includes A only
distribution of node degrees
measures the degree of relationship (or connectedness) among the nodes
density
measures the extent to which the edges are connected in the network and indicates how fast the information is transmitted
inverse document frequency
measures the frequency of a term/word over all the documents
eigenvector centrality
measures the number of links from a node and the number of connections those nodes have. Can range from 0 to 1.
term frequency
measures the number of times a term/word occurs in the document
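A minimal sketch of term frequency and inverse document frequency as described in the cards above. The toy documents and function names are hypothetical, and log(N / df) is only one common way to write IDF (an assumption; the course text may use a different variant).

```python
import math

# Hypothetical, already-tokenized toy documents.
DOCS = [
    ["great", "coffee", "great", "service"],
    ["bad", "coffee"],
    ["great", "price"],
]

def term_frequency(term, doc):
    """Number of times `term` occurs in one document."""
    return doc.count(term)

def document_frequency(term, docs):
    """Number of documents that contain `term` at least once."""
    return sum(1 for doc in docs if term in doc)

def inverse_document_frequency(term, docs):
    """log(N / df): assumed formulation; rarer terms get larger values."""
    df = document_frequency(term, docs)
    if df == 0:
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(len(docs) / df)

print(term_frequency("great", DOCS[0]))          # 2
print(inverse_document_frequency("price", DOCS))  # larger than for "coffee"
```

A term that appears in fewer documents ("price") gets a higher IDF than one that appears in many ("coffee").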
closeness centrality
measures the proximity of a node to all other nodes in the network
Jaccard's
measures the similarity between 2 observations based on how dissimilar they are from each other
matching
measures the similarity between 2 observations with values that represent the minimum differences between 2 points
The Matching coefficient approach of measuring similarity between observations
measures the similarity between two observations with values that represent the minimum differences between two points.
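A minimal sketch comparing the two coefficients on binary (0/1) observations. It uses the standard textbook formulas, which is an assumption since the cards do not give them: the matching coefficient counts 1-1 and 0-0 agreements, while Jaccard's coefficient ignores 0-0 agreements. Function names and data are hypothetical.

```python
def match_counts(x, y):
    """Return (both_one, both_zero, mismatches) for two 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("observations must have the same length")
    both_one = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    mismatches = len(x) - both_one - both_zero
    return both_one, both_zero, mismatches

def matching_coefficient(x, y):
    """(1-1 matches + 0-0 matches) / number of variables."""
    both_one, both_zero, _ = match_counts(x, y)
    return (both_one + both_zero) / len(x)

def jaccard_coefficient(x, y):
    """1-1 matches / (1-1 matches + mismatches); 0-0 matches are ignored."""
    both_one, _, mismatches = match_counts(x, y)
    denom = both_one + mismatches
    # Returning 0.0 when neither observation has a 1 is a convention chosen here.
    return both_one / denom if denom else 0.0

x = [1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 0]
print(matching_coefficient(x, y))  # 0.8
print(jaccard_coefficient(x, y))   # 0.5
```

The two observations agree on four of five variables, but only one of those agreements is a shared 1, so Jaccard's coefficient is lower.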
As soon as Paula types in "The best coffee" in her search engine query box, the words "near me" and "recipe" appear as suggestions. The search engine uses the ________ technique to make such real-time recommendations possible.
n-grams
undirected network relationship
no arrow is directed toward a node
supervised model
consists of defined target variable
degree centrality
measures the centrality based on the number of edges that are connected to a node
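A minimal sketch of degree centrality and network density on a small undirected network. Node names and edges are hypothetical. Dividing the degree by (n - 1) and dividing the edge count by n(n - 1)/2 are the usual normalizations, assumed here rather than taken from the cards.

```python
# Hypothetical undirected network; "Eli" has no edges (a singleton).
NODES = ["Ann", "Bob", "Cara", "Dev", "Eli"]
EDGES = [("Ann", "Bob"), ("Ann", "Cara"), ("Ann", "Dev"), ("Bob", "Cara")]

def degrees(nodes, edges):
    """Count the edges attached to each node (no self-loops or duplicates assumed)."""
    deg = {node: 0 for node in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_centrality(nodes, edges):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(nodes)
    return {node: d / (n - 1) for node, d in degrees(nodes, edges).items()}

def density(nodes, edges):
    """Actual edges divided by the possible edges in an undirected network."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)

print(degree_centrality(NODES, EDGES)["Ann"])  # 0.75
print(density(NODES, EDGES))                   # 4 of 10 possible edges
```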
acquisition analysis
measures traffic sources and campaigns
In market basket analysis, ________ measures the number of transactions that include the items of interest divided by the total number of transactions.
support
bag of words
technique that counts the occurrence of words in a document while ignoring the order or the grammar of words
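A minimal bag-of-words sketch. The regular-expression tokenizer is a simplifying assumption (lowercase letters and apostrophes only), and the function name is hypothetical.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count word occurrences, ignoring word order and grammar."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

bag = bag_of_words("The coffee was good, and the service was good.")
print(bag["good"])  # 2
print(bag["the"])   # 2

# Word order does not matter: both phrases give the same bag.
print(bag_of_words("good coffee") == bag_of_words("coffee good"))  # True
```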
treatment
term used to describe the digital marketing intervention being tested Ex: color of certain buttons when navigating through a website
term document matrix
uses rows and columns to separate the text
Which of the following is true of earned digital media?
It is organic, not initiated or paid for by a company.
4 steps of NLP:
text acquisition and aggregation, text preprocessing, text exploration, and text modeling
Which of the following is true of AutoML?
The AutoML platform is typically capable of analytical discovery of relationships actually present in the dataset.
four key steps in AutoML
data preparation, building models, creating ensemble models and recommending models
Which of the following measures of centrality is based on the number of edges that are connected to a node?
degree centrality
All of the four measurement approaches for determining clusters when applying hierarchical clustering can be illustrated using a ________.
dendrogram
Which of the following social network measures calculates the extent to which the edges are connected in a network and also indicates how fast information is transmitted?
density
The first step in the k-means clustering algorithm is ________.
determining the initial k clusters
Manhattan
distance between two points is not straight, referred to as a "city block"
Euclidean
distance measured as the true straight line distance between two points
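A minimal sketch of the two distance measures for numeric observations. The function names and sample points are hypothetical.

```python
import math

def euclidean(p, q):
    """True straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

The Manhattan distance is never shorter than the Euclidean distance between the same two points, because it follows the grid rather than cutting across it.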
Sean has an Instagram account that he uses to connect with like-minded people. He follows several people and likes sharing moments from his life regularly. Which element of a social network analysis is Sean?
node
text classification
creates categories or groups that are associated with the content
Identify an example of sentiment opposite polarity.
"I enjoyed my coffee, but the barista was rude."
Purchase journey and its stages
(1) previous experiences, (2) pre purchase, (3) purchase, (4) post purchase
divisive clustering
(Top-down approach), all records are initially assigned to a single cluster. 100 observations, all considered as one cluster
agglomerative clustering
(a bottom-up approach), each observation is initially considered to be a separate cluster. If you have 100 observations, you start with 100 separate clusters
A typical silhouette score ranges between ________.
+1 and -1
Which of the following statements is true of the clustering process?
It enables marketers to identify hidden structures in data.
Which age group of customers is most likely to research products online via social networks?
16 to 24
Based on the popularity of social networks, it is estimated that by ________, there will be about 3.43 billion users worldwide.
2023
A person from the age group of ________ is most likely to buy a product if, while researching it on social media, a "buy" button is present.
25-34
Approximately ________ of all social media users say social media referrals influence their purchasing decisions.
71%
expected confidence
= # of transactions that includes C / total # of transactions
Which of the following is true of the supervised model of analytics?
A supervised model is one that consists of a defined target variable.
Which of the following is true of the agglomerative clustering approach?
At the end of the process, all observations are included in a single cluster.
Identify a true statement about AutoML.
AutoML facilitates accurate decision making for users with limited coding and modeling experience.
In the k-means clustering algorithm, what happens after observations are randomly assigned to a cluster?
Cluster centroids are determined.
earned digital media
Communication or exposure not initiated or posted by the company (customer reviews, social media shares, media coverage, and organic search placement)
In market basket analysis, the measure of confidence is represented as ________.
Confidence = Support of transactions that includes both antecedent and consequent/Support of transactions that includes antecedent only.
topic modeling
Enables the analyst to discover hidden thematic structures in the text
Which of the following is true of structured data?
It can be stored in a database or spreadsheet format.
The market basket analysis measure of lift is represented as ________.
Lift = Confidence/Expected confidence.
The ________ approach of measuring similarity between observations is also referred to as the "City Block" distance measure.
Manhattan distance
Support
Measures the frequency of the specific association rule. = # of transactions that includes both A and C / total transactions
In market basket analysis, the measure of support is represented as ________.
Support = Number of transactions that includes both antecedent and consequent/Total number of transactions
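A minimal sketch of support, confidence, expected confidence, and lift, following the formulas on these cards. The transactions and function names are hypothetical, and the rule checked is "if bread then butter".

```python
# Hypothetical toy transactions; item names are illustrative only.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "soda"},
    {"bread", "butter", "soda"},
]

def support(transactions, items):
    """Transactions containing every item in `items` / total transactions."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together / support of antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)  # assumes antecedent occurs

def expected_confidence(transactions, consequent):
    """Transactions that include the consequent / total transactions."""
    return support(transactions, consequent)

def lift(transactions, antecedent, consequent):
    """Confidence / expected confidence."""
    return (confidence(transactions, antecedent, consequent)
            / expected_confidence(transactions, consequent))

print(support(TRANSACTIONS, {"bread", "butter"}))       # 0.6
print(confidence(TRANSACTIONS, {"bread"}, {"butter"}))  # about 0.75
print(lift(TRANSACTIONS, {"bread"}, {"butter"}))        # about 1.25
```

A lift above 1 suggests bread and butter are bought together more often than chance would predict; a lift below 1 would suggest the opposite.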
Which of the following is a difference between structured and unstructured data?
Structured data exists in predefined formats, whereas unstructured data needs to be converted before usage.
Which of the following is true of the density of a network?
The higher the density, the faster information is transmitted in a network.
tokenization
The process of taking the entire text data corpus and separating it into smaller, more manageable sections. The smaller sections are known as tokens.
Identify a true statement regarding the divisive clustering approach of hierarchical clustering.
The process starts with a single cluster of 100 and ends up with 100 different clusters.
Which of the following questions would be asked during multivariate testing on a website?
Which combination of text, images, and colors in a webpage leads to the highest conversion?
Identify an example of paid digital media.
a Facebook advertisement about the health benefits of green tea by Starbucks
Natural Language Processing (NLP)
a branch of AI used to identify patterns by reading and understanding meaning from human language. Companies can analyze and organize internal data sources and external data sources.
sentiment analysis
a measure of emotions, attitudes, and beliefs. The goal is to identify customers' thoughts as they relate to products, features, services, etc.
singleton
a node that is unconnected to all others in the network. Ex: a LinkedIn user who doesn't add anyone
stop words removal
a process that deletes words that are not important such as "the" and "and"
N-grams
a simple technique that captures the set of co-occurring or continuous sequences of n-items from a large set of text
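A minimal sketch of extracting n-grams from a tokenized phrase. The function name and sample phrase are hypothetical.

```python
def ngrams(tokens, n):
    """Return every continuous sequence of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the best coffee near me".split()
print(ngrams(tokens, 2))
# [('the', 'best'), ('best', 'coffee'), ('coffee', 'near'), ('near', 'me')]
print(len(ngrams(tokens, 3)))  # 3
```

Counting which n-grams most often follow a phrase such as "the best coffee" is one way a search box could rank suggestions.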
In the k-means clustering analysis, the silhouette score is calculated ________.
after the cluster algorithm has assigned each observation to a cluster
A/B testing
also known as split testing, enables marketers to experiment with different digital options to identify which ones are likely to be the most effective
nodes
an entity (people or product) that is also known as a vertex
egocentric network
an individual network. Ex: a Facebook profile
silhouette score
another way to identify the optimal number of clusters for the data, calculated after the algorithm has assigned each observation to a cluster
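A minimal sketch of the silhouette score, computed after each observation already has a cluster label. It uses the standard definition, which is an assumption because the cards do not give the formula: for each observation, a is the mean distance to its own cluster, b is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b), averaged over all observations. The data and function name are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; values fall between -1 and +1."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for one-point clusters
            continue
        # Mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

points = [(0,), (1,), (10,), (11,)]
print(silhouette_score(points, [0, 0, 1, 1]))  # close to +1: well separated
print(silhouette_score(points, [0, 1, 0, 1]))  # negative: poor assignment
```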
Ward's method
applies a measure of the sum of squares within the clusters summed over all variables
multichannel attribution
assesses how, when, and where these various touchpoints influence customers
In the ________ method of linking individual observations both within and between clusters, similarity is defined by the group average of observations from one cluster to all observations from another cluster.
average linkage
The eigenvector centrality measure
counts the number of links from a node and also the number of connections those nodes have.
Which of the following measures of centrality is based on the number of times a node is on the shortest path between other nodes?
betweenness centrality
________ network structures are characterized by independent participants that might share information about a popular topic or brand but do not interact much with each other.
brand cluster
Based on the concept of market basket analysis, a customer who purchases bread should have immediate and easy access to ________.
butter
Which of the following measures of centrality shows the proximity of a node to all other nodes in a network?
closeness centrality
ensemble model
combines the most favorable elements into a single model. Reduces issues such as noise, bias, and inconsistent or skewed variance
The ________ network structure represents groups that are large and connected, but also have quite a few independent participants.
community cluster
In the ________ method of linking individual observations both within and between clusters, similarity is defined by the maximum distance between observations in two different clusters.
complete linkage
In market basket analysis, ________ measures the conditional probability of the consequent actually occurring given that the antecedent occurs.
confidence
In market basket analysis, the measure of ________ indicates the percentage of times the association rule is correct.
confidence
frequency bar chart
consists of the x-axis representing terms, and the y-axis representing the frequency of a particular term occurring
In the context of network structures, ________ groups are separate and represent different conversations with little connection between them.
polarized crowd
Which of the four key steps in the AutoML process involves handling missing data, outliers, variable selection, data standardization, and data transformation to maintain a common format?
preparing data
Identify the correct sequence of the four key steps in the AutoML process.
preparing data, building models, creating ensemble models, recommending models
hierarchical clustering
produces solutions in which the data is grouped into a ranking of clusters
word clouds
provides a high-level understanding of frequently used terms
The step of creating ensemble models in the AutoML process allows us to
reduce the generalization error of the prediction.
lemmatization
reduces the word to its lemma form while considering the context of the word, such as the part of speech and meaning
The boosting process in the creating ensemble models step in the AutoML process serves the purpose of
reducing error in the model
boosting
reducing error in the model
Sasha enters an electronics website after clicking on a link on another website. In Google Analytics, this type of a channel is termed a(n) ________.
referral
amazon effect
refers to the often-disruptive influence e-commerce and digital marketplaces have had on traditional brick-and-mortar retailers
In cluster analysis, a market is segmented using ________.
shared traits
bagging
short for bootstrap aggregating. 2 steps
average linkage
similarity is defined by the group average of observations from one cluster to all observations from another cluster
complete linkage
similarity is defined by the maximum distance between observations in 2 different clusters
single linkage
similarity is defined by the shortest distance from an object in a cluster to an object from another cluster
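A minimal sketch of the three linkage definitions above, each computed as a distance between two clusters. The function names and sample clusters are hypothetical, and Euclidean distance is assumed.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Distances between every observation in one cluster and every one in the other."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Shortest distance between the two clusters."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Maximum distance between the two clusters."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Group average of all between-cluster distances."""
    d = pairwise_distances(cluster_a, cluster_b)
    return sum(d) / len(d)

a = [(0,), (1,)]
b = [(4,), (6,)]
print(single_linkage(a, b))    # 3.0
print(complete_linkage(a, b))  # 6.0
print(average_linkage(a, b))   # 4.5
```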
In which of the following methods of linking individual observations both within and between clusters is similarity defined as the shortest distance from an object in a cluster to an object from another cluster?
single linkage
Leroy created an account on a social media website and promptly forgot all about it. He did not add anyone to his network. In the context of social network analysis, Leroy is a(n) ________.
singleton
Which of the following behavior analysis measures involves the rate at which users are able to see and interact with the website content?
site speed
In market basket analysis, ________ measures the frequency of the specific association rule divided by the total number of transactions.
support
In which of the four steps of text analytics is a corpus of text data defined?
text acquisition and aggregation
Which of the following is the first step of text analytics?
text acquisition and aggregation
In hierarchical clustering, approaches such as ________ are most often used when numerical variables are analyzed.
the Euclidean distance or the Manhattan distance
paid digital media
the company pays for exposure (display advertising, influencer promotions, and social media advertisements)
In the Manhattan distance approach of measuring similarity between observations
the distance between two points is a path with right turns as if one is walking a grid in a city.
In which of the following functions is the distance measured equivalent to the true straight line distance between two points?
the Euclidean distance
edges
the links and relationships between nodes. Can explain friendship or family ties
owned digital media
the media is managed by the company (websites, blogs, and social media accounts)
Link Prediction
the objective is to predict new links between unconnected nodes
edge weight
the strength of the relationship between 2 nodes. The thicker the line, the higher the exchange between the 2.
differential market basket analysis
the use of market basket analysis techniques across stores, locations, seasons, days of the week, etc
digital marketing
the use of marketing touchpoints that are executed electronically through a digital channel to communicate and interact with current and potential customers and partners.
On Twitter, the ________ network structure indicates the topics that are all highly interconnected by similar conversations.
tight crowd
A text analytics computer program separated the phrase "I like cake" into three sections: "I," "like," and "cake." This is an example of ________ within the text preprocessing step.
tokenization
T/F: A/B testing enables a company to continuously test and examine how visitors respond to one change vs. another. Measurements using A/B testing are useful in understanding which variations perform the best, and ultimately determining which had the greatest influence on a particular performance metric
true
T/F: In K means, it is best to begin with data that has been standardized using z-scores or min-max
true
T/F: K means can only be applied to numerical data
true
T/F: Unstructured data represents more than 75% of the emerging data
true
T/F: hierarchical clustering can be executed with a mixed set of data that can include numerical and categorical values
true
directed network relationship
typically depicted using a line with a directional arrow from one node to another
The term "organic channel" used by Google Analytics means that a user has landed on a webpage through _________.
unpaid search results on search engines such as Google, Yahoo, Bing, or Baidu.
Market Basket Analysis
uses purchase transaction data to identify associations between products or combinations of products and services that occur together frequently. Enables marketers to identify what is being purchased together.
collaborative filtering
uses the idea of identifying relevant items for a specific user from a large set of items by taking into consideration the preferences of many similar users
K-means clustering
uses the mean value for each cluster and minimizes the distance to individual observations. Can range from 2-12 clusters.
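A minimal k-means sketch, assuming numeric data and Euclidean distance. The cards describe starting from a random assignment of observations to clusters. To keep the example reproducible, this sketch starts from caller-supplied initial centroids, which is a substitution. Function names and data are hypothetical, and in practice the variables would usually be standardized first.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def mean_point(points):
    """Coordinate-wise mean of a list of points (the cluster centroid)."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid updates until the labels stop changing."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no observation changed cluster
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = mean_point(members)
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(points, [(1, 1), (8, 8)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```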
Social network analysis results in ________ that trace connections in the population and ultimately represent the structure and size of the networks.
visual maps
graph
visualization that enables viewers to understand the relationship between nodes and the importance of nodes