Analytics final


multivariate testing

enables companies to test whether changing several different variables on their website at the same time leads to a higher conversion rate

digital marketing analytics

enables marketers to monitor, understand, and evaluate the performance of digital marketing initiatives

lift

enables us to evaluate the strength of the association. Lift = confidence / expected confidence

The purpose of building models in the AutoML process is to

extract insights from data

sentiment polarity

feedback that consists of contradictory or different opinions

A supermarket is trying to mimic the "Target Effect" to boost its sales. It creates a special, fast checkout line for shoppers with toddlers and babies. Which of the following products should this checkout line display prominently?

formula and diapers

The behavior analysis measure of ________ refers to the rate of visitors returning to a website within a certain time frame.

frequency engagement

Latent Dirichlet Allocation (LDA)

goal is to maximize the separation between the estimated topics and minimize the variance within each projected topic

Identify a true statement about an unsupervised model.

has no target variable

unsupervised model

has no target variable

association rule

helps define relationships in a transaction dataset using if-then statements

apriori algorithm

identifies combinations of items in datasets that are associated with each other

social network analysis

identifies relationships, influencers, information dissemination patterns, and behaviors among connections in a network

Which of the following statements is most likely to be true of a grocery store transaction?

if SODA then MILK

To explore patterns between two or more products, market basket analysis uses association rules that employ ________.

if-then statements

Individuals on social media, who initiate or actively engage others in conversation and are often well-connected to others in the network, are referred to as ________.

influencers

The common adage that people use when referring to ________ data is "garbage in, garbage out."

invalid and unreliable

stemming

is the process of removing prefixes or suffixes of words, thus reducing words to a simple or root form.

The two most common techniques of cluster analysis discussed in the chapter are ________ and ________.

k-means clustering; hierarchical clustering

In the preprocessing step of text analytics, the words "best" and "writing" were reduced to "good" and "write." This is an example of ________.

lemmatization

A lift value of ________ indicates a negative relationship in which two or more products in an itemset are unlikely to be purchased together.

less than 1

Automated machine learning (AutoML)

mainly supervised approach that explores and selects models using different algorithms and compares their predictive performance

The Jaccard's coefficient approach of measuring similarity between observations

makes calculations based on how dissimilar two observations are from each other.

Louvain communities

measure non-overlapping communities, or groups of closely connected nodes, in the network

conversion analysis

measures conversion rate and conversion by traffic source

behavior analysis

measures pageviews, frequency of engagement, site speed, bounce rate, click through rate, site content, and site search

Audience analysis

measures quantity of impressions/visitors, user demographics, and geography

betweenness centrality

measures the centrality based on the number of times a node is on the shortest path between other nodes

confidence

measures the conditional probability of the consequent actually occurring given that the antecedent occurs. = support of transactions that includes both A and C / support of transactions that includes A only

distribution of node degrees

measures the degree of relationship (or connectedness) among the nodes

density

measures the extent to which the edges are connected in the network and indicates how fast the information is transmitted
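
As a rough illustration, density is commonly computed as the number of edges actually present divided by the number of edges that could exist. The sketch below assumes an undirected network without self-loops; the function name `density` and the toy network are made up for this example.

```python
def density(nodes, edges):
    """Share of possible undirected edges that are actually present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    possible = n * (n - 1) / 2
    # Treat (a, b) and (b, a) as the same edge and ignore self-loops.
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return len(unique) / possible

# Hypothetical four-person network connected in a chain.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
chain_density = density(nodes, edges)  # 3 of 6 possible edges
```

A value near 1 means almost every pair of nodes is directly connected, which fits the idea that information travels faster in a denser network.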

inverse document frequency

measures the frequency of a term/word over all the documents

eigenvector centrality

measures the number of links from a node and the number of connections those nodes have. Can range from 0 to 1.

term frequency

measures the number of times a term/word occurs in the document
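
A minimal sketch of how term frequency and inverse document frequency can be combined into a TF-IDF weight. The log(N / df) form of IDF used here is one common formulation, not necessarily the one the course uses, and the documents and function names are invented for illustration.

```python
import math

# Toy corpus: each document is already a list of tokens.
docs = [
    ["great", "coffee", "great", "service"],
    ["rude", "barista"],
    ["coffee", "was", "cold"],
]

def term_frequency(term, doc):
    """Number of times the term occurs in one document."""
    return doc.count(term)

def inverse_document_frequency(term, docs):
    """log(N / df): rarer terms across the corpus get a larger weight."""
    containing = sum(1 for doc in docs if term in doc)
    if containing == 0:
        return 0.0
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return term_frequency(term, doc) * inverse_document_frequency(term, docs)

great_weight = tf_idf("great", docs[0], docs)
coffee_weight = tf_idf("coffee", docs[0], docs)
```

In this toy corpus "great" outweighs "coffee" in the first document because it appears twice there and in no other document.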

closeness centrality

measures the proximity of a node to all other nodes in the network
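
One common way to turn "proximity to all other nodes" into a number is to divide the count of reachable nodes by the sum of shortest-path lengths to them. The sketch below assumes an unweighted, undirected network stored as an adjacency dictionary; the names and the toy network are hypothetical.

```python
from collections import deque

def closeness_centrality(graph, start):
    """(reachable nodes) / (sum of shortest-path lengths from start)."""
    dist = {start: 0}
    queue = deque([start])
    # Breadth-first search gives shortest path lengths in an unweighted graph.
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical chain network: A - B - C - D
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
closeness_b = closeness_centrality(graph, "B")
closeness_a = closeness_centrality(graph, "A")
```

B sits nearer the middle of the chain than A, so it gets the higher closeness value.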

Jaccard's

measures the similarity between 2 observations based on how dissimilar they are from each other

matching

measures the similarity between 2 observations with values that represent the minimum differences between 2 points

The Matching coefficient approach of measuring similarity between observations

measures the similarity between two observations with values that represent the minimum differences between two points.
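
For binary (0/1) attributes, the two coefficients are usually computed as shown in this sketch. It uses the standard textbook definitions, which may be worded differently from the course's; the function names and shopper data are made up.

```python
def matching_coefficient(a, b):
    """Share of attributes on which both observations agree (1-1 or 0-0)."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of attributes")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def jaccard_coefficient(a, b):
    """Like matching, but joint absences (0-0) are left out of the count."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of attributes")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0

# 1 = bought the product, 0 = did not (hypothetical shoppers).
shopper_a = [1, 1, 0, 0, 1]
shopper_b = [1, 0, 0, 0, 1]
matching = matching_coefficient(shopper_a, shopper_b)  # 4 of 5 attributes agree
jaccard = jaccard_coefficient(shopper_a, shopper_b)    # 2 shared of 3 bought by either
```

Jaccard's is typically preferred when shared absences (neither shopper bought the item) say little about similarity.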

As soon as Paula types in "The best coffee" in her search engine query box, the words "near me" and "recipe" appear as suggestions. The search engine uses the ________ technique to make such real-time recommendations possible.

n-grams

undirected network relationship

no arrow is directed toward a node

supervised model

consists of defined target variable

degree centrality

measures the centrality based on the number of edges that are connected to a node
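
A minimal sketch of counting the edges connected to each node, assuming an undirected network given as a list of pairs. The names in the edge list are invented.

```python
from collections import Counter

def degree_centrality(edges):
    """Count how many edges touch each node in an undirected network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

# Hypothetical follower network.
edges = [("Sean", "Ana"), ("Sean", "Bo"), ("Sean", "Cy"), ("Ana", "Bo")]
degrees = degree_centrality(edges)
most_connected = max(degrees, key=degrees.get)
```

Some tools divide each count by (number of nodes - 1) so that values are comparable across networks of different sizes.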

acquisition analysis

measures traffic sources and campaigns

In market basket analysis, ________ measures the number of transactions that include the items of interest divided by the total number of transactions.

support

bag of words

technique that counts the occurrence of words in a document while ignoring the order or the grammar of words
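
A small sketch of the idea, assuming words are separated by whitespace. The function name `bag_of_words` is just for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Count word occurrences, ignoring order and grammar."""
    return Counter(text.lower().split())

bag = bag_of_words("good coffee good service")
# Reordering the words produces the same bag.
same_bag = bag_of_words("service good coffee good")
```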

treatment

term used to describe the digital marketing intervention being tested Ex: color of certain buttons when navigating through a website

term document matrix

uses rows and columns to separate the text

Which of the following is true of earned digital media?

It is organic, not initiated or paid for by a company.

4 steps of NLP:

Text acquisition and aggregation, text preprocessing, text exploration, and text modeling

Which of the following is true of AutoML?

The AutoML platform is typically capable of analytical discovery of relationships actually present in the dataset.

four key steps in AutoML

data preparation, building models, creating ensemble models and recommending models

Which of the following measures of centrality is based on the number of edges that are connected to a node?

degree centrality

All of the four measurement approaches for determining clusters when applying hierarchical clustering can be illustrated using a ________.

dendrogram

Which of the following social network measures calculates the extent to which the edges are connected in a network and also indicates how fast information is transmitted?

density

The first step in the k-means clustering algorithm is ________.

determining the initial k clusters

Manhattan

distance between two points is measured along a grid path rather than a straight line; also referred to as "city block" distance

Euclidean

distance measured as the true straight line distance between two points
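
The two distance measures can be compared side by side with a short sketch. The points are arbitrary grid coordinates chosen for the example.

```python
import math

def euclidean(p, q):
    """True straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))

home, store = (0, 0), (3, 4)
straight_line = euclidean(home, store)  # 5.0
city_block = manhattan(home, store)     # 7
```

The Manhattan distance is never shorter than the Euclidean distance between the same two points.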

Sean has an Instagram account that he uses to connect with like-minded people. He follows several people and likes sharing moments from his life regularly. Which element of a social network analysis is Sean?

node

text classification

creates categories or groups that are associated with the content

Identify an example of sentiment opposite polarity.

"I enjoyed my coffee, but the barista was rude."

Purchase journey and its stages

(1) previous experiences, (2) pre purchase, (3) purchase, (4) post purchase

divisive clustering

(a top-down approach), all records are initially assigned to a single cluster. If you have 100 observations, you start with all 100 in one cluster

agglomerative clustering

(a bottom-up approach), each observation is initially considered to be a separate cluster. If you have 100 observations, you start with 100 separate clusters

A typical silhouette score ranges between ________.

+1 and -1

Which of the following statements is true of the clustering process?

It enables marketers to identify hidden structures in data.

Which age group of customers is most likely to research products online via social networks?

16 to 24

Based on the popularity of social networks, it is estimated that by ________, there will be about 3.43 billion users worldwide.

2023

A person from the age group of ________ is most likely to buy a product if, while researching it on social media, a "buy" button is present.

25-34

Approximately ________ of all social media users say social media referrals influence their purchasing decisions.

71%

expected confidence

= # of transactions that includes C / total # of transactions

Which of the following is true of the supervised model of analytics?

A supervised model is one that consists of a defined target variable.

Which of the following is true of the agglomerative clustering approach?

At the end of the process, all observations are included in a single cluster.

Identify a true statement about AutoML.

AutoML facilitates accurate decision making for users with limited coding and modeling experience.

In the k-means clustering algorithm, what happens after observations are randomly assigned to a cluster?

Cluster centroids are determined.

earned digital media

Communication or exposure not initiated or posted by the company (customer reviews, social media shares, media coverage, and organic search placement)

In market basket analysis, the measure of confidence is represented as ________.

Confidence = Support of transactions that includes both antecedent and consequent/Support of transactions that includes antecedent only.

topic modeling

Enables the analyst to discover hidden thematic structures in the text

Which of the following is true of structured data?

It can be stored in a database or spreadsheet format.

The market basket analysis measure of lift is represented as ________.

Lift = Confidence/Expected confidence.

The ________ approach of measuring similarity between observations is also referred to as the "City Block" distance measure.

Manhattan distance

Support

Measures the frequency of the specific association rule. = # of transactions that includes both A and C / total transactions

In market basket analysis, the measure of support is represented as ________.

Support = Number of transactions that includes both antecedent and consequent/Total number of transactions
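
The support, confidence, and lift formulas can be checked on a handful of transactions. This is a sketch on made-up data for the rule "if bread then butter"; the function names are assumptions for the example.

```python
# Toy transactions; the items are invented for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk", "soda"},
    {"bread", "butter", "soda"},
]

def support(itemset, transactions):
    """Share of all transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together / support of antecedent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence / expected confidence (the support of the consequent)."""
    expected_confidence = support(consequent, transactions)
    return confidence(antecedent, consequent, transactions) / expected_confidence

rule_support = support({"bread", "butter"}, transactions)        # 3 of 5 transactions
rule_confidence = confidence({"bread"}, {"butter"}, transactions)  # 3 of the 4 with bread
rule_lift = lift({"bread"}, {"butter"}, transactions)
```

Here the lift comes out above 1, so bread and butter are bought together more often than expected if they were independent.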

Which of the following is a difference between structured and unstructured data?

Structured data exists in predefined formats, whereas unstructured data needs to be converted before usage.

Which of the following is true of the density of a network?

The higher the density, the faster information is transmitted in a network.

tokenization

The process of taking the entire text data corpus and separating it into smaller, more manageable sections. The smaller sections are known as tokens
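
A minimal word-level tokenizer, assuming tokens are runs of letters, digits, or apostrophes. Real tokenizers handle many more cases; the function name is hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("I like cake")
# tokens == ["i", "like", "cake"]
```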

Identify a true statement regarding the divisive clustering approach of hierarchical clustering.

The process starts with a single cluster of 100 and ends up with 100 different clusters.

Which of the following questions would be asked during multivariate testing on a website?

Which combination of text, images, and colors in a webpage leads to the highest conversion?

Identify an example of paid digital media.

a Facebook advertisement about the health benefits of green tea by Starbucks

Natural Language Processing (NLP)

a branch of AI used to identify patterns by reading and understanding meaning from human language. Companies can analyze and organize internal data sources and external data sources.

sentiment analysis

a measure of emotions, attitudes, and beliefs. The goal is to identify customers' thoughts as they relate to products, features, services, etc.

singleton

a node that is unconnected to all others in the network. Ex: a LinkedIn user who doesn't add anyone

stop words removal

a process that deletes words that are not important such as "the" and "and"
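
A sketch of stop word removal on an already tokenized text. The stop word list here is a tiny illustrative one, not a standard list.

```python
# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "is"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word list."""
    return [t for t in tokens if t.lower() not in stop_words]

kept = remove_stop_words(["the", "coffee", "and", "the", "barista"])
# kept == ["coffee", "barista"]
```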

N-grams

a simple technique that captures the set of co-occurring or continuous sequences of n-items from a large set of text
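
Extracting n-grams from a token list takes only a sliding window, as in this sketch. The phrase and function name are made up for the example.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the best coffee near me".split()
bigrams = ngrams(tokens, 2)
# [("the", "best"), ("best", "coffee"), ("coffee", "near"), ("near", "me")]
```

Counting which n-grams most often follow a typed phrase is one simple way a search box could suggest completions.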

In the k-means clustering analysis, the silhouette score is calculated ________.

after the cluster algorithm has assigned each observation to a cluster

A/B testing

also known as split testing, enables marketers to experiment with different digital options to identify which ones are likely to be the most effective

nodes

an entity (people or product) that is also known as a vertex

egocentric network

an individual network. Ex: a Facebook profile

silhouette score

another way to identify the optimal number of clusters for the data, calculated after the algorithm has assigned each observation to a cluster
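
A sketch of the standard silhouette calculation, run after every observation already has a cluster label. For each observation, a is the mean distance to the other members of its own cluster, b is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b). The function name and data are assumptions; `math.dist` requires Python 3.8 or later.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all observations; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])
```

Scores close to +1 indicate tight, well-separated clusters, so the labeling that matches the two visible groups scores higher than the mixed one.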

Ward's method

applies a measure of the sum of squares within the clusters summed over all variables

multichannel attribution

assesses how, when, and where these various touchpoints influence customers

In the ________ method of linking individual observations both within and between clusters, similarity is defined by the group average of observations from one cluster to all observations from another cluster.

average linkage

The eigenvector centrality measure

counts the number of links from a node and also the number of connections those nodes have.

Which of the following measures of centrality is based on the number of times a node is on the shortest path between other nodes?

betweenness centrality

________ network structures are characterized by independent participants that might share information about a popular topic or brand but do not interact much with each other.

brand cluster

Based on the concept of market basket analysis, a customer who purchases bread should have immediate and easy access to ________.

butter

Which of the following measures of centrality shows the proximity of a node to all other nodes in a network?

closeness centrality

ensemble model

combines the most favorable elements into a single model. Reduces issues such as noise, bias, and inconsistent or skewed variance

The ________ network structure represents groups that are large and connected, but also have quite a few independent participants.

community cluster

In the ________ method of linking individual observations both within and between clusters, similarity is defined by the maximum distance between observations in two different clusters.

complete linkage

In market basket analysis, ________ measures the conditional probability of the consequent actually occurring given that the antecedent occurs.

confidence

In market basket analysis, the measure of ________ indicates the percentage of times the association rule is correct.

confidence

frequency bar chart

consists of the x-axis representing terms, and the y-axis representing the frequency of a particular term occurring

In the context of network structures, ________ groups are separate and represent different conversations with little connection between them.

polarized crowd

Which of the four key steps in the AutoML process involves handling missing data, outliers, variable selection, data standardization, and data transformation to maintain a common format?

preparing data

Identify the correct sequence of the four key steps in the AutoML process.

preparing data, building models, creating ensemble models, recommending models

hierarchical clustering

produces solutions in which the data is grouped into a ranking of clusters

word clouds

provides a high-level understanding of frequently used terms

The step of creating ensemble models in the AutoML process allows us to

reduce the generalization error of the prediction.

lemmatization

reduces the word to its lemma form while considering the context of the word, such as the part of speech and meaning

The boosting process in the creating ensemble models step in the AutoML process serves the purpose of

reducing error in the model

boosting

reducing error in the model

Sasha enters an electronics website after clicking on a link on another website. In Google Analytics, this type of a channel is termed a(n) ________.

referral

Amazon effect

refers to the often-disruptive influence e-commerce and digital marketplaces have had on traditional brick-and-mortar retailers

In cluster analysis, a market is segmented using ________.

shared traits

bagging

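short for bootstrap aggregating. 2 steps: (1) draw multiple bootstrap samples (random samples with replacement) from the training data, (2) build a model on each sample and aggregate their predictions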

average linkage

similarity is defined by the group average of observations from one cluster to all observations from another cluster

complete linkage

similarity is defined by the maximum distance between observations in 2 different clusters

single linkage

similarity is defined by the shortest distance from an object in a cluster to an object from another cluster
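
The three linkage methods differ only in how they summarize the pairwise distances between two clusters. The sketch below assumes Euclidean distance and uses made-up points; `math.dist` requires Python 3.8 or later.

```python
import math

def pairwise(cluster_a, cluster_b):
    """All distances between a point in one cluster and a point in the other."""
    return [math.dist(p, q) for p in cluster_a for q in cluster_b]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise(cluster_a, cluster_b))   # shortest distance

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise(cluster_a, cluster_b))   # maximum distance

def average_linkage(cluster_a, cluster_b):
    distances = pairwise(cluster_a, cluster_b)
    return sum(distances) / len(distances)       # group average

left = [(0, 0), (0, 1)]
right = [(3, 0), (4, 0)]
single = single_linkage(left, right)
complete = complete_linkage(left, right)
average = average_linkage(left, right)
```

For any pair of clusters the average linkage value falls between the single and complete linkage values.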

In which of the following methods of linking individual observations both within and between clusters is similarity defined as the shortest distance from an object in a cluster to an object from another cluster?

single linkage

Leroy created an account on a social media website and promptly forgot all about it. He did not add anyone to his network. In the context of social network analysis, Leroy is a(n) ________.

singleton

Which of the following behavior analysis measures involves the rate at which users are able to see and interact with the website content?

site speed

In market basket analysis, ________ measures the frequency of the specific association rule divided by the total number of transactions.

support

In which of the four steps of text analytics is a corpus of text data defined?

text acquisition and aggregation

Which of the following is the first step of text analytics?

text acquisition and aggregation

In hierarchical clustering, approaches such as ________ are most often used when numerical variables are analyzed.

the Euclidean distance or the Manhattan distance

paid digital media

the company pays for exposure (display advertising, influencer promotions, and social media advertisements)

In the Manhattan distance approach of measuring similarity between observations

the distance between two points is a path with right turns as if one is walking a grid in a city.

In which of the following functions is the distance measured equivalent to the true straight line distance between two points?

the Euclidean distance

edges

the links and relationships between nodes. Can explain friendship or family ties

owned digital media

the media is managed by the company (websites, blogs, and social media accounts)

Link Prediction

the objective is to predict new links between unconnected nodes

edge weight

the strength of the relationship between 2 nodes, the thicker the line the higher the exchange between the 2.

differential market basket analysis

the use of market basket analysis techniques across stores, locations, seasons, days of the week, etc

digital marketing

the use of marketing touchpoints that are executed electronically through a digital channel to communicate and interact with current and potential customers and partners.

On Twitter, the ________ network structure indicates the topics that are all highly interconnected by similar conversations.

tight crowd

A text analytics computer program separated the phrase "I like cake" into three sections: "I," "like," and "cake." This is an example of ________ within the text preprocessing step.

tokenization

T/F: A/B testing enables a company to continuously test and examine how visitors respond to one change vs. another. Measurements from A/B testing are useful in understanding which variations perform best and, ultimately, in determining which had the greatest influence on a particular performance metric

true

T/F: In K means, it is best to begin with data that has been standardized using z-scores or min-max

true

T/F: K means can only be applied to numerical data

true

T/F: Unstructured data represents more than 75% of the emerging data

true

T/F: hierarchical clustering can be executed with a mixed set of data that can include numerical and categorical values

true

directed network relationship

typically depicted using a line with a directional arrow from one node to another

The term "organic channel" used by Google Analytics means that a user has landed on a webpage through _________.

unpaid search results on search engines such as Google, Yahoo, Bing, or Baidu.

Market Basket Analysis

uses purchase transaction data to identify associations between products or combinations of products and services that occur together frequently. Enables marketers to identify what is being purchased together.

collaborative filtering

uses the idea of identifying relevant items for a specific user from a large set of items by taking into consideration the preferences of many similar users

K-means clustering

uses the mean value for each cluster and minimizes the distance to individual observations. Can range from 2-12 clusters.
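
A compact sketch of the k-means loop described in these cards: start from k clusters (here by random assignment unless starting labels are supplied), determine the cluster centroids as means, then reassign each observation to its nearest centroid until nothing changes. The function names, the re-seeding of empty clusters, and the data are assumptions for illustration; the sketch also skips standardizing the data first.

```python
import random

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, labels=None, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: start from k clusters, here by random assignment.
    if labels is None:
        labels = [rng.randrange(k) for _ in points]
    labels = list(labels)
    centroids = []
    for _ in range(max_iter):
        # Step 2: each centroid is the mean of the observations in its cluster.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random observation.
            centroids.append(mean_point(members) if members else rng.choice(points))
        # Step 3: move every observation to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(points, k=2, labels=[0, 1, 0, 1, 0, 1])
```

Starting from a deliberately mixed assignment, the loop separates the three low points from the three high points after one reassignment.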

Social network analysis results in ________ that trace connections in the population and ultimately represent the structure and size of the networks.

visual maps

graph

visualization that enables viewers to understand the relationship between nodes and the importance of nodes

