INFO 320 - Chapter 4 Exam Review


false

Analysis of items frequently co-occurring in transactions (such as purchases) is known as lift. (This describes market basket analysis, not lift.)

true

Association rules convey the likelihood of certain items being purchased together.

true

Centroid linkage uses the averaging concept of cluster centroids to define between-cluster similarity.

are the most different

Complete linkage defines the similarity between two clusters as the similarity of the pair of observations (one from each cluster) that

antecedent item set occurs

Confidence can be viewed as a conditional probability of the consequent item set occurring given that the
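
A minimal sketch of confidence as a conditional probability estimated from support counts. The transactions and the helper names `support_count` and `confidence` are made up for illustration.

```python
# Toy transactions, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"bread", "jelly"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as a ratio of support counts."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

# Rule: if {bread}, then {milk}. Bread appears 4 times, bread and milk together 3 times.
conf = confidence({"bread"}, {"milk"}, transactions)
print(conf)  # 0.75
```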

calculating the confidence ratio for all association rules

Data preparation includes all of the following except which task?

observations

Euclidean distance can be used to measure the distance between ________________ in cluster analysis.

true

Euclidean distance can be used to measure the distance between two observations each consisting of two variable measurements.

true

Euclidean distance is the most common method to measure dissimilarity between observations.
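
As a quick illustration, Euclidean distance between two observations can be computed as the square root of the summed squared differences. The function name and the sample points are assumptions chosen so the result forms a 3-4-5 right triangle.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two observations of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Two observations, each with two variable measurements.
# The differences are 3 and 4, so the distance is the hypotenuse, 5.
d = euclidean_distance((1, 2), (4, 6))
print(d)  # 5.0
```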

convert the categories to binary, dummy variables.

In preparing categorical variables for analysis, it is usually best to

This means that if a transaction includes ground beef and cheese, then it also includes taco shells.

Interpret the following association rule: "if {ground beef, cheese}, then {taco shells}."

does not count matching zero entries while the latter does.

Jaccard's coefficient is different from the matching coefficient in that the former
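
The difference can be seen on a small example. This is a sketch with hypothetical function names and made-up 0-1 vectors; it assumes the two observations are not both all zeros (otherwise the Jaccard denominator would be zero).

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Like the matching coefficient, but 0-0 agreements are ignored."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    return both_one / (len(u) - both_zero)

u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]

print(matching_coefficient(u, v))  # 0.8 (four of five positions agree)
print(jaccard_coefficient(u, v))   # 0.5 (one 1-1 match out of two non-zero-zero positions)
```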

the products that are commonly purchased together.

Marketers are interested in examining transaction data on customer purchases to identify

higher, stronger

The __________________ the lift ratio, the _________________ the association rule.

cluster analysis

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called

clustering

The goal of _____________ is to segment observations into similar groups based on the observed variables.

unsupervised learning

The goal of ___________________ is to use the variable values to identify relationships between observations.

true

The goal of unsupervised learning is to use the variable values to identify relationships between observations.

greater than 1

The lift ratio demonstrates some usefulness to the association rule if its value is

true

The lift ratio of an association rule with a confidence value of 0.27 and in which the consequent occurs in 4 out of 10 cases is 0.675. (Lift ratio = confidence / (support of consequent / total number of transactions) = 0.27 / (4/10) = 0.675.)

false (0.42 / (6/8) = 0.56)

The lift ratio of an association rule with a confidence value of 0.42 and in which the consequent occurs in 6 out of 8 cases is 0.75.
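
The two lift-ratio cards above can be checked with a short calculation. The function name `lift_ratio` is an assumption for this sketch; results are rounded because of floating-point arithmetic.

```python
def lift_ratio(confidence, consequent_count, total_transactions):
    """Confidence divided by the consequent's share of all transactions."""
    return confidence / (consequent_count / total_transactions)

# Confidence 0.27, consequent in 4 of 10 transactions.
print(round(lift_ratio(0.27, 4, 10), 3))  # 0.675

# Confidence 0.42, consequent in 6 of 8 transactions.
print(round(lift_ratio(0.42, 6, 8), 3))   # 0.56, not 0.75
```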

Transform

To bin a continuous variable into categories, you can use XLMiner with the "Bin Continuous Data" procedure under ____________________ in the "Data Analysis" group

as little as possible

Ward's method merges two clusters such that the dissimilarity of the observations within the resulting single cluster increases _______________.
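
One way to picture this is a sketch that measures within-cluster dissimilarity as the sum of squared distances to the cluster centroid, which is an assumption about how the dissimilarity is scored. The function names and points are hypothetical. Ward's method would pick the merge with the smallest increase.

```python
def sum_of_squares(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    centroid = [sum(coords) / len(cluster) for coords in zip(*cluster)]
    return sum(
        sum((a - c) ** 2 for a, c in zip(point, centroid))
        for point in cluster
    )

def ward_increase(c1, c2):
    """Growth in within-cluster sum of squares if c1 and c2 were merged."""
    return sum_of_squares(c1 + c2) - sum_of_squares(c1) - sum_of_squares(c2)

cluster_a = [(0, 0), (0, 2)]
cluster_b = [(1, 1)]      # close to cluster_a
cluster_c = [(10, 10)]    # far from cluster_a

# Merging a with b raises the sum of squares far less than merging a with c,
# so Ward's method would merge a and b first.
print(ward_increase(cluster_a, cluster_b) < ward_increase(cluster_a, cluster_c))  # True
```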

matching coefficient

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

single linkage

Which of the following explicit measures does not help to filter association rules?

false

Complete linkage is a measure of calculating dissimilarity between clusters by considering only the two closest observations in the two clusters.

true

Complete linkage is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

cereal

Identify the antecedent of the following association rule: "if {cereal}, then {milk}."

whipped cream

Identify the consequent of the following association rule: "if {jello, pudding}, then {whipped cream}."

the cause of the outliers

If a model's implications depend on the inclusion or exclusion of outliers, then one should spend additional time to track down

the hypotenuse

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two objects of a cluster?

10

If the lift ratio of an association rule is 0.75, and if the confidence value is 0.45 and the support of the consequent has a value of 6, what is the total number of transactions? [0.75 = 0.45 / (6/x), so x = 0.75 × 6 / 0.45 = 10]

true

If the support of the consequent is high, the confidence of the association rule could be high even if there is little or no association between the items.

missing not at random

In a data set, data may be missing for several reasons. If the reason that the values are missing is related to the value of the variable then the missing data is said to be

set of recorded values of variables associated with a single entity.

Observation refers to the

0.5

Platinum Gym has 10,000 gym members, of which 1,500 memberships include Unlimited Fitness Training and use of the tanning salon, and 750 of those also include Unlimited Hydromassage. If Fitness Training is considered A, use of the tanning salon is considered B, and Hydromassage is considered C, then the association rule for these sales becomes "If A and B are purchased, then C is also purchased." Calculate the confidence level. Confidence = support count of {A, B, C} / support count of {A, B} = 750 / 1,500 = 0.5 = 50%.

1.67

Platinum Gym has 10,000 gym members, of which 1,500 memberships include Unlimited Fitness Training and use of the tanning salon, and 750 of those also include Unlimited Hydromassage. If Fitness Training is considered A, use of the tanning salon is considered B, and Hydromassage is considered C, then the association rule for these sales becomes "If A and B are purchased, then C is also purchased." Given total transactions for C = 3,000, calculate the lift for this rule.
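
The Platinum Gym numbers can be worked through as follows. This sketch assumes the 750 Hydromassage memberships are a subset of the 1,500 that include A and B, which is how the confidence calculation reads; the variable names are made up.

```python
total_members = 10_000
count_a_and_b = 1_500     # memberships with Fitness Training and tanning
count_a_b_and_c = 750     # assumed: of those, memberships that also include Hydromassage
count_c = 3_000           # all memberships that include Hydromassage

# Confidence of "if A and B, then C".
confidence = count_a_b_and_c / count_a_and_b

# Lift compares confidence with how often C occurs overall (3,000 / 10,000 = 0.3).
lift = confidence / (count_c / total_members)

print(confidence)      # 0.5
print(round(lift, 2))  # 1.67
```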

most similar

Single linkage is a measure of calculating the distance between two clusters by considering only the two _______________ observations between the two clusters.
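
A short sketch contrasting single linkage with complete linkage on two made-up clusters. It assumes Euclidean distance as the dissimilarity measure and uses `math.dist` (Python 3.8 or later); the function names are hypothetical.

```python
import math
from itertools import product

def single_linkage(c1, c2):
    """Distance between the two closest observations, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(c1, c2))

def complete_linkage(c1, c2):
    """Distance between the two farthest observations, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(c1, c2))

cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (6, 0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0, from (0, 0) to (3, 0)
print(complete_linkage(cluster_a, cluster_b))  # sqrt(37), from (0, 1) to (6, 0)
```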

001

Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"?
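
The encoding can be sketched as one 0-1 dummy variable per option, in the order given. The helper name `dummy_encode` is an assumption for illustration.

```python
options = ["hear account information", "billing questions", "customer service"]

def dummy_encode(choice, options):
    """One 0-1 digit per option: 1 for the chosen option, 0 otherwise."""
    return "".join("1" if option == choice else "0" for option in options)

code = dummy_encode("customer service", options)
print(code)  # 001
```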

complete linkage

The ____________________ clustering method defines the similarity between two clusters as the similarity of the pair of observations (one from each cluster) that are the most different.

true

The efficiency of an association rule, known as lift, is determined by the ratio of the confidence of an association rule to the benchmark confidence.

false (called the support count)

The number of times that a collection of items occurs together in a transaction data set is known as sampling.

at least 20% of the total transactions

There are infinitely many possible association rules for transaction data. To simplify, we only consider association rules with a support count of

k-means clustering

__________________ assigns each observation to one of k clusters in a manner such that the observations assigned to the same cluster are as similar as possible.
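
A minimal sketch of the idea, assuming Euclidean distance and caller-supplied starting centroids so the result is deterministic (real implementations usually pick starting centroids at random). The function name `k_means` and the data are made up.

```python
import math

def k_means(points, centroids, max_iterations=100):
    """Alternate between assigning points and updating centroids until stable."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignments, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = k_means(points, centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```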

complete linkage

__________________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

hierarchical clustering

__________________ is bottom-up clustering that starts with each observation belonging to its own cluster and then sequentially merges the most similar clusters to create a series of nested clusters.
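
The bottom-up process can be sketched as below. This version assumes single linkage with Euclidean distance as the merge criterion, which is only one of several options; the function name `agglomerate` and the points are hypothetical.

```python
import math
from itertools import combinations, product

def agglomerate(points):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [[p] for p in points]  # every observation starts in its own cluster
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose closest members are nearest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                math.dist(p, q)
                for p, q in product(clusters[pair[0]], clusters[pair[1]])
            ),
        )
        merged = clusters[i] + clusters[j]
        history.append(merged)
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

history = agglomerate([(0, 0), (0, 1), (5, 5), (5, 7)])
for step in history:
    print(step)
```

Each entry in `history` is a cluster built from earlier ones, which is what makes the result a series of nested clusters.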

market basket analysis

________________________ analyzes items frequently co-occurring in transactions (such as purchases).
