ISDS 2001 CH. 4
subsets
Apriori algorithm finds ______ that are common to at least a minimum number of the item sets
bottom-up
Apriori algorithm uses a _____-___ approach
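The bottom-up idea in the two cards above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `apriori`, the `min_support` count threshold, and the sample baskets are all assumptions made for the example. It starts with single items, keeps those found in at least `min_support` transactions, and then grows larger itemsets only from smaller ones that were already frequent.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are the single items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Bottom-up step: join frequent k-itemsets into (k+1)-item candidates,
        # keeping only those whose every k-item subset is itself frequent.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "milk"},
    {"diapers", "milk"},
]
result = apriori(baskets, min_support=2)
```

With these sample baskets, `chips` is dropped at the first level, so no larger itemset containing it is ever counted.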
ratio
____ data include measurement variables commonly found in the physical sciences and engineering EX: mass, length, time, salary data
data mining
_____ ____ tools are used to identify customer buying patterns
data mining
_____ _____ seeks to identify four major types of patterns: 1. Associations 2. predictions 3. clusters 4. sequential relationships
decision trees
_____ ______ partition data by branching along different attributes so that each leaf node has all the patterns of one class
cluster
_____ analysis for data mining places customers into groups having very similar characteristics
ordinal
_____ data contain codes assigned to objects or events as labels that also represent a rank order EX: credit score as low, medium, or high
nominal
_____ data contain measurements of simple codes assigned to objects as labels which are not measurements EX: marital status (single, married, divorced)
categorical
_____ data represent the label of multiple classes used to divide a variable into specific groups EX: race, sex, age group
data
_____ is the most critical ingredient for data mining which may include soft/unstructured components
decision trees
______ _____ are an analysis procedure which classifies observations into distinct groups based upon the values of predictor/input variables
association rule
______ _____ mining finds interesting relationships between variables
association rules
______ _______ are based on their support and confidence measures
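Support and confidence can be computed directly from a list of transactions. The sketch below assumes the usual definitions: support is the fraction of transactions containing an itemset, and confidence of a rule X -> Y is support(X and Y) divided by support(X). The helper names and the sample baskets are made up for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "milk"},
    {"diapers", "milk"},
]
rule_support = support(baskets, {"beer", "diapers"})         # 2 of 4 baskets
rule_confidence = confidence(baskets, {"beer"}, {"diapers"})  # 2 of the 3 beer baskets
```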
cluster
______ algorithms are used when the data records do not have predefined class identifiers
interval
______ data are variables that can be measured on interval scales EX: temperature
numerical
______ data represent the numeric values of specific variables
classification
______ methods learn from previous examples containing inputs and the resulting class labels
CRISP-DM
______-___ is the most comprehensive, common, and standardized data mining process
market segmentation
_______ ______ is an analysis that aids in dividing customers into groups based upon demographics so that you can target those groups with different advertising campaigns
association rule
_______ ______ mining is a popular data mining method that is commonly used as an example to explain what data mining is and what it can do to a less technologically savvy audience
Apriori
_______ algorithm is the most common for association rule mining
parallel
_______ processing is sometimes used for data mining because of the massive data amounts and search efforts involved
classification
________ is perhaps the most frequently used data mining method for real-world problems
CRISP-DM
________ provides a systematic and orderly way to conduct data mining projects
preparation
a critical task to the DM process is data ______
project management
a data mining process must follow a systematic ________ ______ process to be successful
classification
a number of different algorithms (ID3, C4.5, C5, CART, and SPRINT) are commonly used for ________
classification
assessment methods for _______: 1. predictive accuracy 2. speed 3. robustness 4. scalability 5. Interpretability
affinity
association rule mining finds an ____ of two products to be commonly together in a shopping cart
market basket
association rule mining is also known as _______ _____ analysis, which helps understand the purchase behavior of a buyer in the retail business
data mining
association rule mining is often used as an example to describe _____ ______ to ordinary people
maximum, minimum
cluster analysis creates groups that have ______ similarity among members within each group and ______ similarity among members across the groups
natural
cluster analysis for data mining is used for automatic identification of ______ groupings of things
fraud
cluster analysis has been used extensively for ______ detection and market segmentation of customers in CRM
segmentation
cluster analysis is also known as ______
relationship management
customer _______ _____ extends traditional marketing by creating one-on-one relationships with customers
preprocessing
data ________ contains four major steps: 1. data consolidation 2. data cleansing 3. data transformation 4. data reduction
flat
data mining can use simple ____ files as data sources or it can be performed on data in the data warehouses
nontrivial
data mining is a ______ process
intelligence
data mining is a way to develop ______ from data that an organization collects, organizes, and stores
knowledge discovery
data mining is also known as _____ ______
intersection
data mining is at the _______ of many disciplines, including statistics, artificial intelligence, and mathematical modeling
hidden
data mining tools use mathematical techniques for extracting ______ patterns for predictive purposes
mathematical
data mining tools use patterns in data to develop ______ rules for predicting outcomes for future observations
private and personal
data that is collected, stored, and analyzed in data mining is often ____ and _______
cluster
example of a ______ pattern: - Market segmentation of customers
data mining
general recognition of the untapped value hidden in large data resources has recently increased the popularity of ____ ______
potentially useful
if the data is _____ ______, then results should lead to some business benefit
novel
if the data is _____, then previously unknown patterns are discovered
valid
if the data is _____, then the discovered patterns hold true on new data
knowledge mining
if using a mining analogy, "______ ______" would be more appropriate than "data mining"
logistics
in retailing and logistics applications, data mining can optimize ______ by predicting seasonal effects
promotions
in retailing and logistics applications, data mining improves the store layout and sales ________
shelf life
in retailing and logistics applications, data mining minimizes losses due to limited ____ ____
million, billion
in statistics, a few hundred/thousand data points are large enough while in data mining, several _____ to a few ______ are large enough for studies
claim
in the insurance application, data mining can identify and prevent fraudulent ____ activities
numerical
interval and ratio data are classified under ______ data
crystal-ball
one myth of data mining is that it provides instant solutions and ____-___ predictions
de-identification
one way to protect privacy and individuals' rights when data mining is by __-_______ of customer records prior to applying data mining applications
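As a rough sketch of the step described in the card above, the snippet below strips identifying fields from customer records before any analysis is run. The field names in `IDENTIFIER_FIELDS`, the `de_identify` helper, and the sample record are hypothetical; a real dataset would need its own list of identifying fields.

```python
# Hypothetical identifier fields, chosen only for this example.
IDENTIFIER_FIELDS = {"name", "email", "phone"}

def de_identify(records, identifier_fields=IDENTIFIER_FIELDS):
    """Return copies of the records with the identifier fields removed."""
    return [
        {key: value for key, value in record.items() if key not in identifier_fields}
        for record in records
    ]

# Invented sample record for illustration.
customers = [
    {"name": "A. Example", "email": "a@example.com",
     "age_group": "25-34", "total_spend": 120.0},
]
cleaned = de_identify(customers)
```

The original records are left unchanged; only the copies passed on to the mining step lose the identifying fields.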
categorical
ordinal and nominal data are classified under _______ data
sequential relationships
patterns called _______ ________ discover time-ordered events, such as predicting that an existing banking customer will open a savings account followed by an investment account within a year
clusters
patterns called _______ identify natural groupings of things based on their own characteristics
predictions
patterns called _______ tell the nature of further occurrences of certain events based on what has happened in the past
associations
patterns called _______ tell you what products your customers are most likely to purchase at the same time
associations
patterns called ________ find the commonly co-occurring groupings of things, such as beer and diapers bought together in market basket analysis
classification
popular _____ tasks: -credit approval -store location -target marketing -fraud detection
numeric
prediction problems where the variables have _____ values are most accurately described as regressions
data mining
process through which previously unknown patterns in data are discovered
existing
statistics collects sample data to test the hypothesis while data mining uses ______ data to discover novel patterns and relationships
big
statistics looks for right sized data while data mining looks for data sets as _____ as possible
loosely
statistics start with well-defined propositions while data mining's propositions are ______ defined
CRISP-DM
steps of the _____-__ process: 1. Business understanding 2. Data understanding 3. Data preparation 4. Model building 5. Testing and evaluation 6. Deployment
creative
striking it rich in data mining requires _______ thinking
data mining
the ____ ______ process identifies valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases
CRISP-DM
the _____-___ process is the most comprehensive, highly repetitive and experimental
robustness
the ______ assessment for classification identifies the model's ability to overcome noisy data to make somewhat accurate predictions
scalability
the _______ assessment for classification identifies the model's ability to construct a prediction model efficiently, given a large amount of data
plummeted
the cost of data storage has _____ recently, making data mining feasible for more firms
end user
the miner of data is often a(n) ___ _____
distance measure
the most commonly used similarity measure in cluster analysis is a _____ ______
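One common choice of distance measure is Euclidean distance, which is assumed here purely for illustration; the card itself does not name a specific measure. The sketch treats each customer as a tuple of numeric attributes, and the function name and sample points are invented for the example.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers described by two numeric attributes.
customer_a = (3.0, 4.0)
customer_b = (0.0, 0.0)
d = euclidean_distance(customer_a, customer_b)
```

A smaller distance means two records are more alike, so a clustering procedure would tend to place them in the same group.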
free
the most popular ____ data mining tools are Weka and RapidMiner
commercial
the most popular _______ data mining tools are SPSS, PASW, and SAS Enterprise Miner
commercial
the number of users of free/open source data mining software now exceeds that of users of _____ software versions
data
the understanding of customers comes primarily from analyzing the vast ____ amounts routinely collected
identifiers
third party providers of publicly available datasets protect the anonymity of the individuals in the data set primarily by removing _______
prediction, clustering, and association
three broad categories of data mining tasks are: _______, ________, and ______
customers
understanding _______ better has helped Amazon and others become more successful
value, retention
CRM maximizes customer ______ and ______
marketing
CRM maximizes return on ________ campaigns