ISDS 2001 CH. 4


subsets

Apriori algorithm finds ______ that are common to at least a minimum number of the itemsets

bottom-up

Apriori algorithm uses a _____-___ approach
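The bottom-up approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `apriori` is hypothetical, and `min_support` is taken as an absolute transaction count. Frequent single items are found first, larger candidate itemsets are built only by joining itemsets that were already frequent, and any candidate with an infrequent subset is pruned before the data is scanned again.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for all frequent itemsets.

    min_support is an absolute count (assumption for this sketch): an
    itemset is frequent if it appears in at least min_support transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Bottom-up step: join frequent (k-1)-itemsets into size-k candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                # Prune: every (k-1)-subset must itself be frequent.
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One pass over the data counts the surviving candidates.
        current = {}
        for cand in candidates:
            c = sum(1 for t in transactions if cand <= t)
            if c >= min_support:
                current[cand] = c
        frequent.update(current)
        k += 1
    return frequent
```

With five small shopping baskets and `min_support=3`, the pair {diapers, beer} survives while {bread, beer} and the triple {bread, milk, diapers} are dropped, which is the pruning that keeps the search manageable.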

ratio

____ data include measurement variables commonly found in the physical sciences and engineering EX: mass, length, time, salary data

data mining

_____ ____ tools are used to identify customer buying patterns

data mining

_____ _____ seeks to identify four major types of patterns: 1. Associations 2. predictions 3. clusters 4. sequential relationships

decision trees

_____ ______ partition data by branching along different attributes so that each leaf node has all the patterns of one class

cluster

_____ analysis for data mining places customers into groups having very similar characteristics

ordinal

_____ data contain codes assigned to objects or events as labels that also represent a rank order EX: credit score as low, medium, or high

nominal

_____ data contain simple codes assigned to objects as labels, which are not measurements EX: marital status (single, married, divorced)
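The difference between ordinal and nominal data matters when the data is prepared for mining. A minimal sketch, assuming integer codes for ordinal values and one-hot encoding for nominal values (a common preprocessing choice, not something this set prescribes); the names `CREDIT_SCORE_ORDER`, `encode_ordinal`, and `encode_nominal` are hypothetical.

```python
# Ordinal: the codes carry a rank order, so they can map to ordered integers.
CREDIT_SCORE_ORDER = {"low": 0, "medium": 1, "high": 2}

def encode_ordinal(values, order=CREDIT_SCORE_ORDER):
    """Replace each ordinal label with its rank."""
    return [order[v] for v in values]

# Nominal: the codes are labels only, so one-hot encoding avoids
# implying an order (married is not "greater than" single).
def encode_nominal(values):
    """Return (categories, rows) where each row is a 0/1 indicator vector."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows
```

For example, `encode_ordinal(["high", "low", "medium"])` gives `[2, 0, 1]`, while marital status becomes separate indicator columns for divorced, married, and single.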

categorical

_____ data represent the labels of multiple classes used to divide a variable into specific groups EX: race, sex, age group

data

_____ is the most critical ingredient for data mining, and it may include soft/unstructured components

decision trees

______ _____ are an analysis procedure which classifies observations into distinct groups based upon the values of predictor/input variables
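A decision tree grows by repeatedly choosing the attribute whose split best separates the classes. ID3, one of the algorithms named later in this set, scores candidate splits by information gain (the drop in entropy). The sketch below shows only that split criterion, not a full tree builder; the function names and the toy `income`/`student` attributes are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting rows (dicts) on one attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels, attributes):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

A full tree would apply `best_split` recursively to each branch until every leaf holds patterns of a single class.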

association rule

______ _____ mining finds interesting relationships between variables

association rules

______ _______ are based on their support and confidence measures
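Support and confidence can be computed directly from a list of baskets. A minimal sketch, assuming each transaction is a set of item names; the function names are hypothetical. Support is the fraction of transactions containing an itemset, and the confidence of a rule A -> C is support(A ∪ C) / support(A).

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent = support(A ∪ C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)
```

In the four baskets `{beer, diapers}`, `{beer, diapers, chips}`, `{diapers, milk}`, `{beer, chips}`, the rule diapers -> beer has support 0.5 and confidence 2/3, while chips -> beer has confidence 1.0.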

cluster

______ algorithms are used when the data records do not have predefined class identifiers

interval

______ data are variables that can be measured on interval scales EX: temperature

numerical

______ data represent the numeric values of specific variables

classification

______ methods learn from previous examples containing inputs and the resulting class labels

CRISP-DM

______-___ is the most comprehensive, common, and standardized data mining process

market segmentation

_______ ______ is an analysis that aids in dividing customers into groups based upon demographics so that you can target those groups with different advertising campaigns

association rule

_______ ______ mining is a popular data mining method that is commonly used as an example to explain what data mining is and what it can do to a less technologically savvy audience

Apriori

_______ algorithm is the most common for association rule mining

parallel

_______ processing is sometimes used for data mining because of the massive data amounts and search efforts involved

classification

________ is perhaps the most frequently used data mining method for real-world problems

CRISP-DM

________ provides a systematic and orderly way to conduct data mining projects

preparation

a critical task in the DM process is data ______

project management

a data mining process must follow a systematic ________ ______ process to be successful

classification

a number of different algorithms (ID3, C4.5, C5, CART, and SPRINT) are commonly used for ________

classification

assessment methods for _______: 1. predictive accuracy 2. speed 3. robustness 4. scalability 5. Interpretability
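Predictive accuracy, the first assessment method listed, is usually measured on cases the model did not train on. A minimal sketch, assuming the actual and predicted class labels for a holdout set are already available; the function name `predictive_accuracy` is hypothetical.

```python
def predictive_accuracy(actual, predicted):
    """Share of holdout cases whose predicted class matches the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)
```

For a credit-approval model that gets two of four holdout applicants right, `predictive_accuracy` returns 0.5.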

affinity

association rule mining finds an ____ between two products that commonly appear together in a shopping cart

market basket

association rule mining is also known as _______ _____ analysis - helps understand the purchase behavior of a buyer in the retail business

data mining

association rule mining is often used as an example to describe _____ ______ to ordinary people

maximum, minimum

cluster analysis creates groups that have ______ similarity among members within each group and ______ similarity among members across the groups
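k-means is one widely used clustering algorithm; it is not named in this set and is swapped in here only for illustration. It pursues the within-group goal directly: each point joins its nearest center, then each center moves to the mean of its members. A minimal sketch for numeric points, assuming a fixed number of iterations and random starting centers; the function name `kmeans` is hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group numeric points into k clusters; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments
```

On two well-separated groups of customers described by two numeric features, the three points near (1, 1) end up in one cluster and the three near (8.5, 8.5) in the other.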

natural

cluster analysis for data mining is used for automatic identification of ______ groupings of things

fraud

cluster analysis has been used extensively for ______ detection and market segmentation of customers in CRM

segmentation

cluster analysis is also known as ______

relationship management

customer _______ _____ extends traditional marketing by creating one-on-one relationships with customers

preprocessing

data ________ contains four major steps: 1. data consolidation 2. data cleansing 3. data transformation 4. data reduction
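The four preprocessing steps can be illustrated on a handful of customer records. This is a minimal sketch under assumptions: records are Python dicts, and the specific techniques (exact-duplicate removal, mean imputation, min-max normalization, column selection) are example choices, not the only ways to carry out each step. All function and field names are hypothetical.

```python
def consolidate(*sources):
    """1. Data consolidation: combine records from several sources."""
    return [dict(r) for source in sources for r in source]

def cleanse(records, field):
    """2. Data cleansing: drop exact duplicates, then fill missing values
    of `field` with the mean of the observed values."""
    unique, seen = [], set()
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    observed = [r[field] for r in unique if r.get(field) is not None]
    mean = sum(observed) / len(observed)
    for r in unique:
        if r.get(field) is None:
            r[field] = mean
    return unique

def transform(records, field):
    """3. Data transformation: min-max normalize `field` into [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    for r in records:
        r[field] = (r[field] - lo) / span
    return records

def reduce_columns(records, keep):
    """4. Data reduction: keep only the attributes relevant to the study."""
    return [{k: r[k] for k in keep} for r in records]
```

Chained as `reduce_columns(transform(cleanse(consolidate(a, b), "income"), "income"), ["id", "income"])`, the steps run in the same order as the list above.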

flat

data mining can use simple ____ files as data sources or it can be performed on data in the data warehouses

nontrivial

data mining is a ______ process

intelligence

data mining is a way to develop ______ from data that an organization collects, organizes, and stores

knowledge discovery

data mining is also known as _____ ______

intersection

data mining is at the _______ of many disciplines, including statistics, artificial intelligence, and mathematical modeling

hidden

data mining tools use mathematical techniques for extracting ______ patterns for predictive purposes

mathematical

data mining tools use patterns in data to develop ______ rules for predicting outcomes for future observations

private and personal

data that is collected, stored, and analyzed in data mining is often ____ and _______

cluster

example of a ______ pattern: - Market segmentation of customers

data mining

general recognition of the untapped value hidden in large data resources has recently increased the popularity of ____ ______

potentially useful

if the discovered patterns are _____ ______, then they should lead to some business benefit

novel

if the discovered patterns are _____, then they were previously unknown

valid

if the discovered patterns are _____, then they hold true on new data

knowledge mining

if using a mining analogy, "______ ______" would be more appropriate than "data mining"

logistics

in retailing and logistics applications, data mining can optimize ______ by predicting seasonal effects

promotions

in retailing and logistics applications, data mining improves the store layout and sales ________

shelf life

in retailing and logistics applications, data mining minimizes losses due to limited ____ ____

million, billion

in statistics, a few hundred or thousand data points are large enough, while in data mining, several _____ to a few ______ data points are needed for studies

claim

in insurance applications, data mining can identify and prevent fraudulent ____ activities

numerical

interval and ratio data are classified under ______ data

crystal-ball

one myth of data mining is that it provides instant solutions and ____-___ predictions

de-identification

one way to protect privacy and individuals' rights when data mining is __-_______ of customer records prior to applying data mining applications
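The simplest form of de-identification drops direct identifier fields before the records reach the mining step. A minimal sketch, assuming records are dicts; the field list `IDENTIFIERS` and the function name `deidentify` are hypothetical, and real projects would decide which fields count as identifiers for their own data.

```python
# Hypothetical set of direct identifier fields for this example.
IDENTIFIERS = {"name", "ssn", "email", "phone"}

def deidentify(records, identifiers=IDENTIFIERS):
    """Return copies of records with direct identifier fields removed."""
    return [{k: v for k, v in r.items() if k not in identifiers} for r in records]
```

The original records are left untouched, so only the stripped copies need to be handed to the data mining application.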

categorical

ordinal and nominal data are classified under _______ data

sequential relationships

patterns called _______ ________ discover time-ordered events, such as predicting that an existing banking customer will open a savings account followed by an investment account within a year
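The banking example can be expressed as a check on one customer's time-ordered events. This only tests a single, already-known pattern; sequence mining algorithms discover such patterns across many customers, which this sketch does not attempt. The function name `follows_within` and the event names are hypothetical.

```python
from datetime import date, timedelta

def follows_within(events, first, then, window=timedelta(days=365)):
    """True if a `then` event occurs after a `first` event within `window`.

    events: list of (event_name, date) pairs for a single customer.
    """
    starts = [d for name, d in events if name == first]
    ends = [d for name, d in events if name == then]
    return any(s < e <= s + window for s in starts for e in ends)
```

A customer who opens a savings account on 10 January 2023 and an investment account on 1 September 2023 matches the pattern; the same investment account opened on 1 March 2024 falls outside the one-year window.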

clusters

patterns called _______ identify natural groupings of things based on their own characteristics

predictions

patterns called _______ tell the nature of further occurrences of certain events based on what has happened in the past

associations

patterns called _______ tell you what products your customers are most likely to purchase at the same time

associations

patterns called ________ find the commonly co-occurring groupings of things, such as beer and diapers bought together in market basket analysis

classification

popular _____ tasks: -credit approval -store location -target marketing -fraud detection

numeric

prediction problems where the variables have _____ values are most accurately described as regressions
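A regression predicts a numeric value rather than a class label. The set does not name a specific regression method, so this sketch swaps in simple linear regression fitted by ordinary least squares with a single predictor; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x), both left unnormalized.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

For points lying exactly on y = 1 + 2x, such as xs = [1, 2, 3, 4] and ys = [3, 5, 7, 9], the fitted intercept is 1 and the slope is 2.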

data mining

_____ ______ is the process through which previously unknown patterns in data are discovered

existing

statistics collects sample data to test the hypothesis while data mining uses ______ data to discover novel patterns and relationships

big

statistics looks for right sized data while data mining looks for data sets as _____ as possible

loosely

statistics starts with well-defined propositions while data mining's propositions are ______ defined

CRISP-DM

steps of the _____-__ process: 1. Business understanding 2. Data understanding 3. Data preparation 4. Model building 5. Testing and evaluation 6. Deployment

creative

striking it rich in data mining requires _______ thinking

data mining

the ____ ______ process identifies valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases

CRISP-DM

the _____-___ process is the most comprehensive and is highly repetitive and experimental

robustness

the ______ assessment for classification identifies the model's ability to overcome noisy data to make somewhat accurate predictions

scalability

the _______ assessment for classification identifies the model's ability to construct a prediction model efficiently, given a large amount of data

plummeted

the cost of data storage has _____ recently, making data mining feasible for more firms

end user

the miner of data is often a(n) ___ _____

distance measure

the most commonly used similarity measure in cluster analysis is a _____ ______
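Two common distance measures are Euclidean (straight-line) and Manhattan (sum of absolute differences) distance; the smaller the distance between two records, the more similar they are treated. A minimal sketch, assuming each record is a sequence of numeric features of equal length; the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute feature differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5.0 and the Manhattan distance is 7. Because features measured on large scales dominate either sum, numeric attributes are often normalized before distances are computed.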

free

the most popular ____ data mining tools are Weka and RapidMiner

commercial

the most popular _______ data mining tools are SPSS, PASW, and SAS Enterprise Miner

commercial

the number of users of free/open source data mining software now exceeds that of users of _____ software versions

data

the understanding of customers comes primarily from analyzing the vast ____ amounts routinely collected

identifiers

third party providers of publicly available datasets protect the anonymity of the individuals in the data set primarily by removing _______

prediction, clustering, and association

three broad categories of data mining tasks are: _______, ________, and ______

customers

understanding _______ better has helped Amazon and others become more successful

value, retention

CRM maximizes customer ______ and ______

marketing

CRM maximizes return on ________ campaigns

