Ch. 4: Data Mining

types of patterns

-association -prediction -cluster -sequential

data preprocessing steps (4)

1. data consolidation 2. data cleaning 3. data transformation 4. data reduction

estimation methodologies

1. simple split 2. k-fold 3. leave-one-out 4. bootstrapping 5. jackknifing 6. area under ROC curve
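
A minimal Python sketch of how k-fold cross-validation partitions the data, assuming records are addressed by index. The function name `k_fold_indices` is hypothetical and only for illustration. Setting k equal to the number of records gives leave-one-out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Each record lands in the test set of exactly one fold.
folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print("test:", test_idx, "train size:", len(train_idx))
```

The model would be trained k times, once per fold, and the k test-set scores averaged.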

In data mining, classification models help in prediction. Select one: True False

TRUE

What is data

a collection of facts usually obtained as the result of experiences, observations, or experiments

what is a pattern

a mathematical relationship among data items

What is the difference between classification and regression

classification: the predicted output is a class label; regression: the predicted output is a numeric value

normalizing data, discretizing or aggregating data, and constructing new attributes are examples of what process

data transformation
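
Two of the transformations named in the question, normalization and discretization, can be sketched in Python as follows. Min-max scaling is one common normalization choice (an assumption, the card does not name a method), and the function names, cut points, and labels are made up for illustration.

```python
from bisect import bisect_right

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, cut_points, labels):
    """Map each numeric value to a bin label; cut_points must be sorted,
    and labels must have one more entry than cut_points."""
    return [labels[bisect_right(cut_points, v)] for v in values]

ages = [18, 35, 52, 70]
normalized = min_max_normalize(ages)
bins = discretize(ages, cut_points=[30, 60], labels=["young", "middle", "senior"])
print(normalized)
print(bins)  # ['young', 'middle', 'middle', 'senior']
```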

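What steps of the data mining process account for about 85% of total project time

steps 1, 2, and 3 (business understanding, data understanding, data preparation)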

What is data mining

the nontrivial process of identifying valid, novel, potentially useful and understandable patterns in data stored in structured databases

clustering -outlier analysis

unsupervised (k means)

data mining characteristics

-the source of data is often a consolidated DW -the DM environment is usually a client-server or web-based IS -data is the most critical ingredient for DM -the miner is often an end user -creative thinking is needed -because of the large amounts of data, parallel processing might be necessary

data mining applications

-customer relationship management -banking and other financial -retailing and logistics -manufacturing and maintenance -brokerage and securities trading -insurance -computer hardware and software -science and engineering

assessment methods for classification

-predictive accuracy (hit rate) -speed (model building, predicting) -robustness -scalability -interpretability (understanding and insight provided by the model)

data mining process (6)

1. business understanding 2. data understanding 3. data preparation 4. model building 5. testing and evaluation 6. deployment

2 types of data mining

1. hypothesis driven 2. discovery driven

Reasons why data mining is gaining attention

1. more intense competition 2. recognition of value in data sources 3. availability of quality data on customers, vendors... 4. integration of data into data warehouses 5. exponential increase in data processing 6. reduction in cost for hardware, software for data storage 7. movement toward the demassification (conversion of info resources into nonphysical form)

What are the most common standard processes for data mining?

CRISP-DM (Cross-Industry Standard Process for Data Mining), SEMMA (sample, explore, modify, model, assess), KDD (knowledge discovery in databases)

(data divided chart)

Data: categorical (nominal & ordinal); numerical (interval & ratio)

Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system. Select one: True False

FALSE

In the Cabela's case study, the SAS/Teradata solution enabled the direct marketer to better identify likely customers and market to them based mostly on external data sources. Select one: True False

FALSE

In the Memphis Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime. Select one: True False

FALSE

Ratio data is a type of categorical data. Select one: True False

FALSE

Statistics and data mining both look for data sets that are as large as possible. Select one: True False

FALSE

The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit. Select one: True False

FALSE

Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales. Select one: True False

FALSE

If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining." Select one: True False

TRUE

Interval data is a type of numerical data. Select one: True False

TRUE

The cost of data storage has plummeted recently, making data mining feasible for more firms. Select one: True False

TRUE

Using data mining on data about imports and exports can help to detect tax avoidance and money laundering. Select one: True False

TRUE

Data may consist of numbers, letters, words, images, and voice recordings.

TRUE

What is the main reason parallel processing is sometimes used for data mining? Select one: a. because any strategic application requires parallel processing b. because most of the algorithms used for data mining require it c. because of the massive data amounts and search efforts involved d. because the hardware exists in most organizations and it is available to use

c. because of the massive data amounts and search efforts involved

In the Cabela's case study, what types of models helped the company understand the value of customers, using a five-point scale? Select one: a. simulation and geographical models b. reporting and association models c. clustering and association models d. simulation and regression models

c. clustering and association models

What is the difference between classification and clustering?

classification: supervised clustering: unsupervised

in classification problems, the primary source for accuracy estimation is the

confusion matrix
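
A small Python sketch of how a two-class confusion matrix is tallied and how predictive accuracy (hit rate) follows from it. The function name `confusion_counts` and the label vectors are invented for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
# Accuracy is the share of correct predictions (the matrix diagonal).
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(counts)    # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
print(accuracy)  # 0.75
```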

what is ordinal data

contains codes assigned to objects that also represent their rank order, e.g., 1 = low, 2 = medium, 3 = high

What is nominal data

contains simple codes assigned to objects as labels, e.g., 1 = single, 2 = married, 3 = divorced

Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from Select one: a. asking the customers what they want. b. developing a philosophy that is data analytics-centric. c. collecting data about customers and transactions. d. analyzing the vast data amounts routinely collected.

d. analyzing the vast data amounts routinely collected.

All of the following statements about data mining are true EXCEPT Select one: a. the valid aspect means that the discovered patterns should hold true on new data. b. the potentially useful aspect means that results should lead to some business benefit. c. the novel aspect means that previously unknown patterns are discovered. d. the process aspect means that data mining should be a one-step process to results.

d. the process aspect means that data mining should be a one-step process to results.

imputing missing values, reducing noise in data, and eliminating inconsistencies are examples of what process

data cleaning
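
One simple way to fill in missing values is to replace them with the mean of the observed values. The Python sketch below assumes missing entries are represented as `None`; mean imputation is only one of several possible strategies, and the function name `impute_mean` is hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

cleaned = impute_mean([4.0, None, 6.0, 8.0, None])
print(cleaned)  # [4.0, 6.0, 6.0, 8.0, 6.0]
```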

collecting data, selecting data, and integrating data are examples of what process

data consolidation

Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging are all alternative names for _________________

data mining

how does data mining work?

data mining extracts patterns from data

Data preparation, the third step in the CRISP-DM data mining process, is more commonly known as

data preprocessing

reducing the number of variables, reducing the number of cases, and balancing skewed data are examples of what process

data reduction
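
Balancing skewed data can be done in several ways. The Python sketch below shows random undersampling, which trims every class down to the size of the smallest one. This technique is an assumption chosen for illustration (the card does not name a method), and the function name and records are made up.

```python
import random
from collections import defaultdict

def undersample(records, label_of, seed=0):
    """Randomly keep the same number of records from every class."""
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the sample repeatable
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced

# 2 "fraud" records versus 8 "ok" records: a skewed class distribution.
records = [(i, "fraud") for i in range(2)] + [(i, "ok") for i in range(2, 10)]
balanced = undersample(records, label_of=lambda r: r[1])
print(balanced)
```

Undersampling also reduces the number of cases, so it touches two of the three reduction tasks named in the question.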

Data are often buried deep within very large ___________________ , which sometimes contain data from several years.

databases

What is the most commonly used similarity measure in cluster analysis

distance measure
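
The card does not say which distance measure is meant. Euclidean and Manhattan distance are two widely used choices, sketched here in Python for points given as equal-length tuples. The function names are illustrative.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

The smaller the distance between two records, the more similar they are taken to be.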

What are the most popular application areas for data mining?

healthcare and medicine

What are the most commonly used clustering algorithms?

k means & self organizing maps
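
A bare-bones Python sketch of k-means: assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the centroids stop moving. Passing the starting centroids in explicitly is an assumption made to keep the example deterministic (implementations usually pick them at random), and the data points are invented.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, centroids, max_iter=100):
    """Return (centroids, clusters) after iterating to convergence."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

No class labels are supplied anywhere, which is what makes the method unsupervised.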

other names for data mining

knowledge extraction, pattern analysis, knowledge discovery, information harvesting, pattern searching, data dredging

what is ratio data

measurement variables commonly found in the sciences and engineering, e.g., mass, length, time, energy

What is interval data

numeric values of variables measured on interval scales, e.g., temperature

cluster algorithms are used when the data records do not have

predefined class identifiers

In the Memphis Police Department case study, shortly after all precincts embraced Blue CRUSH, ________________________________ became one of the most potent weapons in the Memphis police department's crime-fighting arsenal.

predictive analytics

Simple split

split the data into 2 mutually exclusive sets: training = 70%, testing = 30%
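
The 70/30 split above can be sketched in Python as follows. Shuffling before the cut and the fixed seed are assumptions added so the split is random but repeatable, and the function name `simple_split` is hypothetical.

```python
import random

def simple_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records, then cut them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = simple_split(range(10))
print(len(train), len(test))  # 7 3
```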

Prediction -classification -regression

supervised

association rule mining is used to discover

two or more items that go together
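
How strongly items "go together" is commonly quantified with support and confidence. These two metrics are standard in association rule mining but are not named on the card. The Python sketch below uses invented market-basket transactions and illustrative function names.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, the share that also
    contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent <= set(t))
    return both / len(with_antecedent)

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
print(support(transactions, {"bread", "milk"}))       # 0.6
print(confidence(transactions, {"bread"}, {"milk"}))  # 0.75
```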

association -link analysis -sequence analysis

unsupervised (bar code scanners)

