ISDS 2001 Test 4

Decision Tree Analysis

(a machine-learning technique) is arguably the most popular classification technique in the data mining arena.

Data Mining Characteristics and Objectives

Data is the most critical ingredient for data mining (DM), and it may include soft/unstructured data.

Categorical Data

Data that represents the labels of multiple classes used to divide a variable into specific groups.

analytical decision making

In an article in Harvard Business Review, Thomas Davenport (2006) argued that the latest strategic weapon for companies is ________. A) customer relationship management B) e-commerce C) online auctions D) analytical decision making

nontrivial

The __________ process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.

Apriori Algorithm

The most commonly used algorithm to discover association rules by recursively identifying frequent itemsets.
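A minimal sketch of the idea behind Apriori, written level by level rather than recursively. The function name `apriori`, the sample baskets, and the use of a raw count for `min_support` are illustrative assumptions, not taken from the course material.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Build size-k candidates by joining frequent (k-1)-itemsets.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market baskets, for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

The pruning step is what keeps the search manageable: an itemset can only be frequent if all of its smaller subsets are frequent, so most candidates are discarded before any counting is done.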

Data Mining

The nontrivial process of identifying (1)valid, (2)novel, (3)potentially useful, and (4) ultimately understandable patterns in data stored in structured databases.

True

True or False: if using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining"

True

True or False: the cost of data storage has plummeted recently, making data mining feasible for more firms

analyzing the vast data amounts routinely collected.

Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from A) collecting data about customers and transactions. B) developing a philosophy that is data analytics-centric. C) analyzing the vast data amounts routinely collected. D) asking the customers what they want.

All of the above

Why has data mining gained the attention of the business world? A) More intense competition at the global scale driven by customers' ever-changing needs and wants in an increasingly saturated marketplace. B) Consolidation and integration of database records, which enables a single view of customers and vendors. C) Significant reduction in the cost of hardware and software for data storage and processing. D) All of the above

not new

although the term data mining is relatively new, the ideas behind it are

Data Mining

availability of quality data on customers, vendors, transactions, Web, etc.

Data Mining

become an imperative and common practice for a vast majority of organizations

Ordinal Data

codes assigned as rank order. Ex: credit score as (1) low, (2) medium, or (3) high

Data Mining

is a term used to describe discovering or "mining" knowledge from large amounts of data

Data Mining

is relatively new, the ideas behind it are not new.

Data Mining

is the process through which previously unknown patterns in data were discovered

Data Mining

more intense competition at the global scale

Regression

prediction problems where the variables have numeric values

Nontrivial Process

process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases

Data Mining Myths

Data mining: provides instant solutions and crystal-ball predictions; is not yet viable for business applications; requires a separate, dedicated database; can only be done by those with advanced degrees; is only for large firms that have lots of customer data; is another name for good old statistics.

Predictions

tell the nature of future occurrences of certain events based on what has happened in the past, such as predicting the winner of the Super Bowl (classification) or forecasting the absolute temperature of a given day (regression)

Valid

the discovered patterns should hold true on new data

Structured Database

A database in which data is organized in records structured by categorical, ordinal, and continuous variables. Data mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in such databases.

False

true or false: data mining provides instant solutions and crystal ball predictions

Data mining tools- facts

- use mathematical techniques for extracting hidden patterns for predictive purposes - use patterns in data to develop mathematical rules for predicting outcomes for future observations - are used to identify customer buying patterns

Data Mining Tools Facts

- use mathematical techniques for extracting hidden patterns for predictive purposes. - use patterns in data to develop mathematical rules for predicting outcomes for future observations. - are commonly used to identify customer buying patterns to increase sales and for fraud detection, among other things. - In data mining, classification models help in prediction. - A data mining study is specific to addressing a well-defined business task, and different business tasks require different sets of data.

other names for data mining

-knowledge discovery -knowledge extraction -pattern analysis

Why has data mining only recently gained the attention of the business world?

1) More intense competition at the global scale. 2) Recognition of the value in data sources. 3) Availability of quality data on customers, vendors, transactions, Web, etc. 4) Consolidation and integration of data repositories into data warehouses (a single location). 5) The exponential increase in data processing and storage capabilities. 6) Decrease in cost.

Data Mining Truths

1. If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining." 2. The cost of data storage has plummeted recently, making data mining feasible for more firms. 3. Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from analyzing the vast data amounts routinely collected. 4. Parallel processing is sometimes used for data mining because of the massive data amounts and search efforts involved 5. The number of users of free/open source data mining software now exceeds that of users of commercial software versions.

Why is Data Mining gaining attention?

1. More intense competition at the global scale. 2. Recognition of the value in data sources. 3. Availability of quality data on customers, vendors, transactions, Web, etc. a. A large portion of "understanding the customer" can come from analyzing the vast amount of data that a company routinely collects. b. This has helped Amazon and many other successful businesses. 4. Consolidation and integration of data repositories into data warehouses (DW). 5. The exponential increase in data processing and storage capabilities; and decrease in cost. a) The cost of data storage has plummeted recently, making data mining feasible for more firms. 6. Movement toward conversion of information resources into nonphysical form.

predictive analytics in law enforcement

1. policing with less 2. new thinking on cold cases 3. the big picture starts small 4. success brings credibility 5. just for the facts 6. safer streets for smarter cities

Data Mining

A ____________ study is specific to addressing a well defined business task, and different business tasks require different sets of data

Association

A category of data mining algorithm that establishes relationships about items that occur together in a given record

Application Case 4.4: Data Mining in Cancer Research What do you think are the promises and major challenges for data miners in contributing to medical and biological research endeavors?

According to the American Cancer Society, half of all men and one-third of all women in the United States will develop cancer during their lifetimes; approximately 1.68 million new cancer cases will be diagnosed in 2017. Cancer is the second most common cause of death in the United States and in the world, exceeded only by cardiovascular disease. Data mining shows tremendous promise for helping to understand cancer, leading to better treatment and saved lives. Data mining is not meant to replace medical professionals and researchers, but to complement their invaluable efforts to provide data-driven new research directions and to ultimately save more lives. Without the cooperation and feedback from the medical experts, data mining results are not of much use. The patterns and relationships found via data mining methods should be evaluated by medical professionals who have years of experience in the problem domain to decide whether they are logical, actionable, and novel to warrant new research directions.

the process aspect means that data mining should be a one-step process to results.

All of the following statements about data mining are true EXCEPT A) the process aspect means that data mining should be a one-step process to results. B) the novel aspect means that previously unknown patterns are discovered. C) the potentially useful aspect means that results should lead to some business benefit. D) the valid aspect means that the discovered patterns should hold true on new data.

Data Mining

Although the term __________ is relatively new, the ideas behind it are not new.

Is data mining a new discipline? Explain.

Although the term data mining is relatively new, the ideas behind it are not. Many of the techniques used in data mining have their roots in traditional statistical analysis and artificial intelligence work done since the early part of the 1980s. New or increased use of data mining applications makes it seem like data mining is a new discipline. In general, data mining seeks to identify four major types of patterns: Associations, Predictions, Clusters and Sequential relationships. These types of patterns have been manually extracted from data by humans for centuries, but the increasing volume of data in modern times has created a need for more automatic approaches. As datasets have grown in size and complexity, direct manual data analysis has increasingly been augmented with indirect, automatic data processing tools that use sophisticated methodologies, methods, and algorithms. The manifestation of such evolution of automated and semiautomated means of processing large datasets is now commonly referred to as data mining.

Knowledge Mining

Another name for data mining

4 Major Types of Patterns of Data Mining

Associations, predictions, clusters, and sequential relationships

Application Case 4.1: Visa Is Enhancing the Customer Experience While Reducing Fraud with Predictive Analytics and Data Mining How did Visa improve customer service while also improving detection of fraud?

By creating more accurate fraud identification systems, Visa was able to decrease the number of false positives and reduce the customer concerns and complaints that went along with them.

Give examples of situations in which classification would be an appropriate data mining technique. Give examples of situations in which regression would be an appropriate data mining technique.

Classification is for prediction that can be based on historical data and relationships, such as predicting the weather, product demand, or a student's success in a university. If what is being predicted is a class label (e.g., "sunny," "rainy," or "cloudy") the prediction problem is called a classification, whereas if it is a numeric value (e.g., temperature such as 68°F), the prediction problem is called a regression.
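The distinction can be sketched in a few lines: the same kind of input can feed a model that returns a class label (classification) or one that returns a number (regression). The nearest-neighbor rule, the straight-line fit, and all of the sample values below are illustrative assumptions, not methods named in the course material.

```python
def nearest_neighbor_label(train, query):
    """Classification: return the label of the training example closest to query."""
    return min(train, key=lambda pair: abs(pair[0] - query))[1]


def fit_line(points):
    """Regression: least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in points)
        / sum((x - mean_x) ** 2 for x, _ in points)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: humidity (%) paired with the observed weather class.
weather_history = [(30, "sunny"), (55, "cloudy"), (85, "rainy")]
predicted_class = nearest_neighbor_label(weather_history, 80)

# Hypothetical history: hours after sunrise paired with temperature in °F.
temperature_history = [(0, 60.0), (1, 62.0), (2, 64.0)]
slope, intercept = fit_line(temperature_history)
predicted_temperature = intercept + slope * 4
```

Here `predicted_class` is a label such as "rainy", while `predicted_temperature` is a numeric value, which is exactly the difference the flashcard describes.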

Data Mining Truths

1. The cost of data storage has plummeted recently, making data mining feasible. 2. Understanding customers comes primarily from analyzing the data. 3. Parallel processing is sometimes used for data mining. 4. The number of users of free/open source data mining software now exceeds that of users of commercial software versions.

Define data mining. Why are there many different names and definitions for data mining?

Data mining is the process through which previously unknown patterns in data were discovered. Another definition would be "a process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and subsequent knowledge from large sets of data." This includes most types of automated data analysis. A third definition: Data mining is the process of finding mathematical patterns from (usually) large sets of data; these can be rules, affinities, correlations, trends, or prediction models. Data mining has many definitions because some software vendors have stretched the term beyond those limits to include most forms of data analysis, in order to increase sales using the popularity of data mining.

Application Case 4.2: Dell Is Staying Agile and Effective with Analytics in the 21st Century What was the challenge Dell was facing that led to their analytics journey?

Dell noticed that customers were spending a significant amount of time evaluating products before they contacted sales. Dell wanted to ensure that this evaluation of products was positive for the company, and wanted to make sure that they were providing accurate information in a format that was expedient for customers. The problem was that the company had a huge variety of information available, and figuring out how to understand that information required additional research.

Data Mining

Discovering or mining knowledge from large amounts of data. It is a process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and subsequent knowledge from large sets of data.

Application Case 4.4: Data Mining in Cancer Research How can data mining be used for ultimately curing illnesses like cancer?

Even though cancer research has traditionally been clinical and biological in nature, in recent years data-driven analytic studies have become a common complement. In medical domains where data- and analytics-driven research have been applied successfully, novel research directions have been identified to further advance the clinical and biological studies. Data mining algorithms that predict cancer survivability with high predictive power are valuable but cannot replace the medical professionals. Using data mining techniques, medical researchers are able to identify novel patterns, paving the road toward a cancer-free society. Data mining methods are capable of extracting patterns and relationships hidden deep in large and complex medical databases.

Clusters (segmentation) identify natural groupings of things based on their known characteristics, such as assigning customers in different segments based on their demographics & past purchase behaviors.

Examples: a) Market segmentation of customers b) Establishing new tax brackets c) Harry Potter - Hogwarts Sorting Hat d) Seating guests at a wedding

What recent factors have increased the popularity of data mining?

Following are some of the most pronounced reasons: • More intense competition at the global scale driven by customers' ever-changing needs and wants in an increasingly saturated marketplace. • General recognition of the untapped value hidden in large data sources. • Consolidation and integration of database records, which enables a single view of customers, vendors, transactions, etc. • Consolidation of databases and other data repositories into a single location in the form of a data warehouse. • The exponential increase in data processing and storage technologies. • Significant reduction in the cost of hardware and software for data storage and processing. • Movement toward the de-massification (conversion of information resources into nonphysical form) of business practices.

What recent factors have increased the popularity of data mining?

General recognition of the untapped value hidden in large data sources

Data Mining

Generally speaking, ____________ is a way to develop intelligence (i.e., actionable information or knowledge) from data that an organization collects, organizes, and stores

Data Mining

Generally speaking, data mining is a way to develop intelligence (i.e., actionable information or knowledge) from data that an organization collects, organizes, and stores. A wide range of data mining techniques are being used by organizations to gain a better understanding of their customers and their own operations and to solve complex organizational problems.

Application Case 4.6: Data Mining Goes to Hollywood: Predicting Financial Success of Movies

Goal: Predicting financial success of Hollywood movies before the start of their production process

Application Case 4.6: Data Mining Goes to Hollywood: Predicting Financial Success of Movies

How: Use of advanced predictive analytics methods

data mining

In ___________, classification models help in prediction

classification models

In data mining, __________ help in prediction.

OPENING VIGNETTE Miami-Dade Police Department Is Using Predictive Analytics to Foresee and Fight Crime Why do law enforcement agencies and departments like Miami-Dade Police Department embrace advanced analytics and data mining?

Law enforcement agencies have embraced advanced analytics and data mining because it allows them to address many of the needs that they have in their departments. Specifically, it allows them to be more efficient in their use of money and resources. They are able to do this by being more selective in the types of activities that they engage in. Additionally, data may be used to help them look at outstanding crimes (cold cases), and to find new avenues that may be explored to solve these previously unsolved crimes.

statistical analysis and artificial intelligence (AI)

Many of the techniques used in data mining have their roots in traditional

How do you think Hollywood did, and perhaps still is performing, this task without the help of data mining tools and techniques?

Most is done by gut feel and trial and error. This may keep the movie business as a financially risky endeavor, but also allows for creativity. Sometimes uncertainty is a good thing.

Clustering

Partitioning a given data set into segments (natural groupings) in which the members of a segment share similar qualities.
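As a rough illustration of partitioning data into natural groupings, here is a one-dimensional k-means sketch. K-means is only one of several clustering methods and is not named in the flashcards; the function name, the starting centers, and the spending figures are assumptions made for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move each center to its cluster mean."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical monthly spend per customer, for illustration only.
monthly_spend = [20, 22, 25, 61, 64, 70]
centers, segments = kmeans_1d(monthly_spend, centers=[20, 70])
```

Unlike classification, no labels are supplied: the two segments emerge only from how similar the values are to one another.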

Categorical Data

Represents the labels of multiple classes used to divide a variable into specific groups. Examples include race, sex, age group, and educational level. Subdivided into nominal and ordinal data.
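A short sketch of why the nominal/ordinal split matters when preparing data. The rank codes follow the credit score example used elsewhere in this set; the one-hot encoding for nominal values is a common practice assumed here, not something the flashcards prescribe.

```python
# Ordinal: the codes carry a rank order, so comparisons are meaningful.
CREDIT_RANK = {"low": 1, "medium": 2, "high": 3}


def is_higher_credit(a, b):
    """True if credit level a ranks above credit level b."""
    return CREDIT_RANK[a] > CREDIT_RANK[b]


# Nominal: the codes are labels only, so each category gets its own 0/1 column
# instead of a number that would imply an order.
MARITAL_STATUS = ["S", "M", "D"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of value."""
    return [1 if value == category else 0 for category in categories]


high_beats_medium = is_higher_credit("high", "medium")
married_encoded = one_hot("M", MARITAL_STATUS)
```

Encoding marital status as 1, 2, 3 would wrongly suggest that "D" is greater than "S", which is why the two kinds of categorical data are treated differently.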

Application Case 4.6: Data Mining Goes to Hollywood: Predicting Financial Success of Movies

Results: promising

Statistical Analysis

Statistical classification techniques include logistic regression and discriminant analysis, both of which make the assumptions that the relationships between the input and output variables are linear in nature, the data is normally distributed, and the variables are not correlated and are independent of each other.

Classification

Supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behavior

Application Case 4.3: A Mine on Terrorist Funding How can data mining be used to fight terrorism?

The application case discusses use of data mining to detect money laundering and other forms of terrorist financing. Using data mining on data about imports and exports or finding an observed price deviation can help to detect tax avoidance/evasion, money laundering, or terrorist financing. Other applications could be to track the behavior and movement of potential terrorists, as well as text mining emails, blogs, and social media threads.

OPENING VIGNETTE Miami-Dade Police Department Is Using Predictive Analytics to Foresee and Fight Crime What are the top challenges for law enforcement agencies and departments like Miami-Dade Police Department? Can you think of other challenges (not mentioned in this case) that can benefit from data mining?

The primary problems were rising crime, impatient city leaders, and budget pressures. Challenges for many agencies revolve around being able to provide the best possible service within a limited budget. This means that agencies must be able to be efficient in their use of time and resources, as well as ensuring that their results are positive. These issues are believed to be consistent across many departments and jurisdictions. In addition, other areas may struggle with specific questions about the use of funds, and the cost-benefit of different types of enforcement or possible prevention programs.

describe the process through which previously unknown patterns in data were discovered

The term data mining was originally used to ________. A) include most forms of data analysis in order to increase sales B) describe the analysis of huge datasets stored in data warehouses C) describe the process through which previously unknown patterns in data were discovered D) All of the above

What do you think about data mining and its implication for privacy? What is the threshold between discovery of knowledge and infringement of privacy?

There is a tradeoff between knowledge discovery and privacy rights. Retailers should be sensitive about this when targeting their advertising based on data mining results, especially regarding topics that could be embarrassing to their customers. Otherwise, they risk offending these customers, which could hurt their bottom line. This news story ran in Forbes and New York Times, among other notable publications.

Data Mining

This has helped Amazon and many other successful businesses

Application Case 4.2: Dell Is Staying Agile and Effective with Analytics in the 21st Century What solution did Dell develop and implement? What were the results?

To solve this problem, Dell created a single data mart that contained information from a wide variety of sources. In the Dell case study, engineers working closely with marketing, used lean software development strategies and numerous technologies to create a highly scalable, singular data mart. This data mart became the singular repository for information that was used for making decisions in the company. This decision had many positive results, including saving significant amounts in operational costs as well as driving increased revenues.

False

True or False: Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.

False

True or False: In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.

False

True or False: Statistics and data mining both look for data sets that are as large as possible.

True

True or False: Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

False

True or False: data mining is another name for the good old statistics

Application Case 4.1: Visa Is Enhancing the Customer Experience While Reducing Fraud with Predictive Analytics and Data Mining What challenges were Visa and the rest of the credit card industry facing?

Visa was facing twin challenges. The focus of the predictive analytics system in the Visa credit card case was two-fold: on more accurately detecting and handling fraudulent claims and reducing customer concerns and complaints that went along with these claims. The first challenge was the growing rates of credit card fraud, while the second challenge was inaccurate fraud identification systems that created customer issues. For example, customers were on a dream vacation or a critical business trip and tried to use their Visa credit card for a large and unexpected purchase of goods or services. This flagged the transaction as possible fraud in the Visa fraud risk tools. Visa would then deny the transaction and freeze the credit card. This is a false positive result. Customers were very unhappy when this happened to them.

instant, business, separate, advanced, large

What are the most common myths about data mining? • Data mining provides _____, crystal-ball predictions. • Data mining is not yet viable for ______ applications. • Data mining requires a ______, dedicated database. • Only those with ______ degrees can do data mining. • Data mining is only for ______ firms that have lots of customer data.

Artificial Intelligence

What does AI stand for?

Data

___ is the most critical ingredient for DM, which may include soft/unstructured data

Data Mining

a large portion of "understanding the customer" can come from analyzing the vast amount of data that a company routinely collects

Data Mining

a strategic weapon for companies to compete with the giants of Amazon, Capital One, and Marriott

major characteristics and objectives of data mining

a) Data are often buried deep in large databases which contain data from several years b) Environment is usually client/server or web based c) Sophisticated new tools d) Miner is often an end-user with little or no programming skill. Armed with data drills and power tools to ask ad-hoc queries and get answers quickly. e) Miners must be creative to interpret the results when they find unexpected results. f) Data mining tools are combined with spreadsheets and other software tools so the mined data can be analyzed and deployed quickly and easily. g) Massive data and search efforts may require parallel processing when data mining.

Associations find the commonly co-occurring groupings of things, such as

a) beer and diapers bought together in market basket analysis b) tells you what products your customers are most likely to purchase at the same time.
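One way to put a number on "bought together" is to compute support, confidence, and lift for a rule such as diapers -> beer. These three measures are standard in market basket analysis but are not defined in the flashcards, and the function name and baskets below are illustrative assumptions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each appear in at least one transaction.
    """
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_consequent = sum(1 for t in transactions if consequent <= t)
    with_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = with_both / n                      # share of baskets containing both
    confidence = with_both / with_antecedent     # share of antecedent baskets that also have the consequent
    lift = confidence / (with_consequent / n)    # above 1 means they co-occur more than chance
    return support, confidence, lift


# Hypothetical baskets, for illustration only.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, {"diapers"}, {"beer"})
```

In this toy data every basket with diapers also has beer, so confidence is 1.0, and a lift above 1 indicates the pairing is more common than the overall popularity of beer would explain.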

Predictions tell the nature of future occurrences of certain events based on what has happened in the past, such as

a) predicting the winner of the Super Bowl (classification) or b) forecasting the absolute temperature of a given day (regression). c) Prediction problems where the variables have numeric values are most accurately defined as regressions.

Application Case 4.6: Data Mining Goes to Hollywood: Predicting Financial Success of Movies Why is it important for many Hollywood professionals to predict the financial success of movies?

a. It is hard to predict box-office receipts for a movie. b. Predictive models in the early stages of movie production are effective for minimizing investments in flops. The movie industry is the "land of hunches and wild guesses" due to the difficulty associated with forecasting product demand, making the movie business in Hollywood a risky endeavor. If Hollywood could better predict financial success, this would mitigate some of the financial risk.

Application Case 4.7: Predicting Customer Buying Patterns—The Target Story Did Target go too far? Did it do anything illegal? What do you think Target should have done? What do you think Target should do next (quit these types of practices)?

a. It is lawful to store and analyze transaction and customer data. The company did not use any private or personal data. Legally speaking, there was no violation of any laws. b. It is disturbing to identify teen pregnancy, terminal disease, divorce, or bankruptcy. Target might have made a tactical mistake, but they certainly didn't do anything illegal. They did not use any information that violates customer privacy; rather, they used transactional data that almost every other retail chain is collecting and storing (and perhaps analyzing) about their customers. Indeed, even the father apologized when realizing his daughter was actually pregnant. The fact is, we live in a world of massive data, and we are all as consumers leaving traces of our buying behavior for anyone to see.

What do you think about data mining and its implication for privacy? What is the threshold between discovery of knowledge and infringement of privacy?

a. Target sent maternity ads to a teen because Target's analytic model suggested she was pregnant based on her buying habits. b. There is a tradeoff between knowledge discovery and privacy rights. c. Retailers risk offending customers and hurting the bottom line.

Application Case 4.7: Predicting Customer Buying Patterns—The Target Story What do you think about data mining and its implication for privacy? What is the threshold between discovery of knowledge and infringement of privacy?

a. Target sent maternity ads to a teen because Target's analytic model suggested she was pregnant based on her buying habits. b. There is a tradeoff between knowledge discovery and privacy rights. c. Retailers risk offending customers and hurting the bottom line. Retailers should be sensitive about this when targeting their advertising based on data mining results, especially regarding topics that could be embarrassing to their customers. Otherwise, they risk offending these customers, which could hurt their bottom line. This news story ran in Forbes and New York Times, among other notable publications.

Application Case 4.6: Data Mining Goes to Hollywood: Predicting Financial Success of Movies How can data mining be used for predicting financial success of movies before the start of their production process?

a. To determine how much to invest in the movie production. b. To evaluate tradeoffs to maximize the success of the movie production. c. As a classification problem for prediction: the dependent variable is a class from 1 to 9 (flop to blockbuster), and the independent variables are a selected combination of, e.g., MPAA rating, competition, actors (star value), genre, special effects, sequel, etc. The textbook authors, Sharda and Delen, applied individual and ensemble prediction models and were able to identify significant variables influencing financial success. They also showed that by using sensitivity analysis, decision makers can predict with fairly high accuracy how much value a specific actor (or a specific release date, or the addition of more technical effects, etc.) brings to the financial success of a film, making the underlying system an invaluable decision aid.

Data Mining

consolidation and integration of data repositories into data warehouses (DW)

Ratio Data

Continuous data where both differences and ratios are interpretable. The distinguishing feature of a ratio scale is the possession of a nonarbitrary zero value.
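A small worked example of what a nonarbitrary zero buys you. Temperature in Celsius has an arbitrary zero, so dividing two readings is not meaningful, whereas Kelvin starts at absolute zero and supports ratios. The temperature example and the function name are assumptions chosen for illustration.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading onto the Kelvin scale, whose zero is absolute zero."""
    return celsius + 273.15


# On the Celsius scale, 20 looks like "twice" 10, but the zero point is arbitrary.
ratio_celsius = 20.0 / 10.0

# On the Kelvin scale the ratio is interpretable: about 3.5% more thermal energy.
ratio_kelvin = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)
```

Differences are still interpretable on both scales (the gap is 10 degrees either way); only the ratio changes, which is the property that separates ratio data from other continuous data.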

Regression

data mining method for real-world prediction problems where the predicted values (the output or dependent variable) are numeric

Associations

find the commonly co-occurring groupings of things, such as beer and diapers bought together in market basket analysis. - Tells you what products your customers are most likely to purchase at the same time. - Also known as market basket analysis. - Helps understand the purchase behavior of a buyer in the retail business.

Association

finds the commonly co-occurring groupings of things, such as beer and diapers bought together in market basket analysis; tells you what products your customers are most likely to purchase at the same time

Data Mining

recognition of the value in data sources

Categorical Data

represent labels of multiple classes used to divide variable into specific groups. EX. Race, sex, age group

Potentially Useful

results should lead to some business benefit

Nominal Data

simple codes assigned as labels. Ex: marital status as S-Single, M-Married, or D-Divorced

Data Mining

a wide range of these types of techniques are being used by organizations to gain a better understanding of their customers and their own operations and to solve complex organizational problems

