Data Mining Chapter One
Define data mining (#1)
Extraction of interesting patterns or knowledge from huge amounts of data
example of Classification and prediction
classify cars based on gas mileage
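A minimal sketch of the car example above, assuming a nearest-centroid rule on a single feature (miles per gallon). The training values, the class labels, and the function names are hypothetical and only illustrate how a classifier learns from labeled data and then labels a new object.

```python
def train_centroids(labeled_mpg):
    """Compute the mean mpg of each class from (mpg, label) pairs."""
    sums, counts = {}, {}
    for mpg, label in labeled_mpg:
        sums[label] = sums.get(label, 0.0) + mpg
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(mpg, centroids):
    """Assign the class whose mean mpg is closest to the given value."""
    return min(centroids, key=lambda label: abs(mpg - centroids[label]))

# Hypothetical training data: (miles per gallon, class label).
training = [(35, "efficient"), (40, "efficient"),
            (15, "inefficient"), (18, "inefficient")]
centroids = train_centroids(training)
label_for_30 = classify(30, centroids)
label_for_20 = classify(20, centroids)
```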
Cluster analysis example
cluster houses
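One way to picture clustering houses is to group them by price alone. The sketch below assumes a two-cluster k-means on one-dimensional data, seeded with the minimum and maximum value; the prices and the function name are made up for illustration, and a real analysis would likely use more attributes.

```python
def kmeans_1d(values, iterations=10):
    """Two-cluster k-means on 1-D data, seeded with the min and max value."""
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearer of the two centers.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        # Recompute each center as the mean of its cluster (keep it if empty).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters

# Hypothetical house prices in thousands.
prices = [100, 110, 120, 400, 420, 450]
cheap, expensive = kmeans_1d(prices)
```

Unlike classification, no labels are supplied here: the two groups emerge from the data itself.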
Define outlier (outlier analysis, a data mining functionality)
Data object that does not comply with the general behavior of the data
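The definition above can be illustrated with a simple z-score rule, which is only one of many possible outlier tests. The threshold of 2 standard deviations, the sample values, and the function name are assumptions chosen for this sketch.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` population standard
    deviations away from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical measurements; 50 does not follow the general behavior.
readings = [10, 11, 9, 10, 12, 11, 10, 50]
outliers = find_outliers(readings)
```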
name one example of data mining?
Certain names are more prevalent in certain US locations
Origins of data mining: traditional techniques may be unsuitable due to?
Enormity of data, High dimensionality of data, Heterogeneous, distributed nature of data
Description Methods
Find human-interpretable patterns that describe the data
KDD process: several key steps
Learning the application domain, identifying a target data set, data processing, use of discovered knowledge
Name one example of risk analysis and management
Forecasting
Which potential application of data mining deals with outliers?
Fraud detection and detection of unusual patterns
Example of subjective measures, which are based on the user's beliefs about the data
Patterns are interesting if they are unexpected
Define regression
Predict the value of a given continuous-valued variable based on the values of other variables
example of regression
Time series prediction of stock market indices
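As a rough illustration of regression, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The data points are invented, and real stock indices are rarely this well behaved, so this shows only the mechanics and is not a forecasting method.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical series: day number vs. index value.
days = [1, 2, 3, 4]
index_values = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(days, index_values)
forecast_day_5 = intercept + slope * 5
```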
Different views lead to different classifications. Kinds of data to be mined: a-data view b-knowledge view c-method view
a
Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding. This describes which stage in the evolution of science? a-experimental science b-theoretical science c-computational science d-data science
b
Target marketing, customer relationship management (CRM), market basket analysis, cross-selling, and market segmentation are examples of: a-Market analysis and management b-Risk analysis and management c-Fraud detection
a
Data relevant to the analysis task are retrieved from the database: a-data selection b-data integration c-pattern evaluation d-data transformation
a
In a typical data mining system architecture, the database, data warehouse, WWW, or other information repository is an example of? a-store data b-fetch and combine data c-turn data into meaningful groups d-perform mining task e-finding interesting patterns f-interact with the user
a
Remove noise and inconsistent data: a-data cleaning b-data mining c-knowledge presentation d-data transformation
a
Kinds of knowledge to be discovered: a-data view b-knowledge view c-method view
b
Objective measures are based on?
statistics and structures of patterns
example of objective measures
support and confidence
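Support and confidence can be computed directly from a list of transactions. The sketch below assumes the usual definitions: support is the fraction of transactions containing an itemset, and confidence of a rule A -> C is support(A and C together) divided by support(A). The transactions and item names are hypothetical.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"camera", "sd_card"},
    {"camera", "sd_card", "tripod"},
    {"camera"},
    {"sd_card"},
    {"tripod"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule A -> C."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base

rule_support = support({"camera", "sd_card"}, transactions)
rule_confidence = confidence({"camera"}, {"sd_card"}, transactions)
```

Here 2 of 5 transactions contain both items, and 2 of the 3 camera transactions also contain an SD card. Both numbers depend only on the data, which is what makes them objective measures.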
example of medical data mining?
Health care and medical data mining
Search for only interesting patterns: an optimization problem
Highly desirable, no need to search through the generated patterns, measures can be used to rank the discovered patterns
Why not traditional data analysis?
Huge amount of data, high dimensionality of data, high complexity of data
KDD stands for
Knowledge Discovery in Databases
Name three alternative names for data mining
Knowledge discovery, knowledge extraction, data analysis
Why mine data? Commercial viewpoint
Lots of data is being collected and warehoused, computers have become cheaper and more powerful, competitive pressure is strong
Data mining has several potential applications in data analysis and decision support. Name three of them.
Market analysis and management, risk analysis and management, fraud detection and detection of unusual patterns
Major issues in data mining:
Mining methodology, user interaction, applications and social impacts
Database-oriented data sets and applications, such as? (2)
Relational database, data warehouse
Ways of data exploration? (3)
Statistical summary, querying, and reporting
KDD process steps?
Data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, knowledge presentation
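The seven steps can be pictured as a chain of small functions, each feeding the next. The sketch below is a toy pipeline under stated assumptions: the two record sources, the field names, and the "mining" step (plain frequency counting) are all hypothetical stand-ins for real components.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"customer": 1, "item": "Camera "}, {"customer": 2, "item": None}]
source_b = [{"customer": 3, "item": "camera"}, {"customer": 4, "item": "SD card"}]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine multiple sources into one list."""
    return [r for src in sources for r in src]

def select(records, field):
    """Data selection: keep only the field relevant to the task."""
    return [r[field] for r in records]

def transform(values):
    """Data transformation: normalize values into a form fit for mining."""
    return [v.strip().lower() for v in values]

def mine(values):
    """Data mining: here, simply count how often each item occurs."""
    return Counter(values)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet a frequency threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: render patterns as readable lines."""
    return [f"{item}: {n}" for item, n in sorted(patterns.items())]

def run_kdd(min_count=2):
    data = integrate(clean(source_a), clean(source_b))
    items = transform(select(data, "item"))
    return present(evaluate(mine(items), min_count))

report = run_kdd()
```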
Multi-dimensional view of data mining?
Data to be mined, knowledge to be mined, techniques utilized, applications adapted
The database or data warehouse server is an example of? a-store data b-fetch and combine data c-turn data into meaningful groups d-perform mining task e-finding interesting patterns f-interact with the user
b
Kinds of techniques utilized: a-data view b-knowledge view c-method view
c
The knowledge base is an example of? a-store data b-fetch and combine data c-turn data into meaningful groups d-perform mining task e-finding interesting patterns f-interact with the user
c
Experimental, theoretical, and computational ecology, physics, or linguistics are examples of which stage in the evolution of science? a-experimental science b-theoretical science c-computational science d-data science
c
Identify the truly interesting patterns: a-data selection b-data integration c-pattern evaluation d-data transformation
c
Mined knowledge is presented to the user with visualization or representation techniques: a-data cleaning b-data mining c-knowledge presentation d-data transformation
c
The data mining engine is an example of? a-store data b-fetch and combine data c-turn data into meaningful groups d-perform mining task e-finding interesting patterns f-interact with the user
d
Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly... in the Boston area) is an example of?
data mining
Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com) is an example of?
data mining
Name one example of market analysis and management
target marketing
Define prediction methods
Use some variables to predict unknown or future values of other variables
An essential process where intelligent methods are applied to extract data patterns: a-data cleaning b-data mining c-knowledge presentation d-data transformation
b
Multiple data sources may be combined: a-data selection b-data integration c-pattern evaluation d-data cleaning
b
It traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models. This describes which stage in the evolution of science? a-experimental science b-theoretical science c-computational science d-data science
c
Data mining tasks (6)
Clustering, classification, association rule discovery, sequential pattern discovery, regression, deviation detection
The flood of data from new scientific instruments and simulations; the ability to economically store and manage yottabytes of data online; the Internet and computing Grid that make all these archives universally accessible; scientific information management, acquisition, organization, query, and visualization tasks that scale almost linearly with data volumes. This describes which stage in the evolution of science? a-experimental science b-theoretical science c-computational science d-data science
d
Data are transformed or consolidated into forms appropriate for mining (done during data preprocessing): a-data selection b-data integration c-pattern evaluation d-data transformation
d
Name the four stages in the evolution of science
Experimental science, theoretical science, computational science, data science
The user interface is an example of? a-store data b-fetch and combine data c-turn data into meaningful groups d-perform mining task e-finding interesting patterns f-interact with the user
f
From the scientific viewpoint, name two ways data mining may help scientists
In hypothesis formation, in classifying and segmenting data
Text mining sources? (3)
Newsgroups, email, documents
Looking up a phone number in a phone directory is an example of?
not data mining
Querying a Web search engine for information about "Amazon" is an example of?
not data mining
Data sources are? (4)
Paper, files, web documents, databases
Why mine data? Scientific viewpoint
Data collected and stored at enormous speeds, traditional techniques infeasible for raw data, data mining may help scientists
Data mining on what kinds of data?
Database-oriented data sets and applications, advanced data sets and advanced applications
Define data mining (#2)
Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
Forecasting, customer retention, improved underwriting, quality control, and competitive analysis are examples of: a-Market analysis and management b-Risk analysis and management c-Fraud detection
b
Layers of the data mining and business intelligence pyramid (bottom to top)?
Data sources, data preprocessing and data warehouses, data exploration, data mining, data presentation, decision making
Different views lead to different classifications
Data view, knowledge view, method view
Classification and prediction construct models (functions) that?
describe classes or concepts for future prediction
Sequential pattern mining example
A digital camera purchase followed by a large SD card purchase
example of Multidimensional concept description: Characterization and discrimination
dry vs wet regions
The pattern evaluation module is an example of? a-store data b-fetch and combine data c-turn data into meaningful groups d-perform mining task e-finding interesting patterns f-interact with the user
e
A pattern is interesting if it is?
Easily understood, valid, potentially useful, novel, or validates some hypothesis
Data mining: confluence of multiple disciplines (5)
Machine learning, applications, algorithms, visualization, database technology
Origins of data mining: draws ideas from?
Machine learning, pattern recognition, statistics, database systems
Example of subjective measures that reflect the needs and interests of a particular user?
A marketing manager is only interested in the characteristics of customers who shop frequently
Mining methodology issues? (4)
Performance, parallel mining, handling noise and incomplete data, mining different kinds of knowledge
Name an example of a discipline with a computational branch
physics
Trend and deviation example
regression analysis
Advanced data sets and advanced applications, such as? (3)
Text databases, data streams, World Wide Web
Name three other potential applications of data mining
Text mining, web mining, stream data mining
Objective and subjective measures need to be combined. (true/false)
true