Data mining
Multidimensional data mining is also called
Exploratory multidimensional data mining
Classification
The process of finding a model (or function) that describes and distinguishes data classes or concepts.
data mining
An essential process where intelligent methods are applied to extract data patterns
Training data
Class-labeled data sets, i.e., data objects whose class labels are known, from which a model is derived
What kinds of data can be mined?
Data mining can be applied to any kind of data as long as the data is meaningful for a target application
The most basic forms of data for mining applications are:
Database data, data warehouse data, and transactional data
A frequent itemset typically refers to a set of items that often
appear together in a transactional data set, for example, milk and bread, which are frequently bought together
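A minimal sketch of counting frequent 2-itemsets, assuming a hypothetical toy transaction list and an absolute count threshold. It uses brute-force pair counting rather than a real algorithm such as Apriori, so it only illustrates what "appear together frequently" means.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; each transaction is the set of items bought together.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]

def frequent_pairs(transactions, min_count):
    """Return 2-item sets that appear together in at least min_count transactions."""
    counts = Counter()
    for t in transactions:
        # sorted() gives each pair a canonical order so {a, b} and {b, a} share one key
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

print(frequent_pairs(transactions, min_count=3))  # {('bread', 'milk'): 3}
```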
Data mining functionalities
Characterization, Discrimination, Association, Classification, Clustering, Outlier and Trend Analysis
Descriptive mining tasks
Characterize the properties of the data in a target data set
Clustering analyzes data objects without
Consulting class labels
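To illustrate clustering without class labels, here is a minimal 1-D k-means sketch. k-means is just one clustering method; the points and the fixed starting centroids are hypothetical choices made for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group points by value alone (no class labels) using basic k-means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)  # [1.5, 10.5] [[1.0, 1.5, 2.0], [10.0, 10.5, 11.0]]
```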
Relational database can be accessed by
Database queries written in a relational query language (e.g., SQL) or with the assistance of graphical user interfaces
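A small example of accessing a relational database with an SQL query, using Python's built-in sqlite3 module and an in-memory database. The sales table and its columns are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, branch TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("milk", "north", 3.0), ("bread", "north", 2.5), ("milk", "south", 4.0)],
)

# A relational query: total sales amount per branch.
rows = conn.execute(
    "SELECT branch, SUM(amount) FROM sales GROUP BY branch ORDER BY branch"
).fetchall()
print(rows)  # [('north', 5.5), ('south', 4.0)]
conn.close()
```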
decision tree
Flowchart-like tree structure where each internal node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions
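A sketch of how a decision tree classifies a record: start at the root, apply each node's attribute test, follow the branch for the outcome, and stop at a leaf. The tree below is hand-built with hypothetical attributes and class labels, not induced from training data.

```python
# Internal nodes are dicts naming the attribute to test; leaves are class labels (strings).
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys_computer", "no": "does_not_buy"},
        },
        "middle_aged": "buys_computer",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"fair": "buys_computer", "excellent": "does_not_buy"},
        },
    },
}

def classify(node, record):
    # Follow the branch matching each test outcome until a leaf (a class label) is reached.
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node

print(classify(tree, {"age": "youth", "student": "yes", "credit_rating": "fair"}))  # buys_computer
```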
The analysis of outlier data is referred to as
Outlier analysis or anomaly mining
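One simple statistical way to flag outliers is a z-score rule. This is a swapped-in illustrative technique, not the only approach to outlier analysis; the threshold of 2 standard deviations and the sample values are assumptions.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Flag values lying more than `threshold` population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    # If sigma is 0 (all values equal), nothing is flagged.
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]

amounts = [10, 12, 11, 13, 12, 11, 95]
print(zscore_outliers(amounts))  # [95]
```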
Pattern evaluation
To identify the truly interesting patterns representing knowledge based on interestingness measures.
data transformation
Where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
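A minimal sketch of a data transformation by aggregation: hypothetical daily sales records are consolidated into monthly totals.

```python
from collections import defaultdict

# Hypothetical daily sales records: (date, amount).
daily_sales = [
    ("2024-01-03", 120.0),
    ("2024-01-17", 80.0),
    ("2024-02-05", 200.0),
]

def to_monthly_totals(records):
    """Consolidate daily records into monthly totals (a summary/aggregation operation)."""
    totals = defaultdict(float)
    for date, amount in records:
        totals[date[:7]] += amount  # the "YYYY-MM" prefix identifies the month
    return dict(totals)

print(to_monthly_totals(daily_sales))  # {'2024-01': 200.0, '2024-02': 200.0}
```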
Knowledge presentation
where visualization and knowledge representation techniques are used to present the mined knowledge to users
Regression is used to
Predict missing or unavailable numerical data values rather than discrete class labels
Data mining means
Searching for knowledge (interesting patterns) in data. It is the process of discovering interesting patterns and knowledge from large amounts of data. The data sources can include databases, data warehouses, the Web, other information repositories, or data that are streamed into the system dynamically
Data mining functionalities are used to
Specify the kinds of patterns to be found in data mining tasks. In general, such tasks can be classified into two categories: descriptive and predictive.
Regression analysis
Statistical methodology that is most often used for numeric prediction, although other methods exist as well
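A minimal sketch of regression for numeric prediction: an ordinary least-squares fit of a straight line, followed by a prediction for an unseen x value. The data set is hypothetical and chosen so the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4]
salary = [30, 35, 40, 45]
a, b = fit_line(years, salary)
print(a, b, a + b * 5)  # 25.0 5.0 50.0  (last value: predicted salary for 5 years)
```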
data tombs
Data archives that are seldom visited
Taxonomy formation
The organization of observations into a hierarchy of classes that group similar events together
data discrimination
Comparison of the general features of the target class data objects against the general features of objects from one or a set of comparative classes (often called the contrasting classes)
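A sketch of data discrimination, assuming hypothetical customer records: summarize the general features of a target class and of a contrasting class, then compare the two profiles. Using per-class means as the "general features" is an illustrative assumption.

```python
from statistics import mean

# Hypothetical customer records: (customer_class, age, purchases_per_year).
customers = [
    ("frequent", 25, 40), ("frequent", 30, 35),
    ("rare", 45, 4), ("rare", 50, 6), ("rare", 55, 5),
]

def class_profile(records, label):
    """Summarize the general features (mean age, mean purchases) of one class."""
    rows = [r for r in records if r[0] == label]
    return {"age": mean(r[1] for r in rows), "purchases": mean(r[2] for r in rows)}

target = class_profile(customers, "frequent")  # target class
contrast = class_profile(customers, "rare")    # contrasting class
print(target, contrast)
```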
Data cleaning
To remove noise and inconsistent data.
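A minimal data cleaning sketch over hypothetical records. The validity rules (age between 0 and 120, "?" as a placeholder city) are assumptions, and dropping bad records is only one strategy; cleaning may instead fill in or smooth values.

```python
# Hypothetical raw records: one missing value, one implausible value,
# one exact duplicate, and one placeholder ("?") entry.
raw = [
    {"id": 1, "age": 34, "city": "Boston"},
    {"id": 2, "age": None, "city": "Austin"},
    {"id": 3, "age": -5, "city": "Denver"},
    {"id": 1, "age": 34, "city": "Boston"},
    {"id": 4, "age": 41, "city": "?"},
    {"id": 5, "age": 29, "city": "Denver"},
]

def clean(records):
    """Drop duplicates, records with missing values, and records failing simple validity rules."""
    seen, cleaned = set(), []
    for r in records:
        key = (r["id"], r["age"], r["city"])
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        if r["age"] is None or not 0 <= r["age"] <= 120:
            continue  # missing or implausible age
        if r["city"] in ("", "?"):
            continue  # placeholder (inconsistent) city value
        cleaned.append(r)
    return cleaned

print([r["id"] for r in clean(raw)])  # [1, 5]
```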
Association rules are discarded as uninteresting if they do not satisfy both a
minimum support threshold and a minimum confidence threshold
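A sketch of the two rule-interestingness thresholds, assuming hypothetical transactions. Support is the fraction of transactions containing every item of the rule; confidence is support(A | B) / support(A), where A | B is the union of antecedent and consequent (this assumes support(A) is nonzero).

```python
# Hypothetical transactions for illustration.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
    {"bread"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A | B) / support(A), an estimate of P(B | A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def is_interesting(antecedent, consequent, transactions, min_sup, min_conf):
    # A rule is kept only if it satisfies BOTH thresholds.
    return (support(antecedent | consequent, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)

rule = ({"milk"}, {"bread"})
print(support(rule[0] | rule[1], transactions))  # 0.5
print(is_interesting(*rule, transactions, min_sup=0.4, min_conf=0.6))  # True
```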
Data warehouse (2)
A repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site
Data characterization
A summarization of the general characteristics or features of a target class of data.
Data warehouses are constructed via a process of:
Data cleaning, data integration, data transformation, data loading and periodic data refreshing
A data warehouse is usually modeled by a multidimensional data structure called
Data cube, in which each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure such as count or sum(sales_amount)
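A minimal in-memory sketch of a data cube that precomputes sum(sales_amount) for every subset of dimensions. The fact rows are hypothetical, and using "*" for a dimension that has been aggregated away is a notational assumption; real warehouses use dedicated OLAP storage rather than a Python dict.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (item, city, quarter, sales_amount).
facts = [
    ("milk", "Boston", "Q1", 100.0),
    ("milk", "Denver", "Q1", 50.0),
    ("bread", "Boston", "Q2", 70.0),
]
dims = ("item", "city", "quarter")

def build_cube(rows):
    """Precompute sum(sales_amount) for every combination of kept dimensions.
    A "*" in a cell key means that dimension has been aggregated away."""
    cube = defaultdict(float)
    for *values, amount in rows:
        for k in range(len(dims) + 1):
            for kept in combinations(range(len(dims)), k):
                key = tuple(v if i in kept else "*" for i, v in enumerate(values))
                cube[key] += amount
    return cube

cube = build_cube(facts)
print(cube[("milk", "*", "*")])    # total milk sales: 150.0
print(cube[("*", "Boston", "*")])  # total Boston sales: 170.0
print(cube[("*", "*", "*")])       # grand total: 220.0
```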
Data mining is also known as
Knowledge discovery from data (KDD)
Predictive mining tasks
Perform induction on the current data in order to make predictions
data warehouse (1)
A repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making
Data Selection
where data relevant to the analysis task are retrieved from the database
data integration
where multiple data sources may be combined
The knowledge discovery process has 7 steps
1. Data cleaning 2. Data integration 3. Data selection 4. Data transformation 5. Data mining 6. Pattern evaluation 7. Knowledge presentation
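The seven steps can be sketched as a chain of functions. Every function here is a hypothetical placeholder with deliberately trivial logic (for example, "mining" is just ranking totals); only the order of the steps follows the list above.

```python
def clean(rows):                     # 1. remove noise/inconsistent data (here: rows with None)
    return [r for r in rows if None not in r.values()]

def integrate(*sources):             # 2. combine multiple data sources
    return [r for src in sources for r in src]

def select(rows, fields):            # 3. keep only task-relevant attributes
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):                 # 4. aggregate: total amount per item
    totals = {}
    for r in rows:
        totals[r["item"]] = totals.get(r["item"], 0) + r["amount"]
    return totals

def mine(totals):                    # 5. stand-in "mining": rank items by total amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def evaluate(patterns, min_total):   # 6. keep patterns passing an interestingness threshold
    return [p for p in patterns if p[1] >= min_total]

def present(patterns):               # 7. present the results to the user
    return "\n".join(f"{item}: {total}" for item, total in patterns)

# Hypothetical sources from two stores.
store_a = [{"item": "milk", "amount": 3, "store": "A"}, {"item": "bread", "amount": None, "store": "A"}]
store_b = [{"item": "milk", "amount": 4, "store": "B"}, {"item": "eggs", "amount": 2, "store": "B"}]

data = integrate(clean(store_a), clean(store_b))
data = select(data, ["item", "amount"])
patterns = evaluate(mine(transform(data)), min_total=3)
print(present(patterns))  # milk: 7
```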