Data mining

Multidimensional data mining is also called

Exploratory multidimensional data mining

Classification

The process of finding a model (or function) that describes and distinguishes data classes or concepts.

data mining

An essential process where intelligent methods are applied to extract data patterns

Training data

Class-labeled data sets, i.e., data objects whose class labels are known

What kinds of data can be mined?

Data mining can be applied to any kind of data as long as the data is meaningful for a target application

The most basic forms of data for mining applications are:

Database data Data warehouse data Transactional data

A frequent itemset typically refers to a set of items that often

Appear together in a transactional data set, for example, milk and bread which are frequently bought together
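
To make the idea concrete, here is a minimal sketch that counts how often pairs of items occur together. The transactions and the `min_count` threshold are made up for illustration, and the brute-force pair counting is only a simple stand-in, not an efficient frequent-itemset algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional data set (invented for illustration).
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs", "butter"},
]

def frequent_pairs(transactions, min_count):
    """Return item pairs that appear together in at least min_count transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(transactions, min_count=3)
print(pairs)
```

With this toy data only the milk and bread pair reaches the threshold.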

Data mining functionalities

Characterization, Discrimination, Association, Classification, Clustering, Outlier and Trend Analysis

Descriptive mining tasks

Characterize properties of the data in a target data set

Clustering analyzes data objects without

Consulting class labels

A relational database can be accessed by

Database queries written in a relational query language (e.g., SQL) or with the assistance of graphical user interfaces
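
As a small illustration, the sketch below runs a SQL query against an in-memory SQLite database using Python's standard `sqlite3` module. The `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, branch TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("milk", "north", 3.0), ("bread", "north", 2.0), ("milk", "south", 4.5)],
)

# A relational query: total sales amount per item.
rows = conn.execute(
    "SELECT item, SUM(amount) FROM sales GROUP BY item ORDER BY item"
).fetchall()
print(rows)
conn.close()
```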

decision tree

Flowchart-like tree structure where each node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions
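
The structure described above can be sketched as a nested dictionary. The tree itself, its attribute names, and its class labels are invented for illustration; the sketch only shows how a record is classified by following branches, not how a tree is learned from data.

```python
# Hypothetical, hand-written tree: internal nodes test an attribute,
# branches are test outcomes, and leaves (plain strings) are class labels.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does_not_buy"},
        },
        "middle_aged": "buys",
        "senior": {
            "attribute": "credit",
            "branches": {"fair": "buys", "excellent": "does_not_buy"},
        },
    },
}

def classify(node, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node

print(classify(tree, {"age": "youth", "student": "yes"}))
```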

The analysis of outlier data is referred to as

Outlier analysis or anomaly mining

Pattern evaluation

To identify the truly interesting patterns representing knowledge based on interestingness measures.

data transformation

Where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.

Knowledge presentation

where visualization and knowledge representation techniques are used to present the mined knowledge to users

Regression is used to

Predict missing or unavailable numerical data values rather than discrete class labels

Data mining means

Searching for knowledge (interesting patterns) in data. It is the process of discovering interesting patterns and knowledge from large amounts of data. The data sources can include databases, data warehouses, the web, other information repositories, or data that are streamed into the system dynamically.

Data mining functionalities are used to

Specify the kinds of patterns to be found in data mining tasks. In general, such tasks can be classified into two categories: descriptive and predictive.

Regression analysis

Statistical methodology that is most often used for numeric prediction, although other methods exist as well
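
For a concrete picture of numeric prediction, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The data points are made up, and the helper name `fit_line` is an assumption for this example; it is one of many regression methods.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # numeric prediction for an unseen x
print(slope, intercept, predicted)
```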

data tombs

Data archives that are seldom visited

Taxonomy formation

The organization of observations into a hierarchy of classes that group similar events together

data discrimination

A comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes

Data cleaning

To remove noise and inconsistent data.

Association rules are discarded as uninteresting if they do not satisfy both a

minimum support threshold and a minimum confidence threshold
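
The two thresholds can be illustrated with a short sketch. Support is computed here as the fraction of transactions containing every item in the rule, and confidence as the support of the whole rule divided by the support of its antecedent. The transactions and threshold values are hypothetical.

```python
# Hypothetical transactional data set (invented for illustration).
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs", "butter"},
]

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the full rule divided by support of the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / base

def is_interesting(transactions, antecedent, consequent, min_support, min_confidence):
    """A rule is kept only if it meets both thresholds."""
    rule_support = support(transactions, set(antecedent) | set(consequent))
    rule_confidence = confidence(transactions, antecedent, consequent)
    return rule_support >= min_support and rule_confidence >= min_confidence

print(is_interesting(transactions, {"milk"}, {"bread"}, 0.4, 0.7))
print(is_interesting(transactions, {"butter"}, {"eggs"}, 0.4, 0.7))
```

With this toy data, the rule milk => bread passes both thresholds, while butter => eggs fails them and would be discarded.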

Data warehouse (2)

A repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site

Data characterization

A summarization of the general characteristics or features of a target class of data.

Data warehouses are constructed via a process of:

Data cleaning, data integration, data transformation, data loading and periodic data refreshing

A data warehouse is usually modeled by a multidimensional data structure called

Data cube, in which each dimension corresponds to an attribute or a set of attributes in the schema and each cell stores the value of some aggregate measure such as count or sum(sales_amount)
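
A rough sketch of the idea: each cell is keyed by a combination of dimension values and stores the aggregate measures count and sum(sales_amount). The rows, dimension names, and the `build_cube` helper are assumptions for illustration; real warehouses precompute and store such cells rather than building them on demand like this.

```python
from collections import defaultdict

# Hypothetical fact rows with three dimensions and one measure.
rows = [
    {"item": "milk", "branch": "north", "quarter": "Q1", "sales_amount": 3.0},
    {"item": "milk", "branch": "north", "quarter": "Q2", "sales_amount": 4.0},
    {"item": "bread", "branch": "north", "quarter": "Q1", "sales_amount": 2.0},
    {"item": "milk", "branch": "south", "quarter": "Q1", "sales_amount": 5.0},
]

def build_cube(rows, dims):
    """Aggregate count and sum(sales_amount) for each cell over the given dimensions."""
    cube = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for row in rows:
        cell = tuple(row[d] for d in dims)
        cube[cell]["count"] += 1
        cube[cell]["sum"] += row["sales_amount"]
    return dict(cube)

by_item_branch = build_cube(rows, ["item", "branch"])
by_item = build_cube(rows, ["item"])  # fewer dimensions, i.e. a coarser summary
print(by_item_branch)
print(by_item)
```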

Data mining is also known as

Knowledge discovery from data (KDD)

Predictive mining tasks

Perform induction on the current data in order to make predictions

data warehouse (1)

Repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making

Data Selection

where data relevant to the analysis task are retrieved from the database

data integration

where multiple data sources may be combined

The knowledge discovery process has 7 steps

1. Data cleaning 2. Data integration 3. Data selection 4. Data transformation 5. Data mining 6. Pattern evaluation 7. Knowledge presentation
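
The seven steps above can be sketched as a chain of small functions. Everything here is a toy stand-in under stated assumptions: the two store record lists, the function names, and the threshold are hypothetical, and the "mining" step is just a ranking of totals rather than a real mining algorithm.

```python
from collections import defaultdict

def clean(records):
    """1. Data cleaning: drop records with a missing item or amount."""
    return [r for r in records if r["item"] is not None and r["amount"] is not None]

def integrate(*sources):
    """2. Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, items):
    """3. Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["item"] in items]

def transform(records):
    """4. Data transformation: aggregate amounts per item."""
    totals = defaultdict(float)
    for r in records:
        totals[r["item"]] += r["amount"]
    return dict(totals)

def mine(totals):
    """5. Data mining (toy stand-in): rank items by total amount."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def evaluate(patterns, min_total):
    """6. Pattern evaluation: keep only patterns that pass a threshold."""
    return [(item, total) for item, total in patterns if total >= min_total]

def present(patterns):
    """7. Knowledge presentation: format the results for a reader."""
    return [f"{item}: total sales {total:.2f}" for item, total in patterns]

# Hypothetical source data from two stores.
store_a = [{"item": "milk", "amount": 3.0}, {"item": None, "amount": 1.0}]
store_b = [
    {"item": "milk", "amount": 4.0},
    {"item": "bread", "amount": 2.0},
    {"item": "pen", "amount": None},
]

combined = integrate(clean(store_a), clean(store_b))
totals = transform(select(combined, {"milk", "bread"}))
report = present(evaluate(mine(totals), min_total=5.0))
print(report)
```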

