Data Mining
Interval-scaled attribute
A numeric attribute measured on a scale of equal-size units. Values have an order, but there is no true zero-point.
Frequent Itemset
set of items that often appear together in a transactional data set—for example, milk and bread
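To make "appear together often" concrete, the sketch below counts how many transactions contain each itemset (its support count) and keeps the itemsets that reach a minimum support. It uses brute-force enumeration rather than Apriori or FP-growth, so it is only practical for small data. The function name `frequent_itemsets`, the `min_support` threshold, and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (sorted tuples) of size 1..max_size whose support
    count is at least min_support. Brute-force enumeration, not Apriori."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical market-basket transactions
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
result = frequent_itemsets(transactions, min_support=3)
print(result)
```

Here milk and bread occur together in three of the four baskets, so the pair ("bread", "milk") is reported as frequent.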
attribute
Data field representing a characteristic or feature of a data object. The term is often used interchangeably with "dimension", "feature", and "variable".
Cluster Analysis
1. Analyzes data objects without consulting class labels. 2. Statistical techniques that identify groups of entities that have similar characteristics.
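As one illustration of grouping objects without class labels, here is a minimal k-means sketch. k-means is just one of many clustering methods and is swapped in here as an example. The function `kmeans`, its parameters, and the sample points are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means: no class labels are used, only distances."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its member points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that the three points near (1, 1) share one label and the three near (8.5, 8.5) share the other.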
Web search engine
Specialized computer server that searches for information on the Web. The search results of a user query are often returned as a list (sometimes called hits).
Information retrieval
The science of searching for documents or information in documents. Assumes that (1) the data under search are unstructured; and (2) the queries are formed mainly by keywords, which do not have complex structures.
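Because queries are mostly unstructured keywords, a common building block for keyword search is an inverted index that maps each term to the documents containing it. The sketch below assumes whitespace tokenization, lowercase matching, and Boolean AND semantics (every keyword must appear). The names `build_index` and `search` and the sample documents are hypothetical, and there is no ranking.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query keyword (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "data mining finds patterns in data",
    2: "information retrieval searches documents",
    3: "mining documents for patterns",
}
index = build_index(docs)
print(sorted(search(index, "mining patterns")))
```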
ratio-scaled attribute
a numeric attribute with an inherent zero-point and for which we can speak of a value as being a multiple of another value
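The interval/ratio distinction can be checked with a little arithmetic. Temperature in degrees Celsius is interval-scaled: differences are meaningful, but 0 °C is not an absence of temperature, so ratios of Celsius values are misleading. Kelvin temperature and weight have a true zero and are ratio-scaled. The variable names and values below are illustrative.

```python
def celsius_to_kelvin(celsius):
    return celsius + 273.15


morning_c, noon_c = 10.0, 20.0
difference_c = noon_c - morning_c  # 10.0 degrees: meaningful on an interval scale
naive_ratio = noon_c / morning_c   # 2.0, yet 20 °C is NOT "twice as hot" as 10 °C
# Kelvin has an inherent zero-point, so this ratio is meaningful (about 1.035)
kelvin_ratio = celsius_to_kelvin(noon_c) / celsius_to_kelvin(morning_c)

heavy_kg, light_kg = 10.0, 5.0
weight_ratio = heavy_kg / light_kg  # 2.0: 10 kg really is twice 5 kg

print(difference_c, naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```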
ETL (extraction, transformation, and loading)
1. Extract is the process of reading data that is assumed to be important. The data may be raw data collected from multiple, heterogeneous sources or data taken from a source database. 2. Transform is the process of converting the extracted data from its previous format into the format required by another database. The transformation is done by applying rules or lookup tables or by combining the data with other data. 3. Load is the process of writing the data into the target database, data warehouse, or another system.
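The three ETL steps can be sketched end to end with only the Python standard library. The CSV text stands in for an extracted source, a lookup table drives the transformation, and an in-memory SQLite database plays the target. The table name `sales`, the `COUNTRY_LOOKUP` mapping, and the data are hypothetical.

```python
import csv
import io
import sqlite3

# 1. Extract: read raw records from a (hypothetical) CSV source
raw_csv = "order_id,country_code,amount\n1,US,10.50\n2,DE,7.25\n3,US,3.00\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# 2. Transform: convert types and expand country codes with a lookup table
COUNTRY_LOOKUP = {"US": "United States", "DE": "Germany"}
transformed = [
    (int(r["order_id"]), COUNTRY_LOOKUP[r["country_code"]], float(r["amount"]))
    for r in rows
]

# 3. Load: write the transformed rows into the target database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)
conn.commit()

totals = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(totals)
```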
Data mining
1. Often used to refer to the entire knowledge discovery process, which consists of discovering interesting patterns and knowledge from large amounts of data. 2. As a step in that process, it is an essential process where intelligent methods are applied to extract data patterns.
Database management system (DBMS, also called Database system)
A collection of interrelated data, known as a database, and a set of software programs to manage and access the data.
data discrimination
A comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes.
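A minimal sketch of the comparison: summarize the target class and a contrasting class using the same features, then put the summaries side by side. Per-feature means are used here as the "general features" for simplicity. A real discrimination task might use richer measures. The customer records, the `segment` field, and the function names are hypothetical.

```python
from statistics import mean


def class_summary(records, cls, features):
    """Average of each feature over the records that belong to class cls."""
    members = [r for r in records if r["segment"] == cls]
    return {f: mean(r[f] for r in members) for f in features}


def discriminate(records, target, contrasting, features):
    """Pair the target-class summary with the contrasting-class summary."""
    t = class_summary(records, target, features)
    c = class_summary(records, contrasting, features)
    return {f: (t[f], c[f]) for f in features}


customers = [
    {"segment": "frequent", "age": 30, "visits": 10},
    {"segment": "frequent", "age": 34, "visits": 12},
    {"segment": "frequent", "age": 38, "visits": 14},
    {"segment": "occasional", "age": 46, "visits": 1},
    {"segment": "occasional", "age": 50, "visits": 2},
    {"segment": "occasional", "age": 54, "visits": 3},
]
comparison = discriminate(customers, "frequent", "occasional", ["age", "visits"])
print(comparison)  # (target value, contrasting value) per feature
```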
data cube
A multidimensional data structure used to store and manipulate data in a multidimensional DBMS. Each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure.
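To show how each cell of a cube holds an aggregate measure, the sketch below computes SUM(amount) for every combination of dimensions (every cuboid), from the apex cuboid with no dimensions down to the full (city, item, quarter) cuboid. A plain Python dictionary stands in for a multidimensional DBMS. The `build_cube` name, the key layout `(dimensions, cell values)`, and the sales rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

sales = [
    {"city": "Vancouver", "item": "phone", "quarter": "Q1", "amount": 100},
    {"city": "Vancouver", "item": "TV", "quarter": "Q1", "amount": 250},
    {"city": "Toronto", "item": "phone", "quarter": "Q1", "amount": 150},
    {"city": "Toronto", "item": "phone", "quarter": "Q2", "amount": 120},
]
DIMENSIONS = ("city", "item", "quarter")


def build_cube(rows, dimensions, measure):
    """Return {(dims, cell_values): SUM(measure)} for every subset of dims."""
    cube = defaultdict(int)
    for row in rows:
        for k in range(len(dimensions) + 1):
            for dims in combinations(dimensions, k):
                cell = tuple(row[d] for d in dims)
                cube[(dims, cell)] += row[measure]
    return cube


cube = build_cube(sales, DIMENSIONS, "amount")
print(cube[((), ())])                                    # apex cell: grand total
print(cube[(("city",), ("Toronto",))])                   # one cell of the city cuboid
print(cube[(("city", "item"), ("Vancouver", "TV"))])     # a 2-D cell
```

With three dimensions there are 2^3 = 8 cuboids, which is why materializing a full cube grows quickly with the number of dimensions.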
class/concept descriptions
Describe individual classes and concepts in summarized, concise, and yet precise terms.
OLTP (online transaction processing)
Capturing and storing data from ERP, CRM, POS, and other day-to-day business transactions. The main focus is on efficiency of routine tasks.
Measure of data dispersion
Measures how the data are spread out. The most common ... measures are the range, quartiles, and interquartile range; the five-number summary and boxplots; and the variance and standard deviation of the data.
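Most of the dispersion measures listed above can be computed with Python's `statistics` module. This sketch assumes the "inclusive" quartile method and the population (not sample) variance. Other textbooks and tools use slightly different quartile conventions, so quartile values can differ between tools. The data values are illustrative.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13]

# Quartiles with the "inclusive" interpolation method (conventions vary)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, median, q3, max(data))  # basis of a boxplot
value_range = max(data) - min(data)
iqr = q3 - q1                                          # interquartile range
variance = statistics.pvariance(data)                  # population variance
std_dev = statistics.pstdev(data)                      # population standard deviation

print(five_number, value_range, iqr, variance, std_dev)
```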
Data Integration
Combining multiple data sources.
mode
Measure of central tendency that indicates the value that occurs most frequently in the set.
measures of central tendency
Measures the location of the middle or center of a data distribution.
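The three most familiar measures of central tendency, the mean, the median, and the mode, can be compared on one small data set using Python's `statistics` module. The data values are illustrative.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum / count = 30 / 6
median = statistics.median(data)  # even count: average of the two middle values (3 and 5)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

Note how the three measures disagree (5, 4.0, and 3) on this slightly skewed data, which is why more than one is usually reported.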
observations
Observed values for a given attribute
Statistics
Studies the collection, analysis, interpretation or explanation, and presentation of data.
Data analytics
The science of examining raw data with the purpose of drawing conclusions about that information
Data mining functionalities
Used to specify the kinds of patterns to be found in data mining tasks. Can be descriptive or predictive. They include characterization and discrimination; the mining of frequent patterns, associations, and correlations; classification and regression; cluster analysis; and outlier analysis.
Data preprocessing
Where data are prepared for mining. Includes data cleaning, data integration, data selection, and data transformation.
roll up
With an OLAP report, allows one to look at coarser, "big picture" data by dropping one or more dimensions or climbing up along a dimension hierarchy.
drill down
With an OLAP report, further divides the data into finer detail, either by stepping down a dimension hierarchy or by introducing additional dimensions.
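Roll-up and drill-down are two directions of the same aggregation. The sketch below starts from sales at the (month, city) level, rolls up the time dimension along a month-to-quarter hierarchy, and then rolls up further by dropping the city dimension. Reading the three results in the reverse order is a drill-down. The `MONTH_TO_QUARTER` hierarchy, the `aggregate` helper, and the sales rows are hypothetical.

```python
from collections import defaultdict

# (month, city, amount) rows at the finest level of detail
sales = [
    ("Jan", "Vancouver", 100),
    ("Feb", "Vancouver", 80),
    ("Apr", "Vancouver", 60),
    ("Jan", "Toronto", 50),
    ("May", "Toronto", 70),
]
# Hypothetical concept hierarchy for the time dimension: month -> quarter
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}


def aggregate(rows, key_fn):
    """SUM(amount) grouped by whatever key key_fn derives from (month, city)."""
    totals = defaultdict(int)
    for month, city, amount in rows:
        totals[key_fn(month, city)] += amount
    return dict(totals)


by_month_city = aggregate(sales, lambda m, c: (m, c))                      # most detailed
by_quarter_city = aggregate(sales, lambda m, c: (MONTH_TO_QUARTER[m], c))  # climb time hierarchy
by_quarter = aggregate(sales, lambda m, c: MONTH_TO_QUARTER[m])            # drop the city dimension

print(by_quarter)
print(by_quarter_city)
```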
Classification
the process of finding a model (or function) that describes and distinguishes data classes or concepts
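As a small instance of "finding a model that distinguishes classes", the sketch below trains a nearest-centroid classifier: the model is simply the mean point of each class, and a new object is assigned the label of the closest centroid. This method is chosen only for brevity; decision trees, Bayesian classifiers, and many others fit the same definition. The function names and training data are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Model = the mean feature vector (centroid) of each class."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[label][i] += value
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in vec) for label, vec in sums.items()}


def predict(model, x):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(x, model[label]))


X = [(1, 2), (2, 1), (8, 9), (9, 8)]
y = ["low", "low", "high", "high"]
model = fit_centroids(X, y)
print(model)
print(predict(model, (2, 2)), predict(model, (7, 8)))
```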
type
the set of possible values—nominal, binary, ordinal, or numeric—the attribute can have
transactional database
A database designed to keep track of the day-to-day transactions of an organization (e.g., a customer's purchase, a flight booking, or a user's clicks on a web page).
Nominal Attribute
A qualitative attribute for which values are symbols or names of things. Each value represents some kind of category, code, or state. Also referred to as categorical.
binary attribute
A qualitative attribute with only two categories or states, typically coded as 0 or 1. Referred to as Boolean if the two states correspond to true and false.
ordinal attribute
A qualitative attribute with possible values that have a meaningful order or ranking among them, but the magnitude between successive values is not known
numeric attribute
A quantitative attribute that is a measurable quantity, represented in integer or real values. Can be interval-scaled or ratio-scaled.
Data warehouse
A repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making. Constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
attribute vector
A set of attributes used to describe a given object is called an attribute vector ...
data characterization
A summarization of the general characteristics or features of a target class of data.
discrete attribute
An attribute with a finite or countably infinite set of values. May or may not be represented as integers
continuous attribute
An attribute that is not discrete. Its values are typically real numbers, represented in practice as floating-point variables.
Outlier Analysis
Analysis of data objects that do not comply with the general behavior or model of the data. Also called anomaly mining.
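One simple way to flag objects that do not follow the general behavior of the data is the 1.5 × IQR rule used for boxplot whiskers: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are reported as potential outliers. This is a basic statistical heuristic, swapped in here for illustration, not a general-purpose outlier mining method. The `iqr_outliers` name, the factor `k`, the "inclusive" quartile method, and the readings are assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (boxplot-style fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))
```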
OLAP (Online analytical processing)
Analysis techniques with functionalities such as summarization, consolidation, and aggregation, as well as the ability to view information from different angles. Used for intelligence processing.
Database systems research
Focuses on the creation, maintenance, and use of databases for organizations and end-users. Database systems are well known for their high scalability in processing very large, relatively structured data sets.
Machine Learning
Investigates how computers can learn (or improve their performance) based on data. A main research area is enabling computer programs to automatically learn to recognize complex patterns and make intelligent decisions based on data.
Data Cleaning
To remove noise and inconsistent data.
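A minimal cleaning pass over hypothetical customer records is sketched below. It standardizes inconsistent category codes ("male", "M", "Male") with a mapping, removes a duplicate record, and discards records with impossible or missing values. Dropping records is an assumption made for brevity. In practice, missing values are often filled in and noisy values smoothed instead. The `GENDER_MAP` table, the 0 to 120 age range, and the `clean` function are hypothetical.

```python
records = [
    {"id": 1, "gender": "M", "age": 34},
    {"id": 2, "gender": "female", "age": 29},
    {"id": 2, "gender": "female", "age": 29},  # duplicate record
    {"id": 3, "gender": "Male", "age": -5},    # impossible (noisy) age
    {"id": 4, "gender": "F", "age": None},     # missing value
]
# Hypothetical mapping that resolves inconsistent codes to one convention
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}


def clean(rows, min_age=0, max_age=120):
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:          # skip duplicates
            continue
        age = row["age"]
        if age is None or not (min_age <= age <= max_age):
            continue                        # drop missing or out-of-range ages
        gender = GENDER_MAP.get(str(row["gender"]).strip().lower())
        if gender is None:
            continue                        # drop unrecognized codes
        seen_ids.add(row["id"])
        cleaned.append({"id": row["id"], "gender": gender, "age": age})
    return cleaned


result = clean(records)
print(result)
```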