Data Mining

Interval-scaled attribute

measured on a scale of equal-size units

Frequent Itemset

set of items that often appear together in a transactional data set—for example, milk and bread
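As a rough illustration, the sketch below counts how often each pair of items appears together and keeps the pairs that meet a minimum support count. The transactions, the function name `frequent_pairs`, and the threshold are hypothetical; real miners such as Apriori also handle larger itemsets and prune the search.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support count is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(transactions, min_support=3)
print(pairs)
```

With this toy data, only bread and milk appear together in at least three transactions.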

attribute

Data field, representing a characteristic or feature of a data object. Term often used interchangeably with "dimension", "feature", and "variable"

Cluster Analysis

1. Analyzes data objects without consulting class labels. 2. Statistical techniques identify groups of entities that have similar characteristics

Web search engine

Specialized computer server that searches for information on the Web. The search results of a user query are often returned as a list (sometimes called hits)

Information retrieval

The science of searching for documents or information in documents. Assumes that (1) the data under search are unstructured; and (2) the queries are formed mainly by keywords, which do not have complex structures.

ratio-scaled attribute

a numeric attribute with an inherent zero-point and for which we can speak of a value as being a multiple of another value
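A small worked example may help separate ratio-scaled from interval-scaled attributes. Temperature in Celsius is the usual interval-scaled example (no true zero), while Kelvin is ratio-scaled. The values below are hypothetical.

```python
# Two hypothetical temperature readings in degrees Celsius.
c_low, c_high = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = c_high - c_low

# The Celsius ratio is NOT meaningful: 0 degrees C is an arbitrary zero.
naive_ratio = c_high / c_low

# Kelvin has an inherent zero-point, so ratios are meaningful.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low

print(difference, naive_ratio, round(kelvin_ratio, 4))
```

The Celsius ratio suggests "twice as hot", but on the Kelvin scale the higher reading is only about 3.5% higher.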

ETL (extraction, transformation, and loading)

1. Extract is the process of reading data that is assumed to be important. The data can be either raw data collected from multiple, different types of sources or data taken from a source database. 2. Transform is the process of converting the extracted data from its previous format into the format required by another database. The transformation is done using rules or lookup tables, or by combining the data with other data. 3. Load is the process of writing the data into the target database, data warehouse, or another system.
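The three ETL steps can be sketched in a few lines. This is a minimal sketch, assuming a CSV string as the source and an in-memory SQLite database as the target; the table name `sales`, the column names, and the data are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a CSV export with inconsistent formatting.
RAW_CSV = """id,name,amount
1, alice ,10.50
2,BOB,3
3,carol,7.25
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert to the target format (typed values, normalized names)."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]

def load(records, conn):
    """Load: write the records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(rows)
```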

Data mining

1. Often used to refer to the entire knowledge discovery process, which consists of discovering interesting patterns and knowledge from large amounts of data. 2. As a step, it is an essential process where intelligent methods are applied to extract data patterns.

Database management system (DBMS, also called Database system)

A collection of interrelated data, known as a database, and a set of software programs to manage and access the data.

data discrimination

A comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes.

data cube

A multidimensional data structure used to store and manipulate data in a multidimensional DBMS. Each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure.
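One way to picture a data cube is as a mapping from a tuple of dimension values to an aggregate measure. The sketch below assumes three hypothetical dimensions (city, quarter, item) and uses the sum of sales as the measure; it is an illustration, not how a multidimensional DBMS stores cubes internally.

```python
from collections import defaultdict

# Hypothetical fact records: (city, quarter, item, sales amount).
facts = [
    ("Chicago", "Q1", "phone", 100),
    ("Chicago", "Q1", "phone", 50),
    ("Chicago", "Q2", "laptop", 300),
    ("Toronto", "Q1", "phone", 80),
]

def build_cube(facts):
    """Map each (city, quarter, item) cell to the sum of its sales."""
    cube = defaultdict(int)
    for city, quarter, item, amount in facts:
        cube[(city, quarter, item)] += amount
    return dict(cube)

cube = build_cube(facts)
print(cube[("Chicago", "Q1", "phone")])
```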

class/concept descriptions

Descriptions of individual classes and concepts in summarized, concise, and yet precise terms.

OLTP (online transaction processing)

Capturing and storing data from ERP, CRM, POS, and other day-to-day business transactions. The main focus is on efficiency of routine tasks.

Measure of data dispersion

Measures how the data are spread out. The most common ... measures are the range, quartiles, and interquartile range; the five-number summary and boxplots; and the variance and standard deviation of the data.
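The dispersion measures listed above can be computed with Python's standard `statistics` module. The data values are hypothetical, and quartile values depend on the interpolation convention used, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical sample of observed values for one numeric attribute.
data = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

value_range = max(data) - min(data)

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (min(data), q1, median, q3, max(data))

# Sample variance and standard deviation (n - 1 in the denominator).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

print(value_range, iqr, five_number_summary, variance, std_dev)
```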

Data Integration

Combining multiple data sources.

mode

Measure of central tendency that indicates the value that occurs most frequently in the set.

measures of central tendency

Measures that give the location of the middle or center of a data distribution.
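As a quick illustration, the mean, median, and mode of a small hypothetical data set can be computed with the standard `statistics` module. `multimode` is used because a data set can have more than one mode.

```python
import statistics

# Hypothetical observed values for one numeric attribute.
data = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # most frequent value(s)

print(mean, median, modes)
```

Here two values tie for most frequent, so the data set is bimodal.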

observations

Observed values for a given attribute

Statistics

Studies the collection, analysis, interpretation or explanation, and presentation of data.

Data analytics

The science of examining raw data with the purpose of drawing conclusions about that information

Data mining functionalities

Used to specify the kinds of patterns to be found in data mining tasks. Can be descriptive or predictive. They include characterization, discrimination, frequent patterns, association and correlation, classification and regression, clustering analysis, and outlier analysis.

Data preprocessing

The stage where data are prepared for mining. Includes data cleaning, data integration, data selection, and data transformation.

roll up

With an OLAP report, lets the user look at coarser, "big picture" data by dropping one or more dimensions or climbing up a dimension hierarchy.

drill down

With an OLAP report, further divides the data into more detail.
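Roll up and drill down can be sketched as moving between coarser and finer aggregations of the same cells. The cell keys, the helper `aggregate`, and the values below are hypothetical; an OLAP tool would do this against a real cube.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (year, quarter, city) with a sales measure.
cells = {
    (2023, "Q1", "Chicago"): 100,
    (2023, "Q2", "Chicago"): 150,
    (2023, "Q1", "Toronto"): 80,
    (2024, "Q1", "Chicago"): 120,
}

def aggregate(cells, key_fn):
    """Sum the measure over all cells that share the same reduced key."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[key_fn(key)] += value
    return dict(out)

# Roll up by dropping the city dimension: totals per (year, quarter).
by_quarter = aggregate(cells, lambda k: (k[0], k[1]))

# Roll up further by climbing the time hierarchy from quarter to year.
by_year = aggregate(cells, lambda k: k[0])

# Drill down is the reverse view: start from the 2023 yearly total and
# split it back into its quarterly components.
detail_2023 = {k: v for k, v in by_quarter.items() if k[0] == 2023}

print(by_year, detail_2023)
```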

Classification

the process of finding a model (or function) that describes and distinguishes data classes or concepts

type

the set of possible values—nominal, binary, ordinal, or numeric—the attribute can have

transactional database

A database designed to keep track of the day-to-day transactions of an organization (e.g., a customer's purchase, a flight booking, or a user's clicks on a web page).

Nominal Attribute

A qualitative attribute for which values are symbols or names of things. Each value represents some kind of category, code, or state. Also referred to as categorical.

binary attribute

A qualitative attribute with only two categories or states: 0 or 1. Referred to as Boolean if the two states correspond to true and false

ordinal attribute

A qualitative attribute with possible values that have a meaningful order or ranking among them, but the magnitude between successive values is not known

numeric attribute

A quantitative attribute that is a measurable quantity, represented in integer or real values. Can be interval-scaled or ratio-scaled.

Data warehouse

A repository of multiple heterogeneous data sources organized under a unified schema at a single site to facilitate management decision making. Constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.

attribute vector

A set of attributes used to describe a given object is called an attribute vector ...

data characterization

A summarization of the general characteristics or features of a target class of data.

discrete attribute

An attribute with a finite or countably infinite set of values. May or may not be represented as integers

continuous attribute

An attribute that is not discrete. Its values typically range over the real numbers, an uncountably infinite set.

Outlier Analysis

A data set may contain objects that do not comply with the general behavior or model of the data; these objects are outliers, and their analysis is called outlier analysis. Also called anomaly mining.
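One common rule of thumb, not taken from this study set, flags values lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies it to hypothetical data; the function name `iqr_outliers` and the factor `k` are assumptions for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical observed values; 110 sits far from the rest.
data = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
outliers = iqr_outliers(data)
print(outliers)
```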

OLAP (Online analytical processing)

Analysis techniques with functionalities such as summarization, consolidation, and aggregation, as well as the ability to view information from different angles. Used for intelligence processing.

Database systems research

Focuses on the creation, maintenance, and use of databases for organizations and end-users. Database systems are often well known for their high scalability in processing very large, relatively structured data sets.

Machine Learning

Investigates how computers can learn (or improve their performance) based on data. A main research area is for computer programs to automatically learn to recognize complex patterns and make intelligent decisions based on data.

Data Cleaning

To remove noise and inconsistent data.

