Chapter 9 Data Warehousing


NoSQL

A NoSQL (originally referring to "non SQL", "non relational" or "not only SQL") database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases.

Operational data store (ODS)

An integrated, subject-oriented, continuously updatable, current-valued (with recent history), enterprise-wide, detailed database designed to serve operational users as they do decision support processing.

rule discovery

Association - looking for patterns where one event is connected to another event
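One common way to quantify such an association is with support and confidence. The sketch below is illustrative only: the basket data and the function names `support` and `confidence` are assumptions, not part of the chapter.

```python
# Hypothetical market-basket data; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule under test: "customers who buy bread also buy milk".
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule is usually considered interesting only when both measures exceed thresholds chosen by the analyst.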

data-mining techniques

Data mining is sorting through data to identify patterns and establish relationships.

data scrubbing

Data scrubbing, also called data cleansing, is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated. It can use pattern recognition and other artificial intelligence techniques to upgrade the quality of raw data before the data are transformed and moved to the data warehouse.
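A minimal rule-based sketch of scrubbing follows. It uses simple hand-written rules rather than pattern recognition or AI, and the record layout (`name`, `email`) and the function name `scrub` are assumptions made for illustration.

```python
def scrub(records):
    """Trim and normalize fields; drop incomplete, malformed, and duplicate rows."""
    seen_emails = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        if not name or "@" not in email:
            continue  # incomplete or improperly formatted
        if email in seen_emails:
            continue  # duplicate of a row already kept
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical raw rows showing each kind of defect.
raw = [
    {"name": "  ada lovelace ", "email": "ADA@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com "},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Alan Turing", "email": "not-an-email"},
]
clean = scrub(raw)
```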

ETL

Extract, transform, load (ETL) is the process of extracting data from source systems, transforming it, and loading it into a target store, particularly a data warehouse. Its sub-processes include retrieving data from external data storage or transmission sources. Extraction and loading happen periodically.
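The three stages can be sketched with the Python standard library. This is a toy illustration under stated assumptions: the CSV source, the `orders` table, and the function names `extract`, `transform`, and `load` are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the third row is missing its amount.
SOURCE_CSV = """order_id,amount,currency
1,10.50,usd
2,7.25,USD
3,,usd
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, cast types, normalize currency codes."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return out


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
```

In a periodic ETL job, the same three calls would be rerun on each new extract.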

MySQL

MySQL is an open source relational database management system. Information in a MySQL database is stored in the form of related tables. MySQL databases are typically used for web application development (often accessed using PHP).

on-line analytical processing (OLAP)

OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. It performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling.
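Viewing the same data "from different points of view" amounts to aggregating one measure along different combinations of dimensions. The sketch below shows that idea in plain Python; the sales rows and the helper `roll_up` are assumptions for illustration, not an OLAP product's API.

```python
from collections import defaultdict

# Hypothetical sales records with three dimensions and one measure.
sales = [
    {"region": "East", "product": "A", "quarter": "Q1", "units": 10},
    {"region": "East", "product": "B", "quarter": "Q1", "units": 5},
    {"region": "West", "product": "A", "quarter": "Q1", "units": 7},
    {"region": "West", "product": "A", "quarter": "Q2", "units": 3},
]


def roll_up(rows, dims, measure="units"):
    """Sum the measure for each distinct combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


by_region = roll_up(sales, ["region"])
by_product_quarter = roll_up(sales, ["product", "quarter"])
```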

PHP

PHP is a script language and interpreter that is freely available and used primarily on Linux Web servers. PHP, originally derived from Personal Home Page Tools, now stands for PHP: Hypertext Preprocessor, which the PHP FAQ describes as a "recursive acronym."

logical data mart:

RFID GPS

case reasoning

Reasoning that adapts previous solutions to similar problems in order to solve the new problem at hand.
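A minimal sketch of the idea: retrieve the most similar past case, then adapt its solution to the new problem. The delivery-time cases, the similarity measure, and the proportional adaptation rule are all assumptions chosen for illustration.

```python
# Hypothetical case base: past deliveries and how long each took.
cases = [
    {"distance_km": 10, "hours": 1.0},
    {"distance_km": 50, "hours": 4.0},
    {"distance_km": 200, "hours": 12.0},
]


def solve(new_distance_km, cases):
    """Retrieve the nearest past case and scale its solution to the new problem."""
    nearest = min(cases, key=lambda c: abs(c["distance_km"] - new_distance_km))
    # Adaptation step: assume time grows in proportion to distance.
    return nearest["hours"] * new_distance_km / nearest["distance_km"]


estimate = solve(60, cases)  # reuses the 50 km case
```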

GPS

The GPS (Global Positioning System) is a "constellation" of approximately 30 well-spaced satellites that orbit the Earth and make it possible for people with ground receivers to pinpoint their geographic location.

data visualisation

The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.

Enterprise data warehouse (EDW)

a centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications

logical data mart

a data mart created by a relational view of a data warehouse
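Because a logical data mart is just a view, it stores no data of its own. A minimal sketch using SQLite follows; the table `warehouse_sales` and the view `east_sales_mart` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for a warehouse table.
CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL);
INSERT INTO warehouse_sales VALUES
    ('East', 'A', 100.0), ('East', 'B', 50.0), ('West', 'A', 70.0);

-- The logical data mart: a relational view over the warehouse.
CREATE VIEW east_sales_mart AS
    SELECT product, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE region = 'East'
    GROUP BY product;
""")

mart_rows = conn.execute(
    "SELECT product, total_amount FROM east_sales_mart ORDER BY product"
).fetchall()
```

Queries against the view always reflect the current contents of the warehouse table.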

dependent data mart

a data mart filled exclusively from an enterprise data warehouse and its reconciled data.

independent data mart

a data mart filled with data extracted from the operational environment, without the benefit of a data warehouse.

RFID

Radio-frequency identification. A technology that generates massive amounts of event data, creating opportunities for real-time data warehousing coupled with real-time analytics to greatly reduce the latency between event data capture and the appropriate action being taken.

real-time data warehouse

an enterprise data warehouse that accepts near-real-time feeds of transactional data from the systems of record, analyzes warehouse data, and in near-real-time relays business rules to the warehouse and systems of record so that immediate action can be taken in response to business events.

snowflake schema

an expanded version of a star schema in which dimension tables are normalized into several related tables.
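In the sketch below, the product dimension is normalized into two related tables (`product` and `category`), so a query by category needs one extra join. All table and column names are hypothetical, and SQLite is used only as a convenient stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimension: product points to a separate category table.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES category(category_id)
);
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product(product_id),
    units_sold INTEGER
);
INSERT INTO category VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO product VALUES (10, 'Tea', 1), (11, 'Coffee', 1), (12, 'Chips', 2);
INSERT INTO sales_fact VALUES (10, 4), (11, 6), (12, 3);
""")

units_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
    JOIN category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```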

fact tables

contain factual or quantitative data (measurements that are numerical, continuously valued, and additive) about a business, such as units sold, orders booked, and so on.

transient data

data in which changes to existing records are written over previous records, thus destroying the previous data content.

periodic data

data that are never physically altered or deleted once they have been added to the store.
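The difference between transient and periodic data can be sketched as overwrite versus append. The account key, the balances, and the two helper functions are assumptions for illustration.

```python
transient_store = {}  # one current value per key
periodic_store = []   # append-only history


def update_transient(key, value):
    """Transient: the new value overwrites and destroys the previous one."""
    transient_store[key] = value


def update_periodic(key, value, as_of):
    """Periodic: each change is added as a new row; nothing is altered or deleted."""
    periodic_store.append({"key": key, "value": value, "as_of": as_of})


# Apply the same two balance changes to both stores.
for as_of, balance in [("2024-01-01", 100), ("2024-02-01", 250)]:
    update_transient("acct-1", balance)
    update_periodic("acct-1", balance, as_of)
```

After both updates, the transient store holds only the latest balance, while the periodic store still holds both.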

derived data

data that have been selected, formatted, and aggregated for end-user decision support applications.

event data

data that represent transactions (business events), typically stored for a defined period and then deleted or archived to save storage. A data warehouse is likely to contain a history of snapshots of status data or a summary of transaction (event) data.

Corporate Information Factory (CIF)

dependent data mart and operational data store architecture

reconciled data

detailed, current data intended to be the single, authoritative source for all decision support applications

dimension

dimension tables hold descriptive data (context) about the subjects of the business. They are usually the source of attributes used to qualify, categorize, or summarize facts in queries, reports, or graphs.

status data

data that describe the state of a business entity before or after an event. Most of the data stored in databases are status data.

conformed dimension

one or more dimension tables associated with two or more fact tables for which the dimension tables have the same business meaning and primary key with each fact table.

operational systems

systems used to run a business in real time, based on current data; also called systems of record.

grain

the level of detail in a fact table, determined by the intersection of all the components of the primary key, including all foreign keys and any other primary key elements.

duration of the database

the natural duration is about 13 months or 5 calendar quarters, which is sufficient to see annual cycles in the data. some businesses, such as financial institutions, have a need for longer durations.

limitations of independent data mart

- separate ETL process for each data mart, leading to redundant data and processing
- inconsistency between data marts
- difficult to drill down for related facts between data marts
- excessive scaling costs as more applications are built
- high cost of obtaining consistency between marts

DSS schema

A decision support system (DSS) is a computerized information system used to support decision-making in an organization or a business. A DSS lets users sift through and analyze massive reams of data and compile information that can be used to solve problems and make better decisions.

data mart

a data warehouse that is limited in scope, whose data are obtained by selecting and summarizing data from a data warehouse or from separate extract, transform, and load processes from source data systems.

star schema

a simple database design in which dimensional data are separated from fact or event data. a dimensional model is another name for a star schema.
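A minimal star schema sketch: one fact table in the center, keyed by foreign keys to two dimension tables. Table and column names are hypothetical, and SQLite is used only so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive context.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table holds the additive measure; its grain is one row per product per store.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    units_sold INTEGER,
    PRIMARY KEY (product_key, store_key)
);
INSERT INTO dim_product VALUES (1, 'Tea'), (2, 'Coffee');
INSERT INTO dim_store VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO fact_sales VALUES (1, 1, 5), (1, 2, 2), (2, 1, 4);
""")

# Facts are summarized by an attribute taken from a dimension table.
units_by_city = conn.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
```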

informational systems

a system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications.

clusters in data mining

Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

data warehousing

the process whereby organizations create and maintain data warehouses and extract meaning from and help inform decision making through the use of data in the data warehouses.

artificial intelligence

the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.

