Chapter 10: Data Quality and Integration
data capture controls
Data capture is the retrieval of information from a document using methods other than manual data entry. Its utility is the ability to automate information retrieval where manual entry would be inefficient, costly, or inapplicable.
Data Federation
a technique for data integration that provides a virtual view of integrated data without actually creating one centralized database.
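A minimal Python sketch of the idea: one integrated, virtual record is assembled per query from two hypothetical in-memory sources (CRM_DB and BILLING_DB are assumptions, not real systems), and nothing is ever copied into a central store.

    # Two hypothetical source systems, never merged into one database.
    CRM_DB = {1: {"name": "Acme Corp"}, 2: {"name": "Globex"}}
    BILLING_DB = {1: {"balance": 1200.0}, 2: {"balance": 310.5}}

    def federated_customer_view(customer_id):
        """Assemble one integrated record on the fly from both sources."""
        record = {"customer_id": customer_id}
        record.update(CRM_DB.get(customer_id, {}))
        record.update(BILLING_DB.get(customer_id, {}))
        return record

    print(federated_customer_view(1))
    # {'customer_id': 1, 'name': 'Acme Corp', 'balance': 1200.0}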
Master Data Management
disciplines, technologies, and methods used to ensure the currency, meaning, and quality of reference data within and across various subject areas.
data propagation
duplicates data across databases, usually with a near-real-time delay. Updates can be synchronous or asynchronous; asynchronous delivery decouples the updates to the remote copies from the originating transaction. Its major advantage is near-real-time cascading of data changes throughout the organization, which suits it to handling frequent updates. Real-time data warehousing applications require data propagation.
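A rough sketch of asynchronous propagation, with a queue and a background thread standing in for real replication middleware (all names are hypothetical). The local write returns immediately; the remote copy catches up in near real time.

    import queue
    import threading

    primary = {}          # hypothetical primary database
    remote_copy = {}      # hypothetical remote duplicate
    change_queue = queue.Queue()

    def write_primary(key, value):
        """Local write is synchronous; the remote update is only queued."""
        primary[key] = value
        change_queue.put((key, value))   # decoupled from the remote apply

    def propagate():
        """Background worker cascades queued changes to the remote copy."""
        while True:
            key, value = change_queue.get()
            if key is None:              # shutdown sentinel
                break
            remote_copy[key] = value
            change_queue.task_done()

    threading.Thread(target=propagate, daemon=True).start()
    write_primary("cust_42", "Gold tier")
    change_queue.join()                  # wait for the copy to catch up
    print(remote_copy)                   # {'cust_42': 'Gold tier'}
    change_queue.put((None, None))       # stop the worker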
Data governance
high-level organizational groups and processes that oversee data stewardship across the organization. Usually guides data quality initiatives, data architecture, data integration, master data management, data warehousing and business intelligence, and other data-related matters.
loading data into a data warehouse
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse.
data that are accurate, consistent, and available in a timely fashion
Data quality. Required by the Sarbanes-Oxley Act (SOX).
static extract
a method of capturing a snapshot of the required source data at a point in time.
incremental extract
a method of capturing only the changes that have occurred in the source data since the last capture.
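A small sketch contrasting the two extract methods, assuming each source row carries a last_updated timestamp (an assumption; real sources may expose change flags or logs instead).

    from datetime import datetime

    source_rows = [
        {"id": 1, "amount": 100, "last_updated": datetime(2024, 1, 5)},
        {"id": 2, "amount": 250, "last_updated": datetime(2024, 2, 20)},
    ]

    def static_extract(rows):
        """Snapshot of all required source data at a point in time."""
        return [dict(r) for r in rows]

    def incremental_extract(rows, last_capture):
        """Only the rows changed since the previous capture."""
        return [dict(r) for r in rows if r["last_updated"] > last_capture]

    print(len(static_extract(source_rows)))                        # 2
    print(incremental_extract(source_rows, datetime(2024, 2, 1)))  # row 2 only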
data steward
a person assigned the responsibility of ensuring that organizational applications properly support the organization's enterprise goals for data quality.
incremental backup
a security copy that contains only those files that have been altered since the last full backup.
update mode
an approach to filling a data warehouse in which only changes in the source data are written to the data warehouse.
refresh mode
an approach to filling a data warehouse that involves bulk rewriting of the target data at periodic intervals.
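A minimal sketch contrasting the two load modes on a hypothetical warehouse table keyed by id:

    warehouse = {1: "old", 2: "old"}   # hypothetical target table

    def refresh_mode(target, full_source):
        """Bulk-rewrite the entire target from a complete source snapshot."""
        target.clear()
        target.update(full_source)

    def update_mode(target, changes):
        """Write only the changed rows into the target."""
        target.update(changes)

    refresh_mode(warehouse, {1: "new", 2: "new", 3: "new"})
    update_mode(warehouse, {2: "newer"})
    print(warehouse)   # {1: 'new', 2: 'newer', 3: 'new'}

Refresh mode suits periodic bulk loads; update mode suits frequent, near-real-time trickle feeds.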
dropdown lists
a data capture control: when data are entered manually, ensure values are selected from preset options via dropdown menus.
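A minimal sketch of this control, assuming a hypothetical preset list of state codes: manual input is accepted only if it matches an option a dropdown would offer.

    STATE_OPTIONS = {"AL", "AK", "AZ"}   # assumed preset list (truncated)

    def capture_state(raw_value):
        """Reject any manually entered value outside the preset options."""
        value = raw_value.strip().upper()
        if value not in STATE_OPTIONS:
            raise ValueError(f"{raw_value!r} is not one of the preset options")
        return value

    print(capture_state(" az "))   # 'AZ'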
Data quality
important for minimizing IT project risks, making timely business decisions, ensuring regulatory compliance, and expanding the customer base. Key characteristics: uniqueness, accuracy, consistency, completeness, timeliness, currency, conformance, and referential integrity. Mandated by SOX and the Basel II Accord.
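A toy sketch that checks three of these characteristics (uniqueness, completeness, conformance) over hypothetical customer records; the field names and the email rule are assumptions for illustration.

    import re

    EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

    records = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},              # completeness violation
        {"id": 2, "email": "not-an-email"},  # uniqueness and conformance violations
    ]

    def quality_report(rows):
        ids = [r["id"] for r in rows]
        return {
            "uniqueness": len(ids) == len(set(ids)),
            "completeness": all(r["email"] for r in rows),
            "conformance": all(EMAIL_RE.fullmatch(r["email"])
                               for r in rows if r["email"]),
        }

    print(quality_report(records))
    # {'uniqueness': False, 'completeness': False, 'conformance': False}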
Changed data capture (CDC)
technique that indicates which data have changed since the last data integration activity.
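A simple snapshot-diff sketch of CDC; production systems more often read database logs or change timestamps, and deletions are omitted here for brevity.

    # Hypothetical snapshots keyed by primary key.
    previous_snapshot = {1: ("Alice", "NY"), 2: ("Bob", "CA")}
    current_snapshot = {1: ("Alice", "NJ"), 2: ("Bob", "CA"), 3: ("Cara", "TX")}

    def changed_data(prev, curr):
        """Return the inserts and updates since the last integration run."""
        return {k: v for k, v in curr.items() if prev.get(k) != v}

    print(changed_data(previous_snapshot, current_snapshot))
    # {1: ('Alice', 'NJ'), 3: ('Cara', 'TX')}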
data transformation
the component of data reconciliation that converts data from the format of the source operational systems to the format of the enterprise data warehouse.
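A single-record sketch of such a conversion; the source layout (CUSTNO, DOB, SEX) and the target format are both assumptions for illustration.

    from datetime import datetime

    GENDER_MAP = {"M": "Male", "F": "Female"}

    def transform(source_record):
        """Convert one source-format record to the assumed warehouse format."""
        return {
            "customer_id": int(source_record["CUSTNO"]),   # strip leading zeros
            "birth_date": datetime.strptime(source_record["DOB"],
                                            "%m/%d/%Y").date().isoformat(),
            "gender": GENDER_MAP.get(source_record["SEX"], "Unknown"),
        }

    print(transform({"CUSTNO": "0042", "DOB": "07/04/1990", "SEX": "F"}))
    # {'customer_id': 42, 'birth_date': '1990-07-04', 'gender': 'Female'}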
joining
the process of combining data from various sources into a single table or view.
selection
the process of partitioning data according to predefined criteria.
data aggregating
the process of transforming data from a detailed level to a summary level.
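The three transformation steps above (joining, selection, aggregation) in one small sketch, run over hypothetical order and customer rows:

    from collections import defaultdict

    orders = [{"cust": 1, "amount": 100}, {"cust": 1, "amount": 40},
              {"cust": 2, "amount": 5}]
    customers = [{"cust": 1, "region": "East"}, {"cust": 2, "region": "West"}]

    # Joining: combine the two sources into a single view.
    by_cust = {c["cust"]: c for c in customers}
    joined = [{**order, **by_cust[order["cust"]]} for order in orders]

    # Selection: keep only the rows meeting a predefined criterion.
    selected = [row for row in joined if row["amount"] >= 10]

    # Aggregation: roll detail rows up to a regional summary.
    totals = defaultdict(float)
    for row in selected:
        totals[row["region"]] += row["amount"]

    print(dict(totals))   # {'East': 140.0}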
Total Quality Management (TQM)
principles: defect prevention, continuous improvement of the processes that touch data, and the use of enterprise data standards. Balances a focus on the customer with a focus on the product or service; results in decreased costs, increased profits, and reduced risks. Builds on a strong foundation of measurements.
Informational and operational data
characteristics of operational data (in contrast to informational data): transient (not historical); not normalized; restricted in scope (not comprehensive); sometimes poor quality (inconsistencies and errors).