ISYS 464 Chapter 10 Data Quality & Integration

¡Supera tus tareas y exámenes ahora con Quizwiz!

Improving Data Quality: Apply Modern Data Management Principles and Technology (2)

1)Software tools for analyzing and correcting data quality problems: -Pattern matching -Fuzzy Logic -Expert systems 2)Sound data modeling and database design

Data Integration techniques: Data Federation

Data Federation: Enterprise Information Integration (EII) -Provides virtual view of data without actually creating once centralized database

Data Governance

High-Level Org Group and processes overseeing data stewardship across the organization.

Multi-Field Transformation: 1-to-M

One source to many targets.

Improving Data Quality: Improve Data Capture Process (5)

1)Automate data entry as much as possible 2)Manual data entry should be selected from preset options 3)Use trained operators when possible 4)Follow good user interface design principles 5)Immediate data validation for entered data

Reconciled Data Layer: After ETL (6)

1)Detailed: not summarized yet 2)Historical: periodic 3)Normalized: 3rd normal form or higher 4)Comprehensive: enterprise-wide perspective 5)Timely: data should be current enough to assist decision-making 6)Quality controlled: accurate with full integrity

Poor Data Quality Causes (3)

1)External data Sources: -Lack of control over data quality 2)Redundant data storage and inconsistent metadata: -proliferation of databases with uncontrolled redundancy and metadata 3)Data entry: -poor data capture controls Lack of organizational commitment -don't recognize poor data quality as an organizational issue

The ETL Process: Step 1 Data Reconciliation (3)

1)Extract and capture: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse 2)Static Extract: capturing a snapshot of the source data at a point in time 3)Incremental Extract: capturing changes that have occurred since the last static extract (i.e., CDC)

The ETL Process (4)

1)Extract/Capture 2)Scrub or data cleansing 3)Transform 4)Load and index

Improving Data Quality (6)

1)Get business buy-in 2)Perform data quality audit 3)Establish data stewardship program 4)Improve data capture processes 5)Apply total quality management (TQM) practices 6)Apply modern DBMS technology

Improving Data Quality: Business Buy In (5)

1)Includes Executive Sponsorship 2)Building a business case 3)Prove a Return On Investment (ROI) 4)Avoidance of cost 5)Avoidance of opportunity loss

The ETL Process: Step 4 Data Reconciliation (3)

1)Load/Index: Place transformed data into the warehouse and create indexes 2)Refresh mode: bulk rewriting of target data at periodic intervals 3)Update mode: only changes in source data are written to data warehouse

Purpose of data quality (4)

1)Minimize IT Project Risk 2)Make Timely Business Decisions 3)Ensure Regulatory Compliance Example: SOX (Sarbanes-Oxley) 4)Expand Customer Base

Improving Data Quality: Data Stewardship Program* (5)

1)Oversight of data stewardship program 2)Manage data subject area 3)Oversee data definitions 4)Oversee production of data 5)Oversee use of data Question to which there is no single correct answer: Who should a data steward report to? the business unit or the IS/IT department?

The ETL Process: Step 2 Data Reconciliation (3)

1)Scrub/Cleanse: Uses pattern recognition and AI techniques to upgrade data quality. 2)Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies 3)Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data

Data Governance Requirements (4)

1)Sponsorship from both senior management and business units 2)A data steward manager to coordinate all of the data stewards 3)Data stewards for different business units, subjects, and/or source systems 4)A governance committee to provide data management guidelines and standards

Improving Data Quality: Data Quality Audit (5)

1)Statistically profile all data files 2)Document the set of values for all fields. -ex: age range of all SFSU students 3)Analyze data patterns (distribution, outliers, frequencies) -ex: is there really a 118 year old student? 4)Verify whether controls and business rules are enforced -ex:Ben & Jerry's salary rule 5)Use specialized data profiling tools

The ETL Process: Step 3 Data Reconciliation (3)

1)Transform: Convert data from format of OS to format of data warehouse 2)Record-level: Selection-data partitioning Joining-data combining Aggregation-data summarization 3)Field-level: single-field-from one field to one field multi-field-from many fields to one, or one field to many

Reconciled Data Layer: Before ETL (4)

1)Transient: not historical 2)Not normalized (perhaps due to de-normalization for performance) 3)Restricted in scope: not comprehensive 4)Sometimes poor quality: inconsistencies and errors

Three main architectures supporting MDM (3)

1)identity registry: master data remains in each individual source system 2)integration hub: source systems "push" data changes to subscribers 3)persistent: one central source of the golden record is maintained

Data Steward

A person responsible for ensuring that organizational applications properly support the organization's data quality goals.

Single-Field Transformation: Algorithmic

Algorithmic transformation uses a formula or logical expression

Data Integration: CDC (2)

Changed data capture 1)Indicates which data have changed since previous data integration activity 2)example coming in topic of data warehouses

Data Integration techniques: Consolidation

Consolidation: Extract, Transform, and Load (ETL) - Consolidating all data into a centralized data (like the process used to load a data warehouse)

Data integration

Creates a unified view of business data Other possibilities: -application integration -business process -user interaction integration.

Data Integration techniques: Data propagation

Data propagation: enterprise application integration (EAI) and enterprise data replication (EDR) -intentionally duplicated data across databases with near real-time delay

Dimensions of Data Quality (8)

Example: How much has one customer spent purchasing our organization's products last year? 1)Uniqueness: Is there only one value stored in the database? 2)Accuracy: How many decimal places in the value? 3)Consistency: Same value derived from all methods? 4)Completeness: Includes ALL customer's transactions? 5)Timeliness: Is the value available NOW? 6)Currency: Includes transactions from one second ago? 7)Conformance: Data value is provided in US dollars? 8)Referential integrity: CustomerID is checked against valid customers?

Single-Field Transformation: Process

In general: some transformation process translates data from old form to new form

Multi-Field Transformation: M-to-1

Many sources rto one target.

Master Data Management

Master Data Management: The disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data ("golden record") within and across various subject areas.

Single-Field Transformation: Table Lookup

Table lookup: another approach, uses a separate table keyed by source record code

Improving Data Quality: TQM Principles and Practices (2)

Total Quality Management 1)TQM Principles: -Defect prevention -Continuous improvement -Use of enterprise data standards 2)Balanced Focus: -Customer -Product/Service -Strong foundation of measurement


Conjuntos de estudio relacionados

Anxiety and Obsessive-Compulsive Related Disorders

View Set

PrepU Health Assess Ch. 1 Assignment 1

View Set

System Integration and Performance Questions

View Set

Business Education: Content Knowledge (5101)

View Set

Match the Terms in column B with the appropriate description in column A

View Set

Chapter 2: Principles of Physical Fitness

View Set

Lección 14 Lesson Test 2 Study Guide (all corrected)

View Set