Data Governance Exam 1 Review
Companies Assume That, for the Business to Function Properly, Data Must Be What?
Reliable and trustworthy
True or False? Data Quality is dependent on context and on needs of the data consumer.
True
True or False? One of the challenges in managing the quality of data is that expectations related to quality are not always communicated.
True
Data Quality Service Level Agreements
• A Data Quality SLA specifies a company's expectations for response and remediation for Data Quality issues in each system. • 𝐎𝐩𝐞𝐫𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐃𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐜𝐨𝐧𝐭𝐫𝐨𝐥 𝐝𝐞𝐟𝐢𝐧𝐞𝐝 𝐢𝐧 𝐚 𝐃𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐒𝐋𝐀 𝐢𝐧𝐜𝐥𝐮𝐝𝐞𝐬: ▪ Data elements covered by the agreement. ▪ Business impacts associated with the flaws. ▪ Data Quality dimensions associated with each data element. ▪ Data Stewards. ▪ Timeliness and deadlines for expected resolution. ▪ Escalation strategy and penalties.
Data Quality Dimensions
• A Data Quality dimension is a measurable feature or characteristic of data. • Data Quality dimensions provide a vocabulary for defining Data Quality measurements. • 𝟐𝟎𝟏𝟑 𝐃𝐀𝐌𝐀 𝐔𝐊 𝐒𝐢𝐱 𝐂𝐨𝐫𝐞 𝐃𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐃𝐢𝐦𝐞𝐧𝐬𝐢𝐨𝐧𝐬: 1. Completeness 2. Uniqueness 3. Timeliness 4. Validity 5. Accuracy 6. Consistency
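Because a dimension is a measurable feature, two of the six dimensions above can be sketched as simple ratios. This is a minimal illustration, not a standard formula: the sample records, field names, and the helper functions `completeness` and `uniqueness` are all hypothetical.

```python
# Hypothetical sample records; None marks a missing value.
records = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 4, "email": "c@example.com"},
]

def completeness(rows, field):
    """Share of rows in which the field is populated."""
    populated = sum(1 for row in rows if row.get(field) is not None)
    return populated / len(rows)

def uniqueness(rows, field):
    """Share of populated values that are distinct."""
    values = [row[field] for row in rows if row.get(field) is not None]
    return len(set(values)) / len(values)

print(completeness(records, "email"))      # 0.75 (3 of 4 populated)
print(uniqueness(records, "customer_id"))  # 0.75 (3 distinct of 4 values)
```

Whether 0.75 is acceptable depends on the data consumer's needs, which is why thresholds are left out of the sketch.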
Data-Centric Organization
• A data-centric organization values data as an asset and manages data through all phases of its lifecycle, including project development and ongoing operations. • 𝐎𝐛𝐬𝐭𝐚𝐜𝐥𝐞𝐬 𝐭𝐨 𝐞𝐬𝐭𝐚𝐛𝐥𝐢𝐬𝐡𝐢𝐧𝐠 𝐚𝐧 𝐞𝐧𝐭𝐞𝐫𝐩𝐫𝐢𝐬𝐞 𝐯𝐢𝐬𝐢𝐨𝐧 𝐨𝐟 𝐃𝐚𝐭𝐚 𝐆𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞 𝐚𝐧𝐝 𝐃𝐚𝐭𝐚 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭: ▪ Existing culture. ▪ Ambiguity with ownership. ▪ Budget competition. ▪ Legacy systems.
Data Profiling
• A form of data analysis used to inspect data and assess quality. • 𝐔𝐬𝐞𝐬 𝐬𝐭𝐚𝐭𝐢𝐬𝐭𝐢𝐜𝐚𝐥 𝐭𝐞𝐜𝐡𝐧𝐢𝐪𝐮𝐞𝐬 𝐭𝐨 𝐝𝐢𝐬𝐜𝐨𝐯𝐞𝐫 𝐭𝐡𝐞 𝐭𝐫𝐮𝐞 𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞, 𝐜𝐨𝐧𝐭𝐞𝐧𝐭, 𝐚𝐧𝐝 𝐪𝐮𝐚𝐥𝐢𝐭𝐲 𝐨𝐟 𝐚 𝐝𝐚𝐭𝐚 𝐬𝐞𝐭: ▪ Count of nulls ▪ Max/min values ▪ Max/min length ▪ Frequency distribution ▪ Data type and format
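The profiling statistics listed above can be sketched for a single column using only the Python standard library. The function name `profile_column` and the sample status values are assumptions made for illustration; real profiling tools report far more.

```python
from collections import Counter

def profile_column(values):
    """Return basic profiling statistics for one column of values."""
    present = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in present]
    return {
        "null_count": len(values) - len(present),
        "min_value": min(present) if present else None,
        "max_value": max(present) if present else None,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "frequency": Counter(present),                  # frequency distribution
        "types": {type(v).__name__ for v in present},   # observed data types
    }

# Hypothetical column with a null and an inconsistently cased value.
status_column = ["active", "inactive", None, "active", "ACTIVE"]
profile = profile_column(status_column)
print(profile["null_count"], profile["frequency"].most_common(1))
```

In this sample the frequency distribution exposes "ACTIVE" as a separate value from "active", the kind of format issue profiling is meant to surface.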
DG Team Activities: Develop Business Glossary
• 𝐃𝐚𝐭𝐚 𝐒𝐭𝐞𝐰𝐚𝐫𝐝𝐬 𝐚𝐫𝐞 𝐠𝐞𝐧𝐞𝐫𝐚𝐥𝐥𝐲 𝐫𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐥𝐞 𝐟𝐨𝐫 𝐛𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐠𝐥𝐨𝐬𝐬𝐚𝐫𝐲 𝐜𝐨𝐧𝐭𝐞𝐧𝐭. 𝐀 𝐠𝐥𝐨𝐬𝐬𝐚𝐫𝐲 𝐢𝐬 𝐧𝐞𝐜𝐞𝐬𝐬𝐚𝐫𝐲 𝐛𝐞𝐜𝐚𝐮𝐬𝐞 𝐩𝐞𝐨𝐩𝐥𝐞 𝐮𝐬𝐞 𝐰𝐨𝐫𝐝𝐬 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭𝐥𝐲. ▪ Developing a business glossary is part of Metadata Management. • 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐆𝐥𝐨𝐬𝐬𝐚𝐫𝐢𝐞𝐬 𝐇𝐚𝐯𝐞 𝐭𝐡𝐞 𝐅𝐨𝐥𝐥𝐨𝐰𝐢𝐧𝐠 𝐎𝐛𝐣𝐞𝐜𝐭𝐢𝐯𝐞𝐬: ▪ Enable common understanding of the core business concepts and terminology. ▪ Reduce the risk that data will be misused. ▪ Improve the alignment between technology and business. ▪ Make it easier for employees to access data.
Data Enhancement
• The process of adding attributes to a data set to increase its quality and usability. • 𝐄𝐱𝐚𝐦𝐩𝐥𝐞𝐬 𝐨𝐟 𝐝𝐚𝐭𝐚 𝐞𝐧𝐡𝐚𝐧𝐜𝐞𝐦𝐞𝐧𝐭𝐬 𝐚𝐭 𝐜𝐨𝐦𝐩𝐚𝐧𝐢𝐞𝐬: ▪ Time/date stamps ▪ Audit data ▪ Reference vocabularies ▪ Contextual information ▪ Geographic information ▪ Demographic information ▪ Psychographic information ▪ Valuation information
Data Governance Metrics
• To minimize resistance and expedite adoption, a DG program must measure its success and demonstrate progress. • 𝐒𝐚𝐦𝐩𝐥𝐞 𝐌𝐞𝐭𝐫𝐢𝐜𝐬: ▪ 𝐕𝐚𝐥𝐮𝐞: Contribution to business objectives, reduction of risk, improved efficiency in operations. ▪ 𝐄𝐟𝐟𝐞𝐜𝐭𝐢𝐯𝐞𝐧𝐞𝐬𝐬: Effectiveness of communication and training, steward productivity, achievement of goals. ▪ 𝐒𝐮𝐬𝐭𝐚𝐢𝐧𝐚𝐛𝐢𝐥𝐢𝐭𝐲: Performance of policies, conformance to standards.
Data Quality
• 𝐄𝐧𝐬𝐮𝐫𝐢𝐧𝐠 𝐭𝐡𝐚𝐭 𝐝𝐚𝐭𝐚 𝐢𝐬 𝐨𝐟 𝐡𝐢𝐠𝐡 𝐪𝐮𝐚𝐥𝐢𝐭𝐲 𝐢𝐬 𝐜𝐞𝐧𝐭𝐫𝐚𝐥 𝐭𝐨 𝐝𝐚𝐭𝐚 𝐦𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭. ▪ If businesses cannot rely on data to meet business needs, then the effort to collect, store, secure, and enable access to it is wasted. • 𝐏𝐨𝐨𝐫 𝐪𝐮𝐚𝐥𝐢𝐭𝐲 𝐝𝐚𝐭𝐚 𝐢𝐬 𝐜𝐨𝐬𝐭𝐥𝐲 𝐭𝐨 𝐚𝐧 𝐨𝐫𝐠𝐚𝐧𝐢𝐳𝐚𝐭𝐢𝐨𝐧: ▪ Organizations spend between 10% and 30% of revenue handling data quality issues. ▪ IBM estimated the cost of poor-quality data in the US in 2016 to be $3 trillion.
Generic Data Governance Model
• 𝐃𝐚𝐭𝐚 𝐆𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞 𝐒𝐭𝐞𝐞𝐫𝐢𝐧𝐠 𝐂𝐨𝐦𝐦𝐢𝐭𝐭𝐞𝐞: ▪
Data Governance
• 𝐃𝐚𝐭𝐚 𝐆𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞 𝐢𝐬 𝐝𝐞𝐟𝐢𝐧𝐞𝐝 𝐚𝐬 𝐭𝐡𝐞 𝐞𝐱𝐞𝐫𝐜𝐢𝐬𝐞 𝐨𝐟 𝐚𝐮𝐭𝐡𝐨𝐫𝐢𝐭𝐲 𝐚𝐧𝐝 𝐜𝐨𝐧𝐭𝐫𝐨𝐥 𝐨𝐯𝐞𝐫 𝐭𝐡𝐞 𝐦𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐨𝐟 𝐝𝐚𝐭𝐚 𝐚𝐬𝐬𝐞𝐭𝐬. ▪ A management responsibility that ensures the most important information assets are available, have integrity, are trustworthy, are secure, are used appropriately, and enable actionable insights. • Data Governance guides all other data management functions.
Data Mapping
• 𝐃𝐚𝐭𝐚 𝐌𝐚𝐩𝐩𝐢𝐧𝐠: ▪ The process of establishing traceability between data pieces. - E.g., the business term is Customer Status; Customer Status is stored in database X in a column called "cust_st"; Customer Status is also stored in database Y in a column called "customer_status." • Understanding how business terms map to their possible database locations is critical.
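The Customer Status example can be sketched as a small lookup structure that traces a business term to its physical locations and back. The dictionary layout and the helper names `locations_for` and `term_for` are hypothetical; a real metadata catalog would hold this mapping.

```python
# Hypothetical mapping: business term -> list of (database, column) locations.
data_map = {
    "Customer Status": [
        ("database X", "cust_st"),
        ("database Y", "customer_status"),
    ],
}

def locations_for(term):
    """Where is this business term physically stored?"""
    return data_map.get(term, [])

def term_for(database, column):
    """Which business term does this physical column represent?"""
    for term, locations in data_map.items():
        if (database, column) in locations:
            return term
    return None

print(locations_for("Customer Status"))
print(term_for("database X", "cust_st"))  # Customer Status
```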
DG Team Activities: Assess Regulatory Compliance Requirements
• 𝐄𝐯𝐞𝐫𝐲 𝐞𝐧𝐭𝐞𝐫𝐩𝐫𝐢𝐬𝐞 𝐢𝐬 𝐚𝐟𝐟𝐞𝐜𝐭𝐞𝐝 𝐛𝐲 𝐠𝐨𝐯𝐞𝐫𝐧𝐦𝐞𝐧𝐭𝐚𝐥 𝐚𝐧𝐝 𝐢𝐧𝐝𝐮𝐬𝐭𝐫𝐲 𝐫𝐞𝐠𝐮𝐥𝐚𝐭𝐢𝐨𝐧𝐬, 𝐢𝐧𝐜𝐥𝐮𝐝𝐢𝐧𝐠 𝐫𝐞𝐠𝐮𝐥𝐚𝐭𝐢𝐨𝐧𝐬 𝐭𝐡𝐚𝐭 𝐝𝐢𝐜𝐭𝐚𝐭𝐞 𝐡𝐨𝐰 𝐝𝐚𝐭𝐚 𝐚𝐧𝐝 𝐢𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧 𝐚𝐫𝐞 𝐭𝐨 𝐛𝐞 𝐦𝐚𝐧𝐚𝐠𝐞𝐝. ▪ Part of the DG function is to monitor and ensure regulatory compliance. • 𝐏𝐨𝐭𝐞𝐧𝐭𝐢𝐚𝐥 𝐪𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬 𝐟𝐨𝐫 𝐭𝐡𝐞 𝐃𝐆 𝐭𝐞𝐚𝐦: ▪ How is the regulation relevant? ▪ What constitutes compliance? ▪ How do we demonstrate compliance? ▪ What are the systems and data in scope for the regulation? ▪ What is the risk of and penalty for non-compliance?
Metadata Implementation Guidelines
• 𝐆𝐮𝐢𝐝𝐞𝐥𝐢𝐧𝐞𝐬: ▪ Follow an incremental, prioritized approach to minimize risk and business disruption. ▪ Metadata readiness assessment. ▪ Organizational and cultural change ▪ Common Metadata gathering and managing ▪ Utilize metrics.
Metadata for Big Data Ingestion
• 𝐌𝐚𝐧𝐲 𝐃𝐚𝐭𝐚 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐩𝐫𝐨𝐟𝐞𝐬𝐬𝐢𝐨𝐧𝐚𝐥𝐬 𝐚𝐫𝐞 𝐟𝐚𝐦𝐢𝐥𝐢𝐚𝐫 𝐚𝐧𝐝 𝐜𝐨𝐦𝐟𝐨𝐫𝐭𝐚𝐛𝐥𝐞 𝐰𝐢𝐭𝐡 𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝 𝐝𝐚𝐭𝐚 𝐬𝐭𝐨𝐫𝐞𝐬, 𝐰𝐡𝐞𝐫𝐞 𝐞𝐯𝐞𝐫𝐲 𝐢𝐭𝐞𝐦 𝐜𝐚𝐧 𝐛𝐞 𝐜𝐥𝐞𝐚𝐫𝐥𝐲 𝐢𝐝𝐞𝐧𝐭𝐢𝐟𝐢𝐞𝐝 𝐚𝐧𝐝 𝐭𝐚𝐠𝐠𝐞𝐝. ▪ Tagging data is a common practice to add meaning to data being ingested in the data lake. • 𝐃𝐚𝐭𝐚 𝐋𝐚𝐤𝐞𝐬: ▪ An emerging technology primarily used by data scientists. - Billions of rows worth of data. - Without Metadata to make sense, a data lake essentially becomes a data swamp.
What does Metadata Management Focus On?
• 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐅𝐨𝐜𝐮𝐬𝐞𝐬 𝐎𝐧: ▪ Documenting and managing organizational knowledge of data related business terminology in order to ensure people understand data content and can use data consistently. ▪ Collect and integrate Metadata from diverse sources to ensure people understand similarities and differences between data from different parts of the company. ▪ Provide standard ways to make Metadata accessible to consumers.
Metadata
• 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚 𝐢𝐧𝐜𝐥𝐮𝐝𝐞𝐬 𝐢𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧 𝐚𝐛𝐨𝐮𝐭 𝐭𝐞𝐜𝐡𝐧𝐢𝐜𝐚𝐥 𝐚𝐧𝐝 𝐛𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐩𝐫𝐨𝐜𝐞𝐬𝐬𝐞𝐬, 𝐝𝐚𝐭𝐚 𝐫𝐮𝐥𝐞𝐬 𝐚𝐧𝐝 𝐜𝐨𝐧𝐬𝐭𝐫𝐚𝐢𝐧𝐭𝐬, 𝐚𝐧𝐝 𝐥𝐨𝐠𝐢𝐜𝐚𝐥 𝐚𝐧𝐝 𝐩𝐡𝐲𝐬𝐢𝐜𝐚𝐥 𝐝𝐚𝐭𝐚 𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐬. ▪ It is information about data's structure and meaning. • Without reliable Metadata, companies do not know what data they have, what it represents, where it originates from, who should access it, or its quality. • 𝐓𝐨 𝐛𝐞 𝐝𝐚𝐭𝐚 𝐝𝐫𝐢𝐯𝐞𝐧, 𝐚 𝐜𝐨𝐦𝐩𝐚𝐧𝐲 𝐦𝐮𝐬𝐭 𝐚𝐥𝐬𝐨 𝐛𝐞 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚 𝐝𝐫𝐢𝐯𝐞𝐧. ▪ An organization without Metadata is like a library without a card catalog.
Implementation Guidelines
• 𝐌𝐨𝐬𝐭 𝐃𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐏𝐫𝐨𝐠𝐫𝐚𝐦 𝐈𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧𝐬 𝐍𝐞𝐞𝐝 𝐭𝐨 𝐏𝐥𝐚𝐧 𝐟𝐨𝐫: ▪ Metrics on the value of data and cost of poor-quality data. ▪ Operating model for IT/business interactions. ▪ Changes in how projects are executed. ▪ Changes to business processes. ▪ Funding for remediation and improvement projects. ▪ Funding for Data Quality operations.
Most Common Business Drivers of Data Governance
• 𝐑𝐞𝐠𝐮𝐥𝐚𝐭𝐨𝐫𝐲 𝐜𝐨𝐦𝐩𝐥𝐢𝐚𝐧𝐜𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐨𝐬𝐭 𝐜𝐨𝐦𝐦𝐨𝐧 𝐝𝐫𝐢𝐯𝐞𝐫 𝐟𝐨𝐫 𝐜𝐨𝐦𝐩𝐚𝐧𝐢𝐞𝐬 𝐢𝐧 𝐭𝐫𝐚𝐝𝐢𝐭𝐢𝐨𝐧𝐚𝐥𝐥𝐲 𝐫𝐞𝐠𝐮𝐥𝐚𝐭𝐞𝐝 𝐟𝐢𝐞𝐥𝐝𝐬 𝐥𝐢𝐤𝐞 𝐡𝐞𝐚𝐥𝐭𝐡𝐜𝐚𝐫𝐞 𝐚𝐧𝐝 𝐟𝐢𝐧𝐚𝐧𝐜𝐢𝐚𝐥 𝐬𝐞𝐫𝐯𝐢𝐜𝐞𝐬. ▪ Efficiently satisfying regulatory requirements necessitates a form of governance. • Data science and big data have created an additional need for data governance.
Business Drivers of Metadata
• 𝐑𝐞𝐥𝐢𝐚𝐛𝐥𝐞 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚 𝐇𝐞𝐥𝐩𝐬: ▪ Increase confidence in data by providing context and enabling the measurement of quality. ▪ Increase value of strategic information by enabling multiple uses. ▪ Improve operational efficiency by identifying redundant data and processes. ▪ Prevent use of out of date or incorrect data. ▪ Reduce data search time. ▪ Improve communication between IT employees and data consumers. ▪ Create accurate impact analysis to reduce risk of project failure. ▪ Improve time to market by reducing systems development lifecycle. ▪ Support regulatory compliance.
What Goals do Successful Data Quality Programs Focus On?
• 𝐒𝐮𝐜𝐜𝐞𝐬𝐬𝐟𝐮𝐥 𝐃𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐩𝐫𝐨𝐠𝐫𝐚𝐦𝐬 𝐟𝐨𝐜𝐮𝐬 𝐨𝐧 𝐭𝐡𝐞 𝐛𝐞𝐥𝐨𝐰 𝐠𝐨𝐚𝐥𝐬: ▪ Developing a governed approach to make data fit for purpose based on data consumers' needs. ▪ Defining standards and specifications for Data Quality controls as part of the data lifecycle. ▪ Defining and implementing processes to measure, monitor, and report on Data Quality levels. ▪ Identifying and advocating for opportunities to improve the quality of data.
What Principles do Successful Data Quality Programs Focus On?
• 𝐒𝐮𝐜𝐜𝐞𝐬𝐬𝐟𝐮𝐥 𝐃𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐩𝐫𝐨𝐠𝐫𝐚𝐦𝐬 𝐟𝐨𝐜𝐮𝐬 𝐨𝐧 𝐭𝐡𝐞 𝐛𝐞𝐥𝐨𝐰 𝐩𝐫𝐢𝐧𝐜𝐢𝐩𝐥𝐞𝐬: ▪ 𝐂𝐫𝐢𝐭𝐢𝐜𝐚𝐥𝐢𝐭𝐲: - A DQ program should focus on the data most critical to the enterprise and its customers. ▪ 𝐋𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭: - The quality of data should be managed across the data lifecycle, from creation or procurement through disposal. This includes managing data as it moves within and between systems. ▪ 𝐏𝐫𝐞𝐯𝐞𝐧𝐭𝐢𝐨𝐧: - The focus of a DQ program should be on preventing data errors and conditions that reduce the usability of data. ▪ 𝐑𝐨𝐨𝐭-𝐂𝐚𝐮𝐬𝐞 𝐑𝐞𝐦𝐞𝐝𝐢𝐚𝐭𝐢𝐨𝐧: - Problems with the quality of data should be understood and addressed at their root causes, rather than just their symptoms. ▪ 𝐆𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞: - DG activities must support the development of high-quality data and Data Quality program activities must support and sustain a governed data environment. ▪ 𝐒𝐭𝐚𝐧𝐝𝐚𝐫𝐝𝐬-𝐃𝐫𝐢𝐯𝐞𝐧: - All stakeholders in the data lifecycle have Data Quality requirements. ▪ 𝐎𝐛𝐣𝐞𝐜𝐭𝐢𝐯𝐞 𝐌𝐞𝐚𝐬𝐮𝐫𝐞𝐦𝐞𝐧𝐭 𝐚𝐧𝐝 𝐓𝐫𝐚𝐧𝐬𝐩𝐚𝐫𝐞𝐧𝐜𝐲: - Data Quality levels need to be measured objectively and consistently. ▪ 𝐄𝐦𝐛𝐞𝐝𝐝𝐞𝐝 𝐢𝐧 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐏𝐫𝐨𝐜𝐞𝐬𝐬𝐞𝐬: - Business process owners are responsible for the quality of data produced through their processes. They must enforce Data Quality standards in the process. ▪ 𝐒𝐲𝐬𝐭𝐞𝐦𝐚𝐭𝐢𝐜𝐚𝐥𝐥𝐲 𝐄𝐧𝐟𝐨𝐫𝐜𝐞𝐝: - System owners must systematically enforce Data Quality requirements. ▪ 𝐂𝐨𝐧𝐧𝐞𝐜𝐭𝐞𝐝 𝐭𝐨 𝐒𝐞𝐫𝐯𝐢𝐜𝐞 𝐋𝐞𝐯𝐞𝐥𝐬: - DQ reporting and issues management should be incorporated into Service Level Agreements (SLA).
Technical Focus of Lineage
• 𝐓𝐞𝐜𝐡𝐧𝐢𝐜𝐚𝐥 𝐅𝐨𝐜𝐮𝐬 𝐨𝐟 𝐋𝐢𝐧𝐞𝐚𝐠𝐞: ▪ Builds on business lineage to try to capture all systems which share the data elements. • What happens if Customer Status needs an additional value like "expired"?
Business Drivers of Good Quality Data
• 𝐓𝐡𝐞 𝐛𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐝𝐫𝐢𝐯𝐞𝐫𝐬 𝐟𝐨𝐫 𝐞𝐬𝐭𝐚𝐛𝐥𝐢𝐬𝐡𝐢𝐧𝐠 𝐚 𝐃𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐩𝐫𝐨𝐠𝐫𝐚𝐦 𝐢𝐧𝐜𝐥𝐮𝐝𝐞: ▪ Increasing the value of the organizational data and the opportunities to use it. ▪ Reducing risks and costs associated with poor Data Quality. ▪ Improving organizational efficiency and productivity. ▪ Protecting and enhancing the organization's reputation. • Poor Data Quality often results in fines, lost revenue, lost customers, and negative media exposure.
Sources of Metadata
• 𝐓𝐡𝐞 𝐦𝐚𝐣𝐨𝐫𝐢𝐭𝐲 𝐨𝐟 𝐦𝐞𝐭𝐚𝐝𝐚𝐭𝐚 𝐢𝐬 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞𝐝 𝐚𝐬 𝐝𝐚𝐭𝐚 𝐢𝐬 𝐩𝐫𝐨𝐜𝐞𝐬𝐬𝐞𝐝. ▪ The key to using the Metadata is to collect it in a usable form. • Business Metadata requires stewards with good writing and facilitation skills to develop enterprise definitions of data in a business glossary.
Goals of Data Governance
• 𝐓𝐨 𝐞𝐧𝐚𝐛𝐥𝐞 𝐭𝐡𝐞 𝐜𝐨𝐦𝐩𝐚𝐧𝐲 𝐭𝐨 𝐦𝐚𝐧𝐚𝐠𝐞 𝐝𝐚𝐭𝐚 𝐚𝐬 𝐚𝐧 𝐚𝐬𝐬𝐞𝐭, 𝐭𝐡𝐞 𝐝𝐚𝐭𝐚 𝐠𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞 𝐩𝐫𝐨𝐠𝐫𝐚𝐦 𝐦𝐮𝐬𝐭 𝐛𝐞: ▪ Sustainable. ▪ Embedded. ▪ Measured.
Data as an Organizational Asset
𝐀𝐬𝐬𝐞𝐭: • 𝐀𝐧 𝐚𝐬𝐬𝐞𝐭 𝐢𝐬 𝐚𝐧 𝐞𝐜𝐨𝐧𝐨𝐦𝐢𝐜 𝐫𝐞𝐬𝐨𝐮𝐫𝐜𝐞 𝐭𝐡𝐚𝐭 𝐩𝐫𝐨𝐝𝐮𝐜𝐞𝐬 𝐯𝐚𝐥𝐮𝐞. 𝐖𝐡𝐢𝐥𝐞 𝐰𝐞 𝐜𝐚𝐧 𝐦𝐨𝐧𝐞𝐭𝐢𝐳𝐞 𝐟𝐢𝐧𝐚𝐧𝐜𝐢𝐚𝐥 𝐚𝐬𝐬𝐞𝐭𝐬, 𝐦𝐨𝐧𝐞𝐭𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐝𝐚𝐭𝐚 𝐢𝐬 𝐬𝐭𝐢𝐥𝐥 𝐞𝐯𝐨𝐥𝐯𝐢𝐧𝐠. ▪ Profit & Loss statements could soon feature data as an asset. • As organizations increasingly depend on data, the value of data will be better established. Data driven organizations rely on data, not instincts to make better decisions.
Benefits of Good Data Quality
𝐁𝐞𝐧𝐞𝐟𝐢𝐭𝐬: • Improved customer experience • Higher productivity • Reduced risk • Increased revenue • Competitive advantage gained from insights on customers, products, etc.
Data Management
𝐃𝐚𝐭𝐚 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭: • Development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their lifecycles. • Companies now recognize their data as a strategic asset where they can learn insights about customers, products, internal processes, etc. • 𝐃𝐚𝐭𝐚 𝐦𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐚𝐜𝐭𝐢𝐯𝐢𝐭𝐢𝐞𝐬 𝐚𝐫𝐞 𝐛𝐫𝐨𝐚𝐝: ▪ Profiling data, aligning data to regulations, extracting data, defining data.
Data Modeling
𝐃𝐚𝐭𝐚 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠: • The basic building blocks of all data models are entities, attributes, and relationships. • Entity - person, place, thing, or event for which data is collected and stored. • Attribute - characteristic of an entity. • Relationships - describe the association among entities: ▪ 𝐎𝐧𝐞-𝐭𝐨-𝐦𝐚𝐧𝐲 (𝟏:𝐌): - A painter paints many different paintings, but each one is painted by only one painter. ▪ 𝐌𝐚𝐧𝐲-𝐭𝐨-𝐦𝐚𝐧𝐲 (𝐌:𝐌): - An employee learns many job skills, and each job skill might be learned by many employees. ▪ 𝐎𝐧𝐞-𝐭𝐨-𝐨𝐧𝐞 (𝟏:𝟏): - A Verizon manager oversees one store.
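The three relationship types can be sketched with plain Python structures. All names and values below are hypothetical, and the shapes chosen (a dictionary for 1:M and 1:1, a set of pairs for M:M) are just one convenient way to make the cardinalities visible.

```python
# 1:M - each painting maps to exactly one painter; a painter can appear many times.
painting_to_painter = {"Sunrise": "Ana", "Harbor": "Ana", "Dunes": "Ben"}

# M:M - a set of (employee, skill) pairs; either side can repeat.
employee_skills = {("Cy", "welding"), ("Cy", "forklift"), ("Di", "welding")}

# 1:1 - each manager oversees one store and each store has one manager.
manager_to_store = {"Eve": "Store 12", "Flo": "Store 30"}

def paintings_by(painter):
    """All paintings for one painter (the 'many' side of 1:M)."""
    return sorted(t for t, p in painting_to_painter.items() if p == painter)

def employees_with(skill):
    """All employees who learned a skill (one direction of M:M)."""
    return sorted(e for e, s in employee_skills if s == skill)

def is_one_to_one(mapping):
    """True when no value is shared by two keys."""
    return len(set(mapping.values())) == len(mapping)

print(paintings_by("Ana"))              # ['Harbor', 'Sunrise']
print(employees_with("welding"))        # ['Cy', 'Di']
print(is_one_to_one(manager_to_store))  # True
```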
Data Warehouse
𝐃𝐚𝐭𝐚 𝐖𝐚𝐫𝐞𝐡𝐨𝐮𝐬𝐞: • A home for high value data or data assets that originate in other corporate applications. • 𝐃𝐚𝐭𝐚 𝐖𝐚𝐫𝐞𝐡𝐨𝐮𝐬𝐢𝐧𝐠: ▪ A coordinated and periodic copying of data from various sources, both inside and outside the company, into an environment optimized for analytical and informational processing. • Stores current and historical data. • Provides analysis and reporting tools. • Relies exclusively on data obtained from other applications.
Difference Between Data and Information
𝐃𝐚𝐭𝐚: • Raw facts with no context. ▪ E.g., Name: John Smith Age: 20 DOB: 3/13/2000 Major: Finance Salary: $20,000 𝐈𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧: • Processed data arranged in a meaningful way to facilitate decision making. ▪ E.g., "At age 20, John was earning a $20,000 salary while studying finance." 𝐊𝐞𝐲 𝐏𝐨𝐢𝐧𝐭𝐬: • Data constitutes the building blocks of information. • Information is produced by processing data. • Information is used to reveal the meaning of data. • Accurate and timely information is essential to good decision-making.
What is a Database?
𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞: • A database is a shared, integrated software structure that houses a collection of: ▪ Data (raw facts of interest) ▪ Metadata (Data about data or information about data's structure and meaning). 𝐃𝐁𝐌𝐒: • Database Management System (DBMS) is a collection of programs that manage the database and control its access, making it possible to share the data with applications and users.
Goals of Data Management
𝐆𝐨𝐚𝐥𝐬: • Understanding and supporting the information needs of the enterprise and stakeholders. • Capturing, storing, protecting, and ensuring the integrity of data. • Ensuring quality of data and information. • Ensuring privacy and confidentiality of data. • Preventing unauthorized access, manipulation, or use of data and information. • Ensuring data can be used effectively to add value to the organization.
Principles of Data Management
𝐏𝐫𝐢𝐧𝐜𝐢𝐩𝐥𝐞𝐬: 1. Data is an asset with unique properties. 2. The value of data should be expressed in economic terms. 3. Managing data means managing the quality of data. 4. It takes metadata to manage data. ▪ Managing any asset requires having data about that asset (number of employees, accounting codes, etc.) 5. It takes planning to manage data. 6. Data management is cross-functional; it requires a range of skills and expertise. ▪ Data management requires both technical and non-technical skills and the ability to collaborate. 7. Data management requires an enterprise perspective. 8. Data management is lifecycle management. 9. Managing data includes managing risks associated with data. ▪ In addition to being an asset, data also represents risk to an organization. Data can be lost, stolen, or misused. 10. Data management requirements must drive IT solutions.
Data Management is Cross-Functional
𝐓𝐡𝐞 𝐂𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞: • Data is managed in different places within an organization by teams that have responsibility for different phases of the data lifecycle. 𝐃𝐚𝐭𝐚 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐒𝐤𝐢𝐥𝐥𝐬: • Design skills to plan for systems. • Technical skills to administer hardware and build software. • Data analysis skills to understand and interpret data. • Strategic thinking to see opportunities to serve customers. • The challenge for managers is getting employees with the above range of skills to recognize how the pieces fit together and collaborate to achieve common goals.
DG Team Activities: Develop DG Strategy
• A good strategy defines scope and approach of DG efforts. • 𝐒𝐭𝐫𝐚𝐭𝐞𝐠𝐲 𝐜𝐨𝐧𝐭𝐞𝐧𝐭𝐬 𝐝𝐞𝐩𝐞𝐧𝐝 𝐨𝐧 𝐭𝐡𝐞 𝐧𝐞𝐞𝐝𝐬 𝐨𝐟 𝐭𝐡𝐞 𝐨𝐫𝐠𝐚𝐧𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐦𝐢𝐠𝐡𝐭 𝐢𝐧𝐜𝐥𝐮𝐝𝐞: ▪ 𝐂𝐡𝐚𝐫𝐭𝐞𝐫: - Identifies the business drivers, vision, mission, and principles for DG, including readiness assessment, internal process discovery, and current issues or success criteria. ▪ 𝐎𝐩𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤 𝐚𝐧𝐝 𝐀𝐜𝐜𝐨𝐮𝐧𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: - Defines structure and responsibility for DG activities. ▪ 𝐈𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐑𝐨𝐚𝐝𝐦𝐚𝐩: - Timeframes for the rollout of policies and directives, business glossary, architecture, asset valuation, standards and procedures, expected changes to business and technology processes, and deliverables to support auditing activities and regulatory compliance. ▪ 𝐏𝐥𝐚𝐧 𝐟𝐨𝐫 𝐎𝐩𝐞𝐫𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐒𝐮𝐜𝐜𝐞𝐬𝐬: - Describes a target state of sustainable DG activities.
Data Cleansing
• Also known as "scrubbing," transforms data to make it conform to data standards and rules. • Includes detecting and correcting data errors to bring quality of data to an acceptable level. • Cleansing should decrease over time as root causes are discovered and resolved. • 𝐓𝐡𝐞 𝐧𝐞𝐞𝐝 𝐟𝐨𝐫 𝐜𝐥𝐞𝐚𝐧𝐬𝐢𝐧𝐠 𝐜𝐚𝐧 𝐛𝐞 𝐚𝐝𝐝𝐫𝐞𝐬𝐬𝐞𝐝 𝐛𝐲: ▪ Implementing controls to prevent data entry errors. ▪ Correcting data in the source system. ▪ Improving the business processes that create the data.
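A cleansing step that makes values conform to a standard can be sketched as follows. The standard itself (lowercase, trimmed, one of two allowed statuses), the synonym table, and the function name `cleanse_status` are assumptions made for illustration.

```python
# Assumed data standard: status is lowercase, trimmed, and one of ALLOWED.
ALLOWED = {"active", "inactive"}
SYNONYMS = {"a": "active", "i": "inactive"}  # hypothetical legacy codes

def cleanse_status(raw):
    """Return the standardized status, or None if it cannot be corrected."""
    if raw is None:
        return None
    value = raw.strip().lower()
    value = SYNONYMS.get(value, value)
    # Values outside the standard are returned as None so they can be
    # routed to a steward instead of being silently guessed.
    return value if value in ALLOWED else None

print(cleanse_status("  ACTIVE "))  # active
print(cleanse_status("I"))          # inactive
print(cleanse_status("unknown"))    # None
```

A fix like this treats the symptom; the need for it should shrink as entry controls and source systems are corrected.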
Business Intelligence, Big Data, and Metadata
• Analysts spend the majority of their time searching for data. • In order to build accurate prediction models, accurate data is required. • The first step in analyzing data is easily obtaining reliable data that is permitted for use.
Data Quality Reporting
• Assessing and resolving Data Quality issues will not benefit the company unless there is an effective sharing method. • 𝐑𝐞𝐩𝐨𝐫𝐭𝐢𝐧𝐠 𝐒𝐡𝐨𝐮𝐥𝐝 𝐅𝐨𝐜𝐮𝐬 𝐎𝐧: ▪ Data Quality scorecard. ▪ Data Quality trends. ▪ SLA metrics. ▪ Data Quality issue management. ▪ Positive effects of improvement projects.
Data Management & Technology
• Because almost all of today's data is stored electronically, data management tactics are strongly influenced by technology. • Data requirements aligned with business strategy should drive decisions about technology. • Surveys consistently show that a
Data Quality Business Rule Types
• Business rules describe how companies should operate internally to be successful and compliant. • Data Quality business rules describe how data should exist in order to be useful and usable. • Business rules can be aligned with dimensions of Data Quality to describe Data Quality needs. • 𝐂𝐨𝐦𝐦𝐨𝐧 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐑𝐮𝐥𝐞 𝐓𝐲𝐩𝐞𝐬: ▪ Definitional conformance. ▪ Value presence and record completeness. ▪ Format compliance. ▪ Value domain membership. ▪ Range conformance. ▪ Mapping conformance. ▪ Consistency rules. ▪ Accuracy verification. ▪ Uniqueness verification. ▪ Timeliness verification.
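Four of the rule types above can be sketched as executable checks on a single record. The field names, the five-digit format, the allowed status values, and the 0 to 120 age range are hypothetical choices for illustration, not rules taken from any standard.

```python
import re

# Each rule maps a rule-type name to a check that returns True when the record passes.
RULES = {
    "value presence": lambda r: r.get("customer_id") is not None,
    "format compliance": lambda r: re.fullmatch(r"\d{5}", r.get("zip") or "") is not None,
    "value domain membership": lambda r: r.get("status") in {"active", "inactive"},
    "range conformance": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}

def failed_rules(record):
    """Names of the rules that the record violates, in rule order."""
    return [name for name, check in RULES.items() if not check(record)]

good = {"customer_id": 1, "zip": "12345", "status": "active", "age": 20}
bad = {"customer_id": None, "zip": "1234", "status": "expired", "age": 200}

print(failed_rules(good))  # []
print(failed_rules(bad))   # all four rule names
```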
The Data Lifecycle
• Customers should be made aware of how their data is managed. 𝐓𝐡𝐞 𝐋𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞: • 𝐓𝐡𝐞 𝐟𝐨𝐜𝐮𝐬 𝐨𝐟 𝐝𝐚𝐭𝐚 𝐦𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐨𝐧 𝐭𝐡𝐞 𝐝𝐚𝐭𝐚 𝐥𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞 𝐡𝐚𝐬 𝐬𝐞𝐯𝐞𝐫𝐚𝐥 𝐢𝐦𝐩𝐨𝐫𝐭𝐚𝐧𝐭 𝐢𝐦𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬: ▪ 𝐂𝐨𝐦𝐩𝐚𝐧𝐢𝐞𝐬 𝐦𝐮𝐬𝐭 𝐡𝐚𝐯𝐞 𝐚 𝐯𝐢𝐬𝐢𝐨𝐧 𝐟𝐨𝐫 𝐡𝐨𝐰 𝐝𝐚𝐭𝐚 𝐰𝐢𝐥𝐥 𝐛𝐞 𝐮𝐬𝐞𝐝. ▪ 𝐂𝐫𝐞𝐚𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐮𝐬𝐚𝐠𝐞 𝐚𝐫𝐞 𝐭𝐡𝐞 𝐦𝐨𝐬𝐭 𝐜𝐫𝐢𝐭𝐢𝐜𝐚𝐥 𝐩𝐨𝐢𝐧𝐭𝐬 𝐢𝐧 𝐭𝐡𝐞 𝐝𝐚𝐭𝐚 𝐥𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞. - It costs money to produce data. Data is valuable only when it is consumed or applied. ▪ 𝐃𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐦𝐮𝐬𝐭 𝐛𝐞 𝐦𝐚𝐧𝐚𝐠𝐞𝐝 𝐭𝐡𝐫𝐨𝐮𝐠𝐡𝐨𝐮𝐭 𝐭𝐡𝐞 𝐝𝐚𝐭𝐚 𝐥𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞: - Low quality data represents cost and risk, rather than value. - Organizations often find it challenging to manage the quality of data because data is often created as a by-product of operational processes and organizations often do not set standards for quality. ▪ 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐦𝐮𝐬𝐭 𝐛𝐞 𝐦𝐚𝐧𝐚𝐠𝐞𝐝 𝐭𝐡𝐫𝐨𝐮𝐠𝐡𝐨𝐮𝐭 𝐭𝐡𝐞 𝐝𝐚𝐭𝐚 𝐥𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞: - Because Metadata is a form of data, and because organizations rely on it to manage other data, Metadata quality must be managed in the same way as quality of other data. ▪ 𝐃𝐚𝐭𝐚 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲 𝐦𝐮𝐬𝐭 𝐛𝐞 𝐦𝐚𝐧𝐚𝐠𝐞𝐝 𝐭𝐡𝐫𝐨𝐮𝐠𝐡𝐨𝐮𝐭 𝐭𝐡𝐞 𝐝𝐚𝐭𝐚 𝐥𝐢𝐟𝐞𝐜𝐲𝐜𝐥𝐞: - Data must be protected throughout its lifecycle, from creation to disposal. ▪ 𝐃𝐚𝐭𝐚 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐞𝐟𝐟𝐨𝐫𝐭𝐬 𝐬𝐡𝐨𝐮𝐥𝐝 𝐟𝐨𝐜𝐮𝐬 𝐨𝐧 𝐭𝐡𝐞 𝐦𝐨𝐬𝐭 𝐜𝐫𝐢𝐭𝐢𝐜𝐚𝐥 𝐝𝐚𝐭𝐚: - Because there is so much data, organizations should focus on the most critical data and on minimizing data ROT (data that is Redundant, Obsolete, or Trivial).
DG Team Activities: Define Data Governance for the Company
• DG efforts must support business strategy and goals. • Successful DG requires a clear understanding of what is being governed and who is being governed, as well as who is governing. • DG activities cross departmental and technological boundaries for integrated view of data. • DG is successful when it's adopted enterprise wide instead of in siloed departments.
DG Team Activities: Underwrite Data Management Projects
• DG employees should be engaged in software development projects to help manage data requirements. • Every project with a significant data component should capture Data Management requirements early in the SDLC. • 𝐄𝐱𝐚𝐦𝐩𝐥𝐞: Building a New Reporting UI: ▪ Where should the software developers obtain the data? ▪ What if there are issues with the data? ▪ Should the data be visible to everyone? ▪ Can we export the data externally?
Scope and Focus of Data Governance
• Data Governance focuses on how decisions are made about data and how people and processes are expected to behave in relation to data. • The scope will depend on the company's needs, but the program will develop enterprise policies, data stewardship, and change management.
What Characteristics does Data Quality Refer to?
• Data Quality refers both to the characteristics associated with high quality data and to the processes used to measure or improve the quality of data.
Data Stewardship
• Data Stewardship is the accountability and responsibility for data and processes that ensure effective control and use of a particular data asset. • Data Stewards manage data assets on behalf of others and in the company's best interest.
Typical DBMS Capabilities
• Data dictionary • Query and reporting • Data presentation for different users • Backup and recovery • Data integrity management
DG Team Activities: Perform Discovery and Business Alignment
• Discovery identifies and assesses existing policies, the risks they address, the behaviors they encourage, and opportunities to improve the usefulness of data. • Data Quality (DQ) analysis is part of discovery; it provides insights into existing issues and risks associated with poor DQ.
DG Team Activities: Develop Organizational Touchpoints
• Establishing work relationships with other departments is vital to the success of the program: ▪ 𝐏𝐫𝐨𝐜𝐮𝐫𝐞𝐦𝐞𝐧𝐭 𝐚𝐧𝐝 𝐂𝐨𝐧𝐭𝐫𝐚𝐜𝐭𝐬: - The CDO works with Vendor/Partner Management or Procurement to develop and enforce standard contract language vis-a-vis data management contracts. ▪ 𝐁𝐮𝐝𝐠𝐞𝐭 𝐚𝐧𝐝 𝐅𝐮𝐧𝐝𝐢𝐧𝐠: - If the CDO is not directly in control of all data acquisition related budgets, then the office can be a focal point for preventing duplicate efforts and ensuring optimization of acquired data assets. ▪ 𝐑𝐞𝐠𝐮𝐥𝐚𝐭𝐨𝐫𝐲 𝐂𝐨𝐦𝐩𝐥𝐢𝐚𝐧𝐜𝐞: - The CDO understands and works within required local, national, and international regulatory environments, and understands how these impact the organization and its data management activities. ▪ 𝐒𝐃𝐋𝐂 𝐃𝐞𝐯𝐞𝐥𝐨𝐩𝐦𝐞𝐧𝐭/𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤: - The DG program identifies control points where enterprise policies, processes, and standards can be developed in the system or application development lifecycle.
Problems with Traditional File Structure
• Files maintained separately by different departments. • Data redundancy - unnecessarily duplicated data. • Inconsistent data ▪ E.g., John Smith's age vs his DOB • Lack of flexibility and data sharing. • Poor security.
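The age-versus-DOB inconsistency can be sketched as a check that recomputes the age from the date of birth and compares it with the stored value. The function names and the as-of dates are hypothetical; the DOB is the John Smith example used elsewhere in these notes.

```python
from datetime import date

def age_on(dob, as_of):
    """Age in whole years on the as_of date."""
    years = as_of.year - dob.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1
    return years

def age_is_consistent(stored_age, dob, as_of):
    """True when the stored age agrees with the age derived from the DOB."""
    return stored_age == age_on(dob, as_of)

dob = date(2000, 3, 13)
print(age_is_consistent(20, dob, date(2020, 6, 1)))  # True
print(age_is_consistent(20, dob, date(2021, 6, 1)))  # False: the stored age went stale
```

Storing only the DOB and deriving the age on demand avoids the redundancy that causes this kind of drift.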
DG Team Activities: Develop Goals, Principles, and Policies
• Goals, policies, and principles guide the DG program into the desired future state. • Data policies must be effectively communicated, monitored, enforced, and periodically re-evaluated by cross-functional teams, such as data management, data stewards, and business owners.
Defining High Quality Data
• High quality data is fit for the purpose of data consumers. • 𝐂𝐮𝐫𝐫𝐞𝐧𝐭 𝐬𝐭𝐚𝐭𝐞 𝐚𝐧𝐝 𝐨𝐫𝐠𝐚𝐧𝐢𝐳𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐫𝐞𝐚𝐝𝐢𝐧𝐞𝐬𝐬 𝐦𝐮𝐬𝐭 𝐛𝐞 𝐚𝐬𝐬𝐞𝐬𝐬𝐞𝐝 𝐛𝐲 𝐢𝐧𝐪𝐮𝐢𝐫𝐢𝐧𝐠 𝐚𝐛𝐨𝐮𝐭: ▪ What do stakeholders mean by "high quality data"? ▪ What is the impact of low-quality data on business processes? ▪ How will higher data quality enable business strategy? ▪ What priorities drive the need for Data Quality improvements? ▪ What is the tolerance for low quality data? ▪ What governance is in place to support Data Quality improvement? ▪ What additional governance will be needed?
Data Quality ISO Standard
• ISO 8000, the international standard for Data Quality, is being developed to enable the exchange of complex data in an application-neutral form. • ISO 8000 defines characteristics that can be tested by any organization to determine conformance of the data. • ISO 8000 defines quality data as "portable data that meets requirements." ▪ Portable data means it can be separated from an application.
Defining High Quality Data Strategy
• Improving Data Quality requires a strategy accounting for the work that needs to be done and the way stakeholders will execute it. • Data Quality priorities must align with business strategy and utilize a framework guiding the effort. • 𝐀 𝐟𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤 𝐬𝐡𝐨𝐮𝐥𝐝 𝐢𝐧𝐜𝐥𝐮𝐝𝐞 𝐦𝐞𝐭𝐡𝐨𝐝𝐬 𝐭𝐨: ▪ Understand and prioritize business needs. ▪ Identify the data critical to meeting business needs. ▪ Define business rules and Data Quality standards based on business requirements. ▪ Assess data against expectations. ▪ Share findings and get feedback from stakeholders. ▪ Prioritize and manage issues. ▪ Identify and prioritize opportunities for improvement. ▪ Measure, monitor, and report on Data Quality. ▪ Manage metadata produced through Data Quality processes.
Data Quality Improvement Lifecycle
• Improving Data Quality requires the ability to assess the relationship between inputs and outputs, in order to ensure that inputs meet the requirements of the process and that outputs conform to expectations. • 𝐒𝐡𝐞𝐰𝐡𝐚𝐫𝐭 𝐌𝐨𝐝𝐞𝐥 𝐟𝐨𝐫 𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐨𝐥𝐯𝐢𝐧𝐠: ▪ Plan ▪ Do ▪ Check ▪ Act
Business Drivers of Data Management
• In a globally competitive market, data holds the key to competitive advantage. Failure to manage data is failure to manage capital. • Primary driver for data management is to enable companies to obtain value from their data assets in the same way effective management of financial and physical assets enables companies to obtain value from various assets.
Data Quality and Metadata
• It is difficult to measure something we don't understand. Metadata defines what our data represents. • Having a robust process to define data is essential to formalize Data Quality measures.
Common Causes of Data Quality Issues
• Lack of leadership. • Data entry processes. • Data processing functions. • System design. • Fixing issues without regression testing.
Big Data
• Massive sets of structured and unstructured data from web traffic, social media, sensors, etc. • Volumes of data are too great for a typical DBMS • Can reveal more patterns and relationships. • At its core, big data is about predictions. • "Datafy" an event - capture an event in a quantified format so it can be tabulated/analyzed.
Critical Data
• Most organizations have a lot of data, not all of which is of equal importance. • While specific drivers for critical data differ by industry, common characteristics exist across companies. • 𝐃𝐚𝐭𝐚 𝐜𝐚𝐧 𝐛𝐞 𝐚𝐬𝐬𝐞𝐬𝐬𝐞𝐝 𝐛𝐚𝐬𝐞𝐝 𝐨𝐧 𝐰𝐡𝐞𝐭𝐡𝐞𝐫 𝐢𝐭 𝐢𝐬 𝐫𝐞𝐪𝐮𝐢𝐫𝐞𝐝 𝐛𝐲: ▪ Regulatory reporting. ▪ Financial reporting. ▪ Business policy. ▪ Ongoing operations. ▪ Business strategy.
DG Team Activities: Embed Data Governance
• One goal of the DG organization is to embed behaviors related to managing data as an asset in a range of processes. • The company must accept that governance is required to maximize the value of data for everyone. • Establish a Data Governance community of practice. • 𝐄𝐧𝐬𝐮𝐫𝐞 𝐬𝐭𝐞𝐰𝐚𝐫𝐝𝐬 𝐟𝐫𝐨𝐦 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐝𝐞𝐩𝐚𝐫𝐭𝐦𝐞𝐧𝐭𝐬 𝐫𝐞𝐠𝐮𝐥𝐚𝐫𝐥𝐲 𝐫𝐞𝐯𝐢𝐞𝐰 𝐜𝐫𝐨𝐬𝐬 𝐝𝐚𝐭𝐚: ▪ E.g., Bank of America might have an employee, Jane Doe, who is also a customer of Bank of America. HR and customer Data Stewards should agree on what the data for Jane Doe means, where it is stored, who should access it, etc.
DG Team Activities: Engage in Change Management
• Organizational Change Management (OCM) is the vehicle for bringing about change in an organization's systems and processes. • 𝐃𝐚𝐭𝐚 𝐆𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞 𝐞𝐦𝐩𝐥𝐨𝐲𝐞𝐞𝐬 𝐚𝐫𝐞 𝐫𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐥𝐞 𝐟𝐨𝐫 𝐰𝐨𝐫𝐤𝐢𝐧𝐠 𝐰𝐢𝐭𝐡 𝐂𝐡𝐚𝐧𝐠𝐞 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐭𝐨: ▪ 𝐏𝐥𝐚𝐧: - Planning change management, including performing stakeholder analysis, gaining sponsorship, and establishing a communications approach to overcome resistance to change. ▪ 𝐓𝐫𝐚𝐢𝐧: - Creating and executing training plans for DG programs. ▪ 𝐈𝐧𝐟𝐥𝐮𝐞𝐧𝐜𝐞 𝐒𝐲𝐬𝐭𝐞𝐦𝐬 𝐃𝐞𝐯𝐞𝐥𝐨𝐩𝐦𝐞𝐧𝐭: - Engaging with the PMO to add DG steps to the SDLC. ▪ 𝐈𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭 𝐏𝐨𝐥𝐢𝐜𝐢𝐞𝐬: - Communicating data policies and the organization's commitment to Data Management activities. ▪ 𝐂𝐨𝐦𝐦𝐮𝐧𝐢𝐜𝐚𝐭𝐢𝐨𝐧: - Increasing awareness of the role and responsibilities of data stewards and other DG professionals, as well as the objectives and expectations for Data Management projects.
Metadata and Data Management
• Organizations require metadata to manage data as an asset. • 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚 𝐝𝐞𝐬𝐜𝐫𝐢𝐛𝐞𝐬: ▪ What data an organization has ▪ What it represents ▪ How it is classified ▪ Where it came from ▪ How it moves throughout the organization ▪ Who can and cannot use it ▪ Whether it is of high quality • Metadata management provides a starting point for improvements in data management overall.
Other Business Drivers
• Poor customer experiences linked to "bad" data. • Internal productivity loss in dealing with "bad" data. • Lack of insight into customer behaviors and needs. • Reactive approach to always fixing data issues. • Limited accountability for data • It is critical to align data governance objectives with overall company business strategy.
Techniques for Data Quality
• Preventive actions • Corrective actions • Quality check and audit code modules • Effective Data Quality metrics. • Statistical process control. • Root cause analysis.
DG Team Activities: Perform Readiness Assessment
• Readiness assessment is an initial current state assessment of the company's data, process, people, and systems. • 𝐂𝐨𝐦𝐦𝐨𝐧 𝐓𝐲𝐩𝐞𝐬 𝐨𝐟 𝐀𝐬𝐬𝐞𝐬𝐬𝐦𝐞𝐧𝐭𝐬: ▪ 𝐃𝐚𝐭𝐚 𝐌𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐌𝐚𝐭𝐮𝐫𝐢𝐭𝐲: - Understand what the organization does with data; measure its current data management capabilities and capacity. ▪ 𝐂𝐚𝐩𝐚𝐜𝐢𝐭𝐲 𝐭𝐨 𝐂𝐡𝐚𝐧𝐠𝐞: - Some DG requires behavioral change; it is important to measure the capacity for the organization to change behavior required for adopting DG. ▪ 𝐂𝐨𝐥𝐥𝐚𝐛𝐨𝐫𝐚𝐭𝐢𝐯𝐞 𝐑𝐞𝐚𝐝𝐢𝐧𝐞𝐬𝐬: - This assessment characterizes the organization's ability to collaborate in the management and use of data. - Since stewardship is collaborative in nature, if an organization does not know how to collaborate, culture will be an obstacle to stewardship. ▪ 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐀𝐥𝐢𝐠𝐧𝐦𝐞𝐧𝐭: - Examines how well the organization aligns uses of data with business strategy.
Relational DBMS
• Represent data as two-dimensional tables. • Each table contains data on entities and attributes. • Table: Grid of Columns and Rows ▪ Rows (aka records). ▪ Columns (aka fields) - represent attributes and store values. ▪ A primary key uniquely identifies each record. ▪ A foreign key is a primary key from one table used as a lookup value to identify records in another table.
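Primary and foreign keys can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `painter` and `painting` tables, their columns, and the sample rows are hypothetical; SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# painter_id is the primary key of painter and a foreign key in painting.
conn.execute("CREATE TABLE painter (painter_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE painting (
           painting_id INTEGER PRIMARY KEY,
           title TEXT NOT NULL,
           painter_id INTEGER NOT NULL REFERENCES painter(painter_id))"""
)
conn.execute("INSERT INTO painter VALUES (1, 'Ana')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(10, "Sunrise", 1), (11, "Harbor", 1)],
)

# The foreign key acts as the lookup value that joins the two tables.
rows = conn.execute(
    """SELECT p.title, a.name
       FROM painting p JOIN painter a ON a.painter_id = p.painter_id
       ORDER BY p.painting_id"""
).fetchall()
print(rows)  # [('Sunrise', 'Ana'), ('Harbor', 'Ana')]

def insert_orphan():
    """Try to insert a painting whose painter does not exist."""
    try:
        conn.execute("INSERT INTO painting VALUES (12, 'Ghost', 99)")
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_orphan())  # False: the foreign key constraint rejects it
```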
Principles of Data Governance
• Since the 2000s, the following principles have been used to build a case for data governance: ▪ 𝐋𝐞𝐚𝐝𝐞𝐫𝐬𝐡𝐢𝐩 & 𝐒𝐭𝐫𝐚𝐭𝐞𝐠𝐲: - Successful DG starts with visionary and committed leadership. ▪ 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐃𝐫𝐢𝐯𝐞𝐧: - DG is a business program, and, as such, must govern IT decisions related to data as much as it governs business interaction with data. ▪ 𝐒𝐡𝐚𝐫𝐞𝐝 𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐲: - Across all Data Management knowledge areas, DG is a shared responsibility between business data stewards and technical data management professionals. ▪ 𝐌𝐮𝐥𝐭𝐢-𝐋𝐚𝐲𝐞𝐫𝐞𝐝: - DG occurs at both the enterprise and local levels and often at levels in between. ▪ 𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤 𝐁𝐚𝐬𝐞𝐝: - Because DG activities require coordination across functional areas, the DG program must establish an operating framework that defines accountabilities and interactions. ▪ 𝐏𝐫𝐢𝐧𝐜𝐢𝐩𝐥𝐞-𝐁𝐚𝐬𝐞𝐝: - Guiding principles are the foundation of DG activities, and especially of DG policy. Reference to principles can mitigate potential resistance.
Ways to Measure Data Value
• Some ways to measure data value: ▪ 𝐑𝐞𝐩𝐥𝐚𝐜𝐞𝐦𝐞𝐧𝐭 𝐂𝐨𝐬𝐭: - The replacement or recovery cost of data lost in a disaster or data breach, including the transactions, domains, catalogs, documents, and metrics within an organization. ▪ 𝐌𝐚𝐫𝐤𝐞𝐭 𝐕𝐚𝐥𝐮𝐞: - The value as a business asset at the time of a merger or acquisition. ▪ 𝐏𝐨𝐭𝐞𝐧𝐭𝐢𝐚𝐥 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐎𝐩𝐩𝐨𝐫𝐭𝐮𝐧𝐢𝐭𝐢𝐞𝐬: - The value of income that can be gained from opportunities identified in the data, by using the data for transactions, or by selling the data. ▪ 𝐒𝐞𝐥𝐥𝐢𝐧𝐠 𝐃𝐚𝐭𝐚: - Some organizations package data as a product or sell insights gained from their data. ▪ 𝐑𝐢𝐬𝐤 𝐂𝐨𝐬𝐭: - A valuation based on potential penalties, remediation costs, and litigation costs, derived from legal or regulatory risk from the absence of data that is required to be present.
Data Lineage and Impact Analysis
• Standard Metadata tools provide visibility into how data moves throughout the organization. • Lineage is similar to a data lifecycle that includes data's origin, destination, and changes. • Employees making data changes must know how the change will affect downstream systems.
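One way to picture impact analysis is as a walk over lineage edges to find every downstream system a change could reach. The sketch below assumes lineage is available as a simple mapping; the system names and the downstream helper are hypothetical and stand in for whatever a metadata tool would actually expose.

```python
from collections import deque

# Hypothetical lineage edges: each system maps to the systems it feeds.
LINEAGE = {
    "crm": ["staging"],
    "billing": ["staging"],
    "staging": ["warehouse"],
    "warehouse": ["sales_report", "finance_dashboard"],
}

def downstream(system, lineage):
    """Return every system reachable from `system`, in breadth-first order."""
    seen, order, queue = {system}, [], deque([system])
    while queue:
        for target in lineage.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

# Systems that a change in the CRM could affect.
impacted = downstream("crm", LINEAGE)
```

An employee planning a change to the CRM would review everything in impacted before making it.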
Metadata Governance and Metrics
• The data governance team should define standards and manage changes to metadata. ▪ A workflow tool should be used to document and view the progression of changes. • Adopt industry-based metadata standards early in the planning cycle. • Metrics can be used to determine the effectiveness of metadata or the impact of its absence: ▪ Metadata repository completeness. ▪ Metadata quality.
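As a rough illustration of a metadata repository completeness metric, the sketch below counts how many required metadata fields are populated. The required field list, the sample entries, and the completeness function are all assumptions made for the example; a real program would define completeness against its own metadata standard.

```python
# Hypothetical required fields; a real standard would define its own list.
REQUIRED_FIELDS = ("definition", "owner", "source_system")

# Hypothetical repository entries keyed by data element name.
repository = {
    "customer_id": {"definition": "Unique customer key", "owner": "Sales", "source_system": "crm"},
    "order_total": {"definition": "Order amount incl. tax", "owner": "", "source_system": "billing"},
    "region_code": {"definition": "", "owner": "", "source_system": "crm"},
}

def completeness(repo, required=REQUIRED_FIELDS):
    """Fraction of required metadata fields that are populated across all entries."""
    total = len(repo) * len(required)
    filled = sum(1 for entry in repo.values() for f in required if entry.get(f))
    return filled / total if total else 0.0

# 6 of the 9 required fields are filled in the sample data.
score = completeness(repository)
```

Tracking a score like this over time gives one simple view of whether the repository is filling in or going stale.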
Data Valuation
• Value is the difference between the cost of a thing and the benefit derived from that thing. ▪ For some assets, like stocks, calculating value is easy. For data, not so much because neither the costs nor benefits for data are standardized. • 𝐀 𝐩𝐫𝐢𝐦𝐚𝐫𝐲 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞 𝐭𝐨 𝐝𝐚𝐭𝐚 𝐚𝐬𝐬𝐞𝐭 𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐢𝐬 𝐭𝐡𝐚𝐭 𝐭𝐡𝐞 𝐯𝐚𝐥𝐮𝐞 𝐨𝐟 𝐝𝐚𝐭𝐚 𝐢𝐬 𝐜𝐨𝐧𝐭𝐞𝐱𝐭𝐮𝐚𝐥 & 𝐭𝐞𝐦𝐩𝐨𝐫𝐚𝐥: ▪ What is valuable to company A may not be valuable to company B ▪ What was valuable yesterday may not be valuable tomorrow. • 𝐂𝐨𝐦𝐩𝐚𝐧𝐢𝐞𝐬 𝐧𝐞𝐞𝐝 𝐭𝐨 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝 𝐚𝐬𝐬𝐞𝐭𝐬 𝐢𝐧 𝐟𝐢𝐧𝐚𝐧𝐜𝐢𝐚𝐥 𝐭𝐞𝐫𝐦𝐬 𝐟𝐨𝐫 𝐝𝐞𝐜𝐢𝐬𝐢𝐨𝐧 𝐦𝐚𝐤𝐢𝐧𝐠: ▪ Establishing ways to associate financial value with data is critical. • 𝐃𝐚𝐭𝐚 𝐦𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐩𝐫𝐨𝐟𝐞𝐬𝐬𝐢𝐨𝐧𝐚𝐥𝐬 𝐬𝐡𝐨𝐮𝐥𝐝 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝 𝐭𝐡𝐞 𝐦𝐞𝐚𝐧𝐢𝐧𝐠 𝐨𝐟 𝐭𝐡𝐞𝐢𝐫 𝐰𝐨𝐫𝐤: ▪ This can help an organization transform its understanding of its own data and, through that, its approach to data management.
Data Policies
• 𝐀 𝐝𝐨𝐜𝐮𝐦𝐞𝐧𝐭𝐞𝐝 𝐬𝐞𝐭 𝐨𝐟 𝐩𝐫𝐢𝐧𝐜𝐢𝐩𝐥𝐞𝐬/𝐬𝐭𝐚𝐧𝐝𝐚𝐫𝐝𝐬 𝐭𝐡𝐚𝐭 𝐬𝐞𝐭 𝐭𝐡𝐞 𝐝𝐢𝐫𝐞𝐜𝐭𝐢𝐨𝐧 𝐟𝐨𝐫 𝐚 𝐠𝐢𝐯𝐞𝐧 𝐛𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐩𝐮𝐫𝐩𝐨𝐬𝐞/𝐫𝐞𝐪𝐮𝐢𝐫𝐞𝐦𝐞𝐧𝐭, 𝐨𝐫 𝐚𝐧𝐲 𝐫𝐞𝐬𝐭𝐫𝐢𝐜𝐭𝐢𝐨𝐧𝐬 𝐩𝐥𝐚𝐜𝐞𝐝 𝐮𝐩𝐨𝐧 𝐭𝐡𝐞𝐢𝐫 𝐮𝐬𝐞. ▪ They are often the organization's response to external regulation but may also relate to contractual limitations placed upon the organization. • 𝐏𝐨𝐥𝐢𝐜𝐲 𝐝𝐞𝐬𝐜𝐫𝐢𝐛𝐞𝐬 "𝐰𝐡𝐚𝐭 𝐭𝐨 𝐝𝐨 𝐚𝐧𝐝 𝐰𝐡𝐚𝐭 𝐧𝐨𝐭 𝐭𝐨 𝐝𝐨." ▪ HR data must be encrypted when stored in the data warehouse. • 𝐒𝐭𝐚𝐧𝐝𝐚𝐫𝐝 𝐝𝐞𝐬𝐜𝐫𝐢𝐛𝐞𝐬 "𝐡𝐨𝐰 𝐭𝐨 𝐝𝐨 𝐢𝐭." ▪ HR data shall be encrypted using the Advanced Encryption Standard algorithm.
Business Focus of Lineage
• 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐅𝐨𝐜𝐮𝐬 𝐨𝐟 𝐋𝐢𝐧𝐞𝐚𝐠𝐞: ▪ Begin with a set of prioritized data elements. ▪ From target locations, trace back to the source systems where the elements originated. ▪ If coupled with data quality measurements, lineage can be used to pinpoint where system design adversely impacts the quality of the data.
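The trace-back idea can be sketched as follows, assuming each location of a prioritized element has a single upstream location and a quality measurement. The element names, the valid-value rates, and both helper functions are hypothetical and only illustrate the approach.

```python
# Hypothetical upstream map for one prioritized element:
# each location maps to the location it reads from.
UPSTREAM = {
    "sales_report.revenue": "warehouse.revenue",
    "warehouse.revenue": "staging.amount",
    "staging.amount": "billing.invoice_amount",
}

# Hypothetical quality measurements (share of valid values) at each location.
VALID_RATE = {
    "billing.invoice_amount": 0.99,
    "staging.amount": 0.98,
    "warehouse.revenue": 0.85,
    "sales_report.revenue": 0.85,
}

def trace_to_source(target, upstream):
    """Walk from a target location back to the originating source (assumes no cycles)."""
    path = [target]
    while path[-1] in upstream:
        path.append(upstream[path[-1]])
    return path

def worst_hop(path, valid_rate):
    """Return the (from, to) hop, in data-flow order, with the largest quality drop."""
    flow = path[::-1]  # reverse so the source comes first
    hops = list(zip(flow, flow[1:]))
    return max(hops, key=lambda h: valid_rate[h[0]] - valid_rate[h[1]])

path = trace_to_source("sales_report.revenue", UPSTREAM)
hop = worst_hop(path, VALID_RATE)
```

In this sample data the largest drop occurs between staging and the warehouse, which is where a steward would look first for a design problem.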
Types of Metadata
• 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚: ▪ Naming and definitions of concepts, subject areas, entities, attributes, calculations, rules, etc. • 𝐓𝐞𝐜𝐡𝐧𝐢𝐜𝐚𝐥 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚: ▪ Technical details of data, systems, and processes that move data around. ▪ Physical columns, ETL, recovery and backup rules, primary keys, etc. • 𝐎𝐩𝐞𝐫𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐌𝐞𝐭𝐚𝐝𝐚𝐭𝐚: ▪ Describes details of processing and accessing data. ▪ Error logs, batch programs, reports, archiving.
What Should a Business Glossary Include?
• 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐠𝐥𝐨𝐬𝐬𝐚𝐫𝐲 𝐬𝐡𝐨𝐮𝐥𝐝 𝐢𝐧𝐜𝐥𝐮𝐝𝐞: ▪ Term name, definition, acronym, abbreviation, synonyms. ▪ Associated systems. ▪ Stakeholders. ▪ Algorithms/calculations supporting definitions. ▪ Lineage/preferred source.
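The listed glossary contents map naturally onto a record structure. The sketch below is one possible shape, not a standard schema: the GlossaryTerm class, its field names, the sample term, and the missing_fields helper are all assumptions for illustration.

```python
from dataclasses import dataclass, field, fields

@dataclass
class GlossaryTerm:
    """One business glossary entry; field names are illustrative, not a standard schema."""
    name: str
    definition: str
    acronym: str = ""
    abbreviation: str = ""
    synonyms: list[str] = field(default_factory=list)
    associated_systems: list[str] = field(default_factory=list)
    stakeholders: list[str] = field(default_factory=list)
    calculation: str = ""        # algorithm/calculation supporting the definition
    preferred_source: str = ""   # lineage/preferred source

def missing_fields(term):
    """List the attributes a steward still needs to fill in."""
    return [f.name for f in fields(term) if not getattr(term, f.name)]

# Hypothetical sample entry.
term = GlossaryTerm(
    name="Net Revenue",
    definition="Revenue after returns and discounts.",
    abbreviation="Net Rev",
    synonyms=["Net Sales"],
    associated_systems=["billing", "warehouse"],
    stakeholders=["Finance"],
    calculation="gross revenue - returns - discounts",
    preferred_source="warehouse",
)
gaps = missing_fields(term)
```

A check like missing_fields gives a Data Steward a quick list of entries that are not yet fully documented.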
Common Tasks in the Day of a Data Steward
• 𝐂𝐨𝐦𝐦𝐨𝐧 𝐓𝐚𝐬𝐤𝐬: ▪ Creating and maintaining metadata. ▪ Documenting business rules and standards. ▪ Managing data quality issues. ▪ Executing operational data governance via daily projects.
DG Team Activities: Define the DG Operating Framework
• 𝐂𝐨𝐦𝐦𝐨𝐧 𝐚𝐫𝐞𝐚𝐬 𝐭𝐨 𝐜𝐨𝐧𝐬𝐢𝐝𝐞𝐫 𝐰𝐡𝐞𝐧 𝐝𝐞𝐯𝐞𝐥𝐨𝐩𝐢𝐧𝐠 𝐭𝐡𝐞 𝐟𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤: ▪ 𝐕𝐚𝐥𝐮𝐞 𝐨𝐟 𝐃𝐚𝐭𝐚 𝐭𝐨 𝐭𝐡𝐞 𝐎𝐫𝐠𝐚𝐧𝐢𝐳𝐚𝐭𝐢𝐨𝐧: - If an organization sells data, obviously DG has a huge business impact. Organizations that use data as a crucial commodity will need an operating model that reflects the role of data. ▪ 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐌𝐨𝐝𝐞𝐥: - Decentralized business vs centralized, local vs international, etc. are factors that influence how business occurs, and therefore, how the DG operating model is defined. ▪ 𝐂𝐮𝐥𝐭𝐮𝐫𝐚𝐥 𝐅𝐚𝐜𝐭𝐨𝐫𝐬: - Such as acceptance of discipline and adaptability to changes. ▪ 𝐈𝐦𝐩𝐚𝐜𝐭 𝐨𝐟 𝐑𝐞𝐠𝐮𝐥𝐚𝐭𝐢𝐨𝐧: - Highly regulated organizations will have different mindsets and operating models of DG than those less regulated.
What Costs does the Data Lifecycle Involve?
• 𝐂𝐨𝐬𝐭𝐬: ▪ Acquiring ▪ Storing ▪ Administering ▪ Disposing of data
DG Team Activities: Implementing Data Governance
• 𝐃𝐆 𝐜𝐚𝐧'𝐭 𝐛𝐞 𝐢𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐞𝐝 𝐨𝐯𝐞𝐫𝐧𝐢𝐠𝐡𝐭. 𝐈𝐭 𝐫𝐞𝐪𝐮𝐢𝐫𝐞𝐬 𝐩𝐥𝐚𝐧𝐧𝐢𝐧𝐠 - 𝐧𝐨𝐭 𝐨𝐧𝐥𝐲 𝐭𝐨 𝐚𝐜𝐜𝐨𝐮𝐧𝐭 𝐟𝐨𝐫 𝐨𝐫𝐠𝐚𝐧𝐢𝐳𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐜𝐡𝐚𝐧𝐠𝐞, 𝐛𝐮𝐭 𝐚𝐥𝐬𝐨 𝐬𝐢𝐦𝐩𝐥𝐲 𝐛𝐞𝐜𝐚𝐮𝐬𝐞 𝐢𝐭 𝐢𝐧𝐜𝐥𝐮𝐝𝐞𝐬 𝐦𝐚𝐧𝐲 𝐜𝐨𝐦𝐩𝐥𝐞𝐱 𝐚𝐜𝐭𝐢𝐯𝐢𝐭𝐢𝐞𝐬 𝐭𝐡𝐚𝐭 𝐧𝐞𝐞𝐝 𝐭𝐨 𝐛𝐞 𝐜𝐨𝐨𝐫𝐝𝐢𝐧𝐚𝐭𝐞𝐝. ▪ "Data Governance is a journey." ▪ Implementation could take 3-5 years. • 𝐄𝐚𝐫𝐥𝐲 𝐃𝐆 𝐚𝐜𝐭𝐢𝐯𝐢𝐭𝐢𝐞𝐬 𝐜𝐨𝐮𝐥𝐝 𝐢𝐧𝐜𝐥𝐮𝐝𝐞: ▪ Defining DG procedures required to meet high priority goals. ▪ Establishing business glossary and documenting technology standards. ▪ Coordinating with the enterprise architecture team to support better understanding of data and systems. ▪ Assessing key data assets and assigning financial value to them.
Data Asset Valuation
• 𝐃𝐚𝐭𝐚 𝐀𝐬𝐬𝐞𝐭 𝐕𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐢𝐬 𝐭𝐡𝐞 𝐩𝐫𝐨𝐜𝐞𝐬𝐬 𝐨𝐟 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐚𝐧𝐝 𝐜𝐚𝐥𝐜𝐮𝐥𝐚𝐭𝐢𝐧𝐠 𝐭𝐡𝐞 𝐞𝐜𝐨𝐧𝐨𝐦𝐢𝐜 𝐯𝐚𝐥𝐮𝐞 𝐨𝐟 𝐝𝐚𝐭𝐚. ▪ The key to understanding the value of a non-fungible item (like data) is understanding how it is used and the value brought by its usage.