Chapter 4: Data Cleansing
Dimensions of data quality (8)
Accuracy, completeness, confidentiality, consistency, integrity, precision, reliability, timeliness
Examples of dirty data
Incomplete, wrong, inappropriate, non-conforming, duplicate, poor data entry
Impact of dirty data
Increased risk, increased cost, low confidence, decreased revenue
Principles of Data Cleansing (10)
Planning is essential; organizing data improves efficiency; prevention is better than cure; responsibility belongs to everyone; partnership improves efficiency; prioritization reduces duplication; set targets and performance measures; feedback is a two-way street; education and training improve techniques, reduce cost, and improve overall data quality; accountability, transparency, and auditability are important
Pre-emptive ways to minimize dirty data (3)
Standardize terms, redesign IT platforms (automation), enhance process management
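Standardizing terms can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `CANONICAL_TERMS` mapping, the `standardize` function, and the sample values are all hypothetical.

```python
# Hypothetical lookup table: every accepted spelling maps to one canonical term.
CANONICAL_TERMS = {
    "m": "male",
    "male": "male",
    "f": "female",
    "female": "female",
}


def standardize(value, mapping=CANONICAL_TERMS):
    """Return the canonical term for value, or None if it is not recognized."""
    key = value.strip().lower()  # ignore case and surrounding whitespace
    return mapping.get(key)      # None marks a value that needs manual review


# Made-up raw entries, as they might arrive from free-text data entry.
raw = ["M", " female", "F ", "Male", "unknown"]
standardized = [standardize(v) for v in raw]
print(standardized)
```

Returning None for unrecognized values, rather than guessing, keeps non-conforming entries visible so they can be reviewed instead of silently passed through.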
Advantages of high quality data (8)
Cost savings, increased efficiency, protection of reputation and brand, higher customer satisfaction, ability to capitalize on market opportunities through customer profiling, better-informed decisions, reduction of risk and fraud, compliance with industry and government legislation
What is data cleansing?
Also called data scrubbing: identifying dirty data and then replacing, modifying, or deleting it
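The three actions in that definition (replace, modify, delete) can be shown on a small example. This is only a sketch under stated assumptions: the records, the field names, the `cleanse` function, the "UNKNOWN" default, and the very rough "@" check for a valid email are all invented for illustration.

```python
# Made-up records containing the kinds of dirty data listed earlier.
records = [
    {"id": 1, "email": "Ann@Example.com ", "country": "SG"},  # non-conforming format
    {"id": 2, "email": "bob@example.com", "country": ""},     # incomplete
    {"id": 2, "email": "bob@example.com", "country": ""},     # duplicate
    {"id": 3, "email": "not-an-email", "country": "MY"},      # wrong
]


def cleanse(rows, default_country="UNKNOWN"):
    """Identify dirty rows, then replace, modify, or delete as appropriate."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:
            continue  # delete: duplicate of a row already kept
        email = row["email"].strip().lower()  # modify: normalize the format
        if "@" not in email:
            continue  # delete: value is wrong and cannot be repaired here
        country = row["country"] or default_country  # replace: fill missing value
        seen_ids.add(row["id"])
        cleaned.append({"id": row["id"], "email": email, "country": country})
    return cleaned


cleaned = cleanse(records)
for row in cleaned:
    print(row)
```

In practice, deleted rows would usually be logged rather than dropped silently, in line with the accountability and auditability principle above.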