FIT5145 Module 6 - Data Curation and Management
Data Management Mistakes
- Flaky data management plan: plan to manage
- Tools used in place of management: Maslow's hammer ("if all you have is a hammer..."), e.g. ETL. What about orchestration and scheduling?
- Lack of metadata management: where the data is going, how it got there, how it was transformed
- Master data is not mastered (e.g. it lives in an application)
- Data quality is believed to be an IT issue, when it is a business issue
- Data warehouse != big data: dumping ground for reporting
- Business intelligence and data warehousing separated by a management wall
- Self-service business intelligence = lack of understanding / responsibility. Neither business nor IT takes control of strategy
- Big data is the new panacea: it's not
- Assuming goodwill with the security of your data: 88% of all data breaches are internal
Audit
A systematic and independent examination of books, accounts, documents and vouchers of an organisation to ascertain how far the financial statements present a true and fair view of the concern
Carla Rudder
About Dr. David Bray. Everyone is responsible for privacy and security.
Data Governance
An approach to managing information across an entire organisation: storing, backing up, making accessible, and destroying records when they are no longer needed. The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets.
DCC Sequence
- Conceptualise: Conceive and plan the creation of data, including capture and storage
- Create or Receive: Create data including administrative, descriptive, structural and technical metadata
- Appraise and Select: Evaluate data and select for long-term curation and preservation, noting policies and legalities
- Ingest: Transfer data to an archive, repository or data centre
- Preservation: Undertake actions to ensure long-term preservation and retention of the authoritative nature of the data
- Store: Store the data in a secure manner adhering to relevant standards
- Access, Use, and Reuse: Ensure that data is accessible to both designated users and re-users
- Transform: Create new data from the original
Data Management Functions
- Data Architecture
- Data Development
- Database Operations
- Data Security
- Reference and Master Data
- Data Warehousing and Business Intelligence
- Document and Content Management
- Metadata Management
- Data Quality
... with Data Governance at the centre of all of this
Implicit Data
Data not explicitly stored, but inferred with reasonable precision from available data.
Non-Fungible
Data that is irreplaceable
Government DM
Keeps full records of decision making. Subject to FOI requests, and must retain records for a fixed period. Multiple layers of security. Key producer of data.
Business DM
Governance, compliance, information management
Privacy
Having control over how one shares oneself with others
Confidentiality
Information privacy - how information about an individual is treated and shared
Cybersecurity
Loss in information privacy. Measures taken to protect a computer or computer system against unauthorized access or attack.
Science DM
Particular emphasis on reproducibility and credibility, producing artifacts of knowledge for the public good. A lot of effort goes into gathering data.
Digital Curation
Preservation and cataloguing are important. Has a strong overlap with business/organisation and science management practice.
Security
Protection of data, preventing it from being improperly used.
Medical DM
Significant privacy issues. Conflicts between corporate financial constraints, government regulations and the furthering of medical science.
Homophily
Tendency to associate with similar individuals. Important for enabling prediction.
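The link between homophily and prediction can be sketched as guessing a person's unknown attribute from the most common value among their connections. This is a minimal Python illustration only: the toy friendship data, the names and the `infer_attribute` helper are all hypothetical, and real inference methods are considerably more sophisticated.

```python
from collections import Counter

# Hypothetical toy data: who is connected to whom, and a known attribute for some people.
friends = {
    "alice": ["bob", "carol", "dave"],
    "erin": ["dave"],
}
known_attribute = {"bob": "cyclist", "carol": "cyclist", "dave": "runner"}


def infer_attribute(person):
    """Guess a person's attribute as the most common one among their friends."""
    values = [known_attribute[f] for f in friends.get(person, []) if f in known_attribute]
    if not values:
        return None  # no connections with a known attribute, so nothing to infer
    return Counter(values).most_common(1)[0][0]


print(infer_attribute("alice"))  # cyclist
print(infer_attribute("erin"))   # runner
print(infer_attribute("frank"))  # None
```

Nothing about alice is stored directly, yet a guess can still be made from her connections, which is why homophily matters for privacy as well as prediction.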
Jennifer Golbeck
The curly fry conundrum: homophily
Data Management
The development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets.
Ethics
The moral handling of data (especially, data about others)
Pharmacovigilance
The practice of monitoring the effects of drugs
Compliance
The process of ensuring you meet relevant laws and regulations
Data Management how
Think about:
- Workflows: when planning data collection, never assume that variable names mean the same thing to everyone
- Backups: create a backup plan
- Naming Conventions: ambiguous field names can be dangerous, so use a data dictionary
- Variables: never record compound variables when the component variables are available. Be explicit about variable type, units of measurement and definitions
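One way to act on the naming and variable advice above is to keep the data dictionary in machine-readable form and check records against it. The following is a minimal Python sketch under stated assumptions: the field names (`height_cm`, `weight_kg`), the `DATA_DICTIONARY` structure and the `validate_record` and `bmi` helpers are hypothetical illustrations, not part of the module material. It stores the component variables (height and weight) and derives the compound variable (BMI) only when needed.

```python
# Hypothetical data dictionary: every field has an explicit type, unit and definition.
DATA_DICTIONARY = {
    "height_cm": {"type": float, "unit": "centimetres", "definition": "Standing height"},
    "weight_kg": {"type": float, "unit": "kilograms", "definition": "Body weight"},
}


def validate_record(record):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for name in record:
        if name not in DATA_DICTIONARY:
            problems.append(f"unknown field: {name}")
    for name, spec in DATA_DICTIONARY.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], spec["type"]):
            problems.append(f"wrong type for {name}: expected {spec['type'].__name__}")
    return problems


def bmi(record):
    """Derive the compound variable (BMI) from its stored components."""
    height_m = record["height_cm"] / 100
    return record["weight_kg"] / height_m ** 2


good = {"height_cm": 180.0, "weight_kg": 81.0}
bad = {"height": 180.0, "weight_kg": "81"}  # ambiguous name, and a string instead of a number

print(validate_record(good))
print(validate_record(bad))
print(round(bmi(good), 1))
```

Putting the unit in the field name and in the dictionary entry is one possible convention. The point is that an ambiguous name such as `height` is rejected rather than silently accepted.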
Chris Bain
Talks about levels of hospital data protection (policy, access control and technical capability) that prevent inappropriate use of data.