Data Management FINAL
Difference between databases and spreadsheets
Databases are relational: data lives in multiple tables linked to each other, rather than in standalone sheets
Who developed relational databases?
Edgar Codd
What does ETL stand for?
Extract, Transform, Load. Puts data from a source into a warehouse
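A minimal sketch of the three ETL steps, assuming an in-memory CSV as the source and an in-memory SQLite database standing in for the warehouse. The table and column names (orders, order_id, amount) are made up for illustration.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source (an in-memory CSV stands in for a source system).
source = io.StringIO("order_id,amount\n1, 19.99 \n2,5.00\n")
raw_rows = list(csv.DictReader(source))

# Transform: strip stray whitespace and convert text to proper types.
clean_rows = [(int(r["order_id"]), float(r["amount"].strip())) for r in raw_rows]

# Load: write the cleaned rows into the warehouse (hypothetical "orders" table).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?)", clean_rows)
warehouse.commit()

loaded = warehouse.execute(
    "SELECT order_id, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```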
FROM statement syntax
FROM schema.table
GETDATE
Gets current system datetime
DATEDIFF
Gets the difference between two dates, in the unit you specify (day, month, year, ...)
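GETDATE and DATEDIFF are SQL Server functions, so they will not run in every database. The sketch below uses SQLite (via Python) as a stand-in, with date('now'), julianday() and strftime() playing roughly the same roles. It illustrates the idea only and is not T-SQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rough analog of GETDATE(): the current system date (SQLite reports UTC).
today = conn.execute("SELECT date('now')").fetchone()[0]

# Rough analog of DATEDIFF(day, '2024-01-01', '2024-03-01'):
# subtract the two dates as day numbers.
days_between = conn.execute(
    "SELECT CAST(julianday('2024-03-01') - julianday('2024-01-01') AS INTEGER)"
).fetchone()[0]

# Rough analog of DATEPART(month, '2024-03-01'): pull out one portion of a date.
month_part = conn.execute(
    "SELECT CAST(strftime('%m', '2024-03-01') AS INTEGER)"
).fetchone()[0]

print(today, days_between, month_part)
```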
What does unique strategy require?
It requires unique data analytics
Are all systems centralized?
NO!
Can cloud vendors easily share info with analytics systems?
NO!
What is Data
Related items of (chiefly numerical) information considered collectively, typically obtained by scientific work and used for reference, analysis, or calculation.
AS
Renames field
SELECT statement
SELECT <what> FROM <where> WHERE <conditions>
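A small runnable sketch of the SELECT pattern, using Python's built-in sqlite3 module. The employees table and its data are invented for the example. In SQLite the default schema is called main, so main.employees follows the schema.table form; other databases use different schema names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Sales", 52000), ("Ben", "IT", 61000), ("Cy", "Sales", 48000)],
)

# SELECT <what> FROM <where> WHERE <conditions>
# AS renames the salary field in the result; ORDER BY sorts the results.
query = """
    SELECT name, salary AS annual_pay
    FROM main.employees
    WHERE dept = 'Sales'
    ORDER BY salary DESC
"""
cursor = conn.execute(query)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```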
Data Mart
Smaller, department-specific, data stores that usually contain information from a single domain.
ORDER BY
Sorts Results
What does SQL stand for
Structured Query Language
What does the relational model organize data into?
Tables
value
The data value from a table that represents an observation of a single variable.
key
The name used to identify the variable described by the value.
Data governance
The set of policies, procedures and controls that safeguard an organization's information while making it useful for both transactional and analytic purposes.
What are data cubes useful for?
Useful for reporting, not for analytics
How are many-to-many relationships resolved?
With a linking table
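One way to picture a linking table, sketched with SQLite through Python. The students, courses and enrollments tables are hypothetical. Each row of the linking table pairs one student with one course, and its primary key is a compound key built from both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- Linking table: one row per (student, course) pairing.
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(student_id),
        course_id  INTEGER REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO courses  VALUES (10, 'SQL'), (20, 'R');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# Join through the linking table to list every student/course pairing.
pairs = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments AS e
    JOIN students AS s ON s.student_id = e.student_id
    JOIN courses  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(pairs)
```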
DISTINCT
eliminates duplicates
Each column in a table corresponds to a
field
foreign keys
fields that define relationships between tables
unique key
fields that must be unique per row
gather syntax
gather(data, key, value, column(s))
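gather comes from R's tidyr package. To show what the reshaping does without needing R, here is a rough plain-Python imitation. The gather function below is a hypothetical helper written for this card, not the tidyr implementation, and the row order of its output is not meant to match tidyr's.

```python
def gather(rows, key, value, columns):
    """Reshape wide rows to long: one output row per (gathered column, input row)."""
    long_rows = []
    for col in columns:
        for row in rows:
            # Keep every field that is not being gathered.
            kept = {k: v for k, v in row.items() if k not in columns}
            # The old column header becomes a value under `key`.
            long_rows.append({**kept, key: col, value: row[col]})
    return long_rows


# Column headers "1999" and "2000" are values (years), not variable names.
wide = [
    {"country": "A", "1999": 10, "2000": 12},
    {"country": "B", "1999": 7, "2000": 9},
]
long_rows = gather(wide, key="year", value="cases", columns=["1999", "2000"])
for r in long_rows:
    print(r)
```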
Where does the power of databases lie
in relationships
compound keys
keys that use more than one attribute
Do personal databases belong in the world of analytics?
No
In a one-to-many relationship, the BLANK key from the "one" table becomes the BLANK key in the "many" table
primary, foreign
Each row in a table corresponds to a
record
COUNT
record count
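A short sketch showing COUNT next to DISTINCT, run in SQLite through Python. The visits table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("ana", "home"), ("ana", "pricing"), ("ben", "home"), ("ana", "home")],
)

# COUNT(*) counts every record in the table.
total = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]

# DISTINCT eliminates duplicate values from the result.
visitors = [
    r[0]
    for r in conn.execute("SELECT DISTINCT visitor FROM visits ORDER BY visitor")
]

# The two combine: count only the distinct visitors.
unique_visitors = conn.execute(
    "SELECT COUNT(DISTINCT visitor) FROM visits"
).fetchone()[0]

print(total, visitors, unique_visitors)
```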
What do foreign keys enforce?
referential integrity
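Referential integrity can be seen in action with a sketch like the one below, using SQLite through Python. The customers and orders tables are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other databases typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # customer 1 exists: accepted

# Customer 99 does not exist, so the foreign key rejects this order.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
```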
spread syntax
spread(data, key, value)
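spread is the reverse of gather in R's tidyr package. The plain-Python function below is a hypothetical imitation written for this card, not the tidyr implementation; it only shows the shape of the transformation.

```python
def spread(rows, key, value):
    """Reshape long rows to wide: each distinct entry under `key` becomes a column."""
    grouped = {}
    for row in rows:
        # Fields other than key/value identify which wide row this belongs to.
        id_part = tuple((k, v) for k, v in row.items() if k not in (key, value))
        wide_row = grouped.setdefault(id_part, dict(id_part))
        wide_row[row[key]] = row[value]
    return list(grouped.values())


long_rows = [
    {"country": "A", "year": "1999", "cases": 10},
    {"country": "A", "year": "2000", "cases": 12},
    {"country": "B", "year": "1999", "cases": 7},
    {"country": "B", "year": "2000", "cases": 9},
]
wide = spread(long_rows, key="year", value="cases")
for r in wide:
    print(r)
```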
7 Pillars of DM
structure, uniqueness, integration, quality, access, privacy, governance
primary key
uniquely identifies rows
Where can enterprise data be sourced from?
• Transactional systems vs. analytical systems • Enterprise Resource Planning (ERP) systems • Ancillary systems • Structured vs. unstructured sources
What is Master Data Management?
"It comprises the processes, governance, policies, standards and tools that consistently define and manage the critical data of an organization to provide a single point of reference."
Data Management
"The development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise."
properties of a primary key
- Each table should have at least one - Each table may have only one - Each row's value must be unique - A primary key may not be NULL (blank)
3 different relationship types and how common they are
- One-to-one (rare) - One-to-many (common) - Many-to-many (dangerous)
Big Q's of Data Governance
- Ownership - Stewardship - Access - Definitions
Characteristics of enterprise databases
- Support many concurrent users - Scale with increasing demand (storage and compute) - Provide professional management/maintenance tools
properties of unique keys
- Tables may have multiple unique keys - The primary key is a unique key - A unique key may be the primary key
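A sketch of a table with one primary key plus a second unique key, run in SQLite through Python. The users table and its columns are made up. Both keys reject duplicate values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,   -- the primary key (itself a unique key)
        email   TEXT NOT NULL UNIQUE,  -- a second unique key
        name    TEXT
    )
""")
conn.execute("INSERT INTO users VALUES (1, 'ana@example.com', 'Ana')")

errors = []
attempts = [
    (1, "ben@example.com", "Ben"),  # duplicate primary key value
    (2, "ana@example.com", "Ann"),  # duplicate unique key value
]
for row in attempts:
    try:
        conn.execute("INSERT INTO users VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(errors, user_count)
```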
5 common data problems
1. Column headers are values, not variable names. 2. Multiple variables are stored in one column. 3. Variables are stored in both rows and columns. 4. Multiple types of observational units are in the same table. 5. A single observational unit is in multiple tables.
Where does unique data analytics come from?
1. Commercially available data sources 2. Proprietary data sources 3. Applying known analytic techniques in a new space 4. Proprietary analytic techniques
DatePart function
A function that examines a date and displays a portion of the date.
What is data integration
Aggregating data from multiple sources, both internal and external to the organization. It ties data together
What does API stand for?
Application Programming Interface
What is an Enterprise Data Warehouse?
Comprehensive data store containing as much information as possible that might be useful for analysis.
What is more flexible: a data warehouse or a data lake?
data lake
CAST
converts data types
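A quick sketch of CAST, run in SQLite through Python. The values are arbitrary, and the type names available to CAST vary between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CAST(<expression> AS <type>) converts a value from one data type to another.
as_int, as_text, as_real = conn.execute(
    "SELECT CAST('42' AS INTEGER), CAST(3.7 AS TEXT), CAST('2.5' AS REAL)"
).fetchone()

print(type(as_int).__name__, type(as_text).__name__, type(as_real).__name__)
```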