ETL Quiz
Which is NOT a way that data can be structured? -Pre-Structured Data -Structured Data -Semi-Structured Data -Unstructured Data
Pre-Structured Data
Define METADATA
Data about the data, such as file name, timestamps of when file is added, etc.
Please select all of the following that accurately describe OLTP Data Systems. -Structured around transactions and data storage -Optimized for concurrent transactions -Typically de-normalized -For client facing applications -Usually updated in batches
-Structured around transactions and data storage -Optimized for concurrent transactions -For client facing applications
Select all of the following that are types of Transformation: -Load Condition -Cleaning -Derivation -Key Restructuring -Full Refresh -Deduplication -Incrementation
1. Cleaning 2. Derivation 3. Key Restructuring 4. Deduplication
What is Business Intelligence?
1. Set of technologies that provide historical, current, and predictive views of business operations. 2. Set of tools for collecting and analyzing data to make business decisions. 3. A domain encompassing ideas from data collection, to transformation logic, to business analysis.
Define REPORTING DATA
Aggregated data compiled for the purpose of analytics
Define REFERENCE DATA
Any information that is usually standardized by some other party.
Which is NOT part of the ETL Process? -BDD -ODS -ERP -DWH
BDD, or Behavior Driven Development, is not part of ETL. ODS is Operational Data Store. ERP is Enterprise Resource Planning, used to keep track of business resources and status of business commitments like orders and payroll. This is a quite common Data Source used in the ETL Process. DWH is Data Warehouses.
What is a Data Mart?
Custom designed databases for business intelligence for a specific aspect of your company that you wish to draw conclusions from.
Which is NOT a Data Source? -Flat Files -Data Marts -External Data -OLTP Databases
Data Marts
Which of the following is a Type of ETL Test? -Unit Testing -Data Increment Testing -Target to Source Testing -Data Quality Testing
Data Quality Testing.
Define MASTER DATA
Data that is persistent and define the primary business entities within the company.
Define TRANSACTIONAL DATA
Data that is used in on-going operations within the company
TRUE OR FALSE: ETL Developers are responsible for making business-decisions based on the data reported.
False, ETL Developers do not make decisions, but instead prepare the data/information for other decision-makers within the organization.
TRUE OR FALSE: We use OLAP as an alternative Data System to OLTP when developing applications that we may ever want to perform data analysis on.
False, OLAP is not used for developing Client Facing applications. We would always use OLTP in this scenario. OLAP would be used to compliment an OLTP Data System application when it comes time to extract its data for analysis.
TRUE OR FALSE: The main types of Data Extraction are Batch Extractions and Streaming Extractions.
False, these more accurately describe types of Incremental Loads not Extractions.
TRUE OR FALSE: Business Reporting is the process of extracting data from various locations, transforming the data, and adding the data into a new target location.
False, this more accurately describes the ETL Process.
Dimensional Databases are used for: -Presenting all data from the Data Warehouse into a structured viewable format, tailored for faster queries. -Holding data for the Data Mart revolving around Fact Tables and surrounded Dimensions. -Holding Measures in various Dimensions to outline the dataset. We surround each Dimension with Fact Tables to represent Facts about each Dimension. -All of the above.
Holding data for the Data Mart revolving around Fact Tables and surrounded Dimensions.
Which is NOT a Data Quality Trait? -Relevancy -Accuracy -Granularity -Modularity
Modularity
Define ANALYTICAL DATA
Numerical values or metrics that provide business intelligence and help make business decisions.
Describe ETL.
Retrieving data from various locations, conforming that information, and adding into a new data location.
Which of the following are Dimensional Database Schema design patterns? -Star Schema -Galaxy Schema -Spiral Schema -Cubic Schema
Star Schema and Galaxy Schema
TRUE OR FALSE: ETL Testing is the process of validating, verifying, and qualifying data while preventing duplicate records and data loss.
True
Which type of Data would you NOT find in a Data Warehouse? -Transactional Data -Raw Data -Summarized Data -Metadata
Transactional Data (used in day-to-day operations of a business, likely found in Data Sources not DWH)
TRUE OR FALSE: Data is a collection of units of raw material that holds specific meaning.
True