ETL theory test
3) What are the types of data warehouse applications, and what is the difference between data mining and data warehousing?
The types of data warehouse applications are information processing, analytical processing, and data mining. Data mining is the process of extracting hidden predictive information from large databases and interpreting it. Data warehousing is the process of aggregating data from multiple sources into one common repository; a data warehouse may in turn make use of data mining to process the data analytically in a faster way.
ETL Process
1. Decide where the data will be extracted from (production database(s), external sources)
2. Test the data extraction (use an SQL query)
3. Extract the data (use SQL Server Data Tools Integration Services, SSDT)
4. Decide what data needs to be transformed (it needs to fit the data warehouse data structure)
5. Transform the data (use SQL Server Data Tools Integration Services, SSDT, and Excel Power Query)
6. Load the transformed data into the data warehouse
7. Use SQL Server Data Tools Integration Services, SSDT
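The steps above can be sketched in a few lines of Python. This is only an illustration of the extract, transform, load order, not how SSDT or SSIS does it: it uses the standard-library sqlite3 module in place of SQL Server, and the table names (products, dim_product) and column names are hypothetical.

```python
import sqlite3


def extract(source_conn):
    # Steps 1-3: read rows from the (hypothetical) production table with a SQL query.
    return source_conn.execute("SELECT id, name, price FROM products").fetchall()


def transform(rows):
    # Steps 4-5: make the rows fit the warehouse structure
    # (trim the name, store the price as a number instead of text).
    return [(pid, name.strip(), float(price)) for pid, name, price in rows]


def load(warehouse_conn, rows):
    # Step 6: write the transformed rows into the warehouse table.
    warehouse_conn.executemany(
        "INSERT INTO dim_product (product_id, name, price) VALUES (?, ?, ?)", rows
    )
    warehouse_conn.commit()


def run_demo():
    # In-memory databases stand in for the production database and the warehouse.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE products (id INTEGER, name TEXT, price TEXT)")
    source.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(1, " Pen ", "1.50"), (2, "Book", "12")],
    )
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE dim_product (product_id INTEGER, name TEXT, price REAL)"
    )
    load(warehouse, transform(extract(source)))
    return warehouse.execute(
        "SELECT product_id, name, price FROM dim_product ORDER BY product_id"
    ).fetchall()
```

Calling run_demo() returns the rows as they ended up in the warehouse table, which makes it easy to check each step by eye.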
Sources formats can include:
Database, Excel, TXT, CSV, XML
2) Explain what ETL testing operations include.
ETL testing includes: verifying that the data is transformed correctly according to business requirements; verifying that the projected data is loaded into the data warehouse without truncation or data loss; making sure that the ETL application reports invalid data and replaces it with default values; and making sure that data loads within the expected time frame, to improve scalability and performance.
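Two of these checks, data loss and truncation, can be sketched as simple comparisons between source rows and loaded rows. The helper names below are hypothetical, and the sketch assumes each row is a tuple whose first element is the key; a real test would also have to allow for transformations that legitimately shorten a value.

```python
def find_missing_keys(source_rows, target_rows):
    # Data loss check: keys present in the source but absent from the target.
    target_keys = {row[0] for row in target_rows}
    return sorted(row[0] for row in source_rows if row[0] not in target_keys)


def find_truncated(source_rows, target_rows, column):
    # Truncation check: loaded text values that are shorter than the source value.
    target_by_key = {row[0]: row for row in target_rows}
    truncated = []
    for row in source_rows:
        loaded = target_by_key.get(row[0])
        if loaded is not None and len(loaded[column]) < len(row[column]):
            truncated.append(row[0])
    return truncated


# Hypothetical sample data: row 1 was cut short, row 3 never arrived.
source = [(1, "Alexander"), (2, "Bo"), (3, "Cy")]
target = [(1, "Alexan"), (2, "Bo")]
missing = find_missing_keys(source, target)
truncated = find_truncated(source, target, column=1)
```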
ETL Tool: Transform
Excel, using Power Query or Power BI; SQL Server, using SSIS
Sources include:
External sources such as the web; company data such as a product catalog
ETL
Extract, Transform, and Load; tools that are used to standardize data across systems, allowing it to be queried
What is ETL?
In data warehousing architecture, ETL is an important component that manages the data for any business process. ETL stands for Extract, Transform and Load. Extract is the process of reading data from a database. Transform is the process of converting the data into a format appropriate for reporting and analysis. Load is the process of writing the data into the target database.
What are Facts?
A fact is the central component of a multi-dimensional model; it contains the measures to be analysed. Facts are related to dimensions.
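As an illustration of how a fact relates to a dimension, here is a minimal sketch using the standard-library sqlite3 module. The tables fact_sales and dim_customer and their columns are hypothetical: the fact table holds the measures (quantity, amount) and the dimension describes who they belong to.

```python
import sqlite3


def total_sales_by_customer():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            quantity INTEGER,  -- measure
            amount REAL        -- measure
        );
        INSERT INTO dim_customer VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO fact_sales VALUES (1, 2, 20.0), (1, 1, 5.0), (2, 3, 30.0);
        """
    )
    # Analyse the measures in the fact table, grouped by a dimension attribute.
    rows = conn.execute(
        """
        SELECT d.name, SUM(f.quantity), SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS d ON d.customer_id = f.customer_id
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()
    conn.close()
    return rows
```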
SSDT
SQL Server Data Tools. Use Microsoft SQL Server SSDT to extract data from production database(s). SSDT has tools to automate the extract process: extracting the data of one table (which can include restrictions), and using queries to extract data from one or more tables.
ETL Tool: Extract & Load Data
The DBMS's own tools for extracting data, e.g. SQL Server uses SQL Server Integration Services (SSIS).
14) Using SSIS (SQL Server Integration Services), what are the possible ways to update a table?
The possible ways to update a table using SSIS are: use a SQL command; use a staging table; use a cache; use the Script Task; and, if MSSQL is used, use the full database name for updating.
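The staging-table approach can be illustrated outside SSIS. The sketch below uses the standard-library sqlite3 module rather than SQL Server, and the product and staging_product tables are hypothetical: the changed rows are first loaded into a staging table, then applied to the target table with one set-based UPDATE.

```python
import sqlite3


def update_via_staging(conn, changes):
    # 1. Load the changed rows into a temporary staging table.
    conn.execute(
        "CREATE TEMP TABLE staging_product (product_id INTEGER PRIMARY KEY, price REAL)"
    )
    conn.executemany("INSERT INTO staging_product VALUES (?, ?)", changes)
    # 2. Apply all changes to the target table in a single UPDATE.
    conn.execute(
        """
        UPDATE product
        SET price = (
            SELECT s.price FROM staging_product AS s
            WHERE s.product_id = product.product_id
        )
        WHERE product_id IN (SELECT product_id FROM staging_product)
        """
    )
    # 3. Clean up the staging table and make the change permanent.
    conn.execute("DROP TABLE staging_product")
    conn.commit()


def staging_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
    update_via_staging(conn, [(2, 25.0)])
    return conn.execute(
        "SELECT product_id, price FROM product ORDER BY product_id"
    ).fetchall()
```

Rows that do not appear in the staging table are left unchanged, which is the main reason for the WHERE clause on the UPDATE.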
Power Query
An Excel tool to transform data. It has the functionality of macros without writing a macro: it tracks the changes performed on the data as steps, which are saved as a query, and queries can be reused and re-run.
Facts: what are some examples?
Sales amount, quantity sold, etc. (Customer and contact are normally dimensions, which the facts relate to.)
Extract
Fetch data from multiple sources. Data needs to be extracted from the data sources, which are identified in the planning and data mart / data warehouse design phases. Once the sources are identified, extract the data using the tools available within the application area, e.g. DBMS tools, website extraction, or file transfers from external organisations (if available).
Load
Load the respective tables. Data needs to be loaded into the data warehouse regularly; the frequency depends on the requirements of the organisation. The tools used to load data depend on the data warehouse implementation, e.g. SQL Server uses SSDT and SSIS to extract, transform and load data. The load should be scheduled at regular intervals to keep the data warehouse up to date. SQL Server allows jobs to load the data as outlined in the SSIS packages; this requires deployment of the SSIS packages.
Transform
Restructure the data so that it is stored properly in the data warehouse. The data from the data sources needs to match the data warehouse, which must have: all the required attributes; the correct data types; the correct data format (e.g. F and M, or Female and Male); and only the attributes defined in the data model for the data warehouse. Therefore, data needs to be cleansed and transformed before loading into the data warehouse.
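A minimal sketch of these four requirements, written in plain Python rather than Power Query or SSIS. The column list, the gender mapping and the function name are hypothetical; the point is only to show attributes being checked, cast to the right type, put into one format, and trimmed to the data model.

```python
# Hypothetical data model for one warehouse table: column name -> required type.
WAREHOUSE_COLUMNS = {"customer_id": int, "name": str, "gender": str}

# Hypothetical format rule: the warehouse stores Female / Male, not F / M.
GENDER_CODES = {"F": "Female", "M": "Male", "FEMALE": "Female", "MALE": "Male"}


def transform_record(record):
    # All required attributes must be present.
    missing = [col for col in WAREHOUSE_COLUMNS if col not in record]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    # Keep only the attributes in the data model, cast to the correct data types.
    cleaned = {col: cast(record[col]) for col, cast in WAREHOUSE_COLUMNS.items()}
    # Convert the gender value to the single format the warehouse expects.
    code = cleaned["gender"].strip().upper()
    if code not in GENDER_CODES:
        raise ValueError(f"unrecognised gender value: {record['gender']!r}")
    cleaned["gender"] = GENDER_CODES[code]
    return cleaned


# The source row has text for the id, a lower-case code, and an extra attribute.
example = transform_record(
    {"customer_id": "7", "name": "Ann", "gender": "f", "fax": "n/a"}
)
```

Raising an error on a missing attribute or an unknown code is one design choice; an ETL job could instead log the row and substitute a default value.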