ETL theory test

3) What are the types of data warehouse applications, and what is the difference between data mining and data warehousing?

Data mining can be defined as the process of extracting hidden predictive information from large databases and interpreting the data. Data warehousing is the process of aggregating data from multiple sources into one common repository, and it may make use of a data mine for faster analytical processing of the data.

ETL Process

1. Decide where the data will be extracted from (production database(s), external sources)
2. Test the data extraction (use an SQL query)
3. Extract the data (use SQL Server Data Tools Integration Services, SSDT)
4. Decide what data needs to be transformed (it needs to fit the data warehouse data structure)
5. Transform the data (use SQL Server Data Tools Integration Services, SSDT, and Excel Power Query)
6. Load the transformed data into the data warehouse (use SQL Server Data Tools Integration Services, SSDT)
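The steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with two in-memory databases instead of SQL Server and SSDT, and the table names (orders, fact_orders) and columns are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite databases stand in for the production
# database and the data warehouse; all table names are made up.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "  ada ", 1250), (2, "bob", 300)],
)

# Extract: test and run the SQL query against the source.
rows = source.execute(
    "SELECT id, customer, amount_cents FROM orders ORDER BY id"
).fetchall()

# Transform: trim and capitalise names, convert cents to a decimal amount
# so the rows fit the (hypothetical) warehouse structure.
transformed = [
    (order_id, customer.strip().title(), cents / 100)
    for order_id, customer, cents in rows
]

# Load: write the transformed rows into the warehouse table.
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", transformed)
warehouse.commit()

loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```

In SSDT the same three stages would typically be configured as components of a package rather than written by hand; the sketch only shows the order of operations.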

Source formats can include:

Database, Excel, TXT, CSV, XML

2) What operations does ETL testing include?

ETL testing includes:
1. Verify that the data is transformed correctly according to business requirements
2. Verify that the projected data is loaded into the data warehouse without any truncation or data loss
3. Make sure that the ETL application reports invalid data and replaces it with default values
4. Make sure that data loads within the expected time frame, to improve scalability and performance
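A few of these checks can be expressed as a small comparison between source rows and loaded rows. The sketch below is a simplified illustration, not a standard testing tool: the function names, the VALID_COUNTRIES set and the DEFAULT_COUNTRY value are all assumptions made for the example.

```python
# Hypothetical business rule: unknown country codes are replaced by a default.
VALID_COUNTRIES = {"AU", "NZ"}
DEFAULT_COUNTRY = "UNKNOWN"


def transform(row):
    """Apply the (assumed) business rule to one source row."""
    customer_id, name, country = row
    if country not in VALID_COUNTRIES:
        country = DEFAULT_COUNTRY
    return (customer_id, name, country)


def verify_load(source_rows, target_rows):
    """Return a list of problems found when comparing source and target."""
    problems = []
    # Data loss check: every source row should arrive in the target.
    if len(source_rows) != len(target_rows):
        problems.append("row count mismatch")
    targets = {row[0]: row for row in target_rows}
    for src in source_rows:
        expected = transform(src)
        actual = targets.get(src[0])
        if actual is None:
            problems.append(f"row {src[0]} missing from target")
        elif actual != expected:
            # Catches truncation and wrong or missing default values.
            problems.append(f"row {src[0]} differs: expected {expected}, got {actual}")
    return problems


source_rows = [(1, "Alexandra", "AU"), (2, "Bob", "XX")]
good_target = [(1, "Alexandra", "AU"), (2, "Bob", "UNKNOWN")]
truncated_target = [(1, "Alexan", "AU"), (2, "Bob", "UNKNOWN")]

print(verify_load(source_rows, good_target))
print(verify_load(source_rows, truncated_target))
```

Load-time checks (the fourth operation) are not covered here; they would need timing the actual load against an agreed threshold.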

ETL Tool: Transform

Excel using Power Query or Power BI, and SQL Server using SSIS

Sources include:

External sources such as the web, and company data such as a product catalog

ETL

Extract, Transform, and Load; tools that are used to standardize data across systems, allowing it to be queried

What is ETL?

In data warehousing architecture, ETL is an important component that manages the data for any business process. ETL stands for Extract, Transform and Load. Extract is the process of reading data from a database. Transform is the conversion of that data into a format appropriate for reporting and analysis. Load is the process of writing the data into the target database.

What are Facts?

A fact is the central component of a multi-dimensional model and contains the measures to be analysed. Facts are related to dimensions.
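The relationship between a fact table and a dimension can be shown with a tiny star schema. This is a sketch under assumptions: it uses Python's sqlite3 module, and the tables fact_sales and dim_product, with quantity and amount as measures, are hypothetical examples.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Dimension: descriptive attributes used to slice the measures.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Fact: numeric measures plus a key pointing at the dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fact_sales VALUES (1, 2, 20.0), (1, 1, 10.0), (2, 5, 75.0);
    """
)

# Analyse the measures (quantity, amount) grouped by a dimension attribute.
totals = db.execute(
    """
    SELECT p.name, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(totals)
```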

SSDT

SQL Server Data Tools. Microsoft SQL Server's SSDT is used to extract data from production database(s). SSDT has tools to automate the extract process: extracting the data of one table (which can include restrictions), and using queries to extract data from one or more tables.

ETL Tool: Extract & Load Data

The DBMS tools for extracting data, e.g. SQL Server uses SQL Server Integration Services (SSIS)

14) Using SSIS (SQL Server Integration Services), what are the possible ways to update a table?

To update a table using SSIS, the possible ways are:
1. Use a SQL command
2. Use a staging table
3. Use Cache
4. Use the Script Task
5. Use the full database name for updating if MSSQL is used
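The staging-table approach can be illustrated outside SSIS. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module rather than an SSIS package, and the customer and customer_staging tables are hypothetical. The idea is to bulk-load changed rows into a staging table, then apply them to the target with one set-based SQL command.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE TABLE customer_staging (id INTEGER PRIMARY KEY, email TEXT)")

conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "old@example.com"), (2, "keep@example.com")],
)
# Changed rows land in the staging table first.
conn.executemany(
    "INSERT INTO customer_staging VALUES (?, ?)",
    [(1, "new@example.com")],
)

# One set-based update instead of a row-by-row loop.
conn.execute(
    """
    UPDATE customer
    SET email = (SELECT s.email FROM customer_staging AS s WHERE s.id = customer.id)
    WHERE id IN (SELECT id FROM customer_staging)
    """
)
# Empty the staging table so the next load starts clean.
conn.execute("DELETE FROM customer_staging")
conn.commit()

result = conn.execute("SELECT id, email FROM customer ORDER BY id").fetchall()
print(result)
```

Rows without a match in the staging table are left unchanged because of the WHERE clause on the update.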

Power Query

An Excel tool to transform data. It has the functionality of macros without writing a macro: it tracks the changes performed on the data as steps and saves them as a query, which can be reused and re-run.

Facts. What are examples?

customer, contact etc.

Extract

Fetch data from multiple sources. Data needs to be extracted from the data sources, which are identified in the Planning and Data Mart / Data Warehouse Design phases. Once the sources are identified, extract the data using the tools available within the application area, e.g. DBMS tools, website extraction, or file transfers from external organisations (if available).
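Extracting from several source formats usually means reading each one into a common row shape. The following sketch assumes two hypothetical product feeds, one CSV and one XML, held in strings so the example is self-contained; it uses only Python's standard csv, io and xml.etree.ElementTree modules.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical source data; in practice these would be files or downloads.
csv_source = "id,name\n1,Widget\n2,Gadget\n"
xml_source = "<products><product id='3'><name>Gizmo</name></product></products>"


def extract_csv(text):
    """Read CSV text into a list of dicts with a common shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(row["id"]), "name": row["name"]} for row in reader]


def extract_xml(text):
    """Read XML text into the same row shape as the CSV extractor."""
    root = ET.fromstring(text)
    return [
        {"id": int(product.get("id")), "name": product.findtext("name")}
        for product in root.findall("product")
    ]


extracted = extract_csv(csv_source) + extract_xml(xml_source)
print(extracted)
```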

Load

Load the respective tables. Data needs to be loaded into the data warehouse regularly; the frequency depends on the requirements of the organisation. The tools used to load data depend on the data warehouse implementation, e.g. SQL Server uses SSDT and SSIS to extract, transform and load data. The load should be scheduled at regular intervals to keep the data warehouse up to date. Once the SSIS packages are deployed, SQL Server allows jobs to load the data outlined in those packages.

Transform

Restructure data so it is stored properly in the data warehouse. The data from the data sources needs to match the data warehouse, which must have: all the attributes required, the correct data types, the correct data format (e.g. F and M, or Female and Male), and only the attributes defined in the data model for the data warehouse. Therefore, data needs to be cleansed and transformed before loading into the data warehouse.
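These requirements can be sketched as a single cleansing function. Everything named here is an assumption for illustration: the WAREHOUSE_COLUMNS tuple stands in for the data model, and the choice to store gender as F or M (rather than Female or Male) is arbitrary.

```python
from datetime import date

# Hypothetical data model: the only attributes the warehouse accepts.
WAREHOUSE_COLUMNS = ("customer_id", "gender", "signup_date")
# Assumed target format: single-letter gender codes.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def cleanse(record):
    """Return a record that matches the warehouse attributes, types and formats."""
    missing = [col for col in WAREHOUSE_COLUMNS if col not in record]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    # Only the attributes in the data model are kept; extras are dropped.
    return {
        "customer_id": int(record["customer_id"]),                       # correct data type
        "gender": GENDER_CODES[record["gender"].strip().lower()],        # correct format
        "signup_date": date.fromisoformat(record["signup_date"]),        # correct data type
    }


raw = {"customer_id": "7", "gender": " Female ", "signup_date": "2024-01-31", "notes": "x"}
clean = cleanse(raw)
print(clean)
```

An unrecognised gender value raises a KeyError in this sketch; a real pipeline might instead report the row and substitute a default value.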

