5.2 Extract, Transform, and Load Relevant Data

Important considerations when loading data:

1. Ensure the transformed data is stored in a format and structure acceptable to the receiving software
2. Understand how the new program will interpret data formats
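
As a rough illustration of both considerations, the sketch below assumes the receiving software is an SQLite table that expects integer IDs, ISO 8601 date text, and numeric amounts. The `invoices` table, its columns, and the sample row are hypothetical and chosen only for the example.

```python
import sqlite3
from datetime import datetime

# Hypothetical transformed row; the source system wrote dates as MM/DD/YYYY
# and amounts as text with thousands separators.
rows = [{"invoice_id": "1001", "invoice_date": "03/04/2024", "amount": "1,250.50"}]

def to_target_format(row):
    """Convert one row into the types and formats the target table expects."""
    return (
        int(row["invoice_id"]),
        # Re-express the date as ISO 8601 so the target cannot confuse month and day.
        datetime.strptime(row["invoice_date"], "%m/%d/%Y").date().isoformat(),
        float(row["amount"].replace(",", "")),
    )

# An in-memory database stands in for the receiving software.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)", [to_target_format(r) for r in rows]
)
conn.commit()

# Read the row back to confirm how the target actually stored it.
loaded = conn.execute("SELECT invoice_id, invoice_date, amount FROM invoices").fetchall()
```

Reading the row back after loading is one simple way to check how the new program interpreted the data.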

Four step transformation process:

1. Understand the data and the desired outcome
2. Standardize, structure, and clean the data
3. Validate data quality and verify data meets data requirements
4. Document the transformation process
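
The four steps can be sketched in a few lines of Python. This is only an illustration under assumed requirements: the function name `transform`, the required fields `name` and `state`, and the sample records are all hypothetical.

```python
def transform(records):
    """Apply the four transformation steps to a list of dicts.

    Returns (clean_records, log), where log documents what was done.
    """
    log = []  # Step 4: keep a running record of the process.

    # Step 1: understand the data and the desired outcome.
    # The required fields here are an assumption made for this example.
    required = {"name", "state"}
    log.append(f"Required fields: {sorted(required)}")

    # Step 2: standardize, structure, and clean the data.
    cleaned = []
    for rec in records:
        rec = {k.strip().lower(): (v or "").strip() for k, v in rec.items()}
        rec["state"] = rec.get("state", "").upper()
        cleaned.append(rec)
    log.append("Trimmed whitespace, lowercased field names, uppercased state codes")

    # Step 3: validate quality and verify the data meets the requirements.
    valid = [r for r in cleaned if all(r.get(f) for f in required)]
    log.append(f"Kept {len(valid)} of {len(cleaned)} records after validation")

    return valid, log


sample = [{" Name ": " Ana ", "State": "ut"}, {"Name": "Ben", "State": None}]
clean, log = transform(sample)
```

The second sample record is dropped because its state is missing, and the log records that decision so someone else can follow what happened to the data.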

Steps for extracting data:

1. Understand data needs and the data available
2. Perform the data extraction
3. Verify the data extraction quality and document what you have done
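
A minimal sketch of the three extraction steps, assuming the source is a small SQLite database. The `sales` table, its rows, and the "West" region filter are hypothetical; the verification compares the extract against a row count and control total computed separately in the source.

```python
import sqlite3

# Hypothetical source system with a single sales table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "West", 100.0), (2, "East", 250.0), (3, "West", 75.5)],
)

# Step 1: understand the data need (assumed here: West-region sales only).
needed_region = "West"

# Step 2: perform the extraction, pulling only the fields that are needed.
extracted = source.execute(
    "SELECT id, amount FROM sales WHERE region = ? ORDER BY id", (needed_region,)
).fetchall()

# Step 3a: verify the extract against control figures from the source.
expected_count, expected_total = source.execute(
    "SELECT COUNT(*), SUM(amount) FROM sales WHERE region = ?", (needed_region,)
).fetchone()
assert len(extracted) == expected_count
assert abs(sum(amount for _, amount in extracted) - expected_total) < 1e-9

# Step 3b: document what was done.
extraction_note = (
    f"Extracted {len(extracted)} rows for region={needed_region}; "
    f"control total={expected_total}"
)
```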

delimiter

A character, or series of characters, that marks the end of one field and the beginning of the next field

ETL Process

A set of procedures for blending data. The acronym stands for extract, transform, and load data

Check each item listed below that is part of the process for transforming data.

A. Validate data quality and verify data meets data requirements B. Standardize, structure, and clean the data C. Document the transformation process D. Understand the data and the desired outcome

Check each example of structured data in the list below

C. Phone numbers of employees saved in a database D. Customer addresses saved in a customer relation database

What is the fourth step of the transformation process?

Document the transformation process

What do the letters in the acronym ETL stand for?

Extract, Transform, and Load

What is the first step in the ETL process?

Extracting data

A data owner sends you an e-mail with a file to prepare for analysis. The file contains data from multiple database tables, all merged into a single file. There are multiple fields in the file, each separated by a "~" symbol. For fields that contain large amounts of text, the file contains a "+" at the beginning and end of the text field. Indicate which of the following best describes (1) the type of file the data owner sent, (2) what the "+" is called, and (3) what the "~" is called.

Flat file, text qualifier, delimiter
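
To make the scenario concrete, here is a small sketch that reads such a file with Python's standard `csv` module, treating "~" as the delimiter and "+" as the text qualifier. The file contents are invented for illustration.

```python
import csv
import io

# Invented flat-file contents: "~" separates fields, "+" wraps long text fields.
raw = (
    "id~name~notes\n"
    "1~Ana~+Prefers e-mail ~ not phone+\n"
    "2~Ben~+No notes+\n"
)

# delimiter marks field boundaries; quotechar plays the role of the text qualifier.
reader = csv.reader(io.StringIO(raw), delimiter="~", quotechar="+")
rows = list(reader)
```

Because the notes field is wrapped in the text qualifier, the "~" inside it is kept as part of the text instead of starting a new field.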

What is the second step of the transformation process?

Standardize, structure, and clean the data

Chunhua has been building financial forecasting models for the company for several years. For each model, she saves all the data that could possibly be used in the model, even if she doesn't use all the data in her finished model. She does not document anything about the different items she has saved. When her intern, Minsuh, pulls the data, she cannot understand what all the fields mean. How would Minsuh most accurately describe the data?

The data has become a data swamp

What is the first step of the transformation process?

Understand the data and the desired outcome

What is the third step of the transformation process?

Validate data quality and verify data meets data requirements

data lake

collection of structured, semi-structured, and unstructured data stored in a single location

data swamps

data repositories that are not accurately documented so that the stored data cannot be properly identified and analyzed

data marts

data repositories that hold structured data for a subset of an organization

Examples of Semi-Structured Data

data stored in CSV, XML, or JSON format

metadata

data that describes other data
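
One common form of metadata is a data dictionary. The sketch below is a hypothetical example: the field names, descriptions, and the `undocumented` check are assumptions made for illustration, showing how missing metadata can be detected before a repository drifts toward a data swamp.

```python
# Hypothetical data record with three fields.
data = [{"cust_id": 17, "bal": 120.0, "x7": "B"}]

# Metadata: data that describes each field in the data.
metadata = {
    "cust_id": {"type": "int", "description": "Unique customer identifier"},
    "bal": {"type": "float", "description": "Outstanding balance", "unit": "USD"},
}

# Fields present in the data but missing from the data dictionary.
undocumented = [field for field in data[0] if field not in metadata]
```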

unstructured data

data that has no uniform structure

structured data

data that is highly organized and fits into fixed fields

defining the question well makes it ___

easier to define what data is needed to address the question

Example of structured data

general ledger and data in a relational database

examples of unstructured data

images, video, documents

dark data

information the organization has collected and stored that would be useful for analysis but is not analyzed and is thus generally ignored

semi-structured data

data that is organized in some ways but is not organized fully enough to be inserted into a relational database
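
As an illustration, the sketch below takes a small JSON document, where one customer has two phone numbers and another has none, and flattens it into fixed-width rows that a relational table could accept. The JSON content and the column layout are hypothetical.

```python
import json

# Invented semi-structured input: records do not all share the same shape.
raw = (
    '[{"id": 1, "name": "Ana", "phones": ["555-0100", "555-0101"]},'
    ' {"id": 2, "name": "Ben"}]'
)
records = json.loads(raw)

# Target structure: every row has exactly these three columns.
columns = ["id", "name", "phone"]
rows = []
for rec in records:
    # A missing or empty phone list becomes a single row with no phone value.
    phones = rec.get("phones") or [None]
    for phone in phones:
        rows.append((rec["id"], rec["name"], phone))
```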

flat file

text file that contains data from multiple tables or sources and merges that data into a single row

data owner

the person or function in the organization who is accountable for the data and can give permission to access and analyze the data

text qualifier

two characters that indicate the beginning and ending of a field and tell the program to ignore any delimiters contained between the characters
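
When writing a delimited file, text qualifiers are typically added only where needed. The sketch below uses Python's standard `csv` module with an assumed "|" delimiter and a double quote as the text qualifier; the row values are invented.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter="|",
    quotechar='"',               # the text qualifier
    quoting=csv.QUOTE_MINIMAL,   # qualify only fields that need it
    lineterminator="\n",
)

# The second field contains the delimiter, so it must be qualified.
writer.writerow(["1", "Acme | West", "ok"])
line = buffer.getvalue()
```

Only the field containing the delimiter is wrapped in qualifiers, which tells the receiving program to treat the "|" inside it as ordinary text.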

without defining the data well early in the process, it is more likely that

wrong data or incomplete data will be extracted

