Matillion Certification


Create Views by Importing a Job from a JSON file - Steps

1. Click Project and Import.
2. Click Browse to find the create_flights_views.json file (this is saved during live training).
3. Select the Create Flights Views job.
4. See that the new Job - Create Flights Views - is now listed on the Menu; double click on it to open it.
5. Right click on the Canvas to revalidate the Job; click Revalidate Job.
6. Right click on the Canvas to update the Database and Schema Name using Job Variables.
7. Click Manage Job Variables.
8. Change the job_database_name Value to your database name if you are using your own instance.
9. Change the job_source_schema_name Value to your schema. It is OK to use the db_source schema name that is already there.
10. Right click on the canvas to run the Job.
11. Check that the Job has finished successfully.
12. Check each step of the Job by expanding it (click on the arrow on the right).
13. Click on the db_source Schema.
14. Click to open the Views folder.
15. Check that the 3 Views have been created.
16. The Job is no longer needed once the Views are created. Right click on the Create Flights Views Job to delete it.

Importing a Job from a JSON file - Steps (the file contains multiple tables that will be used in Transformation Jobs)

1. Click Project and Import.
2. Find and select the JSON file using Browse.
3. Select the Create Tables job.
4. See/confirm that it is now in your menu on the upper left. Double click on the Create Tables job to open it.
5. The Job appears on the Canvas. Right click on the canvas to revalidate the Job and click Revalidate Job.
6. Right click on the canvas to check or update the database and schema names.
7. Click Manage Job Variables.
8. Type your database name for the Value of job_database_name if you are using your own instance; otherwise leave as is.
9. Ensure job_dwh_schema_name is pointing to your data warehouse schema.
10. Ensure job_source_schema_name is pointing to your source schema.
11. Right click on the canvas to run the Job - click Run Job.
12. Check that the Job has finished successfully by expanding it (click on the arrow on the right).
13. Click on the triangle next to Development in the Environments window to expand it.
14. Right click db_dwh to refresh the Schema list (click Refresh after right-clicking).
15. Click on Tables to open the Tables folder.
16. Check/confirm that the Table you want is part of your schema (in training, fct_flightanalysis_summary).
17. Right click db_source to refresh the schema (click Refresh after right-clicking).
18. Click Tables to open the Tables folder under this schema.
19. Confirm that the Tables you want are there (in training, 2 new tables).
20. You do not need the job anymore. Right click the Create Tables job in the upper left list to delete it.

Interacting with AWS Console (sending a message, etc.)

1. Continue working on the DWH Orchestration Job.
2. Find the CloudWatch Publish component.
3. Drag it onto the canvas and connect it to the Stage Airports (RDS Query) Component.
4. Click Project to create another Environment Variable.
5. Click Manage Environment Variables.
6. Click + to add another variable.
7. Type 'env_rds_query_duration' as the name.
8. Select Numeric as the Type, Copied as the Behavior, and 0.0 as the default value.
9. Also type 0.0 as the value of the env_query_row_count variable.
10. Variables used by the CloudWatch Publish component have to be numeric.
11. Select the Stage Airports (RDS Query) Component and click the Export Tab.
12. Click Edit to include a second System Property.
13. Click the + sign and add Duration as the second System Property.
14. Click the env_rds_query_duration Variable that was previously created.
15. Select the CloudWatch Publish Component and click on Properties.
16. Change the Name of the Component to 'Publish to CloudWatch'.
17. Click on Namespace and type a namespace you have on AWS (for training, type 'mtln-training-course').
18. Click on Metrics to map them.
19. Click + to add a new Metric.
20. Name the first metric 'RDS_Duration'.
21. Select env_rds_query_duration to select the Environment Variable containing the result coming from the RDS Query component.
22. Click + to add a second Metric.
23. Name it RDS_Row_Count (this has to be defined in advance in your AWS namespace).
24. Select env_query_row_count to select the Environment Variable containing the row total coming out of the RDS Query Component.
25. Find the SQS Message Component and drag it to the Canvas.
26. Connect it between the And Component and the End Success Component.
27. Select SQS Message and click on Properties.
28. Leave the default Name. Click on Queue Name.
29. Select a queue defined in your AWS Account in advance (for training, MTLN-TRAINING-ETL_COMPLETE_QUEUE).
30. Click Message to add the message you want to send to the queue.
31. Type 'ELT Completed Successfully'. This message will be read on the AWS account when you open the queue that was selected.
32. Right click on the Canvas and select Run Job to run it.
33. Expand the Job and explore its steps (arrow and triangle).
34. Explore the output of the CloudWatch Publish Component by clicking on it.
35. Check that you have an output for the Query Duration and another one for the Row Count.
36. Explore the output of the SQS Component by clicking on it.
37. You can read the status of the message sent to the SQS queue.

Data Cleansing (Convert Distances with the Fixed Flow Component)

1. Continue working on the Load fct_flights_analysis Job.
2. Find the Fixed Flow Component (search box under Components).
3. Change its Name to 'km_and_miles'.
4. Change the Columns Property.
5. Add 3 columns by clicking + 3 times.
6. Use source_type, target_type, and factor as the names of the 3 columns.
7. For the factor column: Data Type = Number, Size = 10, Decimal Places = 8.
8. Set Values for the Columns.
9. Click + twice to add two Values.
10. Type miles under source_type.
11. Type km under target_type.
12. Type 1.60934 as the factor.
13. For the second value under source_type, input km.
14. For the second value under target_type, input miles.
15. For the factor under the second value, input 0.621371.
16. Find the Join Component.
17. Drag and drop it onto the Canvas and connect the Fixed Flow Component (km_and_miles) and the Rank Component (Rank Longest Airtime) to the Join Component.
18. Select the Join Component to change its Properties.
19. Select Main Table and click Rank Longest Airtime as the Main Table.
20. Add a Main Table Alias; type 'main'.
21. Click Joins to add a Join condition; click + to add a condition.
22. Select km_and_miles as the Join Table.
23. Type 'convert' as the join alias.
24. Click Left as the Join Type.
25. Select/click Join Expression to add the Join Expressions.
26. Select distance_type from the main table.
27. Click on the equals sign.
28. Click on source_type from the convert table.
29. Select Output Columns.
30. Click Add All.
31. Select convert.source_type because we want to delete it, as it is also contained in the main table.
32. Click the minus sign to delete it.
33. Find the Calculator Component.
34. Drag and drop it and connect it to the Join Component.
35. Change the Name of the Calculator Component to 'Calculate Distance Fields'.
36. Add to the Calculations Property - click on it.
37. Click + to add a new expression, then click + to add a second expression.
38. Use distance_km as the name of the first expression.
39. Copy and paste the expression provided in the notes: round(case when "distance_type"='km' then "distance" else "distance"*"factor" end)
40. Name the second expression 'distance_miles'.
41. Copy and paste the expression provided in the notes: round(case when "distance_type"='miles' then "distance" else "distance"*"factor" end)
42. Click on Sample (while the Calculator Component is selected).
43. Click Data and see the two new distance columns being created by the Calculator Component (the equivalent SQL is sketched below).
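The three components above amount to a lookup join plus two CASE expressions. A minimal SQL sketch of the pushed-down logic, assuming a Snowflake-style warehouse and an upstream flow named rank_longest_airtime (the flow name is an assumption; the expressions are the ones from the notes):

    -- km_and_miles: the two-row conversion lookup built by the Fixed Flow component
    with km_and_miles as (
        select 'miles' as source_type, 'km' as target_type, 1.60934 as factor
        union all
        select 'km', 'miles', 0.621371
    )
    select f.*,
           round(case when f."distance_type" = 'km'    then f."distance" else f."distance" * c.factor end) as distance_km,
           round(case when f."distance_type" = 'miles' then f."distance" else f."distance" * c.factor end) as distance_miles
    from rank_longest_airtime f          -- output of the Rank component (assumed name)
    left join km_and_miles c
      on f."distance_type" = c.source_type;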

Key Features of Matillion ETL

Drag-and-drop browser interface.
Build sophisticated, powerful ETL/ELT jobs.
Push-down ELT technology uses the power of your data warehouse to process complex joins over millions of rows in seconds.
Live feedback, validation, and data preview, in-tool, as you build your ETL/ELT jobs.
Collaboration baked in: build jobs together in disparate locations, like in Google Docs.
Version control, import/export and server-side undo.
Over 25 data transformation components.
Admin Menu to administer your ETL instance.
Over 100 connectors to popular online services that you can connect to and pull your data from.
Easy UI and basic functions so you can build your first job in minutes.
In-client support.
Enterprise features such as Generated Documentation and Data Lineage. (As of version 1.63.4, the visual lineage tab has been removed from Matillion ETL. For more information, and to find out what the future of lineage in Matillion ETL looks like, read "Matillion ETL Supports Data Lineage Tools with Metadata API Integrations" from the Matillion blog.)

AWS Data Load Components

DynamoDB
EMR Load
S3 Load
S3 Load Generator
S3 Unload

Matillion ETL

ETL/ELT tool built specifically for cloud database platforms including Amazon Redshift, Google BigQuery, Snowflake and Azure Synapse. It offers a modern, browser-based UI with powerful, push-down ETL/ELT functionality.

Schemas in Matillion

Each Matillion environment specifies a default schema.
The Default Schema is the place that Tables are searched for in Components that read and write data (e.g. Table Input).
The Default Schema is also where Matillion ETL will write its own internal views that represent Transformation Components.
You can have multiple Schemas in an Environment, but that adds complexity.

Assert Components

Empower users to verify conditions of: Scalar Variables, Tables, External Tables, Views.

Environment Variables

Environment variables are name:value pairs that are stored inside the Matillion ETL client and are fully configurable by its users. They can be used across Matillion ETL, in configurations and in all jobs through many components (unlike Transformation Variables). Environment Variables must be created or declared before being used. When a Job begins, all Environment Variables are at their Default Value. Values can be updated as the Job runs through the following methods:
Iteration components: Set a variable to a new value for each iteration. Iterations can be run in parallel - you must set the variable behavior to Copied to ensure each parallel component sees and iterates its own copy.
Python Scripts: Can push new values into variables using the built-in "context" object.
SQS Runs: Executions triggered by SQS messages can also set variable values.
Variable Exports: The Export tab of the Properties panel allows you to edit mappings between the runtime information that the component makes available and the variables you have already defined. All Orchestration components and some Transformation components support exporting runtime information into variables during Job execution.
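As a small illustration, an Environment Variable can be referenced with ${...} syntax wherever a component accepts text, and Matillion substitutes the current value at run time; a minimal sketch using a SQL query (the table and variable names are the ones used later in this guide, not fixed conventions):

    -- ${env_dwh_schema} resolves to the variable's current value (e.g. db_dwh) when the job runs
    select count(*)
    from "${env_dwh_schema}"."fct_flightanalysis";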

Available Jobs in METL

Import
Export
Add Transformation Job
Add Orchestration Job

Search Tab (what does it search through)

In the lower right panel, next to the Tasks Tab, the user can perform a search (partial match) for any term. The search spans all jobs, notes and component properties within the current version of the project and returns any matches. Matches point to Components within the Jobs that contain the searched term within their properties in any form.

Matillion UI (list all)

Interface Features of the Matillion UI

1. Project Menu (drop-down menu)
Project (Switch Project, Manage Project, Rename Project, Delete Project, and Input Data Report)
Version (Manage Versions, Recycle Bin)
Environment (Add Environment, Manage Environment Variables)
Data Sharing (Export, Import)
Administration and Authentication (too many to list)
Additional Options (Manage SQS Configuration, Manage Change Data Capture, Manage Pub/Sub Configuration, Migrate, Git)
Log Out

2. Navigation Panel (top left)
All available Jobs (Orchestration and Transformation) are displayed here. Double click on a Job to open it. Right click to open a pop-up menu (create, delete, import and export jobs, and create job folders).

3. Job Canvas
Shows the contents of the selected job (all the Components). Connect components with Connection Rings. Connection Rings are color coded:
Green/Success: Continue on to the next component if executed successfully.
Red/Failure: Continue on to the next component if it failed to execute.
Gray/Unconditional: Continue on to the next component no matter what.
Blue/Iterate: Continue on to another component (used by iterator components, so no success or failure criteria).
Note: All Orchestration Jobs begin with a Start Component and thus cannot receive links from another Component.

4. Components, Shared Jobs and Environments Panels (bottom left)
The Components panel is a list of all available components; it can be searched using the search field.
The Shared Jobs panel shows all the bundled Shared Jobs that can be used in multiple Projects.
Environments panel: An environment describes a single connection to a database, on which jobs can be run. You can set up multiple environments. All environments associated with the current instance are listed here.

5. Properties Panel
Clicking on a particular component reveals its Properties, shown in the Properties Panel at the bottom (below the Canvas).
Properties: A unique set of user-defined properties that affect the function of the selected component.
Export: Export certain values from the selected component as a variable to be used elsewhere.
Sample: Many components allow users to produce a sample output of data up to a specified number of rows; this can be used to confirm a component is set up correctly before using it in a live job.
Metadata: Display data for each column featured in the sample data, including column names, types, sizes and scales (this tab is only available in Transformation components).
Lineage: (Enterprise Mode only) Track columns across transformations in order to understand how a column has been affected by components up to the point of inspection. (Note: As of version 1.63.4, the visual lineage tab has been removed from Matillion ETL. For more information, and to find out what the future of lineage in Matillion ETL looks like, read "Matillion ETL Supports Data Lineage Tools with Metadata API Integrations" from the Matillion blog.)
SQL: Display the SQL equivalent of the component and its current setup.
Plan: Get the plan for the SQL query generated by the job up to (and including) the selected component.
Help: Display a snippet of the help documentation for the selected component.

6. Tasks Panel
The task tabs show information about tasks that have been run during the current session.
Tasks: A list of up to the last 20 tasks attempted by the current user in the current project session, including queued, failed or cancelled tasks.
Search: Allows users to search the current project for matching components.
Console: The Matillion ETL instance console.
Command Log: An archive of interactions between the current browser session and the Matillion ETL server.
Notices: A list of notifications concerning the Matillion ETL project, including updates to Matillion ETL.

7. Admin Menu
The Admin drop-down menu allows users with "Admin" privileges to manage the Matillion ETL instance.

8. Help Menu
The Help drop-down menu provides a list of useful information about Matillion ETL, including: Get Support, Support Information, Manage Connections, License, About.

JDBC

Java™ database connectivity (JDBC) is the JavaSoft specification of a standard application programming interface (API) that allows Java programs to access database management systems.

Job Concurrency

Jobs are queued via the Quartz Job Scheduler. There is no limit on the size of the queue (unbounded). A Matillion ETL instance allows up to 16 jobs to run concurrently. The number of vCPUs determines the number of processes that can run inside each job (n*2, where n is the number of vCPUs in the instance). Multiple runs of the same job are run sequentially, while different jobs run concurrently.

Transformation Jobs

Jobs concerned with transforming data within Tables. This includes filtering data, changing data types, and removing rows. They primarily use Components named after the functions and data manipulation language (DML) commands they represent (e.g. Rank and Aggregate). A Transformation Job is chiefly defined by the Components it contains. Unlike Orchestration Jobs, there is no clear start point, and many flows can run at once by creating multiple strings of Components (see Job Concurrency).

Orchestration Jobs

Jobs that deal with the management of resources (such as Tables) as well as loading data from external sources. They are primarily concerned with Data Definition Language (DDL) statements (create, drop, alter resources) and loading data from external sources. An Orchestration Job is chiefly defined by the Components it contains. Orchestration Jobs have a clear Start point. Data can be loaded using connectors.

Getting Started with Matillion ETL

Launch (the right METL version (Snowflake, Redshift, etc.) and the right instance size)
Connect to your instance (AWS, GCP, Azure)
Create a Project
Set up Credentials (to communicate with your cloud platform)

Multiple Environment Connections

Typically, jobs share one connection with each job queued in the database, and one job equals one connection to the database. Multiple Environment Connections allow users to run multiple jobs across multiple Environment connections. This offers users concurrent loading, which is faster, by enabling multiple connections between an Environment and the Data Warehouse. It can be configured on the 3rd page (Data Warehouse Defaults) when creating or editing an Environment. Users have an option available in their Matillion environment to configure the number of concurrent connections for jobs that run in that environment to use. The maximum number of concurrent connections depends on the instance size (minimum: one concurrent connection). Please note that any operations executed within a transaction block will run on a single connection even if multiple connections have been specified in the environment setup.

Notes

Users can enter Notes on the Canvas to annotate their Jobs. Create one by right clicking on the Canvas and selecting Create Note. Click on a Note to edit or delete it. Standard keys edit the text (Ctrl-B for bold, etc.; Ctrl-K to hyperlink).

Variables (which ones are supported by Matillion ETL)

Variables are name-value pairs stored within each environment. Variables can be used in all sorts of parameters and expressions to allow the user to pass and centralize environment-specific configuration. They are also easily accessible through Python and Bash scripts, where they are substituted in at run time. Matillion ETL supports: Environment Variables, Automatic Variables, Job Variables, Grid Variables.

Job Variables

Variables can be defined within the scope of a single job. These variables can still be "copied" and "shared" with regard to their behaviour in a Fixed Flow component and Matillion ETL's various iterator components. When a Job begins, all Variables are at their Default Value. Values can be updated via:
Iterator Components: Set the variable to a new value for each iteration.
Python Scripts: Can push new values into variables using their built-in "context" object.

Exporting Variables within a Shared Job

Variables used inside Shared Jobs can be exported:
1. Create one or more job variables: right click on a Component and click Manage Variables.
2. Create the Shared Job.
3. Create a set of Job Variables in the intended Job that will be included in the newly created Shared Job.
4. Click the Shared Job in the Job Canvas, select the Export tab and click Edit.
5. Take the Variables from the Shared Job in the Source column and map them to the Variables from the current Job in the Target Variable column.

Assert Scalar Variables Component

Verify a Variable's value, or else halt the running of the Job

Assert View

Verify that certain conditions of a view are true, or else Halt the running of a Job

Assert Table

Verify that the Target Table has been created with the correct metadata

Grid Variable Export

When a Job calls a child Job with the Run Orchestration Component, the values of Grid Variables can be exported from the child Job to the parent Job. This is managed from the Component's Grid Export tab:
1. Define separate grid variables in both the child and the parent job.
2. Click the Run Orchestration Component on the Job and click the Grid Export tab.
3. The Grid Export tab lists the grid variables currently exported from that component. Click + to create a new one.
4. In the Export Grid Variable dialog, select a Grid Variable from the Map Into drop-down (the grid in the parent job that the component's grid variable will be exported to).
5. Select a Grid Variable from the Grid Variable drop-down (the grid in the child job that will be exported).
6. Under Column Mapping, each column in the parent grid is listed. For each column, select the column in the child grid that will be exported to it. Every column in the parent grid must be matched with a column in the child grid. You do not have to use every column in the child grid. One child column can map to multiple parent columns.

Single Schema Setup in Matillion

When you set up your first project, an environment is set up for you with a default Schema (Public) with the Value [Environment Default]. If you leave/set the properties of your Components as Schema: [Environment Default], changing the Schema of the Environment will change the Schemas of the Components as well (so you only have to change it in one place). Whenever a Component asks for a Schema, it will also need a Table or Tables; the Schema defines which set of Tables is visible. Matillion favors a single-Schema-per-Environment setup. This is particularly useful when you want to separate the different environments within a single database, but expect all the same database tables to exist in each schema. You can change the current environment (and consequently the Default Schema) from the Environments Panel. You can run a job in any environment: design and run in a Test environment, then run in Staging or Production environments.

Managing Versions (From Project Menu)

You can manage the versions of your Orchestration and Transformation Jobs from Matillion. Versioning supports many use cases, but perhaps the most common use is to capture and freeze your development at a point in time to designate as "live" or "production" versions, and to then continue working on your default version. Matillion creates a copy of all the jobs in your current project at that point in time. All projects have a Default Version (the main working version). You can create multiple new versions that branch from the default at different points in your Project (note: a new version does not include the undo history for the Project). Those versions can have new versions of their own. Changes can be made to the default or any of the branches as long as they are unlocked.
Create a new Version: Project Menu > Manage Versions, then click + (enter a Name, select Lock/Unlock, and enter a Description).
All Versions are listed in the Manage Versions dialog; you can click on a Version and click OK to switch to it.
Versions are referenced and used in the following:
Using Queues (in SQS Queue)
In the Scheduler
Task History (shows which version is running or has run)
By URL (the version can be referenced in the URL: https://<server name>/#<Group Name>/<Project Name>/<Version Name>)

Load Data onto a Table and Calculate/Analyze (Calculate Average Delay Per Flight)

1. Continue working on the Load fct_flight_analysis Job; we will create a second output.
2. Find the Calculator Component.
3. Drag and drop it and connect it to the existing Calculator Component.
4. Change its Name to 'Calculate Average Delay'.
5. Click on Calculations to add some calculations to the Component.
6. Click + three times to add 3 expressions.
7. Name the first expression 'delay_time'.
8. Create the first expression by clicking on crs_arrival_time, clicking on the '-' sign, and then clicking on arrival_time.
9. Name the second expression 'flight_year'.
10. Copy and paste the expression provided in the notes: DATE_PART(YEAR, "flight date")
11. Name the third expression 'flight_month'.
12. Copy and paste the expression provided in the notes: DATE_PART(MONTH, "flight date")
13. Find the Aggregate Component.
14. Drag and drop it and connect it to the Calculate Average Delay Component (to group data by year, month, and airport and calculate the average delay within each group).
15. Change the Name of the Aggregate Component to 'Group to Airport, Year, Month'.
16. Click Groupings to change the Groupings Property.
17. Click and select the flight_year, flight_month, and destination columns.
18. Select to change the Aggregations Property.
19. Click + to add an aggregation, find delay_time, and click Average for the Aggregation Type (the equivalent SQL is sketched below).
20. Find the Table Output Component.
21. Drag and drop it and connect it to the Aggregate Component.
22. Select the correct Schema; change it to db_dwh.
23. Select Target Table and change it to fct_flightanalysis_summary.
24. Select Column Mapping and click Add All.
25. Two Output Columns have been mapped correctly by the system; you will have to do the other two manually.
26. Click destination to map the airportcode Output Column.
27. Click avg_delay_time to map the averagedelaytime Output Column.
28. Change the Truncate setting to Truncate (we want an empty table each time we run the Job).
29. Go to the Project Menu and click Manage Environment Variables.
30. Create 2 new Environment Variables - click the + sign twice.
31. Name the first Variable 'env_source_schema_name', select Copied as the Behavior, and type 'db_source' as the Default Value.
32. Name the second Variable 'env_dwh_schema', select Copied as the Behavior, and type 'db_dwh' as the Default Value.
33. Select the fct_flight_analysis Table and change its Schema property by selecting the Use Variable box and typing 'env' to find ${env_dwh_schema}.
34. Repeat the same for the fct_flightanalysis_summary Table, selecting the same ${env_dwh_schema} Schema.
35. Run the Job by right clicking on the Canvas.
36. Select and expand the running Job from the Tasks Tab (arrow on the right).
37. Expand the Task and see that 10,255 Rows have been inserted into the fct_flightanalysis_summary Table.
38. Expand the Environments Section.
39. Notice the two new fact Tables listed under Tables (fct_flightanalysis and fct_flightanalysis_summary).
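The Calculator and Aggregate steps above correspond to SQL along these lines; a minimal sketch, assuming Snowflake-style syntax and an upstream flow named flights_input (the flow name is an assumption, the expressions are the ones from the notes):

    select DATE_PART(YEAR,  "flight date") as flight_year,
           DATE_PART(MONTH, "flight date") as flight_month,
           "destination",
           avg("crs_arrival_time" - "arrival_time") as avg_delay_time   -- delay_time, then averaged per group
    from flights_input
    group by 1, 2, 3;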

BDW 3-8: Detect Changes in Airports Data

1. We are creating a Transformation Job to detect changes in the airports data.
2. Right click the Load dmn_airport Job to copy it.
3. Right click the default version of the project.
4. Click Paste and confirm.
5. Right click on the copy and click Manage Job.
6. Name the new Job (in training, 'Load dmn_airport (with changes)').
7. Open the new job (double click on it).
8. Right click on the Canvas to validate the Job.
9. Delete the training_airports Table.
10. From the Environments Section, drag the training_airports_changes Table onto the Canvas.
11. Find the Convert Type Component, drag it to the Canvas, and connect it to the training_airport_changes Component.
12. Select Convert Type and change its Name to 'Convert to Number'.
13. Click to change the Conversions Property.
14. Click the + sign to add a new conversion.
15. Find the num_carriers column and click on it.
16. Click Number as the Type and type 10 as the Size.
17. Click + to add a second conversion.
18. Find and select 'is_active' for the Column.
19. Click Number as the Type and type 10 as the Size.
20. Drag and drop another Convert Type Component onto the Canvas and connect it to the Convert to Number (Convert Type) Component.
21. Change its Name to 'Convert to Boolean'.
22. Click to change the Conversions Property.
23. Change the Column to 'is_active'.
24. Click Boolean as the Type.
25. Connect the output of the Convert to Boolean (Convert Type) Component to the Join Component.
26. Select the Join Component and change the Main Table to 'Convert to Boolean' (make it the main input).
27. Find the Detect Changes Component.
28. Drag and drop it and connect it to the Join Component.
29. Open the Environments Section.
30. Expand the Development Environment and the db_dwh Schema.
31. Open the Tables and drag and drop the dmn_airport Table onto the Canvas.
32. Connect the output of the dmn_airport Table to the Detect Changes Component.
33. Select the Detect Changes Component and click on the Master Table Property.
34. Select dmn_airport as the Master Table.
35. Click Match Keys to change the Match Keys property.
36. Click and select the iata column and send it to the right hand side.
37. Click to change the Compare Columns Property.
38. Send all columns to the right side.
39. Select the iata column on the right side and move it back to the left. Click OK when done.
40. Change the Output Column Mapping Property next.
41. Click Add All to include all columns. Notice each column has a master and a compare version.
42. Find the Aggregate Component.
43. Drag and drop it and connect it to the Detect Changes Component (this Component checks how many rows are identical, have changed, or have been deleted).
44. Select the Aggregate Component to change its properties.
45. Select the Groupings Property.
46. Find the Indicator column and move it to the right side.
47. Change the Aggregations Property next.
48. Click + to add a new aggregation.
49. Click iata as the Source column and Count as the Aggregation Type.
50. Click the Sample Tab and Data to check the aggregation results (a SQL sketch of this count follows below).
51. Notice 3354 rows are Identical, 4 are New, 17 have Changed, and 4 have been Deleted.
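The closing Aggregate component is effectively a count per change indicator; a minimal sketch, assuming the Detect Changes output is exposed as detect_changes_output and that identical rows are flagged 'I' (the 'N', 'C' and 'D' values are the ones used by the filters in the next module):

    select "Indicator", count("iata") as iata_count
    from detect_changes_output        -- output flow of the Detect Changes component (assumed name)
    group by "Indicator";             -- expect roughly 3354 identical, 4 new, 17 changed, 4 deleted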

Scheduling a Job

1. Get the DWH Orchestration Job on the Canvas.
2. We need to change the condition of the 'If' Component in order to force the job to succeed.
3. Click on Condition.
4. Type 1000 as the Value of the Condition.
5. Click on the Project Menu.
6. Click Manage Schedules.
7. Click the + sign and type 'My Test Schedule' to create a new Schedule.
8. Pick the time zone (for training, London).
9. Type 16 for the Hours of the simulation.
10. Type 17 for the Minutes of the simulation.
11. Click Monday for the Days of the Week.
12. Click default for the Version (the only version you have so far).
13. Click DWH Orchestration for the Job.
14. Click development for the Environment.
15. Click Test to check/confirm all parameters are set up correctly.
16. Check/confirm there are no errors and click OK.
17. Notice the new Schedule on the list. Click OK.
18. Click on the Schedule and the arrow on the right to explore all the steps.
19. Notice the Job finished successfully because the If Component returned True.
20. A Schedule has been created to run the DWH Orchestration Job.

Using the Fixed Iterator Component within Matillion (Loading 3 files from the same S3 Bucket)

1. Open the DWH Orchestration Job.
2. In the Components Section, expand the Iterate folder.
3. Drag the Fixed Iterator Component on top of the S3 Load Component.
4. The Fixed Iterator is then connected to the top of the S3 Load Component.
5. Click on the Project Menu at the top left of the UI.
6. Click on Manage Environment Variables (to create the first environment variable).
7. Click the + sign to add an Environment Variable.
8. Type the Variable Name (in training, 'env_flights_year').
9. Click on Copied as the Behavior and click OK.
10. Select the Fixed Iterator Component and select the "Variables to Iterate" Property.
11. Click the Environment Variable you just created (env_flights_year).
12. Move the Variable to the right hand side.
13. Specify the Iteration Values Property; click on it.
14. Click on the + sign to include the values for the 3 years of flight data you want to load.
15. Type the years, clicking + to enter each one (2016, 2017, 2018).
16. Select the S3 Load Component so we can change the Pattern setting.
17. Click on Pattern.
18. We need to link/concatenate the file name to each of the fixed values. We will use the environment variable created to do this. Type 'training_s3_flights_${env_flights_year}.gz'.
19. Right click on the Canvas to run the Job.
20. Expand the running Job in the Tasks tab (arrow on the right).
21. Click on each iteration to see how approximately 1.5 million rows have been processed for each year.

Loading Data from an RDS database running MySQL - Steps (uses the RDS Query component, AWS only)

1. Open the Matillion ETL project and right click on the version name.
2. Click 'Add Orchestration Job'.
3. Type the component name (RDS Query) in the Components section to find the component you want to use.
4. Drag and drop the component close to Start.
5. Change the Name of the RDS Query Component to 'Stage Airports' in the Properties Tab.
6. Change/select the Database Type (in training, mysql).
7. Select the RDS Endpoint (database) (provided in the notes).
8. Enter the database name (for training, 'training').
9. Enter the username (for the database).
10. Enter the password and store it in the component.
11. Add a JDBC Option (required in later versions of METL): useSSL in training, and type false for the Value.
12. Add the SQL query to get the data from the RDS database (copy and paste from the notes).
13. Click Sample to explore the data.
14. Click on the table listed in the left column to explore the column names and data types.
15. Specify the Target Table name (training_airports).
16. Select the Staging Area - select an available S3 bucket for AWS.
17. Right click on the Canvas to run the job.
18. Click on the arrow in the Tasks section to explore the details of your job while it is running or after it is done.
19. Click on the triangle next to the job name to expand it and see the details.
20. Notice 3,372 Rows have been inserted into the Table.
21. Click to open the Environments section.
22. Expand the Development environment.
23. Expand the db_source schema.
24. Open the Tables folder.
25. See that a new table named 'training_airports' (or whatever name you gave your Target Table) has been created.

Loading flight data using the S3 Load Generator Component - Steps

1. Open the DWH Orchestration Job.
2. Click to expand the Components section.
3. Find the S3 Load Generator (type S3 and select it).
4. Drag and drop the component onto the canvas, near the Start.
5. Use the path for your Amazon S3 Storage (for training, copy and paste the path provided).
6. Select the file you want as the target (in training, select training_s3_flights_2016.gz).
7. Select the applicable compression mode (in training, gzip).
8. Click 'Get Sample' to explore the data in the file.
9. Click Next.
10. Click 'Guess Schema': Matillion ETL will infer the data types you need for your destination table using the first 50 rows from the source data in S3.
11. Click 'Empty Field as Null'. This Configuration section shows all the common settings used when working with CSV files.
12. Click Finish and notice two new components are created: Create Table and S3 Load.
13. Connect Start to Create Table (with the mouse).
14. Select Create Table to change its properties.
15. Change its name to training_flights.
16. Change the Create/Replace property from Create to Replace.
17. You can now fix the types of any columns that were not inferred correctly by the system from the first 50 rows sampled.
18. Training specific: type 10 as the size for the actual_elapsed_time, airtime, taxi_in, and taxi_out columns (a sketch of the resulting DDL follows below).
19. Modify the New Table Name to 'training_flights'.
20. Right click on 'Create Table' to run that Component only - this will create an empty table with the columns and data types specified.
21. Click on the S3 Load Component to modify its properties.
22. Select the Target Table that was previously created.
23. Click on the training_flights table.
24. Right click on the canvas to run the Job.
25. Click on the arrow to the right of the run to explore each step of the execution.
26. Click on the arrow to the left to expand it and notice the 1.4 million rows.
27. Right click db_source in Environments to refresh its list of objects.
28. Confirm that the new table 'training_flights' is listed.
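For reference, the Create Table component generated here issues DDL roughly like the sketch below; only the four columns whose sizes were adjusted in step 18 are shown, and the remaining flight columns inferred by Guess Schema are omitted:

    -- Create/Replace = Replace, so the staging table is rebuilt on every run
    create or replace table "db_source"."training_flights" (
        -- ...other flight columns inferred from the first 50 sampled rows...
        "actual_elapsed_time" number(10),
        "airtime"             number(10),
        "taxi_in"             number(10),
        "taxi_out"            number(10)
    );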

How to create a Dimensions Table - Steps

1. Open the Job.
2. Find the Rewrite Table component in the Components section.
3. Drag the Rewrite Table component near the Join component to connect them.
4. Change the name of the Rewrite Table component to what you want (in training, dmn_airport).
5. Change the schema (we do not want the Dimension being created in the default schema).
6. Select the db_dwh schema by clicking on it.
7. Specify the Target Table name (in training, use dmn_airport); see the SQL sketch below for what this component does conceptually.
8. Run the job (right click on the canvas, click Run Job).
9. Explore the Job in the Tasks Tab by clicking on the arrow to the right of Run Job.
10. Click on the triangle to the left of the Job to expand all job tasks and see them in detail - notice the 3,372 rows inserted into the table.
11. Click to open the Environments section in the bottom left.
12. Click to expand the Development environment.
13. Click to expand the db_dwh schema.
14. Click to open the Tables folder.
15. You will see the new table that was created (in training, named dmn_airport).
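Rewrite Table drops and recreates the target from its input flow, so conceptually the component generates something like this sketch (the input flow name is an assumption):

    create or replace table "db_dwh"."dmn_airport" as
    select *
    from joined_airports;   -- output of the Join component (assumed name)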

Control Business Logic with Transactions (Create a Transaction to perform a Commit or a Rollback)

1. Right click the 'default' version of the project.
2. Click 'Add Orchestration Job'.
3. Type 'Orchestration Transaction Control' as the Job Name.
4. Click Yes to open the new Job.
5. Click on the Transactions folder in the Components section.
6. Drag and drop the Begin Component near Start.
7. Drag and drop the DWH Orchestration Job close to the Begin Component to connect them.
8. Drag and drop the Commit Component close to the DWH Orchestration Component to connect them.
9. Drag and drop the Rollback Component onto the Canvas.
10. Connect the Failure output from your DWH Orchestration Component to the Rollback Component.
11. Type end in the search box to find the End Failure and End Success Components.
12. Drag and drop the End Success Component near the Commit Component to connect them. You want a success message if the transaction finishes with a commit.
13. Drag and drop the End Failure Component onto the Canvas and connect it to Rollback. You want a failure message if the transaction finishes with a rollback (the transaction pattern is sketched below).
14. Right click on the Canvas to run the Job.
15. Expand the running Job and explore the steps.
16. Expand the DWH Orchestration Job steps.
17. Notice how the DWH Orchestration Job finished successfully. The flow passed through the End Success Component.
18. Notice the Commit and End Success steps, as expected, in the output of the Orchestration Transaction Control Job.
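The Begin, Commit, and Rollback components wrap everything the child job writes in one database transaction; conceptually the pattern is:

    begin;        -- Begin component
    -- ...all DML issued by the DWH Orchestration job runs on this connection...
    commit;       -- Commit component (success path)
    -- if anything fails, the red connector instead leads to: rollback;   (Rollback component)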

Cleanse Data and Rank Results (Rank Flights Based on Airtime)

1. Right click default and Add Transformation Job to create a new Transformation Job.
2. Name the new Transformation Job (in training, 'Load_fct_flight_analysis').
3. Remember and find the three views pointing to the training_flights table under the db_source Schema (Environments > development > db_source > Views).
4. Drag and drop all 3 views onto the Canvas (2016, 2017, 2018).
5. Type Unite in the Component search box to find the Unite Component.
6. Drag and drop the Unite Component onto the Canvas and connect it to all 3 Views.
7. Click on the Sample Tab.
8. Click Row Count to see the total number of rows for the 3 Views (4,320,170).
9. Type Multi in the Component search box to find the Multi Table Input Component.
10. Drag and drop Multi Table Input onto the Canvas.
11. Click on the Properties Tab.
12. Change its Name to 'view_flights_20%'.
13. Change the Pattern Property to view_flights_20%.
14. Select Columns.
15. Select all columns (two triangles).
16. Click the Sample Tab.
17. Click Row Count (notice 4.3 million rows are counted, the same as Unite).
18. You can now delete the 3 Views and Unite from the Canvas, as we are getting the same results using the Multi Table Input Component (the view_flights_20% Component).
19. Find the Filter Component (type fil in the search box).
20. Drag and drop the Filter Component onto the Canvas. Connect it to the Multi Table Input Component.
21. Change the Filter Component's Name to 'Filter Blank Elapsed Time'.
22. Click the + sign to add a new Filter Condition.
23. Type act to find the actual_elapsed_time column.
24. Click Not as the Qualifier and Null as the Comparator.
25. Click on the Sample Tab.
26. Click Row Count. Notice the number of rows has decreased.
27. Find the Rank Component.
28. Drag and drop the Rank Component next to the Filter Component to get them connected.
29. Change the Name of the Rank Component to 'Rank Longest Airtime'.
30. Select the columns for grouping the data by changing the Partition Data Property. Click on Partition Data.
31. Click the destination column, move it to the right, and click OK.
32. Change the Ordering within Partitions Property. Click on it.
33. Click + and type ac to find the actual_elapsed_time column and click on it as the Input Column.
34. Click Desc as the Ordering option (descending).
35. Modify the Functions property.
36. Click + to add a new function.
37. Click 'Dense Rank' as the Window Function.
38. Type 'longest_airtime_rank' as the output column name (the equivalent window expression is sketched below).
39. Select the Rank Component on the Canvas and click on the Sample Tab.
40. Click Data to get a sample.
41. Notice a new column has been created on the right side of the data sample, ranking flights by longest air time.
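The Filter and Rank settings above translate to a window expression like this sketch (the source name stands in for the Multi Table Input flow and is an assumption):

    select f.*,
           dense_rank() over (partition by "destination"
                              order by "actual_elapsed_time" desc) as longest_airtime_rank
    from view_flights f                          -- union of the three yearly views (assumed name)
    where "actual_elapsed_time" is not null;     -- Filter Blank Elapsed Time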

Creating A Transformation Job with Joins (Airports Dimensions Table) - Steps

1. Right click to select the project.
2. Click 'Add Transformation Job'.
3. Add the Job Name (your selection).
4. Type Input to find the Table Input component.
5. Drag and drop the Table Input component onto the canvas.
6. Click on the 3 dots next to Target Table to select the Target Table (for training, select training_airports as the target table).
7. Click to select the Columns of the Input Table and select the columns you want (for training, select all columns).
8. Drag and drop any additional tables you want onto the canvas (for training, drag and drop the training_us_state_lookup table in the db_source schema).
9. Go to/expand the Components section and type Join to find the Join component.
10. Connect the components to the Join component on the canvas.
11. Modify the properties of the Join component.
12. Select Main Table and select 'training_airports'.
13. Add an Alias to the Main Table.
14. Type/select the alias (in training, type Main).
15. Specify the Joins property.
16. Click on the plus at the lower left to add a condition to the join.
17. Click on the secondary table to join the main table to it (in training, click on training_us_state_lookup).
18. Type/select an alias for the second table (in training, 'State').
19. Click/select the type of join. In training, we select Left, which will keep all the rows coming from the main table even if they do not match.
20. Modify the join expressions: click on the arrow next to the column you want to use (in training, click on Main State).
21. Use the menu below the selected expression to modify it (in training: click on the equals sign, then click on the State Code column).
22. Select the Output Columns (click on the 3 dots next to Output Columns). In training, click Add All to add all the columns.
23. You can delete duplicated columns (in training, select and delete state_code by clicking on the minus sign).
24. Done joining the two datasets (in training, the airport data to the US States data); a SQL sketch of this join follows below.
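A minimal SQL sketch of the join configured above, assuming the airport state column is "state" and the lookup key is "state_code" (the exact column names follow the training data and are an assumption):

    select main.*,
           state.*                                   -- then drop the duplicated state_code column
    from "db_source"."training_airports" main
    left join "db_source"."training_us_state_lookup" state
      on main."state" = state."state_code";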

Creating a Date Dimension from a Sequence of Numbers

1. Right click default and Add Transformation Job.
2. Type the Job Name (in training, Load dmn_date).
3. Find the Generate Sequence Component and drop it onto the Canvas.
4. Right click on the Canvas and click Manage Job Variables.
5. Click the + sign twice to add two Job Variables.
6. Name the first Variable 'job_max_records'.
7. Select Numeric as the Type, Copied as the Behavior, and type 3000 as the default value.
8. Name the second Variable 'job_start_date'.
9. Select DateTime as the Type, Copied as the Behavior, and type 2012-01-01 as the default value.
10. Select the Generate Sequence Component and change its Name to 'Generate Date Sequence'.
11. Change the Starting Value to 0.
12. Change the End Value to ${job_max_records} to reference the default value contained in the Job Variable.
13. Click Sample and Data to see that a sequence of numbers has been created using Generate Sequence and a Job Variable.
14. Find the Calculator Component, drop it, and connect it to the Generate Sequence Component.
15. Click on it, click Properties, and change its Name to 'Calculate Y.M.D.'.
16. Change the Calculations Property (click on the 3 dots).
17. Click the + sign to add a new expression.
18. Type 'starting_date' as the name of the expression.
19. Copy and paste the expression provided in the notes: CAST('${job_start_date}' as date)
20. Click + to add a second expression.
21. Name it 'calendar_date'.
22. Click the arrow next to the starting_date expression to add it to our second expression.
23. Click the + symbol (addition symbol) below the expression window.
24. Click the arrow next to the sequence column.
25. Add a third expression (click the + sign).
26. Name it 'calendar_year'.
27. Copy and paste the expression from the notes: DATE_PART(year, "calendar_date")
28. Add a fourth expression and name it 'calendar_month'.
29. Copy and paste the expression from the notes: DATE_PART(mm, "calendar_date")
30. Add a fifth expression and name it 'calendar_week'.
31. Copy and paste the expression from the notes: DATE_PART(week, "calendar_date")
32. Add a sixth expression and name it 'calendar_day'.
33. Copy and paste the expression from the notes: DATE_PART(day, "calendar_date")
34. Click OK to finish.
35. Click Sample and Data to see the new calculations.
36. Drag and drop another Calculator and connect it to the previous Calculator component.
37. Change its Name to 'Calculate Relative Fields'.
38. Click Calculations to add new calculations.
39. Click + to add a new calculation.
40. Name the first relative expression 'relative_year'.
41. Copy and paste the expression from the notes: DATEDIFF(years, CURRENT_DATE, "calendar_date")
42. Add a second expression and name it 'relative_month'.
43. Copy and paste the expression from the notes: DATEDIFF(months, CURRENT_DATE, "calendar_date")
44. Add a third expression and name it 'relative_day'.
45. Copy and paste the expression from the notes: DATEDIFF(days, CURRENT_DATE, "calendar_date")
46. Click OK to be done.
47. Click Sample and Data.
48. Scroll to the right and see the three new columns created for the relative fields (all of these expressions are gathered into a single SQL sketch below).
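Put together, the two Calculator components evaluate the expressions below over the generated sequence; a minimal sketch, assuming the sequence flow is called generated_sequence and its column is "sequence":

    with seq as (
        select "sequence",
               cast('${job_start_date}' as date)               as starting_date,   -- Calculate Y.M.D.
               cast('${job_start_date}' as date) + "sequence"  as calendar_date
        from generated_sequence                                 -- 0 .. ${job_max_records}
    )
    select "sequence", starting_date, calendar_date,
           DATE_PART(year, calendar_date)                as calendar_year,
           DATE_PART(mm,   calendar_date)                as calendar_month,
           DATE_PART(week, calendar_date)                as calendar_week,
           DATE_PART(day,  calendar_date)                as calendar_day,
           DATEDIFF(years,  CURRENT_DATE, calendar_date) as relative_year,         -- Calculate Relative Fields
           DATEDIFF(months, CURRENT_DATE, calendar_date) as relative_month,
           DATEDIFF(days,   CURRENT_DATE, calendar_date) as relative_day
    from seq;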

Using the If and And logic operations to control the business logic in an Orchestration Job

1. Select the DWH Orchestration Job to get it on the Canvas.
2. Click on the Project Menu.
3. Click Manage Environment Variables.
4. Click the + sign to create a new Environment Variable.
5. Create the 'env_query_row_count' variable (type it in the Name box). It will be used to store the total Rows coming from the RDS Query component.
6. Click Numeric as the Type of the variable.
7. Click Copied as the Behavior of the variable.
8. After selecting the RDS Query Component on the Canvas (in this case, Stage Airports from earlier), click on the Export Tab.
9. Click Edit.
10. Click the + sign to add a mapping from the Component's internal values to the newly created Environment Variable.
11. Click Row Count from the Source menu.
12. Click the new Variable created from the Target Variable menu (in this case, 'env_query_row_count').
13. Type 'If' in the Components search box to find it.
14. Drag the If Component near the RDS Query Component (Stage Airports) to connect them.
15. Add a new Condition by clicking on the '...' in the Properties Tab.
16. Click the + sign and type 'env_q' to find the previously created environment variable (env_query_row_count).
17. Click on the 'env_query_row_count' Environment Variable.
18. Leave the Qualifier as "Is"; click on 'Greater than or equal to' in the Comparator column.
19. Type 1000 in the Value field.
20. Type And to find the And Component in the search box in the Components Section.
21. Drag and connect the And Component to the Fixed Iterator Component.
22. Type End in the search box to find the End Failure Component.
23. Drag and drop the End Failure Component onto the Canvas and connect it to the False output of the If Component.
24. Drag and drop the End Success Component close to the And Component to get them connected.
25. Connect the True output of the If Component to the And Component.
26. Right click on the Canvas to run the Job.
27. Expand the running Job in the Tasks section (arrow on the right).
28. Explore all the steps of the running Job (click on the triangle to expand).
29. See/confirm that the output of the If Component was True because the total number of Rows exceeded 1000.
30. Change the Condition property in the If Component to force a failure.
31. Click on Condition and type 10,000 in the Value property.
32. Right click on the Canvas to run the Job again.
33. Expand the running Job in the Tasks Tab.
34. The output of the If Component was False because the total number of Rows from the RDS Query Component did not exceed 10,000.
35. We have used logical operators to control the output of the Job.

Move files, write a Python script, and use the Password Manager

1. We are working on the DWH Orchestration Job.
2. Find the Data Transfer Component, drag it, and connect it to the Start Component.
3. Delete the existing connection between Start and the Create training_flights (Create Table) Component.
4. Change the Name of the Data Transfer Component to 'Copy Holidays Excel File'.
5. Change the Source Type to S3.
6. Change the Source URL.
7. Enter the S3 path s3://<bucket>/<path> and click Go.
8. Select the TrainingUKNationalHolidays.xlsx file.
9. Change the Target Type to S3.
10. Change the Target Object Name to 'UK_Holidays.xlsx'.
11. Change the Target URL and select the S3 bucket you want to copy the Excel file to (in training, 'internal-sf-attendee-scbucket-9sbjsh3xy3v5').
12. Right click the Copy Holidays Excel File (Data Transfer) Component to run only this component and click Run Component.
13. Check the Tasks Tab to see that the Job has run.
14. Find the Excel Query Component and drag and connect it to the Copy Holidays Excel File (Data Transfer) Component.
15. Select the Excel Query Component and change its Name to 'Stage UK Holidays'.
16. Change the Storage URL property (where the file is stored) to where you have copied it (in training, 'internal-sf-attendee-scbucket-9sbjsh3xy3v5').
17. Click the file name to select it ('UK_Holidays.xlsx').
18. Change the Data Source to UKHolidays (this is a tab in the Excel file).
19. Change the Data Selection: click on it, then click on the Date and Holiday Name fields to select them (green triangle to move from the left to the right column).
20. Change the Target Table to 'training_uk_holidays'.
21. Change the S3 Staging Area to the bucket previously used as the Staging Area (in training, 'internal-sf-attendee-scbucket-9sbjsh3xy3v5').
22. Connect the Stage UK Holidays (Excel Query) Component output to the Create training_flights (Create Table) Component.
23. Right click the Stage UK Holidays (Excel Query) Component to execute it individually - click Run Component.
24. Click the Environments Section (lower left) and explore and confirm that the holidays table has been created.
25. Right click the db_source Schema to refresh it.
26. db_source > Tables > training_uk_holidays (confirm creation).
27. Find the Python Script Component and drag and connect it to the Start Component.
28. Connect Python to the Copy Holidays Excel File (Data Transfer) Component and the Stage Airports (RDS Query) Component.
29. Click Script to add or write a script.
30. Copy and paste the Python script from the notes. This will raise an Exception depending on the day of the week, forcing the job to fail.
31. Change the day Monday to Wednesday and click OK.
32. Right click on the Canvas and run the Job.
33. Check the Tasks Tab and confirm that the Job has failed.
34. Right click the Python Script Component and disable it.
35. Run the Job again and confirm that it has run properly.
36. Click Project to create a password.
37. Click Manage Passwords.
38. Click + to add a Password.
39. Name the Password 'RDS_Password'.
40. Copy and paste the password from the notes.
41. Choose Encoded as the encryption type.
42. Click OK and confirm the creation of the new password.
43. Select the Stage Airports (RDS Query) Component, click on the Properties Tab, and click on Password to change it.
44. Click to activate the Use Password Manager option.
45. Click the RDS_Password label from the menu. It will substitute this password.
46. Run the Job again and confirm that it has finished successfully on the Tasks Tab.

Organize an Orchestration Job based on business logic and get new changes for airport data

1. We are working on the DWH Orchestration Job.
2. Select the Create training_flights (Create Table) Component, the S3 Load Component, and the connector component above the S3 Load.
3. Right click on them and select Extract to New Job.
4. Type Sub Orchestration - S3 Flights Load as the new job name and click OK.
5. Connect the output from the Stage UK Holidays (Excel Query) Component to the new Orchestration Job, and connect the output from the Orchestration Job to the And Component.
6. Notice the new job created in the left hand side menu.
7. Double click the new Orchestration Job to open it.
8. The new Job has the components we extracted plus a Start Component.
9. Click to close the Job.
10. Click the Project Menu to create an API Profile.
11. Click Manage API Profiles and then click Manage Query Profiles.
12. Click the + sign to add a new profile and name it 'Training-Airport Changes API'.
13. Click the gear (cog) next to it to configure the new API profile.
14. Click New Endpoint and name it 'api_airports'. Adding a description is optional. Click Next when done.
15. Copy and paste the endpoint URI from the notes (https://fc1 . . .).
16. Go to the Auth Tab and enable basic authentication.
17. Type apiuser as the user parameter, copy and paste the password from the notes, and click Send.
18. If the connection is successful, you should get a "Validated successfully" message with no errors. Then click Next.
19. View the metadata pulled from the API in the left column. If all is good, click Next.
20. The final Review screen lets you review the endpoint that was configured plus a sample of the data. If happy with it, click Finish.
21. Click Test to generate a new Table.
22. Click on api_airports under the Tables menu to explore it; the data is shown on the right hand side. Click OK to exit when done and OK again to exit the API menu.
23. Find the API Query Component and drag it. Connect it between the Python Component and the Copy Holidays Excel File (Data Transfer) Component so that the output from the API Query Component goes to the Copy Holidays Excel File (Data Transfer) Component.
24. Change the Name of the API Query Component to Load Changes US Airports.
25. Change the Profile property to Training-Airport Changes API (select it).
26. Change the Connection Options property.
27. Click + to add a new parameter, select Other as the parameter, and copy and paste the value from the notes (csvfilename=training_s3_airports); click OK when done.
28. Select api_airports as the Data Source (the table previously created).
29. Change the Data Selection and move all columns to the right side.
30. Change the Limit to blank (delete 100).
31. Change the Target Table to training_airport_changes.
32. Select the S3 Staging Area (select the S3 bucket you want to use).
33. Change the Authentication Method. We need to do this because the person creating the API profile might be different from the person using the API Query.
34. Select User/Password.
35. Change the Username to apiuser.
36. Change the password by pasting the password from the notes.
37. Right click the API Query Component to run it by itself.
38. Expand the running task and explore it (right arrow and green triangle). Confirm that 3,375 rows were loaded from the API call.
39. Click to expand the Development Environment.
40. Right click the db_source Schema to refresh it.
41. Open the Tables folder and see the new table training_airport_changes.

Calculating Days per Month / Tidying up Output Data

1. We are working on the Date Dimension Job.
2. Find the Window Calculation Component, drag it to the Canvas, and connect it to the Join Component.
3. Change its Name to Calculate Days in Months.
4. Change the Partition Data property.
5. Select the calendar_year and calendar_month columns and move them to the right; click OK when done.
6. Change the Ordering within Partitions property next.
7. Click + to add an ordering column.
8. Select calendar_date as the Input Column and click OK.
9. Next, change the Lower Bound property by selecting 'unbounded preceding'.
10. Next, change the Upper Bound property by selecting 'unbounded following'.
11. Next, we can specify the Functions property.
12. Click + to add a new function.
13. Select Count as the Window function.
14. Select calendar_date as the Input Column.
15. Type 'days_in_months' as the Output Column and click OK (the equivalent window expression is sketched below).
16. Find the Rename Component, drag it to the Canvas, and connect it to the Calculate Days in Months Component.
17. Change the Rename Component's name to 'Tidy Output'.
18. Change the Column Mapping property.
19. Click Add All to include all the columns.
20. Select and delete the sequence column, as it is not needed for the final output (select it and click the minus sign to delete).
21. The starting_date column is also not needed and can be deleted.
22. Sample and Data to see the results.
23. See that the new days_in_months column has been created.
24. Now find the Rewrite Table Component and drag and connect it to the Tidy Output (Rename) Component.
25. Change its Name to 'dmn_date'.
26. Change the Schema: click on Schema and click on Use Variable to select the Environment Variable previously created.
27. Type env_ to find all environment variables.
28. Click on '${env_dwh_schema}' to replace the Schema name.
29. Add a Target Table and name it 'dmn_date'.
30. Run the Job by right clicking on the Canvas and clicking Run Job.
31. Expand the running Job in the Tasks section (arrow on the right).
32. Explore the steps in the Job (click on the triangle to expand).
33. Notice 3,001 Rows have been inserted into the dmn_date Table.
34. Open up and explore the Environments section to find the new dimension table.
35. Environments > development > db_dwh > Tables > dmn_date
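The Window Calculation settings above (Count over a partition of year and month, ordered by date, with an unbounded frame) correspond to:

    select t.*,
           count("calendar_date") over (
               partition by "calendar_year", "calendar_month"
               order by "calendar_date"
               rows between unbounded preceding and unbounded following
           ) as days_in_months
    from date_flow t;    -- output of the Join component feeding this step (assumed name)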

Creating a Fact Table (for Flight Analysis)

1. Working on the Load fct_flights_analysis Job 2. Find the Rewrite Table Component 3. Drag and Drop it and Connect it to the Calculate Distance Fields Component 4. Click on its Properties 5. Change the Name to 'fct_flightanalysis' 6. Change the Destination Schema - Click on Schema 7. Click on 'db_dwh' as the Destination Schema 8. Add the Target Table Name by typing 'fct_flightanalysis' 9. Right Click on the Canvas to Run the Job 10. Open the Running Job on the Task Tab (Arrow on the Right) 11. Expand to Explore the Job Steps - Notice the 4.22 million rows 12. Click the Environments Section to Open it 13. Open the Development Environment 14. Expand the db_dwh Schema 15. Open the Tables Folder 16. Confirm that the fct_flightanalysis Table has been created (a rough SQL equivalent of the Rewrite Table step is sketched below)
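For intuition only, a Rewrite Table step roughly amounts to recreating the target table from the incoming flow; this is a hedged sketch in which "stg_flights" is a hypothetical stand-in for the output of the upstream transformation.

    CREATE OR REPLACE TABLE "db_dwh"."fct_flightanalysis" AS
    SELECT f.*
    FROM "db_source"."stg_flights" f;  -- hypothetical stand-in for the transformed flow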

Incorporate UK Holidays to the Date Dimension

1. Working on the Load dmn_date Job 2. Find the SQL Component and Drag it to the Canvas 3. Connect the output of the SQL Component to the Join Component 4. Change the SQL Component's Name to 'UK Holidays' 5. Add a SQL Query: Copy and Paste from the Notes (select "date", "Holiday Name" from "Interna-Training_candidate_1"."db_source"."training_uk_holidays") 6. Click Run to test the query 7. Select the Join Component and change the Join settings to incorporate the holidays data 8. Click + to add a new Join 9. Click 'UK Holidays', type ukholidays as the Join Alias and select Left as the Join Type 10. Edit the Join Expressions next 11. Click on the main-left_ukholidays expression 12. Click the main.calendar_date column to include it in the expression 13. Click on the equal sign and then ukholidays.Date to include it in the expression; click OK when done 14. Click Output Column to add a new column 15. Click the + sign and find and select the ukholidays.Holiday Name column 16. Type holiday_name as the output column name 17. Select the Tidy Output (Rename) Component and Click on Column Mapping to add the new column 18. Click the + sign to add the column 19. Find and select the holiday_name column and select it as the Source Column and the Target Column Name 20. Sample and Data to see the new column. Scroll down to see the different holiday names in some of the rows (a SQL sketch of the left join is given below)
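A hedged sketch of the left join the Join component is building here, using the aliases from the steps; "calendar_flow" is a hypothetical stand-in for the main input flow.

    SELECT
        main.*,
        ukholidays."Holiday Name" AS holiday_name
    FROM calendar_flow main
    LEFT JOIN "db_source"."training_uk_holidays" ukholidays
           ON main.calendar_date = ukholidays."date";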

BDW 3-9: Update Dimension Tables with New Changes

1. Working on the Transformation Job Load dmn_airport (with changes) 2. Find the Filter Component 3. Drag and Drop it and Connect it to the Detect Changes Component 4. Change its Name to 'New Records' 5. Change the Filter Conditions Property 6. Click + to add a New Condition 7. Find and Select the Indicator Column 8. Select Equal to as the Comparator and 'N' as the Value 9. Find the Table Output Component and Drag and Connect it to the New Records (Filter) Component 10. Change the Name of the Table Output Component to 'dmn_airport - New' 11. Change the Schema: Click Use Variable and find and select ${env_dwh_schema} 12. Select/Change the Target Table to dmn_airport 13. Modify the Column Mapping Property 14. Click Add All to include all columns 15. Select all columns with the mouse except the iata column and click OK 16. Click Sample and Data 17. Click to get the Row Count. Notice 4 Rows; the same as in the previous module 18. Copy and Paste the New Records (Filter) Component and place it under the original on the Canvas 19. Connect the output of the Detect Changes Component to the new copy of the New Records (Filter) Component 20. Change its Name to Deleted Records 21. Update the Filter Conditions: Change the Value from N to D (new to deleted) 22. Find the Calculator Component, Drag and Connect it to the Deleted Records (Filter) Component 23. Change its Name to Set Active 24. Add a Calculation 25. Click + to add an expression and type 'is_active' as the expression name 26. Type 'false' as the expression (this will represent a soft delete via the is_active column) 27. Find the Table Update component 28. Drag and Drop and connect it to the Set Active (Calculator) Component 29. Select to Update the Properties of the Table Update Component 30. Change its Name to 'dmn_airport - Soft Deletes' 31. Select the Schema, click Use Variable to find and select ${env_dwh_schema} 32. Change/Select the Target Table to dmn_airport 33. Specify a Join Expression 34. Select target.iata, click =, select input.iata, click OK 35. Change the When Matched Property (to specify the Operation we want to make as part of the update) 36. Click + and Add a New Case 37. Delete the Default Syntax and select target.iata, click =, select input.iata, click OK 38. The Operation should be Update (confirm) 39. Change the Update Mapping Property next 40. Click + and find 'is_active' and select it for the Input Column 41. Find and select the 'is_active' Output Column 42. Copy and Paste the Deleted Records (Filter) Component and Connect it to the Output of the Detect Changes Component 43. Change its Name to Changed Records 44. Change the Filter Conditions, changing the Value from D to C (deleted to changed) 45. Drag a new Table Update Component to the Canvas and Connect it to the Changed Records (Filter) Component 46. Change its Name to 'dmn_airport - Changed Records' 47. Select the Schema: click Use Variable to find and select ${env_dwh_schema} 48. Change/Select the Target Table to dmn_airport 49. Change/Specify the Join Expression 50. Add target.iata to the expression, click =, select input.iata, click OK 51. Change the When Matched property next 52. Click + to add a new condition, click ... to add a new case 53. Delete the default expression, select target.iata, click =, select input.iata, click OK 54. Confirm that the Operation is Update 55. Change the Update Mapping Property next 56. Click Add All and select all column options except iata (which is the match key) and click OK 57. Go to the Sample Tab and Click Data (after selecting the Aggregate Component); check how many rows are Identical, New, Changed, or Deleted 58. Right Click on the Canvas to Run the Job 59. After the Job Runs, Select the Aggregate Component, Sample Tab and Data again 60. Notice 3375 rows are now Identical and one is still Deleted. This is because you performed a soft delete through the Calculator component, which flagged the row as inactive (is_active = false) instead of physically removing it. (A SQL sketch of the two update operations follows below.)
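A hedged, MERGE-style SQL sketch of what the two Table Update components do, assuming dmn_airport is keyed on iata; "deleted_records", "changed_records" and the non-key columns (name, city) are hypothetical stand-ins for the flows and mappings built above.

    -- Soft deletes: flag removed airports as inactive instead of deleting the row
    MERGE INTO "db_dwh"."dmn_airport" target
    USING deleted_records input
        ON target.iata = input.iata
    WHEN MATCHED THEN UPDATE SET is_active = FALSE;

    -- Changed records: overwrite the non-key columns for rows flagged 'C'
    MERGE INTO "db_dwh"."dmn_airport" target
    USING changed_records input
        ON target.iata = input.iata
    WHEN MATCHED THEN UPDATE SET name = input.name, city = input.city;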

Data Enrichment - Adding Month and Day Names to Date Dimensions

1. Working on the same Load dmn_date Transformation Job from before 2. Right Click on the Canvas to Add a Grid Variable by clicking on Manage Grid Variables 3. Click the + Sign to Add a Grid Variable 4. Name the Grid Variable 'grid_month_names' 5. Click 'Text Mode' to add the Column Name and Column Type more easily 6. Copy and Paste the Text from the Notes: Input_Column Text, Qualifier Text, Comparator Text, Comparison_Value Numeric, Then_Value Text 7. Click Text Mode to disable it 8. Confirm that the Columns have been correctly created. Click Next to end this section 9. Click Text Mode to add the Values 10. Copy and paste from the Notes to Add the Month Names 11. Click Text Mode to Disable it 12. Confirm all Values are Correct. If so, Click OK 13. You can see the newly created Grid Variable 14. Click OK 15. Find the Map Values Component 16. Drag it to the Canvas and Connect it to the Calculate Relative Fields Component 17. Click on it and Change its Name to 'Add Month Names' 18. Click on Value Map to select the mapping using the grid variable 19. Click Use Grid Variable 20. Choose the previously created Grid Variable named grid_month_names 21. Select Input_Column to map to the Input Column Box 22. Click on Qualifier for the Qualifier Box 23. Click Comparator for the Comparator Box 24. Click Comparison_Value to map the Comparison Value column 25. Click Then_Value to map the Then Value column 26. Click OK when done and see how the Grid Variable has been selected as part of the Property (for the Value Map) 27. Click Other to specify which value will be used when none of the values from the Value Map property match 28. Type null as the Other value; Click OK to select null 29. Click 'Column name' and name it 'month_name' (a SQL sketch of this month-name mapping follows below) 30. Click on Sample and Data to see the Results 31. Scroll to the right to see the new Month Name column 32. Click to Open the Environment Panel (lower left) 33. Drag the training_date_day_lookup Table and drop it onto the Canvas 34. Search for and find the Transpose Columns Component 35. Drag it and Connect it to the training_date_day_lookup Table 36. Change the Name of the Transpose Columns component to 'Transpose Columns - Day Lookup' 37. Change the Ordinary Columns Property; Click on it 38. Click Year and Move it to the Left Column using the arrow 39. Add a Row Label Name to add a label to the Rows by typing 'day_name' 40. Click to add the Output Columns 41. Click the + Sign to Add a New Output Column, name it 'date' and make its type VARCHAR 42. Click to Change the Column to Row Mapping 43. Click Text Mode to add the mapping provided 44. Copy and Paste (convert no caps to Title Case for the Days of the week) and once you confirm, Select Text Mode to disable it 45. Check that the Columns have been correctly input and click OK 46. Find the Convert Type component and Connect it to the Transpose Columns - Day Lookup Component 47. Select Transpose Columns and click on the Metadata Tab 48. Notice that the date column's Type is VARCHAR. We want to change it to be a date 49. Change the Name of the Convert Type Component to 'Convert Date' 50. Click on Conversions to Add Conversions 51. Click on the + Sign to Add a Conversion 52. Click date to select the Column 53. Select Date as the new data type and click OK 54. Go back to the Metadata tab and see that we converted the date Column to type Date 55. Find the Join Component, drag it to the Canvas and connect it to the Map Values and the Convert Date Components 56. Change the properties of the Join Component 57. Click on Main Table and select Add Month Names as the Main Table 58. Specify the Main Table alias as 'main' 59. Click on Joins to specify the Joins property 60. Click the + sign to Add a new join 61. Choose Convert Date as the Join Table 62. Type in 'days' as the alias 63. Select 'Left' as the Join Type and Click OK 64. Click on Join Expressions to Specify the Join Expressions 65. Click the Arrow next to the main.calendar_date column to Add it to the Expression 66. Add an Equal Sign by clicking on it 67. Click the Arrow next to the days.date column and click OK 68. Click on Output Columns in the Properties Tab 69. Click Add All to include all the columns 70. You can delete the days_date column as it has the same data as the main.calendar_date column. Click on it and then Click on the Minus sign 71. Similarly, delete the days_year column the same way 72. Click on Sample and Data to explore the data 73. Notice the new column day_name is now there on the left.
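A hedged sketch of the month-name mapping the Map Values grid expresses, truncated to three months; the source table name is hypothetical.

    SELECT
        calendar_date,
        CASE calendar_month
            WHEN 1 THEN 'January'
            WHEN 2 THEN 'February'
            WHEN 3 THEN 'March'
            ELSE NULL   -- the 'Other' value configured in step 28
        END AS month_name
    FROM "db_source"."calendar";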

Maximum Number of Environments for Matillion Hub users

999

Staging Area

A bucket Matillion uses to copy the data before loading it into the data warehouse (S3 for AWS)
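For intuition only, a hedged, Snowflake-flavoured illustration of the kind of load statement a staging area enables; the bucket path, credentials and file format are hypothetical placeholders, and Matillion generates the real statement for you.

    COPY INTO "db_source"."training_airport_changes"
    FROM 's3://mtln-training-bucket/staging/airports/'
    CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);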

Fact Table

A fact is a value, or measurement, which represents something known about the managed entity. Facts are stored in the fact table, while the descriptive attributes of the entity itself are stored in the dimension tables.

AWS Namespace

A namespace is a way to group services for an application. When you create a namespace, you specify how you want to discover service instances that you register with AWS Cloud Map: using API calls or using DNS queries. You also specify the name that you want your application to use to discover instances.

Dimension Table

A structure that categorizes facts and measures in order to enable users to answer business questions.

Data warehouse

A system used for reporting and data analysis, considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place and are used for creating analytical reports for workers throughout the enterprise

Views (list View components)

A view is a mechanism for querying data. When you create a query on your database, you can store the query (the SQL commands that perform the query) as a view and make it available to use elsewhere. A View contains rows and columns, similar to a table; however, these can be drawn from multiple different tables. A view can be configured to filter the data retrieved from the underlying tables and perform some aggregation or calculation on the data (sum, average, etc.). In Matillion, you can have a Transformation Job create a View instead of writing to a new Table. This lets you examine the data that results from the Transformation without spending the resources to create a new table in your cloud data warehouse. Matillion has the following View Components: Create View: Lets you output a view definition to your CDW Assert View: Lets you verify that certain conditions are true of a view, and stop the query if they are not Refresh Materialized View (Redshift): Identifies changes to a table and applies changes to the materialized view Refresh Materialized View (BigQuery): Identifies changes to a table and applies changes to the materialized view
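A minimal sketch of a view definition, with hypothetical table and column names:

    CREATE OR REPLACE VIEW "db_source"."vw_flights_per_origin" AS
    SELECT origin_airport, COUNT(*) AS flight_count
    FROM "db_source"."flights"
    GROUP BY origin_airport;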

Creating an Environment

Click Project>Add Environment (Opens up Create Environment wizard) All Environments in current instance are listed in the Environments Panel in the lower left of the UI - current environment is labelled [Selected] To Switch to an Environment, Right Click on it and Click Select To Edit an Environment, Right Click on it and Click Edit You can also Add environments directly from the Environments Panel

Task History (including what can be seen)

Click Project>Task History to Open Task History Completed Tasks (success or failed) are shown in Task History along with all Details (e.g. Job, Environment, Start Time, End Time, Duration, etc.) You can Add or Remove columns from the Task History Display You can Filter Task History according to the Column you selected (e.g. you can filter by Start Date and Time or by Task Type)

Component Exports

All Orchestration Components and some Transformation Components support exporting runtime information into variables during job execution Component Exports are also used to populate the Tasks View and the Run History To Export a Value, you need to map it to an existing Job Variable or Environment Variable: 1. Click on the Component on the Canvas and click the Export Tab 2. Click Edit to Open the Export Variables Dialog 3. Click + to Create a New Mapping 4. Select the Desired Export Variable from the drop-down in the Source column 5. Select a Target Variable from the Target Variable drop-down 6. Click OK Note: The type of the target variable must match the type of the source variable (e.g. string, integer, DateTime, etc.) Runtime information exported to job variables is only available while the job is running (no longer available after the job finishes running).

Query Components

Allow Matillion to query data from various external sources and load the data into a table ready for transformation Usually involves staging the data in a file storage area first

RDS

Amazon Relational Database Service: A fully managed SQL database service provided by AWS. Amazon RDS supports an array of database engines to store and organize data. It also helps with relational database management tasks, such as data migration, backup, recovery and patching.

Environments

An environment describes how your Matillion ETL instance connects to a data warehouse and which set of cloud platform credentials to use. Environments exist inside projects Before starting any job in Matillion ETL, you must set up an Environment Multiple environments can be set up at a user level (different users can use different environments in the same instance) When you run a job, the Job runs in the environment currently in use by that user

Canvas

Area of the UI where Components are laid out within a Job

Assert External Table

Assert the Metadata and Row Count of an External Table

Creating Environment Variables

Click Project>Manage Environment Variables (opens the Manage Environment Variables Panel, which shows all existing Environment Variables) Click + to Create a new Environment Variable Provide the Following Details for the new Environment Variable Name: Follow guidelines Type: Text, Numeric, DateTime (Text is the default) Behavior: Shared or Copied (Shared is the Default) Default Environment Value: Can be blank, but best practice is to give a reasonable value Description: Free-form text description The same method is used to edit an existing Environment Variable

Variable Behavior (branch behavior)

Branch Behavior Copied: Variables can be updated within one branch of a job without updating variables of the same name in other branches. A branch using a copied variable will begin by taking its default value, even if this variable has been updated in another, parallel branch. Shared: Shared variables are updated in a job-wide fashion. As such, regardless of branches, if a Shared variable is updated in one branch, all other branches will use that updated value.

Matillion Keyboard Shortcuts

CTRL+Z: Undo CTRL+Y: Redo CTRL+A: Select All CTRL+Return: Run Job if on Canvas or close Pop Up Window CTRL+Shift+C: Open a Search Window listing all jobs in the current project CTRL+T: Open the Search tab in the Task panel

Load Status

Can view status of all ongoing data loads in a single window through Project>Load Status When loading stops for any reason (complete, cancel, failure), the load is removed from the window

Using Shared Jobs

Click and Open the Shared Jobs Panel in the Lower Left of the UI Right Click on a Shared Job Options Open Shared Job: Read-Only View of the Shared Job so you can navigate through the Jobs and Components of the Shared Job Unpack: Unpacks the Constituent Jobs that were included when the Shared Job was generated (orchestration and transformation jobs within it). You can edit these and regenerate the Shared Job Export: Opens the Export Shared Jobs dialog, which lets you download the Shared Job as a .melt file that can then be imported into another Matillion Instance

Creating a Shared Job

Click on any Job in the Project's Job List and click Generate Shared Job This opens up the Generate Shared Job Wizard to Configure the Shared Job

Shared Job Configuration

Contains 3 Pages of Configuration Settings Page 1: Create New or Existing If Existing, select from Menu Package Name: for a new Shared Job (can specify nested packages) Name: of the new Shared Job Revision Number: When creating a new shared job, default is 1 Icon: Browse and locate. Must be a .png file. Matillion provides a default icon Description: A meta description of the shared job. Will be visible in the Shared Job's Help section. Simple HTML can be used for formatting Page 2: Root Job: The job that the Shared Job originates from; default is the Job selected when Generate Shared Job was clicked. You can change it. Additional Jobs: Should include any jobs that the root job is expected to call. Any jobs called using the Run Orchestration or Run Transformation Components within the Root Job are added by default. Add any others that may be dynamically called manually + to add a Job Click Clear to Remove a Job Auto to automatically list all explicitly called Jobs Page 3 - Parameter Configuration: Automatically populated but can be configured Name: The name of this job variable as it exists in the Root Job Display Name: The Name of the parameter created from this job variable Required: This setting can be enabled or disabled for each parameter. Default is enabled

URL Safe Characters

Special characters are converted to URL-safe codes (aka percent encoding) when usernames or passwords are used in a Component property E.g. ! becomes %21 and @ becomes %40

Tasks

Created whenever an Orchestration or Transformation operation is performed by Matillion ETL Types of Tasks Tasks created by user operations Tasks created by the Scheduler Tasks run via the Matillion ETL API For Redshift users only - Tasks created from Amazon SQS (Simple Queue Service)

Schema (General Definition) - What does it define?

Defines the structure of something. In Relational Databases, Schema defines: Tables Fields Relationships Functions Views Indexes Types Other elements

Managing Passwords - From Project Menu (list cloud platform services)

Many components in Matillion ETL require passwords to provide access to various services on behalf of the user Password manager provides an alternative to individually stored component passwords, and allows the user to store passwords as named entries. When a component requests a password, the identifying name can be entered and will draw the corresponding password from the manager. Thereafter, if a password should change, the password need only be edited once in Manage Passwords and not in individual components. There are two ways in which passwords can be managed in Matillion ETL: Internal: The password is stored on the Matillion ETL server. Internal passwords can use various encryption methods, with availability depending on the cloud platform you are using. These are: AWS Key Management Services (AWS platforms) Key Vault Store (Azure platforms) GCP Key Management Services (GCP platforms) External: The password is stored in a third-party secret manager.

Text Mode (what does it allow you to do - list)

Many menus contain a Text Mode Option which lets the user: Edit variables as a set of space-delimited text entries Convert all existing variable data into an appropriate format Copy and paste text, offering a fast way to move large numbers of variables Have certain types of missing data auto-completed by Matillion ETL when parsing Error check data before adding it to the project (Matillion ETL parses the data on exiting Text Mode, throwing a message if an error has been made)

Job Documentation

Matillion ETL allows users to generate documentation for any job; Right Click on a Job in the Explorer Panel and select Generate Job Documentation > Generates a .html file The HTML file has a table of contents that lets you navigate to each Job Component and examine details

Manage Schedules (from Project Menu)

Matillion ETL features a scheduler that will launch orchestration jobs automatically at a pre-defined, regular time interval. For scheduled or queued Matillion ETL jobs to run, the Matillion ETL instance must be running. Schedules are created against a project, and users can set up multiple schedules. The scheduler is based on Quartz technology. Project>Manage Schedules to Access the Scheduled Jobs Menu Properties Shown: Schedule: The name of the schedule. Last Run: The timestamp this schedule was last run. Next Run: The timestamp of the next time this schedule is to be run. Status: Whether this schedule is configured properly and activated for runs, or stopped. Run Now: Run the schedule. This schedule's Status must be Enabled before it is run. Edit: Allows you to edit details of the schedule. See the Creating Schedules section for more information. Delete: The delete icon, X, deletes and disables the schedule. Click on a Schedule to display the information on the times this schedule has executed: Version: The name of the version the job existed in at the time of this run. Job: The job the selected schedule executed. Environment: The environment in which the job was executed. Started: The time this schedule run began. Duration: The time taken for the job to complete. Run Status: The job end status for this schedule run. Either failure (red icon) or success (green icon). Task Info: Click for a breakdown of the components in this job run, their run times, row counts and messages. Plus Refresh: The Refresh button updates the dialog for any unloaded runs. Full History: The schedule history for this specific schedule. Click on + in the Manage Schedules Dialog to Create a New Schedule. The following properties are needed for the schedule: Name Enabled (or Disabled) Timezone Hours (of the day to run the schedule, can be multiple hours) Minutes (of the hour to run the scheduled job, can be multiple times per hour) Days of the Week and Days of the Month Version (Matillion ETL project version to use in this Schedule) Job: Select the Job to schedule Environment: Select the environment Run job in clone: Only available in Enterprise for Snowflake Configure Run Options: Configure clone options Ignore misfires: If the instance is not running during the schedule, run immediately or ignore the missed schedule time Test: To make sure the schedule is configured correctly You can disable scheduling altogether for the Project You can also enable Prevent Duplicate Job while creating a schedule (prevents a high-frequency job from running before the prior one has finished)

Scope of Matillion ETL Features

Matillion ETL has many features that are not configured globally and can differ from project to project. These are common features and how they are scoped. Features are most commonly scoped by: Instance Project Group Project Job

Date and Time Methods (what are the methods - list them)

Matillion ETL offers a set of proprietary methods for use with Date and DateTime variables to ease the burden of handling the multiple formats and common calculations placed on the user. .now(): A new date object with the current server timestamp .add("<units>", <integer>): Add a specified length of time to a timestamp. Allowed units are (seconds, minutes, hours, days, weeks, months, years) .format("<pattern>"): Change the format of a timestamp. Allowed styles: yyyy/MM/dd or HH:mm:ss.SSS .parse("<pattern>", <stringDate>): Parse a piece of a timestamp according to a given pattern (i.e. takes a string and converts it into a Date object; e.g. 2016 is turned into a date and stored in the dt variable as ${dt.parse("yyyy", "2016")})

Automatic Environment Variables

Matillion comes with a number of Environment Variables that are automatically available without having to be created (grouped by type): project_group_name, project_group_id, project_name, project_id version_name, version_id environment_name, environment_default_schema, environment_database, environment_endpoint, environment_id, environment_port environment_username job_name, job_id component_name, component_id, component_message run_history_id detailed_error task_id

Task Queues

Matillion uses Task Queues internally to manage tasks There is one queue per environment A Waiting icon is used to show subsequent tasks queued behind a long-running task Tasks initiated by the Scheduler and the SQS queue listener will also queue behind long-running tasks in the same environment Setting up multiple environments to separate development and production tasks is highly recommended so that tasks in development do not impact production tasks

Jobs

Matillion's way of designing, organizing, and executing workflows The most common usage is to build strings of configured components inside a job and then run that job to accomplish a task such as loading data or transforming data Two Main Job Types: Orchestration Jobs Transformation Jobs You can nest workflows: put a Run Transformation component inside an Orchestration Job, or nest an Orchestration Job inside another Orchestration Job

Shared Jobs

Packaged Jobs, created by users, that can be used similarly to how a Component is used in other workflows Like a custom Component Only Orchestration Jobs (and the Transformation Jobs they link to) can be used as Shared Jobs Users can decrease the complexity of their workspace by packaging complex workflows into a single shared job that can be used in other workflows

Incremental Loading

The practice of loading only records that are new or changed since the previous load Users should schedule their incremental load jobs to run periodically for the Job to continually update the created Tables When an Incremental Load tool is dragged to the Canvas a Configuration Wizard automatically opens up The Wizard covers data sources, columns, authentication properties, and staging options (among others) (a SQL sketch of the underlying high-water-mark pattern follows below)
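A hedged sketch of the high-water-mark pattern that incremental loading follows conceptually; table and column names are hypothetical.

    INSERT INTO "db_source"."stg_orders"
    SELECT *
    FROM source_orders
    WHERE last_updated > (SELECT COALESCE(MAX(last_updated), TIMESTAMP '1970-01-01 00:00:00')
                          FROM "db_source"."stg_orders");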

Replace Property

Recreates a Table every time you run a Job

Creating Grid Variables

Right Click on the Job Click Manage Grid Variables Click + at the Bottom to Create a New Variable Click + next to a Variable to see its Description, Edit the Variable, or Delete the Variable Creating a New Grid Variable Grid Properties Name Behavior: The variable's branch behavior inside a Job (how the variable is updated when more than a single job branch is making use of it) Visibility: Public or Private (Private variables can't be discovered and overwritten when the job is called from a Run Orchestration or Run Transformation Component) Description: Text description Grid Columns Column Name Column Type: Data Type (Text, Numeric, or DateTime) On the second page of Create New Grid Variable you can define default values (optional)

Creating Job Variables

Right Click on the Job within the Explorer Panel Select Manage Variables to bring up the Manage Job Variables Dialog. You can also click on a component in the Canvas to get there The Manage Job Variables dialog shows all existing Job Variables You can add a new one by clicking + Select: Name, Type, Behavior, Visibility, and Value

Canceling Tasks

Right click on a Task in the Task Panel and click Cancel. This cancels all queued sub-tasks If a Task is stuck and won't cancel, Ctrl-Right Click to reveal the Cancel Task and Continue Option. This will force cancellation of the Task

Converting Views to Tables

Some Matillion Operations do not support Views. You should convert your View to a Table in these cases. 1. Create a Transformation Job with a Table Input Component connected to a Table Output Component 2. Select your view in the Table property of the Table Input Component 3. Use Metadata Tab in Text Mode to Copy your view's metadata 4. Create an Orchestration Job and attach a Create Table Component to Start 5. Paste the metadata into the Table Metadata Property of the Create Table Component and Run the Job 6. Return to the Transformation Job and add a Table Output Component to place the data from our view into the created table and run the Transformation Job You now have a new Table with the View Data
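Conceptually, the pair of jobs materializes the view's rows into a physical table, roughly what a CREATE TABLE AS SELECT does; this hedged sketch uses hypothetical names.

    CREATE TABLE "db_source"."flights_from_view" AS
    SELECT * FROM "db_source"."vw_flights";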

Grid Variables

Special type of job variable that can be declared in Matillion ETL. Grid variables are 2D arrays that hold scalar values in named columns. They are used in Components where a list of data needs to be passed around Grid variables can be used to populate choices in multiple properties within Data Staging components. Following Matillion ETL Components can be used when working with Grid Variables: Table Metadata to Grid: Takes metadata from a table and loads it into a grid variable Remove from Grid: Removes rows from a preexisting grid variable Append to Grid: Adds rows to a preexisting grid variable Query Results to Grid: Queries a table and loads the resulting data into a grid variable

Loading Data

Taking a Data Source and Converting it into a Table on a Target Platform

Task Panel (what does it show)

The Task Panel on the bottom right shows the last 20 tasks: completed, running, queued, cancelled, or failed since joining the current session In addition to the Task Icon, the panel also includes: any output from a Python Script Component and Status Information from some Orchestration Components (such as RDS Query) Red x: Task Failed Green check: Task completed successfully Hourglass: Task is queued and will execute when a free thread is available

Viewing Task Information

Task information can be viewed in the Tasks panel at the bottom right Clicking on the right arrow next to the Task (In Task Panel or in Task History) displays further task information in a new tab Additional info includes: Constituent Components Summary of how each Component performed Success and Failure of each Job and Component A fuller report can be found in the Task History

Variable Types (including formats)

Text: Any text string Numeric: Any number (with or without decimals) DateTime: Date and time values in the following formats yyyy-MM-dd yyyy-MM-dd HH:mm yyyy-MM-dd HH:mm:ss yyyy-MM-dd HH:mm:ss.SSS

Calculator Component (Transformation) What happens to pre-existing columns How are columns mapped Variables within Component (how are they managed) Processing: how / what

The Calculator Component adds new columns by performing calculations. Each input row produces one output row. Any pre-existing columns that share a name with a created column will be overwritten. By default, all input columns are mapped into output columns of the same name. You can then use the expression editor to add new columns with a user-defined calculation. It is possible to include Matillion ETL variables when writing expressions in the Calculator component using the syntax ${<VariableName>}. Additionally, if you click Manage Variables, you can manage both Environment Variables and Job Variables. The Calculator Component uses target cloud platform processing Properties of the Calculator Component Name: Text Include Input Columns: Yes/No (Yes maps all input columns, No outputs only columns that have been mapped manually with the expressions editor) Calculations: List of expressions calculated per row of data (must be valid SQL) Check Grid Variables to switch the editor and use a Matillion ETL grid variable to assign a calculation column and an output column (example expressions are sketched below)
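Hedged examples of the kind of per-row SQL expressions you might enter in the Calculator editor (expressions, not full statements); the column names and the environment variable are hypothetical.

    CASE WHEN "distance" >= 1000 THEN 'long haul' ELSE 'short haul' END
    -- a Matillion variable can be referenced inside an expression:
    "flight_date" >= '${env_load_start_date}'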

Create / Edit Environment Wizard

The Create and Edit Environment Panels show the same information and each has Three pages Page 1: Cloud Platform Connection Specific cloud platform credentials used by this environment. Matillion uses the instance credentials to discover available resources, but these can be changed (AWS, Azure, GCP) Page 2: Data Warehouse Connection Add the connection and authorization details for your data warehouse. An environment can only be configured to use a single data warehouse and is limited by your product version (e.g. Snowflake, Redshift, Azure Synapse, Delta Lake, BigQuery) Page 3: Data Warehouse Defaults Specify the resources from your data warehouse that Matillion should use by default

Expression Editors

The Expression Editor allows the user to create expressions in components The Expression must be a valid SQL expression to compute values The expression editor includes a Function List

Component-level Explicit Schema

This setup can be useful if you want to use the common "Single-schema per environment" setup with the exception that all of those environments can share some of their data available in another schema. For example, source data may be staged into a single schema, which can then be run through to a Test or Production environment as above. To do this, Override the Schema of particular tables. Change from Default Schema to the desired Schema. The Job can run in any environment, but this Table will always use the selected Schema instead of the Default Schema of the Environment

Table Properties (list the properties viewed)

You can see many metadata details for any Table in a single location In the Environments Panel, expand to get to the Tables folder, Right Click on a Table and Click on Properties to View its properties In the Table Properties Menu, you can see: Table Columns and the property of each Column (name, type, primary or unique key, etc.) Properties for the Table itself (listed below the columns), including: Table name Schema name Table column metadata Size in MB Distribution Style and Data Skew The Stats Off value, indicating the staleness of the Table Data For external tables, file locations and partition information

Filtering Booleans

You can use boolean statements to filter values For example, use True or False for the Value, combined with And and Or conditional statements You can also use 1 (True) or 0 (False) as values (a SQL sketch follows below)
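A hedged sketch of what such a boolean filter translates to in SQL; the table and column names are hypothetical.

    SELECT *
    FROM "db_dwh"."dmn_airport"
    WHERE is_active = TRUE;   -- 1 (True) / 0 (False) may also be used in place of TRUE/FALSE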

