Data Quality Specialist Certification

GroupKey

A key assigned to groups created by the String, Soundex or NYSIIS strategy. The field is the same group key that was passed into the Match transformation.

Standardization

Addresses issues identified through profiling: corrects completeness, conformity and consistency problems, and enhances and enriches information

Transformation Palette

All transformations displayed in scroll bar on the left hand side

Java Transformation

Allows Java syntax to be used in Informatica

Group Key

Restricts comparison to records that share the same group key, which reduces the number of pairs and improves performance
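
To see why a group key reduces the number of comparisons, here is a minimal Python sketch. It is a generic illustration of grouping before matching, not Informatica's implementation; the helper name `candidate_pairs` and the first-letter key are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key_func):
    """Return only the pairs of records that share the same group key."""
    groups = defaultdict(list)
    for record in records:
        groups[key_func(record)].append(record)
    pairs = []
    for members in groups.values():
        # Records are compared only within their own group.
        pairs.extend(combinations(members, 2))
    return pairs


records = ["Smith", "Smyth", "Jones", "Johns", "Brown", "Braun"]

# Without grouping, every record is compared with every other record.
all_pairs = list(combinations(records, 2))

# Hypothetical group key: the first letter of the name.
grouped_pairs = candidate_pairs(records, lambda name: name[0])

print(len(all_pairs), len(grouped_pairs))
```

With six records, grouping on the first letter cuts the comparisons from 15 pairs to 3.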

Detailed Overflow

Allows users to see detail on which column overflowed

Profiling

Analysis of the content and structure of data

Enrich

Appends additional data to existing records

Reverse Hamming

Calculates a match score by calculating the number of positions in which characters differ between data strings reading from right to left. Use this when character position is a critical factor such as telephone numbers, zip codes or product codes. Works well on strings of same length.

Hamming Distance

Calculates a match score by calculating the number of positions in which characters differ between data strings. Use this when character position is a critical factor such as telephone numbers, zip codes or product codes. Works well on strings of same length.
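
A minimal Python sketch of a position-by-position comparison, covering both Hamming Distance and Reverse Hamming. The score formula (1 minus the share of differing positions) and the handling of unequal lengths are assumptions made for illustration; the product's exact scoring may differ.

```python
def hamming_score(a, b, reverse=False):
    """Score two strings by the share of positions that agree.

    With reverse=True the strings are aligned from the right,
    as in the Reverse Hamming card.
    """
    if reverse:
        a, b = a[::-1], b[::-1]
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    # Assumption: positions missing from the shorter string count as differences.
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return 1 - mismatches / longest


print(hamming_score("90210", "90211"))
print(hamming_score("5551234", "15551234"))
print(hamming_score("5551234", "15551234", reverse=True))
```

The last two calls show why reading direction matters: a phone number with an extra leading digit scores poorly left to right but well right to left.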

Bigram

Calculates a match score by dividing the number of matched character pairs by the total number of character pairs. Used for long text strings such as postal addresses
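
A minimal Python sketch of a bigram comparison. It assumes "total number of character pairs" means the pairs of both strings combined, which gives the Dice coefficient; that reading is an assumption, not a documented detail of the product.

```python
from collections import Counter


def bigrams(text):
    """Split a string into its overlapping two-character pairs."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_score(a, b):
    """Matched character pairs divided by the total pairs in both strings."""
    pairs_a, pairs_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Strings shorter than two characters have no pairs to compare.
        return 1.0 if a == b else 0.0
    matched = sum((pairs_a & pairs_b).values())
    # Each matched pair appears once in each string, hence the factor 2.
    return 2 * matched / total


print(bigram_score("night", "nacht"))
```

Here "night" and "nacht" share only the pair "ht" out of eight pairs in total, so the score is 0.25.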

Match Transformation

Calculates the degree of similarity between input records and creates scorecard

Jaro Distance

Calculates the similarity between two values, giving priority to the first four characters. If the first four characters do not match, the score is reduced by the Penalty property

Case Transformation

Changes case of data

Classifier Transformation

Classifies or labels according to classify models

Merge Transformation

Concatenates values

Token Parser

Parses individual strings (tokens) that match token sets or reference tables

Human Task

Contains steps that require human input to complete and can be used to fix bad exceptions. Helps users participate in the business process

Key Creation NYSIIS

Converts a word into its phonetic equivalent

Labeler Transformation

Creates labels that describe the characters or strings in each field

Association Transformation

Creates links between records assigned to different match clusters so they can be associated for consolidation

Consolidation Transformation

Creates single consolidated record from cluster of matched records

What types of patterns can be used while parsing?

Custom patterns, reference tables and content set patterns

Mapping

Defines the flow and transformation of data

Write Permission

Delete or edit the project

Identity Matching

Delivers next-generation linguistic and statistical matching algorithms and enables business users to deliver accurate matches. Emulates a human expert's ability to determine a match and delivers the highest possible reliability

Edit Distance

Derives a match score for two values by calculating the minimum cost of transforming one string into the other through insert, update, delete or replace operations
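
A minimal Python sketch of the edit distance idea, using the classic Levenshtein calculation in which each insert, delete or replace costs 1. Turning the distance into a score by dividing by the longer length is an assumption made for illustration.

```python
def edit_distance(a, b):
    """Minimum number of single-character inserts, deletes or replaces."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            replace_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                  # delete from a
                current[j - 1] + 1,               # insert into a
                previous[j - 1] + replace_cost,   # replace or keep
            ))
        previous = current
    return previous[-1]


def edit_score(a, b):
    """Assumed normalization: 1 is identical, 0 is completely different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))
print(edit_score("kitten", "sitting"))
```

Changing "kitten" into "sitting" takes three edits (two replaces and one insert).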

Classic Match Strategies

Edit Distance, Jaro Distance, Bigram, Hamming Distance

Decision Transformation

Evaluates conditions and creates outputs based on those conditions

Comparison Transformation

Evaluates similarity between pairs of input values

Address Validator Transformation

Examines input addresses and outputs corrected address elements and validation information

Read Permission

Execute rules, export data and export rules

Grant Permission

Execute rules, export data, export rules and contains the ability to assign others tasks

Parser Transformation

Parses fields that contain multiple types of information and creates a new output field for each type

Key Creation Soundex

Generates an alphanumeric code based on how a word sounds
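
A minimal Python sketch of the standard American Soundex code (first letter plus three digits). It illustrates the general algorithm only; the code length and details used by the Key Generator strategy may differ.

```python
# Consonant groups of the standard American Soundex table.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(word):
    """Return the four-character Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = SOUNDEX_CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        # H and W do not separate two letters that share a code; vowels do.
        if letter not in "HW":
            previous = digit
    return (code + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))
print(soundex("Robert"), soundex("Rupert"))
```

Names that sound alike, such as "Smith" and "Smyth", receive the same code and therefore fall into the same group.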

Workflow

Graphical representation defining a business process comprised of steps

Projects

Highest-level container that organizes objects and processes; it can also contain nested folders

Link ID

Identifies the record that pulled the current record in question into the cluster. This does not have to be a master record

Ports

Input or output fields

Joiner Transformation

Joins heterogeneous sources

Patterns are created in this transformation to be used in other transformations?

Labeler

Lookup Transformation

Looks up values and passes them to other objects

Gateway

Makes decisions to split and merge paths in a workflow

Content Set

Model Repository Object used to store reusable content

Unparsed Output

No matches found when parsing

Token Parser Disadvantages

Not sensitive, works better on unstructured data, output types need to be well defined, tries to use the first reference table; if a value is already parsed, it overflows rather than being parsed against subsequent tables

Cluster Size

Number of records in a cluster. Records that don't match anything end up in clusters of size 1

Cluster ID

Number that identifies a cluster; records that match each other are assigned the same Cluster ID

Transformation

Objects used to transform data

Strategy

Operation that is applied to data in a transformation

Key Generator

Organizes records into groups based on data values ahead of matching

Overflow

Parsed but could not be sent to an output port

Pattern Parser

Parses multiple strings to form patterns

Task Performers

People who correct bad records

Aggregator Transformation

Performs aggregate calculations

Expression Transformation

Performs row-level calculations

Union Transformation

Performs a union all merge of two data streams

Threshold Value

Sets how close the match between two rows must be, on a scale up to 1, where 1 is a perfect match

Custom Data Transformation

Processes data in unstructured / semi-structured file formats

Cheat Sheets

Provides step by step guide for a process or transformation configuration guide

Token Parser Advantages

Quick to configure, Standardize as it parses, Multiple outputs to the same output, Reverse parse, Append reference tables

Read Transformation

Reads from Logical Data Objects

Weighted Average Transformation

Reads scores generated by comparison and generates weighted values based on inputs
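
A minimal Python sketch of combining several comparison scores into one weighted score. The function name, the example fields and the weights are hypothetical; the sketch shows only the arithmetic of a weighted average, not the transformation's configuration.

```python
def weighted_average(scores, weights):
    """Combine match scores, letting heavier weights count for more."""
    if len(scores) != len(weights):
        raise ValueError("each score needs exactly one weight")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical scores for a name comparison and an address comparison.
name_score, address_score = 0.9, 0.6

# The name is treated as three times as important as the address.
print(weighted_average([name_score, address_score], [0.75, 0.25]))
```

In this example the combined score is 0.825, closer to the name score because the name carries more weight.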

Driver ID

Record in a cluster that is determined to be the driver or master record

Logical Data Object

Object that can combine data sources behind the scenes and provide one view

Model Repository

Relational database that stores metadata for projects and folders

Standardizer Transformation

Removes noise and creates standardized values

Cleanse and Transform

Removes values, alters values and creates new values

Mapplet

Reusable predefined process containing transformations

How can you write patterns to a reference table or content set in order to be reused?

Right click on a pattern and select send to reference table or new data domain

Router Transformation

Routes rows conditionally

Mapping Task

Runs a mapping

Data Viewer

Runs data through mapping and displays it.

Command Task

Runs a single shell command; returns 0 if successful, otherwise returns an error number

Task

Runs unit of work in the workflow

Link Score

Score between a record in question and the record that pulled the record in question into the cluster

Driver Score

Score calculated between a record and the Driver record

Notification Task

Sends out a notification or email to users

Rank Transformation

Sets the condition for rows included in a rank

Properties

Shows configurations for transformations or mapplets / mappings

Outline

Shows dependencies of an object, this can be a transformation or rule

Object Explorer

Shows model repository, projects, folders and data objects

Match Cluster Analysis

Shows statistics on the output from the match transformation, consisting of Cluster ID, Groupkey, Clustersize, Row ID, Driver ID, Driver Score, Link ID, Link Score

Classic Matching

Specify what field you want to apply along with the algorithm. Requires standardized inputs

Key Creation String

Builds the key from a specified number of characters taken from the start or the end of a string
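
A minimal Python sketch of a string key built from the first or last characters of a value. The function name `string_key` and its parameters are hypothetical and are not the option names used in the product.

```python
def string_key(value, length, from_end=False):
    """Take `length` characters from the start (or the end) of a value."""
    if length <= 0:
        raise ValueError("length must be a positive number")
    return value[-length:] if from_end else value[:length]


print(string_key("Anderson", 3))
print(string_key("Anderson", 3, from_end=True))
```

Records whose values produce the same key, for example the same first three letters of a surname, end up in the same group for matching.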

Key Creation Types

String, Soundex, NYSIIS

Update Strategy Transformation

Tags rows as insert, update, delete or reject

What is required when using a pattern parser?

The input to the transformation must be the output from the labeler transformation

Distinct Values

Unique values within a column

Filter Transformation

Used as condition statement for inclusion

Exception Transformation

Used to create table required for exception and duplicate management process

Sorter Transformation

Used to sort data

Assignment Task

Assigns a value to a user-defined workflow variable

Row ID

Variation of Sequence ID passed into the Match Transformation

Mainline Editor

Where mappings, profiles, workflows are created. Where transformations are placed

Write Transformation

Writes to Logical Data Objects

Can a Token Parser standardize while parsing?

Yes

Classifier

Ability to classify data based on a classification model stored as part of a content set

Sequence flow

Connects workflow objects to specify the order in which the Data Integration Service (DIS) runs the objects

Data Quality Precision

Sum of ports plus their join characters

