Data Quality Specialist Certification
GroupKey
A key assigned to groups created by the String, Soundex, or NYSIIS strategy. The field is the same GroupKey that was passed into the Match transformation.
Standardization
Addresses issues identified through profiling: corrects completeness, conformity, and consistency problems, and enhances and enriches information
Transformation Palette
All transformations displayed in scroll bar on the left hand side
Java Transformation
Allows Java syntax to be used in Informatica
Group Key
Allows records to be compared that only have the same group key, ultimately reducing pairs and improving performance
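The effect of a group key on the number of comparisons can be sketched in a few lines. This is an illustration only, not the product's implementation; the `candidate_pairs` helper and the first-letter key are hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, key_func):
    """Return index pairs of records that share the same group key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[key_func(record)].append(index)
    pairs = []
    for members in groups.values():
        # Only records inside the same group are ever compared.
        pairs.extend(combinations(members, 2))
    return pairs

records = ["Smith", "Smyth", "Jones", "Johnson", "Smit"]
# Hypothetical group key: the first letter of the name.
pairs = candidate_pairs(records, lambda name: name[0])
all_pairs = len(records) * (len(records) - 1) // 2
print(len(pairs), "pairs with grouping,", all_pairs, "without")
```

With five records, grouping cuts the work from every possible pair down to the pairs inside the "S" and "J" groups; the saving grows quickly as the record count rises.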
Detailed Overflow
Allows users to see detail on which column overflowed
Profiling
Analysis of the content and structure of data
Enrich
Append additional data to existing records
Reverse Hamming
Calculates a match score by calculating the number of positions in which characters differ between data strings reading from right to left. Use this when character position is a critical factor such as telephone numbers, zip codes or product codes. Works well on strings of same length.
Hamming Distance
Calculates a match score by calculating the number of positions in which characters differ between data strings. Use this when character position is a critical factor such as telephone numbers, zip codes or product codes. Works well on strings of same length.
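A minimal sketch of the Hamming and Reverse Hamming ideas described above. The normalization (matching positions divided by the length of the longer string) is an assumption for illustration; the product's exact scoring formula is not given in these notes, and `hamming_score` is a hypothetical name.

```python
def hamming_score(a, b, reverse=False):
    """Score two strings by the share of positions holding the same character.

    With reverse=True the strings are aligned from the right, as in
    Reverse Hamming. The normalization used here is an assumption.
    """
    if reverse:
        a, b = a[::-1], b[::-1]
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest

# Same length, one differing position.
zip_score = hamming_score("12345", "12355")
# A leading extra digit hurts the left-to-right comparison...
forward_phone = hamming_score("5551234", "15551234")
# ...but barely affects the right-to-left comparison.
reverse_phone = hamming_score("5551234", "15551234", reverse=True)
print(zip_score, forward_phone, reverse_phone)
```

The phone number example shows why reading from right to left helps when values differ only by a prefix such as a country or area code.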
Bigram
Calculates a match score by dividing the number of matched character pairs by the total number of character pairs. Used for long text strings such as postal addresses
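The bigram calculation above can be sketched as follows. This version uses the common Dice-style normalization (twice the matched pairs over the total pairs in both strings), which is an assumption; the notes do not state the exact formula the product uses, and `bigram_score` is a hypothetical name.

```python
from collections import Counter

def bigrams(text):
    """Split a string into consecutive character pairs."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def bigram_score(a, b):
    """Matched character pairs divided by total character pairs (Dice-style)."""
    pairs_a, pairs_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Strings too short to form any pair.
        return 1.0 if a == b else 0.0
    matched = sum((pairs_a & pairs_b).values())
    return 2 * matched / total

print(bigrams("night"))                # ['ni', 'ig', 'gh', 'ht']
print(bigram_score("night", "nacht"))  # only 'ht' is shared
```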
Match Transformation
Calculates the degree of similarity between input records and creates scorecard
Jaro Distance
Calculates the similarity between two values, giving priority to the first four characters. If the first four characters do not match, the score is reduced by the Penalty property
Case Transformation
Changes case of data
Classifier Transformation
Classifies or labels according to classify models
Merge Transformation
Concatenates values
Token Parser
Consists of individual strings that match token sets or reference tables
Human Task
Contains steps that require human input to complete and can be used to fix bad exceptions. Helps users participate in the business process
Key Creation NYSIIS
Converts a word into its phonetic equivalent
Labeler Transformation
Creates labels that describe characters or strings in each field
Association Transformation
Creates links between records assigned to different match clusters so they can be associated for consolidation
Consolidation Transformation
Creates single consolidated record from cluster of matched records
What types of patterns can be used while parsing?
Custom patterns, reference tables and content set patterns
Mapping
Defines the flow and transformation of data
Write Permission
Delete or edit the project
Identity Matching
Delivers next-generation linguistic and statistical matching algorithms and enables business users to deliver accurate matches. Emulates a human expert's ability to determine a match and delivers the highest possible reliability
Edit Distance
Derives a match score for two values by calculating the minimum cost of transforming one string into the other by inserting, deleting, or replacing characters
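A minimal sketch of the edit distance idea, using the standard Levenshtein calculation with a cost of 1 per insert, delete, or replace. Turning the distance into a 0 to 1 score by dividing by the longer string's length is an assumption made for illustration; `edit_score` is a hypothetical helper, not a product function.

```python
def edit_distance(a, b):
    """Minimum number of single-character inserts, deletes, or replaces."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # replace (or keep if equal)
            ))
        previous = current
    return previous[-1]

def edit_score(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest

print(edit_distance("kitten", "sitting"))
print(round(edit_score("kitten", "sitting"), 3))
```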
Classic Match Strategies
Edit Distance, Jaro Distance, Bigram, Hamming Distance
Decision Transformation
Evaluates conditions and creates outputs based on those conditions
Comparison Transformation
Evaluates similarity between pairs of input values
Address Validator Transformation
Examines input addresses and outputs corrected address elements and validation information
Read Permission
Execute rules, export data and export rules
Grant Permission
Execute rules, export data, export rules and contains the ability to assign others tasks
Parser Transformation
Parses fields that contain multiple information types and creates a new output field for each type
Key Creation Soundex
Generate alphanumeric code based on how a word sounds
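For reference, the classic American Soundex algorithm can be sketched as below. Whether the product's Soundex strategy follows exactly these rules (for example the four-character code length) is an assumption; the sketch only shows how a sound-based alphanumeric code groups similar-sounding words.

```python
# Letter-to-digit table of the classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Return the first letter plus three digits describing the sound."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        # H and W do not separate letters that share a code; vowels do.
        if letter not in "HW":
            previous = code
    return (result + "000")[:4]

for name in ("Smith", "Smyth", "Robert", "Rupert"):
    print(name, soundex(name))
```

Names that sound alike, such as Smith and Smyth, receive the same code and would therefore land in the same group.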
Workflow
Graphical representation defining a business process comprised of steps
Projects
Highest-level container that organizes objects and processes; can also contain nested folders
Link ID
Identifies the record that pulled the current record in question into the cluster. This does not have to be a master record
Ports
Input or output fields
Joiner Transformation
Joins heterogeneous sources
Patterns are created in this transformation to be used in other transformations?
Labeler
Lookup Transformation
Looks up values and passes them to other objects
Gateway
Makes decisions to split and merge paths in a workflow
Content Set
Model Repository Object used to store reusable content
Unparsed Output
No matches found when parsing
Token Parser Disadvantages
Not sensitive, works better on unstructured data, output types need to be well defined, tries to use the first reference table, if a value is already parsed it overflows rather than parsing to subsequent tables
Cluster Size
Number of records in a cluster, records that don't match with anything end up in cluster sizes of 1
Cluster ID
Number that identifies a cluster; records that match each other are assigned the same cluster ID
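How cluster IDs and cluster sizes fall out of pairwise match scores can be sketched with a small union-find. This is an illustration under assumptions: the `assign_clusters` helper, the sample scores, and the 0.85 threshold are all hypothetical, and the product's own clustering logic is not described in these notes.

```python
from collections import Counter

def assign_clusters(record_count, scored_pairs, threshold):
    """Give each record a cluster ID; pairs scoring >= threshold are merged."""
    parent = list(range(record_count))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for left, right, score in scored_pairs:
        if score >= threshold:
            parent[find(right)] = find(left)

    ids_by_root = {}
    cluster_ids = []
    for i in range(record_count):
        root = find(i)
        cluster_ids.append(ids_by_root.setdefault(root, len(ids_by_root) + 1))
    return cluster_ids

# Hypothetical scores for five records: (record, record, match score).
scored_pairs = [(0, 1, 0.92), (1, 2, 0.88), (3, 4, 0.60)]
cluster_ids = assign_clusters(5, scored_pairs, threshold=0.85)
cluster_sizes = Counter(cluster_ids)
print(cluster_ids, dict(cluster_sizes))
```

Records 3 and 4 score below the threshold, so each ends up alone in a cluster of size 1.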
Transformation
Objects used to transform data
Strategy
Operation that is applied to data in a transformation
Key Generator
Organizes records into groups based on data values ahead of matching
Overflow
Parsed but could not be sent to an output port
Pattern Parser
Parses multiple strings to form patterns
Task Performers
People who correct bad records
Aggregator Transformation
Perform aggregate calculations
Expression Transformation
Performs low level calculations
Union Transformation
Performs a UNION ALL merge of two data streams
Threshold Value
Sets how close two rows must be to count as a match, on a scale up to 1, where 1 is a perfect match
Custom Data Transformation
Processes data in unstructured / semi-structured file formats
Cheat Sheets
Provides a step-by-step guide for a process or a transformation configuration
Token Parser Advantages
Quick to configure, Standardize as it parses, Multiple outputs to the same output, Reverse parse, Append reference tables
Read Transformation
Reads from Logical Data Objects
Weighted Average Transformation
Reads scores generated by comparison and generates weighted values based on inputs
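The weighted average calculation can be sketched as follows. The field names, scores, and weights are made-up sample values, and `weighted_average` is a hypothetical helper shown only to make the arithmetic concrete.

```python
def weighted_average(scores, weights):
    """Combine per-field match scores into one score using field weights."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight

# Hypothetical scores produced by comparing three fields of two records.
scores = {"name": 0.9, "address": 0.8, "phone": 1.0}
# Hypothetical weights: name matters most, phone least.
weights = {"name": 0.5, "address": 0.3, "phone": 0.2}
match_score = weighted_average(scores, weights)
print(round(match_score, 2))
```

Dividing by the total weight means the weights do not have to sum to 1; only their relative sizes matter.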
Driver ID
Record in a cluster that is determined to be the driver or master record
Logical Data Object
Regular object that can combine data sources behind the scenes and provide one view
Model Repository
Relational database that stores metadata for projects and folders
Standardizer Transformation
Removes noise and creates standardized values
Cleanse and Transform
Removes values, alters values, and creates new values
Mapplet
Reusable predefined process containing transformations
How can you write patterns to a reference table or content set in order to be reused?
Right click on a pattern and select send to reference table or new data domain
Router Transformation
Routes rows conditionally
Mapping Task
Runs a mapping
Data Viewer
Runs data through mapping and displays it.
Command Task
Runs a single shell command; returns 0 if successful, otherwise returns an error number
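The exit code convention behind this card is the general shell one, and it can be seen outside the product with a short sketch. The `run_command` helper is hypothetical and uses Python's standard `subprocess` module; it is an analogy, not how the Command task itself is implemented.

```python
import subprocess

def run_command(command):
    """Run one shell command and return its exit code (0 means success)."""
    completed = subprocess.run(command, shell=True)
    return completed.returncode

success_code = run_command("exit 0")  # command succeeds
failure_code = run_command("exit 3")  # command fails with error number 3
print(success_code, failure_code)
```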
Task
Runs unit of work in the workflow
Link Score
Score between a record in question and the record that pulled the record in question into the cluster
Driver Score
Score calculated between a record and the Driver record
Notification Task
Sends out a notification or email to users
Rank Transformation
Sets the condition for rows included in a rank
Properties
Shows configurations for transformations or mapplets / mappings
Outline
Shows dependencies of an object, this can be a transformation or rule
Object Explorer
Shows model repository, projects, folders and data objects
Match Cluster Analysis
Shows statistics on the output from the match transformation, consisting of Cluster ID, Groupkey, Clustersize, Row ID, Driver ID, Driver Score, Link ID, Link Score
Classic Matching
Specify what field you want to apply along with the algorithm. Requires standardized inputs
Key Creation String
Builds the key from the first or last specified number of characters of a string
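A minimal sketch of a string-based key, assuming the key is simply the first or last N characters of the value. The `string_key` helper is hypothetical, and the trimming and upper-casing are extra assumptions added so that near-identical values share a key.

```python
def string_key(value, length, from_end=False):
    """Build a group key from the first or last `length` characters."""
    if length <= 0:
        return ""
    cleaned = value.strip().upper()  # assumed normalization, see lead-in
    return cleaned[-length:] if from_end else cleaned[:length]

print(string_key("Johnson", 3))                     # first three characters
print(string_key("90210-1234", 4, from_end=True))   # last four characters
```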
Key Creation Types
String, Soundex, NYSIIS
Update Strategy Transformation
Tags rows as insert, update, delete, or reject
What is required when using a pattern parser?
The input to the transformation must be the output from the labeler transformation
Distinct Values
Unique values within a column
Filter Transformation
Used as condition statement for inclusion
Exception Transformation
Used to create table required for exception and duplicate management process
Sorter Transformation
Used to sort data
Assignment Task
Assigns a value to a user-defined workflow variable
Row ID
Variation of Sequence ID passed into the Match Transformation
Mainline Editor
Where mappings, profiles, workflows are created. Where transformations are placed
Write Transformation
Writes to Logical Data Objects
Can a Token Parser standardize while parsing?
Yes
Classifier
Ability to classify data based on a classification model stored as part of a content set
Sequence flow
Connects workflow objects to specify the order in which the Data Integration Service (DIS) runs the objects
Data Quality Precision
Sum of the precision of the ports plus their join characters