RPA
How to prepopulate variable values into Manual Task fields
Use Default Value field in Answer. Refer to value as ${question.data['variable']}
Feature Extractors
analyze tokens in documents and create features.
Is it possible to extract table data from documents
No. Tables are not supported
Go over a random post processing lesson
Post-Processing - AutoML - WorkFusion Knowledge Base
"intelligent automation"
"software robots that can perform both deterministic and non-deterministic tasks by continuously understanding and analyzing structured and unstructured data." Note these three critical terms:
The requirement to correct numbers that were wrongly recognized by OCR refers to the following requirement: a. completion
. normalization
Can original PDF be shown in Information Extraction Manual Task
1. Can in separate window 2. Can in same UI form near area to extract
If Threshold(streaming) option is equal to 5 Tasks. For the 2nd BP step it means:
5 Records must be completed on the 1st BP step, only after that they are routed to the 2nd step +
Disaster Recovery / DR
A process for preserving or restoring business continuity in the event of a disaster that destroys part or all of a business's resources, including IT equipment, data records, and the physical space of an organization.
Initial Process Assessment / IPA
A quick study of a process to assess technical feasibility and commercial viability before deciding on RPA deployment.
S3
Accesses and manages data on Amazon S3 storage. <s3-get> <s3-list> <s3-delete> <s3-put> <s3-copy>
Business Process / BP
An activity or set of activities that will accomplish a specific organizational goal.
Go over random automl sdk configuration example
AutoML SDK Configuration - AutoML - WorkFusion Knowledge Base
Basic Sprint plan when implementing Automation
Automation plan, Kickoff meeting, Design & Build, User Acceptance Testing (UAT) & Production, Roll out
In manual tasks you have the top two bot cases
Classification Bot Task, IE cognitive Bot task
If we have two classes, cats and dogs, and we know that "Snoopy the cat" is a false negative object, then:
Dogs are Positives, Cats are Negatives +
MLE mostly works with Data Scientist:
False +
You need to have SME manually enter title and description about products. What Answer Types will you use in Manual Task
File upload, Free text
How does Taxonomy Answer Type get data?
From Data Store +
How to setup process to effectively fetch 30 attributes from scanned PDF
Have an OCR process convert the PDF, and then route it to a Manual Task with Information Extraction
automation tag
Interact with VDS Services. <automation-start-eval> <automation-get-eval-result> <automation-extract>
IDC stands for
International Data Corporation
Who is responsible for the compliance of final P/R results with customer's requirements?
MLE +
End-user computing / EUC
Systems in which non-programmers can create working applications. EUC is a group of approaches to computing that aim to better integrate end users into the computing environment. Read more: https://en.wikipedia.org/wiki/End-user_computing
Data Analyst gets all the documents tagged by customer and decides which of them can be used for the model training:
True +
UML stands for
Unified Modeling Language
"Information extraction" type of answer is
Used for extracting structured content from raw unstructured text
Analytics
WorkFusion Analytics comprehensive dashboards present a reporting mechanism that aggregates and displays metrics and key performance indicators (KPIs). Dashboards help improve decision making by revealing and communicating in-context insight into business performance. Related components: Control Tower, WorkSpace.
Does WorkFusion provide internal version control for Bot Task?
Yes, it's possible to see versions and restore a particular version +
20. Which protocol is used to communicate between WorkFusion and S3?
a. HTTPS +
9. Does the WorkFusion instance validate the script code syntax/show errors right after you save the code?
a. No +
True negative is:
correct answer is not given
The main goal of Data Analyst is to:
ensure the high quality of a training set
Data Analyst is fully responsible for writing post-processing:
false
Support Vector Machine
is a binary classification algorithm. Given a set of objects belonging to two classes, it builds a function that separates them in an n-dimensional feature space
scatterplot matrix
look up
If dates in documents are presented in different formats, they should be:
normalized to the format that was chosen with a customer
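A minimal sketch of such normalization in plain Java, not WorkFusion post-processing code. The list of source formats and the target format `yyyy-MM-dd` are assumptions made for illustration; in a real project both come from the documents and from the agreement with the customer. Note that day/month order (5/12 vs 12/5) cannot be guessed from the value alone, so it has to be fixed per document source.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class DateNormalizer {

    // Hypothetical source formats seen in the documents (an assumption).
    private static final List<DateTimeFormatter> SOURCE_FORMATS = List.of(
            DateTimeFormatter.ofPattern("dd/MM/yyyy", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("d MMMM yyyy", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("MMM d, yyyy", Locale.ENGLISH));

    // Hypothetical target format chosen with the customer (an assumption).
    private static final DateTimeFormatter TARGET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ENGLISH);

    // Tries each known format in order; returns empty if none of them matches.
    static Optional<String> normalize(String raw) {
        String value = raw.trim();
        for (DateTimeFormatter format : SOURCE_FORMATS) {
            try {
                return Optional.of(LocalDate.parse(value, format).format(TARGET));
            } catch (DateTimeParseException ignored) {
                // Not this format, try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(normalize("05/12/2017"));
        System.out.println(normalize("5 December 2017"));
        System.out.println(normalize("not a date"));
    }
}
```

Values that match none of the formats come back empty, so they can be routed to a manual check instead of being silently changed.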
In Information Extraction case, true positives are:
not-empty values that were extracted correctly +
High-variance models make mistakes by
overfitting to the idiosyncrasies of the training data. They tend to be wrong in inconsistent ways.
file Reads and writes the content of a file, or searches a directory for specified files.
<file action="file_action" path="file_path" type="file_type" charset="charset_of_text_file" listdirs="listdirs" listfiles="listfiles" listrecursive="listrecursive" listfilter="listfilter"> body defining content of the file if action="write" or action="append" </file>
ftp Creates an FTP connection and executes valid FTP-based operations against the server: ftp-list, ftp-get, ftp-put, ftp-del, ftp-mkdir, ftp-rmdir.
<ftp server="server" port="port" username="username" password="password" account="account" remotedir="remotedir"> [<ftp-list path="path" listfiles="listfiles" listdirs="listdirs" listlinks="listlinks" listfilter="listfilter"/>]* [<ftp-get path="path"/>] [<ftp-put path="path" charset="charset"> content to save </ftp-put>] [<ftp-del path="path"/>] [<ftp-mkdir path="path"/>] [<ftp-rmdir path="path"/>] </ftp>
Optical Character Recognition / OCR
A technology that enables you to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data. Read more: https://www.abbyy.com/en-us/finereader/what-is-ocr/ and https://en.wikipedia.org/wiki/Optical_character_recognition
Application Development Automation
A technology/methodology that increases development process speed, limits knowledge dissipation, ensures build quality and does not require developers to perform a large number of manual actions.
Finance and Accounting / FnA
Accounting focuses on the day-to-day flow of money in and out of a company or institution, whereas finance is a broader term for the management of assets and liabilities and the planning of future growth. Read more: https://smartasset.com/investing/the-difference-between-accounting-and-finance
The omitted metric in Information Extraction is:
Accuracy +
The client gave you a set of 50 photos where 20 photos contain road signs and 30 photos do not. The model decided that 25 photos contain road signs: among them, 10 photos actually contain road signs and 15 do not. What is the model quality?
Accuracy 50% + Precision 40%, Recall 50% +
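The arithmetic behind this answer can be checked with a short sketch. The class and method names are made up for illustration; the counts come straight from the question: 10 true positives, 15 false positives (25 predicted minus 10), 10 false negatives (20 actual minus 10), and 15 true negatives (30 minus 15).

```java
import java.util.Locale;

final class RoadSignMetrics {

    // Share of all objects that were classified correctly.
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    // Share of predicted positives that are actually positive.
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    // Share of actual positives that the model found.
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    public static void main(String[] args) {
        int actualPositives = 20;    // photos that contain road signs
        int actualNegatives = 30;    // photos that do not
        int predictedPositives = 25; // photos the model flagged
        int tp = 10;                 // flagged photos that really contain signs

        int fp = predictedPositives - tp; // flagged, but no sign
        int fn = actualPositives - tp;    // sign present, but not flagged
        int tn = actualNegatives - fp;    // no sign, not flagged

        System.out.printf(Locale.ROOT, "accuracy=%.2f precision=%.2f recall=%.2f%n",
                accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn));
    }
}
```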
You need to have Subject Matter Expert to validate and react on what Organization's internal manual worker did. How would you design such Business Process
Add Moderation Flow
Elements
Elements are represented as Tokens, Sentences, Named Entities, or Entity Boundary Elements and are used for further content analysis.
VDS support is Data Scientist's responsibility:
False
MLE never participates in getting ML statistics and the model results analysis, it's a DA's responsibility:
False +
The following steps take place during model training with a checker model:
1. Feature extraction. 2. Cross-validation: Train, Classify. 3. Raw statistics calculation. 4. Checker model training: use features, scores from cross-validation, and correctness information of the IE model as input training data for the checker model. 5. Checker model statistics calculation. 6. Applying Post-Processors: the first Post-Processor should apply the checker model and filter results. 7. Processed statistics calculation. 8. Estimated accuracy model training.
Document Formats and where they would go
Format | Approach
HTML / XML | Can be directly processed with AutoML
Plain text | Can be directly processed with AutoML
Image (.jpeg, .tiff, .png, .gif) | Send to OCR
PDF | Send to OCR
Excel (.xls, .xlsx) | Convert to HTML
Word | Convert to HTML / plain text, or convert to PDF and send to OCR
Email (.msg) | Work with the email body, as with plain text
To configure a dictionary NER Annotator, follow these steps:
1. Create a dictionary file, and then put it to the [project-dir]/ main/resources/ project folder. 2. Define a dictionary reader. 3. Add a NER Annotator.
Database
Database stores business process definitions, runtime and tracking information on their executions — including RPA and Cognitive automations. It is a reliable storage of the designed automation implementation. Related components: Control Tower, WorkSpace, Analytics.
What features do WorkFusion Bot Task components provide (via Web Harvest)?
a. Dynamically split Business Process records b. Reusing of code with functions d. Calls to SOAP services e. Launching of another Business Process g. Passing of the calculated values to the next step of the Business Process h. Working with internal file storage i. Working with external Databases j. Calls to RESTful services k. Conversion of scanned images to text m. List to MS Excel conversion
WorkSpace
WorkSpace is the web application where Subject Matter Experts complete manual tasks. It allows them to: manage the manual task queue; fulfill manual tasks. Related components: Control Tower, Object Storage.
Workflow
Workflow Engine executes automation processes authored in Control Tower according to process definition. This is the server application configured within Control Tower. Execution is handled in different computing environments—RPA, WorkSpace, OCR, and AutoML. Related components: Control Tower, RPA, WorkSpace, AutoML.
UML (explain what it is)
____ is a software modeling process that enables system developers to create a blueprint showing the overall functionality of the program being engineered and provides a way for the client and the developer to communicate. a. SDLC b. Scope creep c. UML d. Prototyping
data-value
a gold tag attribute that is returned as the extraction result (what a user sees when saving extraction results as JSON or CSV). It may be changed in a manual task or by post-processing, because the final result does not necessarily equal the tagged string. Note that ML focuses on the tagged string, not on data-value. Example: data-value="24.20".
Training set is
a set of documents (invoices, e-mails, contracts, etc.) used for training an ML model. For Information Extraction, a training set consists of documents which contain gold values (the values surrounded by special tags) for a set of fields.
25. Check the true statements about Bot Task scripting
a. All Groovy and BeanShell variables automatically become available in the next step of the Business Process X b. All Groovy variables that are defined in script block 1 are available in script block 2 X c. All BeanShell variables that are defined in script block 1 are available in script block 2 + d. Multiple script blocks are allowed in the Bot Task + e. The default script language in WorkFusion is Groovy X
As regards Business Process, MLE is responsible for:
a. Automation BP configuration b. making sure the BP has the latest version c. checking BP's compatibility with corresponding model d. ML environment is up and running
How do you code with the Java language in WorkFusion?
a. Code both Java classes and Bot Task in WorkFusion Studio, then deploy the bundle in the WorkFusion instance. Finally, place the deployed Bot Task in the Business Process + d. Code the Java classes in WorkFusion Studio then deploy the bundle in the WorkFusion instance, and add import. Now you will be able to instantiate a custom class object +
16. What are the ways in which external Java libraries (JARs) can be used in the WorkFusion instance?
a. Compile the bulk Machine Config Bundle with library included and deploy the bundle via WorkFusion Repository + b. Download and add the library to the classpath using the Bot Task code in runtime +
How is it possible to include RPA Bot into Business process?
a. Create new Bot from empty Use Case, open and code in Control Tower + b. Use Intelligent Automation Cloud Recorder to record and deploy RPA Bot into BP + c. Find in Out Of The Box section of Bots and pull into BP + d. Use RPA Studio to design and test RPA Bot and copy-paste code into empty Bot within BP +
How can you improve the OCR results?
a. Don't accept mobile pictures that are taken with no flash in the darkness b. Request high DPI images from the customer c. Avoid handwritten data in documents d. Use ImageMagick or Ghostscript for preprocessing e. Remove garbage using the OCR feature f. Convert to monochrome g. Use a custom dictionary h. Verbose problems are logged to provide a better understanding of recognition problems i. Exclude incorrect region types j. Use a searchable PDF document instead of a high DPI image-based one
18. What are the OCR features in WorkFusion?
a. Elastic scalability of OCR Agents in cloud setup + d. RESTful API to integrate the Bot Task with the available OCR cluster + e. On-site OCR setup is available + f. OCR forms part of the Intelligent Automation Cloud product +
How to see exceptions in bots if any occur in runtime?
a. Exceptions popup is accessible from BP list via information icon within failed BP run + c. Exceptions popup is accessible from BP run View Results Summary via information icon +
What data is available via the Bot Task context variables?
a. Name of the current Business Process b. Server date and time c. All input data d. Unique ID of the current Business Process execution e. Name of current Bot Task f. Result Data from the previous executions of this Business Process
Does the WorkFusion instance validate the script code syntax/show errors before you save the code?
a. No +
In order to check whether several SMEs have the same understanding what should be tagged and where without manual checking, DA uses:
a. Qualification Task X b. Adjudication Rule c. DA Training quizzes d. DA Training assignments e. all the answers are correct X f. correct answer is not given X Figure out the answer
Which approaches does WorkFusion use on its ML use cases?
a. Rule-based b. Regexp c. ML d. Combined
What is the purpose of Split Data technique in Business Process?
a. To design dynamic data flow for rows extracted from DataStore. When we need to execute next BP Tasks per extracted data row + b. To design dynamic data flow for rows extracted from Excel file taken from S3 file storage. When we need to execute next BP Tasks per extracted data row + c. To design dynamic data flow for rows extracted by RPA Bot. When we need to execute next BP Tasks per extracted data row +
21. Is it possible to reach 100% automation for web scraping using Bot Tasks?
a. Yes +
24. Can S3 communication be tested from WorkFusion Studio? find the right answer
a. Yes, but first the credentials for S3 should be entered in the Bot Task run/debug configuration b. Yes, just launch Bot Task configuration X c. S3 Emulator communication can be tested, but Amazon S3 cannot X d. Yes, but the credentials for S3 should first be set in Studio Preferences X e. No, it can only be tested in the WorkFusion instance
The requirement of consistency means:
a. a field must be tagged the same way across all the data set b. a field must be tagged in the same context across all the data set
Support Vector Machine is:
a. binary classification algorithm + b. supervised learning algorithm +
OCR quality of the training set can be improved:
a. by SMEs during the tagging process if possible b. by DA during the process of data-set check and improvement c. by MLE during the post-processing implementation
While working on training set preparation before the model training, MLE is responsible for:
a. consulting Data Analyst on technical side if required b. checking the training set quality c. giving recommendations how the training set could be improved d. creating training set from customer data e. all the answers above are correct +
While working with a customer, DA:
a. gathers requirements b. makes sure that a use-case is clearly described in the documentation c. provides a training to SMEs d. configures a Manual Task e. estimates OCR quality and defines if a customer should provide more documents
31. How do you access the Content-Type response header that is received from a REST API call? (find the correct answer)
a. http.getHeader("Content-Type") b. hit_submission_data_item.getItemValueMap().get("Content-Type") c. headers("Content-Type") X d. http.client.getHeaders("Content-Type") X
As regards Manual Task, MLE is responsible for:
a. its set-up X b. fields description and configuration c. making sure that ML environment is applicable to the current Manual Task d. all the answers above are correct X e. a correct answer is not given X Figure out the correct answer
6. "$" (or any other currency sign) before the field "Total" serves as:
a. part of the amount and should be tagged b. the context for the model and should not be tagged c. both variants are correct d. correct answer is not given find the correct answer
In WorkFusion practice, in Information Extraction case, the objects are classified as "Positives" and "Negatives" on the basis of the following criteria (choose all the correct variants):
a. the field is empty or not + b. information from a field must be extracted or not +
False positive is
an object of class "Negative" that was mistakenly classified as "Positive"
training set +
an object of class "Positive" that was classified correctly +
False negative is:
an object of class "Positive" that was classified incorrectly +
blocknumber
blocknumber attribute is a number denoting the fixed place of the tab in a block, starting from 0.
In WorkFusion practice, SMEs are:
customer's employees whose work will be automated by WorkFusion - normally, they have a deep understanding of use-case documents business logic +
In case some of numbers in the field "Total" were recognized as letters but the initial number can be logically retrieved by a human, Data Analyst normally should:
describe this issue for the future post-processing and ask SMEs to correct data-value manually while tagging
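A rough sketch of what such future post-processing might look like, written as plain Java rather than as an AutoML SDK Post-Processor. The look-alike table (O to 0, l to 1, and so on) and the accepted amount pattern are assumptions for illustration; a real table would be built from the OCR errors actually observed in the data set.

```java
import java.util.Map;
import java.util.Optional;

final class OcrDigitRepair {

    // Hypothetical letter-to-digit look-alike table (an assumption, not a WorkFusion list).
    private static final Map<Character, Character> LOOKALIKES = Map.of(
            'O', '0', 'o', '0', 'l', '1', 'I', '1', 'S', '5', 'B', '8');

    // Replaces look-alike letters and accepts the result only if it is a plain amount.
    static Optional<String> repairAmount(String raw) {
        StringBuilder repaired = new StringBuilder();
        for (char c : raw.trim().toCharArray()) {
            repaired.append(LOOKALIKES.getOrDefault(c, c));
        }
        String candidate = repaired.toString();
        // Assumed amount shape: digits with an optional two-digit fraction.
        return candidate.matches("\\d+(\\.\\d{2})?")
                ? Optional.of(candidate)
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(repairAmount("24.2O"));
        System.out.println(repairAmount("l5O.OO"));
        System.out.println(repairAmount("abc"));
    }
}
```

Anything that still does not look like an amount after the replacement is returned as empty, so it stays a case for manual correction.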
In case the needed text is missed in documents after OCR, Data Analyst should:
exclude the document from the training set
In Information Extraction, true negatives are:
extracted=gold=empty +
Which databases are used under the hood of the WorkFusion Control Tower application in versions 9.x?
f. PostgreSQL + g. MySQL(Percona) +
Data Analyst is fully responsible for the model training:
false
If Gold = "5/12/2017", Extracted = "12/5/2017", the extracted value should be counted in:
false positives, false negatives +
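The counting rules from these cards can be sketched as one small function. True positive, true negative, and the "wrong non-empty value counts as both FP and FN" case follow the cards; the two remaining branches (gold empty but something extracted, gold present but nothing extracted) are the usual convention and are an assumption here. All names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

final class IeOutcome {

    enum Outcome { TRUE_POSITIVE, TRUE_NEGATIVE, FALSE_POSITIVE, FALSE_NEGATIVE }

    // Compares a gold value with an extracted value for a single field.
    static Set<Outcome> score(String gold, String extracted) {
        boolean goldEmpty = gold == null || gold.isEmpty();
        boolean extractedEmpty = extracted == null || extracted.isEmpty();

        if (goldEmpty && extractedEmpty) {
            return EnumSet.of(Outcome.TRUE_NEGATIVE);   // extracted = gold = empty
        }
        if (goldEmpty) {
            return EnumSet.of(Outcome.FALSE_POSITIVE);  // assumption: extracted where nothing was tagged
        }
        if (extractedEmpty) {
            return EnumSet.of(Outcome.FALSE_NEGATIVE);  // assumption: tagged value was missed
        }
        if (gold.equals(extracted)) {
            return EnumSet.of(Outcome.TRUE_POSITIVE);   // non-empty value extracted correctly
        }
        // Wrong non-empty value: a wrong extraction and a missed gold value at once.
        return EnumSet.of(Outcome.FALSE_POSITIVE, Outcome.FALSE_NEGATIVE);
    }

    public static void main(String[] args) {
        System.out.println(score("5/12/2017", "12/5/2017"));
        System.out.println(score("", ""));
        System.out.println(score("24.20", "24.20"));
    }
}
```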
Dimensions in a data set are called
features, predictors, or variables.
Annotators
help to extract structured information from unstructured data. As documents pass through the processing pipeline Annotators analyze words, phrases, named entities in unstructured content, and then create
Creating a Feature Extractor
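import java.time.format.DateTimeFormatter;

// FeatureExtractor and @OnInit come from the AutoML SDK; their imports are not shown in the original.
public class MyFeatureExtractor implements FeatureExtractor {

    private DateTimeFormatter dateTimeFormatter;

    // Initialization hook: prepares the date formatter.
    @OnInit
    public void init() {
        dateTimeFormatter = DateTimeFormatter.ofPattern("dd/MM/yyyy");
    }

    /* The Feature Extractor implementation goes here */
}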
Parser
removes HTML tags from documents, creates an AutoML SDK Document.
As regards legal endings in the field "Client name", WorkFusion always requires SMEs to:
tag client's name with legal ending included even if a customer required to do otherwise
One method for making predictions is called a decision tree
which uses a series of if-then statements to identify boundaries and define patterns in the data.
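To make the "series of if-then statements" concrete, here is a toy, hand-written tree. The features (`hasTotalKeyword`, `amountCount`), the thresholds, and the class labels are all invented for illustration; a trained decision tree would pick its own splits from the training data.

```java
final class TinyDecisionTree {

    // Each nested if is one split; each return is a leaf with a predicted class.
    static String classify(boolean hasTotalKeyword, int amountCount) {
        if (hasTotalKeyword) {
            if (amountCount >= 2) {
                return "invoice";
            }
            return "receipt";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify(true, 3));
        System.out.println(classify(true, 1));
        System.out.println(classify(false, 5));
    }
}
```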
Secrets Vault
Secrets Vault is a secured storage for sensitive data. It allows you to:
- Store authentication data (login/password) for applications involved in RPA
- Securely access applications involved in RPA during business process execution
- Store authentication data for WorkFusion Intelligent Automation Cloud inter-component interaction
Semi-structured data
Semi-structured data is not organized into rows and columns, yet contains tags or other information that names and defines the data content, and simplifies searching and processing. XML and JSON are examples of semi-structured data.
If the model training showed low statistics, the first thing to do is:
to check FPs and FNs +
The following example creates Sentence elements between dots.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.workfusion.vds.sdk.api.nlp.annotator.Annotator;
import com.workfusion.vds.sdk.api.nlp.model.IeDocument;
import com.workfusion.vds.sdk.api.nlp.model.Sentence;

public class SentenceAnnotator implements Annotator<IeDocument> {

    @Override
    public void process(IeDocument document) {
        Pattern pattern = Pattern.compile("\\.");
        Matcher matcher = pattern.matcher(document.getText());
        int index = 0;
        while (matcher.find()) {
            // add sentence into document
            document.add(Sentence.descriptor()
                    .setBegin(index)
                    .setEnd(matcher.start()));
            index = matcher.end();
        }
    }
}
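The begin/end offsets that this annotator produces can be tried outside the SDK. The sketch below repeats only the regex loop in plain Java; `SentenceSpans` and `spans` are hypothetical names, and no AutoML SDK classes are used. It also shows two side effects of the simple approach: text after the last dot gets no Sentence, and a span starts right after the dot, so it includes the leading space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SentenceSpans {

    // Returns [begin, end) offsets of the text pieces that end at each dot.
    static List<int[]> spans(String text) {
        List<int[]> result = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\.").matcher(text);
        int index = 0;
        while (matcher.find()) {
            result.add(new int[] {index, matcher.start()});
            index = matcher.end(); // next span starts right after the dot
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "First one. Second one. Tail without dot";
        for (int[] span : spans(text)) {
            System.out.println(span[0] + "-" + span[1] + ": '" + text.substring(span[0], span[1]) + "'");
        }
    }
}
```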
Regardless of the type of Use Case, all Data Analyst responsibilities can be divided into three main stages:
1. Data set collection: The most considerable and important stage, which makes up more than half of all DA work. Mistakes made and not corrected here may entail serious risks in terms of timelines and achievement of success criteria. As a prerequisite to this stage, DAs should get familiar with different aspects of the project and clearly understand the business logic of the use case, what type it is (Information Extraction / Classification), what fields/classes will be used, document types and formats, etc.
2. Model training and analysis of results: This stage is implemented in close cooperation between the DA and the ML Engineer.
3. Preparation of the final report: At the end of the project, final results should be presented to the customer.
What OCR Bot configuration options are valid
1 Output format must be specified 2 Range of pages to extract from document can be configured 3 Orientation and skew correction can be automatically applied 4 Areas on pages to include into extraction can be configured
exit Conditionally breaks the configuration execution.
<exit condition='${!sys.isVariableDefined("username")}' message="No username provided!" />
What Is Expected From ML Engineer?
1. Enforce qualitative Training Set as starting point of Machine Learning solution 2. Improve model based on data analysis while following WorkFusion Machine Learning solution workflow 3. Use WorkFusion AutoML components as ML solution 4. Apply AutoML SDK when needed https://kb.workfusion.com/display/VDS/ML+Engineer+Manual
Creating a training set in control tower
1. Go to advanced and click on automation training sets 2. Upload training set data and press save You can also add a tagged csv file
What are the roles that particular databases play in the WorkFusion Control Tower application?
1. It stores all application system data, i.e., BP attributes, all details about BPs that have been run, all transition variables/values between BP steps, etc. - MySQL + 2. Developers are allowed access to store data collected as part of the scope of BP execution - PostgreSQL +
Tips for setting up the AutoML SDK
1. Make sure the .m2 directory is set up properly: https://kb.workfusion.com/display/VDS/Environment+Configuration 2. https://kb.workfusion.com/pages/viewpage.action?spaceKey=VDS&title=Project+Configuration 3. Generate an archetype according to the URL above 4. Make sure you run maven install: https://kb.workfusion.com/pages/viewpage.action?spaceKey=VDS&title=Getting+Started
Save Content on S3. The following config saves the content of the specified fields on S3 (useful for dealing with *_tagged fields). It overrides the field content with the S3 link; skips fields that are not present in the input; skips fields that are already an http link (so the step is safe to use several times in the flow); and adds proper UTF-8 headers.
<?xml version="1.0" encoding="UTF-8"?>
<config charset="UTF-8">
    <var-def name="variablesToSave">
        <loop item="varName">
            <list>
                <tokenize delimiters="|">document_xml_link_tagged|document_xml_link_tagged_failed</tokenize>
            </list>
            <body>
                <case>
                    <if condition='${sys.isVariableDefined(varName.toString())}'>
                        <var name="varName"/>
                    </if>
                </case>
            </body>
        </loop>
    </var-def>
    <loop item="taggedVar">
        <list>
            <var name="variablesToSave"/>
        </list>
        <body>
            <var-def name="content">
                <var name="${taggedVar}"/>
            </var-def>
            <var-def name="${taggedVar}">
                <case>
                    <if condition='${content.toString().startsWith("http")}'>
                        <var name="content"/>
                    </if>
                    <else>
                        <s3 bucket="${conf_invoice_image_bucket}">
                            <s3-put path="tagged/${java.util.UUID.randomUUID()}-tagged.html" acl="PublicRead" content-type="text/html; charset=utf-8" content-disposition="inline">
                                <var name='content'/>
                            </s3-put>
                        </s3>
                    </else>
                </case>
            </var-def>
        </body>
    </loop>
    <export include-original-data="true">
        <loop item="taggedVar">
            <list>
                <var name="variablesToSave"/>
            </list>
            <body>
                <single-column name="${taggedVar}">
                    <var name="${taggedVar}"/>
                </single-column>
            </body>
        </loop>
    </export>
</config>
call Calls the user-defined function. Syntax
<call name="function_name"> <call-param name="param_name"> body as actual parameter value </call-param> </call>
config This is the root element of every configuration file.
<config charset="charset_value" scriptlang="default_script_lang"> configuration body </config>
db-param Specifies database parameter inside database element. Can be used for storing BLOBs (Binary Large OBjects).
<db-param type="param_type"> parameter value </db-param>
function Declares the user-defined function. Example
<function name="download-multipage-list">
    <return>
        <while condition="${pageUrl.toString().trim() != ''}" maxloops="${maxloops}" index="i">
            <empty>
                <var-def name="content">
                    <html-to-xml>
                        <http url="${pageUrl}"/>
                    </html-to-xml>
                </var-def>
                <var-def name="nextLinkUrl">
                    <xpath expression="${nextXPath}">
                        <var name="content"/>
                    </xpath>
                </var-def>
                <var-def name="pageUrl">
                    <template>${sys.fullUrl(pageUrl, nextLinkUrl)}</template>
                </var-def>
            </empty>
            <xpath expression="${itemXPath}">
                <var name="content"/>
            </xpath>
        </while>
    </return>
</function>
<var-def name="imgLinks">
    <call name="download-multipage-list">
        <call-param name="pageUrl">http://images.google.com/images?q=harvest&amp;hl=en&amp;btnG=Search+Images&amp;nojs=1</call-param>
        <call-param name="nextXPath">//a[@shape='rect' and .='Next']/@href</call-param>
        <call-param name="itemXPath">//img[contains(@src, 'images?q=tbn')]/@src</call-param>
        <call-param name="maxloops">5</call-param>
    </call>
</var-def>
html-to-xml Cleans up the content of the body and transforms it into valid XML. The body is usually HTML obtained as a result of http processor execution. The actual parsing and cleaning job is delegated to the HtmlCleaner tool. Although no special tuning is needed in most cases, the cleaner may be configured with several parameters defined by the processor's attributes.
<html-to-xml outputtype="..." advancedxmlescape="..." usecdata="..." specialentities="..." unicodechars="..." omitunknowntags="..." treatunknowntagsascontent="..." omitdeprtags="..." treatdeprtagsascontent="..." omitcomments="..." omithtmlenvelope="..." allowmultiwordattributes="..." allowhtmlinsideattributes="..." namespacesaware="..." prunetags="..." omitxmldecl="..."> body as html to be cleaned </html-to-xml>
Downloads the www.motors.ebay.com page and cleans it up, producing pretty-printed XML content.
<html-to-xml outputtype="pretty"> <http url="http://www.motors.ebay.com"/> </html-to-xml>
json-to-xml Converts given JSON content to XML. See also JSON Manipulation Plugins.
<json-to-xml> JSON content </json-to-xml>
loop Iterates through the specified list and executes the specified body logic for each item. The result is the list of processed bodies.
<loop item="item_var_name" index="index_var_name" maxloops="max_loops" filter="list_filter" empty="true"> <list> body as list value </list> <body> body for each list item </body> </loop>
mail-attach Adds an email attachment. Can be used only as part of a mail processor of html type. Syntax
<mail from="[email protected]" smtp-host="smtp.gmail.com" to="[email protected]" type="html" username="myusername" password="mypassword" security="tsl" subject='Photos from the ...'> Here is me with ... <![CDATA[ <img src="]]> <mail-attach inline="true"> <file path="myphoto1.jpg" type="binary"/> </mail-attach> <![CDATA[ "> ]]> And this is ... <![CDATA[ <img src="]]> <mail-attach inline="true"> <file path="myphoto2.jpg" type="binary"/> </mail-attach> <![CDATA[ "> ]]> </mail>
mail Sends an email. Syntax
<mail smtp-host="smtp server" smtp-port="smtp server port" type="content type" from="sender" reply-to="reply-to header" to="to" cc="cc" bcc="bcc" subject="subject" charset="charset" username="smtp username" password="smtp password" security="smtp security type"> mail content with optional attachments (mail-attach elements) </mail>
database Executes a query against a database. JDBC driver library file(s) should be provided on the classpath if used programmatically, or on the same path as the Web-Harvest executable if used standalone. For a SELECT SQL statement, it returns a list of row objects. They can be accessed with special accessor methods:
<mydbrow>.getColumnCount() - returns number of columns returned. <mydbrow>.getColumnName(index) - returns name for column number. <mydbrow>.get(column_index) - returns field value for column number. <mydbrow>.get(column_name) - returns field value for column name.
regexp Searches the body for the given regular expression and optionally replaces found occurrences with specified pattern. If body is a list of values then the regexp processor is applied to every item and final execution result is the list.
<regexp replace="true_or_false" max="max_found_occurrences" flag-canoneq="flag-canoneq" flag-caseinsensitive="flag-caseinsensitive" flag-dotall="flag-dotall" flag-multiline="flag-multiline" flag-unicodecase="flag-unicodecase"> <regexp-pattern> body as pattern value </regexp-pattern> <regexp-source> body as the text source </regexp-source> [<regexp-result> body as the result </regexp-result>] </regexp>
return Returns value from the user-defined function.
<return> body as return value </return>
text Converts embedded value to the string representation.
<text charset="charset" delimiter="delimiter"> wrapped body </text>
tokenize Splits given text to elements (tokens).
<tokenize delimiters="delimiters" trimtokens="trimtokens" allowemptytokens="allowemptytokens"> content to tokenize </tokenize>
var Returns value of defined variable. Throws an exception if variable is not defined.
<var name="variable_name"/>
Example
<var-def name="searchEngine">google</var-def>
<var-def name="${searchEngine}Content"><http url="http://www.${searchEngine}.com"/></var-def>
<file action="write" path="data/${searchEngine}_content.html">
    <var name="${searchEngine}Content"/>
</file>
empty Wraps execution sequence and returns empty value. This element is used in situations when execution result is needless.
<var-def name="amazonContent"> <empty> <http url="http://www.amazon.com" /> </empty> </var-def>
This example defines the variable digitList, which is a sequence of 9 values (digits from 1 to 9), and 9 simple variables digit1, digit2, ..., digit9 with values ranging from 1 to 9.
<var-def name="digitList"> <while condition="true" index="i" maxloops="9"> <var-def name="digit${i}"><template>${i}</template></var-def> </while> </var-def>
try-catch Wraps execution and for any recoverable exception returns default value without crashing the whole process.
<var-def name="reportText"> <try> <body> <file path="data/report.txt"/> </body> <catch> No report file! </catch> </try> </var-def>
var-def Defines new or overrides existing variable with specified name and value.
<var-def name="variable_name" overwrite="overwrite_existing"> body as value of the variable </var-def> where overwrite attribute is not required
xml-to-json Converts given XML content to JSON.
<xml-to-json> XML content </xml-to-json> <var-def name="outputLink"> <s3 bucket="vr1677"> <s3-put path="aharhots/my_super_file.csv" acl="PublicRead" content-type="text/csv; charset=utf-8" content-disposition="inline"> <list-to-csv> <json expression="$.row"> <xml-to-json> <datastore name="ds_test"> select * from @this; </datastore> </xml-to-json> </json> </list-to-csv> </s3-put> </s3> </var-def>
xpath webharvest tag
<xpath expression="xpath_expression"> body as xml </xpath>
xquery Uses an XQuery language expression to query an XML document.
<xquery> [<xq-param name="xquery_param_name" [type="xquery_param_type"]> body as xquery parameter value </xq-param>] * <xq-expression> body as xquery language construct </xq-expression> </xquery>
xslt Applies XSLT transformation to the XML document
<xslt> <xml> body as xml </xml> <stylesheet> body as xsl </stylesheet> </xslt>
zip Creates a ZIP archive by compressing inner content defined by zip-entry elements. To unzip, use the unzip plugin.
<zip> ... [<zip-entry name="name" charset="charset"> entry content </zip-entry>]* ... </zip>
Accuracy
= (tp + tn) / (tp + tn + fp + fn)
Precision
= tp / (tp + fp) Precision (calculated per class only) shows how exact the classification is: among all the objects the model labeled as red squares, which of them actually belong to this class? It is calculated as follows: P = Correctly classified red squares / Total number of objects classified as red squares (the same for blue circles or any other class).
Recall
= tp / (tp + fn) Recall (calculated per class only) shows how complete the classification is: among all the objects that actually are red squares, how many did the model label as such? It is calculated as follows: R = Correctly classified red squares / Total number of objects that actually are red squares (the same for blue circles or any other class).
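As a minimal sketch of the Accuracy, Precision and Recall formulas above, the following Java class computes them from raw counts. The class name, method names and the sample counts are illustrative assumptions, not part of any WorkFusion API.

```java
/** Illustrative helper (hypothetical names, not a WorkFusion API). */
final class ClassificationMetrics {

    private ClassificationMetrics() { }

    /** Accuracy = (tp + tn) / (tp + tn + fp + fn). */
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    /** Precision = tp / (tp + fp), computed per class. */
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    /** Recall = tp / (tp + fn), computed per class. */
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    public static void main(String[] args) {
        // Assumed counts for the "red square" class, for illustration only.
        int tp = 40, tn = 45, fp = 10, fn = 5;
        System.out.println("accuracy  = " + accuracy(tp, tn, fp, fn));
        System.out.println("precision = " + precision(tp, fp));
        System.out.println("recall    = " + recall(tp, fn));
    }
}
```

The explicit cast to double matters: without it, Java would perform integer division and return 0 for any ratio below 1.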
Classification
A Classification use case is applied when it is necessary to define the class for the item (document). By class, we usually mean different document types. For example invoices, purchase orders and claims are processed in one workflow and each document type is handled differently. That means we need to classify these documents first before applying automation.
Natural language processing/ NLP
A branch of artificial intelligence that deals with the interaction between computers and humans using natural language, such as speech or signing.
OnDestroy
A method annotated with OnDestroy follows these conventions: Called only once when the information extraction process is fully complete. Used to release resources by removing objects/references, that the OnDestroy method is holding. Method is optional.
OnDocumentComplete
A method annotated with OnDocumentComplete follows these conventions: Called only once for a whole Document to release the resources, while the extract() method is called for all Tokens in a Document. Method is optional.
OnDocumentStart
A method annotated with OnDocumentStart follows these conventions: Called only once for each document before the extract() method. Accepts a Document and element type (Token by default) to perform feature extraction. Method is optional. Typical usage: If the feature extraction logic requires analyzing large structures or entire documents for each Token, it is good practice to do pre-calculations on the Document level, and then use the results in the extract() method to improve performance.
OnInit is a method-level annotation for FeatureExtractor used to define a method that should be invoked after the constructor.
A method annotated with OnInit should follow these conventions: Called only once when a FeatureExtractor instance is created. Accepts a Map<String, Object> of parameters. These parameters are context-specific and typically not required for a common Feature Extractor. Accepts a special builder for index caching and type of focus Element, by default — Token. For details, refer to the Indexes and Cache section. Method is optional. A method annotated with OnInit is typically used to initialize regex patterns, add dynamic cached indexes, etc.
Grouping
A model can extract multiple field values of the same type (for example, Currency, Quantity, and Price) that should be joined into a group according to their logical connection. For example, you may need to group Product Name, Amount and Price. Field values can be grouped based on their position in a document or words in a sentence, based on a table line, or based on some underlying custom logic.
Natural Language Generation / NLG
A subfield of artificial intelligence (AI) that produces language as output on the basis of data input.
Information Extraction
An Information Extraction (IE) use case is used when data defined by business logic is taken out (extracted) from documents and processed according to business rules. In terms of Information Extraction, each use case data point is referred to as a "field". For example, invoice number, supplier name and quantity of products have to be extracted from all invoices. It means there are three fields (invoice number, supplier name and quantity) to be extracted from documents in this use case.
Intelligent Character Recognition / ICR
An advanced optical character recognition (OCR) or — rather more specific — handwriting recognition system that allows fonts and different styles of handwriting to be learned by a computer during processing to improve accuracy and recognition levels. Read more: https://en.wikipedia.org/wiki/Intelligent_character_recognition
Deterministic Process
An algorithm, model, procedure, process, etc., whose resulting behavior is entirely determined by its initial state and inputs, and which is not random or stochastic. Processes or projects having only one outcome are said to be deterministic: their outcome is 'pre-determined.' A deterministic algorithm, for example, if given the same input information will always produce the same output information. Read more: http://www.businessdictionary.com/definition/deterministic.html
Robotic Process Automation / RPA
An automation technology that allows software robots to use an application's user interface (UI) to mimic human actions without modifying systems or requiring human intervention.
AutoML
AutoML finds the best ML model for a particular target function (expressed in a Manual Task), trains it, validates it, and finally predicts the target variables. It lets you perform the following actions: search for the best model within the hypermodel space; train and validate models using data from Manual Tasks; compute predictions for unseen input data; visualize prediction quality. Related components: WorkFusion Studio, Control Tower, Object Storage
BIESO is a tagging approach for multi-word named entities where a model tags each token of the named entity with one of the following tags:
B - The first token of a multi-word entity. I - The inner token of a multi-word entity. E - The last token of a multi-word entity. S - A single token entity. O - A non-entity token.
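The BIESO scheme above can be sketched in a few lines of Java. This is an illustration only: the class name, the span representation (start index inclusive, end index exclusive) and the sample sentence are assumptions, and in practice the tags are predicted by a model rather than derived from known spans.

```java
import java.util.Arrays;

/** Illustrative BIESO tagger (hypothetical names, not a WorkFusion API). */
final class BiesoTagger {

    private BiesoTagger() { }

    /**
     * Assigns B/I/E/S/O tags to tokens given entity spans.
     * Each span is {startIndex, endIndexExclusive} in token positions
     * and is assumed to be non-empty and non-overlapping.
     */
    static String[] tag(int tokenCount, int[][] entitySpans) {
        String[] tags = new String[tokenCount];
        Arrays.fill(tags, "O");                 // default: non-entity token
        for (int[] span : entitySpans) {
            int start = span[0];
            int end = span[1];
            if (end - start == 1) {
                tags[start] = "S";              // single-token entity
                continue;
            }
            tags[start] = "B";                  // first token of the entity
            for (int i = start + 1; i < end - 1; i++) {
                tags[i] = "I";                  // inner tokens
            }
            tags[end - 1] = "E";                // last token of the entity
        }
        return tags;
    }

    public static void main(String[] args) {
        String[] tokens = {"Pay", "Acme", "Trading", "Company", "in", "Berlin"};
        // Assumed entities: "Acme Trading Company" (tokens 1..3) and "Berlin" (token 5).
        int[][] spans = {{1, 4}, {5, 6}};
        String[] tags = tag(tokens.length, spans);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + "\t" + tags[i]);
        }
    }
}
```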
BEP
Bot Execution Platform (BEP) is the platform that creates tasks, submits them to queues, routes them to completely isolated workers, and receives the task performance results. It dynamically allocates resources depending on the task load. BEP covers the following functionality: horizontal scaling of workers; Bot execution isolation; reuse of the shared resources (cluster nodes) by both CT and AutoML tasks; task processing fault-tolerance; aggregation of logs and metrics from tasks and workers. Related components: Control Tower, AutoML, Workflow, RPA
mail-check web harvest
Connects to a mail server and checks for new unread emails with a specified subject pattern.
JSON manipulation
Converts a JSON string to an object, searches in JSON, deletes and adds nodes, and changes node values. <json> <json-put> <json-set> <json-add> <json-delete>
conversion
Converts input data to a custom format. <convert-date> <convert-price> <convert-number> <convert-json> <convert-percent>
AutoML's Information Extraction feature solves a variety of problems. The main ones can be narrowed down to the following:
Convert scanned documents into digital text documents. Extract structured content from these documents (account ID, amount, currency, etc.). Provide the structured content to some data store. All these steps need to be automated, that is, performed with little or no human work.
What is deep copy of Business Process?
Deep Copy clones BP definition as well as each Bot and Manual Task to avoid changing original ones +
Managed Services Provider / MSP
Delivers network, application, system and e-management services across a network to multiple enterprises, using a "pay as you go" pricing model. Read more: http://www.gartner.com/it-glossary/msp-management-service-provider
language-extractor
Detects the language of the given website.
Building a supervised ML model usually includes the following steps:
Determine what type of objects should be used for training. In WorkFusion, the most common problems are classification and information extraction. For classification, the object is a pair document-class. For information extraction, an object is a pair gold value-field. Gold value is the information that should be extracted. Collect the training set which represents how the function can be used in the real world. In WorkFusion use cases, we apply a human-in-the-loop approach for supervision and consistent quality. Training sets are usually created by manually labeling the documents for classification and gold values for information extraction. In rare cases, training sets are tagged automatically using the provided historical data. Determine the features that represent the learned function. The model performance quality strongly depends on how the input objects are represented. Typically, the input object is transformed into a feature vector, which contains a number of features describing this object. The number of features should not be too large but should contain enough information to predict objects correctly. In WorkFusion's out-of-the-box models, features are defined automatically while in custom models they are created manually. Determine the structure of the learned function and corresponding learning algorithm. Complete the development phase (not required with WorkFusion out-of-the-box models). Train the learning algorithm on the available training set and produce a model. Evaluate the performance of the learned model. Following the training process, we need to evaluate the performance of the resulting model. In order to ensure the objectivity of evaluation, we recommend using a test set that was not part of the training set.
Dictionary NER Annotators
Dictionary NER Annotator creates Named Entity Elements based on word lists from dictionaries. AutoML SDK lets you reuse the out-of-the-box dictionary-matching algorithm, Aho-Corasick, to add and configure a dictionary for each Named Entity mention type that should be annotated. To configure a dictionary NER Annotator, follow these steps: Create a dictionary file, and then put it to the [project-dir]/ main/resources/ project folder. Define a dictionary reader. Add a NER Annotator.
Virtual Desktop Infrastructure / VDI
Environment that allows an organization's information technology professionals to centrally manage thin client machines, leading to a mutually beneficial experience for both end users and IT administrators. Read more: http://www.tricerat.com/resources/topics-library/virtual-desktop-infrastructure-vdi
Proof of Concept / POC
Evidence that establishes an idea, invention, process, or business model is feasible. Read more: http://www.businessdictionary.com/definition/proof-of-concept.html
Data Scientist is responsible for making ML environment work properly:
False
In WorkFusion, training of new ML models is done in two stages:
1. Hyper Parameter Optimization (HPO) finds the best parameters/Metamodel for the supplied training set. 2. Model training itself uses the Metamodel to train the ultimate model.
Business Operations Automation
Is related to the technology-enabled automation of complex business processes. It aims to streamline a business for simplicity, achieve digital transformation, improve service quality & delivery, and reduce costs.
Control Tower is the central management web application that orchestrates the automation of processes and tasks and provides a user interface for power users, administrators, and developers. It lets you perform the following actions:
Manage Automation Processes; create and edit workflows using GUI tools; run and monitor automations; handle input/output data; manage users and roles; manage Manual Tasks; manage Workforce; configure all components of the system.
Nexus access
NEXUS - https://repository.workfusion.com/content/repositories/ml-sdk/ login: odf-user password: Workfusion!5
NER Annotators
Named entity recognition (NER) is a subtask of information extraction that searches for words or patterns in the input text, and then classifies Named Entities in text into pre-defined types such as names of persons, organizations, locations, etc. When a NER Annotator finds a NER mention in a text, it labels the mention with its type.
Deep Neural Network / DNN
Neural networks help us cluster and classify. You can think of them as a clustering and classification layer on top of data you store and manage. Think of deep neural networks as components of larger machine-learning applications involving algorithms for reinforcement learning, classification, and regression. Read more: https://deeplearning4j.org/neuralnet-overview
Is everyone who has access to the Control Tower able to delete existing credentials in the Secrets Vault?
No
What is nested levels limit in WorkFusion Business Process
No Limit
What is the limit of bot steps in Intelligent Automation Cloud Business?
No Limit
OCR
OCR engine processes scanned documents: converts unstructured, image-based data into structured character documents in several formats (html, xml). Related components: WorkFusion Studio, Control Tower, WorkSpace, Object Storage.
What is the recommended way to extract human-typed data from scanned PDF documents?
OCR will fail and Manual Task with field-per-value should help with extraction
Object Storage
Object Storage is a reliable storage of OCR results and trained AutoML models. It keeps binary file type content, such as OCR results, AutoML models, screenshots, and reports generated by automation. Related components: WorkFusion Studio, Workflow, OCR, AutoML.
What types of automations are able to be implemented using Bot Tasks?
Object automation; RESTful API integration; web scraping; surface automation; direct interactions with a customer database; storing the calculated data in WorkFusion internal storage
alexa
Obtains Alexa rank of the given website.
Full Time Equivalent / FTE
One FTE is equivalent to one employee working full-time. Read more: http://www.businessdictionary.com/definition/full-time-equivalent-FTE.html
What is a correct type of use case for invoice tagging Manual Task
Other / Miscellaneous Tasks
If Gold contains 100 not empty values and 0 empty values, the model extracted 80 and 60 of them were extracted correctly, the statistics will look as follows:
Precision 75%, Recall 60% +
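The arithmetic behind this answer can be checked with a short Java sketch: precision divides the correct extractions by everything the model extracted (60 / 80), while recall divides them by the non-empty gold values (60 / 100). The class and method names are hypothetical and used for illustration only.

```java
/** Illustrative helper (hypothetical names, not a WorkFusion API). */
final class ExtractionStats {

    private ExtractionStats() { }

    /** Share of extracted values that are correct, in percent. */
    static double precisionPercent(int correct, int extracted) {
        return 100.0 * correct / extracted;
    }

    /** Share of non-empty gold values that were extracted correctly, in percent. */
    static double recallPercent(int correct, int goldNonEmpty) {
        return 100.0 * correct / goldNonEmpty;
    }

    public static void main(String[] args) {
        int goldNonEmpty = 100; // non-empty gold values
        int extracted = 80;     // values the model extracted
        int correct = 60;       // extracted values that match gold

        System.out.println("Precision: " + precisionPercent(correct, extracted) + "%");
        System.out.println("Recall: " + recallPercent(correct, goldNonEmpty) + "%");
    }
}
```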
Is it possible to see a Groovy variable value in runtime on a WorkFusion instance?
b. Yes +
Secrets Vault webharvest
Provide functionality to manage Secrets Vault. <secrets-vault-put> <secrets-vault-get> <secrets-vault-delete> <secrets-vault-update> <secrets-vault-reset>
In order to check whether SMEs correctly understand what should be tagged and where and can do it with a required level of accuracy, DA usually uses:
Qualification Task +
excel to-from list
Reads data from xlsx or xls formats and writes to List<Map<String, Object>> [row:[columnName:value]] and vice-versa. <excel-to-list> <list-to-excel>
Reconciliations
Reconciliation is a process that uses two sets of records to ensure figures are accurate and in agreement. Reconciliation is the key process used to determine whether the money leaving an account matches the amount spent, ensuring the two values are balanced at the end of the recording period. Read more: http://www.investopedia.com/terms/r/reconciliation.asp#ixzz4dJv4BqpK
There are two WorkFusion Perspectives
Recorder and Code
How to show Information Extraction task with previously extracted values
Refer document with tags in Default Value field as ${question.data['document_variable']}
Order to Cash / O2C
Refers to one of the top-level (context level) business processes for receiving and processing customer orders. Read more: https://en.wikipedia.org/wiki/Order_to_cash
Regex Annotators
Regex Annotator detects email addresses, URLs, phone numbers, zip codes, IBANs, CUSIP numbers, or any other entity that can be identified using a regular expression.
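To illustrate the idea behind regex-based detection, here is a sketch that uses only the standard java.util.regex package. It is not the AutoML SDK Regex Annotator API; the class name and both patterns are simplified assumptions for demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative regex-based entity finder (hypothetical, not the SDK annotator). */
final class RegexEntityFinder {

    // Simplified patterns for illustration; production patterns are usually stricter.
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    static final Pattern US_ZIP =
            Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    private RegexEntityFinder() { }

    /** Returns every substring of the text that matches the pattern, in order. */
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Contact billing@example.com, New York, NY 10001-1234.";
        System.out.println("emails: " + findAll(EMAIL, text));
        System.out.println("zips:   " + findAll(US_ZIP, text));
    }
}
```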
What is used In Intelligent Automation Cloud for storing secure data?
Secrets Vault
What is the purpose of Secrets Vault?
Secrets Vault is functionality designed for storing secure data in the (login, password, alias) format +
What is the key difference between RDA and RPA
The key difference is that RDA is always initiated by a person and assists in real time, whereas in RPA, people are only involved for exception handling. A good example of where RDA can be applied is a call center. The operator needs to collect the required data as soon as possible, so she or he triggers an RDA bot, which automatically fills in information about a customer after the agent takes a call and might build a transaction report. Then the operator can continue handling the request.
Business Intelligence / BI
Technologies, applications and practices for the collection, integration, analysis, and presentation of business information. The purpose of Business Intelligence is to support better business decision making. Read more: http://olap.com/learn-bi-olap/olap-bi-definitions/business-intelligence/
Task
The smallest identifiable and essential piece of a job that serves as a unit of work, and as a means of differentiating between the various components of a project. Read more: http://www.businessdictionary.com/definition/task.html
Binary classification
The task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not.
Sending Email with HTML Body
To send emails from Bot configs, use the standard Web-Harvest <mail> plugin: Do not forget to enter your SMTP server settings (host, port, username, password, security) and email type (text or html). To insert a variable value into mail body, use the <var> plugin with a respective name attribute. If you want to insert HTML tags into mail body, surround them with CDATA section(s). <?xml version="1.0" encoding="UTF-8"?> <config> <var-def name="acronym"> SPA </var-def> <mail smtp-host="smtp.example.com" smtp-port="587" type="html" from="[email protected]" to="[email protected]" subject="Some test subject" charset="UTF-8" username="[email protected]" password="super_secure_password" security="ssl"> Using HTML content in your email: <![CDATA[ <p><a href="https://workfusion.com">WorkFusion Inc.</a> is providing <strong>SPA*</strong></p> ]]> <![CDATA[<hr><em>]]> *<var name="acronym"/> - Smart Process Automation <![CDATA[</em>]]> </mail> </config>
Running (HPO) Training from Control Tower
To start model (HPO) training from CT, the following steps are taken: Create a manual task selecting ML Model, BP Use Case and Training Set (assuming training set is ready and uploaded as Automation Training Set to CT) in the Automation options: Creating a Manual Task. Run and stop the task to start model evaluation: Starting a Training Process. Note that to see the training BP, you need to select Automation Training filter in Business Processes view — by default, training processes are hidden. Once training is finished, you will see an AUTOMATION AVAILABLE label on the manual task. Then you can click on the label, set accuracy threshold, and apply recommendation to create a process with Automation sub-process in it: Setting Up an Automation Chart. Now you can run the Automation process with the trained model extracting data: Running a Cognitive Business Process.
Document type:
Type of document that is processed in the Use Case depending on the industry. For example: statement of value, invoice, email, purchase order, loan booking, reconciliation, annual reports, etc. Usually in the Information Extraction (IE) use case, there is one document type, and in the Classification use case, there are several. A DA needs to consider the quantity of document types, their logic, and representativeness.
How to determine if a process is fit for Intelligent Automation
Unstructured Data, Judgement-Based, and Non-Digital Data
How to link Information Extraction Manual Task with data to show for extraction?
Use Unique ID field in Answer. Type exact variable name from Input Data or previous step. +
WorkFusion Studio
WorkFusion Studio is an Eclipse-based IDE used to develop Bot configurations. It lets you perform the following actions: create, edit, and publish Bot configurations; create Bot Config Bundles; perform web scraping and ETL; use AutoML SDK; record user workflows; inspect desktop applications.
Supervised learning
analyzes the training data and produces an inferred function. The training data consist of a set of training examples. Each example is a pair consisting of an input object and a desired output value. For example, logistic regression, decision tree, naive bayes classifier, support vector machine, bagging (e.g. random forest), some types of neural networks.
Post-Processing
applies normalization logic and field grouping.
Feature weights
are coefficients that show the importance of each feature and the strength of its impact on a classification decision. Class function returns 1 or -1 that defines which class is assigned to the object.
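A minimal sketch of how feature weights drive a class function that returns 1 or -1: the weighted sum of the feature values plus a bias term is computed, and its sign selects the class. The class name, the weights and the bias below are made-up values for illustration (a trained model would learn them), and mapping a score of exactly zero to class 1 is an arbitrary assumption.

```java
/** Illustrative linear class function (hypothetical names and values). */
final class LinearClassifier {

    private final double[] weights;
    private final double bias;

    LinearClassifier(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    /** Returns 1 or -1 depending on the sign of (weights · features + bias). */
    int classify(double[] features) {
        double score = bias;
        for (int i = 0; i < weights.length; i++) {
            // A larger absolute weight means the feature has a stronger impact.
            score += weights[i] * features[i];
        }
        return score >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // Assumed weights: the first feature pushes towards class 1, the second towards -1.
        LinearClassifier classifier = new LinearClassifier(new double[] {2.0, -0.5}, -1.0);
        System.out.println(classifier.classify(new double[] {1.0, 0.0}));
        System.out.println(classifier.classify(new double[] {0.0, 4.0}));
    }
}
```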
Facilities and transportation costs
are those that cover mortgage or rent for office space, heating, light, power for offices and equipment, communications (e.g. phones, mobile phones, and tablets), facilities insurance, and transportation for workers if provided by the business.
tabnumber
attribute showing that the tag belongs to some group of tags and the number of this group. We will give more details about this attribute in TABLES PROCESSING.
Fields
attributes that should be extracted from the documents (for example date, email, company name, company address). Each tagged value is an object for some particular field. It is surrounded by a special tag with special attributes. This text and the tag are called gold tag or a gold value.
What types of Business Process schedules are available in Control Tower?
b. Based on same Input Data file and configured time table + d. Triggered by change of file stored in S3 file storage +
What are the possible ways for designing the Bot Task?
b. Code and run in the Bot Task on the WorkFusion instance + d. Code and debug in WorkFusion Studio, then manually transfer the code to the Bot Task in the WorkFusion instance +
Data Analyst is responsible for the development of BP:
b. False +
What are the possible ways for obtaining details about the exception in the WorkFusion instance?
b. In order to view the exception, you need to click the problematic Bot Task in the View Results tab and then click on the information icon +
Does WorkFusion provide an internal framework to rapidly build and deploy RESTful APIs?
b. No +
The developer fetches credentials from Secrets Vault and is required to do the following:
b. Secrets Vault is the secureEntryMap which contain decrypted key/value pairs. There is no need to decrypt it in the Bot +
What are the possible ways to correct an exception on a WorkFusion instance?
b. The developer can open the Bot Task, change the code, save, and then select Repair from the Actions menu + c. The developer can stop the current execution of the Business Process, fix the input data, make a new copy of the Business Process, and run it +
What is the behavior of the WorkFusion server instance if there is a Bot Task code exception?
b. WorkFusion tries to repair (re-execute) the Bot Task multiple times (the number has been set to 10 by default). Then, if all attempts have failed, it stops and throws an exception only for those records that failed to be processed by the Bot Task +
Can transaction mechanism be used in Bot Tasks for external database connection?
b. Yes +
What must be done in order to pass the Groovy variable to the next step of the Business Process?
c. Add the variable call to the export tag +
What way is used to add JARs that do not form part of the WorkFusion Control Tower?
c. By using the Bot code, you can download a JAR from the web and add it to the classpath with this.getClass().classLoader.add +
What are the possible configurations for exception handling by the WorkFusion server?
c. It's possible to configure the default number of repair attempts via the UI for the whole instance +
What is the space limit for the S3 Emulator?
c. Limited to the configured S3 Emulator cluster disk space +
What is the recommended way to store files in a Business Process?
c. S3 Emulator + d. Amazon S3 +
What tools are used for OCR in WorkFusion?
c. Tesseract + d. WorkFusion extension of Abbyy OCR +
What is the WorkFusion technologies stack on the server that executes the Bot Task?
c. Ubuntu+Java+WebHarvest+Beanshell + e. CentOS+Tomcat+Java+WebHarvest+Groovy +
While working on post-processing, MLE should (check everything that applies):
describe a post-processing logic together with DA + implement post-processing +
Steps to creating a manual task
https://doc.workfusion.com/business/docs/iac-business/control-tower/create-new-manual-task/
Confusion matrix read up on it
https://en.wikipedia.org/wiki/Confusion_matrix
AutoML Glossary pick a random set to study and add it separately
https://kb.workfusion.com/display/VDS/AutoML+Glossary
Information Extraction training set tags attributes
https://kb.workfusion.com/display/VDS/Information+Extraction+training+set+tags+attributes
Model Training
https://kb.workfusion.com/display/VDS/Model+Training#tab-SPA+10.x
You can do IE through Manual tasks
https://kb.workfusion.com/display/VDS/Step+1.+Create+Automation+Manual+Task?src=contextnavpagetreemode
Accessing REST API and parsing JSON response
https://kb.workfusion.com/display/WF/Accessing+REST+API+and+parsing+JSON+response
Bot task context with objects
https://kb.workfusion.com/display/WF/Bot+Tasks+Context or http://web-harvest.sourceforge.net/doc/org/webharvest/utils/SystemUtilities.html
HTML to XML Transformation and XPath Selectors
https://kb.workfusion.com/display/WF/HTML+to+XML+Transformation+and+XPath+Selectors
Loading external JARs
https://kb.workfusion.com/display/WF/Loading+external+JARs
Run and Debug Bots with RPA Worker
https://kb.workfusion.com/display/WF/Run+and+Debug+Bots+with+RPA+Worker
Download ZIP from S3 and extract content This is a universal example - an archive can contain files and nested folders.
https://kb.workfusion.com/display/WF/S3+Advanced+Usage
Secrets+Vault+Plugins go over a random secrets vault tag and try it out
https://kb.workfusion.com/display/WF/Secrets+Vault+Plugins
Standard webharvest processing
https://kb.workfusion.com/display/WF/Standard+Web-Harvest+processors#StandardWebHarvestprocessors-config
AutoML Runthrough
https://kb.workfusion.com/pages/viewpage.action?spaceKey=VDS&title=AutoML
Actually run these examples
https://kb.workfusion.com/pages/viewpage.action?spaceKey=WF&title=Bot+Task+Examples#BotTaskExamples-StoreFilesinS3Storage
Troubleshooting in WorkFusion Studio
https://kb.workfusion.com/pages/viewpage.action?spaceKey=WF&title=Troubleshooting+in+WorkFusion+Studio
Work Fusion Repo
https://kb.workfusion.com/pages/viewpage.action?spaceKey=WF&title=WorkFusion+Repository
The requirement of completion means:
if a field is present in the document, all the tokens belonging to the field must be tagged
Implementing Annotator Interface All custom Annotator classes should implement an Annotator interface, as in the following example.
import com.workfusion.vds.sdk.api.nlp.annotator.Annotator;
import com.workfusion.vds.sdk.api.nlp.model.IeDocument;

public class MyAnnotator implements Annotator<IeDocument> {

    @Override
    public void process(IeDocument document) {
        // TODO put your code here
    }
}
Extract The extract() method applies the main algorithm for feature extraction which performs the analysis of the provided Document structure, and then extracts features for a specified Token element. The extract() method is called for each Token and returns a list of features. The extraction process is parallel for all fields in a Document. The following Feature Extractor produces a feature when a Token contains only digits.
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import com.workfusion.vds.sdk.api.nlp.fe.Feature;
import com.workfusion.vds.sdk.api.nlp.fe.FeatureExtractor;
import com.workfusion.vds.sdk.api.nlp.model.Document;
import com.workfusion.vds.sdk.api.nlp.model.Element;

public class DigitsOnlyFE<T extends Element> implements FeatureExtractor<T> {

    private static final String FEATURE_NAME = "DigitsOnly";

    @Override
    public Collection<Feature> extract(Document document, T element) {
        List<Feature> features = new ArrayList<>();
        // Produce the feature only when the element text consists solely of digits
        if (element.getText().matches("[0-9]+")) {
            features.add(new Feature(FEATURE_NAME, 1.0));
        }
        return features;
    }
}
Example of using a dictionary annotator
import java.util.ArrayList;
import java.util.List;

import com.workfusion.vds.sdk.api.hypermodel.annotation.ModelConfiguration;
import com.workfusion.vds.sdk.api.hypermodel.annotation.Named;
import com.workfusion.vds.sdk.api.nlp.annotator.Annotator;
import com.workfusion.vds.sdk.api.nlp.configuration.IeConfigurationContext;
import com.workfusion.vds.sdk.nlp.component.annotator.EntityBoundaryAnnotator;
import com.workfusion.vds.sdk.nlp.component.annotator.ner.AhoCorasickDictionaryNerAnnotator;
import com.workfusion.vds.sdk.nlp.component.annotator.tokenizer.SplitterTokenAnnotator;
import com.workfusion.vds.sdk.nlp.component.dictionary.CsvDictionaryKeywordProvider;

@ModelConfiguration
public class MyModelConfiguration {

    @Named("annotators")
    public List<Annotator> annotators(IeConfigurationContext context) {
        List<Annotator> annotators = new ArrayList<>();

        // Adding a Tokenizer
        annotators.add(new SplitterTokenAnnotator("\\W+"));

        // Adding an Entity Boundary Annotator
        annotators.add(new EntityBoundaryAnnotator());

        // Adding a dictionary NER Annotator for the "country" mention type
        annotators.add(new AhoCorasickDictionaryNerAnnotator("country",
                // Provider is used to read the CSV dictionary file from classpath
                new CsvDictionaryKeywordProvider(context.getResource("classpath:dictionary/countries.csv"))));

        return annotators;
    }
}
Technology infrastructure costs
include network infrastructure, servers, operating system and database licensing costs, communication software costs, core system licensing for ERP, CRM, and similar systems, and other technology elements required for transaction processing.
Machine learning
is a branch of computer science that applies various techniques to give the computer an ability to learn from data without a significant need for programming.
Information Extraction
is a process of extracting structured information (or key facts) from unstructured and/or semi-structured documents (invoices, claims, dividend news, etc.).
Business Process Management (BPM)
is a systematic approach to making an organization's workflow and processes more effective, more efficient and more capable of adapting and scaling. RPA is also about making more effective, scalable processes
Application Development Automation
is a technology/methodology that increases development process speed, limits knowledge dissipation, ensures build quality and does not require developers to perform a large number of manual actions.
Entity Boundary Annotator
is an optional Annotator which creates Elements of type Entity Boundary based on text chunks between the specific boundary HTML tags.
Checker model
is an approach that tries to identify incorrect values by training a separate meta-model, which is then used to post-process (that is, validate and remove) incorrectly predicted values produced by the original model. The estimated accuracy model is then trained taking this information into account.
Overfitting happens when
some boundaries are based on distinctions that don't make a difference. You can see if a model overfits by having test data flow through the model.
IT Automation,
sometimes referred to as infrastructure automation, is the use of software to create repeatable instructions and processes to replace or reduce human interaction with IT systems.
Annotator
splits text into tokens, adds boundary elements, adds NERs.
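The tokenizing step of an Annotator can be illustrated with plain Java. This is a minimal sketch, not WorkFusion SDK code: the class name `TokenizerSketch` and the method `tokenize` are hypothetical, and it assumes the `\\W+` pattern is used as a delimiter between tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of tokenization; not part of any SDK.
class TokenizerSketch {

    // Splits text on runs of non-word characters and drops empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("\\W+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Invoice total: 120 USD"));
    }
}
```

Later steps, such as adding boundary elements or NERs, would operate on tokens like these.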
Machine learning identifies patterns using
statistical learning and computers by unearthing boundaries in data sets. You can use it to make predictions.
SME means
subject matter expert
In order to calculate Accuracy using Confusion Matrix, we:
sum all the objects in the main diagonal and divide this sum by the total number of objects +
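The accuracy calculation described above can be sketched in a few lines. The class and method names are hypothetical, and the sketch assumes a square matrix in which rows are actual classes and columns are predicted classes.

```java
// Hypothetical helper: accuracy = sum of the main diagonal / total number of objects.
class ConfusionMatrixAccuracy {

    static double accuracy(int[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                total += matrix[i][j];
                if (i == j) {
                    correct += matrix[i][j]; // main diagonal holds correct predictions
                }
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("Confusion matrix contains no objects");
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        int[][] matrix = {
            {50, 10},
            {5, 35}
        };
        System.out.println(accuracy(matrix));
    }
}
```

In the sample matrix, 85 of the 100 objects lie on the main diagonal.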
In its practice WorkFusion mainly uses:
supervised learning +
Two main types of machine learning are:
supervised learning and unsupervised learning +
The requirement of diversity means:
the training set must contain all the possible layouts of documents and documents distribution across the layouts must be close to reality
Objects in SVM are classified on the basis of:
their positions in n-dimensional feature space +
The function that divides objects into two classes in n-dimensional space is built during:
training process +
For Support Vector Machine, the role of source of features plays:
training set +
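The SVM cards above can be made concrete with a small sketch of a linear decision function. It assumes the simplest case, a linear boundary, with weights `w` and bias `b` that training would normally produce; here they are hypothetical values supplied by hand, and no training is performed.

```java
// Hypothetical illustration: classify a point by which side of the
// hyperplane w·x + b = 0 it falls on in n-dimensional feature space.
class LinearDecisionSketch {

    static int classify(double[] w, double b, double[] x) {
        if (w.length != x.length) {
            throw new IllegalArgumentException("Weight and feature vectors differ in length");
        }
        double score = b;
        for (int i = 0; i < w.length; i++) {
            score += w[i] * x[i];
        }
        return score >= 0 ? 1 : -1; // two classes: +1 and -1
    }

    public static void main(String[] args) {
        double[] w = {1.0, -1.0}; // assumed to come from a training process
        double b = 0.0;
        System.out.println(classify(w, b, new double[] {2.0, 1.0}));
        System.out.println(classify(w, b, new double[] {1.0, 3.0}));
    }
}
```

The object's position (its feature vector) alone determines the class once the dividing function is fixed.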
In Information Extraction, false positives are:
values for which gold is empty and that were extracted as not empty +
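Counting false positives under this definition can be sketched as follows. The names are hypothetical, and the sketch assumes gold and extracted values are aligned lists in which a null or blank string means "empty".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper following the card's definition:
// a false positive is a value whose gold is empty but whose extraction is not.
class FalsePositiveCounter {

    static boolean isEmpty(String value) {
        return value == null || value.trim().isEmpty();
    }

    static int countFalsePositives(List<String> gold, List<String> extracted) {
        if (gold.size() != extracted.size()) {
            throw new IllegalArgumentException("Gold and extracted lists must be aligned");
        }
        int count = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (isEmpty(gold.get(i)) && !isEmpty(extracted.get(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> gold = Arrays.asList("", "ACME", null);
        List<String> extracted = Arrays.asList("Foo", "ACME", "");
        System.out.println(countFalsePositives(gold, extracted));
    }
}
```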
key areas with the best RPA opportunities are:
1. Human resources: The entire hire-to-retire spectrum is ripe with opportunity. This is because HR tends to require information management and standardization across a large variety of systems and applications. Think of a talent acquisition process, for example, which might involve several applications itself, ranging from personnel databases and file managers to salary and compensation analysis tools.
2. Finance and accounting: This area has vast opportunities as the processes are also rule-based, requiring high degrees of accuracy and speed, which RPA is designed for. Some common areas are order management, billing, regulatory compliance, accounting and reconciliation.
3. Procurement: This is a prime area thanks to the structured documents and data. Examples are invoice processing, spend data management, purchase order management, and contract management, which often involve communicating and validating data across different databases and ordering systems.
The Six pillars of Automation Strategy
1. Process Excellence: Processes are both efficient and effective. Excellent processes undergo regular improvement and minimize rework and waste.
2. Automation Governance: Prescribes the monitoring, management, enhancement, production hardening and re-architecting of systems under automation.
3. Data Governance: Management principles, access and usage decision rights, and stewardship for managing and securing the organization's data resources.
4. Automation Applications Governance: Selection of the right tools for the right job is an important part of the strategy. Application governance helps select, introduce and manage technology tools through the initiative.
5. Technical Skills: Hiring, training and technically supporting the architecture, design, coding, testing and deployment professionals necessary for automation success.
6. Sustainable deployment: Maintaining automated applications, including the software robots, tools, technology and documentation to keep applications operating correctly and efficiently.
Three things to see if something is a candidate for RPA
1. Check the inputs and outputs involved in a process. Processes with structured, standardized inputs are more suitable than processes that use unstructured ones. Examples of structured inputs are spreadsheets, databases, JSON files, CSV files, XML and other electronic feeds. Because of their digital and structured nature, they're a better fit for RPA than scanned documents, faxes, handwritten forms or emails.
2. Look for routine and rule-based processes. A process where employees follow a strict set of rules is more suitable for RPA than one that includes decisions open to employee judgment. If you think your process is too complicated, try to break it into its smallest constituent parts. You will find that much of the work in an organization can be classified as highly rule-based and therefore highly automatable.
3. Consider the type of data in a process. RPA works best with data in the form of text and numbers because it can guarantee accuracy when programmed correctly. Image interpretation is possible in some cases, but it's usually better to use Intelligent Automation solutions when image recognition is required.
Record to Report / R2R
A Finance and Accounting (F&A) management process which involves collecting, processing and delivering relevant, timely and accurate information. It provides strategic, financial and operational feedback on how a business is performing. Stakeholders read the feedback and gain insights into whether an organization is performing successfully or not, and if their expectations have been met. Read more: https://www.invensis.net/blog/finance-and-accounting/what-is-record-to-report/
Cognitive Automation
A capability of software robots to use AI to mimic human judgment in business operations without specifying rules. Cognitive automation is a knowledge-based technology. Here, the machine goes through several human-like behaviors to understand how humans behave and define its own rules. It works with unstructured data, can build relationships and find similarities between the items by learning.
Single Sign-on / SSO
A capability wherein a user logs in to one Client and is then signed in to other Clients automatically, regardless of the platform, technology, or domain. Google's implementation of login for its products (including Gmail, YouTube, Google Analytics, and so on) is an example of SSO: Any user logged in to one of Google's products is automatically logged in to the other products as well. Read more: https://auth0.com/docs/sso
General Ledger / GL
A company's set of numbered accounts for its accounting records. The ledger provides a complete record of financial transactions over the life of the company. The ledger holds account information that is needed to prepare financial statements and includes accounts for assets, liabilities, owners' equity, revenues, and expenses. Read more: http://www.investopedia.com/terms/g/generalledger.asp#ixzz4dJJUGrRW
Digital Transformation
A complete re-examination and adjustment of business and organizational processes, systems, and models to fully leverage the changes and opportunities allowed by digital technologies. The key drivers are optimizing the end-to-end customer experience, gaining operational flexibility, and embracing innovation.
Virtual Customer Assistant / VCA
A computer program that assists customers. The fact that it's a computer program, at least partially filling the role of a live agent, is what makes it "virtual." The fact that it's described as an "assistant" is important, too. VCAs are not expected to fully resolve customer issues. More often, they're expected to assist customers by giving them a push in the right direction. Read more: http://www.softwareadvice.com/resources/beginners-guide-to-virtual-customer-assistants/
Service Level Agreement / SLA
A contract between a service provider (either internal or external) and the end user that defines the level of service expected from the service provider. SLAs are output-based in that their purpose is specifically to define what the customer will receive. Read more: https://www.paloaltonetworks.com/cyberpedia/what-is-a-service-level-agreement-sla
Master Services Agreement / MSA
A contract that spells out most but not all of the terms between the signing parties. Its purpose is to speed up and simplify future contracts. Ideally the initial time-consuming negotiation is done once, at the beginning. Read more: http://smallbusiness.chron.com/definition-master-services-agreement-40141.html
Virtual Personal Assistant / VPA
A conversational, computer-generated character that simulates a conversation to deliver voice- or text-based information to a user via a Web, kiosk or mobile interface. A VA incorporates natural-language processing, dialogue control, domain knowledge and a visual appearance (such as photos or animation) that changes according to the content and context of the dialogue. The primary interaction methods are text-to-text, text-to-speech, speech-to-text and speech-to-speech. Read more: http://www.gartner.com/it-glossary/virtual-assistant-va/
eXtensible Markup Language / XML
A markup language wherein users can create their own tags. It was created by the World Wide Web Consortium (W3C) to overcome the limitations of HTML, the Hypertext Markup Language that is the basis for all Web pages. Like HTML, XML is based on SGML -- Standard Generalized Markup Language. Although SGML has been used in the publishing industry for decades, its perceived complexity intimidated many people who otherwise might have used it. (A common joke is that SGML also stands for "Sounds great, maybe later.") XML was designed with the Web in mind. Read more: https://www.ibm.com/developerworks/xml/tutorials/xmlintro/xmlintro.html
Simple Object Access Protocol / SOAP
A messaging protocol that allows programs that run on disparate operating systems (such as Windows and Linux) to communicate using Hypertext Transfer Protocol (HTTP) and its Extensible Markup Language (XML). Read more: http://searchmicroservices.techtarget.com/definition/SOAP-Simple-Object-Access-Protocol
Human in the Loop / HitL
A model that requires human interaction. HitL readily allows for automated systems to seamlessly interact with people in the course of processing. Read more: https://en.wikipedia.org/wiki/Human-in-the-loop
Out of the Box / OOTB
A ready-made software, hardware, or combination package that meets a need that would otherwise require a special development effort. Read more: http://www.urbandictionary.com/define.php?term=out%20of%20the%20box
Process
A sequence of interdependent and linked procedures which, at every stage, consume one or more resources (employee time, energy, machines, money) to convert inputs (data, material, parts, etc.) into outputs. These outputs then serve as inputs for the next stage until a known goal or result is reached. Read more: http://www.businessdictionary.com/definition/process.html
Business Process Improvement / BPI
A strategic planning methodology aimed at identifying the operations or employee skills that could be improved to encourage smoother procedures, more efficient workflow, and overall business growth. This process can also be referred to as functional process improvement. Read more: http://searchcio.techtarget.com/definition/business-process-improvement-BPI
Business Continuity Plan / BCP
A strategy that recognizes threats and risks facing a company, to ensure that personnel and assets are protected and able to function in the event of a disaster. Read more: http://www.investopedia.com/terms/b/business-continuity-planning.asp#ixzz4dIzWnZdo
Machine Learning / ML
A study of algorithms and statistical models that allows software applications to learn from experience and become more accurate in predicting outcomes without being explicitly programmed. It provides software the capacity to effectively perform a specific task relying on patterns, i.e. to modify its processing on the basis of newly acquired information.
Version Control System / VCS
A system that records changes to a file or set of files over time so that you can recall specific versions later. Read more: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Center of Excellence / CoE
A team, a shared facility or an entity that provides leadership, best practices, research, support and/or training for a focus area.
Optical character recognition / OCR
A technology that allows converting different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera into editable and searchable data.
Interactive Voice Response / IVR
A telephony technology in which someone uses a touch-tone telephone to interact with a database to acquire information from or enter data into the database. IVR technology does not require human interaction over the telephone as the user's interaction with the database is pre-determined by what the IVR system will allow the user access to. Read more: http://www.webopedia.com/TERM/I/IVR.html
Shared Services / SS
A term defining an operational philosophy that involves centralizing those administrative functions of a company that were once performed in separate divisions or locations. Services that can be shared among the various business units of a company include finance, purchasing, inventory, payroll, hiring, and information technology. Read more: http://www.inc.com/encyclopedia/shared-services.html
Full-time equivalent / FTE
A unit that indicates the workload of an employee. It is used to measure a worker's involvement in a project or to track cost reductions in an organization. 1 FTE is equivalent to one employee working full-time.
Dynamic Case Management / DCM
A variant of Business Process Management focused on handling case-based, unstructured workflows such as insurance claim handling.
Representational State Transfer / REST
An architectural style, and an approach to communications that is often used in the development of Web services. The use of REST is often preferred over the more heavyweight SOAP (Simple Object Access Protocol) style because REST does not leverage as much bandwidth, which makes it a better fit for use over the Internet. Read more: http://searchmicroservices.techtarget.com/definition/REST-representational-state-transfer
Robotic Desktop Automation / RDA
An automation technology that facilitates human and bot collaboration. Bots respond to employee-triggered actions by automatically completing certain tasks to simplify the work routine.
Bot
An autonomous program which does automated tasks and can interact with systems or users.
Total Cost of Ownership / TCO
An estimate of all the direct and indirect costs involved in acquiring and operating a product or system over its lifetime.
Subject-matter expert / SME
An individual with a deep understanding of a particular business process, function, technology, machine, material or type of equipment.
Center of Excellence (COE) is an in-house entity that provides capabilities for a specific area of focus. The COE topic is huge, but let's specifically highlight some worthy aspects of an automation COE, what advantages it brings, what it actually does, and how it operates.
Automation Center of Expertise: Offered to line of business organizations, the focus is on training and enablement of automation development and implementation teams. Its primary goals include dissemination of best practices and building common skills across automation teams, to ensure success. The tool focus is business-friendly rather than enterprise- and IT-friendly.
Automation Center of Excellence: Offered to the line of business, the true Center of Excellence focuses on strategic projects. The focus is on methodology, best practices and integration with re-engineering projects. The technology suite includes process-level, enterprise-oriented, and IT-friendly tools necessary to succeed in large-scale strategic projects.
Automation Factory: Another line of business solution, Automation Factories deliver quick, efficient automation results. The key driver for this form is high demand for expertise and many projects. The tooling is process-level, using Intelligent Automation tools.
Automation Community: Unlike the previous CoE offerings, the automation community is a Community of Interest sharing information, expertise and best practices across lines of business. It is "Automation teams helping Automation teams." Accordingly, the focus is on tactical and departmental projects. With laissez-faire governance and standards compliance, the automation community of interest features "ad hoc" demand management. Technology is task-level and no-code.
End User Developed Applications / EUDA
Applications built with tools that allow end users (people who are not professional software developers) to program computers. People can use EUD tools to create or modify software artifacts (descriptions of automated behavior) and complex data objects without significant knowledge of a programming language. Read more: https://en.wikipedia.org/wiki/End-user_development
Managerial Information / MI
Gives managers feedback about their own performance; top management can monitor the company as a whole. Information displayed by the Management Information System (MIS) typically shows "actual" data against "planned" results and results from a year before; thus it measures progress against goals.
Business Process Services / BPS
Managing and executing business operations within the scope of BPO.
Work Digitization
Optical character recognition (OCR) digitizes image-based files, making them machine-readable. Paired with smart software bots and people, operations teams can automate high volume, document-intensive processes from end to end.
Supply Chain Management / SCM
Oversight of materials, information, and finances as they move in a process from supplier to manufacturer to wholesaler to retailer to consumer. Supply chain management involves coordinating and integrating these flows both within and among companies. Read more: http://searchmanufacturingerp.techtarget.com/definition/supply-chain-management
Proof of Value / PoV
Proof of Value builds on Proof of Concept to demonstrate that a solution works for a particular prospect's situation and delivers value, rather than simply demonstrating that a solution works as described by its developers. Read more: https://www.linkedin.com/pulse/poc-vs-pov-whats-difference-mr-software-vendor-andrew-brockfield
Global Business Services / GBS
Provides integration of governance, locations, and business practices to all shared services and outsourcing activities across the enterprise.
Hire to Retire / H2R
Providing employees with a clear understanding that there is a plan in place for them to be a valued team member of the staff and organization. Read more: https://workology.com/hire-to-retire-maintaining-the-dash/
Return on Investment / ROI
ROI measures the gain or loss generated on an investment relative to the amount of money invested. ROI is usually expressed as a percentage and is typically used for personal financial decisions, to compare a company's profitability or to compare the efficiency of different investments. Read more: http://www.investinganswers.com/financial-dictionary/technical-analysis/return-investment-roi-1100
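The ROI definition above can be worked through with a short calculation. This sketch assumes the common formula (final value minus amount invested, divided by amount invested, times 100); the class, method and parameter names are hypothetical.

```java
// Hypothetical helper: ROI as a percentage of the amount invested.
class RoiSketch {

    static double roiPercent(double finalValue, double amountInvested) {
        if (amountInvested == 0) {
            throw new IllegalArgumentException("Amount invested must be non-zero");
        }
        // Positive result = gain, negative result = loss.
        return (finalValue - amountInvested) / amountInvested * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(roiPercent(1200.0, 1000.0)); // a gain
        System.out.println(roiPercent(800.0, 1000.0));  // a loss
    }
}
```

Expressing the result relative to the amount invested is what makes investments of different sizes comparable.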
RPA is not AI because
RPA is rule based
RPA vs Macros and AI and BPM
So remember, RPA outperforms macros and can integrate many applications. It shares the same goal as BPM and is a complement to it, but not a replacement for it. And RPA is not the same as AI, but their capabilities can be combined into Intelligent Automation solutions.
Digital Labor Governance
Software bots, like people, require management. Preventive controls ensure both bot and human performance using advanced quality control and automatically scale the workforce based on volume peaks and troughs.
Open Source Software / OSS
Software with source code that anyone can inspect, modify, and enhance. Read more: https://opensource.com/resources/what-open-source
Unstructured data
Text, images, audio files, video files, sensor outputs, and similar data that may have a well-defined internal structure, but one that is not defined by position or tag. Unstructured data may be simple to use for appropriately designed software (e.g. an MP3 player easily plays appropriate audio files), but using unstructured data for rule evaluation and interpretation is beyond simple query-and-decide processing.
Deep learning
The application to learning tasks of artificial neural networks that contain more than one hidden layer.
Intelligent Automation / IA
The automation of the company's processes (including general processes using BPM and specific task-level processes using RPA), supported by analytics and decisions made by Artificial Intelligence.
Project and Portfolio Management / PPM
The centralized management of the processes, methods, and technologies used by project managers and project management offices (PMOs) to analyze and collectively manage current or proposed projects based on numerous key characteristics. Read more: https://en.wikipedia.org/wiki/Project_portfolio_management
Human Resources / HR
The company department charged with finding, screening, recruiting and training job applicants, as well as administering employee-benefit programs. Read more: http://www.investopedia.com/terms/h/humanresources.asp#ixzz4dJMBtmzg
Business Process Outsourcing / BPO
The contracting of non-primary business activities and functions to a third-party provider. BPO services include payroll, human resources (HR), accounting and customer/call center relations. Read more: https://www.techopedia.com/definition/13776/business-process-outsourcing-bpo
Program Management Office / PMO
The organizational unit responsible for Program Management (PM). PM is the process of managing several related projects, often with the intention of improving an organization's performance. In practice and in its aims it is often closely related to systems engineering, industrial engineering, change management, and business transformation.
Procure to Pay / P2P
The process of obtaining and managing the raw materials needed for manufacturing a product or providing a service. ... According to the Chartered Institute of Purchasing and Supply, procure to pay should be a seamless process from point of order to payment. Read more: http://searchfinancialapplications.techtarget.com/definition/procure-to-pay-P2P
Artificial Intelligence / AI
The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
Quality Control / AutoQC
The use of statistical methods in the monitoring and maintaining of the quality of products and services.
ETL: Extract, Transform, and Load
Three database functions combined into one tool to pull data out of one database and place it into another database. Extract is the process of reading data from a database. Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be written into another database. Load is the process of writing the data into the target database. Read more: http://www.webopedia.com/TERM/E/ETL.html
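The three ETL steps can be sketched with in-memory lists standing in for the source and target databases. Everything here is hypothetical and for illustration only; in particular, the transformation (trimming, upper-casing, dropping blank rows) is just an assumed example of "converting the data into the form the target needs".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical ETL illustration; lists stand in for database tables.
class EtlSketch {

    // Extract: read rows from the source.
    static List<String> extract(List<String> sourceTable) {
        return new ArrayList<>(sourceTable);
    }

    // Transform: normalize rows into the form the target expects.
    static List<String> transform(List<String> rows) {
        List<String> result = new ArrayList<>();
        for (String row : rows) {
            String cleaned = row.trim();
            if (!cleaned.isEmpty()) {
                result.add(cleaned.toUpperCase(Locale.ROOT));
            }
        }
        return result;
    }

    // Load: write the transformed rows into the target.
    static void load(List<String> rows, List<String> targetTable) {
        targetTable.addAll(rows);
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(" alice ", "", "Bob");
        List<String> target = new ArrayList<>();
        load(transform(extract(source)), target);
        System.out.println(target);
    }
}
```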
Service Delivery Automation / SDA
Uses automation to replace a series of human actions. Read more: http://searchmanufacturingerp.techtarget.com/definition/supply-chain-management
Error and correction costs
are the actual costs or standard costs realized when correcting defective work. For transaction processing, this may include but is not limited to expenses for the worker time and supervisory time applied to reversing out wrong transactions, correcting customer or inventory records, and addressing similar problems that may cascade from incorrectly processed transactions.
Compliance and risk costs
are those which arise from ensuring the business follows industry regulations and from the fees and penalties that businesses must pay for violating regulations.
Direct labor costs
include the salary, benefits (e.g. health insurance and retirement contributions), and employment taxes (e.g. Social Security, Worker's Compensation) paid to, or on behalf of, each worker processing transactions.
Indirect labor costs
include the same salary, benefits and employment taxes paid to workers who support the direct labor force. These workers include human resources, accountants, managers and similar roles that enable the transaction processing workers.
Business Operations Automation
is related to the technology-enabled automation of complex business processes. It aims to streamline a business for simplicity, achieve digital transformation, improve service quality & delivery, and reduce costs.
RDA is an automation technology
that facilitates human and bot collaboration. Bots respond to employee-triggered actions by automatically completing certain tasks to simplify a work routine.
