SAS DataFlux


File

Within the Data Mgmt Studio Repository, the ___________ storage of a repository can contain the following: data jobs, process jobs, match reports, entity resolution files, queries, entity definitions, and other files. A. Data B. File

Data Exploration Data explorations can be used for the following: to identify data redundancies, to extract and organize metadata from multiple sources, to identify relationships between metadata, and to catalog data by specified business data types and processes.

A data exploration reads data from databases and categorizes the fields in the selected tables into categories. These categories are predefined in the Quality Knowledge Base (QKB). Data explorations perform this categorization by matching column names. You also have the option of sampling the data in the table to determine whether the data is one of the specific types of categories in the QKB. A. Repository B. Data Collection C. Data Exploration

Token

A __________ is an "atomically semantic" component of a data value. In other words, _____________ represent the smallest pieces of a data value that have some distinct meaning. A. Token B. Data Value C. Data Object

Collection

A _______________ is a set of fields that are selected from tables that are accessed from different data connections. A _______________ provides a convenient way for users to build a dataset using those fields. A __________________ can be used as an input source for a profile in Data Management Studio. A. Collection B. Data Connection C. Master Data Foundation

Standardization A standardization definition has the following attributes: it is more complex than a standardization scheme, involves one or more standardization schemes, and can also parse data and apply regular expression libraries and casing.

A ________________________ scheme is a simple find-and-replace table that specifies how data values will be standardized. A. Data Search B. Standardization

Standardization A standardization scheme can be built from the profile report. When a scheme is applied, if the input data is equal to the value in the Data column, then the data is changed to the value in the Standard column. The standard value DataFlux was selected by the Scheme Builder because it was the permutation with the most occurrences in the profile report.

A _________________ scheme takes various spellings or representations of a data value and lists a standard way to consistently write this value. A. Build B. Standardization
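
Note: As an illustrative sketch only (the Data values below are assumed examples, not taken from a profile report in this course), a standardization scheme is a two-column find-and-replace table. Entries such as Data = "DATAFLUX" with Standard = "DataFlux", and Data = "Dataflux Corp" with Standard = "DataFlux", would cause any input value that matches a Data entry to be replaced with the corresponding Standard value when the scheme is applied.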

Preview Previewing does not create the output. The output is physically created only when the job is executed.

A _______________ of a Data Output node does not show field name changes or deletions. This provides the flexibility to continue your data flow after a Data Output node. In addition, previewing a Data Output node does not create the output. You must run the data job to create the output. A. Export B. Import C. Preview

Reference Reference source locations are registered on the Administration riser bar in DataFlux Data Management Studio. One reference source location of each type should be designated as the default.

A ______________ object is typically a database used by DataFlux Data Management Studio to compare user data to a reference source (for example, USPS Address Data). You cannot directly access or modify references. A. Data Source B. Reference

Business Rule Business rules are defined within a repository using the Business Rules Manager.

A formula, validation, or comparison that can be applied to a given set of data. Data must either pass or fail the business rule. A. Exception B. Business rule

Plan: Discover

A quick inspection of your corporate data would probably find that it resides in many different databases, managed by many different systems, with many different formats and representations of the same data. This step of the methodology enables you to explore metadata to verify that the right data sources are included in the data management program. You can also create detailed data profiles of identified data sources so that you can understand their strengths and weaknesses. A. Plan: Define B. Plan: Discover C. Act: Design D. Act: Execute E. Monitor: Evaluate F. Monitor: Control

Data Collection A data collection has the following features: it provides a convenient way to build a data source using desired fields, and it can be used as an input source for profiles.

A set of data fields from different tables in different data connections. A. Repository B. Data Collection

Act: Execute

After business users establish how the data and rules should be defined, the IT staff can install them within the IT infrastructure and determine the integration method (real time, batch, or virtual). These business rules can be reused and redeployed across applications, which helps increase data consistency in the enterprise. A. Plan: Define B. Plan: Discover C. Act: Design D. Act: Execute E. Monitor: Evaluate F. Monitor: Control

Act: Design

After you complete the first two steps, this phase enables you to take the different structures, formats, data sources, and data feeds and create an environment that accommodates the needs of your business. At this step, business and IT users build workflows to enforce business rules for data quality and data integration. They also create data models to house data in consolidated or master data sources. A. Plan: Define B. Plan: Discover C. Act: Design D. Act: Execute E. Monitor: Evaluate F. Monitor: Control

Act

Analyzing and exploring the data sources can lead to the discovery of data quality issues. The ACT phase is designed to create data jobs that cleanse, or correct, the data. This phase involves the following: standardizing, parsing, and/or casing the data; correctly identifying types of data (identification analysis); performing methods to remove duplicates from data sources or to join tables with no common key. A. Plan B. Act C. Monitor

Row-based rule

Which type of business rule evaluates every row in a table? A. Row-based rule B. Set-based rule C. Group-based rule

Group-based rule

Which type of business rule evaluates groups of data (for example, if data is grouped by product code, then the rules are evaluated for each product code)? A. Row-based rule B. Set-based rule C. Group-based rule

Set-based rule

Which type of business rule evaluates the table as a whole? A. Row-based rule B. Set-based rule C. Group-based rule

SAS QKB for Product Data (PD)

Contains extraction, parsing, standardization, and pattern analysis definitions to handle the following attributes in generic product data: • brands/manufacturers • colors • dimensions • sizes • part numbers • materials • packaging terms and units of measurement A. SAS QKB for Contact Information (CI) B. SAS QKB for Product Data (PD)

Network

Possible values of Diff type include the following: A record belongs to a set of records that are involved in one or more different multirecord clusters in the left and right tables. A. Combine B. Divide C. Network

Extraction

Extracts parts of the text string and assigns them to corresponding tokens for the specified data type. A. Case B. Extraction C. Gender Analysis D. Identification Analysis E. Language Guess F. Locale Guess G. Match H. Parse I. Pattern Analysis J. Standardization

Data Type

In the context of the QKB, a _______________ is an object that represents the semantic nature of some data value. A _____________ serves as a placeholder (or grouping) for metadata used to define data cleansing and data integration algorithms (called definitions). DataFlux provides many data types in the QKB, but you can also create your own. A. Data Object B. Data Type

Data Job Node The referenced data job (the one that is embedded using the Data Job (reference) node) must have an External Data Provider node as the input. Data is passed from the parent job to the referenced data job, processed, and returned to the flow in the parent job. The Data Job (reference) node is found in the Data Job grouping of nodes.

Is used to embed a data job within a data job. A. Data Node B. Data Job Node

Combine

Possible values of Diff type include the following: A record belongs to a set of records from one or more clusters in the left table that are combined into a larger cluster in the right table. A. Combine B. Divide C. Network

Divide

Possible values of Diff type include the following: A record belongs to a set of records in a cluster in the left table that is divided into two or more clusters in the right table. A. Combine B. Divide C. Network

Metadata

Profiles are not stored as files, but as ____________. To run a profile via the command line, the Batch Run ID for the profile must be specified. A. Metadata B. Tokens

External Data Provider Node The External Data Provider node has the following characteristics: it accepts source data from another job or from user input that is specified at run time; it can be used as the first node in a data job that is called from another job; and it can be used as the first node in a data job that is deployed as a real-time service.

Provides a landing point for source data that is external to the current job. A. External Data Provider Node B. External Data Job

Data Profile Data profiles provide the following benefits: improve understanding of existing databases; aid in identifying issues early in the data management process, when they are easier and less expensive to manage; help determine which steps need to be taken to address data problems; enable you to make better business decisions about your data.

Provides the ability to inspect data for errors, inconsistencies, redundancies, and incomplete information. A. Data Profile B. Data Collection

Extensible

Rules are no longer limited to well-known contact data. With the customization feature in Data Management Studio, you can create data-cleansing rules for any type of data. A. Fully Customizable B. Extensible C. Modifiable D. Efficient E. Flexible

Modifiable

Rules can be modified to appropriately address the needs of the enterprise and can be implemented across Data Management Studio modules. A. Fully Customizable B. Extensible C. Modifiable D. Efficient E. Flexible

Master Data Foundation

The __________________ feature in Data Management Studio uses master data projects and entity definitions to develop the best possible record for a specific resource, such as a customer or a product, from all of the source systems that might contain a reference to that resource. A. Collection B. Data Connection C. Master Data Foundation

Cluster Diff

The __________________ node is used to compare two sets of clustered records by reading in data from a left and a right table. From each table, the ______________________ node takes two inputs: a numeric record ID field and a cluster number field. A. Cluster Group B. Cluster Diff

SAS QKB for Contact Information (CI)

Supports management of commonly used contact information for individuals and organizations, such as names, addresses, company names, and phone numbers. A. SAS QKB for Contact Information (CI) B. SAS QKB for Product Data (PD)

Multiple

The Allow generation of _____________ matchcodes per definition option requires the creation of a special match definition in the QKB. A. Single B. Multiple

Validation The Data Validation node is in the Utilities grouping of nodes.

The Data _________________ node is used to filter or flag rows according to the specified condition(s). A. Import B. Validation C. Output

NULL

The Generate null match codes for blank field values option generates a ____________ match code if the field is blank. If this option is not selected, then a match code of all $ symbols is generated for the field. When you match records, a field with NULL does not equal another field with NULL, but a field with all $ symbols equals another field with all $ symbols. A. Preview B. Numeric C. NULL

Collection

The SAS Quality Knowledge Base (QKB) is a _______________ of files that store data and logic that define data management operations. A. Collection B. Repository

Surviving Record Identification

The Surviving Record Identification (SRI) node examines clustered data and determines a surviving record for each cluster. A. Entity Resolution B. Surviving Record Identification

Match

The ____________ Report node produces a report listing the duplicate records identified by the match criteria. ______________ reports are displayed with a special report viewer. A. Match B. Clustering

Table

The ______________ Match report displays a list of database tables that contain matching fields for a selected table or field. A. Field B. Identification C. Table

Clustering

The ________________ node enables the specification of an output ______________ ID field and the specification of _____________ conditions. A. Match B. Clustering

Quality Knowledge Base (QKB)

The _________________ is a collection of files and configuration settings that contain all the DataFlux Data Management algorithms. A. Collections Repository B. Quality Knowledge Base (QKB)

Field Match

The _________________ report displays a list of the fields in metadata that match a selected field's name. A. Field Name B. Field Relationship C. Field Match

Entity Resolution

The __________________ File enables you to manually review the merged records and make adjustments as necessary. This can involve the following tasks: examining clusters, reviewing the Cluster Analysis section, reviewing related clusters, processing cluster records, and editing fields for surviving records. A. Entity Resolution B. Surviving Record Identification

Identification

The _____________________ Analysis report displays a list of fields in metadata that match categories in the identification analysis definitions specified for field name and sample data analysis. A. Field B. Identification C. Table

Field Relationship

The ______________________ map provides a visual presentation of the field relationships between all of the databases, tables, and fields that are included in the data exploration. A. Field Name B. Field Relationship C. Field Match

Execute Business Rule The Execute Business Rule Properties window allows for the specification of a Return status field, which flags records as either passing (True) or failing (False) the business rule. If the Return status field is not selected, only records that pass the business rule are passed to the next node.

The _________________________ node applies an existing, row-based business rule to the rows of data as they flow through a data job. Records either pass or fail the selected rule. A. Execute Business Rule B. Business Rules

Clustering

The ______________ node provides the ability to match records based on multiple conditions. Create conditions that support your business needs. A. Match B. Clustering

Outliers

The ______________ tab lists the X minimum and maximum value outliers. The number of listed minimum and maximum values is specified when the data profiling metrics are set. A. Frequency Distribution B. Frequency Pattern C. Outliers

-j

The dmpexec command can be used to execute profiles and data jobs from the command line. Which switch executes the job in the specified file? A. -j <file> B. -l <file> C. -c <file> D. -i <file> E. -b <file> F. -o <file>

-o

The dmpexec command can be used to execute profiles and data jobs from the command line. Which switch overrides settings in configuration files? A. -j <file> B. -l <file> C. -c <file> D. -i <file> E. -b <file> F. -o <file>

-c

The dmpexec command can be used to execute profiles and data jobs from the command line. Which switch reads the configuration from the specified file? A. -j <file> B. -l <file> C. -c <file> D. -i <file> E. -b <file> F. -o <file>

-i

The dmpexec command can be used to execute profiles and data jobs from the command line. Which switch specifies job input variables? A. -j <file> B. -l <file> C. -c <file> D. -i <file> E. -b <file> F. -o <file>

-b

The dmpexec command can be used to execute profiles and data jobs from the command line. Which switch specifies job options for the job being run? A. -j <file> B. -l <file> C. -c <file> D. -i <file> E. -b <file> F. -o <file>

-l

The dmpexec command can be used to execute profiles and data jobs from the command line. Which switch writes the log to the specified file? A. -j <file> B. -l <file> C. -c <file> D. -i <file> E. -b <file> F. -o <file>
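
Note: As an illustrative sketch (the file paths and the key=value syntax shown for -i, -b, and -o are assumptions for illustration), the switches can be combined in a single invocation, for example: call dmpexec -j "D:\jobs\my_job.ddf" -c "D:\configs\batch.cfg" -l "C:\Temp\my_job.log" -i "INPUT_PATH=D:\data" -b "option=value" -o "QKB/PATH=C:\QKB". The -j switch identifies the job to run; the remaining switches supply the configuration file, log file, job input variables, job options, and configuration overrides described above.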

Monitor: Control

The final stage in a data management project involves examining any trends to validate the extended use and retention of the data. Data that is no longer useful is retired. The project's success can then be shared throughout the organization. The next steps are communicated to the data management team to lay the groundwork for future data management efforts. A. Plan: Define B. Plan: Discover C. Act: Design D. Act: Execute E. Monitor: Evaluate F. Monitor: Control

Administration

The locations of the Quality Knowledge Base files are registered on the _________________ riser bar in DataFlux Data Management Studio. There can only be one active QKB at a time. A. Collections B. Folders C. Administration

Data Job

The main way to process data in DataFlux Data Management Studio. Each ____________ specifies a set of data-processing operations that flow from source to target. A. Command B. Routine C. Data Job

Physical The command line to execute the data job could be similar to the following: call dmpexec -j "D:\Workshop\dqdmp1\Demos\files\batch_jobs\Ch4D2_Products_Misc.ddf" -l "C:\Temp\log1.txt"

The physical path and filename of data jobs must be specified with the -j switch. A. Logical B. Physical

Plan: Define

The planning stage of any data management project starts with this essential first step. This is where the people, processes, technologies, and data sources are defined. Roadmaps that include articulating the acceptable outcomes are built. Finally, the cross-functional teams across business units and between business and IT communities are created to define the data management business rules. A. Plan: Define B. Plan: Discover C. Act: Design D. Act: Execute E. Monitor: Evaluate F. Monitor: Control

Monitor: Evaluate

This step of the methodology enables users to define and enforce business rules to measure the consistency, accuracy, and reliability of new data as it enters the enterprise. Reports and dashboards on critical data metrics are created for business and IT staff members. The information that is gained from data monitoring reports is used to refine and adjust the business rules. A. Plan: Define B. Plan: Discover C. Act: Design D. Act: Execute E. Monitor: Evaluate F. Monitor: Control

Case

Transforms a text string by changing the case of its characters to uppercase, lowercase, or proper case. A. Case B. Extraction C. Gender Analysis D. Identification Analysis E. Language Guess F. Locale Guess G. Match H. Parse I. Pattern Analysis J. Standardization

True

True or False: If a single value in a group of items needs to be changed, then select Edit > Modify Standards Manually > Single Instance. A single value can then be modified manually. To toggle back to the ability to change all instances in a group, select Edit > Modify Standards Manually > All Instances.

True

True or False? Data standardization does not perform a validation of the data (for example, Address Verification). Address verification is a separate component of the DataFlux Data Management Studio application and is discussed in another section.

True

True or False? If you standardize a data value using both a definition and a scheme, the definition is applied first and then the scheme is applied.

True

True or False? Monitoring tasks are created by pairing a defined business rule with one or more events. Some available events include the following: call a realtime service, execute a program, launch a data flow job on a Management server, log error to repository, log error to text file, raise an event on the process job (if hosted), run a local job, run a local profile, send email message, set a data flow key or value, and write a row to a table.

True

True or False? Record-level rules select which record from a cluster should survive. If there is ambiguity about which record is the survivor, the first remaining record in the cluster is selected.

True Jobs and profiles developed with Data Management Studio can be uploaded to the Data Management Server. Jobs and profiles can be executed on this server, which is intended to be a more powerful processing system. Data Management Server needs access to a copy of the QKB and data packs that are used in the data jobs and profiles.

True or False? The DataFlux Data Management Server is an application server that supports web service requests through a service-oriented architecture (SOA) and executes profiles, data jobs, process jobs, and services on Windows, UNIX, or Linux servers.

True

True or False? The match code generation process consists of the following steps: 1. Data is parsed into tokens (for example, Given Name and Family Name). 2. Ambiguities and noise words are removed (for example, the). 3. Transformations are made (for example, Jonathon > Jon). 4. Phonetics are applied (for example, PH > F). 5. Based on the sensitivity selection, the following occurs: Relevant components are determined. A certain number of characters of the transformed, relevant components are used.

Target

When choosing Output Field settings, which of the options sends all fields available to target nodes to the target? A. Target B. Source and Target C. All

All

When choosing Output Field settings, which option specifies that all available fields are passed through source nodes, target nodes, and all intermediate nodes? A. Target B. Source and Target C. All

Source and Target

When choosing Output Field settings, which option specifies that all fields available to a source node are passed to the next node and all fields available to target nodes are passed to the target? A. Target B. Source and Target C. All

dmserver.cfg

When configuring options for the Data Management Server, which config file contains the settings below? DMSERVER/SOAP/LISTEN_PORT = PORT specifies the TCP port number where the server listens for SOAP connections. DMSERVER/LOGCONFIG_PATH = PATH specifies the path to the logging configuration file. A. app.cfg B. dmserver.cfg
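
Note: A minimal sketch of dmserver.cfg entries for the settings above (the port number and path are assumed example values, not defaults from the documentation):
DMSERVER/SOAP/LISTEN_PORT = 21036
DMSERVER/LOGCONFIG_PATH = C:\dmserver\etc\dmserver.log.xml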

app.cfg

When configuring options for the Data Management Server, which config file contains the settings below? QKB/PATH = PATH specifies the location of the active Quality Knowledge Base. VERIFY/USPS = PATH specifies the location of the USPS reference source. VERIFY/GEO = PATH specifies the location of the Geo/Phone reference source. A. app.cfg B. dmserver.cfg
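
Note: A minimal sketch of app.cfg entries for the settings above (all paths are assumed example values):
QKB/PATH = C:\SAS\QKB\CI
VERIFY/USPS = C:\SAS\ReferenceSources\USPSData
VERIFY/GEO = C:\SAS\ReferenceSources\GeoPhoneData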

Lower

When creating folders, it is best practice to set folder names in _____________ with no spaces. A. Lower B. Upper

Batch Jobs

When importing to a Data Management Server, each defined Data Management Server has a series of predefined folders. Selecting ____________ (for example) enables the Import tool in the navigation area, as well as in the main information area. A. Data Jobs B. Batch Jobs

ABANDONED

When parsing, which term best describes the description below? A resource limit was reached. Increase your resource limit and try again. A. OK B. NO SOLUTION C. NULL D. ABANDONED

NULL

When parsing, which term best describes the description below? The parse operation was not attempted. This result occurs only when a null value was in the field to be parsed and the Preserve null values option was enabled. A. OK B. NO SOLUTION C. NULL D. ABANDONED

OK

When parsing, which term best describes the description below? The parse operation was successful. A. OK B. NO SOLUTION C. NULL D. ABANDONED

NO SOLUTION

When parsing, which term best describes the description below? The parse operation was unsuccessful; no solution was found. A. OK B. NO SOLUTION C. NULL D. ABANDONED

Preserve

When standardizing, selecting _________________ null values ensures that if a field is null when it enters the node, then the field is null after being output from the node. It is recommended that this option be selected if the output is written to a database table. A. Import B. Preserve C. Archive

SQL

Which of the statements below describes this querying method? The data generated for both the __________ query and the filter have the same results. The filter pulled all records. The filter was processed on the machine where the profile was run. The database does the filtering for the ________ query. A. Filtering B. SQL

Data

Within the Data Mgmt Studio Repository, the ___________ storage of a repository can contain the following: explorations and reports, profiles and reports, business rules, monitoring results, custom metrics, business data information, and master data information. A. Data B. File

%

Within the Standardization Scheme, which of these commands provides an indicator specifying that the matched word or phrase is not updated? A. //Remove B. %

//Remove

Within the Standardization Scheme, which of these commands removes the matched word or phrase from the input string? A. //Remove B. %
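
Note: As an illustrative sketch (assuming these commands are entered as the Standard value of a scheme entry), an entry with Data = "Inc." and Standard = //Remove would strip "Inc." from matching input values, while an entry with Data = "IBM" and Standard = % would leave "IBM" as-is rather than standardizing it.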

Define

Within the _______________ methodology, there are four main functions that can be used: Connect to Data, Explore Data, Define Business Rules, and Build Schemes. A. Define B. Discover

Flexible

You can customize rules to conform to the ever-changing business environment regardless of your data needs. A. Fully Customizable B. Extensible C. Modifiable D. Efficient E. Flexible

Efficient

You can dramatically reduce manual data manipulation time by simply updating cleansing rules. It is much easier to manipulate reusable data-cleansing rules than to manually manipulate the data itself. A. Fully Customizable B. Extensible C. Modifiable D. Efficient E. Flexible

Data Collection

You can use _____________ to group data fields from different tables, database connections, or both. These collections can be used as input data sources for profiles. A. Repository B. Data Collection C. Data Exploration

Fully Customizable

You have full control of data-cleansing rules across the enterprise and through time. A. Fully Customizable B. Extensible C. Modifiable D. Efficient E. Flexible

Parse

_____________ definitions define rules to place the words from a text string into the appropriate tokens. A. Parse B. Text C. Case

Field Name

______________ analysis analyzes the names of each field from the selected data sources to determine which identity to assign to the field. A. Identification B. Field Name C. Sample Data

Case

______________ definitions are algorithms that can be used to convert a text string to uppercase, lowercase, or proper case. A. Parse B. Text C. Case

Address Verification

________________ identifies, corrects, and enhances address information. A. Address Validation B. Address Verification

Sample Data

_________________ analysis analyzes a sample of data in each field to determine which identity to assign to the field. A. Identification B. Field Name C. Sample Data

Data Connection

__________________ are used to access data in jobs, profiles, data explorations and data collections. A. Collection B. Data Connection C. Master Data Foundation

Entity Resolution

__________________ is the process of merging duplicate records in a single file or multiple files so that records referring to the same physical object are treated as a single record. Records are matched based on the information that they have in common. The records that are merged might appear to be different, but can actually refer to the same person or item. A. Entity Match B. Entity Resolution C. Match Entity

Data Exploration

______________________ have the following types of analysis methods: field name matching, field name analysis, and sample data analysis. A. Repository B. Data Collection C. Data Exploration

Geocoding Geocoding latitude and longitude information can be used to map locations and plan efficient delivery routes. Geocoding can be licensed to return this information for the centroid of the postal code or at the roof-top level. Currently, there are only geocoding data files for the United States and Canada. Also, roof-top level geocoding is currently available only for the United States.

_______________ enhances address information with latitude and longitude values. A. Geo Validation B. Geocoding

Identification, Right

______________ analysis and ___________ fielding use the same definitions from the QKB, but in different ways. ______________ analysis identifies the type of data in a field, and __________ fielding moves the data into separate fields based on its identification. Both the ___________ analysis and _________ fielding examples above use the Contact Info identification analysis definition. A. Identification, Right B. Right, Identification

