CP4D Study Guide


Developer tool services - NOT Included in CPD

Lightbend Platform

Open Container Initiative

OCI

Pod State - Unknown

State of the pod cannot be determined, typically because of a communication issue.

Regulatory Accelerator

Streamline the process of complying with regulations.

Watson Knowledge Studio

Teach Watson the language of your domain. Create a machine learning model that understands the linguistic nuances, meaning, and relationships specific to your industry, or create a rule-based model that finds entities in documents based on rules that you define.

The installation of CP4D will provide a URL to access the web client

The installation of CP4D will provide a _________ to access the web client

UG DefaultWorkspace

This is the default workspace with the default settings.

Reports Dashboards Stories

Three visualization types in Cognos Analytics

To ensure that your data in CP4D is stored securely, you can encrypt your storage partition

To ensure that your data in CP4D is stored securely, you can encrypt your_____________

Watson Studio

Unleash the power of your data. Build custom models and infuse your business with AI and machine learning.

Execution Engine for Apache Hadoop

Used to integrate the Watson Studio service with the distributed computing power of a Hadoop cluster, allowing you to analyze data in place and use packages and libraries from Watson Studio without installing them on your Hadoop cluster

You can also sort search results by these properties:

· Name · Type · Tags · Stewards or owners · The user who last modified the asset · The last modified date

What properties can be searched?

· Tags · Name · Description · Column names

A runbook entry is a necessary backstop for any highly-available service and should be created for reconstructing the master

A _______________ is a necessary backstop for any highly-available service and should be created for _______________

A self-signed TLS certificate is included in the CP4D installation, which can be used to enable HTTPS connections. By default this is untrusted by all HTTPS clients.

A __________is included in the CP4D installation, which can be used to enable HTTPS connections. By default this is _______ by all HTTPS clients.

IBM Db2 Warehouse

A high-performing analytics engine that combines in-memory processing with integrated database analytics.

Installation Component - Assembly

A particular Cloud Pak for Data service (and its dependent assemblies) to deploy to a project in Red Hat OpenShift. For Cloud Pak for Data control plane and services, these are stored on a public IBM file server, and do not have to be installed at the same time.

StoredIQ

A platform for IBM's information life cycle governance, big data governance, and enterprise content management technologies. It is used to govern unstructured data sources.

Change Capture stage

A processing stage that compares two data sets and makes a record of the differences.

OpenShift Container Registry (OCR)

Adds the ability to automatically provision new image repositories on demand. This provides users with a built-in location for their application builds to push the resulting images.

Kafka

Open source streaming platform software that provides a publish and a subscribe messaging transport protocol using topics.
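
The publish/subscribe idea with topics can be illustrated with a small in-memory sketch. This is not the Kafka client API: `TopicBroker`, `subscribe`, and `publish` are hypothetical names, and real Kafka persists messages in a log that consumers read from. The sketch only shows how a topic routes messages from publishers to the subscribers of that topic.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TopicBroker:
    """Toy in-memory broker (hypothetical, not a Kafka API)."""

    def __init__(self) -> None:
        # Each topic name maps to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        # Deliver the message to the subscribers of this topic only.
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


broker = TopicBroker()
received: List[str] = []
broker.subscribe("orders", received.append)
broker.publish("orders", "order-1")
broker.publish("payments", "payment-1")  # no subscriber, so nothing is delivered
```

The publisher never refers to a subscriber directly; both sides only agree on the topic name.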

Watson AIOps

Uses artificial intelligence to simplify IT operations management and accelerate and automate problem resolution in complex modern IT environments

Minikube

Which tool can you use to run Kubernetes locally as a single-node cluster?

kubelet kube-proxy

Which two are required to run on each Kubernetes node?

Isolating User Access Isolating Applications or Jobs Maintaining a development and a production environment

Why might you want to provision more than one instance of a service?

To specify the hardware and software configurations for the environment runtimes.

Why specify an environment definition?

Analytics Engine Powered by Apache Spark

Will spin up lightweight, dedicated Apache Spark clusters to run a wide range of workloads on your IBM Cloud Pak for Data cluster

DataStage Data Virtualization Watson Studio Watson Knowledge Catalog.

Describe the ways to access remote data from Cloud Pak for Data

Data Source Services - Included CP4D

Data Virtualization, Db2 Event Store, Db2 Warehouse, Db2 for z/OS Connector, PostgreSQL

MongoDB

Flexible and scalable database that is designed to store, query, and index documents.

Data Governance Services Included

Regulatory Accelerator, Watson Knowledge Catalog

TLS re-encryption for improved security TLS passthrough for improved security Multiple weighted backends (split traffic) Generated pattern-based hostnames Wildcard domains

Advantages of OpenShift Router vs. Kubernetes Ingress

Dashboard service - Included in CP4D

Analytics Dashboards

Container Runtime Interface

CRI

Watson Machine Learning

Deploy machine-learning models into production at scale. Build analytical models and neural networks, trained with your own data and deploy them for use in apps.

Terraform

Open-source infrastructure-as-code tool. Users define and provision data center infrastructure, using the declarative configuration language HCL, or optionally JSON. It is used to install RHOS on AWS.

What are the global search functions?

Filter, Sort, Search

Governance Roles and goals: Role: Data Scientist

Governance Roles and goals: Goal: Find data assets in a catalog

Governance Roles and goals: Role: Administrator

Governance Roles and goals: Goal: Set up the first catalog

Analytics Dashboards

Identify patterns in your data with sophisticated visualizations. No coding needed.

Loading data into the database.

Name the Integrated database console tasks for IBM Db2 Warehouse.

PostgreSQL

Open source SQL database that is extensible, ACID compliant, and supports HA and backup and restore functions.

Ansible

Open-source software provisioning, configuration management, and application-deployment tool that enables infrastructure as code. Red Hat OpenShift uses it for operating system upgrades.

Streams Db2 Event Store Spark Streaming

Systems that can be used for real time analytics

What is the command line for scaling wml?

./cpd-<operating-system> scale wml (where <operating-system> stands for the client platform in the installer file name)

A Red Hat OpenShift project can also be referred to as a namespace

A Red Hat OpenShift project can also be referred to as a ______________

IBM Db2 Event Store

A data store designed to rapidly ingest and analyze streamed data for event-driven apps. Capable of ingesting at rates of hundreds of billions of events per day and can analyze the ingested data immediately for real-time insights.

Installation Component - Repository configuration

A server definition YAML file that you download to your Linux or Mac OS client workstation and then customize. It specifies: · URLs and credentials for the file server to download Helm charts from. · URLs and credentials for the registry server to download images from. The default server definition file is repo.yaml.

Pod State - Succeeded

All the containers in the pod stopped successfully and will not be restarted.

Pod State - Failed

All the containers in the pod stopped, but at least one container was stopped with an error or was stopped by Kubernetes.

Analytics Zoo for Apache Spark

An analytics and AI platform that unifies TensorFlow, Keras, and BigDL for distributed Apache Spark environments.

An analytics project requires Git Integration to connect to an external Git repo

An analytics project requires ___________ to connect to an external Git repo

Watson Knowledge Catalog InstaScan (formerly known as IBM StoredIQ InstaScan)

An intelligent file analysis tool that leverages automation and statistical sampling models to quickly identify risk hot spots in unstructured cloud data. The tool helps accelerate regulatory compliance and data governance as part of a DataOps practice by providing unique capabilities, such as: · Risk assessment and remediation recommendations · Automatic application of classification labels to files in Box · Audit-ready compliance checks

What is Apache Zookeeper?

An open source Apache project that provides a centralized service for providing configuration information, naming, synchronization and group services over large clusters in distributed systems. The goal is to make these systems easier to manage with improved, more reliable propagation of changes.

Analytics Services Included in CP4D

Analytics Engine powered by Spark, Data Refinery, Streams

Analytics Services not Included CP4D

Analytics Zoo for Spark, CA, Datameer, DO, Execution Engine for Hadoop, Figure Eight, Intel Python, Intel Deep learning Reference Stack, Operational Analytics for ERP, SPSS Modeler, WAND Foundation

Frequently Used Cleanse Organize Natural Language (Frequently Clean Organized Nature)

Data Refinery supports the following categories of GUI operations.

Customer Attrition Prediction Customer Life Event Prediction Customer Offer Affinity Customer Segmentation Intelligent Maintenance Prediction Streaming Analytics for Customer Life Event Prediction

Data Science assets are included in which Industry Accelerators?

Describe the benefits of Data Virtualization

By creating connections to your data sources, you can quickly view data across your organization. This virtual data platform enables real-time analytics without moving data, duplication, ETLs, and additional storage requirements, so processing times are greatly accelerated. This brings real-time insightful results to decision-making applications or analysts more quickly and dependably than existing methods.

Project Assets that exist with WKC only

Data assets, Connections, Connected data, Folder assets, Data Refinery flows

CP4D provides an encrypted bearer token in the model deployment details that an application developer can use for evaluating models online with REST APIs

CP4D provides an __________ in the model deployment details that an application developer can use for ________ with ______.
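
As an illustration of how an application might present the bearer token, here is a minimal sketch that builds (but does not send) a scoring request with Python's standard library. The URL, token value, and payload shape are placeholders invented for this example; the real values come from the model deployment details and the deployed model's input schema.

```python
import json
import urllib.request

# Hypothetical values: copy the real ones from the model deployment details.
SCORING_URL = "https://cpd.example.com/hypothetical/scoring/endpoint"
BEARER_TOKEN = "example-token"

# Hypothetical payload shape; the actual schema depends on the deployed model.
payload = {"fields": ["age", "income"], "values": [[42, 55000]]}

request = urllib.request.Request(
    SCORING_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {BEARER_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# The request is only built here. Sending it would look like:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```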

What data sources are supported in Cognos Dashboards on CP4D?

CSV Files Data in tables in Db2 Warehouse Db2 Warehouse on Cloud PostgreSQL Netezza SQL Server Data Virtualization assets

Project Assets that exist with Watson Studio

Data assets, Connections, Connected data, Folder assets, Jupyter notebooks, Modeler flows, Models, Experiments, Data Refinery flows

Data Source Services - NOT Included in CPD

CockroachDB, IBM® Db2® Advanced Edition, MongoDB

Change Apply stage

Combine the changes from the Change Capture stage with the original before data set to reproduce the after-data set.
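
The relationship between the Change Capture and Change Apply stages can be sketched in a few lines. This is not DataStage code: the function names, the change codes ("insert", "edit", "delete"), and the keyed-dictionary representation of a data set are assumptions made for the illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Change = Tuple[str, str, Record]  # (change code, key, record)


def change_capture(before: Dict[str, Record], after: Dict[str, Record]) -> List[Change]:
    """Compare two keyed data sets and record the differences."""
    changes: List[Change] = []
    for key, record in after.items():
        if key not in before:
            changes.append(("insert", key, record))
        elif before[key] != record:
            changes.append(("edit", key, record))
    for key, record in before.items():
        if key not in after:
            changes.append(("delete", key, record))
    return changes


def change_apply(before: Dict[str, Record], changes: List[Change]) -> Dict[str, Record]:
    """Replay the captured changes on the before set to rebuild the after set."""
    result = dict(before)
    for code, key, record in changes:
        if code == "delete":
            result.pop(key, None)
        else:  # insert or edit
            result[key] = record
    return result


before = {"1": {"city": "Paris"}, "2": {"city": "Rome"}}
after = {"1": {"city": "Lyon"}, "3": {"city": "Oslo"}}
changes = change_capture(before, after)
rebuilt = change_apply(before, changes)
```

Applying the captured changes to the before set reproduces the after set, which is the property the two stages rely on.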

Drift Detection Monitor

Compares degree of change in WML model accuracy to accuracy at training time -Drift tab, can also be done using Jupyter Notebook/scikit-learn

What are the Industry Accelerators?

Contact Center Optimization Credit Card Fraud Customer 360 Degree View Customer Attrition Prediction Customer Life Event Prediction Customer Offer Affinity Customer Segmentation Intelligent Maintenance Prediction Loan Default Analysis Streaming Analytics for Customer Life Event Prediction

SPSS Modeler

Create flows to prepare and blend data, build and manage models, and visualize the results. Build machine learning pipelines for rapid iteration during the model building process.

Projects Connections Table definitions Jobs Parameter sets (People Can't Tease Joe Pesci)

DataStage Edition features the following tabs, which you use for quick access to essential actions:

Data Governance Services - NOT Included in CPD

DataStage® Edition, Senzing

Scoring model can be built using SPSS Modeler and deployed with WML, then in Streams you can use the SPSS Operator to apply the built model with the real-time data (with correct data format to match what the model expects)

Describe the process for scoring a model on streaming data using Streams

Streams

Develop and run apps that process in-flight data. Enables continuous and fast analysis of massive volumes of moving data to help improve the speed of business insight and decision making.

DataStage Edition

Effortlessly deliver data at the right time to the right place with integration, transformation, and delivery of data in batch and real time.

Execution Engine for Apache Hadoop

Explore data or build and deploy models on your Apache Hadoop cluster.

Industry solutions services - NOT Included in CP4D

Financial Crimes Insight®, Financial Services Workbench, Prolifics Customer Prospecting Accelerator

Watson Discovery

Find answers and uncover insights in your complex business content.

Watson Knowledge Catalog

Find the right data fast. Discover relevant, curated data assets using intelligent recommendations and user reviews. Provides a secure enterprise catalog management platform that is supported by a data governance framework. A catalog connects data and knowledge with the people who need to use it. The data governance framework ensures that data access and data quality are compliant with your business rules and standards.

DevOps

Focuses on continuous integration and continuous delivery of software by leveraging on-demand IT resources (infrastructure as code) and by automating integration, test and deployment of code.

If you are using Portworx, the OpenShift cluster must include CRI-O

If you are using Portworx, the OpenShift cluster must include ________

If you plan to use SSL for a remote Db2 database or Db2 Warehouse on Cloud connection, select the Use SSL when you create the connection to the data source

If you plan to use SSL for a remote Db2 database or Db2 Warehouse on Cloud connection, select the ____________ when you create the connection to the data source

Governance Roles and goals: Role: Developer

Governance Roles and goals: Goal: View Watson Knowledge Catalog APIs

Governance Roles and goals: Role: Data Steward

Governance Roles and goals: Goals: Curate data Create governance artifacts

Governance Roles and goals: Role: Data Quality Analyst

Governance Roles and goals: Goals: Curate data Create governance artifacts Analyze data quality

Governance Roles and goals: Role: Business Analyst

Governance Roles and goals: Goals: Find data assets in a catalog View information assets

Avro CSV JSON Parquet (All Communists Just Puked)

If you select a file in a connection as the target for your Data Refinery flow output, you can select one of the following formats for that file:

In order to run Auto-AI experiments successfully, it is required that the processor supports the AVX2 instruction set. Otherwise the Auto-AI experiment run will fail.

In order to run Auto-AI experiments successfully, it is required that the processor supports the _____________. Otherwise the Auto-AI experiment run will fail.

Describe Software resources

Includes Python, R, or Scala coding languages, a set of pre-installed libraries, and optional libraries or packages that you can specify.

· Sample data sets and schemas · Jupyter notebooks that you can use to cleanse and prepare the data, run machine learning algorithms, and score the resulting models · Sample dashboards to display the results interactively · API endpoints that you can call from other applications

Industry Accelerators include what types of assets?

Watson OpenScale

Infuse your AI with trust and transparency. Understand how your AI models make decisions to detect and mitigate bias.

It is recommended that you disable TLS 1.0 and TLS 1.1 from Red Hat® OpenShift® Container Platform HAProxy routers on port 443

It is recommended that you disable _________ and ________from Red Hat® OpenShift® Container Platform HAProxy routers on port 443
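
The recommendation applies to the router configuration. As a complementary client-side illustration, the sketch below uses Python's standard `ssl` module to refuse anything older than TLS 1.2. It does not change any OpenShift setting, and it assumes a Python build where `ssl.TLSVersion` is available (3.7 or later).

```python
import ssl

# Client-side context that verifies certificates and host names by default.
context = ssl.create_default_context()

# Refuse TLS 1.0 and TLS 1.1 by requiring TLS 1.2 as the minimum version.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```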

Describe Hardware resources

It's the type of compute engine, for example, Spark, and the amount of processing power.

Developer tool services - Included

Jupyter Notebooks with Python 3.6 for GPU, Jupyter Notebooks with Python 3.6, Open Source Management, R Studio Server with R 3.6

Quality Monitor (Accuracy Monitor)

Monitors how well model predicts outcomes -Quality tab, provide quality alert threshold &min/max sample size

Pod State - Pending

The Kubernetes cluster has accepted the pod, but one or more of its containers has not yet been created and started (for example, images are still being pulled).

-Create tables -Monitor To load data into the database, you can use: -Batch insert APIs -a notebook -JDBC -CSV files

Name the Integrated database console tasks for IBM Db2 Event Store

AI Analytics Dashboards Data Governance Data Sources Developer Tools Industry Solutions Storage

Name the eight Integrated data and AI services

Redact Obfuscate Substitute Data

Name the three types of data masking / data rules
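
To make the three methods easier to tell apart, here is a rough sketch. The behavior shown for each method is an assumption for illustration (redact replaces characters with X, substitute produces a consistent replacement token, obfuscate keeps the original format), and the function names are hypothetical; check the Watson Knowledge Catalog documentation for the exact semantics.

```python
import hashlib
import random
import string


def redact(value: str) -> str:
    """Replace every character with X, hiding the content but keeping the length."""
    return "X" * len(value)


def substitute(value: str) -> str:
    """Replace the value with a deterministic token, so equal inputs stay equal."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def obfuscate(value: str, seed: int = 0) -> str:
    """Replace letters and digits with random ones of the same kind, keeping the format."""
    rng = random.Random(f"{seed}:{value}")
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)  # keep separators such as "-" or "@"
    return "".join(out)
```

Because `substitute` is deterministic, two tables masked the same way can still be joined on the masked column, while `obfuscate` keeps values looking like the original data type.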

Quick Scan Automated Discovery

Name the two types of data discovery in WKC

Data Protection Rule - Masking or denying access to sensitive data Governance Rule - Define compliance related criteria

Name the two types of rules in WKC

Manage user access Determine connection information

Name two Integrated database console tasks for PostgreSQL

Storage Services Available - Not Included in CP4D

NetApp ONTAP Portworx

One HTTPS port is exposed as the primary access point for the web client and for API requests

One __________ is exposed as the primary access point for the web client and for API requests

Grafana

Open source analytics and interactive visualization web application that provides charts, graphs, and alerts for the web when connected to supported data sources. It is used as a dashboard and visualization tool for monitoring the system.

Pod State - Running

Pod is deployed on a node. At least one container is running or is in the process of starting or restarting
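
The pod states in this guide (Pending, Running, Succeeded, Failed, Unknown) can be tied together with a simplified sketch that derives a pod state from its container states. The container state labels and the `pod_phase` function are hypothetical; real Kubernetes also takes restart policy and scheduling into account.

```python
from typing import List, Optional


def pod_phase(container_states: Optional[List[str]]) -> str:
    """Map container states to a pod state, following the flashcard definitions.

    Each container state is one of "waiting", "running", "succeeded", or "failed"
    (labels chosen for this sketch). None means the state could not be obtained,
    for example because of a communication issue.
    """
    if container_states is None:
        return "Unknown"
    if any(state == "running" for state in container_states):
        return "Running"
    all_stopped = bool(container_states) and all(
        state in ("succeeded", "failed") for state in container_states
    )
    if all_stopped:
        # Every container stopped; one failure is enough to fail the pod.
        return "Failed" if "failed" in container_states else "Succeeded"
    return "Pending"
```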

View analytics assets Comment on analytic assets Run or schedule analytic assets Share notebooks Edit the project readme Add and read data assets Manage environment definitions Start or stop environment runtimes Export a project to desktop Remove Data Assets Manage project collaborators Set up integrations Export a project to GIT

Project Permissions: Administrator

View analytics assets Comment on analytic assets Run or schedule analytic assets Share notebooks Edit the project readme Add and read data assets Manage environment definitions Start or stop environment runtimes Export a project to desktop

Project Permissions: Editor

View analytics assets Comment on analytic assets

Project Permissions: Viewer
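
The three permission lists nest: an Editor can do everything a Viewer can, and an Administrator can do everything an Editor can. A small sketch makes that structure explicit. The permission strings are taken from the cards above; the `ROLES` mapping and the `can` helper are hypothetical names for this example.

```python
VIEWER = frozenset({
    "View analytics assets",
    "Comment on analytic assets",
})

EDITOR = VIEWER | frozenset({
    "Run or schedule analytic assets",
    "Share notebooks",
    "Edit the project readme",
    "Add and read data assets",
    "Manage environment definitions",
    "Start or stop environment runtimes",
    "Export a project to desktop",
})

ADMINISTRATOR = EDITOR | frozenset({
    "Remove Data Assets",
    "Manage project collaborators",
    "Set up integrations",
    "Export a project to GIT",
})

ROLES = {"Viewer": VIEWER, "Editor": EDITOR, "Administrator": ADMINISTRATOR}


def can(role: str, action: str) -> bool:
    """Return True if the given project role includes the action."""
    return action in ROLES[role]
```

Memorizing only the four Administrator-only actions and the seven actions that Editors add to Viewers is enough to reconstruct all three lists.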

Automated Discovery in WKC

Provides detailed analysis results of all assets from data sources. Metadata and analysis results are automatically imported into the default catalog. Also, a broader set of analysis results is available for viewing in a workspace, including data quality score, automatically assigned data classes and business terms, data types, formats, frequency distributions, and more. Suitable for smaller numbers of tables and files from data sources, or from subsets (schemas or file paths) of data sources. Use when you already have a general overview of the quality and business content of your data, and you want to see and review the additional details.

Data Virtualization

Query many data sources as one. Real-time analytics without moving data, duplication, ETLs, or additional storage requirements.

Senzing

Real-time AI for entity resolution that scales with your data. Exploit the power of your data with minimal data preparation and transformation. Discover the people and places at play in your data.

IBM® Db2® Advanced Edition

Relational database that delivers advanced data management and analytics capabilities for transactional workloads.

Set Priority Level (Unix), Streams Processing Language

SPL

IBM Guardium

Safeguards your sensitive information by auditing what is happening in your sensitive-data environments, such as your databases, data warehouses, file systems, or Big Data environments.

Fairness Monitor

Scan WML deployments for biases to ensure fair outcomes -Fairness tab, provide values for favorable outcome for model, alert threshold, sample size

DataOps

Seeks to reduce the end-to-end cycle time of data analytics, from the origin of ideas to the literal creation of charts, graphs and models that create value.

Set target usage to control how many virtual cores each deployment can use per month.

Set _________ to control how many virtual cores each deployment can use per month.

Data Refinery

Simplify the process of preparing large amounts of raw data for analysis. Reduces the amount of time it takes to prepare data. Transform large amounts of raw data into consumable, quality data that's ready for analysis.

Microsoft Visual Studio Code Streams Studio Atom

Streams Processing Language (SPL). SPL applications can use the full set of Streams capabilities. You can use any of these three as your editor.

Streams Studio Visual Studio Code Atom Notebooks with Python Streams API

Streams can be developed with these four external tools

Event Streams is based on Apache Kafka and is used mostly for delivering messages (publish/subscribe with topics) with simple handling of data. Streams is IBM's heavyweight platform for real-time analytics applications and can handle a huge volume and variety of data and analytics. Db2 Event Store is an in-memory database for ingesting real-time data for event-driven applications. Both Kafka and Streams can store data in Db2 Event Store.

Summarize the difference between Event Streams, Db2 Event Store and IBM Streams.

True

T/F - RStudio still appears in the Services Catalog, even though it is not supported for Decision Optimization

False

T/F - Red Hat OpenShift includes a provisioner plug-in to create an NFS storage class

False

T/F - The kubelet manages containers which weren't created by Kubernetes

True

T/F - You can back up all persistent volumes (PVs) in the IBM® Cloud Pak for Data control plane that are stored in Portworx.

Transport Layer Security

TLS

Watson Natural Language Understanding

Take your understanding of unstructured data to a new level by extracting entities, keywords, sentiments, and more.

Compute resources can be defined by

The default environment definitions that are included with Watson Studio or Custom environment definitions that you create.

Data Classes

These describe the type of data contained in data assets, such as data fields or table columns, for example, city, account number, or credit card number. Watson Knowledge Catalog provides a predefined set.

NetApp Trident

Third-party CP4D service provides storage orchestration for containers and Kubernetes, integrating natively within the Kubernetes/OpenShift level and its Persistent Volume framework, to provision and manage persistent volumes

Data Lake Workspace

This workspace is optimized for quick analysis and data quality assessment of a large number of assets. The analysis runs on a sample.

In Depth Analysis Workspace

This workspace is optimized to run an in-depth analysis of a small number of assets. The analysis runs on all data.

PII Workspace

This workspace is optimized to search for personally identifiable information (PII). The analysis runs on a sample and skips non-PII data classes.

Regular Users System Users Service Accounts

Three User Types in OpenShift Container Platform

Online Batch Virtual

Three deployment options for WML

To dynamically provision NFS storage, use the Kubernetes NFS-Client Provisioner, which is available in the Kubernetes Incubator repository on GitHub

To dynamically provision NFS storage, use the _____________, which is available in the _________________ repository on GitHub

To protect against DDoS attacks, use an elastic load balancer that accepts only full HTTP Connections

To protect against DDoS attacks, use an _____________ that accepts only full _____________

Figure Eight

Transform text, images, audio, and videos into annotated training data to fuel your machine learning initiatives.

Datameer

Unlock the value in your raw data. Bring data from across your enterprise into a single, unified view so that everyone can blend, prepare, and explore the data to uncover hidden answers.

Data classes:

Used to categorize columns in relational data sets according to the type of the data and how the data is used. One of these is assigned to each column during profiling within a catalog. Catalog collaborators can change what is assigned to a column. Users with the manage discovery permission can assign these to data set columns before adding the data to a catalog.

Business terms:

Used to define business concepts in a standard way for your enterprise. Catalog collaborators can assign one or more to data assets and columns within relational data sets to describe the data. Users with the manage discovery permission can add these to data sets before adding them to a catalog.

Reference data sets:

Used to define values for specific types of columns. You can include this in the definition of a data class as part of the data matching criteria. During data profiling, if the values in a column match this and other criteria, that data class is assigned to the column. You can also use these in data quality analysis.
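
How a reference data set can feed a data class assignment during profiling might be sketched as follows. The reference values, the 80% threshold, and the `assign_data_class` function are all assumptions for illustration; the actual matching criteria in Watson Knowledge Catalog are configurable and more elaborate.

```python
from typing import Dict, List, Optional, Set

# Hypothetical reference data set: allowed values for a "Country code" data class.
REFERENCE_DATA: Dict[str, Set[str]] = {
    "Country code": {"US", "FR", "DE", "JP"},
}


def assign_data_class(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the data class whose reference values match enough of the column."""
    if not values:
        return None
    for data_class, allowed in REFERENCE_DATA.items():
        matches = sum(1 for value in values if value in allowed)
        if matches / len(values) >= threshold:
            return data_class
    return None
```

A threshold below 100% lets a column with a few dirty values still receive the data class.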

Policies:

Used to describe how to govern data in catalogs. You can include data protection rules to control access to data. You can also include governance rules to describe data.

Data protection rules:

Used to identify the data to control and to specify the method of control. You can include classifications, data classes, business terms, or tags to identify the data to control. You can choose to deny access to data or to mask sensitive data values.
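
The two parts of a data protection rule (what data to control, and how) can be sketched as a tiny rule evaluator. `ProtectionRule` and `apply_rules` are hypothetical names, the rule here identifies data by data class only, and masking is shown as simple redaction; this is an illustration, not the Watson Knowledge Catalog rule engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProtectionRule:
    """Hypothetical rule: which data to control and the method of control."""
    data_class: str  # identifies the data to control
    action: str      # "deny" or "mask"


def apply_rules(row: Dict[str, str], column_classes: Dict[str, str],
                rules: List[ProtectionRule]) -> Optional[Dict[str, str]]:
    """Return the row as the user may see it, or None if access is denied."""
    action_by_class = {rule.data_class: rule.action for rule in rules}
    visible: Dict[str, str] = {}
    for column, value in row.items():
        action = action_by_class.get(column_classes.get(column, ""))
        if action == "deny":
            return None
        visible[column] = "X" * len(value) if action == "mask" else value
    return visible


row = {"name": "Ann", "card": "4111"}
classes = {"card": "Credit card number"}
masked_row = apply_rules(row, classes, [ProtectionRule("Credit card number", "mask")])
denied_row = apply_rules(row, classes, [ProtectionRule("Credit card number", "deny")])
```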

Governance rules:

Used to provide a natural-language description of the criteria that are used to determine whether data assets are compliant with business objectives.

AI Not Included in CP4D

Watson Assistant, Watson Discovery, Watson Knowledge Studio, Watson NLU, Speech to Text, Text to Speech

Name the three Watson services that are available on CP4D outside the catalog

Watson Language Translator Watson Assistant for Voice Interaction Watson Machine Learning Accelerator

AI Included in CP4D

Watson Machine Learning, Watson OpenScale, Watson Studio

Exploring the tables in the database Loading data into the database

What Integrated database console tasks are available for IBM Db2® Advanced Edition

a. Updating the DNS service name b. Securing communication ports c. Setting up the Cloud Pak for Data web client

What are the CP4D post-installation tasks?

a. Setting up your registry server b. Obtaining the installation files c. Setting up Portworx storage d. Preparing for air-gapped installations e. Setting up your Cloud Pak for Data environment

What are the CP4D pre-installation tasks?

Bitbucket GitHub GitLab Microsoft Team Foundation Server

What are the four supported GIT Repositories?

a. Define the project goals b. Prepare the data c. Choose the tool d. Train your model e. Deploy your model (Goals, Data, Tool, Train, Deploy) - Go Down To The DMV

What are the stages/steps of creating an ML model

Author Approve Review Publish

What are the steps in a single approval workflow?

Analytics Regulatory Transformation

What are the three project types

Object Spec - Describes characteristics in its desired state Object Status - Describes the current state of the object

What are the two core object types used by Kubernetes?
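
The point of separating spec from status is that a controller can compare the two and act on the difference. Here is a minimal sketch of that comparison; the `Spec`, `Status`, and `reconcile` names and the replica-count example are assumptions for illustration, not Kubernetes API types.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    replicas: int  # desired state


@dataclass
class Status:
    replicas: int  # current (observed) state


def reconcile(spec: Spec, status: Status) -> str:
    """Decide the action that moves the current state toward the desired state."""
    if status.replicas < spec.replicas:
        return f"start {spec.replicas - status.replicas} pod(s)"
    if status.replicas > spec.replicas:
        return f"stop {status.replicas - spec.replicas} pod(s)"
    return "no action"
```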

Disk latency test - dd if=/dev/zero of=/path-to-installation-directory/testfile bs=512 count=1000 oflag=dsync Disk throughput test - dd if=/dev/zero of=/path-to-installation-directory/testfile bs=1G count=1 oflag=dsync

What two tests should be run to ensure the storage partition has good disk I/O performance?
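
The two `dd` invocations in the answer differ only in block size and count, which a short sketch can make visible. The helper `dd_command` is a hypothetical name, the path is the same placeholder used in the original commands, and the sketch only builds and prints the commands rather than running them. `shlex.join` requires Python 3.8 or later.

```python
import shlex
from typing import List


def dd_command(install_dir: str, block_size: str, count: int) -> List[str]:
    """Build the dd argument list used by the disk tests in the answer above."""
    return [
        "dd",
        "if=/dev/zero",
        f"of={install_dir}/testfile",
        f"bs={block_size}",
        f"count={count}",
        "oflag=dsync",
    ]


# Placeholder path, as in the original commands.
INSTALL_DIR = "/path-to-installation-directory"

latency_test = dd_command(INSTALL_DIR, "512", 1000)  # many small synchronous writes
throughput_test = dd_command(INSTALL_DIR, "1G", 1)   # one large synchronous write

print(shlex.join(latency_test))
print(shlex.join(throughput_test))
```

Small blocks with a high count stress per-write latency, while a single large block measures sustained throughput; `oflag=dsync` forces each write to reach the disk before the next one starts.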

-Default Workspace -PII Workspace -Data Lake Workspace -In Depth Analysis Workspace

What workspaces are available by default?

Readme file in the data science project

Where does a user find the instructions for using a regulatory accelerator data science asset?

Deployment Space

Where you will save the (WML) model, create the deployment, and find the information you need to score the model and get a prediction back, or embed the deployment in an app so you can interact with it programmatically.

Security Context Constraint - API that allows administrators to control permissions for pods. To examine a particular SCC, use oc get, oc describe, oc export, or oc edit.

Which is an element of OpenShift security architecture?

Analytics Engine Powered by Apache Spark Execution Engine for Apache Hadoop SPSS® Modeler Watson Knowledge Catalog Watson Machine Learning Watson Studio

Which services available in CP4D are scalable?

You can use SSL to encrypt communications to and from CP4D

You can use _______ to encrypt communications to and from CP4D

Classifications:

You use these to describe the sensitivity of the data in data assets. Each data asset has one. Catalog collaborators assign this when they add data assets to a governed catalog. You can also assign these to governance artifacts, such as business terms, data classes, reference data, policies, and governance rules.

You can filter search results by these properties:

· Type of asset · Tags · Stewards or owners · Modifiers · A range of dates between which the asset was modified

