CP4D Study Guide
Developer tool services - NOT Included in CPD
Lightbend Platform
Open Container Initiative
OCI
Pod State - Unknown
State of the pod cannot be determined, typically because of a communication issue.
Regulatory Accelerator
Streamline the process of complying with regulations.
Watson Knowledge Studio
Teach Watson the language of your domain. Create a machine learning model that understands the linguistic nuances, meaning, and relationships specific to your industry or to create a rule-based model that finds entities in documents based on rules that you define.
The installation of CP4D will provide a URL to access the web client
The installation of CP4D will provide a _________ to access the web client
UG DefaultWorkspace
This is the default workspace with the default settings.
Reports Dashboards Stories
Three visualization types in Cognos Analytics
To ensure that your data in CP4D is stored securely, you can encrypt your storage partition
To ensure that your data in CP4D is stored securely, you can encrypt your_____________
Watson Studio
Unleash the power of your data. Build custom models and infuse your business with AI and machine learning.
Execution Engine for Apache Hadoop
Used to integrate the Watson Studio service with the distributed computing power of a Hadoop cluster, allowing you to analyze data in place and use packages and libraries from Watson Studio without installing them on your Hadoop cluster
You can also sort search results by these properties:
· Name · Type · Tags · Stewards or owners · The user who last modified the asset · The last modified date
What properties can be searched?
· Tags · Name · Description · Column names
A runbook entry is a necessary backstop for any highly-available service and should be created for reconstructing the master
A _______________ is a necessary backstop for any highly-available service and should be created for _______________
A self-signed TLS certificate is included in the CP4D installation, which can be used to enable HTTPS connections. By default this is untrusted by all HTTPS clients.
A __________ is included in the CP4D installation, which can be used to enable HTTPS connections. By default this is _______ by all HTTPS clients.
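Why a self-signed certificate is untrusted by default can be seen from the client side. The sketch below uses Python's standard ssl module as one example of an HTTPS client; it is not part of CP4D, and the ca_file path passed to context_trusting is a hypothetical placeholder.

```python
import ssl

# A default client context requires a certificate chain that ends at a
# trusted CA and a matching hostname, so a self-signed certificate is rejected.
default_ctx = ssl.create_default_context()


def describe(ctx: ssl.SSLContext) -> str:
    """Summarize whether a context verifies the server certificate."""
    if ctx.verify_mode == ssl.CERT_REQUIRED and ctx.check_hostname:
        return "verifies certificate and hostname"
    return "does not fully verify the server"


def context_trusting(ca_file: str) -> ssl.SSLContext:
    """Build a context that also trusts the certificate stored in ca_file.

    ca_file is a hypothetical path to a PEM file, for example an exported
    copy of the self-signed certificate.
    """
    ctx = ssl.create_default_context()
    ctx.load_verify_locations(cafile=ca_file)
    return ctx


print(describe(default_ctx))
```

Replacing the self-signed certificate with one signed by a trusted CA removes the need for any client-side change like context_trusting.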
IBM Db2 Warehouse
A high-performing analytics engine that combines in-memory processing with integrated database analytics.
Installation Component - Assembly
A particular Cloud Pak for Data service (and its dependent assemblies) to deploy to a project in Red Hat OpenShift. For Cloud Pak for Data control plane and services, these are stored on a public IBM file server, and do not have to be installed at the same time.
StoredIQ
A platform for IBM's information life cycle governance, big data governance, and enterprise content management technologies, and is used to govern unstructured data sources
Change Capture stage
A processing stage that compares two data sets and makes a record of the differences.
OpenShift Container Registry (OCR)
Adds the ability to automatically provision new image repositories on demand. This provides users with a built-in location for their application builds to push the resulting images.
Kafka
Open source streaming platform software that provides a publish and a subscribe messaging transport protocol using topics.
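The publish/subscribe-with-topics idea can be sketched with a toy in-memory broker. This is not the Kafka client API, and it leaves out what Kafka actually adds (durable logs, partitions, consumer groups); the class and method names are hypothetical.

```python
from collections import defaultdict


class TopicBroker:
    """Toy broker: publishers and subscribers only share a topic name."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback that receives every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all subscribers; return how many received it."""
        callbacks = self._subscribers[topic]
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = TopicBroker()
received = []
broker.subscribe("orders", received.append)
delivered = broker.publish("orders", {"id": 1})
print(delivered, received)
```

The publisher never references a subscriber directly, which is the decoupling that topics provide.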
Watson AIOps
Uses artificial intelligence to simplify IT operations management and accelerate and automate problem resolution in complex modern IT environments
Minikube
Which tool can you use to run Kubernetes locally as a single-node cluster?
kubelet kube-proxy
Which two are required to run on each Kubernetes node?
· Isolating user access · Isolating applications or jobs · Maintaining a development and a production environment
Why might you want to provision more than one instance of a service?
To specify the hardware and software configurations for the environment runtimes.
Why specify an environment definition?
Analytics Engine Powered by Apache Spark
Will spin up lightweight, dedicated Apache Spark clusters to run a wide range of workloads on your IBM Cloud Pak for Data cluster
· DataStage · Data Virtualization · Watson Studio · Watson Knowledge Catalog
Describe the ways to access remote data from Cloud Pak for Data
Data Source Services - Included CP4D
Data Virtualization, Db2 Event Store, Db2 Warehouse, Db2 for z/OS Connector, PostgreSQL
MongoDB
Flexible and scalable database that is designed to store, query, and index documents.
Data Governance Services Included
Regulatory Accelerator, Watson Knowledge Catalog
· TLS re-encryption for improved security · TLS passthrough for improved security · Multiple weighted backends (split traffic) · Generated pattern-based hostnames · Wildcard domains
Advantages of OpenShift Router vs. Kubernetes Ingress
Dashboard service - Included in CP4D
Analytics Dashboards
Container Runtime Interface
CRI
Watson Machine Learning
Deploy machine-learning models into production at scale. Build analytical models and neural networks, trained with your own data and deploy them for use in apps.
Terraform
Open-source infrastructure-as-code tool. Users define and provision data center infrastructure, using the declarative configuration language HCL, or optionally JSON. It is used to install RHOS on AWS.
What are the global search functions?
Filter, Sort, Search
Governance Roles and goals: Role: Data Scientist
Governance Roles and goals: Goal: Find data assets in a catalog
Governance Roles and goals: Role: Administrator
Governance Roles and goals: Goal: Set up the first catalog
Analytics Dashboards
Identify patterns in your data with sophisticated visualizations. No coding needed.
Loading data into the database.
Name the Integrated database console tasks for IBM Db2 Warehouse.
PostgreSQL
Open source SQL database that is extensible, ACID compliant, and supports HA and backup and restore functions.
Ansible
Open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code -- and is used by Red Hat OpenShift for operating system upgrades
Streams Db2 Event Store Spark Streaming
Systems that can be used for real time analytics
What is the command line for scaling wml?
./cpd-Operating_System scale wml (Operating_System is the platform suffix of the cpd command, for example linux or darwin)
A Red Hat OpenShift project can also be referred to as a namespace
A Red Hat OpenShift project can also be referred to as a ______________
IBM Db2 Event Store
A data store designed to rapidly ingest and analyze streamed data for event-driven apps. Capable of ingesting at rates of hundreds of billions of events per day and can analyze the ingested data immediately for real-time insights.
Installation Component - Repository configuration
A server definition YAML file that you download to your Linux or Mac OS client workstation and then customize. It specifies: · URLs and credentials for the file server to download Helm charts from. · URLs and credentials for the registry server to download images from. The default server definition file is repo.yaml.
Pod State - Succeeded
All the containers in the pod stopped successfully and will not be restarted.
Pod State - Failed
All the containers in the pod stopped, but at least one container was stopped with an error or was stopped by Kubernetes.
Analytics Zoo for Apache Spark
An analytics and AI platform that unifies TensorFlow, Keras, and BigDL for distributed Apache Spark environments.
An analytics project requires Git Integration to connect to an external Git repo
An analytics project requires ___________ to connect to an external Git repo
Watson Knowledge Catalog InstaScan (formerly known as IBM StoredIQ InstaScan)
An intelligent file analysis tool that leverages automation and statistical sampling models to quickly identify risk hot spots in unstructured cloud data. The tool helps accelerate regulatory compliance and data governance as part of a DataOps practice by providing unique capabilities, such as: · Risk assessment and remediation recommendations · Automatic application of classification labels to files in Box · Audit-ready compliance checks
What is Apache Zookeeper?
An open source Apache project that provides a centralized service for providing configuration information, naming, synchronization and group services over large clusters in distributed systems. The goal is to make these systems easier to manage with improved, more reliable propagation of changes.
Analytics Services Included in CP4D
Analytics Engine powered by Spark, Data Refinery, Streams
Analytics Services not Included CP4D
Analytics Zoo for Spark, CA, Datameer, DO, Execution Engine for Hadoop, Figure Eight, Intel Python, Intel Deep learning Reference Stack, Operational Analytics for ERP, SPSS Modeler, WAND Foundation
Frequently Used Cleanse Organize Natural Language (Frequently Clean Organized Nature)
Data Refinery supports the following categories of GUI operations.
· Customer Attrition Prediction · Customer Life Event Prediction · Customer Offer Affinity · Customer Segmentation · Intelligent Maintenance Prediction · Streaming Analytics for Customer Life Event Prediction
Data Science assets are included in which Industry Accelerators?
Describe the benefits of Data Virtualization
By creating connections to your data sources, you can quickly view across your organization's data. This virtual data platform enables real-time analytics without moving data, duplication, ETLs, and additional storage requirements, so processing times are greatly accelerated. This brings real-time insightful results to decision-making applications or analysts more quickly and dependably than existing methods.
Project Assets that exist with WKC only
Data assets, Connections, Connected data, Folder assets, Data Refinery flows
CP4D provides an encrypted bearer token in the model deployment details that an application developer can use for evaluating models online with Rest APIs
CP4D provides an __________ in the model deployment details that an application developer can use for ________ with ______.
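A minimal sketch of how an application might attach that bearer token when calling a scoring endpoint over REST. The URL, the token placeholder, and the payload shape are all hypothetical; the real values come from the model deployment details. The request is only built here, not sent.

```python
import json
import urllib.request


def build_scoring_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that carries the bearer token in the Authorization header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical endpoint, token, and payload, for illustration only.
req = build_scoring_request(
    "https://cpd.example.com/hypothetical/scoring/endpoint",
    "<token-from-deployment-details>",
    {"fields": ["age"], "values": [[42]]},
)
print(req.get_method(), req.get_header("Authorization"))
```

Passing req to urllib.request.urlopen would send it; against a cluster that still uses the self-signed certificate, that call fails certificate verification unless the client is configured to trust it.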
What data sources are supported in Cognos Dashboards on CP4D?
· CSV files · Data in tables in Db2 Warehouse, Db2 Warehouse on Cloud, PostgreSQL, Netezza, SQL Server · Data Virtualization assets
Project Assets that exist with Watson Studio
Data assets, Connections, Connected data, Folder assets, Jupyter notebooks, Modeler flows, Models, Experiments, Data Refinery flows
Data Source Services - NOT Included in CPD
CockroachDB, IBM® Db2® Advanced Edition, MongoDB
Change Apply stage
Combine the changes from the Change Capture stage with the original before data set to reproduce the after-data set.
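The relationship between the Change Capture and Change Apply stages can be sketched on small keyed data sets. This is an illustration of the idea only, not DataStage code; the function names and the (operation, key, value) change format are assumptions.

```python
def change_capture(before: dict, after: dict) -> list:
    """Compare two keyed data sets and record the differences."""
    changes = []
    for key, value in after.items():
        if key not in before:
            changes.append(("insert", key, value))
        elif before[key] != value:
            changes.append(("edit", key, value))
    for key in before:
        if key not in after:
            changes.append(("delete", key, None))
    return changes


def change_apply(before: dict, changes: list) -> dict:
    """Apply recorded changes to the before data set to reproduce the after data set."""
    result = dict(before)
    for operation, key, value in changes:
        if operation == "delete":
            result.pop(key, None)
        else:
            result[key] = value
    return result


before = {1: "a", 2: "b", 3: "c"}
after = {1: "a", 2: "B", 4: "d"}
changes = change_capture(before, after)
print(changes)
print(change_apply(before, changes) == after)
```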
Drift Detection Monitor
Compares the degree of change in WML model accuracy to the accuracy at training time. Configured on the Drift tab; can also be done using a Jupyter notebook with scikit-learn.
What are the Industry Accelerators?
· Contact Center Optimization · Credit Card Fraud · Customer 360 Degree View · Customer Attrition Prediction · Customer Life Event Prediction · Customer Offer Affinity · Customer Segmentation · Intelligent Maintenance Prediction · Loan Default Analysis · Streaming Analytics for Customer Life Event Prediction
SPSS Modeler
Create flows to prepare and blend data, build and manage models, and visualize the results. Build machine learning pipelines for rapid iteration during the model building process.
Projects Connections Table definitions Jobs Parameter sets (People Can't Tease Joe Pesci)
DataStage Edition features the following tabs, which you use for quick access to essential actions:
Data Governance Services - NOT Included in CPD
DataStage® Edition, Senzing
Scoring model can be built using SPSS Modeler and deployed with WML, then in Streams you can use the SPSS Operator to apply the built model with the real-time data (with correct data format to match what the model expects)
Describe the process for scoring a model on streaming data using Streams
Streams
Develop and run apps that process in-flight data. Enables continuous and fast analysis of massive volumes of moving data to help improve the speed of business insight and decision making.
DataStage Edition
Effortlessly deliver data at the right time to the right place with integration, transformation, and delivery of data in batch and real time.
Execution Engine for Apache Hadoop
Explore data or build and deploy models on your Apache Hadoop cluster.
Industry solutions services - NOT Included in CP4D
Financial Crimes Insight®, Financial Services Workbench, Prolifics Customer Prospecting Accelerator
Watson Discovery
Find answers and uncover insights in your complex business content.
Watson Knowledge Catalog
Find the right data fast. Discover relevant, curated data assets using intelligent recommendations and user reviews. Provides a secure enterprise catalog management platform that is supported by a data governance framework. A catalog connects data and knowledge with the people who need to use it. The data governance framework ensures that data access and data quality are compliant with your business rules and standards.
DevOps
Focuses on continuous integration and continuous delivery of software by leveraging on-demand IT resources (infrastructure as code) and by automating integration, test and deployment of code.
If you are using Portworx, the OpenShift cluster must include CRI-O
If you are using Portworx, the OpenShift cluster must include ________
If you plan to use SSL for a remote Db2 database or Db2 Warehouse on Cloud connection, select the Use SSL when you create the connection to the data source
If you plan to use SSL for a remote Db2 database or Db2 Warehouse on Cloud connection, select the ____________ when you create the connection to the data source
Governance Roles and goals: Role: Developer
Governance Roles and goals: Goal: View Watson Knowledge Catalog APIs
Governance Roles and goals: Role: Data Steward
Governance Roles and goals: Goals: Curate data Create governance artifacts
Governance Roles and goals: Role: Data Quality Analyst
Governance Roles and goals: Goals: Curate data Create governance artifacts Analyze data quality
Governance Roles and goals: Role: Business Analyst
Governance Roles and goals: Goals: Find data assets in a catalog View information assets
Avro CSV JSON Parquet (All Communists Just Puked)
If you select a file in a connection as the target for your Data Refinery flow output, you can select one of the following formats for that file:
In order to run Auto-AI experiments successfully, it is required that the processor supports the AVX2 instruction set. Otherwise the Auto-AI experiment run will fail.
In order to run Auto-AI experiments successfully, it is required that the processor supports the _____________. Otherwise the Auto-AI experiment run will fail.
Describe Software resources
Includes Python, R, or Scala coding languages, a set of pre-installed libraries, and optional libraries or packages that you can specify.
· Sample data sets and schemas · Jupyter notebooks that you can use to cleanse and prepare the data, run machine learning algorithms, and score the resulting models · Sample dashboards to display the results interactively · API endpoints that you can call from other applications
Industry Accelerators include what types of assets?
Watson OpenScale
Infuse your AI with trust and transparency. Understand how your AI models make decisions to detect and mitigate bias.
It is recommended that you disable TLS 1.0 and TLS 1.1 from Red Hat® OpenShift® Container Platform HAProxy routers on port 443
It is recommended that you disable _________ and ________from Red Hat® OpenShift® Container Platform HAProxy routers on port 443
Describe Hardware resources
It's the type of compute engine, for example, Spark, and the amount of processing power.
Developer tool services - Included
Jupyter Notebooks with Python 3.6 for GPU, Jupyter Notebooks with Python 3.6, Open Source Management, RStudio Server with R 3.6
Quality Monitor (Accuracy Monitor)
Monitors how well the model predicts outcomes. Configured on the Quality tab: provide a quality alert threshold and the minimum and maximum sample size.
Pod State - Pending
The Kubernetes cluster has accepted the pod, but one or more of its containers is not yet set up and running (for example, container images are still being downloaded).
· Create tables · Monitor · Load data into the database by using batch insert APIs, a notebook, JDBC, or CSV files
Name the Integrated database console tasks for IBM Db2 Event Store
AI Analytics Dashboards Data Governance Data Sources Developer Tools Industry Solutions Storage
Name the eight Integrated data and AI services
Redact Obfuscate Substitute Data
Name the three types of data masking / data rules
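The three masking methods can be contrasted with a small sketch. The card does not describe how each method behaves, so the functions below are one plausible reading (redact replaces characters with X, substitute replaces the value with a consistent token, obfuscate keeps the format but changes the characters) and are assumptions, not the Watson Knowledge Catalog implementation.

```python
import hashlib
import random
import string


def redact(value: str) -> str:
    """Replace every character with X, so nothing about the content survives."""
    return "X" * len(value)


def substitute(value: str) -> str:
    """Replace the value with a token; equal inputs always give equal tokens."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def obfuscate(value: str, seed: int = 0) -> str:
    """Replace letters and digits with other letters and digits, keeping the format."""
    rng = random.Random(f"{seed}:{value}")
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators such as "-" unchanged
    return "".join(out)


sample = "AB-12"
print(redact(sample), substitute(sample), obfuscate(sample))
```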
Quick Scan Automated Discovery
Name the two types of data discovery in WKC
Data Protection Rule - Masking or denying access to sensitive data Governance Rule - Define compliance related criteria
Name the two types of rules in WKC
Manage user access Determine connection information
Name two Integrated database console tasks for PostgreSQL
Storage Services Available - Not Included in CP4D
NetApp ONTAP, Portworx
One HTTPS port is exposed as the primary access point for the web client and for API requests
One __________ is exposed as the primary access point for the web client and for API requests
Grafana
Open source analytics and interactive visualization web application, providing charts, graphs, and alerts for the web when connected to supported data sources, is used as a dashboard and visualization tool in monitoring the system
Pod State - Running
Pod is deployed on a node. At least one container is running or is in the process of starting or restarting
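The pod states on these cards (Pending, Running, Succeeded, Failed, Unknown) correspond to the Kubernetes pod phases. A minimal sketch that groups them by whether the pod can still change state; the enum and helper names are hypothetical.

```python
from enum import Enum


class PodPhase(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    UNKNOWN = "Unknown"


# Succeeded and Failed are final: all containers in the pod have stopped.
TERMINAL_PHASES = {PodPhase.SUCCEEDED, PodPhase.FAILED}


def is_terminal(phase: str) -> bool:
    """Return True if a pod in this phase has finished running."""
    return PodPhase(phase) in TERMINAL_PHASES


for phase in PodPhase:
    print(phase.value, is_terminal(phase.value))
```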
· View analytics assets · Comment on analytic assets · Run or schedule analytic assets · Share notebooks · Edit the project readme · Add and read data assets · Manage environment definitions · Start or stop environment runtimes · Export a project to desktop · Remove data assets · Manage project collaborators · Set up integrations · Export a project to Git
Project Permissions: Administrator
· View analytics assets · Comment on analytic assets · Run or schedule analytic assets · Share notebooks · Edit the project readme · Add and read data assets · Manage environment definitions · Start or stop environment runtimes · Export a project to desktop
Project Permissions: Editor
· View analytics assets · Comment on analytic assets
Project Permissions: Viewer
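The three permission lists are nested: every Viewer permission is also an Editor permission, and every Editor permission is also an Administrator permission. The sketch below encodes the lists from the cards as sets to make that visible; the can helper is hypothetical.

```python
VIEWER = frozenset({
    "View analytics assets",
    "Comment on analytic assets",
})

EDITOR = VIEWER | frozenset({
    "Run or schedule analytic assets",
    "Share notebooks",
    "Edit the project readme",
    "Add and read data assets",
    "Manage environment definitions",
    "Start or stop environment runtimes",
    "Export a project to desktop",
})

ADMINISTRATOR = EDITOR | frozenset({
    "Remove data assets",
    "Manage project collaborators",
    "Set up integrations",
    "Export a project to Git",
})

ROLES = {"Viewer": VIEWER, "Editor": EDITOR, "Administrator": ADMINISTRATOR}


def can(role: str, permission: str) -> bool:
    """Return True if the project role includes the permission."""
    return permission in ROLES[role]


# Permissions that only an Administrator has.
print(sorted(ADMINISTRATOR - EDITOR))
```

Remembering only what each role adds (seven items for Editor, four for Administrator) is enough to reconstruct all three lists.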
Automated Discovery in WKC
Provides detailed analysis results of all assets from data sources. Metadata and analysis results are automatically imported into the default catalog. Also, a broader set of analysis results is available for viewing in a workspace, including data quality score, automatically assigned data classes and business terms, data types, formats, frequency distributions, and more. Suitable for smaller numbers of tables and files from data sources, or from subsets (schemas or file paths) of data sources. Use when you already have a general overview of the quality and business content of your data, and you want to see and review the additional details.
Data Virtualization
Query many data sources as one. Real-time analytics without moving data, duplication, ETLs, or additional storage requirements.
Senzing
Real-time AI for entity resolution that scales with your data. Exploit the power of your data with minimal data preparation and transformation. Discover the people and places at play in your data.
IBM® Db2® Advanced Edition
Relational database that delivers advanced data management and analytics capabilities for transactional workloads.
Set Priority Level (Unix), Streams Processing Language
SPL
IBM Guardium
Safeguards your sensitive information by auditing what is happening in your sensitive-data environments, such as your databases, data warehouses, file systems, or Big Data environments.
Fairness Monitor
Scans WML deployments for biases to ensure fair outcomes. Configured on the Fairness tab: provide the values that represent a favorable outcome for the model, an alert threshold, and a sample size.
DataOps
Seeks to reduce the end-to-end cycle time of data analytics, from the origin of ideas to the literal creation of charts, graphs and models that create value.
Set target usage to control how many virtual cores each deployment can use per month.
Set _________ to control how many virtual cores each deployment can use per month.
Data Refinery
Simplify the process of preparing large amounts of raw data for analysis. Reduces the amount of time it takes to prepare data. Transform large amounts of raw data into consumable, quality data that's ready for analysis.
Microsoft Visual Studio Code Streams Studio Atom
Streams Processing Language (SPL). SPL applications can use the full set of Streams capabilities. You can use any of these three as your editor.
· Streams Studio · Visual Studio Code · Atom · Notebooks with the Python Streams API
Streams can be developed with these four external tools
Event Streams is based on Apache Kafka and is used mostly for delivering messages (publish/subscribe with topics) with simple handling of data. Streams is IBM's heavyweight platform for real-time analytics applications and can handle a huge volume and variety of data and analytics. Db2 Event Store is an in-memory database for ingesting real-time data for event-driven applications. Both Kafka and Streams can store data in Db2 Event Store.
Summarize the difference between Event Streams, Db2 Event Store and IBM Streams.
True
T/F - RStudio still appears in the Services Catalog, even though it is not supported for Decision Optimization
False
T/F - Red Hat OpenShift includes a provisioner plug-in to create an NFS storage class
False
T/F - The kubelet manages containers which weren't created by Kubernetes
True
T/F - You can back up all persistent volumes (PVs) in the IBM® Cloud Pak for Data control plane that are stored in Portworx.
Transport Layer Security
TLS
Watson Natural Language Understanding
Take your understanding of unstructured data to a new level by extracting entities, keywords, sentiments, and more.
Compute resources can be defined by
The default environment definitions that are included with Watson Studio or Custom environment definitions that you create.
Data Classes
These describe the type of data contained in data assets, such as data fields or table columns, for example, city, account number, or credit card number. Watson Knowledge Catalog provides a predefined set.
NetApp Trident
Third-party CP4D service provides storage orchestration for containers and Kubernetes, integrating natively within the Kubernetes/OpenShift level and its Persistent Volume framework, to provision and manage persistent volumes
Data Lake Workspace
This workspace is optimized for quick analysis and data quality assessment of a large number of assets. The analysis runs on a sample.
In Depth Analysis Workspace
This workspace is optimized to run an in-depth analysis of a small number of assets. The analysis runs on all data.
PII Workspace
This workspace is optimized to search for personally identifiable information (PII). The analysis runs on a sample and skips non-PII data classes.
Regular Users System Users Service Accounts
Three User Types in OpenShift Container Platform
Online Batch Virtual
Three deployment options for WML
To dynamically provision NFS storage, use the Kubernetes NFS-Client Provisioner, which is available in the Kubernetes Incubator repository on GitHub
To dynamically provision NFS storage, use the _____________, which is available in the _________________ repository on GitHub
To protect against DDoS attacks, use an elastic load balancer that accepts only full HTTP Connections
To protect against DDoS attacks, use an _____________ that accepts only full _____________
Figure Eight
Transform text, images, audio, and videos into annotated training data to fuel your machine learning initiatives.
Datameer
Unlock the value in your raw data. Bring data from across your enterprise into a single, unified view so that everyone can blend, prepare, and explore the data to uncover hidden answers.
Data classes:
Used to categorize columns in relational data sets according to the type of the data and how the data is used. One of these is assigned to each column during profiling within a catalog. Catalog collaborators can change what is assigned to a column. Users with the manage discovery permission can assign these to data set columns before adding the data to a catalog.
Business terms:
Used to define business concepts in a standard way for your enterprise. Catalog collaborators can assign one or more to data assets and columns within relational data sets to describe the data. Users with the manage discovery permission can add these to data sets before adding them to a catalog.
Reference data sets:
Used to define values for specific types of columns. You can include this in the definition of a data class as part of the data matching criteria. During data profiling, if the values in a column match this and other criteria, that data class is assigned to the column. You can also use these in data quality analysis.
Policies:
Used to describe how to govern data in catalogs. You can include data protection rules to control access to data. You can also include governance rules to describe data.
Data protection rules:
Used to identify the data to control and to specify the method of control. You can include classifications, data classes, business terms, or tags to identify the data to control. You can choose to deny access to data or to mask sensitive data values.
Governance rules:
Used to provide a natural-language description of the criteria that are used to determine whether data assets are compliant with business objectives.
AI Not Included in CP4D
Watson Assistant, Watson Discovery, Watson Knowledge Studio, Watson NLU, Speech to Text, Text to Speech
Name the three Watson services that are available on CP4D outside the catalog
Watson Language Translator Watson Assistant for Voice Interaction Watson Machine Learning Accelerator
AI Included in CP4D
Watson Machine Learning, Watson OpenScale, Watson Studio
· Exploring the tables in the database · Loading data into the database
What Integrated database console tasks are available for IBM Db2® Advanced Edition
a. Updating the DNS service name b. Securing communication ports c. Setting up the Cloud Pak for Data web client
What are the CP4D post-installation tasks?
a. Setting up your registry server b. Obtaining the installation files c. Setting up Portworx storage d. Preparing for air-gapped installations e. Setting up your Cloud Pak for Data environment
What are the CP4D pre-installation tasks?
Bitbucket GitHub GitLab Microsoft Team Foundation Server
What are the four supported Git repositories?
a. Define the project goals b. Prepare the data c. Choose the tool d. Train your model e. Deploy your model (Goals, Data, Tool, Train, Deploy) - Go Down To The DMV
What are the stages/steps of creating an ML model
Author Approve Review Publish
What are the steps in a single approval workflow?
Analytics Regulatory Transformation
What are the three project types
· Object Spec - Describes the characteristics of the object in its desired state · Object Status - Describes the current state of the object
What are the two core object types used by Kubernetes?
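The split between spec (desired state) and status (current state) can be sketched with a Deployment-shaped object written as a plain Python dict. spec.replicas and status.readyReplicas are real Deployment fields; the object name and the is_converged helper are hypothetical.

```python
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-app"},  # hypothetical name
    "spec": {"replicas": 3},              # desired state, written by the user
    "status": {"readyReplicas": 2},       # current state, reported by the cluster
}


def is_converged(obj: dict) -> bool:
    """Return True when the current state matches the desired state."""
    desired = obj.get("spec", {}).get("replicas", 1)
    current = obj.get("status", {}).get("readyReplicas", 0)
    return desired == current


print(is_converged(deployment))
```

Kubernetes controllers continuously work to move the status toward the spec; the helper only checks whether that has happened yet.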
· Disk latency test: dd if=/dev/zero of=/path-to-installation-directory/testfile bs=512 count=1000 oflag=dsync · Disk throughput test: dd if=/dev/zero of=/path-to-installation-directory/testfile bs=1G count=1 oflag=dsync
What two tests should be run to ensure the storage partition has good disk I/O performance?
-Default Workspace -PII Workspace -Data Lake Workspace -In Depth Analysis Workspace
What workspaces are available by default?
Readme file in the data science project
Where does a user find the instructions for using a regulatory accelerator data science asset?
Deployment Space
Where you will save the (WML) model, create the deployment, and find the information you need to score the model and get a prediction back, or embed the deployment in an app so you can interact with it programmatically.
Security Context Constraint - API that allows administrators to control permissions for pods. To examine a particular SCC, use oc get, oc describe, oc export, or oc edit.
Which is an element of OpenShift security architecture?
· Analytics Engine Powered by Apache Spark · Execution Engine for Apache Hadoop · SPSS® Modeler · Watson Knowledge Catalog · Watson Machine Learning · Watson Studio
Which services available in CP4D are scalable?
You can use SSL to encrypt communications to and from CP4D
You can use _______ to encrypt communications to and from CP4D
Classifications:
You use these to describe the sensitivity of the data in data assets. Each data asset has one. Catalog collaborators assign this when they add data assets to a governed catalog. You can also assign to governance artifacts, such as business terms, data classes, reference data, policies, and governance rules.
You can filter search results by these properties:
· Type of asset · Tags · Stewards or owners · Modifiers · A range of dates between which the asset was modified