2.4 Understand Cloud Pak for Data Services architecture
Analytics that come with CP4D
- Analytics Engine Powered by Spark
- Data Refinery
- Streams
Data Sources that are separately priced
- CockroachDB
- Db2 Advanced Edition
- MongoDB
Analytics that are separately priced
- Cognos
- Datameer
- Decision Optimization
- Execution Engine for Apache Hadoop
- Figure Eight
- Operational Analytics for ERP
- SPSS
- WAND Foundation Taxonomies
Data Sources that come with CP4D
- Data Virtualization
- Db2 Event Store
- Db2 Warehouse
- Db2 for z/OS Connector
- PostgreSQL (Community)
Data Governance that is separately priced
- DataStage Edition
- Senzing
Industry Solutions that are separately priced
- Financial Crimes Insight
- Financial Services Workbench
- Prolifics Customer Prospecting Accelerator
Developer Tools that are separately priced
- Lightbend Platform
Storage that is separately priced
- NetApp ONTAP
- Portworx
Data Governance that comes with CP4D
- Watson Knowledge Catalog (WKC)
- Regulatory Accelerator
AI that is separately priced
- Watson Assistant
- Watson Discovery
- Watson Knowledge Studio
- Natural Language Understanding
- Speech-to-Text
- Text-to-Speech
Services Not Listed in Catalog
- Watson Language Translator
- Watson Assistant for Voice Interaction
- WML Accelerator
AI that comes with CP4D
- Watson Machine Learning
- Watson OpenScale
- Watson Studio
Dashboards that come with CP4D
- Analytics Dashboards
Understand add-on reliability support
Each add-on has its own requirements for hardware (compute nodes, cores, memory, networking, and minimum storage allocation). Some services, such as Db2 and MongoDB, allow you to set up dedicated nodes so that those nodes are used exclusively by the service. Services such as Db2 also allow for an HADR configuration to protect against data loss. To make a service reliable, it is essential to have a backup and disaster recovery plan. There are two ways to back up and restore the file system:
1. If you are using Portworx, you can create volume snapshots using the CPD control plane (see the sketch below).
2. You can create volume backups on a local disk or an S3/S3-compatible object store.
For disaster recovery, you can migrate a namespace, including data volumes and Kubernetes resources, from a primary cluster to a secondary (backup) cluster on a continuous schedule.
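A minimal sketch of the Portworx snapshot path, assuming the Stork/external-storage VolumeSnapshot CRD is available on the cluster; the namespace (zen), snapshot name, and PVC name are hypothetical placeholders:

# Create a Portworx volume snapshot of a CPD persistent volume claim.
cat <<EOF | oc apply -f -
apiVersion: volumesnapshot.external-storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: user-home-snap
  namespace: zen
spec:
  persistentVolumeClaimName: user-home-pvc
EOF

# Verify that the snapshot completed.
oc get volumesnapshot user-home-snap -n zen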
Explain how to authorize additional users to access a service's functionality
Each service has its own requirements for the roles that a user must have in order to access it. For example, for Data Virtualization, users must be granted specific Data Virtualization roles based on their job description: Data Virtualization Admin, Data Virtualization Engineer, Data Virtualization User, or Data Virtualization Steward. For a Db2 service, a role of either Admin or User can be granted. Here is an example of how you would do that in IBM Streams:
1. Locate the service instance that you want to manage:
   a. Click the Services icon in the CPD UI.
   b. From the navigation menu, click My instances or Administer > Manage instances.
   c. Click the Provisioned Instances tab.
2. From the service instance options menu, click Manage Access.
3. From the Streams User Management page, you can choose from the following actions:
   a. Add users: grant additional users access to the service instance and specify their role.
   b. Edit role: change the role that is assigned to a user.
   c. Remove: remove a user's access to the service instance.
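As a hedged companion to the UI steps, a sketch of inspecting platform users over the Cloud Pak for Data REST API before granting service-level roles in Manage Access; the host name and credentials are placeholders, and the /icp4d-api/v1 endpoint paths and token field name are assumptions based on the platform API:

# Obtain a platform bearer token (the response's "token" field is assumed).
TOKEN=$(curl -k -s -X POST https://cpd.example.com/icp4d-api/v1/authorize \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"<password>"}' | jq -r .token)

# List platform users; service roles (for example, Data Virtualization
# Engineer) are then assigned per service through Manage Access.
curl -k -s https://cpd.example.com/icp4d-api/v1/users \
  -H "Authorization: Bearer $TOKEN" | jq .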
Describe the hardware sizing/requirements for the different types of add-ons
Each service has its own hardware and sizing requirements. You can see the system requirements for services here: https://community.ibm.com/community/user/cloudpakfordata/viewdocument/system-requirements-for-services?CommunityKey=c0c16ff2-10ef-4b50-ae4c-57d769937235&tab=librarydocuments
Describe the Cloud Pak for Data base entitlement functionality contained in add-ons that are available, but not pre-installed into Cloud Pak for Data
Entitlements for add-ons that are not pre-installed must be purchased separately, and the entitlements purchased are exclusive to those add-ons. For example, VPCs purchased for DataStage are to be used only for the DataStage add-on, whereas CP4D VPCs can be used across the different pre-installed services.
Describe the process for deploying Services into Cloud Pak for Data
First, we must install the service. The exact process depends on the service, and more information can be found in the Knowledge Center. The process usually includes:
- Defining a storage class
- Creating a YAML file
- Running the adm command, which pushes the Helm charts and gets the images to Docker (see the sketch after this list)
- Using the cpd CLI command to install the image
In the web UI, click the Services icon in the top right, then click the add-on you want, then click Deploy. Before you click Deploy, you must make sure that you have met all the requirements to deploy the service, such as having the right amount of storage available and having the right entitlements purchased. Deployed add-ons can be found on the My instances page.
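A hedged sketch of the adm step for a single assembly (Data Virtualization, dv), following the CLI template shown later in this section; the repo.yaml path and namespace are placeholders supplied by your cluster administrator:

# Set up the cluster-level prerequisites for the dv assembly and apply them.
./cpd-linux adm \
  --repo ./repo.yaml \
  --assembly dv \
  --namespace zen \
  --apply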
Understand when to establish Affinity and Anti-affinity support on worker nodes
For a variety of reasons, a user may want a service or container to run only on a specific type of hardware. To set this up, you can configure affinity for specific services on specific nodes. In Kubernetes, affinity/anti-affinity is done through a nodeSelector, which provides a simple way to constrain pods to nodes with particular labels. Examples of services that use affinity are Watson Knowledge Studio, Speech to Text, Watson Assistant, MongoDB, Db2, Watson Assistant for Voice Interaction (WAVI), and Data Virtualization. Certain nodes may have resources that others do not, and services that require those resources can run only on those nodes. For example, AutoAI requires AVX2 support and can run only on nodes with that capability, so it has an affinity for those nodes. In Cloud Pak for Data, you would want to establish affinity during the installation of a database such as Db2 Warehouse or MongoDB, so that the nodes are used exclusively by the database containers/pods. You also need node affinity if you need to configure local volumes (disk, partition, directory, etc.), as used in the WKS installation. A sketch of dedicating nodes follows below.
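A minimal sketch of dedicating nodes to a database service by labeling and tainting them; the node name is a placeholder, and the label key/value shown follows the icp4data=database-db2wh pattern used for Db2 Warehouse dedicated nodes:

# Label the node so database pods can select it, then taint it so that
# nothing else schedules there.
oc label node worker1.example.com icp4data=database-db2wh
oc adm taint nodes worker1.example.com icp4data=database-db2wh:NoSchedule

# The database pods then carry a matching nodeSelector and toleration:
#   nodeSelector:
#     icp4data: database-db2wh
#   tolerations:
#   - key: "icp4data"
#     operator: "Equal"
#     value: "database-db2wh"
#     effect: "NoSchedule"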
Describe the process for creating service instances for add-ons that require a secondary provisioning step
From the web interface, choose Add-ons, then your desired add-on, then Provision instance. Provide the required parameters, such as: the Kubernetes namespace, the Docker pull secret, the ZooKeeper connection string or storage class, the external libraries storage class or persistent volume claim, the application cache persistent volume claim, and the build service persistent volume claim. Then click Provision.
The service installation process flow is as follows (a sketch of this flow follows the list):
1. Download the appropriate .bin file for your add-on to your Linux installation. At this point, your service in the catalog still shows as Premium.
2. Execute the .bin file, which creates a .tar file.
3. Extract the components of your service, which creates a deploy.sh file.
4. Run the deploy script, which puts your service in the correct IBM directory on your Linux system. (deploy.sh will be replaced with the cpd install tool.) At this point, your service in the catalog shows as Available.
5. From your services listing, provision your instance of the service. At this point, your service in the catalog shows as Enabled.
Detailed instructions and specific directory locations for the tasks listed above vary with each individual add-on. See the add-on installation instructions for your specific service.
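A hedged sketch of the .bin to .tar to deploy.sh flow; the file and directory names below are hypothetical and vary per add-on:

# Make the installer executable and run it to produce the archive.
chmod +x my-addon-installer.bin
./my-addon-installer.bin          # produces my-addon.tar

# Extract the service components, which include deploy.sh.
tar -xf my-addon.tar
cd my-addon

# Run the deploy script to place the service in the IBM directory.
./deploy.sh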
Describe the hardware architectures supported by Cloud Pak for Data add-ons
In terms of Cloud Pak for Data as a whole, the only hardware architecture supported is Intel x86-64.
- AutoAI requires a processor that supports the AVX2 instruction set; without that instruction set, AutoAI will not run.
- Intel Deep Learning Reference Stack - PyTorch and Intel Deep Learning Reference Stack - TensorFlow require a processor that supports the AVX-512 instruction set; without that instruction set, both of those services will not run.
A quick way to check a node for these instruction sets is sketched below.
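A minimal check, run on a worker node, that the CPU exposes the required instruction-set flags; each command prints nothing if the flag is absent:

# AVX2 (required by AutoAI)
grep -o 'avx2' /proc/cpuinfo | sort -u
# AVX-512 variants (required by the Intel Deep Learning Reference Stack services)
grep -o 'avx512[a-z]*' /proc/cpuinfo | sort -u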
Explain the difference between Service installation and Service instance provisioning
Installation of an add-on requires downloading the images for the add-on from the registry location into the CP4D cluster. The add-on shows as Available when the image has been installed, but it is still not Enabled or ready for use. To provision the add-on, the install process must be completed first, and you need to make sure that your CP4D cluster meets the requirements of the service. Once the requirements are met, you can provision the instance to officially enable the add-on, so that you can actually begin building applications on the particular add-on. Provisioning involves providing information such as the Kubernetes namespace, the Docker pull secret, and the external libraries storage class or persistent volume claim.
Developer Tools that come with CP4D
- Jupyter Notebooks with Python 3.6 for GPU
- Jupyter Notebooks with R 3.6
- Lightbend Platform
- Open Source Management
- RStudio Server with R 3.6
Understand add-on scalability support
Project admins can adjust services by scaling them to support high availability or to increase processing capacity. When you scale up a service, any components or services that it requires are also scaled up. You can determine the scale of a service by checking the number of replicas it has. Scalable services include: Analytics Engine, Execution Engine, SPSS Modeler, WKC, WML, and Watson Studio. Scaling up a service increases the number of pods, and you can scale up at any time after installation. Here is the general form of the command to scale after installation (a concrete example follows):
./cpd-Operating_System scale \
  -a Assembly \
  -n Project \
  --config medium \
  --load-from Image_directory_location
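For instance, a hedged invocation on Linux that scales a hypothetical wml assembly in the zen project to the medium configuration; the image directory path is a placeholder:

# Scale the wml assembly to the medium size configuration.
./cpd-linux scale \
  -a wml \
  -n zen \
  --config medium \
  --load-from ./cpd-wml-images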
Describe the pre-installed functionality in Cloud Pak for Data
Services that are included with Cloud Pak for Data are: Watson ML, Watson OpenScale, Watson Studio, Analytics Engine, Data Refinery, Streams, Analytics Dashboards, Regulatory Accelerator, Watson Knowledge Catalog, Data Virtualization, Db2 Event Store, Db2 Warehouse, Db2 for z/OS, PostgreSQL, Jupyter Notebooks with Python 3.6 for GPU and with R 3.6, Open Source Management, and RStudio Server.
When you install the lite assembly, the only functionality available is the administration features (both for the OS and the web client). These features include, but are not limited to:
- Monitoring resources
- LDAP connections
- User configuration settings
- Managing access
Services must be installed and provisioned in order to increase the functionality of the lite assembly. Read here for more information on services and how to install the ones you need: https://www.ibm.com/support/knowledgecenter/SSQNUZ_2.5.0/cpd/svc/svc.html
Describe the functional categories for Cloud Pak for Data add-ons
The functional categories for CP4D add-ons include: AI, Analytics, Dashboards, Data Governance, Data Sources, Developer Tools, Industry Accelerators, and Storage
Describe where to locate and deploy the Cloud Pak for Data Services
Web UI: click the Services icon in the top right, then click the add-on you want, then click Deploy.
CLI: to install a service, change to the directory where the CP4D CLI is located and log in to your Red Hat OpenShift cluster as an admin (oc login OpenShift_URL:port), then run the following command (a worked example follows):
./cpd-Operating_System \
  --repo ./repo.yaml \
  --assembly dv \
  --namespace Project \
  --storageclass Storage_class_name \
  --transfer-image-to Registry_location \
  --cluster-pull-prefix Registry_from_cluster \
  --ask-push-registry-credentials
where Operating_System is either linux or darwin, and the remaining values need to be provided by your cluster administrator.
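A hedged end-to-end sketch on Linux for the dv assembly; the OpenShift URL, namespace, storage class, and registry values are placeholders that your cluster administrator would supply:

# Log in to the cluster as an admin, then run the installer for dv.
oc login https://openshift.example.com:6443
./cpd-linux \
  --repo ./repo.yaml \
  --assembly dv \
  --namespace zen \
  --storageclass portworx-shared-gp3 \
  --transfer-image-to registry.example.com/zen \
  --cluster-pull-prefix image-registry.openshift-image-registry.svc:5000/zen \
  --ask-push-registry-credentials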
Explain where to find usage metrics for a specific add-on
To access point-in-time resource allocations in the Cloud Pak for Data administration tool, log in to the web client as an administrator and, from the navigation menu, select Administer > Manage platform. From here you can:
- Track resource usage (CPU virtual cores and memory) over time for each deployment
- Set resource target limits
- See an aggregated view of the monthly usage of all services
- Drill down to the pods, dynamic runtimes, and service instances to view details on resource usage and logs, and to perform administrative tasks, such as starting, stopping, and deleting pods
- View and download usage reports
- Run diagnostic jobs to collect logs for all services so that you can troubleshoot problems
If you consistently use more resources than you are entitled to, you can either license additional cores or decrease the number of services that are running in your Cloud Pak for Data cluster.