Final Prep for CS 498: Cloud Computing Applications


a new type of data repository for storing massive amounts of raw data in its native form, in a single location (both structured and unstructured)

A Data Lake is:
- a new type of data repository for storing massive amounts of unstructured data in a single location for processing, cleaning, and structuring
- a new type of data repository for storing massive amounts of structured data in a single location, rather than spread over multiple datacenters, in order to exploit data locality to speed the analysis
- a new type of data repository for storing massive amounts of raw data in its native form, in a single location (both structured and unstructured)

data structure, more specifically, a sophisticated nested array

A Datacube is best thought of as a(n):
- function that structures and compresses data
- data structure, more specifically, a sophisticated nested array
- archival service provided by AWS
- specialized hardware for fast analysis of massive data

one or more application containers.

A Pod is a Kubernetes abstraction that represents a group of one or more application containers. one or more nodes. one or more virtual machines

combines the power of analytics with the flexibility of big data models and the agility and limitless resources of the cloud

A data lake:
- combines the power of analytics with the flexibility of big data models and the agility and limitless resources of the cloud
- by definition can only store unstructured data
- as a concept preceded the concept of a data warehouse, going back to the 1960s

False

A datacube is an example of a columnar RDBMS True False

True

A node in Kubernetes can run several pods and each pod can run several containers. True False

False

A user application is not allowed to load control registers when running in kernel mode. True False

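False It's the opposite in both cases: with Fargate, AWS manages resource provisioning automatically, while with the EC2 launch type the customer manages the underlying instances.

AWS ECS has two launch types: EC2 and Fargate. With EC2, AWS automatically manages all resource provisioning, while with Fargate it is managed by the customer. True False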

microVM

AWS Lambda and AWS Fargate are using Container microVM

organizations / end users

According to Gartner, the vast majority of cloud security failures in the next few years will be caused by:
- organizations / end users
- chip makers
- hypervisor architects
- lax firewall rules

FaaS

According to Gartner, which of these is likely to grow the fastest over the next few years? FaaS PaaS IaaS SaaS

code

According to Ted Malaska, it used to be that what locked you into a system was money, but now it is:
- business relationships
- much more open due to the fact that there is so much competition in the space
- code

True

All containers without a --network specified are attached to the default bridge network. This is risky because it allows unrelated services to communicate. True False

Kubernetes

Anthos is mostly based on Docker Kubernetes OpenStack

True

Besides using Dockerfile to create container image, one can also start a container using existing image and install necessary packages on top of it to create a new image. True False

False Binary translation only modifies sensitive instructions.

Binary translation modifies all instructions on the fly and does not require changes to the guest operating system kernel. True False

Vision Voice Language

Choose all that apply: What are examples of unstructured data in a cloud machine learning context? Graphs Vision Voice Language

ALL

Choose all that apply: Which of these are security threats in the cloud computing field? Vulnerabilities in management APIs Incomplete data deletion Multi-tenant data leakage due to failure of separation of control Insider abuse

OLAP cubes require complicated pipelines to transform data from a SQL database into OLAP cubes; A Datacube is a data structure

Choose all the true statements OLAP cubes require complicated pipelines to transform data from a SQL database into OLAP cubes A Datacube is a data structure MonetDB is better than BigTable at storing data retrieved by a key or a sequence of keys.

OLTP and OLAP are both for structured data

Choose the true statement OLTP is for structured data, and OLAP is for unstructured data OLTP is for unstructured data, and OLAP is for structured data OLTP and OLAP are both for unstructured data OLTP and OLAP are both for structured data

False

Containers on different networks can communicate using the bridge network. True False

False The routing mesh will automatically route the incoming traffic to a node that has a service task running on it.

Docker Swarm routing mesh will report an error if an external load balancer reaches a node that does not have a task belonging to the requested service. True False

Virtual IP When a Service is requested, the resulting DNS query is forwarded to the Docker Engine, which in turn returns the IP of the service, a virtual IP.

Docker internal load balancing is done using Virtual IP Published port numbers to the host system Service name

Guarantee that the software will always run the same irrespective of environment Using the Dockerfile format, and relying on Union filesystem technology, Docker images downloaded from a hub guarantee specific software environments for deployment.

Docker is used to: Monitor progress of jobs running on OpenStack Send messages from one machine to another Guarantee that the software will always run the same irrespective of environment Run a Java program

True

Environment variables and DNS are two primary modes for service discovery in Kubernetes. True False

True

Etcd is a key-value store that provides a consistent distributed state for a Kubernetes cluster. True False

False It's only possible to use container name in a user-defined bridge network.

For a container to communicate with another container running on the default bridge network, one can use either the target container's ip address or its container name directly. True False

overlay network

For communication among containers running on different Docker daemon hosts, you should use bridge network overlay network

overlay network

For containers running on different host to communicate, you should use bridge network host network overlay network macvlan network

False

GPUs are too expensive to use at the massive cloud scale. True False

95%

Gartner predicts that through 2022, at least what percentage of cloud security failures will be caused by organizations / end users? 25% 50% 75% 95%

True

Google Anthos is a framework for Hybrid Cloud deployments True False

ALL

How can you mount a storage location on the host to a container? Bind mount Volume tmpfs

They make a docker container of the inferencing code, and keep a reference to a BLOB storage bucket where the trained model's parameters are stored. Any time an HTTPS request for a model inference arrives, they launch the container, which fetches the parameters and runs the inference.

How do cloud providers technically handle model deployment?
- They make a docker container of the inferencing code, and keep a reference to a BLOB storage bucket where the trained model's parameters are stored. Any time an HTTPS request for a model inference arrives, they launch the container, which fetches the parameters and runs the inference.
- Amazon SageMaker stores all trained models in DynamoDB. Upon an HTTPS request, it asks DynamoDB for the model and runs the proper pre-written algorithm along with parameters fetched from DynamoDB.
- They keep a pool of virtual machines active, so that any time an HTTPS request for a model inference arrives, one of the VMs is ready to fetch the model artifacts from the model repository and run it.
- The models are stored in JavaScript. Any browser that wishes to run a model fetches the model parameters from a cloud-based BLOB storage, and simply runs the model code locally.

It regularly uses "ping" messages. It uses ping messages (keep-alives) to make sure that every worker is actually responding and doing processing.

How does Pregel detect the failure? The master periodically instructs the workers to save the state of their partitions to persistent storage. Each worker communicates with the other workers. It regularly uses "ping" messages. The workers all reload their partition state from the most recent available checkpoint.

Allows Storm to be used from many languages Thrift allows users to define and create services that can be both consumed and served by numerous languages.

How does Thrift contribute to Storm? Enables the usage of streams Provides load-balancing functionality Provides scalability Allows Storm to be used from many languages

Trident has first class support for state, but the exact implementation is up to the application developer.

How does Trident treat state? Trident has first class support for state and is completely automatic without the need of any help from the application developer. Trident has first class support for state, but the exact implementation is up to the application developer. Trident does not have any support for state. None of the above.

When you grant a newly-created or running service access to a secret, the decrypted secret is mounted into the container in an in-memory filesystem

How does a service get access to secret information in Docker Swarm?
- When you grant a newly-created or running service access to a secret, the decrypted secret is mounted into the container in an in-memory filesystem
- When you grant a newly-created or running service access to a secret, the encrypted secret is mounted into the container in an in-memory filesystem. The program in the container needs to decrypt the secret using the appropriate master key.
- When you grant a newly-created or running service access to a secret, the decrypted secret is mounted into the container in a disk mounted location

The master periodically instructs the workers to save the state of their partitions to persistent storage.

How is checkpointing done in Pregel? The workers all reload their partition state from the most recent available checkpoint. The master periodically instructs the workers to save the state of their partitions to persistent storage. Each worker communicates with the other workers. It regularly uses "ping" messages.

The workers all reload their partition state from the most recent available checkpoint. The master reassigns the graph partitions to the currently available workers, so work that was not finished can be shared out to workers that are still alive; the workers then reload their partition state from the most recent available checkpoint and continue.

How is recovery being done in Pregel? The workers all reload their partition state from the most recent available checkpoint. The master periodically instructs the workers to save the state of their partitions to persistent storage. It regularly uses "ping" messages. Each worker communicates with the other workers.
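The checkpoint-and-recover cycle from the three Pregel cards above can be illustrated with a single-process Python sketch. It runs a vertex-centric "propagate the maximum value" job in supersteps. The master checkpoints state every `checkpoint_every` supersteps, and when a failure is simulated at superstep `fail_at`, the workers reload the most recent checkpoint and continue. Everything here is an assumption made for illustration. The function and parameter names are hypothetical, the checkpoint lives in memory rather than in persistent storage, and failure detection (pings) is reduced to a flag.

```python
import copy

def pregel_max(graph, values, checkpoint_every=2, fail_at=None):
    """Propagate the maximum vertex value, Pregel style (single-process sketch).

    graph: dict vertex -> list of out-neighbours
    values: dict vertex -> initial value
    fail_at: hypothetical superstep at which a worker failure is simulated
    """
    state = dict(values)
    inbox = {v: [] for v in graph}
    active = set(graph)            # every vertex is active in superstep 0
    superstep, checkpoint, failed = 0, None, False
    while active:
        # Master: periodically have workers save their partition state
        # (kept in memory here instead of persistent storage).
        if superstep % checkpoint_every == 0:
            checkpoint = copy.deepcopy((superstep, state, inbox, active))
        # Failure detected (in Pregel, via missed pings): all workers reload
        # the most recent checkpoint and the job resumes from that superstep.
        if superstep == fail_at and not failed:
            failed = True
            superstep, state, inbox, active = copy.deepcopy(checkpoint)
            continue
        outbox = {v: [] for v in graph}
        for v in active:
            new_value = max([state[v]] + inbox[v])
            # Send only on the first superstep or when the value grew;
            # otherwise the vertex implicitly votes to halt.
            if superstep == 0 or new_value > state[v]:
                state[v] = new_value
                for n in graph[v]:
                    outbox[n].append(new_value)
        # A vertex that receives a message is (re)activated next superstep.
        active = {v for v, msgs in outbox.items() if msgs}
        inbox = outbox
        superstep += 1
    return state

graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(pregel_max(graph, {"a": 3, "b": 6, "c": 2}, fail_at=1))
```

Because recovery replays supersteps from the checkpoint, the final values match a failure-free run.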

User-defined

How many master nodes does Kubernetes have? One Two User-defined

OLAP

If the vast majority of your queries involve a very large number of rows but only a few columns, which system is a more natural fit? OLTP OLAP

OLTP

If the vast majority of your queries only involve a few rows, but involve most or all of the columns for those rows, which system is a more natural fit? OLTP OLAP

FPM

If we want to find which set of items in a grocery shop are frequently bought together, which of the following approaches should we use? K-Means Naïve Bayes Decision Forests FPM
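Frequent pattern mining on a handful of grocery baskets can be sketched as a level-wise count in the spirit of Apriori, written in plain Python. The function name, the `min_support` threshold (an absolute basket count) and the sample baskets are all assumptions for illustration. Production systems use more efficient algorithms such as FP-Growth.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(baskets, min_support, max_size=2):
    """Return {itemset: support} for itemsets found in at least min_support baskets."""
    # Level 1: count individual items.
    counts = Counter(item for basket in baskets for item in set(basket))
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    frequent = dict(current)
    size = 2
    while current and size <= max_size:
        # Apriori property: a frequent itemset can only contain items that appear
        # in smaller frequent itemsets, so candidates are built from those items.
        items = sorted({i for s in current for i in s})
        counts = Counter()
        for basket in baskets:
            present = [i for i in items if i in basket]
            for combo in combinations(present, size):
                counts[frozenset(combo)] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        size += 1
    return frequent

baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "butter"},
]
# milk and bread are bought together in 3 of the 4 baskets.
print(frequent_itemsets(baskets, min_support=3))
```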

OLAP Datacubes RDBMS are often limited by the constraints of SQL

If your primary interest is the richest possible analysis capabilities, which of these two options would likely be the better choice? Column-Oriented Data Warehouse OLAP Datacubes

ALL

In Docker Swarm, a service is used to handle Launching and monitoring tasks Rolling updates Network routing

True

In Docker Swarm, ingress is an overlay network that handles control and data traffic related to swarm services. True False

Map: A,C Reduce: B

In K-means done on MapReduce, which of the following steps is done in the Map phase, and which in the Reduce phase? A. For each data point, assign to the closest centroid B. For each cluster, re-compute the centroids C. Read the k centroids
- Map: C Reduce: A,B
- Map: C,B Reduce: A
- Map: A,C Reduce: B
- Map: A Reduce: B,C
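One iteration of this Map/Reduce split (Map: C and A; Reduce: B) can be sketched in plain Python, with a dictionary standing in for MapReduce's shuffle-by-key. The function names are hypothetical and the sketch runs in one process. On Hadoop, the mapper would typically load the k centroids once per task before processing its input split.

```python
import math
from collections import defaultdict

def kmeans_map(point, centroids):
    """Map: with the k centroids already read (step C), emit
    (index of the closest centroid, point) for one data point (step A)."""
    closest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return closest, point

def kmeans_reduce(cluster_id, points):
    """Reduce: recompute one cluster's centroid as the mean of its points (step B)."""
    n = len(points)
    return cluster_id, tuple(sum(coords) / n for coords in zip(*points))

def kmeans_iteration(points, centroids):
    """One MapReduce round; the dict plays the role of the shuffle-by-key."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # a cluster with no points keeps its centroid
    for key, pts in groups.items():
        cluster_id, centroid = kmeans_reduce(key, pts)
        new_centroids[cluster_id] = centroid
    return new_centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans_iteration(points, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```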

D -> A -> C -> B

In K-means, what is the order of the following steps? A. For each data point, assign to the closest centroid B. If new centroids are different from the old, re-iterate through the loop C. For each cluster, re-compute the centroids D. Randomly select k centroids
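The D → A → C → B order can be read directly off a minimal K-means loop. This is a plain-Python sketch that assumes Euclidean distance and points given as tuples. `seed` and `max_iters` are illustrative parameters and are not part of the course material.

```python
import math
import random

def kmeans(points, k, seed=0, max_iters=100):
    """Plain K-means on tuples with Euclidean distance; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # D. Randomly select k data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # A. Assign each data point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            closest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[closest].append(p)
        # C. Recompute each centroid as the mean of its cluster
        #    (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        # B. If the centroids changed, go round the loop again; otherwise stop.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```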

False

In Kubernetes users need to take care of mapping container ports to host ports. True False

Launching and monitoring pods

In Kubernetes, a ReplicaSet takes care of Launching and monitoring pods Rolling updates Network routing All above

False Ted Malaska's opinion is that SQL will be around forever

In Ted Malaska's opinion, SQL is dying, and will eventually be replaced by NoSQL approaches. True False

databases

In Ted Malaska's opinion, Spark & Flink should be categorized as:
- databases
- streaming services
- deep archive systems
- analytics engines

QuoteSplitterBolt, WordCountBolt, SortBolt, MergeBolt

In a Storm program that produces a sorted list of the top K most frequent words encountered across all the documents streamed into it, four kinds of processing elements (bolts in Storm) might be created: QuoteSplitterBolt, WordCountBolt, MergeBolt, and SortBolt. What is the order in which words flow through the program? WordCountBolt, QuoteSplitterBolt, SortBolt, MergeBolt QuoteSplitterBolt, WordCountBolt, SortBolt, MergeBolt QuoteSplitterBolt, SortBolt, WordCountBolt, MergeBolt WordCountBolt, QuoteSplitterBolt, MergeBolt, SortBolt
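The dataflow in this card can be mimicked with ordinary Python functions, one per bolt. This is not the Storm API; the functions are hypothetical stand-ins for bolts. Routing words to partitions by `len(word) % partitions` is an assumed, deterministic substitute for Storm's fields grouping, which hashes the grouping field. Each partition plays the role of one WordCountBolt/SortBolt task, and MergeBolt combines their local top-K lists.

```python
import heapq
from collections import Counter

def quote_splitter_bolt(quotes):
    """Split each incoming quote into lower-cased words."""
    for quote in quotes:
        yield from quote.lower().split()

def word_count_bolt(words):
    """Keep a running count per word and emit (word, count) on every update."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]

def sort_bolt(word_counts, k):
    """Track the latest count per word and return this task's local top-k."""
    latest = {}
    for word, count in word_counts:
        latest[word] = count
    # Ties on count are broken by the word itself, to keep output deterministic.
    return heapq.nlargest(k, latest.items(), key=lambda wc: (wc[1], wc[0]))

def merge_bolt(partial_top_ks, k):
    """Merge the local top-k lists from all SortBolt tasks into a global top-k."""
    merged = [wc for part in partial_top_ks for wc in part]
    return heapq.nlargest(k, merged, key=lambda wc: (wc[1], wc[0]))

def top_k_words(quotes, k, partitions=2):
    # Stand-in for fields grouping: each word always goes to the same partition,
    # so its full count lives in exactly one WordCountBolt/SortBolt pair.
    streams = [[] for _ in range(partitions)]
    for word in quote_splitter_bolt(quotes):
        streams[len(word) % partitions].append(word)
    partial = [sort_bolt(word_count_bolt(s), k) for s in streams]
    return merge_bolt(partial, k)

print(top_k_words(["To be or not to be"], 2))  # [('to', 2), ('be', 2)]
```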

Obtaining data, scrubbing data, exploring the dataset, train and evaluate a model, and interpreting the results

In a typical data science workflow, what are the steps involved? Obtaining data, scrubbing data, exploring the dataset, train and evaluate a model, and interpreting the results Obtaining data, data cleaning, model training, model exploration, model deployment Model training, model exploration, cleaning the outcomes, interpreting the results. Cleaning data, exploring data, model training and evaluation, obtaining results, deploying the model

False

In coming years, it is likely that cloud computing will become more and more centralized in the USA, while cloud computing in other areas of the world will decline. True False

False OLAP cubes require that data teams manage complicated pipelines to transform data from a SQL database into OLAP cubes

In general, it is very easy and straightforward to transform data from a SQL database into an OLAP cube. True False

Events are double processed.

In the "At Least Once" message process, what happens if there is a failure? Events are double processed. Storm's natural load-balancing takes over. Storm's natural fault-tolerance takes over. You must create and implement your load-balance algorithm.
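A tiny simulation shows why a failure leads to double processing. The bolt updates its state, but the acknowledgement is lost before it reaches the spout. The tuple then times out and a reliable spout replays it. The function and its `lost_acks` parameter are hypothetical. Real Storm tracks tuple trees with acker tasks and replays on failure or timeout, and this sketch collapses all of that into a single replay queue.

```python
from collections import deque

def count_events_at_least_once(events, lost_acks):
    """Count events under at-least-once replay (single-process sketch).

    lost_acks: hypothetical set of event ids whose first acknowledgement is
    lost *after* the bolt has already processed them.
    """
    pending = deque(events)        # the spout's replay queue
    lost = set(lost_acks)
    processed = 0
    while pending:
        event = pending.popleft()
        processed += 1             # the bolt's (non-idempotent) state update
        if event in lost:
            lost.remove(event)     # ack lost -> timeout -> spout re-emits it
            pending.append(event)
    return processed

print(count_events_at_least_once(["e1", "e2", "e3"], lost_acks={"e2"}))  # 4, not 3
```

Making the state update idempotent (for example, by remembering processed event ids) avoids the extra count. Trident's transactional state is another way to get exactly-once updates.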

Compound Annual Growth Rate

In the context of this week's lectures, CAGR stands for Coursera Approved Grading Results Cloud Assisted Gradient Regressor Compound Annual Growth Rate Computer Augmented Graphical Reality

False There is only one address space in a unikernel. The application can be seen as running in kernel mode the whole time.

In unikernel, user application can transit to kernel mode using special instructions. True False

True

In x86, kernel mode code runs in ring 0 while user processes run in ring 3. True False

False Making changes to unikernel requires recompilation. Unikernel normally only runs one application.

Is it possible to install a second application with different dependencies into an existing unikernel? True False

False

It is likely that regulated industries will NOT be able to move to the cloud due to regulations such as the GDPR. True False

False

It is likely that the GDPR will completely prevent most regulated industries in Europe from moving to the cloud. True False

None above

Kubernetes can be classified as: Platform as a Service (PaaS) Infrastructure as a Service (IaaS) Software as a Service (SaaS) None above

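False Kubernetes deprecated Docker (via dockershim) as a container runtime in v1.20 and removed dockershim in v1.24; it works with any CRI-compatible runtime such as containerd or CRI-O.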

Kubernetes can only support Docker container runtime. True False

True

Kubernetes provides a platform for automating deployment, scaling, and operations of application containers across clusters of hosts. False True

True

Messages in Pregel are delivered exactly once. True False

False Microsoft predicts that 90% of new apps will be created with low-code / no-code tools

Microsoft believes that approximately 90% of new apps in the next five years will be created by C# developers True False

90%

Microsoft predicts that over the 2020-2025 timespan approximately what percent of new apps will be created with low-code / no-code tools? 10% 30% 75% 90%

building Datacubes to running OLAP workloads directly on columnar databases

Over the past decade (2010-2020), the momentum seems to be shifting from:
- building Datacubes to running OLAP workloads directly on columnar databases
- running OLAP workloads directly on columnar databases to building sophisticated Datacubes

False Pregel uses a master/worker model

Pregel works on an egalitarian model - all of the nodes perform the same functions and have the same responsibilities True False

False

Pregel's message passing API guarantees message delivery order True False

False. In Redshift, blocks are immutable. In general, Columnar Stores are not good at updates compared to other approaches

Redshift, like most Columnar Stores, makes it easy to update blocks.
- True. Redshift, like most Columnar Stores, is write-optimized, so updates are easy
- False. In Redshift, blocks are immutable. In general, Columnar Stores are not good at updates compared to other approaches
- True, though this property is rarely used in practice since Columnar Stores are primarily utilized by read-heavy applications
- False. Redshift is not a Columnar Store, but a data pipeline that connects Columnar Stores to analysis engines

ALL

Select all that apply: What are some commonly available datacube operations? Slicing Dicing Drill Up / Down Roll-up Pivot
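Three of these operations (slicing, dicing and roll-up) can be sketched on a toy cube stored as a Python dict keyed by (year, region, product) coordinates. The dimension names and sales numbers are made up for illustration. A real OLAP engine would use a dense multidimensional array or precomputed aggregates. Drill-down is the inverse of roll-up and needs the finer-grained data. A pivot roughly corresponds to choosing a different dimension order in `roll_up`'s `keep` argument.

```python
from collections import defaultdict

DIMS = ("year", "region", "product")

# Toy sales cube: (year, region, product) -> units sold (made-up numbers).
cube = {
    (2020, "EU", "laptop"): 10, (2020, "EU", "phone"): 30,
    (2020, "US", "laptop"): 20, (2020, "US", "phone"): 40,
    (2021, "EU", "laptop"): 15, (2021, "EU", "phone"): 35,
    (2021, "US", "laptop"): 25, (2021, "US", "phone"): 45,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop that dimension."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep the sub-cube whose coordinates fall in the given value sets."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def roll_up(cube, keep):
    """Roll-up: sum away every dimension not listed in `keep` (kept in that order)."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[tuple(k[i] for i in idx)] += v
    return dict(totals)

print(slice_cube(cube, "year", 2021))             # 2021 only, keyed by (region, product)
print(dice_cube(cube, region={"EU"}, product={"phone"}))
print(roll_up(cube, ["region"]))                  # {('EU',): 90, ('US',): 130}
```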

ZooKeeper JobTracker TaskTracker Worker

Select all that apply: which are elements of the Giraph framework? ZooKeeper JobTracker TaskTracker Worker Hadoop

False

Sidecar model requires two nodes to work. True False

False In fact, employment in this area is projected to grow faster than average

Since AI and automation are being applied more and more to the maintenance and management of clouds, it is forecasted that the employment of computer and IT occupations will fall over the next five years. True False

Significantly more than 5%

Suppose a table contains 10000 rows and 100 columns. A query that uses all of the rows and 5 columns will need to read approximately what percentage of the data contained in the table if you are using a traditional row-based RDBMS system? Significantly less than 5% Approximately 5% Significantly more than 5%
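The arithmetic behind this answer can be worked as a back-of-the-envelope sketch. It assumes every column has the same width and ignores compression, page boundaries and indexes. A row store reads whole rows, while a column store reads only the columns the query touches.

```python
def fraction_scanned(n_rows, n_cols, rows_used, cols_used, layout):
    """Approximate fraction of a table's bytes a scan must read.

    Assumes equal-width columns; ignores compression, pages and indexes.
    """
    if layout == "row":
        # Rows are stored contiguously: every column of each touched row is read.
        read = rows_used * n_cols
    elif layout == "column":
        # Each column is stored contiguously: only the needed columns are read.
        read = rows_used * cols_used
    else:
        raise ValueError(f"unknown layout: {layout}")
    return read / (n_rows * n_cols)

print(fraction_scanned(10_000, 100, 10_000, 5, "row"))     # 1.0  (100% of the table)
print(fraction_scanned(10_000, 100, 10_000, 5, "column"))  # 0.05 (5% of the table)
```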

True

Ted Malaska would agree with this opinion: Storm is not a competitor with Spark and Flink in the real-time database world. True False

an order of magnitude or more larger than the adjacency list of the web (modeled as a collection of vertices and edges) A model of the web would have approximately 50 billion vertices, 1 trillion edges, and a 30 TB adjacency list, while a model of the brain would have approximately 100 billion vertices, 100 trillion edges, and a 2.84 PB adjacency list

The adjacency list of the brain (modeled as a collection of vertices and edges) is:
- an order of magnitude or more larger than the adjacency list of the web (modeled as a collection of vertices and edges)
- about the same size as the adjacency list of the web (modeled as a collection of vertices and edges)
- an order of magnitude or more smaller than the adjacency list of the web (modeled as a collection of vertices and edges)

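5.4 Containers share the host's kernel, so the container reports the Ubuntu host's 5.4 kernel, not the 4.18 kernel that CentOS ships with.

The latest CentOS comes with Linux kernel version 4.18. If you are running the latest CentOS container on an Ubuntu host with kernel version 5.4, which kernel version would you see inside the container? 4.18 5.4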

False There is expected to be a shortfall of one million developers, so there should be plenty of work for everyone!

The low-code / no-code paradigm is a major threat to the employment prospects of developers and programmers around the world. True False

False Today, OLAP cubes refer specifically to contexts in which these data structures far outstrip the size of the hosting computer's main memory

Today, OLAP cubes are always designed to fit in the hosting computer's main memory to maximize analytical performance True False

True

Traditionally, OLAP cubes were known for extreme performance advantages over row-oriented RDBMS True False

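True The VM simulates enough hardware to allow an unmodified guest OS (one designed for the same CPU) to be run in isolation.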

Virtual machine using full virtualization can only run guest OS designed for one type of CPU (the same as host). True False

Sources of streams

What are spouts in Apache Storm? Unbounded sequences of tuples Network of spouts and bolts Processors of input Sources of streams

Unbounded sequences of tuples

What are streams in Apache Storm? A network of spouts and bolts Unbounded sequences of tuples Aggregators Processors of input

Hyperparameter optimization tunes the training parameters of a single training algorithm, while AutoML tries out multiple training algorithms on the input dataset.

What are the definitions of hyperparameter optimization and AutoML?
- Hyperparameter optimization tunes the training parameters of a single training algorithm, while AutoML tries out multiple training algorithms on the input dataset.
- Hyperparameter optimization means adjusting the parameters of a search space using gradient descent, and is a technical term. AutoML is a special case of hyperparameter optimization, and is marketing jargon.
- They are both the same and used interchangeably.
- Hyperparameter optimization refers to adjusting the parameters of a hyper plane that divides the search space in equidistance quadrants. AutoML means the cloud provider takes care of orchestrating machine learning artifacts' deployment.

Networks of spouts and bolts

What are topologies in Apache Storm? Unbounded sequences of tuples Sources of streams Processors of input Networks of spouts and bolts

ALL

What does AWS SageMaker provide? A system to build ML models Managed Notebooks Hyperparameter Optimization tools Capability to detect concept drift

Provides a persistent state for the bolts, but the exact implementation is up to the user

What does Trident do? Provides a persistent state for the topology, with a predefined set of characteristics Provides a persistent state for the bolts, but the exact implementation is up to the user Provides a persistent state for the spout, but the exact implementation is up to the user Provides a persistent state for the bolts, with a predefined set of characteristics

Explore Data

What does the "E" in OSEMN stand for? Engineer Features Extract Features Explain Results Explore Data

Responsible for coordination Master is responsible for coordination: • Assigns partitions to workers • Coordinates synchronization • Requests checkpoints • Aggregates aggregator values • Collects health statuses

What is Master's role for task assignment in Giraph? Responsible for the state of computation Communicate with other workers Responsible for vertices Responsible for coordination

Responsible for vertices Worker is responsible for vertices: • Invokes active vertices' compute() function • Sends, receives, and assigns messages • Computes local aggregation values

What is Worker's role for task assignment in Giraph? Responsible for the state of computation Communicate with other workers Responsible for vertices Responsible for coordination

Responsible for the state of computation ZooKeeper is responsible for computation state: • Partition/worker mapping • Global state: #superstep • Checkpoint paths, aggregator values, statistics

What is ZooKeeper's role in task assignment in Giraph? Responsible for the state of computation Responsible for coordination Responsible for vertices Communicate with other workers

A graph database is a storage system that provides index-free adjacency

What is a graph database? A graph database is any storage system that provides index adjacency A graph database is a storage system that holds "maps" and "routes" A graph database is a storage system that provides index-free adjacency A graph database is any storage system where the data rests on the vertices and is computed on the edges
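Index-free adjacency means each vertex stores direct references to its neighbours. A traversal therefore hops from record to record instead of looking edges up in a global index, as a relational join would. Below is a minimal in-memory Python sketch of the idea; the `Vertex` class, edge labels and names are hypothetical.

```python
class Vertex:
    """A vertex holding direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.out_edges = []  # (label, Vertex) pairs

    def connect(self, label, other):
        self.out_edges.append((label, other))

    def neighbours(self, label=None):
        # Follows stored references: the cost grows with this vertex's degree,
        # not with the total size of the graph.
        return [v for lbl, v in self.out_edges if label is None or lbl == label]

def two_hops(vertex, label):
    """Names of vertices reached by following two `label` edges (e.g. friends of friends)."""
    return {w.name for v in vertex.neighbours(label) for w in v.neighbours(label)}

alice, bob, carol = Vertex("alice"), Vertex("bob"), Vertex("carol")
alice.connect("knows", bob)
bob.connect("knows", carol)
print(two_hops(alice, "knows"))  # {'carol'}
```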

A recommendation engine working based on the user preferences and others with similar preferences Collaborative filtering recommends items to a user based on the preferences of other users with similar tastes.

What is an example of a collaborative filtering application? Finding the frequent item sets frequently bought together Placing new items into predefined categories Grouping similar object together without knowing the groups ahead A recommendation engine working based on the user preferences and others with similar preferences
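A user-based collaborative filter can be sketched in a few lines. First, measure how similar each other user is to the target user, using cosine similarity over their rating vectors with unrated items treated as 0. Then score the items the target user has not seen by similarity-weighted ratings. The ratings, user names and choice of cosine similarity are assumptions for illustration. At scale, recommenders typically use matrix factorization (e.g. ALS) instead.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse rating dicts (missing items count as 0)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(ratings, user, top_n=1):
    """Rank items `user` has not rated by the similarity-weighted ratings of others."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

ratings = {
    "ann": {"matrix": 5, "alien": 4},
    "bob": {"matrix": 5, "alien": 5, "inception": 5},
    "cat": {"matrix": 1, "notebook": 5},
}
print(recommend(ratings, "ann", top_n=2))  # ['inception', 'notebook']
```

Bob rates the same films as Ann much as she does, so his "inception" rating dominates the score.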

Finding the frequent item sets frequently bought together This is an example of frequent pattern mining.

What is an example of an FPM application? Finding the frequent item sets frequently bought together Placing new items into predefined categories Grouping similar object together without knowing the groups ahead A recommendation engine working based on the user preferences and others with similar preferences

A graph database is any storage system that provides index-free adjacency A graph database holds all the information and provides some way of retrieving that data; typically, it does so through index-free adjacency.

What is graph processing? A non-relational, distributed database A distributed real-time computation system A graph database is any storage system that provides index-free adjacency A framework for distributed storage and processing of large data sets

The process of eliminating events to keep up with the rate of events

What is load shedding? Distributing applications across many servers The process of eliminating events to keep up with the rate of events Enabling a system to continue operating properly in the case of the failure of some of its components Distributing the data across different parallel computing nodes
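The simplest form of load shedding is a bounded buffer that drops new events once it is full. The consumer then keeps up with what it accepts instead of falling further behind. Below is a plain-Python sketch; the class name, the drop-newest policy and the capacity are assumptions. Real systems may instead drop events randomly or by priority.

```python
from collections import deque

class LoadShedder:
    """Bounded buffer that sheds (drops) incoming events once it is full."""

    def __init__(self, capacity):
        self.buffer = deque()
        self.capacity = capacity
        self.dropped = 0

    def offer(self, event):
        if len(self.buffer) >= self.capacity:
            self.dropped += 1      # shed this event instead of falling behind
            return False
        self.buffer.append(event)
        return True

    def process(self, n):
        """Consumer side: handle up to n buffered events per tick."""
        return [self.buffer.popleft() for _ in range(min(n, len(self.buffer)))]

shedder = LoadShedder(capacity=3)
handled, next_id = [], 0
for tick in range(2):                 # 5 events arrive per tick...
    for _ in range(5):
        shedder.offer(next_id)
        next_id += 1
    handled += shedder.process(2)     # ...but only 2 can be processed per tick
print(handled, shedder.dropped)  # [0, 1, 2, 5] 5
```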

It is micro-batch, which increases the minimum end-to-end latency of the system. The disadvantage is that it is not really streaming in the strict sense: it batches the data and then runs each batch very quickly.

What is the main disadvantage of Spark Streaming? It does not handle failures. It is micro-batch, which increases the minimum end-to-end latency of the system. It lacks a rich eco-system of big data tools. It does not support state
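The latency cost of micro-batching comes from waiting for a batch interval to close before any of its events are processed. The plain-Python sketch below discretizes a timestamped stream into fixed intervals, which is the idea behind Spark Streaming's DStreams. The function names, interval and events are illustrative assumptions and are not Spark's API. Unlike Spark, this sketch simply skips intervals that contain no events.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = {}
    for ts, value in events:
        batches.setdefault(int(ts // batch_interval), []).append(value)
    return [batches[b] for b in sorted(batches)]

def wait_for_batch(ts, batch_interval):
    """Time an event waits for its batch to close before processing can begin."""
    batch_end = (int(ts // batch_interval) + 1) * batch_interval
    return batch_end - ts

events = [(0.1, 1), (0.4, 2), (1.2, 3), (2.7, 4)]
# Each batch is then processed independently, e.g. summed.
print([sum(b) for b in micro_batches(events, 1.0)])  # [3, 3, 4]
print(wait_for_batch(0.1, 1.0))  # roughly 0.9 s of added delay before processing starts
```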

Pod

What is the smallest control unit in Kubernetes? Deployment Node Pod

ALL except Cryptocurrency

What technologies / concepts are part of the "Digital Transformation"? Cloud Big Data AI / ML IoT Cryptocurrency

True

When a user application needs to handle an interrupt, it has to enter kernel mode. True False

False

When you publish a service port, the swarm makes the service accessible at the target port only on nodes that have a task running for that service. True False

AWS QuickSight

Which best fits the category of graphical BI tool? AWS QuickSight Kibana Azure Analysis Services AWS Glue

kube-proxy

Which component of Kubernetes cluster node is responsible for implementing a form of virtual IP for Services? kubelet kube-proxy container Runtime

kubelet

Which component of Kubernetes cluster node is responsible for managing the pods? kubelet kube-proxy API Server

etcd

Which component of Kubernetes master node is responsible for storing the state of the cluster? controller scheduler etcd

Sidecar pattern

Which design pattern will you use for adding HTTPS to a legacy service? Sidecar pattern Ambassador pattern Adapter pattern

Third

Which generation of hardware virtualization introduced IOMMU virtualization? First Second Third

Clustering k-means clustering aims to partition n observations into k clusters

Which group does K-means fall into? Collaborative filtering Frequent pattern mining Clustering Classification

Having a minimal security protection. Using a minimal device model and kernel configuration reduces the attack surface of a microVM; it does not reduce security protection.

Which is not the reason that microVM is faster than a normal VM Having a minimal device model. Having a minimal security protection. Having a minimal guest kernel configuration.

bridge

Which is the default Docker network driver? bridge host overlay

Running each application in a separate container.

Which is the recommended practice in the following scenario? "Multiple applications with communication requirements" Running all applications in one container. Running each application in a separate container.

Only one stream processing pipeline but with the ability to handle failures In the Kappa Architecture, the two parallel paths are dropped: there is only the streaming path, but the streaming is made robust enough that failures do not corrupt the state.

Which of the following best describes Kappa architecture? A parallel processing pipeline of two branches: a stream processing pipeline and a batch processing pipeline A serial processing pipeline of first a streaming processing system and then a batch processing system A serial processing pipeline of first a batch processing system and then a stream processing system Only one stream processing pipeline but with the ability to handle failures

A parallel processing pipeline of two branches: a stream processing pipeline and a batch processing pipeline The Lambda Architecture has two parallel data processing paths. The first path uses a stream event processing system such as Storm; the parallel path is a batch processing system.

Which of the following best describes Lambda architecture? (Not to be confused with AWS Lambda). A serial processing pipeline of first a batch processing system and then a stream processing system Only a stream processing pipeline but with the ability to handle failures A serial processing pipeline of first a streaming processing system and then a batch processing system A parallel processing pipeline of two branches: a stream processing pipeline and a batch processing pipeline

A set of labeled data points are given. A model based on those data points is built such that for any new unlabeled data point the label is determined. Naive Bayes is a supervised classification method

Which of the following best describes how Naïve Bayes works? A set of unlabeled data points are given. A model based on those data points is built such that for any new unlabeled data point the label is determined. A set of labeled data points are given. A model based on those data points is built such that for any new unlabeled data point the label is determined. A set of items is given, and the most frequent set of items is found. A set of data points are given and it classifies them into k groups
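A categorical Naive Bayes classifier fits in a few lines. Training counts how often each feature value occurs per label, and prediction picks the label with the highest smoothed log-probability. The toy weather data, the function names and the add-one (Laplace) smoothing are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count label frequencies and per-label feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][position][value] -> occurrences in training data
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen at each feature position
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            vocab[i].add(value)
    return class_counts, feature_counts, vocab

def predict_nb(model, row):
    """Return the label maximising log P(label) + sum_i log P(x_i | label).

    The "naive" assumption: features are independent given the label, so their
    log-probabilities simply add. Add-one smoothing keeps an unseen value from
    zeroing out a label.
    """
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(row):
            count = feature_counts[label][i][value]
            score += math.log((count + 1) / (n + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "mild")))  # yes
```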

A bolt processes input streams and produces new streams.

Which of the following is correct? A topology is a network of tuples and streams. A stream connects a bolt to a spout. A plant jar can receive output from many streams. A bolt processes input streams and produces new streams.

K-means. K-means is a clustering (unsupervised) method, not a classification method.

Which of the following is not a classification mechanism? Naïve Bayes K-means Convolutional Neural Network Classifier Decision forests
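K-means needs no labels. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. Here is a minimal 1-D sketch of this procedure (Lloyd's algorithm), assuming the caller supplies the initial centroids and a fixed iteration count. The data is hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data; groups points into len(centroids) clusters, no labels used."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0])
print(centroids, clusters)  # [2.0, 11.0] [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```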

HDFS. HDFS is the storage part of Hadoop, not a component of Storm.

Which of the following is not a component of the Storm Architecture? Worker Supervisor ZooKeeper HDFS Nimbus

Kube-proxy

Which of the following is not a component of the master node? Scheduler Controller Kube-proxy API server

Create container image based on user specified requirements.

Which of the following is not a function of Kubernetes? Schedule containers to run on physical or virtual machines. Create container image based on user specified requirements.

Replaces the functionality of Dockerfile. Compose does not replace it: it still needs a Dockerfile to build images.

Which of the following statements about Docker Compose is incorrect? Uses a YAML file to configure an application's services. Replaces the functionality of Dockerfile. Main tool by Docker for container orchestration

The routing mesh uses IP based service discovery and load balancing. This statement is false: the routing mesh uses port-based service discovery and load balancing.

Which of the following statements is NOT true about Docker routing mesh? The routing mesh enables each node in the swarm to accept connections on published ports for any service running in the swarm. The routing mesh uses IP based service discovery and load balancing. By default all nodes participate in an ingress routing mesh.

Spark Streaming chops a stream into small batches and processes each batch independently. This approach is called micro-batching.

Which of the following statements is true? Spark Streaming treats each tuple independently and replays a record if not processed. Spark Streaming chops a stream into small batches and processes each batch independently. Spark Streaming has no support for state. Spark Streaming uses transactions to update state.
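This shows the micro-batch idea in plain Python, not the Spark API. An incoming stream is cut into small batches, and each batch is handed to an ordinary batch function. Spark Streaming cuts batches by time interval. Batching by element count here is a simplifying assumption, and `process` is a hypothetical stand-in for a batch job.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Chop a (possibly unbounded) stream into small batches of up to batch_size items."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch


def process(batch: List[int]) -> int:
    """Each batch is processed independently, like a small batch job."""
    return sum(batch)


results = [process(b) for b in micro_batches(range(1, 8), batch_size=3)]
print(results)  # [6, 15, 7]
```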

Associative data sets. A graph database is built on associative data sets: you look up one item and retrieve related items from it, such as a node and its connections through edges.

Which of these is a property of a graph database? Associative data sets Performs the same operation on large amounts of data Uses a relational model of data Each entity type has its own table
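This is a toy illustration of the associative lookup behind a graph database, using a hypothetical in-memory adjacency map. Real graph databases add persistence, indexes, and a query language. Looking up one node returns the nodes it is connected to, and a traversal chains these lookups.

```python
from typing import Dict, List, Set

# Hypothetical social graph: node -> {neighbor: edge label}.
graph: Dict[str, Dict[str, str]] = {
    "alice": {"bob": "follows", "carol": "follows"},
    "bob": {"carol": "follows"},
    "carol": {},
}


def neighbors(node: str) -> List[str]:
    """Look up one item (a node) and retrieve the items associated with it."""
    return sorted(graph.get(node, {}))


def two_hop(node: str) -> Set[str]:
    """Chain lookups: nodes reachable by following exactly two edges."""
    return {n2 for n1 in neighbors(node) for n2 in neighbors(n1)}


print(neighbors("alice"), two_hop("alice"))  # ['bob', 'carol'] {'carol'}
```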

BigTable. As a wide-column store, BigTable is specialized for access by a key.

Which of these would probably be best for storing data retrieved by a key or a sequence of keys? MonetDB SybaseIQ BigTable Vertica
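BigTable keeps rows sorted by row key, so a single-key lookup and a scan over a contiguous key range are both cheap. The class below is a hypothetical in-memory toy that mimics that property with a sorted key list and binary search. It is not the BigTable client API, and the row keys and column name are made up.

```python
import bisect
from typing import Dict, List, Optional


class SortedKeyStore:
    """Toy key-value store that keeps row keys sorted so point lookups and range scans are cheap."""

    def __init__(self) -> None:
        self._keys: List[str] = []               # row keys in sorted order
        self._rows: Dict[str, Dict[str, str]] = {}  # row key -> {column: value}

    def put(self, row_key: str, column: str, value: str) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)   # insert while keeping keys sorted
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        """Point lookup by a single key."""
        return self._rows.get(row_key)

    def scan(self, start: str, end: str) -> List[str]:
        """Return row keys in [start, end) using binary search on the sorted keys."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return self._keys[lo:hi]


store = SortedKeyStore()
for key in ["user#010", "order#5", "user#001", "user#002"]:
    store.put(key, "info:name", key.upper())
print(store.scan("user#", "user$"))  # ['user#001', 'user#002', 'user#010']
```

The scan works because `$` sorts right after `#`, so `["user#", "user$")` covers exactly the keys with the `user#` prefix.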

Infrastructure as a Service

Which one is preferred by enterprises as of 2020? Platform as a Service Infrastructure as a Service

Infrastructure as a service

Which one is preferred for security reasons? Infrastructure as a service Platform as a service Function as a service Software as a service

NiFi. In NiFi you can design a graph to process your data.

Which system has a great graphical UI to design dataflows? NiFi Storm Druid Spark Streaming

RDBMS

Which system is a more natural fit for OLTP? Data warehouse Data Lake Managed Machine Learning platforms RDBMS

Druid. Druid uses optimizations that make OLAP queries fast.

Which system is best for Online Analytical Processing (OLAP)? NiFi Storm Druid Spark Streaming

overlay network

Which type of network connects multiple Docker daemons together and enables swarm services to communicate with each other? bridge network host network overlay network macvlan network

Hardware-assisted

Which type of virtualization is feasible for the following scenario? "A service needs to run an unknown and unmodified OS on an advanced processor." Para-virtualization Hardware-assisted

Full virtualization

Which type of virtualization is feasible for the following scenario? "A service needs to run an unmodified OS on a basic processor, separate from the host operating system." Full virtualization Container Para-virtualization

Hardware-assisted full virtualization

Which type of virtualization is feasible for the following scenario? "Application that needs different custom operating systems (kernels)" Para-virtualization Hardware-assisted full virtualization Containers

No virtualization. Every sort of virtualization technology has some performance impact; for the absolute best performance, no virtualization is the best option.

Which type of virtualization provides better performance for the following scenario? "One application running on a single piece of hardware" Full virtualization Containers No virtualization JVM

Containers

Which type of virtualization provides better performance for the following scenario? "Running multiple independent applications sharing the same kernel" Hardware-assisted full virtualization Containers

Full virtualization

Which type of virtualization provides better performance for the following scenario? "Running two independent applications, each needs a different version of a kernel module". Full virtualization Containers

Containers

Which type of virtualization provides better performance for the following scenario? "Multiple applications with high memory usage" Containers Full virtualization through binary translation

Host OS (Base OS) kernel

Who is responsible for scheduling and memory management when using containers? Virtual Machine Manager Hypervisor Host OS (Base OS) kernel Supervisor

In fact, Pregel does not use MapReduce, because MapReduce produces too much communication between stages. MapReduce tends to be inefficient because the graph state must be stored at each stage of the graph algorithm, and each computational stage produces a lot of communication between stages.

Why does Pregel use MapReduce? Pregel does not use MapReduce, because Pregel was created long before MapReduce MapReduce is widely known and used by cloud developers, so Pregel uses MapReduce to ease developer burden In fact, Pregel does not use MapReduce, because MapReduce produces too much communication between stages. MapReduce is a well-established and effective tool that can easily solve most graph problems

It produces too much communication between stages. MapReduce tends to be inefficient because the graph state must be stored at each stage of the graph algorithm, and each computational stage produces a lot of communication between stages.

Why is MapReduce not efficient for large-scale graph processing? It brings load imbalance. It is fault-tolerance enough. It produces too much communication between stages. The map function is computationally a bottleneck.

False. With an OLAP system, the typical query involves one column, whereas an OLTP query typically involves most or all of the columns in a row.

With an OLTP system, the typical query involves one column True False
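This small sketch shows why the two workloads favor different layouts, using a hypothetical three-row table. An OLTP-style query touches one whole record, which a row layout serves directly. An OLAP-style aggregate reads a single column, which a column layout stores contiguously.

```python
# Hypothetical sales table in a row-oriented layout (one dict per record).
rows = [
    {"order_id": 1, "customer": "ann", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 45},
    {"order_id": 3, "customer": "ann", "amount": 25},
]

# Column-oriented layout of the same data: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# OLTP-style query: fetch one whole record (most or all of its columns).
order_2 = next(row for row in rows if row["order_id"] == 2)

# OLAP-style query: aggregate one column across many rows; only that column is read.
total_amount = sum(columns["amount"])
print(order_2, total_amount)  # {'order_id': 2, 'customer': 'bob', 'amount': 45} 100
```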

No. In Ted Malaska's opinion, Hadoop is made up of two fundamental projects: MapReduce and HDFS. Spark replaces MapReduce, but HDFS is replaced by object stores, not by Spark.

Would Ted Malaska agree that Spark replaces Hadoop? Yes No

True. Paravirtualization is a software-only virtualization approach.

Xen does not require special hardware support, such as Intel "VT-x" or "AMD-V". True False

Bind mount

You are working on a course MP and want to use the IDE on the host to edit code and run it inside a container. Which is the best way to make the code accessible inside the container? Bind mount Volume tmpfs

5.4. A container always uses the host kernel.

You built a container from the latest CentOS image on an Ubuntu host with kernel version 4.18. After you upgraded the Ubuntu kernel to 5.4, what will be the kernel version used by the CentOS container? 5.4 4.18

compute(list of messages) -> return list of messages

You want to build a shortest path algorithm using parallel breadth-first search in Pregel. Which of the following pseudo-codes is the proper "compute function" for this program? compute(list of messages) -> return list of messages compute(list of vertexes) -> return list of messages compute(list of edges) -> return list of messages compute(graph) -> return list of messages
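This is a single-process simulation of the vertex-centric model for shortest paths with unit edge weights (parallel BFS). The hypothetical `compute` also receives the vertex's current value, since a Pregel vertex can read its own state. Otherwise it follows the "list of messages in, list of messages out" shape. Supersteps repeat until no messages are in flight, which stands in for every vertex voting to halt. The graph is made up.

```python
import math

# Hypothetical directed graph with unit edge weights.
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def compute(vertex, value, messages):
    """Vertex program: read incoming distance messages, return (new value, outgoing messages)."""
    best = min(messages, default=math.inf)
    if best < value:
        # Shorter distance found: keep it and offer neighbors a path one hop longer.
        return best, [(nbr, best + 1) for nbr in edges[vertex]]
    return value, []  # no improvement: send nothing (effectively vote to halt)


def run(source):
    dist = {v: math.inf for v in edges}
    inbox = {v: [] for v in edges}
    inbox[source] = [0]
    # Each loop iteration is one superstep; stop when no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in edges}
        for v, msgs in inbox.items():
            if msgs:  # only vertices that received messages are active
                dist[v], out = compute(v, dist[v], msgs)
                for target, msg in out:
                    outbox[target].append(msg)
        inbox = outbox
    return dist


print(run("A"))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```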

