Final Prep for CS 498: Cloud Computing Applications
a new type of data repository for storing massive amounts of raw data in its native form, in a single location (both structured and unstructured)
A Data Lake is a new type of data repository for storing massive amounts of unstructured data in a single location for processing, cleaning, and structuring a new type of data repository for storing massive amounts of structured data in a single location, rather than spread over multiple datacenters, in order to exploit data locality to speed the analysis a new type of data repository for storing massive amounts of raw data in its native form, in a single location (both structured and unstructured)
data structure, more specifically, a sophisticated nested array
A Datacube is best thought of as a(n) function that structures and compresses data data structure, more specifically, a sophisticated nested array archival service provided by AWS specialized hardware for fast analysis of massive data
one or more application containers.
A Pod is a Kubernetes abstraction that represents a group of one or more application containers. one or more nodes. one or more virtual machines
combines the power of analytics with the flexibility of big data models and the agility and limitless resources of the cloud
A data lake combines the power of analytics with the flexibility of big data models and the agility and limitless resources of the cloud by definition can only store unstructured data as a concept preceded the concept of a data warehouse, going back to 1960s
False
A datacube is an example of a columnar RDBMS True False
True
A node in Kubernetes can run several pods and each pod can run several containers. True False
False
A user application is not allowed to load control registers when running in kernel mode. True False
False It's the opposite for both cases.
AWS ECS has two launch types: EC2 and Fargate. EC2 automatically manages all resource provisioning, while for Fargate it is managed by the customer. True False
microVM
AWS Lambda and AWS Fargate are using Container microVM
organizations / end users
According to Gartner, the vast majority of cloud security failures in the next few years will be caused by organizations / end users chip makers hypervisor architects lax firewall rules
FaaS
According to Gartner, which of these is likely to grow the fastest over the next few years? FaaS PaaS IaaS SaaS
code
According to Ted Malaska, it used to be that what locked you into a system was money, but now it is business relationships much more open due to the fact that there is so much competition in the space code
True
All containers without a --network specified are attached to the default bridge network. This is a risky operation, as it allows unrelated services to communicate. True False
Kubernetes
Anthos is mostly based on Docker Kubernetes OpenStack
True
Besides using a Dockerfile to create a container image, one can also start a container from an existing image and install the necessary packages on top of it to create a new image. True False
False Binary translation only modifies sensitive instructions.
Binary translation modifies all instructions on the fly and does not require changes to the guest operating system kernel. True False
Vision Voice Language
Choose all that apply: What are examples of unstructured data in a cloud machine learning context? Graphs Vision Voice Language
ALL
Choose all that apply: Which of these are security threats in the cloud computing field? Vulnerabilities in management APIs Incomplete data deletion Multi-tenant data leakage due to failure of separation of control Insider abuse
OLAP cubes require complicated pipelines to transform data from a SQL database into OLAP cubes A Datacube is a data structure
Choose all the true statements OLAP cubes require complicated pipelines to transform data from a SQL database into OLAP cubes A Datacube is a data structure MonetDB is better than BigTable at storing data retrieved by a key or a sequence of keys.
OLTP and OLAP are both for structured data
Choose the true statement OLTP is for structured data, and OLAP is for unstructured data OLTP is for unstructured data, and OLAP is for structured data OLTP and OLAP are both for unstructured data OLTP and OLAP are both for structured data
False
Containers on different networks can communicate using the bridge network. True False
False The routing mesh will automatically route the incoming traffic to a node that has a service task running on it.
Docker Swarm routing mesh will report an error if an external load balancer reaches a node that does not have a task belonging to the requested service. True False
Virtual IP When a service is requested, the resulting DNS query is forwarded to the Docker Engine, which in turn returns the IP of the service, a virtual IP.
Docker internal load balancing is done using Virtual IP Published port numbers to the host system Service name
Guarantee that the software will always run the same irrespective of environment Using the Dockerfile format and relying on union filesystem technology, Docker images downloaded from a hub guarantee specific software environments for deployment.
Docker is used to: Monitor progress of jobs running on OpenStack Send messages from one machine to another Guarantee that the software will always run the same irrespective of environment Run a Java program
True
Environment variables and DNS are two primary modes for service discovery in Kubernetes. True False
True
Etcd is a key-value store that provides a consistent distributed state for a Kubernetes cluster. True False
False It's only possible to use container name in a user-defined bridge network.
For a container to communicate with another container running on the default bridge network, one can use either the target container's ip address or its container name directly. True False
overlay network
For communication among containers running on different Docker daemon hosts, you should use bridge network overlay network
overlay network
For containers running on different hosts to communicate, you should use bridge network host network overlay network macvlan network
False
GPUs are too expensive to use at the massive cloud scale. True False
95%
Gartner predicts that through 2022, at least what percentage of cloud security failures will be caused by organizations / end users? 25% 50% 75% 95%
True
Google Anthos is a framework for Hybrid Cloud deployments True False
ALL
How can you mount a storage location on the host to a container? Bind mount Volume tmpfs
They make a docker container of the inferencing code, and keep a reference to a BLOB storage bucket where the trained model's parameters are stored. Any time a HTTPS request for a model inference arrives, they launch the container, which fetches the parameters and runs the inference.
How do cloud providers technically handle model deployment? They make a docker container of the inferencing code, and keep a reference to a BLOB storage bucket where the trained model's parameters are stored. Any time a HTTPS request for a model inference arrives, they launch the container, which fetches the parameters and runs the inference. Amazon Sagemaker stores all trained models in DynamoDB. Upon an HTTPS request, it asks DynamoDB for the model and runs the proper pre-written algorithm along with parameters fetched from DynamoDB. They keep a pool of virtual machines active, so that any time a HTTPS request for a model inference arrives, one of the VMs is ready to fetch the model artifacts from the model repository and run it. The models are stored in JavaScript. Any browser that wishes to run a model fetches the model parameters from a cloud-based BLOB storage, and simply runs the model code locally.
It regularly uses "ping" messages. The master uses ping messages (keep-alives) to make sure that every worker is actually responding and processing.
How does Pregel detect the failure? The master periodically instructs the workers to save the state of their partitions to persistent storage. Each worker communicates with the other workers. It regularly uses "ping" messages. The workers all reload their partition state from the most recent available checkpoint.
Allows Storm to be used from many languages Thrift allows users to define and create services which are both consumable by and serviceable by numerous languages
How does Thrift contribute to Storm? Enables the usage of streams Provides load-balancing functionality Provides scalability Allows Storm to be used from many languages
Trident has first class support for state, but the exact implementation is up to the application developer.
How does Trident treat state? Trident has first class support for state and is completely automatic without the need of any help from the application developer. Trident has first class support for state, but the exact implementation is up to the application developer. Trident does not have any support for state. None of the above.
When you grant a newly-created or running service access to a secret, the decrypted secret is mounted into the container in an in-memory filesystem
How does a service get access to secret information in Docker Swarm? When you grant a newly-created or running service access to a secret, the decrypted secret is mounted into the container in an in-memory filesystem When you grant a newly-created or running service access to a secret, the encrypted secret is mounted into the container in an in-memory filesystem. The program in the container needs to decrypt the secret using the appropriate master key. When you grant a newly-created or running service access to a secret, the decrypted secret is mounted into the container in a disk mounted location
The master periodically instructs the workers to save the state of their partitions to persistent storage.
How is checkpointing done in Pregel? The workers all reload their partition state from the most recent available checkpoint. The master periodically instructs the workers to save the state of their partitions to persistent storage. Each worker communicates with the other workers. It regularly uses "ping" messages.
The workers all reload their partition state from the most recent available checkpoint. The master reassigns the graph partitions of failed workers to the currently available workers, so the unfinished work is shared among workers that are still alive. The workers then reload their partition state from the most recent available checkpoint and continue.
How is recovery being done in Pregel? The workers all reload their partition state from the most recent available checkpoint. The master periodically instructs the workers to save the state of their partitions to persistent storage. It regularly uses "ping" messages. Each worker communicates with the other workers.
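The three Pregel cards above (failure detection, checkpointing, recovery) can be tied together in a minimal Python sketch. The Worker and Master classes and their method names are hypothetical and are not Pregel's API; persistent storage is modeled as an in-memory dictionary, and the round-robin reassignment is an assumption made only for illustration.

```python
import copy


class Worker:
    """Hypothetical worker holding one graph partition (not Pregel's real API)."""

    def __init__(self, name, partition):
        self.name = name
        self.partition = partition  # vertex id -> vertex value
        self.alive = True

    def ping(self):
        # A failed worker does not answer the master's ping.
        return self.alive


class Master:
    """Hypothetical master: requests checkpoints, pings workers, drives recovery."""

    def __init__(self, workers):
        self.workers = workers
        self.storage = {}  # stands in for persistent storage

    def checkpoint(self):
        # Checkpointing: every worker saves the state of its partition.
        for w in self.workers:
            self.storage[w.name] = copy.deepcopy(w.partition)

    def detect_failures(self):
        # Failure detection: workers that miss a ping are considered failed.
        return [w for w in self.workers if not w.ping()]

    def recover(self):
        failed = self.detect_failures()
        survivors = [w for w in self.workers if w.alive]
        if not survivors:
            raise RuntimeError("no live workers left to recover onto")
        # Recovery: live workers reload from the most recent checkpoint.
        for w in survivors:
            w.partition = copy.deepcopy(self.storage[w.name])
        # Partitions of failed workers are reassigned (round-robin, an assumption).
        for i, dead in enumerate(failed):
            target = survivors[i % len(survivors)]
            target.partition.update(copy.deepcopy(self.storage[dead.name]))
        self.workers = survivors
```

Any progress made after the last checkpoint is discarded on recovery, which is why the checkpoint frequency trades runtime overhead against the amount of recomputation.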
User-defined
How many master nodes does Kubernetes have? One Two User-defined
OLAP
If the vast majority of your queries involve a very large number of rows but only a few columns, which system is a more natural fit? OLTP OLAP
OLTP
If the vast majority of your queries only involve a few rows, but involve most or all of the columns for those rows, which system is a more natural fit? OLTP OLAP
FPM
If we want to find which set of items in a grocery shop are frequently bought together, which of the following approaches should we use? K-Means Naïve Bayes Decision Forests FPM
OLAP Datacubes RDBMS are often limited by the constraints of SQL
If your primary interest is the richest possible analysis capabilities, which of these two options would likely be the better choice? Column-Oriented Data Warehouse OLAP Datacubes
ALL
In Docker Swarm, a service is used to handle Launching and monitoring tasks Rolling updates Network routing
True
In Docker Swarm, ingress is an overlay network that handles control and data traffic related to swarm services. True False
Map: A,C Reduce: B
In K-means done on Map Reduce, which of the following steps is done in the Map phase, and which in the Reduce phase? A. For each data point, assign to the closest centroid B. For each cluster, re-compute the centroids C. Read the k centroids Map: C Reduce: A,B Map: C,B Reduce A Map: A,C Reduce: B Map: A Reduce: B,C
D -> A -> C -> B
In K-means, what is the order of the following steps? A. For each data point, assign to the closest centroid B. If new centroids are different from the old, re-iterate through the loop C. For each cluster, re-compute the centroids D. Randomly select k centroids
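The D -> A -> C -> B ordering can be seen in a minimal pure-Python sketch. The function names, the `seed` and `max_iter` parameters, and the choice of squared Euclidean distance are assumptions for illustration. In the MapReduce card above, `closest` corresponds to the Map-side work and `recompute` to the Reduce-side work.

```python
import random


def closest(point, centroids):
    # Step A: index of the nearest centroid (squared Euclidean distance).
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def recompute(points, labels, centroids):
    # Step C: each centroid becomes the mean of the points assigned to it.
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            new_centroids.append(old)  # keep an empty cluster's centroid as-is
        else:
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members))
            )
    return new_centroids


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Step D: random initial centroids
    labels = []
    for _ in range(max_iter):
        labels = [closest(p, centroids) for p in points]  # Step A
        new_centroids = recompute(points, labels, centroids)  # Step C
        if new_centroids == centroids:  # Step B: stop once nothing changes
            break
        centroids = new_centroids
    return centroids, labels
```

Points are expected as tuples of numbers, for example `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)`.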
False
In Kubernetes users need to take care of mapping container ports to host ports. True False
Launching and monitoring pods
In Kubernetes, a ReplicaSet takes care of Launching and monitoring pods Rolling updates Network routing All above
False Ted Malaska's opinion is that SQL will be around forever
In Ted Malaska's opinion, SQL is dying, and will eventually be replaced by NoSQL approaches. True False
databases
In Ted Malaska's opinion, Spark & Flink should be categorized as databases streaming services deep archive systems analytics engines
QuoteSplitterBolt, WordCountBolt, SortBolt, MergeBolt
In a Storm program that produces a sorted list of the top K most frequent words encountered across all the documents streamed into it, four kinds of processing elements (bolts in Storm) might be created: QuoteSplitterBolt, WordCountBolt, MergeBolt, and SortBolt. What is the order in which words flow through the program? WordCountBolt, QuoteSplitterBolt, SortBolt, MergeBolt QuoteSplitterBolt, WordCountBolt, SortBolt, MergeBolt QuoteSplitterBolt, SortBolt, WordCountBolt, MergeBolt WordCountBolt, QuoteSplitterBolt, MergeBolt, SortBolt
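The bolt ordering above can be imitated in plain Python to see why sorting comes before merging. This is only a simulation of the data flow, not Storm code: the function names mirror the bolt names, and the character-sum partitioning that stands in for routing each word to a single counter is an assumption.

```python
from collections import Counter


def quote_splitter(quotes):
    # QuoteSplitterBolt: break each incoming quote into words.
    for quote in quotes:
        for word in quote.lower().split():
            yield word


def word_count(words, n_partitions=2):
    # WordCountBolt: every occurrence of a word goes to the same partition,
    # so each word's full count lives in exactly one counter.
    partitions = [Counter() for _ in range(n_partitions)]
    for word in words:
        partitions[sum(map(ord, word)) % n_partitions][word] += 1
    return partitions


def sort_stage(counter, k):
    # SortBolt: each counter produces its own sorted local top-k list.
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def merge_stage(partials, k):
    # MergeBolt: combine the local top-k lists into the global top-k.
    merged = [item for partial in partials for item in partial]
    return sorted(merged, key=lambda kv: (-kv[1], kv[0]))[:k]


def top_k(quotes, k):
    partitions = word_count(quote_splitter(quotes))
    partials = [sort_stage(counter, k) for counter in partitions]
    return merge_stage(partials, k)
```

Because each word is counted in only one partition, merging the local top-k lists yields the same result as sorting all counts in one place.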
Obtaining data, scrubbing data, exploring the dataset, train and evaluate a model, and interpreting the results
In a typical data science workflow, what are the steps involved? Obtaining data, scrubbing data, exploring the dataset, train and evaluate a model, and interpreting the results Obtaining data, data cleaning, model training, model exploration, model deployment Model training, model exploration, cleaning the outcomes, interpreting the results. Cleaning data, exploring data, model training and evaluation, obtaining results, deploying the model
False
In coming years, it is likely that cloud computing will become more and more centralized in the USA, while cloud computing in other areas of the world will decline. True False
False OLAP cubes require that data teams manage complicated pipelines to transform data from a SQL database into OLAP cubes
In general, it is very easy and straightforward to transform data from a SQL database into an OLAP cube. True False
Events are double processed.
In the "At Least Once" message process, what happens if there is a failure? Events are double processed. Storm's natural load-balancing takes over. Storm's natural fault-tolerance takes over. You must create and implement your load-balance algorithm.
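Why at-least-once delivery leads to double processing can be shown with a small simulation. This is a sketch, not Storm's acking mechanism: the function name and the `lose_ack_for` parameter are hypothetical, and a "failure" is modeled as a lost acknowledgement after the event's side effect has already happened.

```python
def run_at_least_once(events, lose_ack_for):
    """Replay any event whose acknowledgement is lost (simulated failure)."""
    counts = {}
    pending = list(events)
    lost = set(lose_ack_for)
    while pending:
        event = pending.pop(0)
        # The side effect (counting) happens before the ack is sent.
        counts[event] = counts.get(event, 0) + 1
        if event in lost:
            lost.discard(event)    # the ack is lost only once
            pending.append(event)  # no ack arrived, so the source replays it
    return counts
```

For example, `run_at_least_once(["a", "b"], ["a"])` counts "a" twice, which is the double processing the card refers to.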
Compound Annual Growth Rate
In the context of this week's lectures, CAGR stands for Coursera Approved Grading Results Cloud Assisted Gradient Regressor Compound Annual Growth Rate Computer Augmented Graphical Reality
False There is only one address space in a unikernel. The application can be seen as running in kernel mode the whole time.
In unikernel, user application can transit to kernel mode using special instructions. True False
True
In x86, kernel mode code runs in ring 0 while user processes run in ring 3. True False
False Making changes to unikernel requires recompilation. Unikernel normally only runs one application.
Is it possible to install a second application with different dependencies into an existing unikernel? True False
False
It is likely that regulated industries will NOT be able to move to the cloud due to regulations such as the GDPR. True False
False
It is likely that the GDPR will completely prevent most regulated industries in Europe from moving to the cloud. True False
None above
Kubernetes can be classified as a Platform as a Service (PaaS) Infrastructure as a Service (IaaS) Software as a Service (SaaS) None above
False Kubernetes deprecated Docker as a container runtime in v1.20 and also supports other runtimes such as containerd and CRI-O.
Kubernetes can only support Docker container runtime. True False
True
Kubernetes provides a platform for automating deployment, scaling, and operations of application containers across clusters of hosts. False True
True
Kubernetes provides a platform for automating deployment, scaling, and operations of application containers across clusters of hosts. True False
True
Messages in Pregel are delivered exactly once. True False
False Microsoft predicts that 90% of new apps will be created with low-code / no-code tools
Microsoft believes that approximately 90% of new apps in the next five years will be created by C# developers True False
90%
Microsoft predicts that over the 2020-2025 timespan approximately what percent of new apps will be created with low-code / no-code tools? 10% 30% 75% 90%
building Datacubes to running OLAP workloads directly on columnar databases
Over the past decade (2010-2020), the momentum seems to be shifting from building Datacubes to running OLAP workloads directly on columnar databases running OLAP workloads directly on columnar databases to building sophisticated Datacubes
False Pregel uses a master/worker model
Pregel works on an egalitarian model - all of the nodes perform the same functions and have the same responsibilities True False
False
Pregel's message passing API guarantees message delivery order True False
False. In Redshift, blocks are immutable. In general, Columnar Stores are not good at updates compared to other approaches
Redshift, like most Columnar Stores, makes it easy to update blocks. True. Redshift, like most Columnar Stores, are write-optimized, so updates are easy False. In Redshift, blocks are immutable. In general, Columnar Stores are not good at updates compared to other approaches True, though this property is rarely used in practice since Columnar Stores are primarily utilized by read-heavy applications False. Redshift is not a Columnar Store, but a data pipeline that connects Columnar Stores to analysis engines
ALL
Select all that apply: What are some commonly available datacube operations? Slicing Dicing Drill Up / Down Roll-up Pivot
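A few of these operations can be illustrated on a toy cube. The sketch below models the cube as a Python dictionary keyed by (year, region, product); the data, the dimension names, and the function names are all made up for illustration, and for simplicity a slice keeps the full three-part key instead of dropping the fixed dimension.

```python
from collections import defaultdict

DIMS = ("year", "region", "product")

# Toy cube: (year, region, product) -> sales. The numbers are made up.
cube = {
    (2019, "EU", "tea"): 10,
    (2019, "US", "tea"): 20,
    (2020, "EU", "tea"): 30,
    (2020, "EU", "coffee"): 40,
    (2020, "US", "coffee"): 50,
}


def slice_cube(cube, dim, value):
    # Slice: fix a single dimension to one value.
    i = DIMS.index(dim)
    return {key: v for key, v in cube.items() if key[i] == value}


def dice_cube(cube, allowed):
    # Dice: restrict several dimensions to sets of allowed values.
    return {
        key: v
        for key, v in cube.items()
        if all(key[DIMS.index(d)] in values for d, values in allowed.items())
    }


def roll_up(cube, keep):
    # Roll-up: aggregate away every dimension that is not listed in `keep`.
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, v in cube.items():
        out[tuple(key[i] for i in idx)] += v
    return dict(out)
```

Drill-down is the inverse of roll-up (moving from `roll_up(cube, ["year"])` back to a finer grouping such as `["year", "region"]`), and a pivot reorders which dimensions are used as rows and columns.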
ZooKeeper JobTracker TaskTracker Worker
Select all that apply: which are elements of the Giraph framework? ZooKeeper JobTracker TaskTracker Worker Hadoop
False
Sidecar model requires two nodes to work. True False
False In fact, employment in this area is projected to grow faster than average
Since AI and automation are being applied more and more to the maintenance and management of clouds, it is forecasted that the employment of computer and IT occupations will fall over the next five years. True False
Significantly more than 5%
Suppose a table contains 10000 rows and 100 columns. A query that uses all of the rows and 5 columns will need to read approximately what percentage of the data contained in the table if you are using a traditional row-based RDBMS system? Significantly less than 5% Approximately 5% Significantly more than 5%
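The arithmetic behind this answer can be sketched as follows, under the simplifying assumptions that all cells are the same size, that a row store must read each row in full, and that a column store reads only the requested columns. The function name and the `layout` parameter are hypothetical.

```python
def fraction_read(n_rows, n_cols, cols_needed, layout):
    """Fraction of the table's cells that must be read for a full-table scan."""
    total = n_rows * n_cols
    if layout == "row":
        # Rows are stored contiguously, so every column of every row is read.
        cells = n_rows * n_cols
    elif layout == "column":
        # Only the columns the query touches are read.
        cells = n_rows * cols_needed
    else:
        raise ValueError("layout must be 'row' or 'column'")
    return cells / total
```

With 10000 rows, 100 columns, and 5 columns needed, the row layout reads the whole table under these assumptions, while the column layout reads 5% of it.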
True
Ted Malaska would agree with this opinion: Storm is not a competitor with Spark and Flink in the real-time database world. True False
an order of magnitude or more larger than the adjacency list of the web (modeled as a collection of vertices and edges) A model of the web would have approximately 50 billion vertices, 1 trillion edges, and a 30 TB adjacency list, while a model of the brain would have approximately 100 billion vertices, 100 trillion edges, and a 2.84 PB adjacency list
The adjacency list of the brain (modeled as a collection of vertices and edges) is an order of magnitude or more larger than the adjacency list of the web (modeled as a collection of vertices and edges) about the same size as the adjacency list of the web (modeled as a collection of vertices and edges) an order of magnitude or more smaller than the adjacency list of the web (modeled as a collection of vertices and edges)
5.4
The latest CentOS comes with Linux kernel version 4.18. If you are running the latest CentOS container on an Ubuntu host with kernel version 5.4, which kernel version would you see inside the container? 4.18 5.4
False There is expected to be a shortfall of one million developers, so there should be plenty of work for everyone!
The low-code / no-code paradigm is a major threat to the employment prospects of developers and programmers around the world. True False
False Today, OLAP cubes refer specifically to contexts in which these data structures far outstrip the size of the hosting computer's main memory
Today, OLAP cubes are always designed to fit in the hosting computer's main memory to maximize analytical performance True False
True
Traditionally, OLAP cubes were known for extreme performance advantages over row-oriented RDBMS True False
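True The VM simulates enough hardware to allow an unmodified guest OS (one designed for the same CPU) to be run in isolation.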
Virtual machine using full virtualization can only run guest OS designed for one type of CPU (the same as host). True False
Sources of streams
What are spouts in Apache Storm? Unbounded sequences of tuples Network of spouts and bolts Processors of input Sources of streams
Unbounded sequences of tuples
What are streams in Apache Storm? A network of spouts and bolts Unbounded sequences of tuples Aggregators Processors of input
Hyperparameter optimization tunes the training parameters of a single training algorithm, while AutoML tries out multiple training algorithms on the input dataset.
What are the definitions of hyperparameter optimization and AutoML? Hyperparameter optimization tunes the training parameters of a single training algorithm, while AutoML tries out multiple training algorithms on the input dataset. Hyperparameter optimization means adjusting the parameters of a search space using gradient descent, and is a technical term. AutoML is a special case of hyperparameter optimization, and is marketing jargon. They are both the same and used interchangeably. Hyperparameter optimization refers to adjusting the parameters of a hyper plane that divides the search space in equidistance quadrants. AutoML means the cloud provider takes care of orchestrating machine learning artifacts' deployment.
Networks of spouts and bolts
What are topologies in Apache Storm? Unbounded sequences of tuples Sources of streams Processors of input Networks of spouts and bolts
ALL
What does AWS SageMaker provide? A system to build ML models Managed Notebooks Hyperparameter Optimization tools Capability to detect concept drift
Provides a persistent state for the bolts, but the exact implementation is up to the user
What does Trident do? Provides a persistent state for the topology, with a predefined set of characteristics Provides a persistent state for the bolts, but the exact implementation is up to the user Provides a persistent state for the spout, but the exact implementation is up to the user Provides a persistent state for the bolts, with a predefined set of characteristics
Explore Data
What does the "E" in OSEMN stand for? Engineer Features Extract Features Explain Results Explore Data
Responsible for coordination Master is responsible for coordination: • Assigns partitions to workers • Coordinates synchronization • Requests checkpoints • Aggregates aggregator values • Collects health statuses
What is Master's role for task assignment in Giraph? Responsible for the state of computation Communicate with other workers Responsible for vertices Responsible for coordination
Responsible for vertices Worker is responsible for vertices: • Invokes active vertices' compute() function • Sends, receives, and assigns messages • Computes local aggregation values
What is Worker's role for task assignment in Giraph? Responsible for the state of computation Communicate with other workers Responsible for vertices Responsible for coordination
Responsible for the state of computation ZooKeeper is responsible for computation state: • Partition/worker mapping • Global state: #superstep • Checkpoint paths, aggregator values, statistics
What is ZooKeeper's role in task assignment in Giraph? Responsible for the state of computation Responsible for coordination Responsible for vertices Communicate with other workers
A graph database is a storage system that provides index-free adjacency
What is a graph database? A graph database is any storage system that provides index adjacency A graph database is a storage system that holds "maps" and "routes" A graph database is a storage system that provides index-free adjacency A graph database is any storage system where the data rests on the vertices and is computed on the edges
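Index-free adjacency means each node holds direct references to its neighbors, so a traversal follows pointers rather than looking up a global index. A minimal in-memory sketch, with a hypothetical `Node` class and `reachable` helper:

```python
class Node:
    """Hypothetical graph node that stores direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # adjacent Node objects, no index lookup needed

    def connect(self, other):
        # Undirected edge: each endpoint points straight at the other.
        self.neighbors.append(other)
        other.neighbors.append(self)


def reachable(start):
    # Traverse by following neighbor references only.
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in node.neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return {node.name for node in seen}
```

The cost of each hop depends on the number of neighbors of the current node, not on the total size of the graph.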
A recommendation engine working based on the user preferences and others with similar preferences Collaborative filtering is to have multiple filters working together to extract just the information you want.
What is an example of a collaborative filtering application? Finding the frequent item sets frequently bought together Placing new items into predefined categories Grouping similar object together without knowing the groups ahead A recommendation engine working based on the user preferences and others with similar preferences
Finding the frequent item sets frequently bought together this is an example of frequent pattern mining
What is an example of an FPM application? Finding the frequent item sets frequently bought together Placing new items into predefined categories Grouping similar object together without knowing the groups ahead A recommendation engine working based on the user preferences and others with similar preferences
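The grocery-basket example can be sketched as a brute-force count of item pairs. This is the simplest form of frequent pattern mining, not an optimized algorithm such as Apriori or FP-Growth; the function name and the `min_support` threshold parameter are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

For example, with baskets `[["milk", "bread"], ["bread", "milk", "eggs"], ["eggs", "tea"]]` and `min_support=2`, only the pair ("bread", "milk") qualifies.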
A graph database is any storage system that provides index-free adjacency Graph processing works on a graph database that holds all of the information. The database provides some way of retrieving the data, and typically it provides index-free adjacency.
What is graph processing? A non-relational, distributed database A distributed real-time computation system A graph database is any storage system that provides index-free adjacency A framework for distributed storage and processing of large data sets
The process of eliminating events to keep up with the rate of events
What is load shedding? Distributing applications across many servers The process of eliminating events to keep up with the rate of events Enabling a system to continue operating properly in the case of the failure of some of its components Distributing the data across different parallel computing nodes
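One simple way to shed load is to drop incoming events once a bounded buffer is full. The sketch below assumes that drop-newest policy; the `SheddingQueue` class and its method names are hypothetical, and real systems may instead sample events or drop the oldest ones.

```python
from collections import deque


class SheddingQueue:
    """Bounded buffer that drops new events when the consumer falls behind."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def offer(self, event):
        # Shed the event if the buffer is already full.
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(event)
        return True

    def poll(self):
        # Hand the oldest buffered event to the consumer, if there is one.
        return self.queue.popleft() if self.queue else None
```

The `dropped` counter makes the accuracy cost visible: the system keeps up with the event rate, but its results are computed on a subset of the events.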
It is micro-batch, which increases the minimum end-to-end latency of the system. The disadvantage is that it is not really streaming in the strict sense: it batches the data and then runs each batch very quickly.
What is the main disadvantage of Spark Streaming? It does not handle failures. It is micro-batch, which increases the minimum end-to-end latency of the system. It lacks a rich eco-system of big data tools. It does not support state
Pod
What is the smallest control unit in Kubernetes? Deployment Node Pod
ALL except Cryptocurrency
What technologies / concepts are part of the "Digital Transformation"? Cloud Big Data AI / ML IoT Cryptocurrency
True
When a user application needs to handle an interrupt, it has to enter kernel mode. True False
False
When you publish a service port, the swarm makes the service accessible at the target port only on nodes that have a task running for that service. True False
AWS QuickSight
Which best fits the category of graphical BI tool? AWS QuickSight Kibana Azure Analysis Services AWS Glue
kube-proxy
Which component of Kubernetes cluster node is responsible for implementing a form of virtual IP for Services? kubelet kube-proxy container Runtime
kubelet
Which component of Kubernetes cluster node is responsible for managing the pods? kubelet kube-proxy API Server
etcd
Which component of Kubernetes master node is responsible for storing the state of the cluster? controller scheduler etcd
Sidecar pattern
Which design pattern will you use for adding HTTPS to a legacy service? Sidecar pattern Ambassador pattern Adapter pattern
Third
Which generation of hardware virtualization introduced IOMMU virtualization? First Second Third
Clustering k-means clustering aims to partition n observations into k clusters
Which group does K-means fall into? Collaborative filtering Frequent pattern mining Clustering Classification
Having a minimal security protection. Using a minimal device model and kernel configuration reduces the attack surface of a microVM and does not reduce security protection.
Which is not a reason that a microVM is faster than a normal VM? Having a minimal device model. Having a minimal security protection. Having a minimal guest kernel configuration.
bridge
Which is the default Docker network driver? bridge host overlay
Running each application in a separate container.
Which is the recommended practice in the following scenario? "Multiple applications with communication requirements" Running all applications in one container. Running each application in a separate container.
Only one stream processing pipeline but with the ability to handle failures Kappa architecture does away with the two parallel paths and uses streaming only, but makes the streaming robust enough that failures do not corrupt the state.
Which of the following best describes Kappa architecture? A parallel processing pipeline of two branches: a stream processing pipeline and a batch processing pipeline A serial processing pipeline of first a streaming processing system and then a batch processing system A serial processing pipeline of first a batch processing system and then a stream processing system Only one stream processing pipeline but with the ability to handle failures
A parallel processing pipeline of two branches: a stream processing pipeline and a batch processing pipeline Lambda architecture has two parallel data processing paths. The first path uses a stream event processing system such as Storm, and the parallel path uses a batch processing system.
Which of the following best describes Lambda architecture? (Not to be confused with AWS Lambda). A serial processing pipeline of first a batch processing system and then a stream processing system Only a stream processing pipeline but with the ability to handle failures A serial processing pipeline of first a streaming processing system and then a batch processing system A parallel processing pipeline of two branches: a stream processing pipeline and a batch processing pipeline
A set of labeled data points are given. A model based on those data points is built such that for any new unlabeled data point the label is determined. Naive Bayes is a supervised classification method
Which of the following best describes how Naïve Bayes works? A set of unlabeled data points are given. A model based on those data points is built such that for any new unlabeled data point the label is determined. A set of labeled data points are given. A model based on those data points is built such that for any new unlabeled data point the label is determined. A set of items is given, and the most frequent set of items is found. A set of data points are given and it classifies them into k groups
A bolt processes input streams and produces new streams.
Which of the following is correct? A topology is a network of tuples and streams. A stream connects a bolt to a spout. A plant jar can receive output from many streams. A bolt processes input streams and produces new streams.
K-means K-means is a clustering method
Which of the following is not a classification mechanism? Naïve Bayes K-means Convolution Neural Network Classifier Decision forests
HDFS HDFS is the storage part of Hadoop
Which of the following is not a component of the Storm Architecture? Worker Supervisor Zookeeper HDFS Nimbus
Kube-proxy
Which of the following is not a component of the master node? Scheduler Controller Kube-proxy API server
Create container image based on user specified requirements.
Which of the following is not a function of Kubernetes? Schedule containers to run on physical or virtual machines. Create container image based on user specified requirements.
Replaces the functionality of Dockerfile. It still needs Dockerfile to create images.
Which of the following statements about Docker Compose is incorrect? Uses a YAML file to configure application's services. Replaces the functionality of Dockerfile. Main tool by Docker for container orchestration
The routing mesh uses IP based service discovery and load balancing. This statement is not true: the routing mesh uses port-based service discovery and load balancing.
Which of the following statements is NOT true about Docker routing mesh? The routing mesh enables each node in the swarm to accept connections on published ports for any service running in the swarm. The routing mesh uses IP based service discovery and load balancing. By default all nodes participate in an ingress routing mesh.
Spark Streaming chops a stream into small batches and processes each batch independently. This is called micro-batching.
Which of the following statements is true? Spark Streaming treats each tuple independently and replays a record if not processed. Spark Streaming chops a stream into small batches and processes each batch independently. Spark Streaming has no support for state. Spark Streaming uses transactions to update state.
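The micro-batch idea can be sketched without Spark. One simplifying assumption: the sketch cuts batches by element count, whereas Spark Streaming cuts them by time interval. The helper name micro_batches is hypothetical.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Chop a stream into consecutive small batches."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Each batch is processed independently; here the "job" is a simple sum.
batch_sums = [sum(batch) for batch in micro_batches(range(1, 8), batch_size=3)]
```

Storm, by comparison, handles each tuple as it arrives instead of waiting for a batch to fill.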
Associative data sets A graph database holds associative data sets: you look up one item and retrieve a different item from it, such as a node and the edges connecting it to other nodes.
Which of these is a property of a graph database? Associative data sets Performs the same operation on large numbers of data Uses a relational model of data Entity type has its table
BigTable As a wide-column store, BigTable is specialized for access by a key
Which of these would probably be best for storing data retrieved by a key or a sequence of keys? MonetDB SybaseIQ BigTable Vertica
Infrastructure as a Service
Which one is preferred by enterprise as of 2020? Platform as a Service Infrastructure as a Service
Infrastructure as a service
Which one is preferred for security reason? Infrastructure as a service Platform as a service Function as a service Software as a service
NiFi In NiFi, you can design a graph to process your data.
Which system has a great graphical UI to design dataflows? NiFi Storm Druid Spark Streaming
RDBMS
Which system is a more natural fit for OLTP? Datawarehouse Data Lake Managed Machine Learning platforms RDBMS
Druid Druid provides optimizations that let it process OLAP queries quickly.
Which system is best for Online Analytical Processing (OLAP)? NiFi Storm Druid Spark Streaming
overlay network
Which type of network connects multiple Docker daemons together and enables swarm services to communicate with each other? bridge network host network overlay network macvlan network
Hardware-assisted
Which type of virtualization is feasible for the following scenario? "A service needs to run an unknown and unmodified OS on an advanced processor." Para-virtualization Hardware-assisted
Full virtualization
Which type of virtualization is feasible for the following scenario? "A service needs to run an unmodified OS on a basic processor, separate from the host operating system." Full virtualization Container Para-virtualization
Hardware-assisted full virtualization
Which type of virtualization is feasible for the following scenario? "Application that needs different custom operating systems (kernels)" Para-virtualization Hardware-assisted full virtualization Containers
No virtualization Every sort of virtualization technology has some kind of performance impact. For the absolute best performance, no virtualization is the best option.
Which type of virtualization provides better performance for the following scenario? "One application running on a single piece of hardware" Full virtualization Containers No virtualization JVM
Containers
Which type of virtualization provides better performance for the following scenario? "Running multiple independent applications sharing the same kernel" Hardware-assisted full virtualization Containers
Full virtualization
Which type of virtualization provides better performance for the following scenario? "Running two independent applications, each needs a different version of a kernel module". Full virtualization Containers
Containers
Which type of virtualization provides better performance for the following scenario? "Multiple applications with high memory usage" Containers Full virtualization through binary translation
Host OS (Base OS) kernel
Who is responsible for scheduling and memory management when using containers? Virtual Machine Manager Hypervisor Host OS (Base OS) kernel Supervisor
In fact, Pregel does not use MapReduce, because MapReduce produces too much communication between stages. MapReduce tends to be inefficient for graph algorithms because the graph state must be stored at each stage, and each computational stage produces a large amount of communication with the next.
Why does Pregel use MapReduce? Pregel does not use MapReduce, because Pregel was created long before MapReduce MapReduce is widely known and used by cloud developers, so Pregel uses MapReduce to ease developer burden In fact, Pregel does not use MapReduce, because MapReduce produces too much communication between stages. MapReduce is a well-established and effective tool that can easily solve most graph problems
It produces too much communication between stages. MapReduce tends to be inefficient for graph algorithms because the graph state must be stored at each stage, and each computational stage produces a large amount of communication with the next.
Why is MapReduce not efficient for large-scale graph processing? It brings load imbalance. It is fault-tolerance enough. It produces too much communication between stages. The map function is computationally a bottleneck.
False With an OLAP system, the typical query involves one column, but an OLTP query typically involves most or all of the columns in a row.
With an OLTP system, the typical query involves one column True False
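The difference in access pattern can be shown with plain Python data. The table contents and the helper get_order are invented for illustration; the row list stands in for a row-oriented store and the columns dictionary for a column-oriented one.

```python
# Row-oriented layout: each record keeps all of its columns together.
rows = [
    {"id": 1, "customer": "ann", "amount": 30.0},
    {"id": 2, "customer": "bob", "amount": 12.5},
    {"id": 3, "customer": "ann", "amount": 7.5},
]

def get_order(order_id):
    """OLTP-style query: fetch one row and read most or all of its columns."""
    return next(row for row in rows if row["id"] == order_id)

# Column-oriented layout: each column is stored as its own list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# OLAP-style query: scan a single column across every row.
total_amount = sum(columns["amount"])
order = get_order(2)
```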
No In Ted Malaska's opinion, Hadoop is made up of two fundamental projects: MapReduce and HDFS. Spark replaces MapReduce, but HDFS is replaced by object stores, not Spark.
Would Ted Malaska agree that Spark replaces Hadoop? Yes No
True Paravirtualization is a software-only virtualization approach.
Xen does not require special hardware support, such as Intel "VT-x" or "AMD-V". True False
Bind mount
You are working on a course MP and want to use the IDE on the host to edit code and run it inside a container. Which is the best way to make the code accessible inside the container? Bind mount Volume tmpfs
5.4 A container always uses the host kernel.
You built a container from the latest CentOS image on an Ubuntu host with kernel version 4.18. After you upgraded the Ubuntu kernel to 5.4, what will be the kernel version used by the built CentOS container? 5.4 4.18
compute(list of messages) -> return list of messages
You want to build a shortest path algorithm using parallel breadth-first search in Pregel. Which of the following pseudo-codes is the proper "compute function" for this program? compute(list of messages) -> return list of messages compute(list of vertexes) -> return list of messages compute(list of edges) -> return list of messages compute(graph) -> return list of messages
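A vertex-centric compute function of that shape might look like the sketch below. It is plain Python rather than the Pregel API: the vertex dictionary, the run driver, and the message format are assumptions for illustration, and every edge is given weight 1 to match breadth-first search.

```python
import math

def compute(vertex, messages):
    """Per-vertex compute: takes incoming messages, returns outgoing messages."""
    candidate = min(messages, default=math.inf)
    if candidate < vertex["distance"]:
        vertex["distance"] = candidate
        # Tell each neighbor it is reachable in one more hop.
        return [(neighbor, candidate + 1) for neighbor in vertex["edges"]]
    return []  # no improvement: send nothing (vote to halt)

def run(graph, source):
    """Hypothetical driver: run supersteps until no messages remain."""
    vertices = {v: {"distance": math.inf, "edges": edges} for v, edges in graph.items()}
    inbox = {source: [0]}
    while inbox:
        outbox = {}
        for v, messages in inbox.items():
            for target, value in compute(vertices[v], messages):
                outbox.setdefault(target, []).append(value)
        inbox = outbox
    return {v: state["distance"] for v, state in vertices.items()}

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
distances = run(graph, "a")
```

Each vertex only ever sees its own state and its incoming messages, which is why the compute function takes a list of messages rather than the graph, the edges, or other vertices.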