Midterm Prep for CS 498: Cloud Computing Applications
Remote Procedure Call
"RPC" stands for Remote Procedure Call Rolling Policy Check Really Predictable Code Realtime Performance Clusters
Google Protocol Buffer
Which of the following RPC frameworks was created by Google to deal with communication in big data deployments? Apache Thrift Google Protocol Buffer JVM
False Big Data, by definition, cannot be processed in a timely manner by a single standard computer.
A commercial off the shelf laptop, disconnected from the internet, has the storage, memory, and computational capacity to process Big Data in a timely manner. True False
No Cloud computing only makes economic sense when the Utility Premium is less than the ratio of Peak demand to Average demand. If Peak demand falls while Average demand remains steady, that ratio shrinks, so cloud computing becomes less economically attractive than before.
A company determines that the most economical decision is to use in-house servers. Over time the company's peak demand for computing resources decreases sharply, while its average demand remains steady. Should the company consider switching from in-house servers to a cloud approach? Yes No
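The rule in this card can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `cloud_is_cheaper` and all of the numbers are made up for the example, not taken from the course material.

```python
def cloud_is_cheaper(utility_premium: float, peak: float, average: float) -> bool:
    """Return True when the utility premium is below the peak-to-average demand ratio."""
    return utility_premium < peak / average


# Hypothetical figures: cloud capacity costs 2x the in-house unit price.
premium = 2.0

# In-house already wins: peak/average = 1.8, which is below the premium.
before = cloud_is_cheaper(premium, peak=180, average=100)

# Peak drops sharply, average stays steady: the ratio falls to 1.2,
# so the case for cloud gets weaker, not stronger.
after = cloud_is_cheaper(premium, peak=120, average=100)

print(before, after)
```

Both calls return False here, which matches the "No" answer: a falling peak only moves the ratio further below the premium.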
GSI It is tolerable if a user's tweets show up on one client and do not show up on another client immediately. GSI follows the eventual consistency model and ensures that the clients' views will eventually be the same. GSI also does not constrain table size and hence can be used to store a large number of tweets.
A social media company wants to use DynamoDB for storing posts of users. Which secondary indexing method should it use? LSI GSI
False It is an interval, not a single value.
A timestamp in the TrueTime API is a single value, exposing clock uncertainty. True False
False The interval exposes the clock uncertainty.
A timestamp in the TrueTime API is an interval, which masks the clock uncertainty. True False
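One way to picture the interval idea from the two TrueTime cards is a toy model in which "now" is a pair of bounds rather than a point. This is not the real TrueTime API: `TTInterval`, `tt_now`, `tt_after`, `tt_before`, and the `epsilon` value are assumptions made for illustration, with the local clock standing in for Google's time sources.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is no earlier than this
    latest: float    # true time is no later than this


def tt_now(epsilon: float = 0.004) -> TTInterval:
    """Return an interval that exposes an assumed clock uncertainty of +/- epsilon seconds."""
    t = time.time()
    return TTInterval(earliest=t - epsilon, latest=t + epsilon)


def tt_after(t: float) -> bool:
    """True only if t has definitely passed, even at the early end of the interval."""
    return tt_now().earliest > t


def tt_before(t: float) -> bool:
    """True only if t has definitely not arrived, even at the late end of the interval."""
    return tt_now().latest < t
```

The width of the interval is the uncertainty, so a caller sees it directly instead of having it hidden behind a single value.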
False, AWS Lambda functions are not allowed to run for more than a few minutes
AWS Lambda is a good technology to use when you have a function that will take several days to run. True, AWS Lambda is optimized for long-running jobs False, AWS Lambda functions are not allowed to run for more than a few minutes
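False Once an object is deleted, it may take some time before all of the replicated copies of the object are deleted. Other processes may still read those replicated copies before they are all deleted. (This describes S3's original eventual-consistency model; since December 2020, Amazon S3 provides strong read-after-write consistency, including for deletes.)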
Amazon S3 BLOB Storage's consistency model guarantees that once an object is deleted by a process it cannot be read by any other process. True False
False
Apache Spark cannot read from any Hadoop input. True False
MapReduce
Apache Spark was created to solve the shortcomings of which technology? Juju Apache Storm YARN MapReduce
False In Ceph, the data and metadata are decoupled. Ceph improves performance by: * limiting interaction between clients and servers * leveling metadata load * offloading decision making to the many data servers
Ceph achieves high performance in part by storing data and its accompanying metadata in the same server for faster access. True False
located on disks that are physically attached to the host computer well-suited for caching well-suited for temporary logs
Check all that apply: AWS Instance Store is located on disks that are physically separated from the host computer located on disks that are physically attached to the host computer well-suited for long-term persistent storage well-suited for caching well-suited for temporary logs
ALL of them
Choose all that apply: Which are examples of clustered file systems? Lustre NFS SMB Ceph
Microsoft OneDrive Apple iCloud Drive Dropbox
Choose all that apply: Which of these are Internet-Level Personal Filesystems? Microsoft OneDrive Glacier Redis Apple iCloud Drive Dropbox
Negative Correlation Combining jobs whose demands are negatively correlated gives the aggregate demand a smaller coefficient of variation, which can lead to higher utilization.
Cloud providers prefer variable jobs that have Positive Correlation Negative Correlation
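The effect of correlation on the coefficient of variation (standard deviation divided by mean) can be seen with a small calculation. The demand series below are invented for the example, and `cv` is a helper name chosen here.

```python
from statistics import mean, pstdev


def cv(series):
    """Coefficient of variation: population standard deviation over mean."""
    return pstdev(series) / mean(series)


job_a = [10, 30, 10, 30]          # hypothetical hourly demand
job_b_negative = [30, 10, 30, 10]  # peaks when job_a is quiet
job_b_positive = [10, 30, 10, 30]  # peaks together with job_a

combined_negative = [a + b for a, b in zip(job_a, job_b_negative)]
combined_positive = [a + b for a, b in zip(job_a, job_b_positive)]

print(cv(job_a), cv(combined_negative), cv(combined_positive))
```

With perfectly opposite demand the combined load is flat, so capacity provisioned for the peak is fully used; with demand that peaks together, the combined load is just as spiky as each job alone.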
False Cloud services do not need to be cheaper to be economical, as long as the utility premium is less than the ratio between peak demand and average demand.
Cloud services need to be cheaper (e.g. through economies of scale) to be economical compared to in-house servers. True False
False Partitions are immutable
Each topic has partitions and each partition is ordered, numbered, and mutable. True False
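A partition can be thought of as an append-only log in which each record gets the next offset and is never changed afterwards. The class below is a toy model of that idea, not the Kafka client API; `Partition`, `append`, and `read_from` are names invented for this sketch.

```python
class Partition:
    """Toy append-only log: records are ordered, numbered by offset, and never modified."""

    def __init__(self):
        self._records = []

    def append(self, message) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(message)
        return len(self._records) - 1

    def read_from(self, offset: int) -> tuple:
        """Return every record from the given offset onward, in order."""
        return tuple(self._records[offset:])


partition = Partition()
first = partition.append("page_view")
second = partition.append("search")
print(first, second, partition.read_from(0))
```

There is deliberately no update or delete method: a consumer that re-reads from an old offset always sees the same records.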
False Even though AWS Instance Stores can handle higher throughput because they are located on disks that are physically attached to the host computer, experiments show that their edge over EBS in throughput tests is not orders of magnitude, a testament to the efficiency of NVMe over Fabrics technologies and data center networking designs.
Experiments have shown that AWS Instance Stores perform at least an order of magnitude or more better than AWS Elastic Block Stores in throughput tests in comparable settings. True False
some availability for stronger consistency
Compared to NoSQL, NewSQL sacrifices partition tolerance for high availability some availability for stronger consistency strong consistency for high availability strong consistency for partition tolerance
False CosmosDB supports multiple consistency models, and one of them is strong consistency
CosmosDB is an eventually-consistent system, so if you want strong consistency, you must use a different system. True False
False The CAP theorem says that no system can achieve guaranteed perfection in consistency, availability, and partition tolerance. Google Spanner can only achieve consistency and partition tolerance (CP).
Google Spanner is the first real-world system to achieve guaranteed consistency, availability, and partition tolerance (CAP) True False
False Index servers and document servers are backend workers, and the Google Web server is the frontend.
Google's cluster architecture uses index servers as frontend load-balancers, and document servers to hold the indexed information. True False
True
Graph databases are considered one of the four NoSQL categories. True False
False HBase is built on HDFS.
HDFS is built on HBase True False
False HFiles are immutable, so the key-value pairs cannot be updated.
HFiles are optimized so users can quickly update the values associated with keys. True False
binlog is on the master, and the relay log is on the slave
In multinode RDBMS with master/slave asynchronous replication, normally the binlog and relay log are both on the master binlog is on the master, and the relay log is on the slave binlog is on the slave, and the relay log is on the master binlog and relay log are both on the slave
permanently saving state
Once you upload your code, AWS Lambda does NOT automatically handle autoscaling capacity provisioning fault tolerance permanently saving state
Data parallelism
Select all of the solutions provided by Hadoop. Caching for loop-invariant data Caching for fixpoint evaluation Data parallelism Loop-aware task scheduling
ALL except for extract
Select all of the transformations which create an RDD. extract groupBy map join filter
ALL of them
Select all of which are properties of HDFS: Synergistic with Hadoop Throughput scales with attached HDs Massive throughput Optimized for reads, sequential writes, and appends
Is a persistent ordered immutable map from keys to values Is an on-disk file format representing a map from a string to a string Is stored in HDFS
Select all that apply: An HFile: Is a persistent ordered immutable map from keys to values Is a distributed collection of objects Is an on-disk file format representing a map from a string to a string Is stored in HDFS
ALL of them
Select all that apply: Spark SQL can read data from HIVE tables HDFS JSON
True
The HBase master assigns an HRegion to HRegion servers. True False
True
The Hive approach translates queries into map/reduce jobs. True False
Voice over IP (VoIP) Occasional packet loss is an acceptable tradeoff given the speedy packet delivery for VoIP
UDP is best suited for which of the following tasks? Encrypted communications Voice over IP (VoIP) Communication between computers on the same rack in a data center Reliably transmitting a medium-sized file over a congested link
In a massive multiplayer game
Under which scenario would we be better off using HTTP Streaming API? Initiating a database table In a massive multiplayer game Requesting an invoice data structure from a financial server
Yet Another Resource Negotiator
What is YARN? Your Adept Resource Negotiator Your Adept Reason Negotiator Your Applicable Resource Negotiator Yet Another Resource Negotiator
Infrastructure-as-a-Service
What is the best model of delivery for the following scenario? "A custom, lightning-fast storage solution for a gigantic amount of data" Software-as-a-Service Platform-as-a-Service Infrastructure-as-a-Service
Platform-as-a-Service
What is the best model of delivery for the following scenario? "A web hosting solution for PHP web applications" Platform-as-a-Service Infrastructure-as-a-Service Software-as-a-Service
Software-as-a-Service
What is the best model of delivery for the following scenario? "An Electronic Health Record system for clinics and doctors" Platform-as-a-Service Infrastructure-as-a-Service Software-as-a-Service
Reduce phase can't start until Map phase is completely finished
What is the bottleneck of the MapReduce programming model? Combine phase can't start until Reduce phase is completely finished Reduce phase can't start until Map phase is completely finished Combine phase can't start until Map phase is completely finished Map phase can't start until Reduce phase is completely finished
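The barrier between the two phases shows up even in a single-process sketch of word count: the reducer needs every value for a key, so it cannot begin until all map output has been grouped. The function names and sample documents below are assumptions for illustration, not Hadoop APIs.

```python
from collections import defaultdict


def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    pairs = []
    for doc in documents:
        for word in doc.split():
            pairs.append((word.lower(), 1))
    return pairs


def shuffle(pairs):
    """Group values by key; this needs the complete map output."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {word: sum(values) for word, values in groups.items()}


documents = ["the cloud", "the map the reduce"]
counts = reduce_phase(shuffle(map_phase(documents)))
print(counts)
```

If `reduce_phase` ran on partial groups, the count for "the" would be wrong, which is why a slow mapper holds up every reducer.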
JSON has no namespaces while XML does
Which of the following about JSON and XML is correct? JSON and XML do not have security issues JSON does not have security issues while XML does JSON has namespaces while XML does not JSON has no namespaces while XML does
Bandwidth
Which of the following are not advantages of VPC Security Bandwidth Flexibility Data control
Amazon AWS Lambda Lambda is an example of a FaaS
Which of the following is NOT considered a PaaS? Amazon AWS Lambda Amazon Elastic BeanStalk Google AppEngine Microsoft Azure App Service
XML JSON
Which of the following are main formats of data representation? XML JSON HTTP
fetch
Which of the following is not a transport method? flush read fetch write
Resend
Which of the following is not an event type of WebSocket? Message Error Open Resend
Message Queuing Systems can handle change in demand.
Which of the following is true? Producers and Consumers have to coordinate with each other in Message Queue systems. Producers and Consumers must communicate synchronously in Message Queue systems. Message Queuing Systems can handle change in demand. AWS Simple Queue Service follows the Publish Subscriber Model.
Write Through
Which of the following prevents stale data in cache? Cache Aside Write Through Write Back Both "Write Through" and "Write Back"
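A write-through cache avoids stale entries because every write goes to the backing store and the cache in the same step. The sketch below uses a plain dictionary as a stand-in for the database; `WriteThroughCache` and the sample key are invented for the example.

```python
class WriteThroughCache:
    """Every write updates the backing store and the cache together."""

    def __init__(self, store):
        self.store = store   # stand-in for the database
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value   # persist first
        self.cache[key] = value   # then refresh the cached copy

    def read(self, key):
        if key not in self.cache:          # cache miss: load from the store
            self.cache[key] = self.store[key]
        return self.cache[key]


database = {"user:1": "old"}
cache = WriteThroughCache(database)
cache.read("user:1")            # warms the cache with "old"
cache.write("user:1", "new")    # store and cache change together
fresh = cache.read("user:1")
print(fresh)
```

With cache-aside, by contrast, the application itself must remember to invalidate or update the cached entry after writing to the store, so a missed invalidation leaves the old value in the cache.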
Incoming traffic from the Internet cannot access the private subnets Outgoing traffic from a private subnet cannot access the Internet
Which of the following statements about private subnets are true? Outgoing traffic from a private subnet can access the Internet Incoming traffic from the Internet cannot access the private subnets Outgoing traffic from a private subnet cannot access the Internet Incoming traffic from the Internet can access the private subnets
NAT
Which of the following technologies can help a private subnet access the Internet? NAT CIDR Internet Gateway
Apache Kafka
Which of the following would be best for the following use case? "Record user activity like page views, searches, and clicks on a website and make that data readily accessible and available to process in a streaming manner" Apache Kafka HBase Cassandra AWS Lambda
Enhance programmability Extend the MapReduce model to better support two common classes of analytic applications
Which of the following state the purpose of Apache Spark? Enhance programmability Extend the MapReduce model to better support two common classes of analytic applications Eliminate locking found in the MapReduce model Add concurrency to the MapReduce model
Apple
Which of these companies is least involved in providing IaaS? Apple Microsoft Amazon Google
D-Streams
Which of these frameworks is not built on Spark? MLlib SparkSQL GraphX D-Streams
GMail
Which of these is an example of Software as a Service (SaaS) VMWare vCloud Google AppEngine Juju GMail
Metal as a Service (MaaS)
Which of these is not considered serverless computing? Function as a Service (FaaS) Platform as a Service (PaaS) Metal as a Service (MaaS)
Data parallelism across many machines
Which one of the following features is a main factor in the philosophy of Apache Hadoop? Caching for loop-invariant data Caching for fixpoint evaluation Data parallelism across many machines Loop-aware task scheduling
Asynchronous replication
Which replication strategy would likely have the fastest commits (at the expense of possibly weaker consistency)? Asynchronous replication Semi-synchronous replication Synchronous replication
Glacier Deep Archive
Which storage technology is the best for the following scenario? "An application that archives 1,000 TB of data for two years for compliance reasons" Dropbox Glacier Deep Archive AWS S3
Instance store Instance store provides temporary storage whose data is lost when the instance stops or terminates.
Which storage technology is the best for the following scenario? "An application that stores 200 GBs of binary data for a few minutes and doesn't need it to be persistent if the instance running it fails" Glacier Instance store AWS S3
AWS S3
Which storage technology is the best for the following scenario? "An application that stores and queries 2,000 TB of binary data for a few weeks" Redis Glacier AWS S3
AWS S3 AWS S3 is well suited to storing data for just a few weeks, analogous to using cloud computing instead of owning the machines.
Which storage technology is the best for the following scenario? "An application that works with 2 TBs of sound files for a few weeks" Instance Store Lambda AWS S3
Swift Swift is best when dealing with huge sizes of images
Which storage technology is the best for the following scenario? "An application that works with 80 GBs of images on in-house data center" Swift AWS S3 HIVE
HIVE Hive's driver component allows complicated SQL-like queries, which are best suited to structured data.
Which storage technology is the best for the following scenario? "An application which needs complicated queries on structured data" HIVE AWS S3 MemCacheD
Ceph Ceph is ideal when dealing with structured data that is updated frequently, because it offers better performance and speed.
Which storage technology is the best for the following scenario? "Application needs to update structured data frequently" Swift Ceph HIVE
Local hard disk or instance store
Which storage technology is the best for the following scenario? "Application runs on a single node, needs to store 10GB of data for a few minutes" AWS S3 Local hard disk or instance store HDFS
Swift Swift is ideal when dealing with unstructured data like operating system's data and binaries
Which storage technology is the best for the following scenario? "Store an operating system and application binaries remotely" HIVE HDFS Swift
Dropbox
Which storage technology is the best for the following scenario? "Sync files on a few personal devices" AWS Glacier AWS S3 Dropbox Swift
Dropbox
Which storage technology is the best for the following scenario? "Sync files on a few personal devices" Dropbox AWS DynamoDB AWS S3 Google App Engine
JSON
Which technology can best address the following need? "Human readable representation of data" Thrift RPC JSON WebRTC
XML
Which technology can support addressing the following need? "Data representation for (un)marshalling on different machines and programming languages" WebRTC REST RMI XML
YARN
Which technology is the best suited for the following use case? "Assigning resources to a highly parallel application" YARN HDFS Spark Hadoop
Hadoop
Which technology is the best suited for the following use case? "Finding the set of words utilized in the Wikipedia website" Spark Hadoop YARN HDFS
Spark
Which technology is the best suited for the following use case? "Interactively exploring a new large dataset" YARN Hadoop Spark HDFS Amazon S3
HDFS
Which technology is the best suited for the following use case? "Storing a large set of images on thousands of computers" YARN Hadoop HDFS Spark
Spark
Which technology is the best suited for the following use case? "Training a machine learning model on a large dataset with several iterations" HADOOP Spark YARN HDFS
REST
Which technology will address the following need? "Create, Update, Read, and Remove objects over the web" JSON XML REST RMI
REST
Which technology will address the following need? "Create, Update, Read, and Remove objects over the web" JSON-RPC REST XML Websockets
XML
Which technology will address the following need? "Data representation for (un)marshalling on different machines and programming languages" MBaaS RMI XML REST
JSON
Which technology will address the following need? "Data representation using a dictionary with key and value" HTML XML TXT JSON
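The key/value nature of JSON is easy to see by round-tripping a dictionary through Python's standard `json` module. The record contents below are made up for the example.

```python
import json

# A hypothetical record expressed as keys and values.
record = {"course": "CS 498", "topic": "Cloud Computing Applications", "week": 7}

text = json.dumps(record, indent=2)   # human-readable text representation
restored = json.loads(text)           # back to a dictionary

print(text)
```

The serialized text is readable without tooling, which is also why JSON is the answer to the "human readable representation" cards.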
JSON
Which technology will address the following need? "Human readable representation of data" RPC REST JSON RMI
MBaaS
Which technology will address the following need? "Provides a way for mobile web applications to link to backend storage" MBaaS XML JSON REST
SOAP SOAP is a remote procedure call technology. CORBA could also work.
Which technology will address the following need? "Send method execution requests to a remote object" SSH SOAP XML Websockets
CORBA
Which technology will address the following need? "Send requests to a remote object" CORBA XML SSH Juju
Read replicas Read replicas for scalability; Multi-AZ deployments for high availability; Multi-Region deployments for disaster recovery and local performance.
Which would you choose for scalability (as opposed to availability or disaster recovery) Multi-AZ deployments Read replicas Multi-Region deployments
Apache Thrift, because it is scalable and its auto-generated RPC functions are easy to use. Note: RMI and MPI are for communication in a local network.
What Communication framework or technology do many Big Data systems use, and why? Apache Thrift, because it is scalable and its auto-generated RPC functions are easy to use. SOAP, because these frameworks are all enterprise systems. Remote Method Invocation (RMI), because it is the standard library of choice in Java. MPI, since MPI is extremely light weight and therefore provides high throughput and low latency.
PUT, GET, DELETE 4 verbs: PUT, POST, GET, DELETE
What HTTP verbs are used in RESTful APIs? PUT, POST, APPEND XML, JSON, SOAP PUT, GET, DELETE ATTACH, APPEND, PATCH
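As a rough sketch of how the four verbs map onto create, read, update, and remove, the function below dispatches on the verb against an in-memory dictionary. It is not a web server: `handle`, `resources`, the paths, and the choice of status codes are assumptions for illustration, and real APIs vary in how they assign POST versus PUT.

```python
resources = {}  # in-memory stand-in for server-side objects


def handle(method, path, body=None):
    """Map an HTTP verb to an operation on the resource at `path`; return (status, body)."""
    if method == "POST":                      # create
        resources[path] = body
        return 201, body
    if method == "GET":                       # read
        if path not in resources:
            return 404, None
        return 200, resources[path]
    if method == "PUT":                       # update / replace
        resources[path] = body
        return 200, body
    if method == "DELETE":                    # remove
        resources.pop(path, None)
        return 204, None
    return 405, None                          # verb not supported


print(handle("POST", "/posts/1", {"text": "hello"}))
print(handle("GET", "/posts/1"))
print(handle("PUT", "/posts/1", {"text": "edited"}))
print(handle("DELETE", "/posts/1"))
print(handle("GET", "/posts/1"))
```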
Kubernetes
What underlying technologies are typically utilized to offer serverless compute cloud offerings? HDFS file system Kubernetes Analytics and AI packages Metal as a Service provisioning systems
A process writes a new object to Amazon S3 and immediately attempts to read it. Until the change is fully propagated, Amazon S3 might report "key does not exist." A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list. Amazon S3 has a weak consistency model (These answers reflect S3's behavior before December 2020, when it moved to strong read-after-write consistency.)
Which of the following statements are true regarding Amazon S3? A process writes a new object to Amazon S3 and immediately attempts to read it. Until the change is fully propagated, Amazon S3 might report "key does not exist." Object writes to S3 immediately take effect. A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list. Amazon S3 has a weak consistency model
UDP, TCP
Which of the following technologies are transport layer systems in the Internet Protocol? IMAP, SSL, HTTP HTTP, WebSocket and SOAP TCP, IP UDP, TCP
Microsoft Azure CosmosDB Amazon DynamoDB
Which of the following are examples of NOSQL key/value store cloud offerings? Microsoft Azure CosmosDB IBM Object Storage Google Cloud Filestore Amazon DynamoDB
SOAP
What evolved as the successor to XML-RPC? JSON SOAP REST HTTP/2 Push
Distributed NOSQL key/value storage service
Amazon DynamoDB is an example of a Cloud-optimized SQL database Centralized Big-Data blob storage Distributed NOSQL key/value storage service Function as a Service (FaaS) dynamic container offering
replicating the data to multiple machines
Amazon S3 BLOB Storage aims to provide high availability primarily by relying on subcontractors to provide excess capacity using proprietary, expensive, high quality storage hardware that rarely fails only offering the service to users with predictable, regular workloads to limit congestion replicating the data to multiple machines
True
Amazon S3 BLOB Storage uses a weak consistency model. False True
Software as a Service (SaaS)
Applications are managed for you when using Platform as a Service (PaaS) Infrastructure as a Service (IaaS) Software as a Service (SaaS) Metal as a Service (MaaS)
True
BLOB stands for "Binary Large OBjects" True False
Longest prefix match
For VPC, how to select the optimum route for network traffic? Longest prefix match Shortest prefix match Any prefix match
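Longest prefix match means that, among all routes whose CIDR block contains the destination, the one with the most specific (longest) prefix wins. The sketch below uses Python's standard `ipaddress` module; the route table entries and target names are hypothetical, not copied from any real VPC.

```python
import ipaddress


def select_route(routes, destination):
    """Return the target of the matching route with the longest prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # The most specific route is the one with the largest prefix length.
    return max(matches, key=lambda match: match[0])[1]


# Hypothetical route table.
routes = {
    "0.0.0.0/0": "internet-gateway",
    "10.0.0.0/16": "local",
    "10.0.1.0/24": "peering-connection",
}

print(select_route(routes, "10.0.1.7"))
print(select_route(routes, "10.0.2.7"))
print(select_route(routes, "8.8.8.8"))
```

The address 10.0.1.7 matches all three routes, but the /24 entry is chosen because it is the most specific.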
00001010.00001010.00000001.00011000 A /29 prefix keeps the first 29 bits fixed, so only the last three bits vary, from 000 to 111. The block therefore covers 10.10.1.16 through 10.10.1.23, and this address (10.10.1.24) falls outside it.
Given the IPv4 CIDR block 10.10.1.16/29, which of the following IPv4 addresses is out of range? 00001010.00001010.00000001.00010110 00001010.00001010.00000001.00010010 00001010.00001010.00000001.00010001 00001010.00001010.00000001.00011000
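The answer to this card can be double-checked with Python's standard `ipaddress` module. The helper `from_binary` is a name made up here to convert the dotted binary strings from the question into addresses.

```python
import ipaddress

network = ipaddress.ip_network("10.10.1.16/29")


def from_binary(dotted_bits):
    """Convert a dotted binary string such as '00001010.…' into an IPv4 address."""
    octets = [str(int(octet, 2)) for octet in dotted_bits.split(".")]
    return ipaddress.ip_address(".".join(octets))


candidates = [
    "00001010.00001010.00000001.00010110",
    "00001010.00001010.00000001.00010010",
    "00001010.00001010.00000001.00010001",
    "00001010.00001010.00000001.00011000",
]

for bits in candidates:
    address = from_binary(bits)
    print(address, address in network)

print(network[0], network[-1])  # first and last address in the block
```

Only the last candidate has a different bit in the first 29 positions, so it is the one outside the block.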
You have to declare a route through Amazon API Gateway.
How can you utilize a lambda function as an RPC using a Websocket call? Using Amazon CloudWatch to "watch" for incoming RPC calls, and routing them to the right Lambda function. You have to declare a route through Amazon API Gateway. By deploying an instance of a webserver using an approved web-server AMI on an EC2 instance, and connecting it to your lambda functions on AWS console.
Through OAuth protocol
How does the Dropbox API handle authentication? Through OAuth protocol Dropbox API doesn't offer authentication service, and third party plugins should be used. By using persistent cookies in the HTTP session
True
In Amazon AWS Aurora, the log is the database. True False
many users may share the same physical computer and database
In the context of cloud computing, multi-tenancy means many users may share the same physical computer and database many computers may share the same rack many cloud providers may share the same data center many data centers may share the same electric plant
False
Infrastructure as a Service, by definition, provides load balancing automatically for you. True False
False This market segment does not accept access pattern limits, so cloud providers use statistics to extract average access patterns and set pricing. On average, they make a profit but may lose money on some users with high access patterns.
Internet-level Personal Filesystems (like Dropbox) have strict access pattern limits, ensuring that they make a profit on each customer. True False
Availability
It is impossible for a distributed system to have guaranteed consistency, availability, and partition tolerance simultaneously (following the CAP theorem). Which does Google Cloud Spanner sacrifice? Consistency Availability Partition Tolerance
increase the utility of clouds, because the IoT devices will generate a huge amount of data that will be processed by the cloud
It is likely that a move towards Internet of Things (IoT) in the near future will increase the utility of clouds, because the IoT devices will generate a huge amount of data that will be processed by the cloud decrease the utility of clouds, because IoT brings computational power closer to the user, while cloud computing is far from the user have no effect on clouds, since they are totally unrelated technologies
A Dataset organized into named Columns
In Apache Spark SQL, what is a DataFrame? Equivalent to a column of data in a relational database A Dataset organized into named Columns A Spark RDD automatically transformed for SQL queries
Events that are collected by the cloud backend and trigger your function
In almost all serverless function as a service offerings, how are the functions called? The functions are periodically executed, with frequency depending on the observed traffic Events that are collected by the cloud backend and trigger your function You have to deploy a pub-sub middleware such as Apache or similar engines to collect events and route them to the right function
True Logging needs to be fast, and the in-memory approach to data storage that Redis takes is well suited to keep up with intense logging demands.
Logging is a good use case for Redis. True False
Row key, column key, timestamp
Map in HBase is indexed by: Key Row key, column key, index Row key, column key Row key, column key, timestamp
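The three-part index can be modeled as a dictionary keyed by (row key, column key, timestamp), where a read returns the newest version of a cell. This is a toy model rather than the HBase client API; `put`, `get_latest`, and the sample row and column names are assumptions for the example.

```python
table = {}  # maps (row_key, column_key, timestamp) -> value


def put(row_key, column_key, timestamp, value):
    """Store a new version of a cell; older versions are kept, not overwritten."""
    table[(row_key, column_key, timestamp)] = value


def get_latest(row_key, column_key):
    """Return the value with the highest timestamp for the cell, or None if absent."""
    versions = [
        (ts, value)
        for (row, column, ts), value in table.items()
        if row == row_key and column == column_key
    ]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put("user42", "profile:name", 100, "Ann")
put("user42", "profile:name", 200, "Anna")
print(get_latest("user42", "profile:name"))
```

Because the timestamp is part of the key, an update adds a new entry instead of modifying an old one, which fits the immutability of HFiles noted in the earlier card.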
True
Multiplexing demand in a cloud infrastructure leads to higher utilization. True False
False, there is no "official standard"
NIST developed an official standard for REST, because it recognized the importance of interoperability in cloud environments. True, although IBM later developed their own competing standard True, and the success of the standard directly led to the recent explosive growth of the cloud industry False, NIST is opposed to the goal of interoperability False, there is no "official standard"
Consumers Kafka maintains feeds of messages in categories called Topics. Processes that publish messages to a Kafka topic are Producers. Kafka is run as a cluster comprised of one or more servers each of which is called a Broker.
Processes that subscribe to topics and process the feed of published messages are: Consumers Producers Brokers Topics
True
Software as a Service, by definition, provides load balancing automatically for you. True False
CREATE ALTER INSERT UPDATE DELETE
The following are logged in a binary log CREATE ALTER INSERT UPDATE DELETE SELECT SHOW
Because load balancers must make quick decisions, empirically more complex algorithms are likely to be slower than simpler algorithms
The primary reason most commonly-used load-balancing algorithms are relatively simple is Because load balancers must make quick decisions, empirically more complex algorithms are likely to be slower than simpler algorithms Tradition. There are well-known complex load balancing algorithms that work better than simple algorithms (like round-robin) in almost all situations. Because an exhaustive theoretical analysis of the algorithms has determined that complex load balancing algorithms all have exponential runtimes To save the developer time writing and maintaining code, since load-balancing isn't very important
have three phases: opening handshake, data transfer, closing handshake
WebSockets requires the users to "poll" to receive data have three phases: opening handshake, data transfer, closing handshake have longer latency than RESTful approaches are built on top of UDP to minimize latency
Open, Close, Error, Message
What are the event types in WebSocket? onError, onMessage, onInit, onClose Open, Close, Error, Message Monitor, Connect, Disconnect, Send, Receive Open, Close, InitReceive, initSend
Isolation ACID: Atomicity Consistency Isolation Durability
What does the "I" in ACID stand for? Integrity Isolation Insistent Instance
Software that provides services to applications beyond those generally available at the operating system.
What is the function of Middleware? Software that provides services to applications beyond those generally available at the operating system. An architecture style where the state of the program is recorded in the operating system and not the user application, hence the term "Middle" in middleware. The set of Device Drivers that provide network support for the operating system. The kernel of an Operating System.
Cache aside
What policy does Memcached use? Cache aside Write through Write back Write front
Opening handshake, Data transfer, Closing Handshake
What three phases does WebSocket have? Opening handshake, Sending message, Closing Handshake Opening handshake, Data transfer, Closing Handshake Opening SSH, Sending message, Closing SSH
Multi-AZ deployments
What would you choose for high availability? Multi-AZ deployments Multi-Region Deployments Read replicas
is stored across multiple availability zones (AZs) rather than one
When compared to Amazon EBS PIOPS, Amazon EFS (Elastic File System) is well suited for NoSQL databases is stored across multiple availability zones (AZs) rather than one generally has lower throughput generally has lower per-operation latency
PaaS
Which *aaS is best described by "The unit of compute is a full app"? MaaS IaaS FaaS PaaS
Infrastructure-as-a-Service
Which approach is feasible for the following scenario with minimum effort? "ACME company needs to be able to change the cloud provider frequently." Infrastructure-as-a-Service Packaged software Platform-as-a-Service Software-as-a-Service
Infrastructure-as-a-Service
Which approach is feasible for the following scenario with minimum effort? "ACME company needs to deploy a system with a modified OS." Platform-as-a-Service Infrastructure-as-a-Service Packaged software Software-as-a-Service
Software-as-a-Service
Which approach is feasible for the following scenario with minimum effort? "ACME company needs to provide a widely used application for its marketing team." Packaged software Software-as-a-Service Infrastructure-as-a-Service Platform-as-a-Service
Cloud computing
Which approach is more economical for the following scenario? "A long-running business needs 10,000 computers for one-time data processing." Hybrid approach In-house servers Cloud computing
Hybrid approach
Which approach is more economical for the following scenario? "A long-running business serves 1,000 users daily but 1,000,000 during the holiday season." In-house servers Hybrid approach Cloud computing
In-house servers
Which approach is more economical for the following scenario? "An established, mature business serves 10,000 users during business hours (9am to 5pm) and 100 users outside of business hours each day." Cloud computing In-house servers Hybrid approach
Cloud computing Setting up the infrastructure to serve 1,000,000 customers is time-consuming. Cloud computing should allow the startup to start serving customers much more quickly, beating their competitors to the market.
Which approach is the most sensible for the following scenario? "A new startup needs to quickly scale their infrastructure to serve 1,000,000 customers, or risk losing market share to their competitors." Cloud computing In-house servers Hybrid approach
ALL of them
Which are the benefits of service-oriented architecture (SOA)? Scalability Interaction Reduce costs Reusable Code
Partitioning-aware
Which feature of Spark Scheduler avoids extra shuffles? Dryad-like DAG Cache-aware work reuse and locality Partitioning-aware Pipelining functions within a stage
When one small write in a database results in multiple physical data writes because of the way the storage subsystem is designed.
Which guarantee can be relaxed for the following use case? "Data can be served on a single server." A general problem in distributed SQL database engines where an UPDATE query generates intermediate temporary data writing operations. It is a cascade effect in some SQL queries, where a write in one table results in more writes in other joined tables. When one small write in a database results in multiple physical data writes because of the way the storage subsystem is designed.
HBase
Which has better support for incremental addition of small batches (e.g. record-level insertion)? HBase HDFS
Strong-read weak-write
Which is NOT one of the CosmosDB Consistency Models? Consistent Prefix Eventual Bounded-staleness Strong Session Strong-read weak-write
Redis
Which is NOT one of the building blocks of HBase? Apache ZooKeeper HFile HDFS Redis
Unstructured Data
Which is NOT one of the four NoSQL categories? BigTable Unstructured Data Key-Value Document Graph DB
Ephemeral file system
Which is NOT one of the three layers of a file system? Logical file system Virtual file system Physical file system Ephemeral file system
HBase
Which is better for fast record lookup? HBase HDFS
Bandwidth is infinite
Which is not an advantage of resilient distributed datasets? Retain the attractive properties of MapReduce Allow apps to keep working sets in memory for efficient reuse Support a wide range of applications Bandwidth is infinite
VPC network does not use ARP while Physical Ethernet Network implements ARP
Which is one of the differences between Physical Ethernet Network and VPC network? Physical Ethernet Network does not use ARP while VPC network implements ARP Both of them intercept ARP requests VPC network does not use ARP while Physical Ethernet Network implements ARP
Enable communication between VPCs, whether the two VPCs belong to the same account or to different accounts.
What is the function of VPC peering? Enable communication between VPCs within the same account. Enable communication between VPCs, whether the two VPCs belong to the same account or to different accounts. Only enable communication between VPCs in different accounts.
TCP
Which protocol does WebSocket run on top of? IP UDP TCP
DNS HTTP IMAP
Which of the following protocols lie in the Application Layer? DNS HTTP IMAP UDP
TCP UDP
Which of the following protocols lie in the Transport Layer? TCP HTTP UDP
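As a small illustration of the two transport-layer protocols named above, Python's standard `socket` module exposes them as socket types: `SOCK_STREAM` for TCP and `SOCK_DGRAM` for UDP. This is a minimal sketch that only creates and closes the sockets; no connection is made.

```python
import socket

# TCP: connection-oriented, reliable byte stream.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# UDP: connectionless datagrams with no delivery guarantee.
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Record the socket types before closing.
tcp_kind = tcp_sock.type
udp_kind = udp_sock.type

tcp_sock.close()
udp_sock.close()
```

Application-layer protocols such as HTTP are carried over sockets like these rather than being socket types themselves.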
ALL except APPEND
Which of the following HTTP verbs are used in REST APIs? DELETE APPEND GET POST
Function as a Service deployments should be stateless and rely on an external storage service for state storage (think of a Lambda function).
Which one of the following is correct? Function as a Service deployments should be stateless, and as such are severely limited in what they can accomplish. Function as a Service deployments support built-in state storage, accessible through a special state manipulation API. Function as a Service deployments should be stateless and rely on an external storage service for state storage. Function as a Service deployments support state storage by launching an instance of a relational database in the same container, and can access it using SQL commands.
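The correct option above can be sketched in a few lines. This is an illustration under stated assumptions, not any provider's API: `ExternalStore` is a hypothetical in-memory stand-in for an external storage service, and `handler` is a hypothetical function entry point that keeps no state of its own between invocations.

```python
class ExternalStore:
    """Hypothetical stand-in for an external storage service (e.g. a key-value database)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def handler(event, store):
    """Stateless function: every piece of state is read from and written to the store."""
    user = event["user"]
    visits = store.get(user, 0) + 1
    store.put(user, visits)
    return {"user": user, "visits": visits}


store = ExternalStore()
first = handler({"user": "alice"}, store)
second = handler({"user": "alice"}, store)
```

Because the count lives in the store, it does not matter whether the two invocations run in the same container or in different ones.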
Because of the relaxed consistency requirements of S3, building the distributed system to support it is much easier.
Why do object stores such as AWS S3 cost less than managed file systems such as AWS FSx Lustre? Because S3 is not replicated and therefore less reliable, while Lustre has replication. Because of the relaxed consistency requirements of S3, building the distributed system to support it is much easier. Because storage in AWS S3 is ephemeral, while storage in Lustre is non-volatile.
IaaS
You are tasked with choosing between a PaaS and an IaaS approach. Flexibility is important in your company, and you must avoid being locked in by a vendor. Which should you choose? PaaS IaaS
Map(key = line, value = contents): for each word in value: emit intermediate (word, 1)
You want to build a word count program. Which of the following pseudo-code is the proper Map function for this program? Note that the indenting is not accurate. "Word Count Program: You have a huge text file that consists of many lines. The goal is to count the number of times each distinct word appears in the file." Map(key = line, value = contents): result = 0; for each word in value: result += value; emit(key, result) Map(key = line, values = uniq_counts): Sum all 1's in values list Emit result (word, sum) Map(key = line, value = contents): for each word in value: emit intermediate (word, 1) Map(key, values): for each value in intermediate values: value += 1; emit intermediate(key, values)
Reduce(key = word, values = uniq_counts): Sum all 1's in values list Emit result (word, sum)
You want to build a word count program. Which of the following pseudo-code is the proper Reduce function for this program? Note that the indenting is not accurate. "Word Count Program: You have a huge text file that consists of many lines. The goal is to count the number of times each distinct word appears in the file." Reduce(key = line, value = contents): result = 0; for each word in value: result += value; emit(key, result) Reduce(key = word, values = uniq_counts): Sum all 1's in values list Emit result (word, sum) Reduce(key, values): for each value in intermediate values: value += 1; emit intermediate(key, values) Reduce(key = line, value = contents): for each word in value: emit intermediate (word, 1)
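The Map and Reduce answers from the two word count cards can be put together as a small runnable sketch. The in-memory grouping step in `run_word_count` is an assumption standing in for the shuffle that a real MapReduce framework performs, and the function names are illustrative.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map(key = line, value = contents): emit (word, 1) for each word."""
    for word in value.split():
        yield (word, 1)


def reduce_fn(key, values):
    """Reduce(key = word, values = list of 1's): sum the 1's and emit (word, sum)."""
    return (key, sum(values))


def run_word_count(lines):
    """Run map, group intermediate pairs by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for line_no, contents in enumerate(lines):
        for word, one in map_fn(line_no, contents):
            groups[word].append(one)
    return dict(reduce_fn(word, ones) for word, ones in groups.items())


counts = run_word_count(["a b a", "b c"])
```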
Map(key = x,y, value = R,G,B) emit intermediate(key, value)
You want to build an image smoother program. Which of the following is the proper Map function for this program? Note that the indenting is not accurate. "Image Smoother Program: To smooth an image, use a sliding mask and replace the value of each pixel." Map(key = x,y value = list of R,G,B) compute average of R,G,B emit intermediate(key, average R,G,B) Map(key = x,y, value = R,G,B) emit intermediate(key, value)
Map(key = x,y, value = R,G,B) { emit intermediate(key, value) }
You want to build an image smoother program. Which of the following is the proper Map function for this program? Note that the indenting is not accurate. "Image Smoother Program: To smooth an image, use a sliding mask and replace the value of each pixel." Map(key = x,y value = list of R,G,B) { compute average of R,G,B emit intermediate(key, average R,G,B) } Map(key = x,y, value = R,G,B) { emit intermediate(key, value) }
Reduce(key = x,y value = list of R,G,B) compute average of R,G,B emit (key, average R,G,B)
You want to build an image smoother program. Which of the following is the proper Reduce function for this program? Note that the indenting is not accurate. "Image Smoother Program: To smooth an image, use a sliding mask and replace the value of each pixel." Reduce(key = x,y, value = R,G,B) emit (key, value) Reduce(key = x,y value = list of R,G,B) compute average of R,G,B emit (key, average R,G,B)
Reduce(key = x,y ; value = list of R,G,B) { compute average of R,G,B emit (key, average R,G,B) }
You want to build an image smoother program. Which of the following pseudocode is the proper Reduce function for this program? Note that the indenting is not accurate. "Image Smoother Program: To smooth an image, use a sliding mask and replace the value of each pixel." Reduce(key = x,y ; value = R,G,B) { emit (key, value) } Reduce(key = x,y ; value = list of R,G,B) { compute average of R,G,B emit (key, average R,G,B) }
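A possible way to make the image smoother cards concrete is sketched below. The cards have Map emit each pixel under its own key and Reduce average a list of R,G,B values. For Reduce to receive a list, neighboring values have to arrive under the same key. This sketch assumes a 3x3 mask and assumes the mapper achieves that by emitting each pixel's value once for every in-bounds coordinate in its neighborhood. That routing is an illustrative choice, not something the cards state.

```python
from collections import defaultdict


def map_fn(key, value, width, height):
    """Emit this pixel's (R, G, B) under every in-bounds coordinate of its 3x3 neighborhood."""
    x, y = key
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield ((nx, ny), value)


def reduce_fn(key, values):
    """Reduce(key = x,y; value = list of R,G,B): average each channel."""
    n = len(values)
    average = tuple(sum(v[c] for v in values) / n for c in range(3))
    return (key, average)


def smooth(image):
    """image maps (x, y) to (R, G, B); the grouping below stands in for the shuffle."""
    width = max(x for x, _ in image) + 1
    height = max(y for _, y in image) + 1
    groups = defaultdict(list)
    for key, value in image.items():
        for target, rgb in map_fn(key, value, width, height):
            groups[target].append(rgb)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


smoothed = smooth({(0, 0): (0, 0, 0), (1, 0): (10, 20, 30)})
```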
Microsoft Azure App Service Since the application is already written, the best fit is the PaaS model, where the unit of compute is a whole app.
You have a whole application already written that you want to deploy to the cloud without much re-architecting. Which of the cloud models would best fit this scenario? Amazon EC2 IBM Cloudant Microsoft Azure App Service Oracle Functions