CS 643 Cloud Computing Final


Explain Brewer's CAP theorem and its three coupled systemic requirements.

Brewer's CAP theorem states that in a distributed system, there are three coupled requirements: Consistency, Availability, and Partition Tolerance. Changes in one requirement affect the others. The theorem informally suggests that a distributed system can achieve only two of the three mentioned requirements.

How can side-channel attacks, such as cache-based load measurement, be mitigated in the context of information leakage in the cloud?

Mitigation techniques include blinding techniques to minimize leaked information, such as cache wiping, random delay insertion, and adjusting each machine's perception of time. However, most countermeasures have drawbacks; the best defense is to deny the adversary the ability to perform co-residence checks and to exploit instance placement in the first place.

Describe the trade-offs and considerations of the hosted virtualization architecture, focusing on advantages and disadvantages.

Advantages of the hosted architecture include coping with diverse hardware, supporting hardware-independent code, and running on various hardware without requiring special device drivers. Disadvantages include I/O performance degradation, overhead in world switches, and potential poor resource scheduling choices by the host OS.

What is the primary motivation behind the development of Google's Bigtable? A. To store unstructured data efficiently B. To manage structured data in a distributed environment C. To support a relational data model D. To minimize storage costs for small-scale databases

B To manage structured data in a distributed environment.

How does Bigtable handle fault tolerance for tablet servers? A. Tablet servers use Bloom filters for fault tolerance. B. The master periodically checks tablet servers' lock status. C. Tablet servers replicate data to ensure fault tolerance. D. Chubby is responsible for fault tolerance in tablet servers.

B The master periodically checks tablet servers' lock status.

What is the purpose of the commit-log refinement in Bigtable? A. To optimize data storage in the commit log. B. To minimize the number of disk seek operations. C. To simplify recovery in the case of a crash. D. To enhance the effectiveness of Chubby for synchronization.

B To minimize the number of disk seek operations.

Why does Google choose not to use commercial databases for Bigtable? A. Commercial databases lack features for structured data. B. The scale of Bigtable is too large for most commercial databases. C. Commercial databases are cost-effective for Google's requirements. D. Google prefers low-level storage optimizations provided by commercial databases.

B The scale of Bigtable is too large for most commercial databases.

Explain the significance of Bloom filters in the context of Bigtable. A. Bloom filters are used for efficient searching of SSTables during reads. B. Bloom filters ensure consistency in data across tablet servers. C. Bloom filters are responsible for data partitioning in Bigtable. D. Bloom filters enhance the performance of the commit log.

A Bloom filters are used for efficient searching of SSTables during reads.
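A minimal Python sketch (not Bigtable's actual implementation) of why a Bloom filter lets a tablet server skip SSTables that cannot contain a given row/column pair: membership tests may return false positives but never false negatives, so a negative answer safely avoids a disk read.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive `hashes` independent bit positions from one cryptographic hash.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):
        # All bits set -> key *might* be present; any bit clear -> definitely absent.
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("row:col")
assert bf.might_contain("row:col")   # an added key is always found
# a key that was never added usually tests False, so the SSTable read is skipped
```

In Bigtable the filter is built per SSTable over its row/column pairs, so most reads for absent data never touch disk.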

Explain the concept of a Docker image and its role in container deployment.

A Docker image is an executable package that includes everything needed to run an application—code, runtime, libraries, environment variables, and configuration files. A container is launched by running an image, and it represents a runtime instance of the image in memory.

Describe the key characteristics of Amazon's Dynamo.

Amazon's Dynamo is a highly available distributed storage system designed for managing the state of Amazon's services. It sacrifices consistency under certain failure scenarios, making it an eventually consistent data store. Dynamo serves Amazon's loosely coupled, service-oriented architecture, which comprises both stateless and stateful services with stringent latency requirements.

What side channels can a malicious instance exploit for cross-VM information leakage in a cloud environment?

A malicious instance can utilize time-shared caches for side-channel attacks. This includes measuring when other instances are experiencing computational load (co-residence detection), estimating the rate of web traffic a co-resident site receives, and timing keystrokes by an honest user (via SSH) of a co-resident instance.

Explore the caching strategies employed in AFS and their impact on scalability and client-server message traffic. How does AFS handle updates and inform clients of newer versions?

AFS caching improves scalability by reducing client-server message traffic. Once a file is cached, all operations are performed locally, with a write-back cache on file close. Session semantics in AFS mean that updates are only visible upon file close. The server records who has copies of a file and uses callback messages to inform clients of newer versions.

How can an adversary determine co-residence between two instances in a cloud environment?

An adversary can determine co-residence by comparing the internal IP addresses of two instances. If the internal IP addresses are numerically close, the adversary runs traceroute to an open port on the target; if the route consists of a single hop whose address is the Dom0 IP address shared by both instances, then the instances are co-resident.

How can applications deal with the semantics of record append in the Google File System? What measures should applications take to handle duplicated records, and why are checksums and unique IDs important in this context?

Applications in GFS dealing with record append semantics should include checksums in records they write using record append. Readers can identify padding/record fragments using checksums. If an application cannot tolerate duplicated records, it should include a unique ID in the record. Readers can use unique IDs to filter duplicates.
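The checksum-plus-unique-ID discipline can be sketched as follows. The record framing here (length, ID, checksum, payload) is an illustrative assumption, not GFS's actual on-disk format; it shows how a reader skips padding/fragments via checksums and filters duplicates via IDs.

```python
import hashlib, struct, uuid

def make_record(payload: bytes) -> bytes:
    """Frame a record as [4-byte length][16-byte unique ID][16-byte checksum][payload]."""
    rid = uuid.uuid4().bytes                 # unique ID for duplicate filtering
    csum = hashlib.md5(payload).digest()     # checksum to detect padding/fragments
    return struct.pack(">I", len(payload)) + rid + csum + payload

def read_records(blob: bytes):
    """Yield payloads, skipping corrupt regions and records already seen."""
    seen, off = set(), 0
    while off + 36 <= len(blob):             # 36 = header size
        (length,) = struct.unpack_from(">I", blob, off)
        start = off + 36
        payload = blob[start:start + length]
        csum = blob[off + 20:off + 36]
        if len(payload) == length and hashlib.md5(payload).digest() == csum:
            rid = blob[off + 4:off + 20]
            if rid not in seen:              # filter duplicated records by ID
                seen.add(rid)
                yield payload
            off = start + length
        else:
            off += 1                         # resync past padding or a fragment

rec1, rec2 = make_record(b"hello"), make_record(b"world")
# a retried append duplicates rec1's bytes; the reader filters the second copy
assert list(read_records(rec1 + rec1 + rec2)) == [b"hello", b"world"]
```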

Define Atomic Consistency and provide an example of a more relaxed consistency model.

Atomic consistency requires that any two non-overlapping operations must be viewed in their real-time order by all nodes in a distributed system. An example of a more relaxed consistency model is one where reads can appear in any order as long as they appear after writes to the values being read.

How does Bigtable handle caching, and what are the two levels of caching used by tablet servers? A. Bigtable uses a single-level block cache for efficient data retrieval. B. Bigtable uses scan and block caches for two-level caching. C. Caching is not employed in Bigtable for simplicity. D. Bigtable uses a distributed caching system across tablet servers.

B Bigtable uses scan and block caches for two-level caching.

Describe the role of Chubby in the Bigtable system. A. Chubby handles data storage in Bigtable. B. Chubby provides a distributed lock service for synchronization. C. Chubby is responsible for data partitioning in tablets. D. Chubby handles the distribution of SSTables.

B Chubby provides a distributed lock service for synchronization.

Describe the structure and components of Barrelfish, a Multikernel OS.

Barrelfish consists of a privileged-mode CPU driver and a user-mode monitor process on each core. It also includes device drivers and system services running in user-level processes. The CPU driver handles traps, exceptions, and CPU interrupts, while the monitor mediates local operations on global state, maintains global consistency, and performs inter-core coordination.

Discuss the advantages of Barrelfish over traditional operating systems like Linux and Windows in terms of page unmap latency and network throughput.

Barrelfish outperforms Linux and Windows in terms of page unmap latency by quickly adapting to the multicore environment. It also demonstrates competitive network throughput, as shown in the benchmarks for UDP throughput and web server performance. The design of Barrelfish allows for efficient communication and reduced context switch overhead.

How does Barrelfish achieve shared address spaces in a multicore environment?

Barrelfish supports shared address spaces by sharing hardware page tables among all dispatchers, replicating hardware page tables when needed, and allowing user-level code to handle virtual memory management. Thread management, capabilities sharing, and cross-core coordination for thread-related tasks are performed in user space.

How does Bigtable ensure the high availability of Chubby, and what protocol is used for consistency among Chubby replicas? A. Chubby uses a leader-follower model for high availability and Paxos for consistency. B. Chubby relies on a single server for high availability and uses the MapReduce protocol for consistency. C. Chubby uses multiple servers (replicas) for high availability and employs the Raft protocol for consistency. D. Chubby relies on the CAP theorem for high availability and uses the Gossip protocol for consistency.

A Chubby uses a leader-follower model for high availability and Paxos for consistency. (Chubby runs five replicas, one of which is elected master, and uses the Paxos protocol to keep the replicas consistent.)

How are rows in Bigtable dynamically partitioned, and what advantages does this provide? A. Rows are partitioned based on column families, enhancing access control. B. Rows are partitioned into tablets based on the master's decision. C. Row keys are arbitrary strings sorted lexicographically, providing locality. D. Rows are evenly distributed among tablet servers for load balancing.

C Row keys are arbitrary strings sorted lexicographically, providing locality.

Discuss the significance of CAP in application scaling and its impact on latency.

CAP becomes significant in distributed systems as transaction volumes increase. At low transaction volumes, latency for consistency has minimal impact. However, as volumes increase, latency becomes noticeable. High latency can have adverse effects, such as a drop in sales (e.g., Amazon claims a 100 ms delay causes a 1% loss) or a decrease in traffic (e.g., Google observed a 20% traffic drop with a 500 ms delay).

Examine the characteristics of CMU's Andrew File System (AFS). Highlight the differences between AFS and NFS, including aspects such as workstation clients, dedicated file servers, and the single namespace.

CMU's Andrew File System (AFS) involves workstation clients and dedicated file servers. The servers are replicated for fault tolerance, and it operates in a stateful manner. AFS provides a single namespace where every workstation sees the same files. File caching occurs on workstations' disks, contributing to improved performance.

Explain the concept of CPU virtualization through binary translation and its advantages.

CPU virtualization through binary translation intercepts guest instructions, translates them, and stores them in a translation cache for future use. This approach offers the ability to modify guest instruction execution with low overhead, and each instruction is translated and executed from the cache.

What are some challenges associated with I/O virtualization, and how does the overhead impact devices like a network card in virtualized environments?

Challenges in I/O virtualization include the world switch between the VMM and the host, handling of privileged instructions, and the potential overhead in I/O interrupt handling. Devices like a network card may experience extra CPU time accrual due to world switches and packet transmission involving multiple device drivers.

Describe the processes of chunk creation, deletion, rereplication, rebalancing, and garbage collection in the Google File System. What factors does the master consider when making decisions related to these operations?

Chunk creation, deletion, rereplication, rebalancing, and garbage collection in GFS are done by the master to balance space utilization and access speed. The master considers factors such as spreading replicas across racks, aggregate bandwidth per rack or server, re-replicating data if redundancy falls below a threshold, rebalancing data for smoother load distribution, and deleting stale replicas based on chunk version numbers. Garbage collection in GFS is a dependable operation run by the master to clean up replicas not known to be useful. The master records deletions in its log, renames files to hidden names including deletion timestamps, and scans file and chunk namespaces in the background. It removes files with hidden names that have been deleted for longer than 3 days and unreferenced chunks from chunk servers.

Explain the process of cloud cartography and its role in placing malicious virtual machines (VMs) on the same physical machine as target victims.

Cloud cartography involves mapping cloud services to understand the potential locations of targets and the parameters needed to establish co-residence of an adversarial instance with a target. This mapping speeds up adversarial strategies for placing malicious VMs on the same machine as the targets, facilitating the exploitation of information leakage.

What is container orchestration, and what are some tools for orchestrating containerized applications?

Container orchestration involves automated deployment, scaling, and management of containerized applications. Examples of tools include Docker Compose and Docker Swarm for Docker-based orchestration, and Kubernetes for a more comprehensive container orchestration solution.

What is containerization, and what are the key benefits of using containers?

Containerization is a lightweight form of OS virtualization that isolates processes and resources, creating a virtual environment for a service. Key benefits include flexibility, lightweight resource usage, interchangeability for updates, portability across environments, scalability, and stackability.

What is the purpose of compactions in Bigtable, and what types of compactions exist? A. Compactions are for sorting commit logs; types include minor and major. B. Compactions are for tablet server fault tolerance; types include merging and splitting. C. Compactions are for garbage collection in GFS; types include bloom filter and cache compactions. D. Compactions are for merging SSTables; types include minor and major.

D Compactions are for merging SSTables; types include minor and major.

What is a Docker Swarm, and how does it facilitate the deployment of multi-container, multi-machine applications?

Docker Swarm is a container orchestration tool that enables the deployment of multi-container, multi-machine applications. It joins multiple machines into a "Dockerized" cluster called a swarm. Swarm managers use strategies like "emptiest node" or "global" to run containers, and the swarm can deploy services across the cluster.

How does Docker Compose help in defining and managing services in a Dockerized application?

Docker Compose uses a YAML file (docker-compose.yml) to define how Docker containers should behave in production. It allows the definition, running, and scaling of services in a distributed application. Each service can be its own Docker image, and multiple instances can be run as containers for scaling purposes.

How does Dynamo handle read and write operations, and what role do quorums play in maintaining consistency?

Dynamo allows any node to receive get() and put() requests for any key; requests are forwarded to the key's coordinator. Quorums, defined by parameters R and W with R + W > N, ensure that read and write replica sets overlap. For a read, the coordinator forwards the request to the other N-1 replicas and waits for R responses; for a write, it generates a new vector clock, forwards to the other N-1 replicas, and waits for W acknowledgments. Because latency is dictated by the slowest of the R (or W) replicas, R and W are usually configured to be less than N.
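A brute-force sketch of why the quorum condition R + W > N guarantees that every read quorum overlaps every write quorum (so a read always contacts at least one replica holding the latest write):

```python
from itertools import combinations

def quorums_overlap(n, r, w):
    """Check that every possible read quorum intersects every write quorum."""
    replicas = range(n)
    return all(set(rq) & set(wq)
               for rq in combinations(replicas, r)
               for wq in combinations(replicas, w))

assert quorums_overlap(3, 2, 2)       # R + W = 4 > N = 3: overlap guaranteed
assert not quorums_overlap(3, 1, 2)   # R + W = 3 = N: a read can miss the write
```

Configurations like N=3, R=2, W=2 trade a little latency for this overlap guarantee; lowering W favors write availability at the cost of read freshness.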

Explain how Dynamo handles temporary failures and maintains high availability.

Dynamo ensures all reads/writes are performed on the first N healthy nodes, addressing failures among the first N nodes. Replicas stored on nodes outside the first N have hints in metadata, indicating temporary replicas. Hints are maintained separately and sent back when the original node recovers.

What design considerations influenced the need for Amazon's Dynamo, and how does it address the trade-off between Consistency and Availability?

Dynamo is designed to meet the highly available, scalable storage needs of Amazon's service state management. It uses a key-value storage pattern and trades off strong consistency in favor of high availability and user-perceived consistency. Because it serves only Amazon's trusted internal services, it has no security requirements such as authentication or authorization.

Provide an overview of Dynamo's design, including its use of distributed hash tables, consistent hashing, optimistic replication, Merkle trees, and object versioning.

Dynamo uses a 0-hop distributed hash table (DHT) for fast communication, consistent hashing for data partitioning, optimistic replication for eventual consistency, Merkle trees for efficient background synchronization, and object versioning for reconciling read discrepancies.
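Object versioning can be illustrated with a small vector-clock sketch. The node names Sx, Sy, Sz are hypothetical; the point is that two versions conflict exactly when neither clock descends from the other, in which case both are returned to the client for reconciliation.

```python
def descends(vc1, vc2):
    """True if vc1 is a causal descendant of (or equal to) vc2."""
    return all(vc1.get(node, 0) >= count for node, count in vc2.items())

def merge(vc1, vc2):
    """Element-wise max of two vector clocks, used during reconciliation."""
    return {node: max(vc1.get(node, 0), vc2.get(node, 0))
            for node in vc1.keys() | vc2.keys()}

v1 = {"Sx": 2}              # object written twice at node Sx
v2 = {"Sx": 2, "Sy": 1}     # a later write at Sy that saw v1
assert descends(v2, v1)     # v2 supersedes v1: no conflict, keep v2

v3 = {"Sx": 2, "Sz": 1}     # a concurrent write at Sz, also based on v1
assert not descends(v2, v3) and not descends(v3, v2)   # conflict: return both
assert merge(v2, v3) == {"Sx": 2, "Sy": 1, "Sz": 1}    # reconciled clock
```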

Describe the anti-entropy protocol based on Merkle trees and how it handles permanent failures in Dynamo.

Dynamo uses an anti-entropy protocol based on Merkle trees to keep replicas synchronized. Each node maintains a separate Merkle tree for each key range, and nodes exchange tree roots to detect and fix inconsistencies. Merkle trees enable checking branches independently, reducing data transfer during synchronization.
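A toy Merkle-root computation illustrating the idea: if two replicas' roots for a key range match, no data needs to be transferred; if they differ, only the mismatching subtrees are compared further.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash leaves pairwise, level by level, up to a single root."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Two replicas of the same key range: equal roots, nothing to transfer
a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
b = list(a)
assert merkle_root(a) == merkle_root(b)

b[2] = b"k3=STALE"                       # one divergent key
assert merkle_root(a) != merkle_root(b)
# roots differ: descend only into the mismatching subtree to locate k3
```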

Explain the mechanisms for membership and failure detection in Dynamo.

Dynamo uses background gossip to build a 0-hop DHT for ring membership. External entities bootstrap the system to avoid partitioned rings. For failure detection, standard gossip, heartbeats, and timeouts are employed to implement an effective mechanism for identifying and handling failures.

Give an example of a situation that causes a Bigtable tablet server to shutdown. Why must the server shut down in this situation?

Each tablet server must create a file and hold a lock on it (from Chubby) to show the master that it's still alive. Each lock is allocated for a short period of time (i.e., a lease). If the lease expires, the server has to renew it. Let's assume the network link of the tablet server goes down for a while. During this time, the master tries to contact the server and cannot do it. Therefore, it assumes that the tablet server crashed and decides to re-allocate the tablets. If the lock lease expired, the master can acquire it and remove the file associated with the tablet server. When the tablet server's link comes back up, it sees that its file was deleted and shuts itself down. It does this because only one tablet server can be responsible for a tablet - and its tablets have been re-assigned already. If it doesn't do it, it can lead to inconsistent data (due to stale data, concurrent writes, etc.).

What is the role of backend and frontend drivers in Xen's I/O virtualization for paravirtualization and hardware-assisted virtualization?

For paravirtualization, frontend drivers in the Domain U PV guests are virtual Xen-aware drivers that invoke the backend drivers through the Hypervisor, while backend drivers in Domain 0 communicate directly with the hardware on behalf of all Domain U PV guests. In hardware-assisted virtualization, Xen virtualization firmware is attached to the guest OSs, and I/O calls are trapped by the Hypervisor and sent to QEMU daemons in Domain 0, which communicate with the hardware.

Why did Google decide to develop a new distributed file system, and what were the key characteristics of Google's hardware, data, and application environment that influenced the design?

Google developed a new distributed file system to exploit the characteristics of its hardware, data, and applications. Its large-scale infrastructure consists of thousands of machines with thousands of disks, so component failures are the norm: even with a Mean Time Between Failures (MTBF) of 3 years per disk, that many disks means failures occur daily. The design was also shaped by the need to manage a modest number of large files (a few million, with sizes between 100MB and multi-GB). The file access model consists primarily of reads and appends; random writes are practically non-existent.

Discuss the implementation details of Sun's NFS, focusing on network communication, the stateless service approach, and any caching mechanisms involved. How does NFS handle server crashes and caching for performance reasons?

In the original design of NFS (up to NFS v3), network communication is client-initiated, and Remote Procedure Calls (RPCs) run over UDP, an unreliable protocol. The server retains no knowledge of the client, making server crashes invisible to the client. NFS relies on server- and client-side caching for performance, and there is no inherent support for locking or synchronization; a separate network lock manager is used alongside NFS to address these issues.

Explain the read operation in the GFS architecture. Describe the roles of the master server, chunk servers, and clients in handling read operations.

In GFS architecture, the read operation involves one master server (with state replicated on backups), many chunk servers (100s - 1000s) spread across racks for better throughput and fault tolerance, and many clients accessing files stored on the same cluster. Data flow occurs directly between the client and chunk server, with the master involved only in control.

Provide an example illustrating how data values can be consistent or inconsistent in a distributed system under partitioning.

In a consistent scenario, an update message is successfully transmitted, and all nodes see the correct updated value. In an inconsistent scenario due to partitioning, the update message is lost, leading to nodes having different views, causing incorrect data values.

Explain the data flow in the Google File System during client operations. How is data pushed, and what is the significance of choosing a carefully picked chain of chunk servers?

In the data flow of GFS, clients can push data to any replica. Data is pushed linearly along a carefully picked chain of chunk servers, with each machine forwarding data to the "closest" machine in the network topology that has not received it. This approach, while introducing some delay, offers good bandwidth utilization through pipelining.

What role does Kubernetes play in container orchestration, and what distinguishes it from traditional PaaS systems?

Kubernetes is a container-centric management environment that orchestrates computing, networking, and storage infrastructure for user workloads. It combines the simplicity of Platform as a Service (PaaS) with the flexibility of Infrastructure as a Service (IaaS). Unlike traditional PaaS systems, Kubernetes operates at the container level rather than the hardware level.

Discuss the strategies involved in placing an adversarial instance on the same physical machine as a target victim in EC2.

Strategies include brute force, where many instances are launched to achieve reasonable success rates, and targeting recently-launched instances. Additionally, attackers can exploit placement locality, either sequential or parallel, to increase the chances of co-residence with target victims.

Explain the concept of live migration in Xen, including the main design goals and implementation strategies.

Live migration involves migrating running operating systems across physical hosts. The main design goals are to minimize downtime and total migration time. Xen uses a pre-copy approach with a bounded iterative Push phase and a short Stop-and-copy phase. It aims to provide effective memory migration and control the impact of memory migration traffic on running services.

Describe two problems associated with data partitioning in distributed systems. Also, describe the solution provided by Dynamo for both problems.

The two problems are (1) load balancing/scalability: distribute the data evenly across the set of nodes, and (2) lookup time: find the data as fast as possible with a minimal amount of metadata stored at each node. Dynamo's solution is consistent hashing plus virtual nodes plus routing-table entries for all other nodes. Consistent hashing represents data as (key, value) pairs and distributes the keys evenly across the system, and nodes can find data in O(1) because each maintains routing entries for all other nodes. However, load balancing might not be very good with a low number of nodes, and it may also suffer from hardware heterogeneity. The optimization used in Dynamo is to create virtual nodes and assign a certain number of virtual nodes to each physical node.
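A minimal sketch of consistent hashing with virtual nodes; the hash function and the vnode count are arbitrary choices for illustration, not Dynamo's actual parameters.

```python
import bisect, hashlib

class ConsistentHashRing:
    """Consistent hashing ring with virtual nodes, Dynamo-style partitioning."""
    def __init__(self, nodes, vnodes=8):
        # Each physical node owns `vnodes` points on the ring, smoothing load.
        self.ring = sorted(
            (self._hash(f"{node}#{v}"), node)
            for node in nodes for v in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise to the first virtual node at or after hash(key)."""
        i = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[i][1]

ring = ConsistentHashRing(["A", "B", "C"])
owner = ring.node_for("cart:42")
assert owner in {"A", "B", "C"}
assert ring.node_for("cart:42") == owner   # placement is deterministic
```

Adding or removing a physical node moves only the keys on its virtual-node arcs, which is why consistent hashing scales incrementally.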

What is managed migration in Xen, and how does it utilize dirty logging for live migration?

Managed migration in Xen is performed by migration daemons running in the management VMs of the source and destination hosts. Dirty logging traps page faults when the guest OS tries to modify a memory page, updating a dirty bitmap. During the final stop-and-copy phase, the guest OS is suspended, the dirty bitmap is scanned for remaining pages, and those pages are transferred along with the checkpointed CPU-register state.

Discuss the fault-tolerance mechanisms for both the master and chunk servers in the Google File System. How does the system handle master failure, and what actions are taken when a chunk server fails?

Master fault-tolerance in GFS involves storing file/chunk namespaces and file-to-chunk mappings in memory and on disk, replicating them in the operation log. On master failure, a monitor outside GFS starts a new master, which recovers its state by replaying the operation log. To speed up recovery, the master checkpoints its state whenever the operation log grows too large. The master learns chunk locations by querying chunk servers at startup, and shadow masters provide read-only access to chunks even when the master is down. Chunk server fault-tolerance in GFS involves the master periodically communicating with each chunk server using heartbeat messages. On server failure, the master notices missing heartbeats, decrements the count of replicas for all chunks on the dead chunk server, and re-replicates chunks missing replicas in the background, with the highest priority given to chunks missing the greatest number of replicas.

How does memory virtualization work in the context of virtualization, and what role do shadow page tables play?

Memory virtualization involves adding one more level of indirection, where the VMM maps guest physical memory pages to actual machine memory pages. Shadow page tables are used to enable direct lookups, containing direct mappings from guest virtual memory to machine memory, updated each time the guest OS changes virtual to physical mappings.
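A toy model of the composite mapping a shadow page table provides. Real page tables are multi-level hardware structures walked by the MMU; the dicts here are purely illustrative of the extra level of indirection.

```python
class ShadowPageTable:
    """Composite guest-virtual -> machine mapping maintained by the VMM (toy model)."""
    def __init__(self, guest_pt, p2m):
        self.guest_pt = guest_pt   # guest virtual -> guest "physical" page
        self.p2m = p2m             # guest "physical" -> machine page (VMM's map)
        self.shadow = {}           # composite map the hardware actually walks

    def update(self, vpage, ppage):
        """VMM traps a guest page-table write and refreshes the shadow entry."""
        self.guest_pt[vpage] = ppage
        self.shadow[vpage] = self.p2m[ppage]

    def translate(self, vpage):
        # One direct lookup on a TLB miss: no second walk of the guest table.
        return self.shadow[vpage]

spt = ShadowPageTable(guest_pt={}, p2m={0: 7, 1: 3})
spt.update(vpage=100, ppage=1)     # guest maps virtual page 100 -> "physical" 1
assert spt.translate(100) == 3     # hardware sees virtual 100 -> machine page 3
```

The cost the model hides is keeping `shadow` in sync: every guest page-table update must be intercepted, which is exactly the overhead paravirtualized designs like Xen avoid.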

Provide more details about metadata and chunks in the GFS. What types of metadata are present in the system, and how is metadata handled to optimize performance and reduce client-master interactions?

Metadata in GFS includes three types: file/chunk namespaces, file-to-chunk mappings, and the location of each chunk's replicas. All this metadata is stored in memory, consuming less than 64 bytes per chunk. The use of large chunks provides advantages such as fewer client-master interactions and reduced metadata size. Clients cache metadata, such as chunk location, to optimize performance.

How do most major operating systems perceive and handle multicore architectures?

Most major operating systems perceive each core in a multicore system as a separate processor. Operating systems like Windows, Linux, and Mac OS X support multicore architectures, allowing them to leverage multiple cores for parallel processing.

What challenges led to the emergence of multicore architectures in the computing landscape?

Multicore architectures emerged due to difficulties in increasing single-core clock frequencies, issues with heat dissipation, speed of light constraints, complex design and verification processes, and the realization that parallelism could enhance application performance, especially with the rise of multithreaded applications.

Describe the concepts of mutations, leases, and version numbers in the Google File System. How do these mechanisms ensure consistency in parallel writes, and what role does the master play in coordinating these operations?

Mutations in GFS are operations that change the contents or metadata of a chunk. Leases are used to maintain consistent mutation order across replicas. The master grants a chunk lease to one replica (primary chunk server), and all replicas follow the same serial order when applying mutations. Chunks have version numbers to distinguish between up-to-date and stale replicas, stored on disk at both master and chunk servers.

Compare and contrast the stateless service approach in NFS with the stateful nature of AFS. Discuss the implications of these design choices on issues such as locking, synchronization, and file semantics.

NFS follows a stateless service approach, up to NFS v3, where the server retains no knowledge of the client, and caching is performed at both the server and client sides. This design does not guarantee Unix file semantics or session semantics. In contrast, AFS is stateful, with servers fetching information about a file upon an open request. AFS provides a single namespace, supports file caching on workstations, and ensures session semantics through callback messages.

What are the challenges faced by operating systems in the context of multicore architectures?

Operating systems face challenges in optimizing for highly heterogeneous hardware configurations, adapting to different core types, and addressing issues related to message-passing hardware, such as routing and congestion. The diversity in hardware also poses scalability problems, impacting performance due to shared data structures and locking.

Provide an overview of Sun's Network File System (NFS). How does NFS implement remote directories, and what is the significance of mounting a remote directory onto a local directory?

Sun's Network File System (NFS) allows a remote directory to be mounted onto a local directory. This enables the remote directory to appear as part of the local file system. NFS provides a transparent way to access files on a remote server, and the mounted directory may contain other mounted directories.

Explain the differences between paravirtualization and hardware-assisted virtualization in Xen, including their impact on guest operating systems.

Paravirtualization involves modifying certain instructions of guest operating systems to trap into the Hypervisor, providing high performance. Hardware-assisted virtualization allows unmodified guest operating systems and relies on hardware support like Intel-VT or AMD-V. Xen runs in ring "-1," and GuestOSs run in ring 0 for hardware-assisted virtualization.

Describe the impact of network partitions on Partition Tolerance in the CAP theorem.

Partition tolerance in the CAP theorem refers to the ability of a system to handle partitions, which occur when nodes cannot communicate with each other due to network failures. It is defined as "No set of failures less than total network failure is allowed to cause the system to respond incorrectly."

Describe the main features of remote service and caching in the context of distributed file systems. How does caching contribute to reducing server load and improving scalability?

Remote service involves implementing all file operations on the server, communicating through Remote Procedure Calls (RPC). This is particularly applicable for scenarios with a large amount of write activity. Caching, on the other hand, involves caching files locally, reducing the need for frequent server contact, thus lowering server load and network traffic. The challenge in caching lies in determining how much of a file to cache and ensuring cache coherence.

Explain the concept of replication in distributed file systems. How does replication contribute to availability and service time reduction? Discuss the challenges associated with maintaining consistency among replicas.

Replication involves creating replicas of the same file on failure-independent machines. This improves availability and can shorten service time. Replicas must be distinguished from each other by different lower-level names, and updates to any replica must be reflected on all other replicas to maintain consistency.

What are Service Level Agreements (SLAs), and why are they important in Amazon's e-Commerce Platform?

SLAs are contracts where clients and services agree on performance characteristics, such as latency bounds under expected request rate distributions. In Amazon's e-Commerce Platform, SLAs are crucial due to stringent latency requirements. Amazon targets optimization for 99.9% of queries, ensuring a 300ms response time for 99.9% of requests, even at peak loads.
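
A percentile-based SLA like Amazon's can be checked by sorting observed latencies and reading off the 99.9th-percentile value. A small sketch (latency values are made up; the rank is computed with integer arithmetic to avoid floating-point surprises):

```python
# Nearest-rank percentile check against a 300 ms SLA target at 99.9%.

def percentile(samples, p_num, p_den):
    """Nearest-rank percentile, with p given as a fraction p_num/p_den."""
    ordered = sorted(samples)
    rank = -(-p_num * len(ordered) // p_den)   # ceil(p * n) in integer math
    return ordered[max(0, rank - 1)]

latencies_ms = [20] * 9990 + [500] * 10        # 10,000 requests, 10 slow outliers
p999 = percentile(latencies_ms, 999, 1000)     # 99.9th percentile
print(p999, "SLA met:", p999 <= 300)           # → 20 SLA met: True
```

Note that the handful of 500 ms outliers fall in the worst 0.1%, so the 99.9th-percentile target is still met; this is why Dynamo's SLA is stated at a percentile rather than as an average or a maximum.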

Compare shared memory communication with message passing communication in the context of updating shared OS state in multicore architectures.

Shared memory communication involves each core updating the same memory locations, leading to cache-coherence protocol operations. On the other hand, message passing communication moves the operation to a single server core, with other client cores sending RPCs. The performance comparison shows that message passing performs better for a larger number of cores and updated cache lines.

Explain the similarities and differences between the mechanisms used for memory virtualization in VMware and Xen.

Similarity: both add one more level of indirection, i.e., from physical pages (what the OS believes to be physical memory) to machine pages (real physical memory as seen by the VMM). Difference: in VMware, the VMM maintains shadow page tables for all guest OSs. In a regular OS, the TLB points to the OS page table; in VMware, it points to the shadow page table, which contains composite mappings. These shadow page tables are updated every time the OS page tables are updated. In practice, the TLB contains mappings between virtual pages of the running guest OS and machine pages, and every TLB miss goes to the shadow page tables to find the mapping. This has high overhead because it may require a lookup in the OS page table as well. In Xen, each guest OS allocates and manages its own memory (allocated at creation time by the VMM and potentially adjusted using the balloon driver). The VMM points the TLB to the running guest OS's page table, so any read from the page table is fast because it uses the OS page table directly; this achieves better performance than shadow page tables. However, page table updates trap into the VMM for validation (e.g., checking that the OS maps a page it owns). Trapping on updates but not on reads is possible because guest OSs are modified.
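
The composite mapping a shadow page table holds can be modeled in a few lines (the page numbers and structures are invented): the VMM composes the guest's virtual-to-"physical" table with its own "physical"-to-machine table, so the hardware can go straight from virtual pages to machine pages.

```python
# Toy model of a shadow page table: a composite of the guest OS page
# table and the VMM's physical-to-machine map, rebuilt whenever the
# guest updates its own table.

guest_page_table = {0x1: 0x10, 0x2: 0x11}   # virtual page -> "physical" page (guest's view)
vmm_map = {0x10: 0x7A, 0x11: 0x7B}          # "physical" page -> machine page (VMM's view)

def build_shadow(guest_pt, p2m):
    # The VMM keeps this composite in sync with guest page-table updates.
    return {vpage: p2m[ppage] for vpage, ppage in guest_pt.items()}

shadow = build_shadow(guest_page_table, vmm_map)
print(hex(shadow[0x1]))  # → 0x7a: virtual page 0x1 resolves directly to a machine page
```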

What is the advantage of using the record append function instead of the write function in GFS? Are GFS replicas guaranteed to have the same content if only record append functions are used to write data? Justify your answers.

Since the files are huge, it is often the case that many clients need to write to them concurrently. These writes can leave the data inconsistent (or undefined). Locking slows everything down when clients write at different offsets, and the situation is even worse if distributed locking is required (when using multiple replicas). Record appends solve this problem, though they force the clients to use only appends. Appends are guaranteed to be atomic and implicitly lead to defined file regions across replicas: clients cannot overwrite each other's data. A primary replica decides a sequential order for applying the updates at all replicas. Replicas can still differ because an append can fail at some replica, in which case the entire append fails; the client retries and eventually succeeds in writing the record at the same offset in all replicas. At-least-once semantics can leave duplicates, fragments, or padding in some replicas.
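
The way at-least-once retries produce duplicates can be sketched with a simplified model (classes and record names invented): when an append fails at one replica, the client retries the whole append, leaving a duplicate record at the replicas that had already applied the first attempt.

```python
# Simplified model of GFS-style record append: a failed attempt at any
# replica fails the whole append; the client's retry duplicates the
# record at replicas that succeeded the first time.

class Replica:
    def __init__(self):
        self.log = []
    def append(self, record, fail=False):
        if fail:
            return False          # this replica missed the mutation
        self.log.append(record)
        return True

def record_append(replicas, record, fail_on=()):
    # Succeeds only if *all* replicas applied the record.
    results = [r.append(record, fail=(i in fail_on)) for i, r in enumerate(replicas)]
    return all(results)

replicas = [Replica(), Replica(), Replica()]
if not record_append(replicas, "rec1", fail_on={2}):   # replica 2 fails
    record_append(replicas, "rec1")                    # client retries
print([r.log for r in replicas])  # replicas 0 and 1 now hold a duplicate
```

Applications therefore have to tolerate duplicate records, e.g., by embedding unique IDs and filtering on read.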

Explain the concept of snapshots in the Google File System. How are snapshots created, and what role does copy-on-write play in optimizing the duplication process?

Snapshots in GFS create a low-cost copy of a file or directory tree. When the master receives a snapshot request, it first revokes any outstanding leases on the chunks in the files it is about to snapshot. Copy-on-write is used to speed up duplication, copying chunks only when modified. Newly created snapshot files point to the same chunks as the source files.
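
Copy-on-write's role can be shown with a toy model (chunk IDs, paths, and reference counts invented): the snapshot copies only metadata and bumps chunk reference counts, and a chunk is duplicated only when one side modifies it.

```python
# Toy copy-on-write snapshot: snapshotting shares chunks via refcounts;
# a write to a shared chunk copies that one chunk first.

chunks = {"c1": b"aaaa", "c2": b"bbbb"}        # chunk id -> data
refcount = {"c1": 1, "c2": 1}
files = {"/src": ["c1", "c2"]}                 # path -> list of chunk ids

def snapshot(src, dst):
    files[dst] = list(files[src])              # copy metadata only
    for c in files[dst]:
        refcount[c] += 1                       # chunks are now shared

def write_chunk(path, index, data):
    cid = files[path][index]
    if refcount[cid] > 1:                      # shared: copy before writing
        refcount[cid] -= 1
        new_id = cid + "'"
        chunks[new_id] = chunks[cid]
        refcount[new_id] = 1
        files[path][index] = new_id
        cid = new_id
    chunks[cid] = data

snapshot("/src", "/snap")
write_chunk("/src", 0, b"AAAA")                # only c1 gets duplicated
print(files["/src"], files["/snap"])           # the snapshot still sees c1, c2
```

Until a write occurs, the snapshot costs almost nothing: both files point at the same chunks.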

Differentiate between stateless and stateful services in distributed file systems. Highlight the advantages and disadvantages of each approach, and provide examples of situations where each might be more suitable.

Stateless services avoid storing state information on the server by making each request self-contained. This approach simplifies recovery from failures but offers poor support for locking or synchronization. In contrast, stateful services involve the server fetching information about a file from disk upon an open request, storing it in server memory, and returning a unique connection identifier for subsequent accesses. While stateful services offer increased performance, recovery from failures can be more expensive.

Provide the steps involved in building and running a Docker image using a Dockerfile.

Steps include Docker installation, creation of a Dockerfile defining the environment, saving the Dockerfile, and building the Docker image using the "docker build" command. The image can then be viewed using "docker image ls" and run using "docker run -p [host_port]:[container_port] [image_name]".

What is the primary advantage of storage area networks (SANs) for large-scale data storage, and how does it differ from distributed file systems? Provide an example of a storage area network solution.

Storage area networks (SANs) are advantageous for large-scale data storage as they allow remote storage devices to be attached to servers through fast network communication, such as fiber optics. SANs provide block-level abstraction and good reliability, often using techniques like RAID. An example of a storage area network solution is HP's StorageWorks, which supports up to 247PB of storage, although it comes with significant cost implications.

How does the Multikernel approach address challenges in multicore architectures?

The Multikernel approach structures the operating system as a distributed system with a small-size kernel on each core. It leverages application-level OS services, relies on inter-core communication based on messages, and avoids inter-core shared memory. The advantages include scalability, hardware neutrality through RPC for communication, and improved performance by moving many OS services to the application level.

Describe the role of the Virtual Machine Monitor (VMM) and its function in virtualization.

The VMM is a thin software layer that sits between hardware and the operating system, managing and virtualizing hardware resources such as instruction set, memory, interrupts, and basic I/O. It facilitates the creation of a virtualized environment for multiple operating systems.

How does the Virtual Machine Monitor (VMM) operate with respect to x86 privilege levels, and what challenges did PC CPU virtualization face?

The VMM operates by taking complete control of machine hardware, creating virtual machines, and allowing VMs to execute directly on the hardware. Challenges for PC CPU virtualization included the non-virtualizability of the Intel IA-32 processor architecture using trap & emulate, which led to the need for alternative approaches.

How can an attacker determine the load of a web server running in the AWS cloud if this attacker owns a VM on the same physical machine with the webserver?

The attacker can use externally-generated load to determine the correlation between the load and its own cache-based load measurement. The cache-based load measurements estimate the effect of another VM load on the read performance of the current VM. Essentially, higher load on the co-located VM leads to more cache lines of the attacker being evicted, thus its read performance decreases.
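
The idea can be illustrated with a deterministic simulation (not a real exploit; the cache model and numbers are invented): the attacker primes a shared cache, the co-resident victim evicts lines in proportion to its load, and the attacker's probe misses rise accordingly.

```python
# Simulated prime-and-probe load measurement: the more the victim runs,
# the more attacker-owned cache lines it evicts, and the more misses the
# attacker's probe sees.

CACHE_LINES = 64

def probe_misses(victim_load):
    cache = {i: "attacker" for i in range(CACHE_LINES)}   # prime phase
    for i in range(victim_load):                          # victim activity
        cache[i % CACHE_LINES] = "victim"                 # evicts one line
    return sum(1 for owner in cache.values() if owner != "attacker")

print(probe_misses(0), probe_misses(16), probe_misses(48))  # → 0 16 48
```

In the real attack the "miss count" is inferred from read timings, and the attacker calibrates it by generating known external load on the victim, as described above.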

What were the design criteria for the Google File System (GFS)? Explain the importance of automatic failure detection, the file size distribution, and the workload characteristics in shaping the design decisions.

The design criteria for the Google File System (GFS) included the automatic detection, tolerance, and recovery from failures, handling a modest number of large files (a few million, each between 100MB to multi-GB), supporting a read-mostly workload with large streaming reads and sequential append operations, and providing atomic consistency to parallel writes with low overhead.

Discuss the consistency model employed by the Google File System. What are the different consistency states, and how does the system handle file namespace mutations and concurrent mutations?

The consistency model in GFS ensures atomicity for file namespace mutations (create/delete). The state of a file region depends on the success/failure of mutations and the existence of concurrent mutations. Consistency states include consistent (all clients see the same data regardless of replica), defined (consistent, and clients see their mutations in their entirety), and inconsistent (due to a failed mutation).

Discuss the design decisions made in the Google File System regarding file storage, reliability through replication, the role of a single master, and the absence of caching. How do these decisions contribute to the system's performance and reliability?

The design decisions in GFS include storing files as chunks, ensuring reliability through replication (3+ replicas), employing a single master to coordinate access and keep metadata, and the absence of file-data caching at clients. Relying on the Linux buffer cache to keep data in memory at chunk servers, combined with client caching of metadata (e.g., chunk locations), optimizes system performance.

Explain the key requirements that led to the development of Amazon's Dynamo.

The key requirements for Dynamo include always being writable, user-perceived consistency, guaranteed performance (99.9th percentile latency), low total cost of ownership, incremental scalability, tunable tradeoffs between cost, consistency, durability, and latency. No existing production-ready solutions met these requirements.

What problem did the concept of virtualization aim to address in the 1960s, specifically in the context of mainframe hardware?

The main problem was how to run multiple operating system instances on mainframe hardware while achieving application isolation.

What is the motivation behind considering information leakage in the cloud, especially in the context of multitenancy?

The motivation stems from the threat of adversaries penetrating the isolation between virtual machines (VMs) in a cloud environment. Multitenancy, which involves multiplexing VMs of different customers on the same physical hardware, introduces the risk of adversaries being co-resident with their targets, leading to potential information leakage through side-channels.

Walk through the step-by-step process of mutations in the Google File System. Include details on the identities of primary and secondary chunk servers, data flow, and the mechanisms to maintain consistency across replicas.

The mutation process in GFS involves the following steps: the client asks the master for the identity of the primary chunk server (which holds the lease) and of the secondaries holding the other replicas; the master replies to the client; the client pushes the data to all replicas; the client sends the mutation request to the primary; the primary forwards the mutation request to all secondaries; the secondaries acknowledge completion to the primary; and the primary replies to the client (the client retries in case of errors).
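
The consistency-critical part of these steps, the primary imposing a single mutation order on all replicas, can be sketched as follows (classes and IDs invented; the master/lease lookup is omitted):

```python
# Sketch of GFS mutation ordering: data is pushed to every replica first,
# then the primary assigns a serial number and all replicas apply
# mutations in that single order.

class ChunkReplica:
    def __init__(self):
        self.staged = {}                       # data pushed but not yet applied
        self.chunk = []
    def push(self, data_id, data):
        self.staged[data_id] = data
    def apply(self, serial, data_id):
        self.chunk.append((serial, self.staged.pop(data_id)))

class Primary(ChunkReplica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0
    def mutate(self, data_id):
        serial = self.next_serial              # primary decides the order
        self.next_serial += 1
        self.apply(serial, data_id)
        for s in self.secondaries:             # forward in the same order
            s.apply(serial, data_id)

secs = [ChunkReplica(), ChunkReplica()]
primary = Primary(secs)
for replica in [primary] + secs:               # client pushes data to all replicas
    replica.push("d1", b"mutation-1")
primary.mutate("d1")                           # then asks the primary to commit
print(primary.chunk == secs[0].chunk == secs[1].chunk)  # → True
```

Decoupling the data push from the control message is what lets GFS pipeline large data transfers along any network path while keeping ordering decisions at the primary.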

Compare and contrast the original VMware hosted architecture with the VMware full virtualization architecture.

The original VMware hosted architecture co-exists with a pre-existing host operating system and relies on it for device support. In contrast, the VMware full virtualization architecture uses a hypervisor, with one vmkernel per host managing host-wide tasks, and one VMM per VM handling the details of virtualizing specific x86 architectures.

Describe the threat model considered in the context of information leakage in the cloud.

The threat model assumes that the cloud provider and its infrastructure are trusted. It excludes attacks relying on subverting a cloud's administrative functions and focuses on adversaries who are malicious parties not affiliated with the provider. Adversaries can run and control multiple instances in the cloud, potentially being co-resident with target victims running cloud services requiring confidentiality.

What are the three types of memory migration in Xen, and what problems are associated with stop-and-copy migration?

The three types of memory migration are Push, Stop-and-copy, and Pull. Stop-and-copy migration is problematic because both downtime and total migration time depend on the amount of physical memory in the VM, potentially leading to an unacceptable service outage.

How does the Google File System address the challenges of undefined states in the context of traditional random writes? Explain the concept of atomic record append and its impact on consistency.

To avoid undefined states in GFS, where traditional random writes would require expensive synchronization, GFS introduces the concept of atomic record append. This allows multiple clients to append data to the same file concurrently, with the serialization of append operations at the primary chunk server solving the problem. The result of successful operations is defined, providing "at least once" semantics.

Explain the key properties of virtualization, including isolation, encapsulation, and portability.

Virtualization provides isolation, ensuring fault and performance isolation. It offers encapsulation by cleanly capturing all virtual machine (VM) state, allowing for features like snapshots and clones. Portability is achieved by being independent of physical hardware, enabling the migration of live, running VMs.

How can you share a Docker image on Docker Hub, and what is Docker Hub's role in container distribution?

To share a Docker image, tag it using "docker tag," publish it on Docker Hub with "docker push," and then pull and run the image from the remote repository. Docker Hub is a cloud-based repository for Docker images, facilitating the creation, testing, storage, and distribution of container images.

Explain the tradeoff between Availability and Consistency in traditional databases.

Traditional databases usually favor consistency over availability. Availability means the service is accessible to clients, but it is often lost when the site is busy. The emphasis on consistency in traditional databases may therefore reduce availability, especially under high load.

Explain the concept of virtual nodes in Dynamo and their advantages.

Virtual nodes are used in Dynamo to address uneven data distribution in the basic Chord scheme. Each physical node has multiple virtual nodes, and they are distributed across different data centers. Advantages include even load distribution when a node becomes unavailable or available, accommodating heterogeneity in the physical infrastructure.
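
Consistent hashing with virtual nodes can be sketched in a few lines (node names, token format, and vnode count are invented): each physical node owns many points on the ring, so keys spread roughly evenly, and a departed node's keys scatter over all remaining nodes rather than landing on one successor.

```python
import bisect
import hashlib

# Sketch of a Dynamo-style consistent-hashing ring with virtual nodes.

def h(s):
    # Stable hash onto the ring (md5 used only for determinism, not security).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=100):
        # Each physical node contributes `vnodes` tokens to the ring.
        self.tokens = sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.points = [t for t, _ in self.tokens]
    def lookup(self, key):
        # A key belongs to the first token clockwise from its hash.
        i = bisect.bisect(self.points, h(key)) % len(self.tokens)
        return self.tokens[i][1]

ring = Ring(["A", "B", "C"])
counts = {}
for k in range(10000):
    n = ring.lookup(f"key{k}")
    counts[n] = counts.get(n, 0) + 1
print(counts)  # roughly even split across A, B, C
```

With one token per node (basic Chord), the split would be far more skewed; raising the vnode count per node is also how Dynamo gives more capable machines a proportionally larger share of the keyspace.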

What are some applications of virtualization, and how does it contribute to server consolidation and improved data center management?

Virtualization applications include server consolidation, which converts underutilized servers to VMs, leading to significant cost savings. It simplifies data center provisioning, monitoring, and dynamic load balancing, improving availability with features like automatic restart, fault tolerance, and disaster recovery.

How does virtualization improve system security? How can it improve load balancing/fault-tolerance? What are the two main conditions that must be satisfied to provide load balancing/fault-tolerance for Internet servers?

Virtualization improves system security because each application can run in its own virtual machine (i.e., its own guest OS). If the application or its OS crashes (or is compromised), other applications/OSs are unaffected. Load balancing/fault tolerance can be improved through live migration, since it is easier to transfer entire VMs than individual processes. The two main conditions are that the server maintains the same IP address and that downtime is minimal when moving the VM, so that TCP connections don't time out.

Explain the concept of Writable Working Set (WWS) and its role in the pre-copy approach of Xen's live migration.

Writable Working Set (WWS) represents the set of memory pages that are updated very frequently during pre-copying. Xen transfers these pages during the stop-and-copy phase, aiming to minimize the overhead of transferring memory pages multiple times.

How does Xen handle virtual memory management, and what is the role of balloon drivers in adjusting memory reservations?

Xen has three levels of address translation, and it registers GuestOS page tables directly with the MMU. Balloon drivers in GuestOS can adjust memory reservations by making hypercalls to Xen. Balloon drivers allow domains to get more pages in physical memory, and Xen may invoke balloon drivers to force GuestOS to relinquish physical pages.

What are the key characteristics of Xen virtualization, and why is it named "Xen"?

Xen is a virtualization system supporting both paravirtualization and hardware-assisted virtualization. It was initially created by the University of Cambridge Computer Laboratory (Ian Pratt) and is now maintained by Citrix. The name "Xen" comes from neXt gENeration virtualization.

Describe the basic architecture of Xen, including the roles of the Hypervisor and Domain 0.

Xen's basic architecture consists of a Hypervisor (VMM) responsible for CPU and memory virtualization, CPU scheduling, and memory partitioning. The Hypervisor has no knowledge of networking, storage devices, and other I/O. Domain 0 is a unique VM running on the Hypervisor with special rights to access I/O resources and interact with other VMs.

Describe Xen's pre-copy approach for live migration and how it combines iterative pre-copy and stop-and-copy phases.

Xen's pre-copy approach combines a bounded iterative Push phase and a very short Stop-and-copy phase. Pre-copying occurs in rounds, transferring pages modified in the previous round. The number of rounds is bounded based on the analysis of the Writable Working Set behavior of typical server workloads.
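
A toy simulation (all numbers made up) shows the shape of the iterative phase: each round resends the pages dirtied during the previous round, the round count is bounded, and a short stop-and-copy phase transfers whatever remains, roughly the writable working set.

```python
# Toy pre-copy simulation: rounds shrink until only the WWS is left,
# which is transferred during the brief stop-and-copy pause.

def precopy(total_pages, wws, max_rounds=5):
    transferred = []
    dirty = total_pages                 # round 1 sends everything
    for _ in range(max_rounds):
        if dirty <= wws:                # no progress beyond the WWS: stop iterating
            break
        transferred.append(dirty)
        dirty = max(wws, dirty // 4)    # assume most pages stay clean next round
    transferred.append(dirty)           # stop-and-copy: VM is paused only here
    return transferred

rounds = precopy(total_pages=100_000, wws=500)
print(rounds)  # → [100000, 25000, 6250, 1562, 500]: shrinking rounds, small final copy
```

The point of bounding the rounds is visible in the model: once the per-round transfer stops shrinking below the WWS, further pre-copying wastes bandwidth without reducing downtime.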

How can you deploy a stack on Kubernetes using Docker commands, and what does the "kubectl get services" command reveal?

You can deploy a stack on Kubernetes with the "docker stack deploy" command, supplying the docker-compose.yml file and the stack name. The "kubectl get services" command lists the services deployed in a Kubernetes cluster, providing information about each service and its status.
