Cloud Architect Interview Prep

Layer 4 vs Layer 7 Load Balancing

Layer 4 - Routing is based on connection information only: the destination IP (or IP range/domain) and port. If a user request comes in for domain.com/blog, the traffic is sent to the backend pool that handles all requests for domain.com on port 80, without inspecting the URL path. Layer 7 - Routing is based on the content of the user's request (for example the URL path or headers), so the load balancer can send different requests to different groups of web servers. This is advantageous because you can run multiple applications behind the same domain and port.
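
To make the contrast concrete, here is a minimal Python sketch (illustrative only; pool names and addresses are made up) showing how an L4 balancer keys its choice on host and port alone, while an L7 balancer keys it on the request path:

```python
# Illustrative sketch, not any product's API: contrasting how an L4 vs an L7
# load balancer might pick a backend pool for the same request.

L4_POOLS = {("domain.com", 80): ["10.0.0.11", "10.0.0.12"]}              # keyed by host:port only
L7_POOLS = {"/blog": ["10.0.1.21"], "/shop": ["10.0.1.31", "10.0.1.32"]}  # keyed by URL path

def pick_backend_l4(host, port):
    # Layer 4: only the address/port tuple is visible; every path shares one pool.
    return L4_POOLS[(host, port)][0]

def pick_backend_l7(path):
    # Layer 7: the request content (here, the path prefix) selects the pool.
    for prefix, pool in L7_POOLS.items():
        if path.startswith(prefix):
            return pool[0]
    return L4_POOLS[("domain.com", 80)][0]  # fall back to the default pool

print(pick_backend_l4("domain.com", 80))   # same backend for /blog and /shop
print(pick_backend_l7("/blog"))            # path-aware routing
```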

meta-monitoring

Monitoring the monitor

Stream Control Transmission Protocol (SCTP)

Much like TCP but provides redundant paths and a heartbeat.

Regional external load balancing

Network load balancing distributes traffic among a pool of instances within a region. Network load balancing can balance any kind of TCP/UDP traffic.

layer

Network protocols normally work together in groups (called stacks because diagrams often depict protocols as boxes stacked on top of each other). Some protocols function at lower layers closely tied to how different types of wireless or network cabling physically works. Others work at higher layers linked to how network applications work, and some work at intermediate layers in between.

What are the three patterns discussed for storing state?

1. All state on one server. 2. State shared and replicated across servers. 3. Sharded: a master sends shard locations, and the requester fetches data from the appropriate shard.

Load Balancer Resiliency

One strategy is a simple failover. One load balancer (the primary) receives all traffic, and the other load balancer (the secondary) monitors the health of the primary by sending heartbeat messages to it. If a loss of heartbeat is detected, the secondary takes over and becomes the active load balancer. Any TCP connections that were "in flight" are disconnected since the primary is unaware of them. Another strategy is stateful failover. It resembles simple failover except that the two load balancers exchange enough information, or state, so that both know all existing TCP connections. As a consequence, those connections are not lost in case of failover.
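
A rough sketch of the simple-failover idea, assuming a hypothetical TCP health-check endpoint on the primary; a real secondary would also take over a virtual IP or update DNS when it becomes active:

```python
# Minimal sketch of the "simple failover" strategy described above, with
# hypothetical host names and thresholds.
import socket, time

PRIMARY = ("lb-primary.example.internal", 8080)   # assumed health-check endpoint
MISSES_BEFORE_TAKEOVER = 3

def heartbeat_ok(addr, timeout=1.0):
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

def monitor_primary():
    misses = 0
    while True:
        misses = 0 if heartbeat_ok(PRIMARY) else misses + 1
        if misses >= MISSES_BEFORE_TAKEOVER:
            print("primary unreachable; secondary becoming active")
            break   # with simple failover, in-flight TCP connections are lost
        time.sleep(1)
```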

OAuth

Open Authorization. A protocol that lets a client take actions on a server on behalf of a user, such as granting one app permission to read contacts from another app like LinkedIn.

Network header, payload, and footer

Packet headers and footers contain the contextual information required to support the network, including addresses of the sending and receiving devices, while payloads contain the actual data to be transmitted. Headers or footers also often include special data to help improve the reliability and/or performance of network connections, such as counters that keep track of the order in which messages were sent and checksums that help network applications detect data corruption or tampering.

Partition Tolerance

Partition tolerance means the system continues to operate despite arbitrary message loss or failure of part of the system. The simplest example of a partition is a network failure that splits the system into two halves that cannot communicate with each other; a partition-tolerant system keeps operating even in that situation.

grsecurity vs. AppArmor vs. SELinux

Each patches or extends the Linux kernel with security hardening, role-based access control, and controls over what software can do in the kernel. All three offer very good protection; you can choose among them using simple criteria: new user / ease of use: grsecurity; easy-to-understand policy and tools: AppArmor; most powerful access control mechanism: SELinux.

The CAP Principle

CAP stands for consistency, availability, and partition tolerance. The CAP Principle states that it is not possible to build a distributed system that guarantees consistency, availability, and partition tolerance; any one or two can be achieved, but not all three simultaneously. When using such systems you must be aware of which properties are guaranteed.

Imagine that you are going to implement a configuration management system in your current environment. Which one would you choose and why?

CFEngine and Puppet are both declarative; I would choose the one we had the most exposure to and that was strongest in what we needed to achieve. The key advantages of a CM system are that SAs can define things concisely at a high level, and that it is easy to enshrine best practices in shared definitions and processes. The primary disadvantages are the steep learning curve for the domain-specific language and the need to initially create all the necessary definitions.

CRUD

CRUD specifies a minimal set of basic storage verbs for data reading and writing: create, read, update and delete. Then, you can build other operations by aggregating these. These are usually considered database operations, but what is considered a database is arbitrary (e.g., could be a relational DBMS, but could also be YAML files).
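
A small sketch of the idea: the four verbs against an in-memory store, with a composite operation built by aggregating them (the class and method names are just for illustration):

```python
# Sketch of the four CRUD verbs against an in-memory store; the "rename"
# helper shows how richer operations are built by composing the basic verbs.
class Store:
    def __init__(self):
        self._rows = {}

    def create(self, key, value):
        self._rows[key] = value

    def read(self, key):
        return self._rows.get(key)

    def update(self, key, value):
        if key in self._rows:
            self._rows[key] = value

    def delete(self, key):
        self._rows.pop(key, None)

    def rename(self, old, new):
        # A composite operation built from the basic verbs.
        value = self.read(old)
        if value is not None:
            self.create(new, value)
            self.delete(old)
```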

Network Address Private Ranges

Class A: 10.0.0.0-10.255.255.255 (/8) - private, not routable on the internet
Class B: 172.16.0.0-172.31.255.255 (/12) - private, not routable
Class C: 192.168.0.0-192.168.255.255 (/16) - private, not routable
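
A quick way to check these ranges with Python's standard ipaddress module (a sketch; the module's built-in is_private flag also covers other reserved ranges):

```python
# Check whether an address falls inside the RFC 1918 private ranges listed above.
import ipaddress

PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_RANGES)

print(is_private("172.20.5.9"))   # True  (inside 172.16.0.0/12)
print(is_private("8.8.8.8"))      # False (publicly routable)
```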

Network Address Classes

Class A: range 1-126, default CIDR /8
Class B: range 128-191, default CIDR /16
Class C: range 192-223, default CIDR /24
Class D: range 224-239, no default mask (multicast)
Class E: range 240-255, no default mask (experimental)

Consistency

Consistency means that all nodes see the same data at the same time. Systems that do not guarantee consistency may provide eventual consistency. For example, they may guarantee that any update will propagate to all replicas in a certain amount of time.

If you could monitor 3 aspects of a web server what would they be?

Much can be learned by performing an HTTPS GET: whether the server is up, whether the service is overloaded, and whether there is network congestion. TCP timings indicate time to first byte and time to full payload, and the SSL transaction can be analyzed to monitor certificate validity and expiration. The other two metrics help differentiate between those issues: CPU utilization helps distinguish network congestion from an overloaded system, and the amount of free disk space can indicate runaway processes, logs filling the disk, and many other problems.
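
A rough monitoring sketch along those lines, using only the standard library (the URL and thresholds are placeholders, and getloadavg is Unix-only):

```python
# Rough sketch of the three checks described above: HTTPS GET timing,
# CPU load, and free disk space.
import os, shutil, time, urllib.request

def check_web(url="https://www.example.com/"):
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=5) as resp:
        status = resp.status
        body = resp.read()
    return status, time.monotonic() - start, len(body)

def check_cpu():
    # 1-minute load average (Unix-only); compare against core count to spot overload.
    return os.getloadavg()[0]

def check_disk(path="/"):
    usage = shutil.disk_usage(path)
    return usage.free / usage.total    # fraction of disk still free

status, latency, size = check_web()
print(status, f"{latency:.3f}s to full payload,", check_cpu(), f"load, {check_disk():.1%} disk free")
```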

GET vs POST

GET vs. POST: 1) GET can send only a limited amount of data because it is carried in the URL; POST can send large amounts of data because it is carried in the request body. 2) GET is less suited to sensitive data because the data is exposed in the URL bar; POST does not expose data in the URL. 3) GET requests can be bookmarked; POST requests cannot. 4) GET is idempotent: repeating the same request has the same effect as making it once. POST is non-idempotent. 5) GET is generally more efficient (and more easily cached) and is used more than POST.

Graceful degradation

Graceful degradation, discussed previously, means software is designed to survive failures or periods of high load by providing reduced functionality. For example, a movie streaming service might automatically reduce video resolution to conserve bandwidth when some of its internet connections are down or otherwise overloaded.

What are the most common scaling techniques and how do they work? When are they most appropriate to use?

Horizontal scaling - cloning servers to increase capacity. Transaction splitting - sending reads to read-only replicas and writes to the master. Data splitting - sharding by key, geography, or partitions.

sharding

Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than being split into columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.
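
A minimal sketch of key-based sharding, assuming hypothetical shard names; hashing the row key picks a shard deterministically:

```python
# Each row is routed to one of several shard databases by hashing its key.
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    digest = hashlib.sha1(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("alice"))   # rows for "alice" always live on the same shard
print(shard_for("bob"))     # other keys spread across the remaining shards
```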

Queuing

With queueing, you are less likely to overload the machine since the number of worker threads is fixed and remains constant. There is also an advantage in retaining the same threads to service multiple requests. This avoids the overhead associated with new thread creation. Thread creation is lightweight, but on a massive scale the overhead can add up. Another benefit of the queuing model is that it is easier to implement a priority scheme. High-priority requests can go to the head of the queue. A plethora of queueing algorithms may be available depending on the type of priority scheme you want to implement. In fair queueing, the algorithm prevents a low-priority item from being "starved" by a flood of high-priority items. Other algorithms dynamically shift priorities so that bursty or intermittent traffic does not overload the system or starve other priorities from being processed.
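
A minimal sketch of the fixed-worker-pool model with Python's queue and threading modules (swap in queue.PriorityQueue to get the priority behaviour described above):

```python
# A bounded number of worker threads drain a shared queue, so a burst of
# requests queues up instead of spawning unbounded threads.
import queue, threading

WORKERS = 4
tasks: "queue.Queue[int]" = queue.Queue()

def worker():
    while True:
        request_id = tasks.get()        # blocks until work is available
        print(f"handling request {request_id}")
        tasks.task_done()

for _ in range(WORKERS):
    threading.Thread(target=worker, daemon=True).start()

for request_id in range(20):            # a burst larger than the pool
    tasks.put(request_id)
tasks.join()                             # wait for the queue to drain
```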

YARN vs. Zookeeper

YARN manages cluster workload farming out jobs on the cluster. Zookeeper is a configuration, synchronization service, and a naming registry for distributed systems (used by YouTube and can manage microservice architectures)

HTTPS SSL vs. WS Security

SSL is easier to implement. SSL encrypts at the transport level; WS-Security encrypts at the message level. SSL provides in-transit security only: the request is encrypted only while it is travelling from client to server (or back). If there is a proxy server in front of the web server, the request is decrypted when it reaches the proxy and travels the rest of the way inside unencrypted. WS-Security maintains the encryption until the moment the request is processed. WS-Security also allows targeted security: SSL secures the entire message, while with WS-Security we can choose to encrypt only part of a message. Reference: http://geekswithblogs.net/claraoscura/archive/2006/11/17/97438.aspx

Pig vs. Hive

Pig is a simple scripting language for writing MapReduce programs, which are then executed by YARN. Hive is a SQL-like query language for Hadoop.

List three challenges that automation can introduce. For each of them, describe which steps you could take to address that challenge.

Problem propagation - code reviews, test-driven development, and alerting to detect problems. Complex code - strong development procedures and style guides; removing unused code. Volume of incidents - incident tracking.

Which scaling techniques also improve resiliency?

Queuing: a loosely coupled architecture allows a publisher or subscriber to fail, restart, and pick up where it left off.

REST vs. SOAP

Representational State Transfer (REST) is an "architectural style" that accesses a resource for data and usually maps to CRUD operations; it supports SSL/HTTPS but not WS-Security. It is very flexible in the payload, allowing text, HTML, XML, and JSON. Simple Object Access Protocol (SOAP) is a "standards-based protocol" based on XML that accesses a resource for a transaction. It is more tightly coupled with the server, allowing for more security and ACID compliance, but carries a heavier envelope.

Schema-on-write vs. Schema-on-read

Schema-on-write (SOW) is the classic data management technique: define the structure first, then populate it. Schema-on-read (SOR) is an unstructured-data approach: load the data as-is, then create tools for presenting it as needed. You're not tied to a one-size-fits-all schema, multiple datasets can easily be consolidated, and time to value is decreased because you can immediately access multiple data sources.

Slider

Deploys ("slides") long-running services like HBase, Accumulo, and Phoenix onto YARN so their resources can be managed elastically on the cluster.

Load Balancing Setup - sticky sessions, Healthcheck, HA

Sticky sessions ensure a user is routed to the same server. Health checks stop routing traffic to a server that is not responsive. HA: active/passive redundant load balancers.

What's the Difference Between TCP and UDP?

TCP/IP is a suite of protocols used by devices to communicate over the Internet and most local networks. It is named after two of its original protocols: the Transmission Control Protocol (TCP) and the Internet Protocol (IP). TCP gives apps a way to deliver (and receive) an ordered and error-checked stream of information packets over the network. The User Datagram Protocol (UDP) is used by apps to deliver a faster stream of information by doing away with error-checking. When configuring some network hardware or software, you may need to know the difference. https://www.howtogeek.com/190014/htg-explains-what-is-the-difference-between-tcp-and-udp/

Capability Maturity Model

The CMM is a set of maturity levels for assessing processes: Initial (ad hoc), Repeatable (documented, automated), Defined (roles and responsibilities are agreed upon), Managed (decisions are data-driven), and Optimizing (improvements are made and the results measured).

curl vs wget

The main benefit of the wget command is that it can download files recursively, so you can download an entire website with one simple command; it is also good for downloading lots of files, and it can resume when a download fails, whereas curl cannot. The curl command supports more protocols than wget, provides better support for SSL, supports more authentication methods, and works on more platforms.

When SATA vs SAS?

The quick rule of thumb is that sequential patterns are those with large or streaming files and are best suited to SATA drives. Random workloads are typically those with very small files or storage requests that have no consistent structure (virtual servers, virtual desktops, transactional databases and so on) and are best suited to SAS or possibly SSD.

Stateless vs. Stateful Protocol

A stateless protocol does not require the server to retain session information or status about each communications partner for the duration of multiple requests. In contrast, a protocol that requires keeping of the internal state on the server is known as a stateful protocol.

Imperative vs. Declarative Programming

"Imperative programming is like "How" you do something and declarative programming is more like "What" you do."

Network Bit Values

128-64-32-16-8-4-2-1

Cross-Site Request Forgery (CSRF Sea Surf)

CSRF is an attack that tricks the victim into submitting a malicious request. It inherits the identity and privileges of the victim to perform an undesired function on the victim's behalf. For most sites, browser requests automatically include any credentials associated with the site, such as the user's session cookie, IP address, Windows domain credentials, and so forth. Therefore, if the user is currently authenticated to the site, the site will have no way to distinguish between the forged request sent by the victim and a legitimate request sent by the victim.

Linux Namespaces with Containers

A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Namespaces of various flavors are essential to the functioning of containers as we know them. For example, the PID namespace is what keeps processes in one container from seeing or interacting with processes in another container (or, for that matter, on the host system).

Forward Proxy

A proxy server acts on behalf of another party; in the case of a forward proxy, it acts on behalf of the client. The client connects to the proxy, which forwards traffic to the origin server. The client knows the origin server and deliberately uses the proxy to obfuscate its own identity (for example, the Tor service).

Reverse Proxy

A proxy server acts on behalf of another party; in the case of a reverse proxy, it represents the origin server and requires clients to communicate with the proxy. The origin is effectively hidden: the client only sees the proxy. Typically used in CDNs to scale.

Google Global External load balancing

HTTP(S) load balancing distributes HTTP(S) traffic among groups of instances based on proximity to the user, the requested URL, or both. SSL Proxy load balancing distributes SSL traffic among groups of instances based on proximity to the user. TCP Proxy load balancing distributes TCP traffic among groups of instances based on proximity to the user.

Hadoop vs. HBase

Hadoop is the HDFS file system, HBase is the database. HBase is a columnar NoSQL store that runs on HDFS for storage and processing of data.

Heapster

Heapster is a cluster-wide aggregator of monitoring and event data. It supports Kubernetes natively and works on all Kubernetes setups, including our Deis Workflow setup. Heapster runs as a pod in the cluster, similar to how any other Kubernetes application would run.

Solr

High-volume text search optimized for web traffic; it indexes documents and returns results as XML, JSON, CSV, or binary.

What are the options for scaling a service that is CPU bound?

Horizontal scaling, Caching or queuing.

Internet Control Message Protocol (ICMP)

A supporting protocol typically used to carry messages and status between hosts and routers, such as a ping (echo) request.

What Is a Network Switch versus a Router?

Switches create a network. Routers connect networks.

Which octet tells us the class of network?

The first, so 122.___.___.___ = Class A /8 for example.
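
A small helper matching the class ranges listed earlier (illustrative only):

```python
# The first octet determines the class.
def ip_class(addr: str) -> str:
    first = int(addr.split(".")[0])
    if 1 <= first <= 126:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    if 224 <= first <= 239:
        return "D (multicast)"
    if 240 <= first <= 255:
        return "E (experimental)"
    return "reserved (0, or 127 loopback)"

print(ip_class("122.4.8.15"))   # "A"
```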

Linux Control Groups with Containers

They implement resource accounting and limiting. They provide many useful metrics, but they also help ensure that each container gets its fair share of memory, CPU, disk I/O; and, more importantly, that a single container cannot bring the system down by exhausting one of those resources.

Load balancer algorithms

Round robin - the default algorithm; selects servers in rotation. A weights parameter can bias selection toward more capable servers.
Leastconn - the server with the fewest current connections is selected; highly recommended for longer sessions. Servers with the same number of connections are rotated round-robin.
Source - the server is selected based on a hash of the source IP address (the user's IP), which ensures the same user will keep connecting to the same server.
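
A sketch of the three selection algorithms in Python (server names are made up; real load balancers such as HAProxy implement these natively):

```python
# Round robin, leastconn, and source-IP hash selection.
import itertools, hashlib

SERVERS = ["web-1", "web-2", "web-3"]
active_connections = {s: 0 for s in SERVERS}
_rr = itertools.cycle(SERVERS)

def round_robin():
    return next(_rr)                       # rotate through the pool

def leastconn():
    return min(SERVERS, key=lambda s: active_connections[s])

def source_hash(client_ip: str):
    # The same client IP always maps to the same server.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```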

defense in depth

Defense in depth means that all layers of the design detect and respond to failures, from failures as small as a single process to ones as large as an entire datacenter.

distributed denial-of-service (DDoS)

A distributed denial-of-service (DDoS) attack occurs when many computers around the Internet are used in a coordinated fashion to create an extremely large DoS attack.

Failure Domain

A failure domain is the bounded area beyond which a failure has no impact. For example, when a car fails on a highway, its failure does not make the entire highway unusable; the impact of the failure is bounded to its failure domain.

What is the loopback address?

127.0.0.1, though anything in the 127.0.0.0/8 range can be used for loopback. A loopback address is primarily used to validate that the local TCP/IP stack is installed and working properly.

Bloom Filter

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not; in other words, a query returns either "possibly in set" or "definitely not in set". Two practical problems: Bloom filters "fill up" over time and therefore require periodic reconstruction (if you're using an on-disk hash as your backing storage layer, you're periodically rehashing every record anyway), and the optimal indexing parameters must be determined empirically.
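
A toy Bloom filter to make the mechanics concrete (the sizing parameters here are arbitrary; real deployments size the bit array and hash count from the expected item count and target false-positive rate):

```python
# k hash positions per item in a fixed bit array; membership tests may return
# false positives but never false negatives.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))    # True ("possibly in set")
print(bf.might_contain("mallory"))  # usually False ("definitely not in set")
```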

connection-oriented or connectionless

A connection-oriented network protocol exchanges address information between two devices (a process called a handshake) that allows them to carry on a conversation (called a session) with each other. Conversely, connectionless protocols deliver individual messages from one point to another without regard for any similar messages sent before or after (and without knowing whether messages are even successfully received).

Content Delivery Network (CDN)

A content delivery network (CDN) is a web-acceleration service that delivers content (web pages, images, video) more efficiently on behalf of your service. CDNs cache content on servers all over the world. Requests for content are serviced from the cache nearest the user. Geolocation techniques are used to identify the network location of the requesting web browser.

Hash Function

A hash function is an algorithm that maps data of varying lengths to a fixed-length value. The result is considered probabilistically unique.
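
For example, SHA-256 maps inputs of any length to a fixed 256-bit (64 hex character) digest:

```python
# Inputs of very different lengths produce digests of identical length.
import hashlib

for data in (b"a", b"a much longer piece of data" * 100):
    digest = hashlib.sha256(data).hexdigest()
    print(len(data), "bytes ->", digest[:16], "... (64 hex chars total)")
```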

Threading

Threading is a technique used by modern operating systems to allow sequences of instructions to execute independently. Threads are subsets of processes; it's typically faster to switch operations among threads than among processes.

CAP database examples. Is MongoDB CA, AP, or CP? Oracle or Casandra?

Traditional relational databases like Oracle, MySQL, and PostgreSQL are consistent and available (CA). They use transactions and other database techniques to assure that updates are atomic; they propagate completely or not at all. Thus they guarantee all users will see the same state at the same time. Newer storage systems such as Hbase, Redis, and Bigtable focus on consistency and partition tolerance (CP). When partitioned, they become read-only or refuse to respond to any requests rather than be inconsistent and permit some users to see old data while others see fresh data. Finally, systems such as Cassandra, Riak, and Dynamo focus on availability and partition tolerance (AP). They emphasize always being able to serve requests even if it means some clients receive outdated results. Such systems are often used in globally distributed networks where each replica talks to the others by less reliable media such as the Internet.

What is the UID and GID?

Unix-like operating systems identify a user within the kernel by a value called a user identifier, often abbreviated to user ID or UID. The UID, along with the group identifier (GID) and other access control criteria, is used to determine which system resources a user can access.

VLAN vs. SDN

Virtual Local Area Network (VLAN) is a network technology for separating a physical network into different virtual networks, and VLANs are often an essential building block of Software-Defined Networking (SDN). SDN, at heart, is a model for separation at a much higher level: in an advanced SDN you define your business logic, such as departments, employees, and the services you produce and consume. In SDN terms this is called the control plane. The actual configuration of all your network components, like switches, routers, and access points, is the data plane. At the data plane level, all the technologies an SDN vendor supports are used to translate your business requirements into actual network configuration, as you would do by hand in a non-SDN environment; in an SDN system this translation is done for you automatically.

Spark

In-memory compute for ETL, ML, and data science.

ACID

ACID stands for Atomicity (transactions are "all or nothing"), Consistency (after each transaction the database is in a valid state), Isolation (concurrent transactions give the same results as if they were executed serially), and Durability (a committed transaction's data will not be lost in the event of a crash or other problem). Databases that provide weaker consistency models often refer to themselves as NoSQL and describe themselves as BASE: Basically Available Soft-state services with Eventual consistency.

AKF Scaling Cube

AKF Scaling Cube. The x-axis (horizontal scaling) is a power multiplier, cloning systems or increasing their capacities to achieve greater performance. The y-axis (vertical scaling) scales by isolating transactions by their type or scope, such as using read-only database replicas for read queries and sequestering writes to the master database only. Finally, the z-axis (lookup-based scaling) is about splitting data across servers so that the workload is distributed according to data usage or physical geography.

Regional internal load balancing

Internal load balancing distributes traffic from Google Cloud Platform virtual machine instances to a group of instances in the same region.

Amdahl's Law

Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour.
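
Working the example through Amdahl's formula, speedup(N) = 1 / ((1 - p) + p/N):

```python
# Amdahl's law applied to the 20-hour example above.
def speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

p = 0.95                     # 19 of the 20 hours can be parallelized
for n in (1, 2, 16, 4096):
    hours = 20 / speedup(p, n)
    print(f"{n:>5} processors: {hours:5.2f} hours")
# Even with unlimited processors the runtime approaches the serial 1 hour.
```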

Queuing with processes

An example of queueing implemented with processes is the Prefork processing module for the Apache web server. On startup, Apache forks off a certain number of subprocesses. Requests are distributed to subprocesses by a master process. Requests are processed faster because the subprocess already exists, which hides the long process creation time. Processes are configured to die and be replaced every n requests so that memory leaks are averted. The number of subprocesses used can be adjusted dynamically.

Kafka vs. Storm

Apache Storm is a distributed real-time computation engine that reliably processes unbounded streams of data. While Storm processes stream data at scale, Apache Kafka processes messages at scale. Kafka is a distributed pub-sub real-time messaging system that provides strong durability and fault tolerance guarantees.

GCP Autoscaling

Applicable autoscaling policies include scaling based on CPU utilization, load balancing capacity, Stackdriver monitoring metrics, or by a queue-based workload like Google Cloud Pub/Sub.

Availability

Availability is a guarantee that every request receives a response about whether it was successful or failed.

Caching Layers

Browser > Internet CDN > Reverse Proxy Accelerator > Web Application > DB Cache

Accumulo

Built by the NSA and based on Google's Bigtable, it provides extremely fast access to massive tables with fine-grained access controls over data permissions.

What is Hadoop?

It is three technologies: - HDFS, the fault-tolerant clustered file system - MapReduce, the distributed programming framework - YARN, cluster resource management for Hadoop that extends Hadoop to new technologies

etcd

Distributed key-value store used by Kubernetes to store and replicate cluster state. Bootstrapping, maintaining quorum, reconfiguring cluster membership, creating backups, handling disaster recovery, and monitoring critical events are tedious work, and require etcd-specific expertise.

Describe the three major composition patterns in distributed computing.

Distributed systems are composed of many smaller systems. Three fundamental composition patterns: • Load balancer with multiple backend replicas - transaction-based requests • Server with multiple backends - a query can be broken into smaller pieces • Server tree - sharding a large dataset

Fact vs. Dimension Tables

Dimensions describe the objects involved in a business intelligence effort. While facts correspond to events, dimensions correspond to people, items, or other objects. For example, in the retail scenario, we discussed that purchases, returns, and calls are facts. On the other hand, customers, employees, items and stores are dimensions and should be contained in dimension tables.

Disadvantages of Sharding

Disadvantages include: a heavier reliance on the interconnect between servers; increased latency when querying, especially where more than one shard must be searched; data or indexes often being sharded only one way, so that some searches are optimal and others are slow or impossible; and issues of consistency and durability due to the more complex failure modes of a set of servers, which often result in systems making no guarantees about cross-shard consistency or durability.

What is distributed computing?

Distributed computing is the art of building large systems that divide the work over many machines. Contrast this with traditional computing systems where a single computer runs software that provides a service, or client-server computing where many machines remotely access a centralized service. In distributed computing there are typically hundreds or thousands of machines working together to provide a large service.

IP multicast vs unicast

IP multicast is often used to send a message to many machines on the same subnet at the same time. IP unicast is used to transmit a message between subnets or when IP multicast isn't available. The principle behind multicast is that any message is received by all subscribers to the multicast address, so MS-1 only needs to send one network packet to alert all other cluster members of its status. This means a status or JNDI update requires only one packet per cluster with multicast versus roughly one packet per server with unicast. Multicast also requires no "master" election, so it is far simpler to code and creates less network traffic. So, is multicast always better? Not necessarily. It uses UDP datagrams, which are inherently unreliable and unacknowledged, so given the unreliable bearer protocol (Ethernet), your message might never turn up (interpret: you fall out of the cluster). The whole concept of multicast is based on subscription; it is not a routable protocol in the normal sense, so by default routers must discard multicast packets or risk a network storm. Hence the historical requirement that all cluster members reside on the same network segment. These shortcomings of multicast mean unicast is the way to go if your cluster spans networks or you're losing too many multicast packets.

Examples of Stateless and Stateful Protocol

In computing, a stateless protocol is a communications protocol in which no information is retained by either sender or receiver: the sender transmits a packet to the receiver and does not expect an acknowledgment of receipt. A UDP session is stateless because the system doesn't maintain information about the session during its life. A stateless protocol does not require the server to retain session information or status about each communications partner for the duration of multiple requests; in contrast, a protocol that requires keeping internal state on the server is known as a stateful protocol. A TCP connection-oriented session is a stateful connection because both systems maintain information about the session itself during its life. Examples of stateless protocols include the Internet Protocol (IP), which is the foundation for the Internet, and the Hypertext Transfer Protocol (HTTP), which is the foundation of data communication for the World Wide Web. The stateless design simplifies the server because there is no need to dynamically allocate storage to deal with conversations in progress, and if a client session dies in mid-transaction, no part of the system needs to clean up the present state of the server. A disadvantage of statelessness is that it may be necessary to include additional information in every request, and this extra information will need to be interpreted by the server.

Multithreading

In multithreading, a main thread receives new requests. For each request, it creates a new thread, called a worker thread, to do the actual work and send the reply. Since thread creation is fast, the main thread can keep up with a flood of new requests and none will be dropped. Throughput is improved because requests are processed in parallel, multiple CPUs are utilized, and head of line blocking is reduced or eliminated.

rate vs. capability monitoring

In short, rate metrics are more important when event frequency is high and there are smooth, predictable trends. When there is low event frequency or an uneven rate, capability metrics are more important.

Storage IOPS by disk type

In the 100-3,000 IOPS range, SATA drives provide a very cost-effective platform, with pricing usually provided in dollars per GB. In the 3,000-10,000 IOPS range, SAS drives are usually the default technology, as reaching this performance level with SATA requires a vast number of spindles or amount of SSD caching. High-performance disks are typically priced higher per GB. In the 10,000+ IOPS range, SSD begins to make financial sense, as only a fraction of a customer's overall storage requires such levels of performance. But the best use of flash is as cache.

When would you use a scripting language instead of a compiled language, and why? In which circumstances would you use a compiled language for automation?

Scripting languages are interpreted languages designed for rapid development, often focused on systems programming; use them when development speed matters most. Compiled languages can be a good choice for large-scale automation: automation written in a compiled language typically scales better than the same automation written in a scripting language. One reason is that scripting languages are usually not strongly typed, so the type of a variable (integer, string, or object) is not checked until the variable is used.

What are the options for scaling a service whose storage requirements are growing?

Sharding

RAID write penalties

These are the RAID protection parity penalties:
RAID 0: ~0% overhead vs. reads
RAID 1+0: ~50% overhead vs. reads
RAID 5: ~75% overhead vs. reads
RAID 6: ~85% overhead vs. reads

Layer 1: Physical

This layer conveys the bit stream through the network at the electrical, optical or radio level. It provides the hardware means of sending and receiving data on a carrier network.

Layer 3: Network

This layer handles the addressing and routing of the data (sending it in the right direction to the right destination on outgoing transmissions and receiving incoming transmissions at the packet level). IP is the network layer for the Internet

Layer 7: Application

This layer is not the application itself; it is the set of services an application should be able to make use of directly, although some applications may perform application layer functions.

Layer 6: Presentation

This layer is usually part of an operating system (OS) and converts incoming and outgoing data from one presentation format to another (for example, from clear text to encrypted text at one end and back to clear text at the other).

Layer 4: Transport

This layer manages packetization of data, then the delivery of the packets, including checking for errors in the data once it arrives. On the Internet, TCP and UDP provide these services for most applications as well.

Layer 2: Data Link

This layer sets up links across the physical network, putting packets into network frames. This layer has two sub-layers, the Logical Link Control Layer and the Media Access Control Layer. Ethernet is the main data link layer in use.

Layer 5: Session

This layer sets up, coordinates and terminates conversations. Services include authentication and reconnection after an interruption. On the Internet, Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) provide these services for most applications.

OSI Layers

application, presentation, session, transport, network, data link, physical

BGP Router

Border Gateway Protocol (BGP) allows routers in different networks to share reachability information for new IP prefixes, so routing tables do not have to be updated manually.

