Backend


PACELC theorem

"In the case of a network partition (P), the system has to choose between availability (A) and consistency (C) but else (E), when the system operates normally in the absence of network partitions, the system has to choose between latency (L) and consistency (C)."

SQL relationships

1 - 1: each row has one match in the second table. Example: a person table and a passport table.
1 - Many: one row in the first table can have multiple matches in the second table. Example: a book table and an authors table.
Many - Many: multiple records in a table are associated with multiple records in another table. For example, a many-to-many relationship exists between customers and products: customers can purchase various products, and products can be purchased by many customers. Students and classes is another example.
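A many-to-many relationship is modeled with a join table holding pairs of foreign keys. A minimal sketch using Python's built-in sqlite3 (table and column names are illustrative, following the student-classes example):

```python
import sqlite3

# In-memory database; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (id INTEGER PRIMARY KEY, title TEXT);
    -- The join table turns one many-to-many relationship
    -- into two one-to-many relationships.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        class_id   INTEGER REFERENCES class(id),
        PRIMARY KEY (student_id, class_id)
    );
""")
conn.execute("INSERT INTO student VALUES (1, 'Ada'), (2, 'Lin')")
conn.execute("INSERT INTO class VALUES (10, 'Databases'), (20, 'Networks')")
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])

rows = conn.execute("""
    SELECT s.name, c.title
    FROM student s
    JOIN enrollment e ON e.student_id = s.id
    JOIN class c ON c.id = e.class_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)  # [('Ada', 'Databases'), ('Ada', 'Networks'), ('Lin', 'Databases')]
```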

Shared-Nothing Architecture

> Distributing load across multiple machines is also known as a shared-nothing architecture

Impedance Mismatch

> if data is stored in relational tables, an awkward translation layer is required between the objects in the application code and the database model of tables, rows, and columns.

Fault versus failure

A fault is usually defined as one component of the system deviating from its spec, whereas a failure is when the system as a whole stops providing the required service to the user.

Horizontal vs. Vertical Scaling

A system can be scaled one of two ways. Vertical scaling means increasing the resources of a specific node. For example, you might add additional memory to a server to improve its ability to handle load changes. Horizontal scaling means increasing the number of nodes. For example, you might add additional servers, thus decreasing the load on any one server. Vertical scaling is generally easier than horizontal scaling, but it's limited. You can only add so much memory or disk space.

CAP Theorem

According to the initial statement of the CAP theorem, it is impossible for a distributed data store to provide more than two of the following properties simultaneously: consistency, availability, and partition tolerance.
Consistency: every successful read request receives the result of the most recent write request.
Availability: every request receives a non-error response, without any guarantees on whether it reflects the most recent write request.

HTTP Request

An HTTP request is made by a client to a named host, which is located on a server. The aim of the request is to access a resource on the server. To make the request, the client uses components of a URL (Uniform Resource Locator).
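The URL components the client uses can be pulled apart with Python's standard `urllib.parse` (the URL itself is an illustrative example):

```python
from urllib.parse import urlparse

# Decompose a URL into the components a client uses to build an HTTP request.
url = "https://example.com:8080/articles/42?lang=en#intro"
parts = urlparse(url)

print(parts.scheme)    # https        -> protocol
print(parts.hostname)  # example.com  -> named host
print(parts.port)      # 8080
print(parts.path)      # /articles/42 -> resource on the server
print(parts.query)     # lang=en
print(parts.fragment)  # intro
```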

Scalability

As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.

ACID Compliance

Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely or fails completely: if any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged.
Consistency ensures that a transaction can only bring the database from one valid state to another, maintaining database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, and triggers.
Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash).
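Atomicity and consistency can be demonstrated with sqlite3: a transfer that would violate a CHECK constraint fails, and the whole transaction rolls back, leaving the database unchanged (the account schema is an illustrative assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account
                (name TEXT PRIMARY KEY,
                 balance INTEGER CHECK (balance >= 0))""")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

# Transfer more than Alice has: the second UPDATE violates the CHECK
# constraint, so the whole transaction rolls back and no money moves.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE account SET balance = balance + 150 WHERE name = 'bob'")
        conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)]  -- the partial update to bob was undone
```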

Availability

By definition, availability is the time a system remains operational to perform its required function in a specific period. It is a simple measure of the percentage of time that a system, service, or machine remains operational under normal conditions.

Denormalization

Denormalization means adding redundant information into a database to speed up reads.

What is NOSQL and what are the types?

Dynamic schemas: columns can be added on the fly, and data is stored as collections of documents. NoSQL databases sacrifice ACID compliance for performance and scalability.
Types:
Key-value stores: Redis, DynamoDB
Document: MongoDB, CouchDB
Wide column: Cassandra, HBase. Uses column families; best suited for analyzing large datasets.
Graph: rarely comes up.

Cache Eviction

FIFO (First In First Out)
LIFO (Last In First Out)
LRU (Least Recently Used)
MRU (Most Recently Used)
LFU (Least Frequently Used)
Random replacement
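LRU is the most commonly implemented of these policies. A minimal sketch using an OrderedDict, where the least recently used entry is evicted first (the class and capacity are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU eviction sketch: least recently used entry is dropped first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a" so "b" becomes the eviction candidate
cache.put("c", 3)      # capacity exceeded: "b" is evicted
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```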

Partitioning Types

Horizontal partitioning: In this scheme, we put different rows into different tables. For example, if we are storing different places in a table, we can decide that locations with ZIP codes less than 10000 are stored in one table and places with ZIP codes greater than 10000 are stored in a separate table.
Vertical partitioning: In this scheme, we divide our data to store tables related to a specific feature on their own server. For example, if we are building an Instagram-like application - where we need to store data related to users, photos they upload, and people they follow - we can decide to place user profile information on one DB server, friend lists on another, and photos on a third server.
Directory-based partitioning: A loosely coupled approach to work around the issues of the above schemes is to create a lookup service which knows your current partitioning scheme and abstracts it away from the DB access code. So, to find out where a particular data entity resides, we query the directory server that holds the mapping between each tuple key and its DB server. This loosely coupled approach means we can perform tasks like adding servers to the pool without impacting the application.

Partitioning Criteria

Key or hash-based partitioning: Under this scheme, we apply a hash function to some key attribute of the entity we are storing; that yields the partition number. For example, suppose we have 100 DB servers and our ID is a numeric value that gets incremented by one each time a new record is inserted. In this example, the hash function could be 'ID % 100'.
List partitioning: In this scheme, each partition is assigned a list of values, so whenever we want to insert a new record, we see which partition contains our key and then store it there.
Round-robin partitioning: This is a very simple strategy that ensures uniform data distribution. With 'n' partitions, the 'i'-th tuple is assigned to partition (i mod n).
Composite partitioning: Under this scheme, we combine any of the above partitioning schemes to devise a new scheme - for example, first applying a list partitioning scheme and then a hash-based one.
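The first three criteria can be sketched as toy routing functions (partition counts and the region lists are illustrative assumptions):

```python
# Toy routing functions for the partitioning criteria above.
NUM_PARTITIONS = 100

def hash_partition(record_id):
    # Key/hash-based: the 'ID % 100' example from the text
    return record_id % NUM_PARTITIONS

REGION_LISTS = {0: ["EU", "UK"], 1: ["US", "CA"], 2: ["JP", "KR"]}

def list_partition(region):
    # List partitioning: each partition owns an explicit list of values
    for partition, regions in REGION_LISTS.items():
        if region in regions:
            return partition
    raise KeyError(region)

def round_robin_partition(i, n=4):
    # Round-robin: the i-th tuple goes to partition (i mod n)
    return i % n

print(hash_partition(12345))      # 45
print(list_partition("US"))       # 1
print(round_robin_partition(10))  # 2
```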

Load Balancer

A Load Balancer (LB) is another critical component of any distributed system. It helps to spread the traffic across a cluster of servers to improve responsiveness and availability of applications, websites, or databases. The LB also keeps track of the status of all the resources while distributing requests. If a server is not available to take new requests, is not responding, or has an elevated error rate, the LB will stop sending traffic to that server.

Maintainability

Over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively.

Load Balancer Types

Round robin
Round robin with weighted servers
Least connections
Least response time
Source IP hash
URL hash

Asynchronous processing/queues

Slow operations should ideally be done asynchronously. Otherwise, a user might get stuck waiting and waiting for a process to complete. In some cases, we can do this in advance (i.e., we can pre-process). For example, we might have a queue of jobs to be done that update some part of the website. If we were running a forum, one of these jobs might be to re-render a page that lists the most popular posts and the number of comments. That list might end up being slightly out of date, but that's perhaps okay. It's better than a user stuck waiting on the website to load simply because someone added a new comment and invalidated the cached version of this page. In other cases, we might tell the user to wait and notify them when the process is done. You've probably seen this on websites before. Perhaps you enabled some new part of a website and it says it needs a few minutes to import your data, but you'll get a notification when it's done.
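The queue-of-jobs idea can be sketched with Python's standard `queue` and `threading` modules; a production system would use a broker such as RabbitMQ, and the job payloads here are made up:

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    # Pull jobs until a sentinel arrives; processing happens off the request path.
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(f"rendered page for post {job}")
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# The request handler just enqueues work and returns immediately.
for post_id in (1, 2, 3):
    jobs.put(post_id)

jobs.put(None)  # sentinel: tell the worker to stop
t.join()
print(results)
```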

What is SQL?

Stores rows and columns. Each record has a fixed schema; changing it means altering the whole table. Vertically scalable.

Bandwidth

The amount of data that can be transmitted over a network in a given amount of time.

DNS

The domain name system (DNS) is the Internet's naming service that maps human-friendly domain names to machine-readable IP addresses. The service of DNS is transparent to users. When a user enters a domain name in the browser, the browser has to translate the domain name to IP address by asking the DNS infrastructure.

Normalization

The process of applying rules to a database design to ensure that information is divided into the appropriate tables.

Reliability

The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error).

HTTP Long-Polling

This is a variation of the traditional polling technique that allows the server to push information to a client whenever the data is available. With Long-Polling, the client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately. That's why this technique is sometimes referred to as a "Hanging GET".
• If the server does not have any data available for the client, instead of sending an empty response, the server holds the request and waits until some data becomes available.
• Once the data becomes available, a full response is sent to the client. The client then immediately re-requests information from the server, so that the server will almost always have a waiting request available that it can use to deliver data in response to an event.

Latency

This is how long it takes data to go from one end to the other. That is, it is the delay between the sender sending information (even a very small chunk of data) and the receiver receiving it.

Server-Sent Events (SSEs)

Under SSEs the client establishes a persistent and long-term connection with the server. The server uses this connection to send data to the client. If the client wants to send data to the server, it would require the use of another technology/protocol to do so.
1. The client requests data from a server using regular HTTP.
2. The requested webpage opens a connection to the server.
3. The server sends data to the client whenever there's new information available.
SSEs are best when we need real-time traffic from the server to the client, or when the server is generating data in a loop and will be sending multiple events to the client.

WebSockets

WebSocket provides full-duplex communication channels over a single TCP connection. It provides a persistent connection between a client and a server that both parties can use to start sending data at any time. The client establishes a WebSocket connection through a process known as the WebSocket handshake. If the process succeeds, then the server and client can exchange data in both directions at any time. The WebSocket protocol enables communication between a client and a server with lower overheads, facilitating real-time data transfer from and to the server. This is made possible by providing a standardized way for the server to send content to the browser without being asked by the client, and by allowing messages to be passed back and forth while keeping the connection open.

What is a cache and what are the types?

What is a cache? Caching works on the locality of reference principle: recently requested data is likely to be requested again. A cache is like short-term memory: it has limited space, but it is faster than the backing store and contains the most recently accessed items.
Application server cache: placing a cache on a request-layer node enables local storage. But when you've got a load balancer that can send a request to any node, the cache-miss rate increases.
Distributed cache: each node owns part of the cached data. The cache is divided up using a consistent hashing function, so that if a request node is looking for a certain piece of data, it can quickly know where to look within the distributed cache to check if the data is available. We can easily increase cache space by adding nodes to the request pool.
Global cache: all nodes use the same cache. Two variants: when a cached response is not found, either the cache itself becomes responsible for retrieving the missing data, or the request nodes are responsible for retrieving any data that is not found.
CDN: a content delivery network for serving large amounts of static media. The first request asks the CDN for data; on a miss, the CDN queries the backend servers and then caches it locally.
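The consistent hashing a distributed cache relies on can be sketched as a hash ring: each key routes to the first node clockwise from its hash, so adding or removing a node only remaps a fraction of keys. Node names and the replica count below are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of consistent hashing: each key maps to the first node
    clockwise on the hash ring."""

    def __init__(self, nodes, replicas=3):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(replicas):  # virtual nodes smooth the distribution
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("user:42")  # the same key always routes to the same node
print(owner in {"cache-a", "cache-b", "cache-c"})  # True
```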

RabbitMQ

What is it? Asynchronous service-to-service communication, used in serverless and microservices architectures.
How it works: messages are stored in a queue by a producer until they are processed by a consumer and then deleted. Each message is processed only once, by a single consumer. Queues also provide fault tolerance: in case of a service outage, they can retry requests.

Throughput

Whereas bandwidth is the maximum data that can be transferred in a unit of time, throughput is the actual amount of data that is transferred.

Sharding

What is sharding? Dividing data into smaller chunks. Horizontal scaling means adding more machines, which is cheaper and more feasible; vertical scaling means improving existing servers.
Methods:
Horizontal partitioning: putting different rows into different DBs, e.g. range-based sharding on last names.
Vertical partitioning: dividing tables based on features, e.g. one server for users and one for locations.
Directory-based: we query a directory server that holds the mapping between each key and its server.
Criteria:
Hash-based: using a hash on some entity value.
List partitioning: based on a list of values, e.g. regions.
Round-robin partitioning.
Composite partitioning: combining any of the above schemes.

locality of reference principle

recently requested data is likely to be requested again

Replication

sharing information to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility. Replication is widely used in many database management systems (DBMS), usually with a master-slave relationship between the original and the copies. The master gets all the updates, which then ripple through to the copies.

redundancy

the duplication of critical components or functions of a system with the intention of increasing the reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance. For example, if there is only one copy of a file stored on a single server, then losing that server means losing the file. Since losing data is seldom a good thing, we can create duplicate or redundant copies of the file to solve this problem. Redundancy plays a key role in removing the single points of failure in the system and provides backups if needed in a crisis. For example, if we have two instances of a service running in production and one fails, the system can failover to the other one.

Replication Types

Single-master (master-slave) replication: all writes go to one node, called the master. The changes on the master node are replicated to the other nodes, called slaves. Read requests can go to either the master or a slave.
Multi-master replication: instead of writes going to one master, they can go to many masters. Masters then replicate the changes made to their data stores to other slave nodes. Read requests can go to either a master or a slave.
Masterless replication: there are no masters and slaves in this setup. Writes and reads can be sent to any node.
Types: Synchronous replication means that the master sends its writes to all slaves and waits for write confirmation from all slaves. Slow. In asynchronous replication, the master confirms the write to the client after a successful write on the master node alone; it also sends the changes to the other slave replicas.

Cache Invalidation

When data is modified in the DB, the corresponding cache entry should be invalidated. Types:
Write-through cache: data is written to the cache and the DB at the same time. Minimizes the risk of data loss, but has higher latency for writes.
Write-around cache: data is written to storage only, bypassing the cache. Reduces write time, but a subsequent read request will cause a cache miss and must be served from the slower backend.
Write-back cache: data is written to the cache alone and later written to permanent storage. Fast, but risks data loss in a crash.
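A write-through cache is the simplest of the three to sketch: every write goes to the cache and the backing store together, so reads after a write never see stale data (the dict standing in for the database is an illustrative assumption):

```python
class WriteThroughCache:
    """Write-through sketch: writes hit the cache and the store together."""

    def __init__(self, store):
        self.store = store  # stands in for the database
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value  # write to the cache...
        self.store[key] = value  # ...and to permanent storage in the same step

    def read(self, key):
        if key in self.cache:            # cache hit
            return self.cache[key]
        value = self.store.get(key)      # cache miss: fall back to the store
        if value is not None:
            self.cache[key] = value
        return value

db = {}
c = WriteThroughCache(db)
c.write("k", "v")
print(c.read("k"), db["k"])  # v v  -- cache and store agree after the write
```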

Normalization v denormalization

• Data integrity
• Search speed (joins are slow)
• Disk space
• Duplicate data

SQL v NoSQL

SQL requires that you use predefined schemas to determine the structure of your data before you work with it, and all of your data must follow the same structure. A NoSQL database, on the other hand, has a dynamic schema for unstructured data, and data can be stored in many ways: you can create documents without having to first define their structure, each document can have its own unique structure, the syntax can vary from database to database, and you can add fields as you go.
The main arguments in favor of the document data model are schema flexibility, better performance due to locality, and that for some applications it is closer to the data structures used by the application. The relational model counters by providing better support for joins, and for many-to-one and many-to-many relationships.

