Networking - Chapter 2
Traffic Intensity
traffic intensity = (requests per second) * (bits per request) / (access rate). One way to handle an undesirably high traffic intensity is to increase the access rate, but this can be a costly proposition; instead of upgrading the access link, a web cache can be installed in the network.
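A minimal Python sketch of the traffic-intensity formula. The request rate, object size and link rates are hypothetical numbers chosen for illustration, and the web-cache case assumes the cache answers 40% of requests, so only 60% still cross the access link.

```python
def traffic_intensity(requests_per_sec: float, bits_per_request: float,
                      access_rate_bps: float) -> float:
    """(requests per second) * (bits per request) / (access rate)."""
    return requests_per_sec * bits_per_request / access_rate_bps


# Hypothetical load: 15 requests/s for 1 Mbit objects over a 15 Mbps access link.
baseline = traffic_intensity(15, 1_000_000, 15_000_000)

# Option 1: upgrade the access link to a hypothetical 100 Mbps.
upgraded = traffic_intensity(15, 1_000_000, 100_000_000)

# Option 2: keep the link, but let a web cache answer an assumed 40% of
# requests, so only 60% of them still cross the access link.
with_cache = traffic_intensity(15 * 0.6, 1_000_000, 15_000_000)

print(baseline, upgraded, with_cache)
```

An intensity of 1 means the access link is saturated and queuing delays become very large; both options bring it well below 1, but the cache does so without paying for a faster link.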
Reminders:
- Network-core devices do not function at the application layer but rather at the network layer and below
- Application software is confined to the end systems
MX Record
permits a company's mail server and web server to have identical (aliased) hostnames
DNS Vulnerabilities
DDoS bandwidth-flooding attacks against DNS servers
DNS Messages
DNS query and reply messages have the same format:
- header section (first 12 bytes): includes an identification field (a 16-bit number that identifies the query; it is copied into the reply so the client can match replies to queries), a flags field, and counts of the entries in each of the four sections that follow
- question section: contains information about the query being made: 1. a name field containing the name being queried; 2. a type field indicating the type of question being asked about the name
- answer section: in a reply from a DNS server, contains the resource records for the name that was originally queried; a reply can return multiple RRs in the answer, since a hostname can have multiple IP addresses
- authority section: contains records of other authoritative servers
- additional section: contains other helpful records, such as the IP address (Type A record) for the canonical hostname of a mail server returned in an MX reply
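The header and question sections can be illustrated by packing a query by hand with Python's struct module. This is a sketch under stated assumptions: the query ID 0x1234 is arbitrary, the "recursion desired" flag (0x0100), the length-prefixed label encoding and the class value 1 (Internet) follow RFC 1035 but go beyond these notes, and actually sending the bytes to a server on UDP port 53 is left out.

```python
import struct


def build_dns_query(query_id: int, hostname: str, qtype: int = 1) -> bytes:
    # Header: ID, flags, then four 16-bit counts (questions, answers,
    # authority records, additional records) = 12 bytes in network byte order.
    flags = 0x0100  # assumed: standard query with "recursion desired" set
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)

    # Question section, name field: each label is prefixed by its length,
    # and a zero byte ends the name.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in hostname.split("."))
    qname += b"\x00"

    # Question section, type field (1 = A record), then class (1 = Internet).
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question


query = build_dns_query(0x1234, "www.example.com")
print(len(query), query[:12].hex())
```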
Web Browsers
Internet Explorer, Firefox, Chrome, Opera and Safari implement the client side of HTTP; in this context, "browser" and "client" are used interchangeably.
General Format of an HTTP Req. Message
request line: method sp URL sp version cr lf
header lines: header field name: value cr lf (repeated once per header line)
blank line: cr lf
entity body
(sp = space, cr = carriage return, lf = line feed)
HTTP Response Message Format
status line: version sp status code sp phrase cr lf
header lines: header field name: value cr lf (repeated once per header line)
blank line: cr lf
entity body
(sp = space, cr = carriage return, lf = line feed)
Timing Guarantee
The transport-layer protocol can also provide timing guarantees, e.g., a guarantee that every bit the sender pumps into the socket arrives at the receiver's socket no more than 100 msec later. Such guarantees are appealing to interactive real-time applications such as Internet telephony, teleconferencing and multiplayer games.
IP Address
a 32-bit quantity that we can think of as uniquely identifying the host. For a process running on one host to send packets to a process running on another host, the receiving process needs an address; the host is identified by its IP address, and the receiving process within that host by a port number.
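To see that an IP address really is a 32-bit quantity, Python's standard ipaddress module can convert the familiar dotted-decimal form to and from a single integer. The address used here is a hypothetical example.

```python
import ipaddress

# Hypothetical host address in the usual dotted-decimal notation.
addr = ipaddress.IPv4Address("128.119.245.12")

as_int = int(addr)        # the same address as one unsigned 32-bit integer
as_bytes = addr.packed    # or as 4 bytes (4 x 8 = 32 bits) in network byte order

print(as_int, len(as_bytes), ipaddress.IPv4Address(as_int))
```

Each dotted-decimal number is one byte of the address, which is why every part lies between 0 and 255.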
Registrar
a commercial entity that verifies the uniqueness of the domain name, enters the domain name into the DNS database and collects a small fee for its services
clientSocket.close() (TCPClient.py/client side)
after printing the capitalized sentence, we close the client's socket; this closes the TCP connection between the client and the server
Distributed Hash Table (DHT)
another application of P2P: a simple database, with the database records distributed over the peers in a P2P system. DHTs have been widely implemented and are the subject of extensive research.
BitTorrent
as of 2016, the most popular P2P protocol for file distribution. Originally developed by Bram Cohen; there are now many different independent BitTorrent clients conforming to the BitTorrent protocol. It is a complicated protocol, and the ecosystem is wildly successful, with millions of simultaneous peers actively sharing files in hundreds of thousands of torrents.
serverSocket = socket(AF_INET, SOCK_STREAM) (TCPServer.py/server side)
creates the server's socket, a TCP socket (SOCK_STREAM) over IPv4 (AF_INET)
Electronic Mail
has three major components: user agents, mail servers and the Simple Mail Transfer Protocol (SMTP)
Application Architecture
is distinct from the network architecture; it is designed by the application developer and dictates how the application is structured over the various end systems. An application developer will likely draw on one of the two predominant architectural paradigms used in modern network applications: the client-server architecture or the peer-to-peer (P2P) architecture.
Root DNS Server
over 400 root name servers scattered all over the world, managed by 13 different organizations; they provide the IP addresses of the TLD servers
SMTP Limitations
restricts the body (not just the headers) of all mail messages to simple 7-bit ASCII, which made sense in the early 1980s when transmission capacity was scarce. It requires binary multimedia data to be encoded to ASCII before being sent over SMTP, and the corresponding message to be decoded back to binary after SMTP transport.
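Base64 is one common way to perform this binary-to-ASCII encoding (MIME defines it as a content-transfer-encoding). The notes do not name a specific scheme, so treat its use here as an illustrative choice; the "multimedia data" is a stand-in made of every possible byte value.

```python
import base64

# Stand-in for binary multimedia data: every byte value, most of them not 7-bit ASCII.
binary_data = bytes(range(256))

# Sender: encode to ASCII before handing the message to SMTP.
encoded = base64.b64encode(binary_data)

# Receiver: decode back to binary after SMTP transport.
decoded = base64.b64decode(encoded)

print(max(binary_data), max(encoded), len(encoded))
```

The price of fitting into 7-bit ASCII is size: every 3 input bytes become 4 output characters.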
serverPort = 12000 (UDPClient.py/client side)
sets the integer variable serverPort to 12000
Total Response Time
the time from the browser's request of an object until its receipt of the object; the sum of the LAN delay, the access delay and the internet delay (total response time = LAN delay + access delay + internet delay)
message, clientAddress = serverSocket.recvfrom(2048) (UDPServer.py/server side)
when a packet arrives at the server's socket, the packet's data is put into the variable message and the packet's source address is put into the variable clientAddress. clientAddress contains both the client's IP address and the client's port number; UDPServer makes use of this address information, as it provides a return address to which the server should direct its reply.
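A minimal, single-process sketch of the UDPClient/UDPServer exchange, assuming both ends run on the loopback interface. Unlike the notes' fixed port 12000, the server binds to port 0 so the OS picks any free port (an assumption to avoid collisions), and the message text is made up.

```python
from socket import socket, AF_INET, SOCK_DGRAM

# Server side: create a UDP socket and bind it so clients know where to send.
serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(("127.0.0.1", 0))   # port 0: OS picks a free port (notes use 12000)
serverAddress = serverSocket.getsockname()

# Client side: no bind(); the OS assigns the client's port on the first sendto().
clientSocket = socket(AF_INET, SOCK_DGRAM)
clientSocket.sendto("hello udp".encode(), serverAddress)
clientPort = clientSocket.getsockname()[1]

# Server: data goes into message, the sender's (IP, port) into clientAddress.
message, clientAddress = serverSocket.recvfrom(2048)
modifiedMessage = message.decode().upper()
# clientAddress serves as the return address for the reply.
serverSocket.sendto(modifiedMessage.encode(), clientAddress)

# Client: receive the capitalized reply.
reply, _ = clientSocket.recvfrom(2048)
print(reply.decode(), clientAddress)

clientSocket.close()
serverSocket.close()
```

The port in clientAddress matches the one the OS silently assigned to clientSocket, which is exactly the return address the server needs.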
Network Architecture
(from a developer's perspective) the network architecture is fixed and provides a specific set of services to applications
Web Cache/Hit Rate Consideration
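average delay = (hit rate) * (delay when the cache satisfies the request) + (1 - hit rate) * (delay when the request must be forwarded to the origin server, i.e., the total response time)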
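A small worked calculation that weights the delays as hit_rate × (delay on a hit) + (1 − hit_rate) × (delay on a miss), with a miss paying the full total response time. All delays and the 0.4 hit rate are hypothetical values, and the access delay is assumed negligible once the cache has lowered the link's traffic intensity.

```python
def average_delay(hit_rate: float, hit_delay: float, miss_delay: float) -> float:
    """(hit rate) * (delay on a hit) + (1 - hit rate) * (delay on a miss)."""
    return hit_rate * hit_delay + (1 - hit_rate) * miss_delay


# Hypothetical delays in seconds.
lan_delay = 0.01        # cache hit: the object comes from the cache on the LAN
access_delay = 0.0      # assumed negligible once the cache reduces link traffic
internet_delay = 2.0    # time spent reaching the origin server

# A miss pays the full total response time.
miss_delay = lan_delay + access_delay + internet_delay

avg = average_delay(0.4, lan_delay, miss_delay)
print(f"{avg:.2f} s")
```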
Cluster Selection Strategies
1. geographically closest cluster: works reasonably well for a large fraction of clients, but may perform poorly for some, since the geographically closest cluster may not be the closest cluster in terms of the length or number of hops of the network path; some end users are configured to use remotely located LDNSs, which may be far from the client's location; and this strategy ignores the variation in delay and available bandwidth over time of Internet paths
2. best cluster based on current traffic conditions: CDNs perform periodic real-time measurements of delay and loss performance between their clusters and clients; a complication is that many LDNSs are configured not to respond to such probes
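The two strategies reduce to minimizing different metrics. In this sketch the cluster names, distances and RTTs are hypothetical, and a single RTT value stands in for the richer delay-and-loss measurements a real CDN would collect.

```python
from typing import Dict

# Hypothetical clusters: distance from the client and latest measured round-trip time.
clusters: Dict[str, Dict[str, float]] = {
    "cluster-a": {"distance_km": 50, "rtt_ms": 80},
    "cluster-b": {"distance_km": 900, "rtt_ms": 25},
    "cluster-c": {"distance_km": 400, "rtt_ms": 60},
}


def closest_geographically(clusters: Dict[str, Dict[str, float]]) -> str:
    """Strategy 1: pick the cluster with the smallest geographic distance."""
    return min(clusters, key=lambda name: clusters[name]["distance_km"])


def best_measured(clusters: Dict[str, Dict[str, float]]) -> str:
    """Strategy 2: pick the cluster with the best recently measured delay."""
    return min(clusters, key=lambda name: clusters[name]["rtt_ms"])


print(closest_geographically(clusters), best_measured(clusters))
```

The numbers are chosen so the strategies disagree: the geographically closest cluster is not the one with the shortest network path.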
Common Status Codes
200 OK: the request succeeded and the information is returned in the response
301 Moved Permanently: the requested object has been permanently moved; the new URL is specified in the Location: header of the response message, and the client software will automatically retrieve the new URL
400 Bad Request: a generic error code indicating that the request could not be understood by the server
404 Not Found: the requested document does not exist on this server
505 HTTP Version Not Supported: the requested HTTP protocol version is not supported by the server
Unchoked
BitTorrent uses a clever trading algorithm: a peer gives priority to the neighbors currently supplying it data at the highest rate (a rate it continually measures) and reciprocates by sending chunks to those same peers; these neighbors are said to be unchoked
Resource Records
DNS servers store resource records (RRs), including RRs that provide hostname-to-IP address mappings; each DNS reply message carries one or more RRs. An RR is a four-tuple (Name, Value, Type, TTL):
- Name and Value: their meaning depends on Type
- TTL: time to live; determines when a resource record should be removed from a cache
- Type = A: Name is a hostname and Value is its IP address (the standard hostname-to-IP address mapping)
- Type = NS: Name is a domain and Value is the hostname of an authoritative DNS server that knows how to obtain the IP addresses for hosts in the domain
- Type = CNAME: Value is the canonical hostname for the alias hostname Name
- Type = MX: Value is the canonical name of a mail server that has the alias hostname Name
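To make the four-tuple concrete, this sketch models RRs as Python named tuples and follows a CNAME record from an alias to its canonical hostname before returning the A record. Every hostname, address and TTL below is hypothetical; only the (Name, Value, Type, TTL) layout and the meaning of each type come from the notes.

```python
from typing import List, NamedTuple


class ResourceRecord(NamedTuple):
    name: str
    value: str
    type: str   # "A", "NS", "CNAME" or "MX"
    ttl: int    # seconds before a cached copy should be removed


# Hypothetical records held by a DNS server.
records = [
    ResourceRecord("www.foo.com", "relay1.bar.foo.com", "CNAME", 3600),
    ResourceRecord("relay1.bar.foo.com", "145.37.93.126", "A", 3600),
    ResourceRecord("foo.com", "dns.foo.com", "NS", 86400),
    ResourceRecord("foo.com", "mail.bar.foo.com", "MX", 3600),
]


def lookup(name: str, rtype: str) -> List[str]:
    """Return the Value field of every record matching (Name, Type)."""
    return [rr.value for rr in records if rr.name == name and rr.type == rtype]


def resolve_address(name: str) -> str:
    """Follow CNAME records to the canonical hostname, then return its A record."""
    while True:
        canonical = lookup(name, "CNAME")
        if not canonical:
            return lookup(name, "A")[0]
        name = canonical[0]


print(resolve_address("www.foo.com"), lookup("foo.com", "MX"))
```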
Stateless Protocol
the HTTP server maintains no information about its clients; it sends requested files to clients without storing any state information about them. This simplifies server design and has permitted engineers to develop high-performance web servers that can handle thousands of simultaneous TCP connections. Because of this, HTTP is said to be a stateless protocol.
HTTP Response Message
HTTP/1.1 200 OK
Connection: close
Date: Tue, 12 Aug 2018 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 11 Aug 2018 12:03:02 GMT
Content-Length: 6821
Content-Type: text/html

(data data data data ...)
has three sections: an initial status line, six header lines and then the entity body
the status line contains three fields: the protocol version field, the status code and the corresponding status message
Date: the time and date when the HTTP response was created and sent by the server; not the time when the object was created or last modified, but the time when the server retrieves the object from its file system, inserts it into a response message and sends the response message
Last-Modified: the time and date when the object was created or last modified; critical for object caching, both in the local client and in network cache servers (proxy servers)
Content-Length: the number of bytes in the object being sent; also how HTTP determines where the message ends
Content-Type: indicates the object type (here, HTML text); the object type is officially indicated by this header, NOT by the file extension
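A minimal parser for a response like this one, assuming the entire message is already in memory as bytes. It splits off the status line, collects the header lines into a dictionary and uses Content-Length to decide where the entity body ends; the trailing "EXTRA" bytes are a made-up stand-in for whatever follows on the same connection.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    # A blank line (cr lf cr lf) separates the header lines from the entity body.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: version, status code, phrase (the phrase may contain spaces).
    version, status_code, phrase = lines[0].split(" ", 2)

    # Header lines: "field name: value"; field names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Content-Length tells the receiver how many body bytes belong to this message.
    length = int(headers.get("content-length", len(rest)))
    return version, int(status_code), phrase, headers, rest[:length]


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Connection: close\r\n"
       b"Content-Length: 5\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"helloEXTRA")

version, code, phrase, headers, body = parse_response(raw)
print(version, code, phrase, headers["content-type"], body)
```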
Header Lines
Host: www.example.com = specifies the host on which the object resides; required by web proxy caches
Connection: close = tells the server that the browser does not want a persistent connection; the server should close the connection after sending the requested object
User-agent: = specifies the user agent (browser type) that is making the request to the server (e.g., Mozilla/5.0)
Accept-language: = indicates which language (English, French, etc.) the user prefers to receive the object in; if no such version exists, the server sends its default version
A browser generates header lines as a function of the browser type and version, the user's configuration of the browser, and whether the browser currently has a cached but possibly out-of-date version of the object; web servers behave similarly. *Browsers can insert many, many more header lines; we have covered only a small number of the totality of header lines.
Secure Sockets Layer (SSL)
NOT a third Internet transport protocol on the same level as TCP and UDP, but an ENHANCEMENT of TCP, with the enhancements implemented in the application layer. TCP-enhanced-with-SSL does everything that traditional TCP does but also provides critical process-to-process security services, including encryption, data integrity and end-point authentication. An application that wants SSL must include SSL code (existing, highly optimized libraries and classes) in both its client and server sides. SSL has its own socket API, similar to the traditional TCP socket API:
1. the sending process passes clear-text data to the SSL socket
2. SSL in the sending host encrypts the data
3. SSL passes the encrypted data to the TCP socket
4. the encrypted data travels over the Internet to the TCP socket in the receiving process
5. the receiving TCP socket passes the encrypted data to SSL
6. SSL decrypts the data
7. SSL passes the clear-text data through its SSL socket to the receiving process
(neither TCP nor UDP provides any encryption on its own)
Transport-Layer Protocol
Networks, including the Internet, provide more than one transport-layer protocol. A developer chooses which protocol to use based on the services each protocol provides and whether those services best fit the application's needs.
HTTP Message Formats: Request Message
RFCs 1945, 2616, 7540
GET /somedir/page.html HTTP/1.1
Host: www.example.com
Connection: close
User-Agent: Mozilla/5.0
Accept-language: fr
The first line of an HTTP request message is called the request line; it has three fields: the method field, the URL field and the HTTP version field. The subsequent lines are called the header lines.
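The example request can be assembled programmatically. This sketch (the function name build_request is hypothetical) joins the request line, one header line per field and the blank cr lf line; a GET like this one has an empty entity body.

```python
from typing import Dict


def build_request(method: str, url: str, headers: Dict[str, str], body: str = "") -> bytes:
    request_line = f"{method} {url} HTTP/1.1\r\n"
    header_lines = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    # The extra cr lf marks the end of the header lines.
    return (request_line + header_lines + "\r\n" + body).encode("ascii")


request = build_request("GET", "/somedir/page.html", {
    "Host": "www.example.com",
    "Connection": "close",
    "User-Agent": "Mozilla/5.0",
    "Accept-language": "fr",
})
print(request.decode("ascii"))
```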
IMAP
RFC 3501; a mail access protocol with many more features than POP3, but significantly more complex (client and server implementations are significantly more complex)
- provides commands that allow users to create folders and move messages from one folder to another
- provides commands that allow users to search remote folders for messages matching specific criteria
- the server maintains user state information across IMAP sessions
- has commands that permit a user agent to obtain components of a message (e.g., just the header, or just one part of a multipart MIME message), which is useful for a low-bandwidth connection between the user agent and its mail server
Top-level Domain (TLD) Servers
TLDs = .com, .org, .net, .edu and .gov, plus all the country top-level domains such as uk, fr, ca and jp; for each TLD there is a TLD server or server cluster. Verisign Global Registry Services maintains the TLD servers for com; Educause maintains the TLD servers for edu.
Security Guarantee
A transport protocol can provide an application with one or more security services. E.g., in the sending host, a transport protocol can encrypt all data transmitted by the sending process, and in the receiving host, the transport-layer protocol can decrypt the data before delivering it to the receiving process. This provides confidentiality between the two processes, even if the data is somehow observed between the sending and receiving processes. Security services can also include data integrity and end-point authentication (Ch. 8).
Network Applications
Two types:
1. an implementation of a protocol standard defined in an RFC or some other standards document; sometimes referred to as open, since the rules specifying its operation are known to all. For such an implementation, the client and server programs must conform to the rules dictated by the RFC.
2. proprietary: the client and server programs employ an application-layer protocol that has not been openly published in an RFC or elsewhere. A single developer or team creates both the client and server programs and has complete control over what goes into the code; because the code does not implement an open protocol, other independent developers will not be able to develop code that interoperates with the application.
Things developers have to decide/keep in mind: 1. use TCP or UDP? 2. whether to use or avoid the well-known port numbers associated with a protocol (a proprietary application should avoid the well-known ports defined in RFCs)
HTTP Request and Response
User requests a web page; the browser sends HTTP request messages for the objects in the page to the server; the server receives the requests and responds with HTTP response messages that contain the objects.
- HTTP uses TCP as its underlying transport protocol
- the HTTP client first initiates a TCP connection with the server
- once the connection is established, the browser and server processes access TCP through their socket interfaces
- the client sends HTTP request messages into its socket interface and receives HTTP response messages from its socket interface
- the HTTP server receives request messages from its socket interface and sends response messages into its socket interface
- once the client sends a message into its socket interface, the message is out of the client's hands and is in the hands of TCP
- TCP provides a reliable data transfer service to HTTP, ensuring that each HTTP request message sent by the client process eventually arrives intact at the server, and each HTTP response message sent by the server process eventually arrives intact at the client
(the socket interface is the door between the client process and the TCP connection; on the server side, it is the door between the server process and the TCP connection)
Domain Name System (DNS)
a directory service that translates hostnames to IP addresses. DNS is:
1. a distributed database implemented in a hierarchy of DNS servers (often UNIX machines running the Berkeley Internet Name Domain (BIND) software)
2. an application-layer protocol that allows hosts to query the distributed database
- commonly employed by other application-layer protocols, including HTTP and SMTP, to translate user-supplied hostnames to IP addresses
- runs over UDP and uses port 53
- adds an additional delay (sometimes substantial) to the Internet applications that use it; IP addresses are often cached in a "nearby" DNS server, which helps to reduce DNS network traffic and the average DNS delay
- RFCs 1034 and 1035; a complex system of which we can only touch on key aspects
- NOT an application with which a user directly interacts; instead, it provides a core Internet function
- an application-layer protocol because it runs between communicating end systems using the client-server paradigm and relies on an underlying end-to-end transport protocol to transfer DNS messages between them
- DNS servers are distributed around the globe
Host Aliasing
a host with a complicated hostname, like relay1.west-coast.enterprise.com, can have one or more alias hostnames (e.g., enterprise.com and www.enterprise.com); the complicated name is said to be the canonical hostname. Alias hostnames are typically more mnemonic than canonical hostnames. DNS can be invoked by an application to obtain the canonical hostname for a supplied alias hostname, as well as the IP address of the host.
User Datagram Protocol (UDP)
a no-frills, lightweight transport protocol providing minimal services. It is connectionless, meaning there is no handshaking before the two processes start to communicate, and it provides an unreliable data transfer service. It does not include a congestion-control mechanism, so the sending side of UDP can pump data into the layer below (the network layer) at any rate it pleases (although the actual end-to-end throughput may be less than this rate due to the limited transmission capacity of intervening links or due to congestion). Many firewalls are configured to block most types of UDP traffic, so some applications use TCP as a backup if UDP fails.
Bandwidth Bottleneck
a phenomenon where the performance of a network is limited because not enough bandwidth is available to ensure that all data packets in the network reach their destination in a timely fashion.
Conditional GET
a problem with caching is that the copy of an object in a cache may be stale: the object housed in the web server may have been modified since the copy was cached at the client. An HTTP mechanism known as the conditional GET allows a cache to verify that its objects are up to date. An HTTP request message is a so-called conditional GET message if: 1. the request message uses the GET method and 2. the request message includes an If-Modified-Since: header line.
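A sketch of the server's side of a conditional GET, assuming the cache stored the object's Last-Modified value and sends it back in If-Modified-Since. The 304 Not Modified reply (sent without the object) is the standard HTTP response for an unchanged object, although it is not in these notes' list of status codes; the function name and dates are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Tuple


def answer_conditional_get(last_modified: datetime, if_modified_since: str) -> Tuple[int, str]:
    """Decide the server's reply to a GET that carries an If-Modified-Since: header."""
    cached_time = parsedate_to_datetime(if_modified_since)
    if last_modified <= cached_time:
        return 304, "Not Modified"   # cache's copy is current; no object in the body
    return 200, "OK"                 # object changed; send the new version


# Hypothetical time the object was last modified on the server.
last_modified = datetime(2018, 8, 11, 12, 3, 2, tzinfo=timezone.utc)
# The cache saved this value from the Last-Modified: header of the earlier response.
cached_stamp = format_datetime(last_modified, usegmt=True)

print(cached_stamp)
print(answer_conditional_get(last_modified, cached_stamp))
# Later, the object is modified again on the server.
print(answer_conditional_get(datetime(2018, 9, 1, tzinfo=timezone.utc), cached_stamp))
```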
Video
a sequence of images, typically displayed at a constant rate, e.g., 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. Video can be compressed, trading off video quality against bit rate; the higher the bit rate, the better the image quality and the better the overall user viewing experience.
Guaranteed Available Throughput
a service that the transport layer could provide, wherein an application requests a guaranteed throughput of r bits/sec and the transport protocol then ensures that the available throughput is always at least r bits/sec. If the transport protocol is unable to provide this throughput, the application would need to encode at a lower rate or give up.
Special Mail Access Protocols
a user agent running on a local PC can't use SMTP to obtain messages, because obtaining messages is a pull operation, whereas SMTP is a push protocol. A special mail access protocol that transfers messages from a user's mail server to their local PC is needed to complete this task; popular mail access protocols currently include Post Office Protocol - Version 3 (POP3), Internet Mail Access Protocol (IMAP) and HTTP.
HTTP Streaming
a video is simply stored at an HTTP server as an ordinary file with a specific URL. When the user wants to see the video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL; the server then sends the video file within an HTTP response message. On the client side, the bytes are collected in a client application buffer; once the number of bytes in this buffer exceeds a predetermined threshold, the client application begins playback. The video streaming application thus displays video while it is still receiving and buffering frames corresponding to later parts of the video.
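A toy simulation of the client-side buffer threshold. The chunk size, arrival schedule and 1 MB threshold are hypothetical, and real players keep refilling and draining the buffer during playback, which this sketch ignores.

```python
from typing import Iterable, Optional


def playback_start_time(chunk_arrivals: Iterable[float], chunk_bytes: int,
                        threshold_bytes: int) -> Optional[float]:
    """Time at which the buffered bytes first exceed the threshold (None if never)."""
    buffered = 0
    for arrival_time in chunk_arrivals:
        buffered += chunk_bytes              # bytes collected in the application buffer
        if buffered > threshold_bytes:
            return arrival_time              # playback begins; download continues
    return None


# Hypothetical: a 100 kB chunk arrives every 0.5 s; playback waits for more than 1 MB.
arrivals = [0.5 * i for i in range(1, 31)]
start = playback_start_time(arrivals, 100_000, 1_000_000)
print(start)
```

A larger threshold delays the start of playback but leaves more slack against later throughput dips.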
connectionSocket.close() (TCPServer.py/server side)
after sending the modified sentence to the client, we close the connection socket but since serverSocket remains open, another client can now knock on the door and send the server a sentence to modify
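A self-contained sketch of this TCPServer behaviour, run in one process with the server loop in a background thread. The loopback address, the OS-chosen port (instead of 12000), the two-client limit and the sentences are assumptions for the demo; a single recv() per message is enough for these short strings on loopback, but robust code would loop until the whole message has arrived.

```python
import threading
from socket import socket, AF_INET, SOCK_STREAM

# Welcoming socket: the "door" every client knocks on first.
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(("127.0.0.1", 0))        # port 0: OS picks a free port (notes use 12000)
serverSocket.listen(1)
serverPort = serverSocket.getsockname()[1]


def serve(num_clients: int) -> None:
    for _ in range(num_clients):
        # accept() creates a new connection socket dedicated to one client.
        connectionSocket, addr = serverSocket.accept()
        sentence = connectionSocket.recv(1024).decode()
        connectionSocket.send(sentence.upper().encode())
        connectionSocket.close()           # ends this connection; serverSocket stays open


server_thread = threading.Thread(target=serve, args=(2,))
server_thread.start()

replies = []
for sentence in ["first client", "second client"]:
    clientSocket = socket(AF_INET, SOCK_STREAM)
    clientSocket.connect(("127.0.0.1", serverPort))   # TCP handshake happens here
    clientSocket.send(sentence.encode())
    replies.append(clientSocket.recv(1024).decode())
    clientSocket.close()

server_thread.join()
serverSocket.close()
print(replies)
```

Because only connectionSocket is closed after each reply, the second client can still connect through the same welcoming socket.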
TCP: Connection-Oriented Service
after the handshaking phase, a TCP connection is said to exist between the sockets of the two processes the connection is a full-duplex connection in that the two processes can send messages to each other over the connection at the same time when the application finishes sending messages, it must tear down the connection
User Agents
allow users to read, reply to, forward, save and compose messages; Microsoft Outlook and Apple Mail are examples of user agents for e-mail
DELETE Method
allows a user, or an application, to delete an object on a web server
Web Page
also called a document; consists of objects. Most web pages consist of a base HTML file and several referenced objects.
Web Cache
also called a proxy server; a network entity that satisfies HTTP requests on behalf of an origin web server. It has its own disk storage and keeps copies of recently requested objects in this storage. It is both a client and a server at the same time, and is typically purchased and installed by an ISP.
on a cache miss: user request > proxy > origin server; server response > proxy > user
Server
an always-on host with a fixed, well-known address. Servers tend to be powerful machines that store and distribute web pages, stream video, relay e-mail, etc., and reside in large data centers. A server services requests from many other hosts, called clients (desktops, laptops, smartphones, etc.).
Internet Delay
amount of time it takes from when a router on the internet side of the access link forwards an HTTP request until it receives the response
Network Applications vs Application Layer Protocols
an application-layer protocol is only one piece of a network application The Web is a network application and HTTP is an application-layer protocol E-mail = application, SMTP = protocol
Tracker
an infrastructure node in each torrent that keeps track of the peers participating in that torrent. When a peer joins a torrent, it registers itself with the tracker and periodically informs the tracker that it is still in the torrent.
Socket
any message sent from one process to another must go through the underlying network. A process sends messages into, and receives messages from, the network through a software interface called a socket.
- the socket is the interface between the application layer and the transport layer within a host
- also referred to as the Application Programming Interface (API) between the application and the network, since the socket is the programming interface with which network applications are built
- an application developer has control of everything on the application-layer side of the socket but little control of the transport-layer side, except 1. the choice of transport protocol and 2. perhaps the ability to fix a few transport-layer parameters such as maximum buffer and maximum segment sizes
- the application at the sending side pushes messages through the socket; at the other side of the socket, the transport-layer protocol has the responsibility of getting the messages to the socket of the receiving process
*a process is analogous to a house and its socket is analogous to its door
Bandwidth-Sensitive Applications
applications that have throughput requirements. Many current multimedia applications are bandwidth sensitive, although some may use adaptive coding techniques to encode digitized voice or video at a rate that matches the currently available throughput. If the transport protocol cannot provide the required throughput, the application would need to encode at a lower rate (and receive enough throughput to sustain this lower coding rate) or may have to give up.
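A sketch of the adaptive-coding decision: pick the highest encoding rate the currently available throughput can sustain, or give up if none fits. The set of encoding rates is hypothetical, and the available throughput is taken as an input rather than measured.

```python
from typing import List, Optional


def choose_encoding_rate(available_bps: float, rates_bps: List[int]) -> Optional[int]:
    """Highest encoding rate the current throughput can sustain, or None (give up)."""
    sustainable = [rate for rate in rates_bps if rate <= available_bps]
    return max(sustainable) if sustainable else None


# Hypothetical encodings of the same video, in bits per second.
rates = [300_000, 1_000_000, 3_000_000]
print(choose_encoding_rate(2_500_000, rates))   # falls back to a lower rate
print(choose_encoding_rate(200_000, rates))     # not even the lowest rate fits
```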
The World Wide Web
arrived in the early 1990s; the first Internet application that caught the general public's eye. It dramatically changed, and continues to change, how people interact inside and outside their work environments. It operates on demand: users receive what they want, when they want it.
serverSocket.bind(('', serverPort)) (TCPServer.py/server side)
associates the server port number, serverPort, with this socket
Cluster Selection Strategy
a mechanism for dynamically directing clients to a server cluster or data center within the CDN; at the core of any CDN deployment
serverSocket.sendto(modifiedMessage.encode(), clientAddress) (UDPServer.py/server side)
attaches the client's address (IP and port number) to the capitalized message (after converting the string to bytes) and sends the resulting packet into the server's socket
Nslookup Program
available for most Windows and UNIX platforms; allows you to send a DNS query message directly from the host you're working on to some DNS server. After invoking nslookup from the command prompt, you can send a DNS query to any DNS server (root, TLD or authoritative).
SMTP vs HTTP
both are used to transfer files from one host to another:
- HTTP transfers files (objects) from a web server to a web client (browser)
- SMTP transfers files (e-mail messages) from one mail server to another mail server
- when transferring files, both persistent HTTP and SMTP use persistent connections
- HTTP is a pull protocol: someone loads information on a web server and users use HTTP to pull the information from the server at their convenience (the TCP connection is initiated by the machine that wants to receive the file)
- SMTP is a push protocol: the sending mail server pushes the file to the receiving mail server (the TCP connection is initiated by the machine that wants to send the file)
- SMTP requires each message to be in 7-bit ASCII format; if a message contains binary data or characters that are not 7-bit ASCII, the message has to be encoded into 7-bit ASCII. HTTP does not impose this restriction
- for a document consisting of text and images (and possibly other media types), HTTP encapsulates each object in its own HTTP response message, whereas SMTP places all of the message's objects into one message
Load Distribution
busy websites can be replicated over multiple web servers, each running on a different end system and each having a different IP address; a set of IP addresses is then associated with one canonical hostname. When a client makes a DNS query for a name mapped to a set of addresses, the server responds with the entire set of IP addresses but rotates the ordering of the addresses within each reply. Because a client usually sends its HTTP request to the address listed first, DNS rotation distributes the traffic among the replicated servers. Rotation is also used for e-mail, so that multiple mail servers can have the same alias name, and DNS is also used in web content distribution.
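A sketch of DNS rotation as a round-robin over the replica addresses. The class name, the addresses (from a private range) and the rotate-by-one-per-reply policy are assumptions for illustration; the point is that every reply lists the whole set, but a different address comes first each time.

```python
from collections import deque
from typing import Iterable, List


class RotatingNameServer:
    """Hypothetical DNS server that returns every replica address, rotating the order."""

    def __init__(self, addresses: Iterable[str]) -> None:
        self._addresses = deque(addresses)

    def reply(self) -> List[str]:
        answer = list(self._addresses)   # the entire set of IP addresses
        self._addresses.rotate(-1)       # the next reply starts with the next address
        return answer


server = RotatingNameServer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])

# Each client sends its HTTP request to the first address listed in its reply.
first_choices = [server.reply()[0] for _ in range(6)]
print(first_choices)
```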
Elastic Applications
can make use of as much or as little throughput as happens to be available (vs bandwidth sensitive applications) e-mail, file transfer and web transfers = elastic applications
TCP: Reliable Data Transfer Service
communicating processes can rely on TCP to deliver all data sent w/o error and in proper order
Network Application
consists of pairs of processes that send messages to each other over a network. For each pair of communicating processes, we typically label one of the two processes as the client and the other as the server: with the Web, a browser is a client process and a web server is a server process; with P2P file sharing, the peer that is downloading the file is labeled the client and the peer that is uploading the file is labeled the server.
Uniform Resource Locator (URL)
consists of two components: the hostname of the server that houses the object and the object's path name, e.g., in www.ex.com/path/image.gif, www.ex.com = hostname and /path/image.gif = path name
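Python's standard urllib.parse.urlsplit separates the two components. A scheme ("http://") is added to the example URL here, as an assumption, because without it urlsplit would treat the whole string as a path.

```python
from urllib.parse import urlsplit

# A scheme is prepended so urlsplit recognizes the host name.
url = urlsplit("http://www.ex.com/path/image.gif")

host_name = url.hostname   # server that houses the object
path_name = url.path       # object's path name on that server
print(host_name, path_name)
```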
HTTP Message Method Field
contained inside the request line of an HTTP request message; the method field can take several different values, including GET, POST, HEAD, PUT and DELETE. The majority of HTTP request messages use the GET method.
sentence = input('Input lowercase sentence:') (TCPClient.py/client side)
sentence holds a line typed by the user; input() continues to gather characters until the user ends the line by typing a carriage return
Client-Server Architecture
contains an always-on host, called the server, that services requests from many other hosts, called clients; clients do not directly communicate with each other. Often a single server host is incapable of keeping up with all the requests from clients, so a data center housing a large number of hosts is used instead. Some of the better-known applications with a client-server architecture include the Web, FTP, Telnet and e-mail.
serverSocket = socket(AF_INET, SOCK_DGRAM) (UDPServer.py/server side)
creates a socket type of SOCK_DGRAM (a UDP socket)
clientSocket = socket(AF_INET, SOCK_DGRAM) (UDPClient.py/client side)
creates the client's socket, called clientSocket. AF_INET indicates that the underlying network is using IPv4; SOCK_DGRAM indicates a UDP socket. *Note: we do not specify the port number of the client socket when we create it; we let the OS do this for us.
clientSocket = socket(AF_INET, SOCK_STREAM) (TCPClient.py/client side)
creates the client's socket, called clientSocket the first parameter indicates that the underlying network is using IPv4 the second parameter indicates that the socket is a type of SOCK_STREAM - meaning it is a TCP socket rather than UDP
DNS Caching
a critically important feature of the DNS system: DNS exploits caching to improve delay performance and to reduce the number of DNS messages ricocheting around the Internet. In a query chain, when a DNS server receives a DNS reply, it can cache the mapping in its local memory. DNS servers discard cached information after a period of time (often set to two days). Because information is cached locally, a DNS server saves time and resources by not having to query other DNS servers. Local DNS servers can also cache the IP addresses of TLD servers, allowing them to bypass the root DNS servers in a query chain; in fact, root servers are bypassed for all but a very small fraction of DNS queries.
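A sketch of a DNS server's cache in which each mapping is discarded once its TTL expires. The class name, the injectable clock (used so the demo does not have to wait two days), the hostname and the documentation-range address are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DnsCache:
    """Hostname-to-address cache whose entries are discarded when their TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}   # hostname -> (address, expiry)

    def put(self, hostname: str, address: str, ttl: float) -> None:
        self._entries[hostname] = (address, self._clock() + ttl)

    def get(self, hostname: str) -> Optional[str]:
        entry = self._entries.get(hostname)
        if entry is None:
            return None                       # never cached: query other DNS servers
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[hostname]       # stale: discard and query again
            return None
        return address


# A fake clock (in seconds) lets the demo skip ahead in time.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.1", ttl=2 * 24 * 3600)   # hypothetical address

first = cache.get("www.example.com")    # answered locally from the cache
now[0] = 2 * 24 * 3600                  # two days later
second = cache.get("www.example.com")   # expired, so the server must ask again
print(first, second)
```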
Simple Mail Transfer Protocol (SMTP)
defined in RFC 5321; much older than HTTP (the original SMTP RFC dates back to 1982, and SMTP was around long before that), so it is a legacy technology that possesses certain archaic characteristics.
- the principal application-layer protocol for Internet electronic mail
- uses the reliable data transfer service of TCP to transfer mail from the sender's mail server to the recipient's mail server
- has two sides: a client side, which executes on the sender's mail server, and a server side, which executes on the recipient's mail server; both sides run on every mail server (when sending, a mail server acts as a client; when receiving, as a server)
- does not normally use intermediate mail servers for sending mail, regardless of the distance between the sender's and receiver's mail servers
- uses port 25
Cookies
defined in RFC 6265; cookies allow sites to keep track of users. Because an HTTP server is stateless (keeps no information about users), HTTP uses cookies to identify users for various reasons. Most major commercial websites use cookies today. Cookies can be used to create a user session layer on top of stateless HTTP; they can enhance a user's experience, but they are controversial and often considered an invasion of privacy.
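A sketch of how a server can layer per-user state on top of stateless HTTP with a cookie, using the standard http.cookies.SimpleCookie class to build and parse the header values. The cookie name "id", the starting ID 1678 and the in-memory dictionary (standing in for the site's back-end database) are hypothetical.

```python
from http.cookies import SimpleCookie
from itertools import count
from typing import Dict, Optional, Tuple

next_id = count(1678)        # hypothetical ID generator on the web server
visits: Dict[str, int] = {}  # back-end database: cookie ID -> number of visits


def handle_request(cookie_header: Optional[str]) -> Tuple[Optional[str], int]:
    """Return (Set-Cookie header line or None, visit count) for one request."""
    incoming = SimpleCookie(cookie_header or "")
    if "id" in incoming:
        user_id = incoming["id"].value          # returning user: look up their state
        visits[user_id] = visits.get(user_id, 0) + 1
        return None, visits[user_id]
    user_id = str(next(next_id))                # new user: create an entry and an ID
    visits[user_id] = 1
    outgoing = SimpleCookie()
    outgoing["id"] = user_id
    return outgoing.output(), 1                 # "Set-Cookie: id=..." header line


# First visit: no Cookie: header, so the server issues an ID.
set_cookie, first_count = handle_request(None)
# The browser stores the value and sends it back in a Cookie: header later on.
stored = set_cookie.split(": ", 1)[1]
_, second_count = handle_request(stored)
print(set_cookie, first_count, second_count)
```

HTTP itself still carries no state: the server only recognizes the user because the browser echoes the ID back.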
Application-Layer Protocol
defines how an application's processes, running on different end systems, pass messages to each other. An application-layer protocol defines: 1. the types of messages exchanged (e.g., request or response messages) 2. the syntax of the various message types 3. the semantics of the fields (the meaning of the information in the fields) 4. the rules for determining when and how a process sends messages and responds to messages
Access Delay
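the delay on the access link between the two routers at either end of it (the institutional router and the Internet router); from a queuing model, access delay = Δ / (1 − Δβ), where Δ = L/R is the average time to send an object over the access link (transmission delay) and β = the arrival rate of objects to the access link (requests/sec). Δβ is the traffic intensity on the access link; as it approaches 1, the access delay grows without bound.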
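A sketch of the access-delay formula, with Δ = L/R computed from a hypothetical object size and link rate. Since Δβ is the traffic intensity, the model breaks down once it reaches 1; the code reports that case as infinity, which is a convention chosen for this sketch.

```python
def access_delay(object_bits: float, access_rate_bps: float, arrival_rate: float) -> float:
    """Average access delay Δ / (1 − Δβ) from the queuing model in the notes."""
    delta = object_bits / access_rate_bps   # Δ = L/R: time to send one object on the link
    intensity = delta * arrival_rate        # Δβ: traffic intensity on the access link
    if intensity >= 1:
        return float("inf")                 # queue grows without bound
    return delta / (1 - intensity)


# Hypothetical: 1 Mbit objects on a 15 Mbps access link.
light = access_delay(1e6, 15e6, arrival_rate=9)    # Δβ = 0.6
heavy = access_delay(1e6, 15e6, arrival_rate=14)   # Δβ is about 0.93
saturated = access_delay(1e6, 15e6, arrival_rate=20)
print(light, heavy, saturated)
```

Going from 9 to 14 requests/s raises the load by about half but multiplies the access delay by six, which is why a cache that trims the request rate pays off.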
Kankan
deploys P2P video delivery with tens of millions of users every month, similar to BitTorrent file downloading. It has recently migrated to a hybrid CDN-P2P streaming system with a few hundred servers in China. A user's client requests the beginning of the content from the CDN servers and, in parallel, requests content from peers; when the total P2P traffic is sufficient for video playback, the client ceases streaming from the CDN and streams only from peers. If the P2P streaming traffic becomes insufficient, the client restarts its CDN connections and returns to hybrid CDN-P2P streaming.
Local DNS Server
does not strictly belong to the hierarchy of servers but is central to the DNS architecture. Each ISP has a local DNS server (also called a default name server). When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers, typically through DHCP. A host's local DNS server is typically close to the host.
Non-Persistent Connections
each request/response pair is sent over a separate TCP connection; each TCP connection is closed after the server sends the object, so the connection does not persist for other objects; each TCP connection transports exactly one request message and one response message (when a user requests a web page with 10 images, 11 TCP connections are generated: one for the base HTML file and one for each image) shortcomings: a brand-new connection must be established and maintained for EACH requested object; TCP buffers must be allocated and TCP variables must be kept in both the client and server for each connection, which can place a significant burden on the web server; each object suffers a delivery delay of two RTTs - one RTT to establish the TCP connection and one RTT to request and receive the object
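A rough timing sketch for non-persistent HTTP, assuming the browser fetches objects one at a time over serial connections (no parallel connections) and ignoring transmission time unless it is passed in; the function name is hypothetical.

```python
def nonpersistent_response_time(rtt, n_referenced, transmit_time=0.0):
    """Serial non-persistent HTTP: one TCP connection per object, 2 RTTs each."""
    connections = 1 + n_referenced           # base HTML file + each referenced object
    per_object = 2 * rtt + transmit_time     # handshake RTT + request/response RTT
    return connections, connections * per_object

# Page with 10 images and a 50 ms RTT: 11 connections, about 1.1 s in total.
connections, total = nonpersistent_response_time(0.05, 10)
```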
Optimistically Unchoked
every N seconds, the peer picks one additional neighbor at random and sends it chunks
Authoritative DNS Servers
every organization with publicly accessible hosts (such as web servers and mail servers) on the Internet must provide publicly accessible DNS records that map the names of those hosts to IP addresses; an organization's authoritative DNS server houses these DNS records; an organization can either house the records in its own authoritative DNS server or pay a service provider to store them; most universities and large companies implement and maintain their own primary and secondary authoritative DNS servers
serverSocket.bind(('', serverPort)) (UDPServer.py/server side)
the first line of code that is significantly different from UDPClient; binds (assigns) the port number 12000 to the server's socket, so the application developer is explicitly assigning a port number to the socket; when anyone sends a packet to port 12000 at the IP address of the server, the packet will be directed to this socket; UDPServer then enters a while loop, which allows it to receive and process packets from clients indefinitely; in the loop, UDPServer waits for a packet to arrive
Reliable Data Transfer
for many applications such as e-mail, file transfer, remote host access and financial applications, data loss can have devastating consequences to support these applications, something has to be done to guarantee that the data being sent by one end of the application is delivered correctly/completely to the other end of the application if a protocol provides such a guaranteed data delivery service, it is said to provide reliable data transfer when a transport protocol provides this service, the sending process can just pass its data into the socket and know with complete confidence that the data will arrive without errors at the receiving process
Mail Servers
form the core of the e-mail infrastructure; each user has a mailbox located in one of the mail servers; the mailbox manages and maintains the messages that have been sent to the user
Netflix
generated 37% of the downstream traffic in residential ISPs in North America in 2015; Netflix (NF) became the leading service provider for online movies and TV series in the US; has two major components: the Amazon cloud and its own private CDN infrastructure; the NF website and its associated back-end databases run entirely on Amazon servers in the Amazon cloud, which handles the following critical functions: content ingestion, content processing, and uploading versions to its CDN; NF now uses its own private CDN for video content (it still uses Akamai for its website); NF has server racks in over 50 IXP locations and in hundreds of ISP locations; NF does not employ DNS redirect; instead, NF directly tells the client to use a particular CDN server; NF uses push caching rather than pull caching: content is pushed into its servers at scheduled times during off-peak hours rather than dynamically during cache misses
Cookie Technology
has 4 components: 1. a cookie header line in the HTTP response message 2. a cookie header line in the HTTP request message 3. a cookie file kept on the user's end system and managed by the user's browser 4. a back-end database at the website; when a user first contacts a web server, the server creates a unique identification number and creates an entry in its back-end database that is indexed by the identification number; the web server then responds to the user's browser, including in the HTTP response a Set-cookie: header which contains the identification number, e.g. Set-cookie: 65155; when the browser receives the HTTP response message, it sees the Set-cookie: header and appends a line to the special cookie file that it manages; each time the user requests a web page on the site, the browser consults the cookie file, extracts the identification number for the site, and puts a Cookie: header line that includes the identification number in the HTTP request, e.g. Cookie: 65155; in this manner, the website is able to track the user's activity; used to provide services such as shopping carts and recommendations; if the user registers with the site, the cookie identification number is associated with the user (enabling one-click shopping)
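The four cookie components can be mimicked in a few lines of Python. This is a toy model under stated assumptions: the classes WebServer and Browser are hypothetical, dictionaries stand in for the back-end database and the browser's cookie file, and the header value is the bare ID number rather than a full name=value cookie.

```python
import itertools

class WebServer:
    """Hypothetical site that tracks users with an ID kept in a back-end database."""
    def __init__(self, first_id=65155):
        self._ids = itertools.count(first_id)
        self.database = {}                        # back-end DB indexed by ID number

    def handle(self, path, request_headers):
        cookie = request_headers.get("Cookie")
        if cookie is None:
            user_id = next(self._ids)             # first visit: create a unique ID
            self.database[user_id] = [path]       # new entry indexed by that ID
            return {"Set-Cookie": str(user_id)}   # response carries Set-cookie header
        self.database[int(cookie)].append(path)   # known user: record the activity
        return {}

class Browser:
    """Hypothetical browser that manages a cookie file (site -> ID number)."""
    def __init__(self):
        self.cookie_file = {}

    def get(self, site, server, path):
        headers = {}
        if site in self.cookie_file:              # consult the cookie file
            headers["Cookie"] = self.cookie_file[site]
        response = server.handle(path, headers)
        if "Set-Cookie" in response:              # remember the ID for later requests
            self.cookie_file[site] = response["Set-Cookie"]
        return response
```

After two requests from the same browser, the server's database holds both paths under one ID, which is exactly how the site follows the user's activity across otherwise stateless HTTP requests.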
P2P Architecture
has minimal or no reliance on dedicated servers in data centers; instead, the application exploits direct communication between pairs of intermittently connected hosts, called peers, that communicate without passing through a dedicated server; many of today's most popular and traffic-intensive applications are based on P2P architecture, including file sharing (e.g. BitTorrent), peer-assisted download acceleration (e.g. Xunlei) and Internet telephony and video conferencing (e.g. Skype); one of the most compelling features of P2P architectures is their self-scalability: each peer adds workload by requesting files but also adds service capacity to the system by distributing files to other peers; cost-effective, since they generally do not require significant server infrastructure and server bandwidth; challenges include security, performance and reliability due to their highly decentralized structure
Mail Message Format
header lines and body are separated by a blank line (i.e. by CRLF, carriage return and line feed); every header must have a From: header line and a To: header line; may include a Subject: header line as well as other optional header lines
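Python's standard email package builds messages in this format. A short sketch (the addresses and text are made up) that splits the message at the blank line separating header lines from the body, and also produces the CRLF form used on the wire:

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"         # required From: header line
msg["To"] = "bob@hamburger.edu"         # required To: header line
msg["Subject"] = "Lunch"                # optional Subject: header line
msg.set_content("Hi Bob,\nSee you at noon.")

text = msg.as_string()                  # default policy uses "\n" line endings
headers, body = text.split("\n\n", 1)   # header lines | blank line | body
wire = msg.as_bytes(policy=policy.SMTP) # same message with CRLF line endings
```

set_content() also adds MIME header lines (Content-Type and similar) to the header section; they sit above the same blank line.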
modifiedMessage = message.decode().upper() (UDPServer.py/server side)
heart of our simple application; takes the line sent by the client and after converting the message to a string, uses the method upper() to capitalize it
Data Center
houses a large number of hosts is often used to create a powerful virtual server the most popular internet services such as search engines, internet commerce, web based email and social networking employ one or more data centers can have hundreds of thousands of servers, all of which must be powered and maintained service providers must pay recurring interconnection and bandwidth costs for sending data from their data centers
Hostname
identifier for an Internet host, e.g. www.facebook.com; hostnames are mnemonic and are therefore appreciated by humans
Message Queue
if the sender's server cannot deliver the mail to the recipient's mail server, the message is held in a message queue and a later attempt is made to send it; reattempts are often made every 30 minutes or so; if there is no success after several days, the server removes the message and notifies the sender with an e-mail message
Web Servers
implement the server side of HTTP and house web objects, each addressable by a URL; popular web servers include Apache and Microsoft Internet Information Server
from socket import * (UDPServer.py/server side)
imports the socket module
Torrent
in BitTorrent lingo, the collection of all peers participating in the distribution of a particular file a given torrent may have fewer than ten or more than a thousand peers participating at any instant of time
Rarest First
in P2P (BitTorrent), the idea is to determine, from among the chunks not yet received, which are the rarest among neighboring peers (that is, the chunks that have the fewest repeated copies among neighbors) and then request those rarest chunks first; in this manner, the rarest chunks get redistributed more quickly, aiming to roughly equalize the number of copies of each chunk in the torrent
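A sketch of the rarest-first ordering, assuming each neighbor's chunk list is already known as a set of chunk indices (the function name and data are hypothetical; a real client learns these lists from its neighbors):

```python
from collections import Counter

def rarest_first(have, neighbor_chunks):
    """Order the chunks we still need by how many neighbors hold a copy, fewest first."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)                   # one vote per neighbor holding the chunk
    needed = [c for c in counts if c not in have]
    # Ties are broken by chunk index so the order is deterministic.
    return sorted(needed, key=lambda c: (counts[c], c))

neighbors = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}
print(rarest_first({0}, neighbors))   # [3, 1, 2]
```

Chunk 3 is held by a single neighbor, so it is requested first; chunk 2, held by all three, is requested last.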
Port Number
in addition to knowing the address of the host to which a message is destined, the sending process must also identify the receiving process (more specifically, the receiving socket) running in the host; this information is needed because, in general, a host could be running many network applications; a destination port number serves this purpose; popular applications have been assigned specific port numbers: web server (HTTP) = 80, mail server (SMTP) = 25
Other DNS services
in addition to translating hostnames to IP addresses, DNS provides: host aliasing mail server aliasing load distribution
Client and Server Processes
in the context of a communication session between a pair of processes, the process that initiates the communication (that is, initially contacts the other process at the beginning of the session) is labeled the client; the process that waits to be contacted to begin the session is the server
Throughput
in the context of a communication session between two processes along a network path, the rate at which the sending process can deliver bits to the receiving process; fluctuates with time because other sessions share the bandwidth along the network path; transport-layer protocols can also offer another service: guaranteed available throughput at some specified rate
Tit-for-tat
incentive mechanism for trading in a P2P network
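Tit-for-tat together with the optimistic unchoke can be sketched as a single selection step. The assumptions: k = 4 top senders is the commonly described BitTorrent choice, rates is a hypothetical map of measured download rates, and the timers that periodically rerun this step are omitted.

```python
import random

def choose_unchoked(rates, k=4, rng=random):
    """rates: neighbor -> measured rate (bits/s) at which that neighbor sends to us."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    unchoked = ranked[:k]                     # tit-for-tat: reciprocate with the top k senders
    choked = ranked[k:]
    if choked:
        unchoked.append(rng.choice(choked))   # optimistic unchoke: one random extra neighbor
    return unchoked

rates = {"A": 10, "B": 50, "C": 30, "D": 20, "E": 5, "F": 1}
print(choose_unchoked(rates, rng=random.Random(0)))
```

The random extra slot is what lets a newcomer with nothing to trade yet receive its first chunks, and lets a peer discover partners that might trade at a higher rate.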
TCP Service Model
includes a connection-oriented service and a reliable data transfer service application using TCP receives both services
clientSocket.connect((serverName, serverPort)) (TCPClient.py/client side)
initiates the TCP connection between the client and server; the argument of the connect() method is the address of the server side of the connection; after this line is executed, the three-way handshake is performed and a TCP connection is established between the client and server
message = input('Input lowercase sentence:') (UDPClient.py/client side)
input() is a built-in function in Python; when this line is executed, the user at the client is prompted to input a lowercase sentence, which is assigned to the variable message
Mail Server Aliasing
it is desirable that email addresses be mnemonic; however, the hostname of a mail server is more complicated and less mnemonic DNS can be invoked by a mail application to obtain the canonical hostname for a supplied alias hostname as well as the IP address of the host
Processes
it is not actually programs that communicate but processes can be thought of as a program that is running within an end system when running on the same end system, they can communicate with each other with inter-process communication, using rules that are governed by the end system's operating system processes on two different end systems communicate with each other by exchanging messages across the computer network - a sending process creates and sends messages into the network and a receiving process receives these messages and possibly responds by sending messages back
Content Distribution Networks (CDNs)
manages servers in multiple geographically distributed locations, stores copies of videos and other types of web content in its servers, and attempts to direct each user request to a CDN location that will provide the best user experience; may be a private CDN (owned by the content provider itself) or a third-party CDN that distributes content on behalf of multiple content providers; in order to meet the challenge of distributing massive amounts of video data to users distributed around the world, almost all major video-streaming companies make use of CDNs
Web-Based E-Mail
more and more users are sending and accessing their e-mail through web browsers (e.g. Hotmail, Google, Yahoo); with this service, the user agent is the browser, and it communicates with the remote mailbox over HTTP rather than SMTP, POP3 or IMAP; mail servers still use SMTP to send messages to other mail servers
DNS Server Classifications
no single server has all the mappings; to deal with the issue of scale, DNS uses a large number of servers, organized in a hierarchical fashion and distributed around the world; the three classes of DNS servers are: root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers
Peers
not owned by the service provider but are instead desktops and laptops controlled by users, with most of the peers residing in homes, universities and offices communicate without passing through a dedicated server
PUT Method
often used in conjunction with web publishing tools allows a user to upload an object to a specific path (directory) on a specific web server also used by applications that need to upload objects to web servers
CDN Server Placement Philosophies: Enter Deep
pioneered by Akamai; enter deep into the access networks of ISPs by deploying server clusters in access ISPs all over the world; the goal is to get close to end users, thereby improving user-perceived delay and throughput by decreasing the number of links and routers between the end user and the CDN server from which it receives content; the task of managing the clusters in this highly distributed design becomes challenging
DNS Servers
positioned all over the globe; a centralized DNS design (one server containing all the mappings) might sound attractive but would have problems such as: a single point of failure - if the DNS server crashes, so does the entire Internet; traffic volume - a single server would not be able to handle all queries; distant centralized database - a single location would be far from many clients, leading to slow and congested links and significant delays; maintenance - the database would be huge and would need to be updated frequently; in short, a centralized design simply doesn't scale
print(modifiedMessage.decode())
prints modifiedMessage on the user's display after converting the message from bytes to a string; it should be the original line that the user typed, but now capitalized
clientSocket.send(sentence.encode()) (TCPClient.py/client side)
sends the sentence through the client's socket and into the TCP connection; the program does not explicitly create a packet and attach the destination address to the packet, as it does with UDP sockets; instead, the client program simply drops the bytes in the string sentence into the TCP connection; the client then waits to receive bytes from the server
Throughput or Timing Guarantees
services not provided by today's Internet transport protocols internet applications are often designed to cope (to the greatest extent possible) with this lack of guarantee
serverPort = 12000 (UDPServer.py/server side)
sets the integer variable serverPort to 12000
serverName = 'hostname' (UDPClient.py/client side)
sets the variable serverName to the string 'hostname', where we provide either the IP address of the server or the hostname of the server (e.g. cis.poly.edu); if we use the hostname, a DNS lookup will automatically be performed to get the IP address
HEAD Method
similar to the GET method; when a server receives a request with the HEAD method, it responds with an HTTP message but leaves out the requested object; application developers often use the HEAD method for debugging
Post Office Protocol - Version 3 (POP3)
a simple mail access protocol, defined in RFC 1939; its functionality is rather limited; begins when the user agent (client) opens a TCP connection to the mail server (server) on port 110; progresses through three phases: authorization, transaction and update; authorization = the user agent sends a username and a password to authenticate the user; the two principal commands are user <username> and pass <password>; transaction = the user agent retrieves messages; it can also mark messages for deletion, remove deletion marks, and obtain mail statistics; the server responds to each command with a reply, either +OK (sometimes followed by server-to-client data), which indicates that the previous command was fine, or -ERR, which indicates that something was wrong with the previous command; update = occurs after the client has issued the quit command, ending the POP3 session; the mail server deletes any messages that were marked for deletion; a user agent using POP3 can often be configured (by the user) to "download and delete" or to "download and keep"; the sequence of commands issued by a POP3 user agent depends on which of these two modes the user agent is operating in; download-and-delete = messages are downloaded onto the local machine, so the user would not be able to maintain a folder/message hierarchy on a remote server that can be accessed from any computer; during a POP3 session between a user agent and the mail server, the server maintains some state information to keep track of which messages have been marked for deletion; however, the POP3 server does not carry state information across POP3 sessions, which greatly simplifies the implementation of a POP3 server; POP3 does not provide any means for a user to create remote folders and assign messages to folders
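A toy model of the server side of one POP3 session, showing why deletion happens only in the update phase. Everything here is hypothetical and simplified: there is no networking, the authorization phase (user/pass) is skipped, and a real RETR reply ends with a line containing a single period, which is omitted.

```python
class Pop3Session:
    """Toy POP3 server session (hypothetical): transaction and update phases only."""

    def __init__(self, mailbox):
        self.mailbox = mailbox   # list of messages; persists across sessions
        self.marked = set()      # per-session state: message numbers marked for deletion

    def _valid(self, n):
        return 1 <= n <= len(self.mailbox) and n not in self.marked

    def command(self, line):
        verb, *args = line.split()
        verb = verb.lower()
        if verb in ("retr", "dele"):
            n = int(args[0])
            if not self._valid(n):
                return "-ERR no such message"
            if verb == "retr":
                return "+OK " + self.mailbox[n - 1]
            self.marked.add(n)   # transaction phase: only mark, nothing is deleted yet
            return "+OK"
        if verb == "quit":
            # Update phase: marked messages are removed now; self.marked disappears
            # with this session object, so no state carries over to the next session.
            self.mailbox[:] = [m for i, m in enumerate(self.mailbox, start=1)
                               if i not in self.marked]
            return "+OK"
        return "-ERR unknown command"
```

In download-and-delete mode a user agent would issue retr followed by dele for each message and then quit; in download-and-keep mode it would simply omit the dele commands.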
Objects
simply a file, such as an HTML file, a JPEG image, a Java applet or a video clip, that is addressable by a single URL
IP Address
hostnames provide little, if any, information about the location of the host within the Internet: www.eurecom.fr ends with the country code fr, which tells us that the host is probably in France, but doesn't say much more; because hostnames can consist of variable-length alphanumeric characters, they would be difficult for routers to process; for these reasons, hosts are also identified by IP addresses; an IP address consists of four bytes and has a rigid hierarchical structure, e.g. 127.7.106.83, where each period separates one of the bytes, expressed in decimal notation from 0 to 255; hierarchical because, as we scan the address from left to right, we obtain more and more specific information about where the host is located in the Internet (that is, within which network)
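A minimal sketch of the four-byte structure: the hypothetical helper below turns dotted-decimal text into its bytes, rejecting anything that is not four values from 0 to 255, and the result is cross-checked against Python's standard ipaddress module.

```python
import ipaddress

def dotted_to_bytes(addr):
    """Convert dotted-decimal IPv4 text into its four bytes (hypothetical helper)."""
    parts = addr.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address has exactly four bytes")
    values = [int(p) for p in parts]
    if not all(0 <= v <= 255 for v in values):
        raise ValueError("each byte must be between 0 and 255")
    return bytes(values)

raw = dotted_to_bytes("127.7.106.83")
as_int = int.from_bytes(raw, "big")   # the same address viewed as one 32-bit number
assert raw == ipaddress.IPv4Address("127.7.106.83").packed
```

Reading the bytes left to right (most significant first) is what lets routers match a leading portion of the address to a network.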
from socket import * (UDPClient.py/client side)
socket module forms the basis of all network communications in Python; will enable the creation of sockets within our program
Client-Server and P2P Hybrids
some applications have hybrid architecture, combining both client-server and P2P elements for example: many instant messaging applications where servers are used to track the IP address of users but user-to-user messages are sent directly between user hosts (w/o passing through intermediate servers)
TCP: Handshaking (COS)
TCP has the client and server exchange transport-layer control information with each other before the application-level messages begin to flow (aka handshaking); this procedure alerts the client and server, allowing them to prepare for an onslaught of packets
Hit Rates
the fraction of requests that are satisfied by a cache typically from 0.2 to 0.7 in practice i.e. - a hit rate of 0.4 means that 40% of requests will be satisfied almost immediately by the cache the remaining 60% of the requests will need to be satisfied by the origin servers
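With a cache, the average response time is a weighted mix of hits and misses. A sketch with hypothetical numbers (10 ms for a cache hit, 2 s from the Internet router to the origin server and back, 10 ms of access delay); the function name is made up for illustration.

```python
def average_response_time(hit_rate, cache_delay, internet_delay, access_delay):
    """Hits are served by the cache; misses cross the access link to the origin server."""
    miss_delay = internet_delay + access_delay
    return hit_rate * cache_delay + (1 - hit_rate) * miss_delay

# Hypothetical values: 40% hit rate, 10 ms cache, 2 s Internet delay, 10 ms access delay.
print(average_response_time(0.4, 0.01, 2.0, 0.01))   # roughly 1.21 s
```

The access delay is small here because only the 60% of requests that miss still cross the access link, which lowers its traffic intensity.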
gethostbyname()
the function that an application calls in order to perform DNS hostname-to-IP-address translation
Internet Transport Protocols
the Internet (and, more generally, TCP/IP networks) makes two transport protocols available to applications: UDP (User Datagram Protocol) and TCP (Transmission Control Protocol); each offers a different set of services, and the application developer must choose which one to use
clientSocket.sendto(message.encode(), (serverName, serverPort)) (UDPClient.py/client side)
the message is first converted from string type to byte type, because bytes are what must be sent into a socket; this is done with the encode() method; the sendto() method attaches the destination address (serverName, serverPort) to the message and sends the resulting packet into the process's socket, clientSocket
Persistent Connections
the server leaves the TCP connection open after sending a response, so subsequent requests and responses between the same client and server can be sent over the same connection; for example, an entire web page containing multiple objects can be sent over a single persistent TCP connection; multiple web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection; requests for objects can be made back to back, without waiting for replies to pending requests (pipelining); typically, the HTTP server closes a connection when it isn't used for a certain time (a configurable timeout interval); the default mode of HTTP uses persistent connections with pipelining
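For comparison with non-persistent connections, a rough RTT count for one page over a single persistent connection, ignoring transmission times (the function name is hypothetical): one RTT for the handshake, one for the base HTML file, then either one RTT per referenced object (no pipelining) or one RTT for all of them (pipelining).

```python
def persistent_response_time(rtt, n_referenced, pipelining):
    """Response time for a page fetched over one persistent TCP connection."""
    handshake = rtt                   # TCP connection set up once
    base = rtt                        # request/response for the base HTML file
    if n_referenced == 0:
        referenced = 0.0
    elif pipelining:
        referenced = rtt              # all requests sent back to back
    else:
        referenced = n_referenced * rtt   # wait for each reply before the next request
    return handshake + base + referenced

# 10 images, 50 ms RTT: about 0.15 s with pipelining, about 0.6 s without.
with_pipe = persistent_response_time(0.05, 10, pipelining=True)
without_pipe = persistent_response_time(0.05, 10, pipelining=False)
```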
Round-Trip Time (RTT)
the time it takes for a small packet to travel from client to server and then back to the client; includes packet-propagation delays, packet-queuing delays in intermediate routers and switches, and packet-processing delays
Distribution Time
the time it takes to get a copy of the file to all N peers
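Distribution time is usually quantified through lower bounds. The sketch below uses the standard textbook bounds for client-server distribution, max(NF/u_s, F/d_min), and for P2P distribution, max(F/u_s, F/d_min, NF/(u_s + Σu_i)), where F is the file size in bits, u_s the server's upload rate, d_min the slowest peer's download rate, and u_i each peer's upload rate. The numbers and function names are hypothetical.

```python
def dist_time_client_server(N, F, u_s, d_min):
    # The server must upload N copies; the slowest peer must download F bits.
    return max(N * F / u_s, F / d_min)

def dist_time_p2p(N, F, u_s, d_min, peer_uploads):
    # The server uploads at least one copy, the slowest peer downloads F bits,
    # and the total upload capacity (server + peers) must carry N * F bits.
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))

# Hypothetical: 100 peers, 1 Gbit file, 30 Mbps server, 2 Mbps slowest download,
# 1 Mbps upload per peer.
N, F = 100, 1e9
cs = dist_time_client_server(N, F, 30e6, 2e6)            # about 3333 s
p2p = dist_time_p2p(N, F, 30e6, 2e6, [1e6] * N)          # about 769 s
```

The client-server bound grows linearly with N, while the P2P bound's last term grows much more slowly because every new peer also adds upload capacity, which is the self-scalability described for P2P.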
POST Method
used when the user requests a web page from a server but the specific contents of the web page depend on what the user entered into the form fields; if the value of the method field is POST, then the entity body contains what the user entered into the form fields; an HTTP client often uses the POST method when the user fills out a form* - e.g. when searching for information with a search engine. *the GET method can also be used with forms
HyperText Transfer Protocol (HTTP)
the Web's application-layer protocol and the heart of the Web; defined in RFC 1945 and RFC 2616; implemented in two programs, a client program and a server program, which run on different end systems and talk to each other by exchanging HTTP messages; defines the structure of these messages and how the client and server exchange them; defines how web clients request web pages from web servers and how servers transfer web pages to clients; uses TCP as its underlying transport protocol; can use both non-persistent and persistent connections, and uses persistent connections in its default mode
Youtube
the world's largest video-sharing site; 300 hours of video are uploaded to YT every minute, with several billion video views per day; acquired by Google in 2006; Google/YT protocols are proprietary; makes extensive use of CDN technology to distribute its videos and has installed server clusters in many hundreds of different IXP and ISP locations; uses pull caching; Google's cluster-selection strategy usually directs the client to the cluster for which the RTT (round-trip time) between client and cluster is the lowest; sometimes the client is directed (via DNS) to a more distant cluster in order to balance the load across clusters; uses HTTP streaming but does not employ adaptive streaming such as DASH; instead, the user has to manually select a version; content processing takes place entirely within Google data centers
clientSocket.close()
this line closes the socket, the process then terminates
TCP: Congestion-Control Mechanism
throttles a sending process (client or server) when the network is congested between sender and receiver; also attempts to limit each TCP connection to its fair share of network bandwidth; a service for the general welfare of the Internet rather than a direct benefit to the communicating processes
Content Distribution Networks (CDNs)
through the use of CDNs, web caches are increasingly playing an important role in the internet installs many geographically distributed caches throughout the internet, thereby localizing much of the traffic can be shared or dedicated
Mail Access Protocols
today, mail access uses a client-server architecture with the user reading emails with a client that executes on the user's end system a typical user runs a user agent on the local PC but accesses its mailbox stored on an always-on shared mail server - this mail server is shared with other users and is typically maintained by the user's ISP (i.e. their university or company) typically, the sender's user agent does not dialogue directly with the recipient's mail server, instead the user agent uses SMTP to push the e-mail message into their mail server, then, the mail server uses SMTP (as an SMTP client) to relay the e-mail message to the recipient's mail server this two step procedure allows for recourse to an unreachable destination mail server (i.e. attempting to resend an email)
P2P Architecture
unlike client-server architectures, P2P requires minimal or no reliance on always-on infrastructure servers instead, pairs of intermittently connected hosts, called peers communicate directly with each other peers are not owned by a service provider but are instead desktops and laptops controlled by users in P2P distribution, each peer can re-distribute any portion of the file it has received to any other peers, thereby assisting the server in the distribution process inherent self-scalability
CDN Server Placement Philosophies: Bring Home
used by Limelight and many other CDN companies brings the ISPs home by building large clusters at a smaller number (maybe tens) of sites instead of getting inside the access ISPs, these CDNs typically place their clusters in IXPs compared to enter-deep, bring-home typically results in lower maintenance and management overhead, possibly at the expense of higher delay and lower throughput to end users
GET Method
used when the browser requests an object, with the requested object identified in the URL field; note that forms do not always use the POST method; they often use the GET method and include the input data in the requested URL, e.g. www.somesite.com/search?monkey&bananas; the entity body is empty with the GET method
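A sketch of a form submitted with GET: the input ends up in the URL of the request line and the entity body stays empty. The helper name, path and field names are hypothetical; urlencode is from Python's standard urllib.parse and joins name=value pairs with '&'.

```python
from urllib.parse import urlencode

def form_get_request(host, path, fields):
    """Build an HTTP GET request whose URL carries the form input (hypothetical helper)."""
    target = f"{path}?{urlencode(fields)}"   # e.g. animal=monkey&food=bananas
    return (f"GET {target} HTTP/1.1\r\n"    # request line
            f"Host: {host}\r\n"             # header line
            "\r\n")                         # blank line; no entity body with GET

print(form_get_request("www.somesite.com", "/search",
                       {"animal": "monkey", "food": "bananas"}))
```

The same fields sent with POST would instead travel in the entity body, leaving the URL unchanged.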
Web Caching Deployment
useful for two reasons: 1. a web cache can substantially reduce the response time for a client req. especially if the (bottleneck) bandwidth between client and origin is much less than the bandwidth between client and cache and if a high speed internet connection exists between client and cache 2. can substantially reduce traffic on an institution's access link to the internet - the institution won't have to upgrade bandwidth as quickly thereby, reducing costs * web caches can substantially reduce web traffic in the internet as a whole, thereby improving performance for all applications low cost, many caches use public-domain software that runs on inexpensive PCs
Parallel Connections
users can configure modern browsers to control the degree of parallelism; in their default modes, most browsers open 5 to 10 parallel TCP connections, and each of these connections handles one request-response transaction
Dynamic Adaptive Streaming Over HTTP (DASH)
video is encoded into several different versions, with each version having a different bit rate and, correspondingly, a different quality level; the client dynamically requests chunks of video segments a few seconds in length; when the amount of available bandwidth is high, the client selects chunks from a high-rate version; when bandwidth is low, it selects chunks from a low-rate version; the client selects different chunks one at a time with HTTP GET request messages; allows clients with different Internet access rates to stream video at different encoding rates; also allows clients to adapt to the available bandwidth if the available end-to-end bandwidth changes during the session; with DASH, each video version is stored in the HTTP server, each with a different URL; the HTTP server also has a manifest file, which provides a URL for each version along with its bit rate
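A minimal sketch of the client-side choice, assuming a simplistic rule (pick the highest bit rate that does not exceed the measured bandwidth); real DASH players use more elaborate rate-adaptation logic, and the manifest below is a hypothetical stand-in for the real manifest file.

```python
def choose_version(manifest, bandwidth_bps):
    """Simplistic DASH-style rule: highest bit rate not exceeding measured bandwidth."""
    fitting = [v for v in manifest if v["bitrate"] <= bandwidth_bps]
    if not fitting:
        # Nothing fits: fall back to the lowest-rate version.
        return min(manifest, key=lambda v: v["bitrate"])
    return max(fitting, key=lambda v: v["bitrate"])

# Hypothetical manifest: one URL per encoded version, with its bit rate.
manifest = [
    {"bitrate": 300_000, "url": "http://video.example.com/low/"},
    {"bitrate": 1_000_000, "url": "http://video.example.com/medium/"},
    {"bitrate": 3_000_000, "url": "http://video.example.com/high/"},
]

# Hypothetical bandwidth estimates measured before each chunk request.
for bandwidth in [5_000_000, 1_500_000, 200_000]:
    version = choose_version(manifest, bandwidth)
    print(bandwidth, "->", version["url"])   # next chunk is fetched with HTTP GET
```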
CDN Operation
when a user's browser is instructed to retrieve a specific video, the CDN must intercept the request so that it can 1. determine a suitable CDN server cluster for that client at that time and 2. redirect the client's request to a server in that cluster; most CDNs take advantage of DNS to intercept and redirect requests
connectionSocket, addr = serverSocket.accept() (TCPServer.py/server side)
when a client 'knocks', the program invokes the accept() method for serverSocket, which creates a new socket in the server, called connectionSocket, dedicated to this particular client; the client and server then complete the handshaking, creating a TCP connection between the client's clientSocket and the server's connectionSocket; with the connection established, the client and server can now send bytes to each other over the connection
modifiedMessage, serverAddress = clientSocket.recvfrom(2048) (UDPClient.py/client side)
when a packet arrives from the Internet at the client's socket, the packet's data is put into the variable modifiedMessage and the packet's source address is put into the variable serverAddress; the UDPClient program doesn't actually need this server address information, since it already knows the server address from the outset, but Python provides the server address nonetheless; the method recvfrom also takes the buffer size 2048 as input (this buffer size works for most purposes)
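The UDPClient.py and UDPServer.py lines described in these cards can be combined into one runnable sketch. Assumptions: both sides run in one process (the server in a thread), the server binds to the loopback address with port 0 so the OS picks a free port instead of 12000, the server handles a single packet instead of looping forever, and the helper names are hypothetical.

```python
import threading
from socket import socket, AF_INET, SOCK_DGRAM

def run_udp_server(serverSocket):
    # UDPServer.py logic: receive one packet, capitalize it, send it back.
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)

def udp_round_trip(sentence):
    serverSocket = socket(AF_INET, SOCK_DGRAM)
    # Assumption: loopback address, port 0 = let the OS choose a free port.
    serverSocket.bind(('127.0.0.1', 0))
    serverSocket.settimeout(5)
    serverName, serverPort = serverSocket.getsockname()
    server = threading.Thread(target=run_udp_server, args=(serverSocket,))
    server.start()

    # UDPClient.py logic: each packet carries its destination address.
    clientSocket = socket(AF_INET, SOCK_DGRAM)
    clientSocket.settimeout(5)
    clientSocket.sendto(sentence.encode(), (serverName, serverPort))
    modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
    clientSocket.close()

    server.join()
    serverSocket.close()
    return modifiedMessage.decode()

print(udp_round_trip('hello udp'))   # HELLO UDP
```

The timeouts are only there so the sketch cannot hang if a datagram is lost; UDP itself gives no delivery guarantee.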
Unreliable Data Transfer Service
when a process sends a message into a UDP socket, UDP provides no guarantee that the message will ever reach the receiving process furthermore, messages that do arrive at the receiving process may arrive out of order
Loss-Tolerant Applications
when a transport-layer protocol doesn't provide reliable data transfer, some of the data sent by the sending process may never arrive at the receiving process applications, that are able to withstand these data losses such as multimedia applications (audio/video) are called loss tolerant applications
modifiedSentence = clientSocket.recv(2048) (TCPClient.py/client side)
when bytes arrive from the server, they get placed into modifiedSentence; recv(2048) returns at most 2048 bytes per call, which is enough to hold the single capitalized line the server sends back in this simple application
serverSocket.listen(1) (TCPServer.py/server side)
with TCP, serverSocket will be our welcoming socket; after establishing this welcoming door, we wait and listen for some client to knock on the door; the server listens for TCP connection requests from the client; the parameter specifies the maximum number of queued connections (at least 1)
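The TCPClient.py and TCPServer.py lines described in these cards fit together as in the sketch below. Assumptions: both sides run in one process (the server in a thread), the welcoming socket binds to the loopback address with port 0 so the OS picks a free port instead of 12000, the server serves a single client, sendall() is used so every byte is written, and the helper names are hypothetical.

```python
import threading
from socket import socket, AF_INET, SOCK_STREAM

def run_tcp_server(serverSocket):
    # TCPServer.py logic: accept() creates connectionSocket, dedicated to this client.
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    connectionSocket.sendall(sentence.upper().encode())
    connectionSocket.close()

def tcp_round_trip(sentence):
    serverSocket = socket(AF_INET, SOCK_STREAM)
    serverSocket.bind(('127.0.0.1', 0))   # assumption: OS-chosen free port
    serverSocket.listen(1)                # welcoming socket, at most 1 queued connection
    serverSocket.settimeout(5)
    serverName, serverPort = serverSocket.getsockname()
    server = threading.Thread(target=run_tcp_server, args=(serverSocket,))
    server.start()

    # TCPClient.py logic: connect first (three-way handshake), then just send bytes;
    # no destination address is attached to each send, unlike UDP.
    clientSocket = socket(AF_INET, SOCK_STREAM)
    clientSocket.settimeout(5)
    clientSocket.connect((serverName, serverPort))
    clientSocket.sendall(sentence.encode())
    # A short message on loopback normally arrives in a single recv() call.
    modifiedSentence = clientSocket.recv(1024)
    clientSocket.close()

    server.join()
    serverSocket.close()
    return modifiedSentence.decode()

print(tcp_round_trip('hello tcp'))   # HELLO TCP
```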