Distributed Systems: Naming
Explain the difference between a hard link and a soft link in UNIX systems.
A hard link is a named entry in a directory file that points to the same inode (file record) as another named entry, possibly in a different directory. A symbolic (soft) link is a separate file that contains the character-string path name of another file. In short, a hard link is a directory entry that refers directly to an existing file's inode and thus to its contents, whereas a soft link refers to the file only by name.
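The difference can be observed directly on a UNIX-like system. The following is a minimal sketch, assuming a file system that supports both link types; the file names are made up for illustration.

```python
import os
import tempfile

def link_demo():
    """Create a file, a hard link, and a symbolic link, then compare them."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")   # illustrative names
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(target, "w") as f:
            f.write("hello")

        os.link(target, hard)      # second directory entry for the same inode
        os.symlink(target, soft)   # new file whose content is the target's path name

        same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
        link_count = os.stat(target).st_nlink
        soft_is_link = os.path.islink(soft)
        soft_names_target = os.readlink(soft) == target

        os.remove(target)          # delete the original name only
        with open(hard) as f:
            hard_still_readable = f.read() == "hello"
        soft_dangling = not os.path.exists(soft)   # exists() follows the link

        return (same_inode, link_count, soft_is_link,
                soft_names_target, hard_still_readable, soft_dangling)

print(link_demo())
```

After the original name is removed, the hard link still reaches the contents, while the soft link is left dangling because it only stores a name.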
Define the terms: access point, address, naming, location independence, and identifier.
Access point: The means by which a distributed resource is accessed for use. Address: The name of an access point. Naming: The use of names, where a name is a string of bits or characters that is used to refer to an entity. Location independence: The property that the name of an entity is independent of its address. Identifier: A label that is used to uniquely recognize or establish an entity as being a particular person or thing.
List and describe five types of DNS records.
An SOA (start of authority) resource record contains information such as the e-mail address of the system administrator responsible for the represented zone and the name of the host where data on the zone can be fetched. An A (address) record represents a particular host in the Internet and contains an IP address for that host to allow communication. The MX (mail exchange) record is similar to a symbolic link to a node representing a mail server. There may be several MX records stored in a single node. SRV records are related to MX records and contain the name of a server for a specific service. With the use of SRV records, the client no longer needs to know the DNS name of the host providing a specific service. Instead, only service names need to be standardized, after which the providing host can be looked up. Nodes that represent a zone contain one or more NS (name server) records. An NS record contains the name of a name server that implements the zone represented by the node.
List 3 properties of an identifier.
An identifier refers to at most one entity. Each entity is referred to by at most one identifier. An identifier always refers to the same entity.
The root node in hierarchical location services may become a potential bottleneck. How can this problem be effectively circumvented?
An important observation is that we are using only random bit strings as identifiers. As a result, we can easily partition the identifier space and install a separate root node for each part. In addition, the partitioned root node should be spread across the network so that accesses to it will also be spread.
Define Attribute-based naming, gossiping, and semantic overlay networks.
Attribute-based naming describes an entity in terms of the relationship between attributes and values (LDAP example). Gossiping is a method of communicating in which nodes, having received some message or update, will connect to another arbitrary node and attempt to push the update. If that node has already been updated by another node, then the sending node may either "lose interest" in spreading the update further with probability 1/k (where k is a design parameter that represents the number of neighbors) or try to push the update to another node. A semantic overlay network is a network that keeps track of nodes that provide similar resources by providing such nodes with links to semantically proximal neighbors. In other words, links between nodes are based on semantic relationships instead of physical proximity.
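The push-based gossiping described above can be simulated in a few lines. This is a rough sketch rather than a faithful protocol implementation: it assumes synchronous rounds, a fully connected network in which any node can contact any other, and treats k as a plain design parameter. The function name `rumor_spreading` is hypothetical.

```python
import random

def rumor_spreading(n, k, seed=0):
    """Simulate push-based gossiping among n >= 2 nodes.

    Returns how many nodes hold the update once every node has lost interest.
    """
    rng = random.Random(seed)
    updated = {0}        # node 0 starts with the update
    active = [0]         # nodes still interested in spreading it
    while active:
        next_active = []
        for p in active:
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1   # choose an arbitrary node other than p
            if q in updated:
                # q was already updated: p loses interest with probability 1/k
                if rng.random() >= 1.0 / k:
                    next_active.append(p)
            else:
                updated.add(q)
                next_active.append(p)
                next_active.append(q)   # q starts spreading as well
        active = next_active
    return len(updated)

print(rumor_spreading(1000, 3))
```

Running it with different values of k illustrates the trade-off: a larger k keeps nodes interested longer, so fewer nodes are likely to miss the update, at the cost of more messages.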
Would you consider a URL such as http://www.acme.org/index.html to be location independent? What about http://www.acme.nl/index.html?
Both names can be location independent, although the first one gives fewer hints on the location of the named entity. Location independent means that the name of the entity is independent of its address. By just considering a name, nothing can be said about the address of the associated entity.
Compare and contrast broadcast and multicast.
Broadcast: A message is transmitted to all neighbors. Multicast: One-to-many transmission of a message to a selected group of neighbors, identified by a multicast address. Compare: Both send a message to more than one neighbor. Contrast: With broadcast, all neighbors receive the message, whereas with multicast only the selected neighbors receive it.
List and describe five "flat naming" solutions.
Broadcasting and multicasting: A message containing the identifier of the entity is broadcast to each machine. Only the machines that can offer access to that entity send a reply containing the address of the access point.
Forwarding pointers: When an entity moves from A to B, it leaves behind in A a reference to its new location B. The main advantage of this approach is simplicity: as soon as an entity has been located using a traditional naming service, a client can look up the current address by following the chain of forwarding pointers.
Home-based approaches: Popular for supporting mobile entities in large-scale networks. Two ways of implementing them are (1) forwarding from a home location (the place where the entity was created) and (2) home agents (intelligent forwarding entities).
Distributed hash tables: Each node keeps a finger table of shortcuts to other nodes. A key is looked up by routing the request to the key's successor node, which stores the associated (key, value) pair.
Hierarchical approaches: The location service is organized hierarchically into domains, each having an associated directory node. The top-level domain spans the entire network, a leaf domain is a lowest-level domain that typically corresponds to a local area network, and the root node is the directory node that tracks the entries in the top-level domain.
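To make the distributed hash table entry more concrete, here is a small sketch of successor lookup and finger-table construction in the style of Chord. It assumes a global view of all node identifiers, which a real DHT node does not have; the node identifiers and the 5-bit identifier space are arbitrary illustrative choices.

```python
from bisect import bisect_left

def successor(node_ids, key, m=5):
    """Return the node responsible for key: the first node id >= key
    on a ring of 2**m identifiers, wrapping around at the end."""
    ring = sorted(node_ids)
    key %= 2 ** m
    i = bisect_left(ring, key)
    return ring[i % len(ring)]

def finger_table(node_ids, n, m=5):
    """Finger i of node n points to successor(n + 2**i), for i = 0..m-1."""
    return [successor(node_ids, n + 2 ** i, m) for i in range(m)]

nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]   # illustrative node identifiers
print(successor(nodes, 26))     # node that stores key 26
print(finger_table(nodes, 1))   # shortcuts kept by node 1
```

In an actual DHT each node knows only its own finger table, and a lookup is forwarded from node to node until the successor of the key is reached.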
How is a mounting point looked up in most UNIX systems?
By means of a mount table that contains an entry pointing to the mount point. This means that when a mounting point is to be looked up, we need to go through the mount table to see which entry matches a given mount point.
In a hierarchical location service with a depth of k, how many location records need to be updated at most when a mobile entity changes its location?
Changing location can be described as the combination of an insert and a delete operation. An insert operation requires that at worst k + 1 location records are changed. Likewise, a delete operation also requires changing k + 1 records, where the record in the root is shared between the two operations. This leads to a total of (k + 1) + (k + 1) - 1 = 2k + 1 records.
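The count can be checked with a small sketch. It assumes that each path lists the directory nodes from the root down to a leaf domain (k + 1 nodes for depth k), that the entity moves between two different leaf domains, and that the lowest directory node shared by both paths has its record changed once. The node names are hypothetical.

```python
def records_updated(old_path, new_path):
    """Count location records that change when an entity moves.

    Records strictly below the lowest common directory node change on both
    paths, plus the one record in that common node itself.
    """
    common = 0
    for a, b in zip(old_path, new_path):
        if a != b:
            break
        common += 1
    return (len(old_path) - common) + (len(new_path) - common) + 1

# Depth k = 3, worst case: the two paths share only the root.
old = ["root", "a", "a1", "a1x"]
new = ["root", "b", "b1", "b1y"]
print(records_updated(old, new))   # 2k + 1 with k = 3
```

When the old and new leaf domains share more than the root, fewer records change, which is why 2k + 1 is only the worst case.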
Define the terms: domain, top-level domain, leaf domain, root domain.
Domain: A division of the whole network. Top-level domain: The domain that spans the entire network. Leaf domain: A lowest-level domain, which typically corresponds to a local area network. Root domain: The domain at the top of the hierarchy; its directory node, the root node, tracks the entries in the domain.
Explain how a domain is used to implement a hierarchical naming scheme in distributed systems.
Each domain has an associated directory node that keeps track of the entities in that domain. Each domain can be subdivided into multiple smaller subdomains. This leads to a tree of directory nodes. The directory node of the top-level domain, called the root (directory) node, knows about all entities. A leaf domain corresponds to a local area network in a distributed system.
List and explain three naming models.
Flat naming: The identifier is simply a random bit string. It does not contain any information whatsoever on how to locate an access point of its associated entity. Good for machines. Structured naming: Composed of simple human-readable names. Examples are file system naming and host naming on the Internet. Attribute-based naming: Allows an entity to be described by (attribute, value) pairs. This allows a user to search more effectively by constraining some of the attributes.
Provide a general and a computer science specific definition for "name."
Generally, a name is a word or combination of words by which a person, place, thing, or any object of thought is designated, called, or known. You might think of it as a label, identifier, or human friendly address. In computer science, a name is a string of bits or characters that is used to refer to an entity (resources such as hosts, printers, disks, and files).
Give an example of where an address of an entity E needs to be further resolved into another address to actually access E.
IP addresses in the Internet are used to address hosts. However, to access a host, its IP address needs to be resolved to, for example, an Ethernet address.
Explain the process of iterative name resolution using DNS.
In iterative name resolution, a name resolver hands over the complete name to the root name server. The root server resolves the path name as far as it can and returns the result, typically the address of the next name server, to the client. The client then passes the remaining path name to that name server, which resolves the path name as far as it can and returns the result to the client. The characteristic feature of iterative name resolution is that each intermediate name server returns a response to the client's name resolver, which is then responsible for contacting and receiving a response from the next name server until the path is resolved.
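The control flow of iterative resolution can be sketched with a toy name space. Everything here is hypothetical: the server names, the table layout, and the address (taken from a documentation address range) stand in for real DNS servers and records, and each server is assumed to resolve exactly one label.

```python
# Hypothetical toy name space: each server maps one label either to the
# next name server (a referral) or to the final address.
SERVERS = {
    "root": {"org": "ns-org"},
    "ns-org": {"example": "ns-example"},
    "ns-example": {"www": "192.0.2.10"},
}

def resolve_iteratively(name, servers=SERVERS):
    """The client contacts every server itself, following referrals."""
    remaining = name.split(".")[::-1]   # resolve from the top-level label down
    server = "root"
    contacted = []
    while True:
        contacted.append(server)
        answer = servers[server][remaining.pop(0)]   # server resolves one label
        if not remaining:
            return answer, contacted
        server = answer                 # referral goes back to the client

print(resolve_iteratively("www.example.org"))
```

The loop lives entirely in the client, which is the defining property of the iterative scheme: every intermediate answer comes back to the resolver before the next server is contacted.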
Explain the hierarchical organization of LDAP.
LDAP consists of a number of records, also called directory entries, that are each made up of a collection of (attribute, value) pairs. Each record has a globally unique name that appears as a sequence of naming attributes. Each naming attribute is called a relative distinguished name (RDN). It is this use of globally unique names by listing RDNs in sequence that leads to a hierarchy of the collection of directory entries, which is referred to as a directory information tree (DIT). A DIT essentially forms the naming graph of an LDAP directory service in which each node represents a directory entry.
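As an illustration of how a sequence of RDNs yields a globally unique name, consider the following sketch. It does not use any real LDAP library; the attribute names, their order, and the sample entries are assumptions made for the example.

```python
NAMING_ATTRS = ("C", "O", "OU", "CN")   # assumed order of naming attributes

def distinguished_name(entry):
    """Build the globally unique name by listing the entry's RDNs in sequence."""
    return "/" + "/".join(f"{a}={entry[a]}" for a in NAMING_ATTRS if a in entry)

def search(entries, **constraints):
    """Return all entries whose attributes match every (attribute, value) constraint."""
    return [e for e in entries
            if all(e.get(a) == v for a, v in constraints.items())]

# Hypothetical directory entries
entries = [
    {"C": "NL", "O": "Example Org", "OU": "Research", "CN": "Main server", "Host": "alpha"},
    {"C": "NL", "O": "Example Org", "OU": "Research", "CN": "Backup server", "Host": "beta"},
    {"C": "NL", "O": "Example Org", "OU": "Sales", "CN": "Main server", "Host": "gamma"},
]

print(distinguished_name(entries[0]))
print([e["Host"] for e in search(entries, OU="Research")])
```

Entries that share a prefix of RDNs sit under the same node of the directory information tree, so the tree structure follows from the names themselves.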
Explain how leaf nodes and directory nodes are used to implement name spaces.
A leaf node represents a named entity and has the property that it has no outgoing edges. It stores information on the entity it represents, and in the case of file systems it can store the state of the entity. A directory node has a number of outgoing edges, each labeled with a name. It stores a table in which each outgoing edge is represented as a pair (edge label, node identifier); such a table is called a directory table.
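A name space of this kind can be modeled as a dictionary of nodes, where a directory node holds its directory table and a leaf node holds the stored information. The sketch below uses hypothetical node identifiers and labels, and resolves a path name by following one labeled edge per component.

```python
# Node id -> directory table (dict) for a directory node,
# or stored information (str) for a leaf node. All ids are hypothetical.
GRAPH = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"alice": "n2", "bob": "n3"},
    "n2": {"mbox": "n4", "keys": "n5"},
    "n3": {},
    "n4": "mailbox contents",
    "n5": "key data",
}

def resolve(path, root="n0", graph=GRAPH):
    """Follow the edge labels in path, starting at the root node."""
    node = root
    for label in (part for part in path.split("/") if part):
        table = graph[node]
        if not isinstance(table, dict):
            raise NotADirectoryError(f"{node} is a leaf node")
        node = table[label]     # look up (edge label, node identifier)
    return node

print(resolve("/home/alice/mbox"))
print(resolve("/keys"), resolve("/home/alice/keys"))
```

The last line shows two different path names ending at the same leaf node, which is possible because the name space is a graph rather than a strict tree.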
Define the terms: name resolution and closure mechanism.
Name resolution: Process of looking up a name. Closure mechanism: Knowledge of how and where to start name resolution.
Define the terms: name space, directory node, and leaf node.
Name space: A labeled, directed graph consisting of leaf nodes and directory nodes, used to organize names. Directory node: A node that has a number of outgoing edges, each labeled with a name. Leaf node: A node that represents a named entity and has the property that it has no outgoing edges.
Define the terms: name-address binding and human-friendly names.
Name-address binding: The association of a name with an address. Human-friendly name: A name that is tailored to be used by humans.
Explain the relationship of names to identifiers to address in distributed systems.
Names have a one-to-many relationship with identifiers. Identifiers have a one-to-one relationship with addresses at any given time, although the identifier-to-address mapping may change over time.
Counting common files is a rather naive way of defining semantic proximity. Assume you were to build semantic overlay networks based on text documents, what other semantic proximity function can you think of?
One intriguing one is to have a look at actual content when possible. In the case of documents, one could look at similarity functions derived from information retrieval, such as the Vector Space Model (VSM).
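A very small version of such a similarity function is the cosine of the angle between term-frequency vectors. The sketch below makes simplifying assumptions: whitespace tokenization, raw term counts, and no stop-word removal or TF-IDF weighting, all of which a real information-retrieval system would likely refine.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between the term-frequency vectors of two documents."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("naming in distributed systems",
                        "flat naming in distributed hash tables"))
```

Two nodes would then be considered semantically close when the documents they store score high under this function, instead of merely counting files they have in common.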
High-level name servers in DNS, that is, name servers implementing nodes in the DNS name space that are close to the root, generally do not support recursive name resolution. Can we expect much performance improvement if they did?
Probably not: because the high-level name servers constitute the global layer of the DNS name space, it can be expected that changes to that part of the name space do not occur often. Consequently, caching will be highly effective, and much long-haul communication will be avoided anyway. Note that recursive name resolution for low-level name servers is important, because in that case, name resolution can be kept local at the lower-level domain in which the resolution is taking place.
Explain how a semantic overlay network functions using gossiping.
Semantic overlay networks use a two-layered gossiping scheme in which the bottom layer consists of an epidemic protocol that aims at maintaining a partial view of uniform randomly-selected nodes. The top layer maintains a list of semantically proximal neighbors through gossiping. To initiate an exchange, a node P can randomly select a neighbor Q from its current list. Then P is allowed to send only those entries that are semantically closest to Q, and when P receives entries from Q, it will eventually keep a partial view consisting of only the semantically closest nodes.
Outline an efficient implementation of globally unique identifiers.
Such identifiers can be generated locally in the following way. Take the network address of the machine where the identifier is generated, append the local time to that address, along with a generated pseudo-random number. Although, in theory, it is possible that another machine in the world can generate the same number, chances that this happens are negligible.
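The recipe above can be sketched as follows. This is one possible encoding, not a standard format: the hardware address returned by `uuid.getnode()` stands in for the network address, and the field widths and separator are arbitrary choices made for the example.

```python
import random
import time
import uuid

def make_identifier():
    """Combine machine address, local time, and a pseudo-random number."""
    address = uuid.getnode()          # 48-bit hardware address (may be random if none is found)
    timestamp = time.time_ns()        # local time in nanoseconds
    nonce = random.getrandbits(32)    # generated pseudo-random number
    return f"{address:012x}-{timestamp:x}-{nonce:08x}"

print(make_identifier())
```

No coordination with other machines is needed, which is what makes the scheme efficient. Python's standard `uuid` module offers ready-made identifiers built on similar ideas.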
Explain how DNS can be used to implement a home-based approach to locating mobile hosts.
The DNS name of a mobile host would be used as (rather poor) identifier for that host. Each time the name is resolved, it should return the current IP address of the host. This implies that the DNS server responsible for providing that IP address will act as the host's name server. Each time the host moves, it contacts this home server and provides it with its current address. Note that a mechanism should be available to avoid caching of the address. In other words, other servers should be told not to cache the address found.
Compare and contrast the global, administrational, and managerial layers of name space distributions.
The global layer is formed by the highest-level nodes such as the root node and other directory nodes close to the root, such as the root's children. The administrational layer is formed by directory nodes that together are managed within a single organization. The managerial layer consists of nodes that typically change regularly and represent hosts in the local network, shared library and binary files, and user-defined directories and files. These layers are alike because they all represent nodes in the namespace tree. These layers differ in their stability. Nodes in the global layer rarely change. Nodes in the administrational layer change more frequently than nodes in the global layer but are still relatively stable. Nodes in the managerial layer change frequently.
List the information that is required to mount a foreign name space in a distributed system.
The name of an access protocol. The name of the server. The name of the mounting point in the foreign name space.
List and explain 3 benefits of location independence.
Three benefits of location independence are transparency, flexibility, and human-friendliness. Location independence provides transparency in that a name is general enough for a user to be unaware of the system behind the name; for instance, the user would not know where the system's servers are located. It provides flexibility in that servers can be easily replaced without changing the name. It also provides human-friendliness in that the user only needs to remember the easy location-independent name as opposed to the specific address of the server.
List three logical layers found in typical name space distributions.
Three logical layers found in typical name space distributions are the global layer, the administrational layer, and the managerial layer. (See the question comparing these three layers for definitions.)
List and explain three methods of exploiting network proximity.
Topology-based assignment of node identifiers: Assign identifiers such that two nearby nodes will have identifiers that are also close to each other. Proximity routing: Nodes maintain a list of alternative targets to forward a request to. Proximity neighbor selection: Optimize routing tables such that the nearest node is selected as a neighbor.
Define forwarding pointers. Provide a real world example of its use.
When an entity moves from A to B, it leaves behind in A a reference to its new location B. A real-world example is call forwarding when someone moves from Memphis to Denver: calls to the old number are forwarded to the new one.
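Following a chain of forwarding pointers amounts to repeatedly looking up the next location until none is left. The sketch below assumes the pointers are available as a simple mapping from old location to new location; the cycle check is an added safeguard and not part of the basic scheme.

```python
def follow(start, forwarding):
    """Follow forwarding pointers from start; return (current location, hops)."""
    location, hops = start, 0
    seen = {start}
    while location in forwarding:
        location = forwarding[location]
        if location in seen:
            raise ValueError("cycle in forwarding pointers")
        seen.add(location)
        hops += 1
    return location, hops

# The entity moved from A to B and later from B to C.
print(follow("A", {"A": "B", "B": "C"}))
```

The hop count grows with every move, which hints at the main drawback of the approach: long chains make lookups slow and fragile if any intermediate location becomes unavailable.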
Consider DNS. To refer to a node N in a subdomain implemented as a different zone than the current domain, a name server for that zone needs to be specified. Is it always necessary to include a resource record for that server's address, or is it sometimes sufficient to provide only its domain name?
When the name server is represented by a node NS in a domain other than the one in which N is contained, it is enough to give only its domain name. In that case, the name can be looked up by a separate DNS query. This is not possible when NS lies in the same subdomain as N, for in that case, you would need to contact the name server to find out its address.
Are there things that can be done with a hard link that cannot be done with a soft link or vice versa?
With a soft link you can link to a different disk partition or even to a different machine.
Is an identifier allowed to contain information on the entity it refers to?
Yes, but that information is not allowed to change, because that would imply changing the identifier. The old identifier should remain valid, so that changing it would imply that an entity has two identifiers, violating the second property of identifiers.
Consider a distributed file system that uses per-user name spaces. In other words, each user has his own, private name space. Can names from such name spaces be used to share resources between two different users?
Yes, provided names in the per-user name spaces can be resolved to names in a shared, global name space. For example, two identical names in different name spaces are, in principle, completely independent and may refer to different entities. To share entities, it is necessary to refer to them by names from a shared name space. For example, Jade relies on DNS names and IP addresses that can be used to refer to shared entities such as FTP sites.
Identify the closure mechanism of accessing a web server.
A closure mechanism is the knowledge of how and where to start name resolution. When accessing a web server, the host name is resolved through DNS, and the closure mechanism is the local name server that the host is configured to contact first. Name resolution can reach the rest of the name space from there because an initial cache of hints holds the known addresses of the root name servers.
Compare and contrast linking and mounting.
Linking and mounting are two file-system operations for creating a view that combines the local file system with some other file system. Linking: A link is a pointer from one directory location to another directory, on either the local or a remote file system; following the pointer transparently moves you to another part of the file system. Mounting: Not a pointer in the strictest sense; rather than redirecting you, it connects a piece of another file system to the local one for as long as it stays mounted. Compare: Both give the user a view of a file system that may be local or remote. Contrast: A link is a pointer to another location, whereas mounting actually imports the other file system into the local system.
Explain the process of recursive name resolution using DNS.
Any name on the Internet can be decomposed into labels that are resolved top-down from the top-level domain, in both iterative and recursive resolution. For www.memphis.edu, the top-level domain .edu is resolved first, then memphis, and finally the host www. In recursive name resolution, each name server passes the remaining request on to the next name server it finds instead of returning an intermediate result to the client's name resolver. When the complete path is eventually resolved, the result is passed back through each name server that made a request along the way until it reaches the root name server, which then passes the result back to the client's name resolver.
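The recursive scheme can be sketched with a toy name space for www.memphis.edu. The server names, the table layout, and the address (taken from a documentation address range) are all hypothetical; real DNS servers and records are more involved.

```python
# Hypothetical toy zones: each server maps one label to the next server
# or, at the last step, to the final address.
ZONES = {
    "root": {"edu": "ns-edu"},
    "ns-edu": {"memphis": "ns-memphis"},
    "ns-memphis": {"www": "192.0.2.20"},
}

def server_resolve(server, labels, zones, trace):
    """A server resolves one label, then asks the next server for the rest."""
    trace.append(server)
    result = zones[server][labels[0]]
    if len(labels) > 1:
        # The request is passed on by this server, not returned to the client.
        result = server_resolve(result, labels[1:], zones, trace)
    return result   # the answer travels back up the chain of servers

def resolve_recursively(name, zones=ZONES):
    """The client sends a single request to the root and waits for the answer."""
    trace = []
    address = server_resolve("root", name.split(".")[::-1], zones, trace)
    return address, trace

print(resolve_recursively("www.memphis.edu"))
```

From the client's point of view there is only one request and one reply; the nesting of the calls mirrors how the result is handed back from server to server.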