CS14 Final


4.3 The Search Tree ADT - Binary Search Trees

*Binary Search Tree Properties* - The property that makes a binary tree into a binary search tree is that for every node, X, in the tree, the values of all the items in its left subtree are smaller than the item in X, and the values of all the items in its right subtree are larger than the item in X.
- Because the average depth of a binary search tree turns out to be O(logN), we generally do not need to worry about running out of stack space.
- Searching is based on the < operator that must be defined for the particular *Comparable type*. Specifically, item x matches y if both x<y and y<x are false. This allows Comparable to be a complex type (such as an employee record), with a comparison function defined on only part of the type (such as the social security number data member or the salary).
- Several of the private member functions use the technique of passing a pointer variable by reference. This allows the public member functions to pass a pointer to the root to the private recursive member functions, and the recursive functions can then change the value of the root so that the root points to another node.

*4.3.1 CONTAINS*
- Returns true if there is a node in tree T that has item X, or false if there is no such node.
- If T is empty, we can just return false. Otherwise, if the item stored at T is X, we return true. Otherwise, we make a recursive call on a subtree of T, either left or right, depending on the relationship of X to the item stored in T.
- The amount of stack space used is expected to be only O(logN).

*4.3.2 FIND MIN AND FIND MAX*
- These private routines return a pointer to the node containing the smallest and largest elements in the tree, respectively.
- To perform a findMin, start at the root and go left as long as there is a left child. The stopping point is the smallest element.
- The findMax routine is the same, except that branching is to the right child.

*4.3.3 INSERT*
- To insert X into tree T, proceed down the tree as you would with a contains. If X is found, do nothing (no duplicates are allowed). Otherwise, insert X at the last spot on the path traversed.
- Duplicates can instead be handled by keeping an extra field in the node record indicating the frequency of occurrence.
- Example: to insert 5, we traverse the tree as though a contains were occurring. At the node with item 4, we need to go right, but there is no subtree, so 5 is not in the tree, and this is the correct spot to place 5.

*4.3.4 REMOVE*
- *Deleting a Node with No Children:* If the node is a leaf, it can be deleted immediately.
- *Deleting a Node with One Child:* The node can be deleted after its parent adjusts a link to bypass the node.
- *Deleting a Node with Two Children:* The complicated case. The general strategy is to replace the data of this node with the smallest data of the right subtree (which is easily found) and recursively delete that node. Because the smallest node in the right subtree cannot have a left child, the second remove is an easy one (see the sketch below).
- *Lazy Deletion* = when an element is to be deleted, it is left in the tree and merely marked as deleted (if a deleted item is reinserted, the overhead of allocating a new cell is avoided).
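A minimal C++ sketch of these four routines, using an int payload instead of the textbook's generic Comparable; the names (BinaryNode, contains, insert, findMin, remove) follow the book's style, but the exact code here is illustrative:

```cpp
#include <iostream>

struct BinaryNode
{
    int element;
    BinaryNode *left = nullptr;
    BinaryNode *right = nullptr;
    explicit BinaryNode(int e) : element(e) {}
};

// contains: empty tree -> false; otherwise recurse left or right.
// A match is when neither x < item nor item < x holds.
bool contains(int x, BinaryNode *t)
{
    if (t == nullptr) return false;
    if (x < t->element) return contains(x, t->left);
    if (t->element < x) return contains(x, t->right);
    return true;
}

// insert: the pointer is passed by reference so the routine can
// attach a new node to the parent's link (or set the root itself).
void insert(int x, BinaryNode *&t)
{
    if (t == nullptr) t = new BinaryNode(x);
    else if (x < t->element) insert(x, t->left);
    else if (t->element < x) insert(x, t->right);
    // else: duplicate, do nothing
}

// findMin: keep going left; the stopping point is the smallest item.
BinaryNode *findMin(BinaryNode *t)
{
    if (t != nullptr)
        while (t->left != nullptr) t = t->left;
    return t;
}

// remove: the two-child case copies the minimum of the right
// subtree into this node, then removes that minimum instead.
void remove(int x, BinaryNode *&t)
{
    if (t == nullptr) return;                    // not found: do nothing
    if (x < t->element) remove(x, t->left);
    else if (t->element < x) remove(x, t->right);
    else if (t->left != nullptr && t->right != nullptr)
    {
        t->element = findMin(t->right)->element;
        remove(t->element, t->right);
    }
    else                                         // zero or one child
    {
        BinaryNode *old = t;
        t = (t->left != nullptr) ? t->left : t->right;
        delete old;
    }
}

int main()
{
    BinaryNode *root = nullptr;
    for (int x : {6, 2, 8, 1, 4, 3}) insert(x, root);
    insert(5, root);                 // hangs 5 off the node holding 4
    remove(2, root);                 // exercises the two-child case
    std::cout << contains(5, root) << ' ' << findMin(root)->element << '\n'; // 1 1
}
```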

CHAPTER 5: HASHING

*Hashing* = a technique used for performing insertions, deletions, and finds in constant average time.
- Operations such as findMin, findMax, and printing the entire table in sorted order in linear time are not supported.
- TREES ARE NICE BUT ARRAYS ARE NICER (AKA HASHING USES ARRAYS)
In this chapter, we will: 1) See several methods of implementing the hash table. 2) Compare these methods analytically. 3) Show numerous applications of hashing. 4) Compare hash tables with binary search trees.

5.1 General Idea

*Key* = the data member used to identify an item.
- The common convention is to have the table run from 0 to TableSize − 1.
- Each key is mapped into some number in the range 0 to TableSize − 1 and placed in the appropriate cell.
- What if hash returns a value larger than the size of the table? Use the cell at hash(key) % TableSize.
*Hash Function* = the mapping from each key to some number in the hash table.
*Collision* = when two keys hash to the same value.
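A tiny illustration of the wraparound rule, assuming an integer hash value; indexFor is a hypothetical helper name, not a library function:

```cpp
#include <iostream>

// Wraparound into the table: any hash value, however large,
// lands in a cell between 0 and tableSize - 1.
int indexFor(unsigned long hashValue, int tableSize)
{
    return static_cast<int>(hashValue % tableSize);
}

int main()
{
    std::cout << indexFor(4371, 10) << '\n';   // cell 1 in a size-10 table
}
```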

4.4 AVL (Adelson-Velskii and Landis) TREES

*Properties of an AVL Tree* - an AVL tree is a binary search tree with a balance condition.
*Balance Condition* = must be easy to maintain, and it ensures that the depth of the tree is O(logN).
- The simplest idea is to require that the left and right subtrees of the root have the same height, but this does not force the tree to be shallow.
- Another balance condition would insist that every node must have left and right subtrees of the same height, but this is too rigid to be useful (only perfectly balanced trees would qualify).
- An AVL tree is identical to a binary search tree, except that for every node in the tree, the heights of the left and right subtrees can differ by at most 1.
- The height of an empty tree is defined to be −1.
- All the tree operations can be performed in O(logN) time, except possibly insertion and deletion.
*Why Insertion Is Potentially Difficult* - When we do an insertion, we need to update all the balancing information for the nodes on the path back to the root, and inserting a node could violate the AVL tree property.
- The property has to be restored before the insertion step is considered over. It turns out that this can always be done with a simple modification to the tree, known as a *rotation*.
*Call the node that must be rebalanced α. Since any node has at most two children, and a height imbalance requires that α's two subtrees' heights differ by two, a violation can occur in four cases:*
1. An insertion into the left subtree of the left child of α (outside)
2. An insertion into the right subtree of the left child of α (inside)
3. An insertion into the left subtree of the right child of α (inside)
4. An insertion into the right subtree of the right child of α (outside)
*Single Rotation* = used when the insertion occurs on the "outside" (i.e., left-left or right-right)
*Double Rotation* = used when the insertion occurs on the "inside" (i.e., left-right or right-left)
*4.4.1 Single Rotation* - see the pictures on page 147 and the screenshots in the CS14 Final folder; a code sketch follows below.
*4.4.2 Double Rotation* - see the pictures on pages 150-151 and the screenshots in the CS14 Final folder.
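A sketch of the two single rotations and one double rotation, assuming an AvlNode layout in the textbook's style; the heights in main are set by hand because the full AVL insert routine is omitted:

```cpp
#include <algorithm>
#include <iostream>

struct AvlNode
{
    int element;
    AvlNode *left = nullptr;
    AvlNode *right = nullptr;
    int height = 0;                   // height of an empty tree is -1
    explicit AvlNode(int e) : element(e) {}
};

int height(AvlNode *t) { return t == nullptr ? -1 : t->height; }

// Case 1 (left-left): single rotation with the left child.
// k1 becomes the new root; k1's old right subtree becomes k2's left subtree.
void rotateWithLeftChild(AvlNode *&k2)
{
    AvlNode *k1 = k2->left;
    k2->left = k1->right;
    k1->right = k2;
    k2->height = std::max(height(k2->left), height(k2->right)) + 1;
    k1->height = std::max(height(k1->left), k2->height) + 1;
    k2 = k1;                          // pointer passed by reference
}

// Case 4 (right-right): the mirror image.
void rotateWithRightChild(AvlNode *&k1)
{
    AvlNode *k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    k1->height = std::max(height(k1->left), height(k1->right)) + 1;
    k2->height = std::max(k1->height, height(k2->right)) + 1;
    k1 = k2;
}

// Case 2 (left-right): a double rotation is just two single rotations.
void doubleWithLeftChild(AvlNode *&k3)
{
    rotateWithRightChild(k3->left);
    rotateWithLeftChild(k3);
}

int main()
{
    // Inserting 3, 2, 1 in order creates a left-left imbalance at the root.
    AvlNode *root = new AvlNode(3);
    root->left = new AvlNode(2);
    root->left->left = new AvlNode(1);
    root->left->height = 1; root->height = 2;   // heights set by hand here
    rotateWithLeftChild(root);
    std::cout << root->element << '\n';         // 2 is the new root
}
```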

4.2 Binary Trees

*Properties of a Binary Tree* - A binary tree has at most two children per node.
- The depth of an average binary tree is considerably smaller than N; it turns out to be O(sqrt(N)).
- The average depth of a binary search tree is O(logN).
*4.2.1 IMPLEMENTATION*
- The declaration of tree nodes is similar in structure to that for doubly linked lists: a node is a structure consisting of the element information plus two pointers (left and right) to other nodes.
- Trees are generally drawn as circles connected by lines, because they are actually graphs.
- Every binary tree with N nodes requires N + 1 nullptr links.
*4.2.2 AN EXAMPLE: EXPRESSION TREES*
*Expression Tree* = a tree whose leaves contain *operands* (such as constants or variable names) and whose other nodes contain *operators* (+, -, *, /).
- It is possible for nodes to have more than two children (for operators with more than two operands).
- It is possible for nodes to have only one child (as with unary operators).
*Inorder Traversal* = the general strategy for producing the (infix) expression - (left, node, right): go through all the nodes in the left subtree, then the root, then all the nodes in the right subtree.
*Postorder Traversal* = recursively print out the left subtree, the right subtree, and then the operator, yielding a postfix expression - (left, right, node)
*Preorder Traversal* = print out the operator first and then recursively print out the left and right subtrees, yielding a prefix expression - (node, left, right)
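A small sketch of an expression tree with inorder (infix, fully parenthesized) and postorder (postfix) printing; the Node type and the operator strings are illustrative, and only binary operators are assumed:

```cpp
#include <iostream>
#include <string>
#include <utility>

// Leaves hold operands; internal nodes hold operators.
struct Node
{
    std::string value;
    Node *left, *right;
    Node(std::string v, Node *l = nullptr, Node *r = nullptr)
        : value(std::move(v)), left(l), right(r) {}
};

// (left, node, right): prints an infix expression; every internal
// node is wrapped in parentheses to preserve meaning.
void inorder(Node *t)
{
    if (t == nullptr) return;
    bool paren = t->left != nullptr || t->right != nullptr;
    if (paren) std::cout << '(';
    inorder(t->left);
    std::cout << t->value;
    inorder(t->right);
    if (paren) std::cout << ')';
}

// (left, right, node): prints a postfix expression.
void postorder(Node *t)
{
    if (t == nullptr) return;
    postorder(t->left);
    postorder(t->right);
    std::cout << t->value << ' ';
}

int main()
{
    // Builds the tree for (a + b) * c
    Node *t = new Node("*", new Node("+", new Node("a"), new Node("b")),
                       new Node("c"));
    inorder(t);   std::cout << '\n';   // ((a+b)*c)
    postorder(t); std::cout << '\n';   // a b + c *
}
```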

8.1 Equivalence Relations

*Relation* = a relation R is defined on a set S if for every pair of elements (a, b), a, b ∈ S, aRb is either true or false. If aRb is true, then we say that a is related to b.
*Equivalence Relation* = a relation R that satisfies three properties:
1. (Reflexive) aRa, for all a ∈ S.
2. (Symmetric) aRb if and only if bRa.
3. (Transitive) aRb and bRc implies that aRc.
*Electrical Connectivity* (where all connections are by metal wires) is an equivalence relation:
- The relation is clearly reflexive, as any component is connected to itself.
- If a is electrically connected to b, then b must be electrically connected to a, so the relation is symmetric.
- If a is connected to b and b is connected to c, then a is connected to c, so the relation is transitive.
*Thus electrical connectivity is an equivalence relation.*

5.3 Separate Chaining

*Separate Chaining* = keep a list of all elements that hash to the same value.
- These lists are doubly linked and waste space; it might be preferable to avoid their use if space is tight.
- To perform a search, we use the hash function to determine which list to traverse, and then search the appropriate list.
- To perform an insert, we check the appropriate list to see whether the element is already in place (if duplicates are expected, an extra data member is usually kept, and this data member would be incremented in the event of a match).
- If the element turns out to be new, it can be inserted at the front of the list, since that is convenient and also because frequently the most recently inserted elements are the most likely to be accessed in the near future.
- The hash table *stores an array of linked lists*, which are allocated in the constructor.
- Just as the binary search tree works only for objects that are Comparable, *the hash tables in this chapter work only for objects that provide a hash function and equality operators* (operator== or operator!=, or possibly both).
- *Any scheme besides linked lists could be used to resolve the collisions*; a binary search tree or even another hash table would work, but *we expect that if the table is large and the hash function is good, all the lists should be short*, so basic separate chaining makes no attempt to try anything complicated.
- If the item to be inserted is already present, we do nothing; otherwise, we place it in the list (see the sketch below).
*The Load Factor*, λ, of a hash table is the ratio of the number of elements in the hash table to the table size.
- The average length of a list is λ.
- A successful search requires that about 1 + (λ/2) links be traversed.
- The general rule for separate chaining hashing is to make the table size about as large as the number of elements expected (in other words, let λ ≈ 1).
- If the load factor exceeds 1, we expand the table size by calling rehash.
- Keep the table size prime to *ensure a good distribution*.
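A minimal separate-chaining table for ints along the lines described; a real version would be templated over the element type and would rehash when the load factor exceeds 1:

```cpp
#include <algorithm>
#include <iostream>
#include <list>
#include <vector>

class ChainedHashTable
{
  public:
    explicit ChainedHashTable(int size = 101) : lists(size) {}

    bool contains(int x) const
    {
        const auto &whichList = lists[myhash(x)];
        return std::find(whichList.begin(), whichList.end(), x) != whichList.end();
    }

    bool insert(int x)
    {
        auto &whichList = lists[myhash(x)];
        if (std::find(whichList.begin(), whichList.end(), x) != whichList.end())
            return false;              // already present: do nothing
        whichList.push_front(x);       // recently inserted go at the front
        return true;
    }

  private:
    std::vector<std::list<int>> lists; // the array of linked lists

    size_t myhash(int x) const { return static_cast<size_t>(x) % lists.size(); }
};

int main()
{
    ChainedHashTable t(10);
    t.insert(89); t.insert(49);            // both hash to cell 9
    std::cout << t.contains(49) << '\n';   // 1
}
```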

5.4 Hash Tables Without Linked Lists

*Separate chaining hashing has the disadvantage of using linked lists.*
- This could slow the algorithm down a bit because of the time required to allocate new cells (especially in other languages), and it essentially requires the implementation of a second data structure.
- An alternative to resolving collisions with linked lists is to try alternative cells until an empty cell is found: cells h0(x), h1(x), h2(x), ... are tried in succession, where hi(x) = (hash(x) + f(i)) mod TableSize, with f(0) = 0.
- *The function f is the collision resolution strategy.*
- Because all the data go inside the table, a bigger table is needed in such a scheme than for separate chaining hashing.
- Generally, the load factor should be below λ = 0.5 for a hash table that doesn't use separate chaining. => *Probing Hash Tables*
*5.4.1 LINEAR PROBING*
*Linear Probing* = f is a linear function of i, typically f(i) = i. This amounts to trying cells sequentially (with wraparound) in search of an empty cell.
- As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large.
*Primary Clustering* (BAD) = even if the table is relatively empty, blocks of occupied cells start forming.
- Any key that hashes into the cluster will require several attempts to resolve the collision, and then it will add to the cluster.
*5.4.2 QUADRATIC PROBING*
- Quadratic probing is a collision resolution method that *eliminates the primary clustering problem of linear probing*. Quadratic probing is what you would expect: the collision function is quadratic. The popular choice is f(i) = i^2.
- Example (table size 10, inserting 89, 18, 49, 58, 69): when 49 collides with 89, the next position attempted is one cell away. This cell is empty, so 49 is placed there. Next, 58 collides at position 8; the cell one away is also occupied, so the cell 2^2 = 4 away is tried, and 58 lands in cell 2.
- There is no guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime, because at most half of the table can be used as alternative locations to resolve collisions.
- It can be proved that if the table is half empty and the table size is prime, we are always guaranteed to be able to insert a new element.
- If the load factor exceeds 0.5, we enlarge the hash table (*rehashing*).
*Secondary Clustering* = although quadratic probing eliminates primary clustering, elements that hash to the same position will probe the same alternative cells.
- Simulation results suggest that it generally causes less than an extra half probe per search.
*5.4.3 DOUBLE HASHING*
*Double Hashing* = one popular choice is f(i) = i * hash2(x), meaning we probe in distances hash2(x), 2·hash2(x), and so on. A popular second hash function is hash2(x) = R − (x mod R), with R a prime smaller than TableSize.
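A quadratic-probing sketch using the standard incremental trick (successive squares differ by successive odd numbers, so f(i) = i^2 can be reached by adding 1, 3, 5, ...); lazy deletion and rehashing are omitted, so the sketch assumes the load factor is kept at or below 0.5:

```cpp
#include <iostream>
#include <vector>

enum EntryType { ACTIVE, EMPTY };

struct HashEntry
{
    int element = 0;
    EntryType info = EMPTY;
};

class ProbingHashTable
{
  public:
    explicit ProbingHashTable(int size = 11) : array(size) {}

    bool contains(int x) const { return array[findPos(x)].info == ACTIVE; }

    void insert(int x)
    {
        int pos = findPos(x);
        if (array[pos].info == ACTIVE) return;   // duplicate: do nothing
        array[pos] = {x, ACTIVE};
    }

  private:
    std::vector<HashEntry> array;

    // Probe h(x), h(x)+1, h(x)+4, h(x)+9, ... with wraparound.
    int findPos(int x) const
    {
        int offset = 1;
        int currentPos = static_cast<int>(x % array.size());
        while (array[currentPos].info != EMPTY && array[currentPos].element != x)
        {
            currentPos += offset;                // adds 2i - 1: moves to i^2
            offset += 2;
            if (currentPos >= static_cast<int>(array.size()))
                currentPos -= array.size();
        }
        return currentPos;
    }
};

int main()
{
    ProbingHashTable t(10);                      // the section's example table
    for (int x : {89, 18, 49, 58, 69}) t.insert(x);
    std::cout << t.contains(58) << '\n';         // 1 (stored in cell 2)
}
```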

9.2 TOPOLOGICAL SORT

*Topological Sort* = an ordering of vertices in a directed acyclic graph, such that if there is a path from vi to vj, then vj appears after vi in the ordering.
- A topological ordering is not possible if the graph has a cycle, since for two vertices v and w on the cycle, v precedes w and w precedes v.
- A simple algorithm to find a topological ordering is first to find any vertex with no incoming edges. We can then print this vertex and remove it, along with its edges, from the graph. Then we apply this same strategy to the rest of the graph.
*Indegree* of a vertex = the number of edges (u, v) that point to that particular vertex.
- The function findNewVertexOfIndegreeZero scans the array of vertices looking for a vertex with indegree 0 that has not already been assigned a topological number. It returns NOT_A_VERTEX if no such vertex exists; this indicates that the graph has a cycle.
- We can remove the inefficiency of repeated scanning by keeping all the (unassigned) vertices of indegree 0 in a special box; findNewVertexOfIndegreeZero then returns (and removes) any vertex in the box. When we decrement the indegrees of the adjacent vertices, we check each vertex and place it in the box if its indegree falls to 0. The box can be implemented as a stack or a queue; here we use a queue.
- While the queue is not empty, a vertex v is removed, and all vertices adjacent to v have their indegrees decremented. A vertex is put on the queue as soon as its indegree falls to 0.
- The topological ordering then is the order in which the vertices dequeue.
- The queue operations are done at most once per vertex, and the other initialization steps, including the computation of indegrees, also take time proportional to the size of the graph.
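A queue-based topological sort matching the description above, with vertices numbered 0..n−1 and adjacency lists as input; throwing on a cycle stands in for the textbook's NOT_A_VERTEX return:

```cpp
#include <iostream>
#include <queue>
#include <stdexcept>
#include <vector>

std::vector<int> topsort(const std::vector<std::vector<int>> &adj)
{
    int n = adj.size();
    std::vector<int> indegree(n, 0), order;

    // Compute indegrees: one pass over every edge.
    for (int v = 0; v < n; ++v)
        for (int w : adj[v]) ++indegree[w];

    std::queue<int> q;                  // the "box" of indegree-0 vertices
    for (int v = 0; v < n; ++v)
        if (indegree[v] == 0) q.push(v);

    while (!q.empty())
    {
        int v = q.front(); q.pop();
        order.push_back(v);             // vertices dequeue in topological order
        for (int w : adj[v])
            if (--indegree[w] == 0) q.push(w);
    }

    if (static_cast<int>(order.size()) != n)   // some vertex never hit indegree 0
        throw std::runtime_error("graph has a cycle");
    return order;
}

int main()
{
    std::vector<std::vector<int>> adj = {{1, 2}, {3}, {3}, {}};
    for (int v : topsort(adj)) std::cout << v << ' ';   // 0 1 2 3
    std::cout << '\n';
}
```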

8.4 Smart Union Algorithms

*Union By Size* = a simple improvement is always to make the smaller tree a subtree of the larger, breaking ties by any method.
- We can prove that if unions are done by size, the depth of any node is never more than logN: whenever a node's depth increases as a result of a union, it is placed in a tree that is at least twice as large as before, so its depth can be increased at most logN times.
- This implies that the running time for a *find operation* is *O(logN)*, and a *sequence of M operations* takes *O(M logN)*.
*Union By Height* = we keep track of the height, instead of the size, of each tree and perform unions by making the shallow tree a subtree of the deeper tree.
- This also guarantees that all the trees will have depth at most O(logN).
- It is an easy algorithm, since the height of a tree increases only when two equally deep trees are joined (and then the height goes up by one). Thus, union-by-height is a trivial modification of union-by-size.
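A union-by-size sketch in the array representation (see Section 8.3: each entry holds its parent, and a root holds the negative of its tree's size, so a fresh forest is all −1). As a small liberty, this unionSets accepts arbitrary elements and finds their roots itself:

```cpp
#include <iostream>
#include <vector>

class DisjSets
{
  public:
    explicit DisjSets(int n) : s(n, -1) {}

    int find(int x) const               // no path compression in this sketch
    {
        while (s[x] >= 0) x = s[x];     // climb parent links to the root
        return x;
    }

    void unionSets(int a, int b)
    {
        int root1 = find(a), root2 = find(b);
        if (root1 == root2) return;     // already in the same set
        if (s[root2] < s[root1])        // root2's tree is larger (more negative)
        {
            s[root2] += s[root1];       // accumulate the size
            s[root1] = root2;           // smaller tree hangs off the larger
        }
        else
        {
            s[root1] += s[root2];
            s[root2] = root1;
        }
    }

  private:
    std::vector<int> s;
};

int main()
{
    DisjSets ds(8);
    ds.unionSets(4, 5); ds.unionSets(6, 7); ds.unionSets(4, 6);
    std::cout << (ds.find(5) == ds.find(7)) << '\n';   // 1
}
```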

8.5 Path Compression

- If there are many more finds than unions, this running time is worse than that of the quick-find algorithm; the only way to speed the algorithm up, without reworking the data structure entirely, is to do something clever on the find operation.
*Path Compression* = performed during a find operation, and independent of the strategy used to perform unions.
- Suppose the operation is find(x). Then the effect of path compression is that every node on the path from x to the root has its parent changed to the root.
- The only change to the find routine (besides the fact that it is no longer a const member function) is that s[x] is made equal to the value returned by find; thus, after the root of the set is found recursively, x's parent link references it. This occurs recursively to every node on the path to the root, so this implements path compression (see the sketch below).
Compatible with union-by-size? (YES)
- Path compression is perfectly compatible with union-by-size, and thus both routines can be implemented at the same time.
- The combination of path compression and a smart union rule guarantees a very efficient algorithm in all cases.
Compatible with union-by-height? (NO)
- Path compression is not entirely compatible with union-by-height, because path compression can change the heights of the trees.
*Ranks* (union-by-rank) = the heights are instead maintained as estimated heights, called ranks; union-by-rank is just as efficient in theory as union-by-size.
- Path compression significantly reduces the worst-case running time.
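The one-line change described above, written as a standalone function over the parent array s (negative entries mark roots, as in the union-by-size sketch):

```cpp
#include <iostream>
#include <vector>

// Recursive find with path compression: after the root is located,
// every node on the search path is re-parented directly to it.
int find(std::vector<int> &s, int x)
{
    if (s[x] < 0) return x;            // negative entry marks a root
    return s[x] = find(s, s[x]);       // compress on the way back up
}

int main()
{
    // A chain 3 -> 2 -> 1 -> 0, where 0 is the root of a size-4 set.
    std::vector<int> s = {-4, 0, 1, 2};
    find(s, 3);                        // compresses the whole path
    std::cout << s[3] << ' ' << s[2] << '\n';   // 0 0
}
```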

5.2 Hash Function

- It is often a good idea to ensure that the TABLE SIZE is prime, to get a more even distribution.
- Usually, the keys are strings.
- If the keys are very long, the hash function will take too long to compute. A common practice in this case is not to use all the characters. The length and properties of the keys would then influence the choice.
- hash(key) = the value in the hash table; the hash function assigns the data item with that key to a cell of the table.
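A common string hash of the kind the textbook uses (Horner's rule with multiplier 37, reduced mod the table size at the end); the multiplier and the sample prime table size are conventional choices:

```cpp
#include <iostream>
#include <string>

// Treat the string as a number in base 37, then wrap into the table.
size_t hashString(const std::string &key, size_t tableSize)
{
    size_t hashVal = 0;
    for (char ch : key)
        hashVal = 37 * hashVal + static_cast<unsigned char>(ch);
    return hashVal % tableSize;
}

int main()
{
    std::cout << hashString("junk", 10007) << '\n';   // 10007 is prime
}
```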

CHAPTER 9: GRAPH ALGORITHMS

1) Show several real-life problems that can be converted to problems on graphs. 2) Give algorithms to solve several common graph problems. 3) Show how the proper choice of data structures can drastically reduce the running time of these algorithms. 4) See an important technique, known as depth-first search, and show how it can be used to solve several seemingly nontrivial problems in linear time.

9.1 Definitions

A *Graph* G = (V, E) consists of a set of *vertices*, V, and a set of *edges*, E. *Edges* are sometimes referred to as *arcs*.
- Each edge is a pair (v, w), where v, w ∈ V.
*Directed Graph* = the pairs are ordered; directed graphs are sometimes referred to as digraphs.
- Vertex w is adjacent to v if and only if (v, w) ∈ E.
*Undirected Graph* - in an undirected graph with edge (v, w), and hence (w, v), w is adjacent to v and v is adjacent to w.
- Sometimes an edge has a third component, known as either a *weight* or a *cost*.
*Path* = a sequence of vertices w1, w2, w3, ..., wN such that (wi, wi+1) ∈ E for 1 ≤ i < N.
*Length* = the number of edges on the path, which is equal to N − 1.
*Loop* = an edge (v, v) from a vertex to itself.
*Simple Path* = a path such that all vertices are distinct, except that the first and last could be the same.
*Cycle* = a path (of length > 0) that starts and ends at the same vertex.
*Acyclic* = has no cycles; a directed acyclic graph is called a DAG.
*Connected (Undirected Graph)* = there is a path from every vertex to every other vertex.
*Strongly Connected (Directed Graph)* = there is a path from every vertex to every other vertex.
*Weakly Connected (Directed Graph)* = the underlying graph (without direction on the arcs) is connected.
*Complete Graph* = a graph in which there is an edge between every pair of vertices.
*9.1.1 REPRESENTATION OF GRAPHS*
*Adjacency Matrix* = a two-dimensional array (a simple way to represent a graph).
- If we were looking for the cheapest airplane route, we could represent nonexistent flights with a cost of ∞. If we were looking, for some strange reason, for the most expensive airplane route, we could use −∞ (or perhaps 0) to represent nonexistent edges.
- An adjacency matrix is an appropriate representation if the graph is *dense*: |E| = Θ(|V|^2).
*Adjacency List* = the solution if the graph is *sparse* (the opposite of dense).
- Requires O(|E| + |V|) space, which is linear in the size of the graph.
- The standard way to represent graphs.
- A common requirement in graph algorithms is to find all vertices adjacent to some given vertex v, and this can be done, in time proportional to the number of such vertices found, by a simple scan down the appropriate adjacency list.
- The lists themselves can be maintained in either vectors or lists.
- For sparse graphs, when using vectors, the programmer may need to initialize each vector with a smaller capacity than the default; otherwise, there could be significant wasted space (a sketch follows below).
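A small adjacency-list sketch: one vector of (neighbor, weight) records per vertex; the Edge type and addEdge helper are illustrative names:

```cpp
#include <iostream>
#include <vector>

// O(|E| + |V|) space: scanning a vertex's neighbors costs time
// proportional to its out-degree.
struct Edge { int to; int weight; };

int main()
{
    int n = 4;
    std::vector<std::vector<Edge>> adj(n);
    auto addEdge = [&](int v, int w, int cost) { adj[v].push_back({w, cost}); };

    addEdge(0, 1, 5); addEdge(0, 2, 3); addEdge(2, 3, 1);   // directed edges

    for (const Edge &e : adj[0])        // all vertices adjacent to vertex 0
        std::cout << e.to << " (cost " << e.weight << ")\n";
}
```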

4.1 Preliminaries

A *tree* can be defined in several ways. One natural way to define a tree is recursively.
- A tree is a collection of nodes. The collection can be empty; otherwise, a tree consists of a distinguished node, r, called the *root*, and zero or more nonempty (sub)trees T1, T2, ..., Tk, each of whose roots are connected by a directed *edge* from r.
- The root of each subtree is said to be a *child* of r, and r is the *parent* of each subtree root.
- A tree is a collection of N nodes, one of which is the root, and N − 1 edges.
- That there are *N − 1 edges* follows from the fact that each edge connects some node to its parent, and every node except the root has one parent.
*Leaves* = nodes with no children
*Siblings* = nodes with the same parent
*Grandparent* = a node's parent's parent
*Grandchild* = a node's child's child
*Path* = a sequence of nodes, each the child of the previous. In the book's example tree, the path from B to W is B-C-Q-W; there is no path from E to L.
- Notice that in a tree there is exactly one path from the root to each node.
*Length* (of a path) = the number of edges on the path
*Depth* = the length of the unique path from the root to the node (counting top to bottom)
*Height* = the length of the longest path from the node to a leaf (counting bottom to top)
*Ancestor* = a node that has a path to you
*Descendant* = a node that you have a path to
- If n1 ≠ n2, then n1 is a *proper ancestor* of n2 and n2 is a *proper descendant* of n1.
*4.1.1 IMPLEMENTATION OF TREES*
- One way to implement a tree would be to have in each node, besides its data, a link to each child of the node. But the number of children per node can vary greatly and is not known in advance, so the solution is simple: keep the children of each node in a linked list of tree nodes.
*Typical Declaration:*

struct TreeNode
{
    Object element;         // the data in the node
    TreeNode *firstChild;   // link to this node's leftmost child
    TreeNode *nextSibling;  // link to this node's next sibling
};

*4.1.2 TREE TRAVERSALS WITH AN APPLICATION*
*Preorder Traversal* = visit the node BEFORE its subtrees - (node, left, right) - root first
*Postorder Traversal* = visit the node AFTER its subtrees - (left, right, node) - root last
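A short sketch of a preorder traversal over the first-child/next-sibling representation declared above; printPreorder indents each node by its depth (the indentation scheme is illustrative):

```cpp
#include <iostream>
#include <string>

struct TreeNode
{
    std::string element;
    TreeNode *firstChild = nullptr;
    TreeNode *nextSibling = nullptr;
};

// Visit the node before its subtrees; siblings are a linked list.
void printPreorder(const TreeNode *t, int depth = 0)
{
    for (; t != nullptr; t = t->nextSibling)        // walk the sibling list
    {
        std::cout << std::string(2 * depth, ' ') << t->element << '\n';
        printPreorder(t->firstChild, depth + 1);    // then each child subtree
    }
}

int main()
{
    TreeNode a{"A"}, b{"B"}, c{"C"};
    a.firstChild = &b;          // B is A's first child
    b.nextSibling = &c;         // C is B's sibling (A's second child)
    printPreorder(&a);
}
```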

8.2 The Dynamic Equivalence Problem

As an example, suppose the equivalence relation is defined over the five-element set {a1, a2, a3, a4, a5}. Then there are 25 pairs of elements, each of which is either related or not. However, the information a1 ∼ a2, a3 ∼ a4, a5 ∼ a1, a4 ∼ a2 implies that all pairs are related. We would like to be able to infer this quickly.
*Equivalence Class* (of an element a ∈ S) = the subset of S that contains all the elements that are related to a.
- To decide if a ∼ b, we need only check whether a and b are in the same equivalence class.
*Disjoint Sets Data Structure* = keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. It supports two useful operations:
1) Find: determine which subset a particular element is in. Find typically returns an item from this set that serves as its "representative"; by comparing the results of two find operations, one can determine whether two elements are in the same subset. In equivalence terms, find returns the name of the set (that is, the equivalence class) containing a given element.
2) Union: join two subsets into a single subset. This second operation adds relations: if we want to add the relation a ∼ b, we first see whether a and b are already related, by performing finds on both a and b and checking whether they are in the same equivalence class. *If they are not, then we apply union.*
- Union merges the two equivalence classes containing a and b into a new equivalence class. From a set point of view, the result of ∪ is to create a new set Sk = Si ∪ Sj, destroying the originals and preserving the disjointness of all the sets.
- The problem is *dynamic* because, during the course of the algorithm, the sets can change via the union operation.
*Online Algorithm* = when a find is performed, it must give an answer before continuing.
*Offline Algorithm* = would be allowed to see the entire sequence of unions and finds before answering.

9.3 Shortest-Path Algorithms

*Dijkstra's Algorithm* = solves the single-source shortest-path problem on a weighted graph with nonnegative edge costs; it is a prime example of a *greedy algorithm*. A sketch follows below.
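Since the notes only name the algorithm, here is a standard priority-queue sketch (not the textbook's known/distance table version): repeatedly settle the unvisited vertex with the smallest tentative distance and relax its outgoing edges. Nonnegative weights are assumed:

```cpp
#include <functional>
#include <iostream>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

const int INF = std::numeric_limits<int>::max();
struct Edge { int to; int weight; };

std::vector<int> dijkstra(const std::vector<std::vector<Edge>> &adj, int source)
{
    std::vector<int> dist(adj.size(), INF);
    using P = std::pair<int, int>;                    // (distance, vertex)
    std::priority_queue<P, std::vector<P>, std::greater<P>> pq;

    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty())
    {
        auto [d, v] = pq.top(); pq.pop();
        if (d > dist[v]) continue;                    // stale queue entry
        for (const Edge &e : adj[v])
            if (dist[v] + e.weight < dist[e.to])      // relax edge (v, e.to)
            {
                dist[e.to] = dist[v] + e.weight;
                pq.push({dist[e.to], e.to});
            }
    }
    return dist;
}

int main()
{
    std::vector<std::vector<Edge>> adj =
        {{{1, 4}, {2, 1}}, {{3, 1}}, {{1, 2}, {3, 5}}, {}};
    for (int d : dijkstra(adj, 0)) std::cout << d << ' ';   // 0 3 1 4
    std::cout << '\n';
}
```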

CHAPTER 8: THE DISJOINT SETS CLASS

For the disjoint sets data structure, we will: - Show how it can be implemented with minimal coding effort. - Greatly increase its speed, using just two simple observations. - Analyze the running time of a fast implementation. - See a simple application.

CHAPTER 4: TREES

In this chapter, we look at a simple data structure for which the average running time of most operations is O(logN) - See how trees are used to implement the file system of several popular operating systems. - See how trees can be used to evaluate arithmetic expressions. - Show how to use trees to support searching operations in O(logN) average time and how to refine these ideas to obtain O(logN) worst-case bounds. We will also see how to implement these operations when the data are stored on a disk. - Discuss and use the set and map classes.

8.3 Basic Data Structure

One idea might be to use a tree to represent each set, since each element in a tree has the same root; thus, the root can be used to name the set. We will represent each set by a tree.
- The only information we will need is a parent link.
- Since only the name of the parent is required, we can assume that this tree is stored implicitly in an array: each entry s[i] in the array represents the parent of element i (a sketch follows below).
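The implicit array representation with the basic (unsmart) find and union; −1 marks a root, and this unionSets assumes its arguments are roots, as in the textbook's basic version:

```cpp
#include <iostream>
#include <vector>

class BasicDisjSets
{
  public:
    explicit BasicDisjSets(int n) : s(n, -1) {}   // every element is its own root

    int find(int x) const
    {
        while (s[x] != -1) x = s[x];   // climb parent links to the root
        return x;
    }

    void unionSets(int root1, int root2)
    {
        s[root2] = root1;              // make root2's tree a subtree of root1
    }

  private:
    std::vector<int> s;                // s[i] is the parent of element i
};

int main()
{
    BasicDisjSets ds(5);
    ds.unionSets(0, 1);                // both arguments must be roots here
    std::cout << ds.find(1) << '\n';   // 0
}
```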

Binary Tree

The *binary search tree* is the basis for the implementation of two library collections classes, set and map, which are used in many applications

