Unordered Data Structures

What is an adjacent vertex in a graph?

Any vertex connected to the vertex by an incident edge

Using the convention followed by the video lessons, given three disjoint sets (1,3,5,7), (2,8) and (4,6), which set would be referenced by the value 3?

(1,3,5,7)

What is the union of the disjoint sets (1,3,5,7) and (2,8)?

(1,3,5,7,2,8)

Which of these edge lists has a vertex of the highest degree?
(a,b), (a,c), (a,d), (b,d)
(a,b), (b,c), (d,b), (g,b)
(d,b), (g,a), (h,f), (c,e)
(a,c), (e,g), (c,e), (g,a)

(a,b), (b,c), (d,b), (g,b). Vertex b has degree four, which is higher than the degree of any vertex in the other lists.
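The degree count above can be checked mechanically. The following is a minimal sketch, assuming an undirected edge list of single-character vertex names; `maxDegree` is a hypothetical helper name, not something from the video lessons.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: largest vertex degree in an undirected edge list.
// Every edge adds one to the degree of each of its two endpoints.
int maxDegree(const std::vector<std::pair<char, char>>& edges) {
    std::map<char, int> degree;
    for (const auto& e : edges) {
        ++degree[e.first];
        ++degree[e.second];
    }
    int best = 0;
    for (const auto& entry : degree) best = std::max(best, entry.second);
    return best;
}
```

Because each edge is counted once per endpoint, the degrees summed over all vertices always come to twice the number of edges.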

Which adjacency matrix corresponds to the edge list: (1,2), (2,3), (3,4), (1,4) (where the rows/columns of the adjacency matrix follow the same order as the vertex indices)?

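0 1 0 1
  0 1 0
    0 1
      0
(Only the upper-triangular portion is shown, diagonal included, since the lower-left half of an undirected graph's matrix is redundant.)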
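As a rough illustration, the matrix can be built directly from the edge list. This sketch assumes vertices numbered 1..n and fills both halves of the matrix; `adjacencyMatrix` is a hypothetical helper name. For the edge list in the question, the upper triangle read row by row gives 0 1 0 1, then 0 1 0, then 0 1, then 0.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: n x n adjacency matrix of an undirected graph
// whose vertices are numbered 1..n.
std::vector<std::vector<int>> adjacencyMatrix(
        int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> matrix(n, std::vector<int>(n, 0));
    for (const auto& e : edges) {
        assert(e.first >= 1 && e.first <= n && e.second >= 1 && e.second <= n);
        int u = e.first - 1;   // shift to 0-based row/column indices
        int v = e.second - 1;
        matrix[u][v] = 1;
        matrix[v][u] = 1;      // mirrored entry, redundant for undirected graphs
    }
    return matrix;
}
```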

What is log* of 2^65536?

log*(2^65536)
= 1 + log*(65536)
= 1 + 1 + log*(16)
= 1 + 1 + 1 + log*(4)
= 1 + 1 + 1 + 1 + log*(2)
= 1 + 1 + 1 + 1 + 1 + log*(1)
= 1 + 1 + 1 + 1 + 1 + 0 = 5
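The iterated logarithm is short to compute. Below is a minimal sketch using base-2 logarithms; `logStar` is a hypothetical name. Since 2^65536 does not fit in a `double`, the check relies on the identity log*(2^k) = 1 + log*(k) and evaluates 1 + logStar(65536).

```cpp
#include <cmath>

// Iterated logarithm (base 2): the number of times log2 must be applied
// before the value drops to 1 or below.
int logStar(double n) {
    int count = 0;
    while (n > 1.0) {
        n = std::log2(n);
        ++count;
    }
    return count;
}
```

With this definition, logStar(65536) walks through 16, 4, 2, 1, which is four steps, so log*(2^65536) is five.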

When encoding height into the root of an up-tree, what value should be placed in element 7 of the following array?
Value: 3 | -1 | 7 | -1 | 7 | -1 | ?
Index: 1 | 2 | 3 | 4 | 5 | 6 | 7

1 -> 3 -> 7 and 5 -> 7, so the up-tree rooted at 7 has height 2 and the value is -3. The value should be equal to -1 minus the height. A singleton disjoint set would have height zero, but there is no -0, and 0 would point to the 0th element of the array, so we increment the height by one and negate it before storing it in the root of the up-tree.

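When encoding size into the root of an up-tree, what value should be placed in element 7 of the following array?
Value: 3 | -1 | 7 | -1 | 7 | -1 | ?
Index: 1 | 2 | 3 | 4 | 5 | 6 | 7

1 -> 3 -> 7 and 5 -> 7, so the up-tree rooted at 7 contains four elements (1, 3, 5 and 7) and the value is -4, the negated size.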
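Both encodings can be derived from the parent array itself. The sketch below assumes a 1-indexed array (slot 0 unused) in which any negative entry marks a root; the function names are hypothetical. For the array in these two questions, with element 7 treated as a root, it produces -3 for the height encoding and -4 for the size encoding.

```cpp
#include <algorithm>
#include <vector>

// Follows parent links from element i until a root (negative entry) is reached.
int rootOf(const std::vector<int>& up, int i) {
    while (up[i] > 0) i = up[i];
    return i;
}

// Number of edges between element i and the root of its up-tree.
int depthOf(const std::vector<int>& up, int i) {
    int depth = 0;
    while (up[i] > 0) { i = up[i]; ++depth; }
    return depth;
}

// Root value when encoding height: -(height + 1).
int heightEncoding(const std::vector<int>& up, int root) {
    int height = 0;
    for (int i = 1; i < static_cast<int>(up.size()); ++i)
        if (rootOf(up, i) == root) height = std::max(height, depthOf(up, i));
    return -(height + 1);
}

// Root value when encoding size: -(number of elements in the set).
int sizeEncoding(const std::vector<int>& up, int root) {
    int count = 0;
    for (int i = 1; i < static_cast<int>(up.size()); ++i)
        if (rootOf(up, i) == root) ++count;
    return -count;
}
```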

What three things does a hash table consist of?

1) a hash function 2) an array 3) collision handling

What is the sum of the degrees of all vertices in a graph?

2*number of edges

According to the disjoint set array representation in the video lessons, which of the following arrays would NOT be a valid representation of the disjoint set (1,3,5,7)? See Quiz 2.4

Value: 3 | -1 | 5 | -1 | 7 | -1 | 1 | -1
Index: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
This is indeed not valid because there is no root of the up-tree. Element 1 points to element 3, which points to element 5, which points to element 7, which points back to element 1, so no element in this disjoint set is a root that could represent the set.

What is a disjoint set?

A collection of sets in which every item belongs to exactly one set. Elements within the same set are considered equivalent. The identity element is the element that represents the set.

For which situation described here can Dijkstra's algorithm sometimes fail to produce a shortest path? You would want to avoid using Dijkstra's algorithm in this situation. A connected graph where some of the edge weights are negative and some have weight zero. A connected graph where there are multiple paths that have the same overall path cost (distance), and all of the edge weights are non-negative. A connected graph where all of the edges have the same positive weight. A connected graph where some of the edge weights are zero and the rest are positive.

A connected graph where some of the edge weights are negative and some have weight zero. There is nothing wrong with the edge weights of zero, but the negative weights are a problem. Dijkstra's algorithm, without modifications, achieves its fast running time by making certain assumptions about which paths are best. If it encounters an edge with negative weight, the assumptions fail, and it may not correctly identify the shortest path. Some people modify Dijkstra's algorithm to iterate when negative edge weights are encountered, to make corrections. However, this causes the algorithm to run very slowly in the worst case, and it's not part of the classical algorithm. (As a separate note, if there is any graph where a cycle has weights that sum to a negative value overall, then other shortest path algorithms can also fail to find a shortest path even if they are able to handle negative edge weights in some cases. That's because the graph may have paths with infinitely negative weight.)

What is an Adjacency Matrix? What's the time complexity for insert vertex, remove vertex, areAdjacent(v1,v2) and incidentEdges(v)?

A graph implementation where you have a vertex list and an edge list stored in arrays or hash tables. You also have a matrix that is n x n, where n is the number of vertices. A 1 in a slot indicates there is an edge connecting two vertices. A 0 indicates no edge. Only the top right half of the matrix is filled in for an undirected graph, since the bottom left is redundant. Instead of a 1, you can use a pointer to the edge in the edge list.
insert vertex: O(n), where n is the number of vertices
remove vertex: O(n)
areAdjacent(v1,v2): O(1)
incidentEdges(v): O(n)

What is an Adjacency List? What's the time complexity for insert vertex, remove vertex, areAdjacent(v1,v2) and incidentEdges(v)?

A graph implementation where you have a vertex list and an edge list stored in arrays or hash tables.
Vertex list: each vertex has a linked list of all its incident edges. Each entry in the linked list has a pointer to the edge in the edge list.
Edge list: each edge has the two vertices connected by the edge and the name of the edge. Each vertex in the edge list also has a pointer back to the edge's location in the linked list in the vertex list.
insert vertex: O(1), since you just add to the vertex list and point to nullptr
remove vertex: O(deg(v)), where deg(v) is at most n-1 in a simple graph
areAdjacent(v1,v2): O(min(deg(v1),deg(v2)))
incidentEdges(v): O(deg(v))

What is an edge list? What's the time complexity for insert vertex, remove vertex, areAdjacent(v1,v2) and incidentEdges(v)?

A graph implementation where you have a vertex list and an edge list stored in arrays or hash tables. The vertex list simply has the vertex name in each slot. The edge list has the two vertices in the edge and the name of the edge.
insert vertex: O(1) amortized
remove vertex: O(m), where m is the number of edges
areAdjacent(v1,v2): O(m)
incidentEdges(v): O(m)

Simple graph

A graph with no self-loops or multi-edges

Which of the following data structures would be the better choice to implement a memory cache, where a block of global memory (indicated by the higher-order bits of the memory address) is mapped to the location of a block of faster local memory? Why?
A hash table implemented with separate chaining, using an array of linked lists.
A hash table implemented with double hashing.
A hash table implemented with linear probing.
An AVL tree.

A hash table implemented with double hashing. Double hashing would be a good strategy because the cache addresses are quite small and compactly stored in the array. Furthermore, double hashing is more efficient than linear probing, which suffers from clumping.

Suppose you are given an undirected simple graph with unweighted edges, and for a particular specification of three vertices u, v, and w, you want to find the shortest path from u to w that goes through v as a landmark. What is the most efficient method that can find this?
A single run of Dijkstra's algorithm from u.
Two runs of Dijkstra's algorithm, first from u and then from v.
A single run of breadth-first search from v.
Three runs of breadth-first search: once each from u, v, and w.

A single run of breadth-first search from v. A single breadth-first search from the landmark vertex finds the shortest paths from it to the start vertex and the end vertex, and since the edges are undirected, their combination is the shortest path from start to end that also visits the landmark. It's not necessary to use Dijkstra's algorithm in this case since the edges are unweighted.
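The landmark argument can be sketched in code. This is an illustrative version only, assuming an undirected graph stored as adjacency lists with vertices numbered 0..n-1; `bfsDistances` and `landmarkPathLength` are hypothetical names.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Number of edges on a shortest path from `start` to every vertex,
// or -1 for vertices that cannot be reached.
std::vector<int> bfsDistances(const std::vector<std::vector<int>>& adj, int start) {
    assert(start >= 0 && start < static_cast<int>(adj.size()));
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> frontier;
    dist[start] = 0;
    frontier.push(start);
    while (!frontier.empty()) {
        int u = frontier.front();
        frontier.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {          // first visit gives the shortest distance
                dist[v] = dist[u] + 1;
                frontier.push(v);
            }
        }
    }
    return dist;
}

// Length of the shortest route from `from` to `to` through `landmark`,
// found with a single BFS started at the landmark. Returns -1 if no route exists.
int landmarkPathLength(const std::vector<std::vector<int>>& adj,
                       int from, int landmark, int to) {
    std::vector<int> dist = bfsDistances(adj, landmark);
    if (dist[from] == -1 || dist[to] == -1) return -1;
    return dist[from] + dist[to];         // valid because edges are undirected
}
```

The sum only works for undirected graphs, where the distance from the landmark to the start equals the distance from the start to the landmark.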

Suppose you have a rapid data feed that requires you to remove existing data point vertices (and any of their edges to other vertices) quickly from a graph representation. Which graph representation would you WANT to utilize?
Edge List
Adjacency Matrix
Adjacency List
All three representations have the same time complexity for removing a vertex from a simple graph of n vertices.

Adjacency List Since the adjacency list has a list of the edges the removed vertex shares with other vertices, it only needs time proportional to the degree of the removed vertex. In the worst case, that vertex could be connected to all of the other vertices and so require O(n) time, but in the typical case the degree will be less and the adjacency list is a better choice than the adjacency matrix.

Suppose you want to implement a function called neighbors(v) that returns the list of vertices that share an edge with vertex v. Which representation would be the better choice for implementing this neighbors() function?
Edge List
Adjacency Matrix
Adjacency List
All three representations result in the same time complexity for the neighbors() function.

Adjacency List The adjacency list requires a simple walk through the list of pointers to adjacent edges to find the neighboring vertices. This representation has an "output sensitive" running time meaning it runs as fast as possible based on the minimum amount of time needed to output the result.

Suppose you want to implement a function called neighborsQ(v1,v2) that returns true only if vertices v1 and v2 share an edge. Which representation would be the better choice for implementing this neighborsQ() function?
Edge List
Adjacency Matrix
Adjacency List
All three representations support the same time complexity for implementing the neighborsQ() function.

Adjacency Matrix. The neighborsQ(v1,v2) function can simply look up the appropriate v1,v2 entry in the adjacency matrix, which takes constant O(1) time. This representation supports the fastest method for implementing this query.

Suppose you have a rapid data feed that requires you to add new data point vertices quickly to a graph representation. Which graph representation would you NOT want to utilize? Edge List Adjacency Matrix Adjacency List All three graph representations have the same time complexity for adding vertices to a simple graph.

Adjacency Matrix The adjacency matrix requires linear time, O(n), to add a vertex because the addition requires new entries to be placed in a new row and a new column of the matrix, and there are n elements in the new row and n elements in the new column. This means that as the number of vertices grows in the graph, it will take longer to add a new vertex, which is not a very good choice when processing a data feed.

Which of these algorithms can be used to count the number of connected components in a graph? Count the number of times a breadth first traversal is started on every vertex of a graph that has not been visited by a previous breadth first traversal. Count the number of times a depth first traversal is started on every vertex of a graph that has not been visited by a previous breadth first traversal. All of the above None of the above

All of the above

Which graph representation has a better worst-case storage complexity than the others for storing a simple graph of n vertices?
Edge List
Adjacency Matrix
Adjacency List
All three graph representations have the same worst-case storage complexity for a simple graph of n nodes.

All three graph representations have the same worst-case storage complexity for a simple graph of n nodes. All three require O(n^2) storage in the worst case. The adjacency matrix requires O(n^2) space to store at least the upper-triangular portion of the n x n matrix. Both the edge list and adjacency list representations require O(n + m) storage, but in the worst case m is proportional to n^2 and O(n + n^2) = O(n^2).

Which of the following data structures would be the better choice to implement a dictionary that not only returns the definition of a word but also returns the next word following that word (in lexical order) in the dictionary. An AVL tree. A hash table implemented with double hashing. A hash table implemented with linear probing. A hash table implemented with separate chaining, using an array of linked lists.

An AVL tree. While the AVL tree needs O(log n) time to find the definition of the word, which is worse than the performance of a hash table, the AVL tree can find the next word in lexical order in O(log n) time whereas any hash table would need O(N) steps to find the next word in lexical order.

Dijkstra's algorithm

An SSSP (single-source shortest path) algorithm for finding the shortest paths between nodes in a weighted graph. Works on undirected or directed graphs, connected or not. Does not work when there are negative weights. For a given source node in the graph, the algorithm finds the shortest path between that node and every other. It can also be used for finding the shortest path from a single node to a single destination node by stopping the algorithm once the shortest path to the destination node has been determined. Its time complexity is O(m + n log n) when the priority queue is a Fibonacci heap, where m is the number of edges and n is the number of vertices.
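For reference, here is a minimal sketch of the algorithm, assuming an adjacency-list graph with non-negative weights. The names `Graph`, `kUnreachable` and `dijkstra` are chosen for this example. It uses `std::priority_queue` (a binary heap) with stale-entry skipping, so it runs in O((n + m) log n) rather than the O(m + n log n) bound, which relies on a Fibonacci heap.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbor, weight) pairs; weights are assumed to be non-negative.
using Graph = std::vector<std::vector<std::pair<int, long long>>>;

const long long kUnreachable = std::numeric_limits<long long>::max();

std::vector<long long> dijkstra(const Graph& adj, int source) {
    assert(source >= 0 && source < static_cast<int>(adj.size()));
    using Entry = std::pair<long long, int>;  // (distance so far, vertex)
    std::vector<long long> dist(adj.size(), kUnreachable);
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    dist[source] = 0;
    heap.push(Entry(0, source));
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        long long d = top.first;
        int u = top.second;
        if (d > dist[u]) continue;            // stale entry: a shorter path was found already
        for (const auto& edge : adj[u]) {
            int v = edge.first;
            long long w = edge.second;
            if (d + w < dist[v]) {            // relax the edge u -> v
                dist[v] = d + w;
                heap.push(Entry(dist[v], v));
            }
        }
    }
    return dist;
}
```

The "stale entry" check is what silently assumes non-negative weights: once a vertex is popped with its final distance, the sketch never expects a cheaper route to appear later.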

Kruskal's Algorithm

An algorithm for finding a minimum spanning tree. Create a min-heap of edges keyed by weight, and put every vertex in its own disjoint set. Repeatedly remove the minimum edge from the heap: if its two endpoints are in different sets, add the edge to the tree and union the two sets; otherwise discard the edge. Runs in O(m log m). You can also use a sorted array of edges, and this will still be O(m log m).
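A compact sketch of this procedure follows, using the sorted-array variant rather than a min-heap. The `Edge` and `SimpleSets` types are assumptions made for the example; `SimpleSets` marks a root by pointing it at itself, which differs from the -1 convention in the video lessons, and it omits smart union for brevity.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Edge { int u; int v; long long weight; };

// Minimal disjoint-set helper for this sketch (path compression only).
struct SimpleSets {
    std::vector<int> parent;
    explicit SimpleSets(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);  // each vertex is its own root
    }
    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }
    // Merges the sets of a and b; returns false if they were already together.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Total weight of a minimum spanning forest of a graph with vertices 0..n-1.
long long kruskal(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.weight < b.weight; });
    SimpleSets sets(n);
    long long total = 0;
    for (const Edge& e : edges)
        if (sets.unite(e.u, e.v)) total += e.weight;  // keep edges that join two sets
    return total;
}
```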

Prim's Algorithm

An algorithm for finding a minimum spanning tree. Starting from a vertex, grow the rest of the tree one edge at a time until all vertices are included. Greedily select the best local option from all available choices without regard to the global structure. You can do this by keeping a min-heap of the edges that leave the current tree and repeatedly adding the one with the minimum weight.
sparse graph: O(m log m), where m is about n in a sparse graph (m is the number of edges and n is the number of vertices)
dense graph: O(n^2 log n)

What is an incident edge in a graph?

An edge connected to a vertex

In a hash table, when should you use the different collision-handling techniques? When should you use an AVL tree instead?

Big records: use separate chaining, because copying a large record into the array takes a lot of time; a linked list is better.
Structure speed (really efficient hashing with great runtime complexity): use double hashing.
Range finding / nearest neighbor: use an AVL tree. A hash table is terrible for this.

Which traversal method has a better run time complexity to visit every vertex in a graph? Breadth First Traversal Depth First Traversal Both have the same run time complexity. Neither traversal method will necessarily visit every vertex in a graph.

Both run in O(n+m) for n vertices and m edges

What is the union operation in a disjoint set?

Combines two sets into one. Afterwards all elements of both sets share the same identity element, so a find operation on any of them returns the same result.

What are the components of a good hash function?

Compression: compresses the value to fit into the memory array.
Constant time: computation time must be O(1).
Deterministic: gives the same result every time.
Satisfies the Simple Uniform Hashing Assumption: the probability that h(key1) == h(key2) is 1/N, where N is the capacity of the array.

Which of the following is a true statement about Dijkstra's algorithm? Assume edge weights (if any) are non-negative. Dijkstra's algorithm finds the shortest unweighted path, if it exists, between a start vertex and any other vertex, but only for an undirected graph. Dijkstra's algorithm finds the shortest weighted path, if it exists, between a start vertex and any other vertices, but only for an undirected graph. Dijkstra's algorithm finds the shortest weighted path, if it exists, between a start vertex and any other vertices in a directed graph. Dijkstra's algorithm finds the shortest weighted path, if it exists, between all pairs of vertices in a directed connected graph.

Dijkstra's algorithm finds the shortest weighted path, if it exists, between a start vertex and any other vertices in a directed graph.

Difference between Dijkstra's and Prim's Algorithm

Dijkstra's: keeps track of the sum of weights along each path. Used to find the shortest path from a start vertex to all other connected nodes.
Prim's: considers only the weight of each individual edge. Used to find a minimum spanning tree.

When using double hashing to store a value in a hash table, if the hash function returns an array location that already stores a previous value, then a new array location is found as the hash function of the current array location. Why? Only one additional hash function is called to find an available slot in the array whereas linear probing requires an unknown number of array checks to find an available slot. Since the hash function runs in constant time, double hashing runs in O(1) time. Double hashing reduces the clumping that can occur with linear probing. Double hashing reduces the chance of a hash function collision on subsequent additions to the hash table.

Double hashing reduces the clumping that can occur with linear probing. The subsequent hash functions spread out the storage of values in the array whereas linear probing creates clumps by storing the values in the next available empty array location, which makes subsequent additions to the hash table perform even worse.

Which graph representation would be the best choice for implementing a procedure that only needs to build a graph from a stream of events. Edge List Adjacency Matrix Adjacency List All three representations would share the same storage and time complexity for the procedure.

Edge List The Edge List performs worse in general than the Adjacency Matrix and the Adjacency List representations, but it is much simpler and easier to implement. It also takes less space than the alternatives, and can insert vertices and edges in constant time. The adjacency list can also insert vertices and edges in constant time, but if those are the only operations needed, then one need not waste space and additional code on building the adjacency list on top of the edge list.

Compare the complexity of edge list, adjacency matrix, and adjacency list: space, insertVertex, removeVertex, insertEdge, removeEdge, incidentEdges, areAdjacent

Operation | Edge list | Adjacency matrix | Adjacency list
space | O(n+m) | O(n^2) | O(n+m)
insertVertex | O(1) | O(n) | O(1)
removeVertex | O(m) | O(n) | O(deg(v))
insertEdge | O(1) | O(1) | O(1)
removeEdge | O(1) | O(1) | O(1)
incidentEdges | O(m) | O(n) | O(deg(v))
areAdjacent | O(m) | O(1) | O(min(deg(v1),deg(v2)))

When storing a new value in a hash table, linear probing handles collisions by finding the next unfilled array element. Which of the following is the main drawback of linear probing? If the hash function returns an index near the end of the array, there might not be an available slot before the end of the array is reached. There may not be an available slot in the array. The array only stores values, so when retrieving the value corresponding to a key, there is no way to know if the value at h(key) is the proper value, or if it is one of the values at a subsequent array location. Even using a good hash function, contiguous portions of the array will become filled, causing a lot of additional probing in search of the next available unused element in the array.

Even using a good hash function, contiguous portions of the array will become filled, causing a lot of additional probing in search of the next available unused element in the array. This happens because the hashing distributes values uniformly in the array, but the linear probing fills in gaps between the locations of previous values, which makes the situation worse for later values added to the array.

T or F: a connected directed graph with no cycles is a tree.

False. For example, a directed graph with edges A -> B, A -> C, B -> D and C -> D is connected and has no cycle, but is not a tree because there are multiple paths from vertex A to vertex D.

Which one of the following four hashing operations would run faster than the others? Finding a value in a hash table of 100 values stored in an array of 1,000 elements. Finding a value in a hash table of 4 values stored in an array of 8 elements. Finding a value in a hash table of 2 values stored in an array of 2 elements. Finding a value in a hash table of 20 values stored in an array of 100 elements.

Finding a value in a hash table of 100 values stored in an array of 1,000 elements. The load factor is 100/1,000 = 0.1 which is less than the other options.

Let G = (V,E) be a simple graph consisting of a set of vertices V and a set of (undirected) edges E where each edge is a set of two vertices. Which one of the following is not a simple graph? G = ( V = (a,b,c), E = ((a,b)) ) G = ( V = (a,b,c), E = ((a,b),(b,c),(a,c)) ) G = ( V = (a,b,c), E = ((a,b), (a,c), (b,a), (b,c), (a,c), (b,c)) ) G = ( V = (a,b,c), E = () )

G = ( V = (a,b,c), E = ((a,b), (a,c), (b,a), (b,c), (a,c), (b,c)) ) This is not a simple graph because the same edge between a and b appears twice, once as (a,b) and a second time as (b,a). Since these are sets, (a,b) == (b,a).

In a BFS, how do you get the number of connected components (disjoint subgraphs)?

Increase count for number of components each time BFS is called when going through vertex list. You only call bfs if the vertex is unexplored. You will only call BFS once if all vertices are connected
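The counting rule above might look like the following in code. This is a sketch, assuming an undirected graph stored as adjacency lists with vertices 0..n-1; `countComponents` is a hypothetical name.

```cpp
#include <queue>
#include <vector>

// adj[v] lists the neighbors of vertex v.
int countComponents(const std::vector<std::vector<int>>& adj) {
    std::vector<bool> visited(adj.size(), false);
    int components = 0;
    for (int start = 0; start < static_cast<int>(adj.size()); ++start) {
        if (visited[start]) continue;     // already reached by an earlier BFS
        ++components;                     // each new BFS start is a new component
        std::queue<int> frontier;
        frontier.push(start);
        visited[start] = true;
        while (!frontier.empty()) {
            int u = frontier.front();
            frontier.pop();
            for (int v : adj[u]) {
                if (!visited[v]) {
                    visited[v] = true;
                    frontier.push(v);
                }
            }
        }
    }
    return components;
}
```

Swapping the queue for a stack turns this into the depth-first version, and the count stays the same.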

What is Separate Chaining? What is the time complexity of insert and remove/find in worst and average case?

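It is one way to manage collisions in a hash table. Each array slot holds a linked list; when you have a collision, you insert the value at the head of the linked list at that array location.
insert: O(1) in both cases
remove/find: O(n) worst case; O(n/N) average, where n is the number of stored values and N is the size of the array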
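A small sketch of separate chaining follows, assuming string keys, int values and `std::hash` as the hash function. `ChainedTable` is a hypothetical class for illustration; it does not resize and does not check for duplicate keys.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t capacity) : buckets_(capacity) {
        assert(capacity > 0);
    }

    // O(1): push onto the head of the bucket's list. A repeated key is not
    // detected; the newer entry simply shadows the older one.
    void insert(const std::string& key, int value) {
        buckets_[index(key)].push_front(std::make_pair(key, value));
    }

    // O(chain length): walk the bucket's list looking for the key.
    // Returns nullptr when the key is absent.
    const int* find(const std::string& key) const {
        for (const auto& entry : buckets_[index(key)])
            if (entry.first == key) return &entry.second;
        return nullptr;
    }

private:
    std::size_t index(const std::string& key) const {
        return std::hash<std::string>()(key) % buckets_.size();
    }

    std::vector<std::forward_list<std::pair<std::string, int>>> buckets_;
};
```

With n entries spread over N buckets, an average chain holds n/N entries, which is where the average-case find cost comes from.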

Which of these is considered the least run-time complexity?
O(1)
O(log* N)
O(log N)
O(log log N)

O(1)

In a BFS, how do you determine if there is a cycle?

Mark edges as discovery and cross-edges. A cross-edge indicates a cycle.

Minimum edges on a not connected graph and on a connected graph? Maximum edges on a simple graph and on a not simple graph?

Minimum edges on:
not connected graph: 0
connected graph: v-1
Maximum edges on:
simple graph: v*(v-1)/2 = O(v^2)
not simple graph: infinite

Suppose you have a good hash function h(key) that returns an index into an array of size N. If you store values in a linked list in the array to manage collisions, and you have already stored n values, then what is the expected run time to store a new value into the hash table?

O(1) Storing a new value takes constant time because the hash function runs in constant time and inserting a new value at the head of a linked list takes constant time.

What is the ideal and worst-case runtime of finding in a disjoint set implemented as uptrees? What if we implement with smart union and path compression?

O(h), where h is the height of the up-tree; h can be n in the worst case, when the up-tree degenerates into a linked list. An ideal up-tree is very flat: all nodes point directly to the identity element. With smart union and path compression, after any sequence of m union and find operations, the worst-case runtime becomes O(m log* n), where n is the number of items in the disjoint set. This is very close to O(m), so a single find or union operation takes amortized, nearly constant time.
log*(n) = 0 for n <= 1
log*(n) = 1 + log*(log n) for n > 1

Which of the following is the optimal run time complexity to find the shortest path, if it exists, from a vertex to all of the other vertices in a weighted, directed graph of n vertices and m edges. O(m + lg n) O(n) O(m + n) O(m + n lg n)

O(m + n lg n) This is the running time for Dijkstra's algorithm which is optimal.

Suppose you have a good hash function h(key) that returns an index into an array of size N. If you store values in a linked list in the array to manage collisions, and you have already stored n values, then what is the expected run time to find the value in the hash table corresponding to a given key?

O(n/N). This is the "load factor" of the hash table, and is the average length of the linked lists stored at each array element. Since the lists are unordered, it would take O(n/N) time to look at all of the elements of the list to see if the desired (key/value) pair is in the list.

For a simple graph with n vertices, what is the worst case (largest possible) for the number of edges, in terms of big Oh?

O(n^2) Recall that the adjacency matrix has one entry per edge in its upper triangular portion. There are n^2 elements in the n x n adjacency matrix, and about 1/2 n^2 elements in its upper triangular portion, and O(1/2 n^2) == O(n^2).

What is linear probing?

One collision-handling strategy in a hash table: when there is a collision, move linearly through the array locations, one slot at a time, until an empty slot is found.
find: O(1) average, O(n) worst case
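To make the probing sequence concrete, here is a rough sketch of linear probing for unsigned integer keys, assuming h(key) = key % N. `ProbingTable` is a hypothetical class; it has no removal (which would need tombstones) and no resizing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ProbingTable {
public:
    explicit ProbingTable(std::size_t capacity)
        : keys_(capacity, 0), used_(capacity, false) {
        assert(capacity > 0);
    }

    // Stores the key in the first free slot at or after h(key), wrapping around.
    // Returns false only when every slot is taken by other keys.
    bool insert(unsigned key) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t slot = (key + i) % keys_.size();
            if (used_[slot] && keys_[slot] != key) continue;  // collision: try next slot
            keys_[slot] = key;
            used_[slot] = true;
            return true;
        }
        return false;
    }

    // Returns the slot holding `key`, or -1 if the key is absent.
    int slotOf(unsigned key) const {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t slot = (key + i) % keys_.size();
            if (!used_[slot]) return -1;   // an empty slot ends the probe sequence
            if (keys_[slot] == key) return static_cast<int>(slot);
        }
        return -1;
    }

private:
    std::vector<unsigned> keys_;
    std::vector<bool> used_;
};
```

In a table of 8 slots, inserting 3, 11 and 19 (which all hash to slot 3) fills slots 3, 4 and 5, and a later insert of key 4 is pushed out to slot 6 even though its own hash is 4, which illustrates clumping.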

Which elements encountered by a breadth first search can be used to detect a cycle in the graph? Unexplored edges to unexplored vertices that remain so after completion of the breadth first search. Previously visited vertices that have been encountered again via a previously unexplored edge. Discovered edges that were previously unexplored by the traversal have been added to the breadth-first traversal. Unexplored vertices that have been encountered by the traversal of a previously unexplored edge.

Previously visited vertices that have been encountered again via a previously unexplored edge. A breadth first traversal returns a spanning tree of each connected component of the graph. Any edge that is not part of the breadth first search (e.g. not marked discovered) will connect one portion of the tree to another forming a cycle. Thus all unexplored edges, including ones ignored because they reach a previously visited vertex will create a cycle if added to the breadth first search.

The SUHA states

SUHA = Simple Uniform Hashing Assumption P(h(key1) == h(key2)) = 1/N.

What is the degree of a node in a graph?

The number of incident edges

Given a hash function h(key) that produces an index into an array of size N, and given two different key values key1 and key2, the Simple Uniform Hashing Assumption states which of the following? The probability that h(key1) == h(key2) is 1/N. The probability that h(key1) == h(key2) is 0. If h(key1) == h(key2) then h needs a running time of O(lg N) to complete. If h(key1) == h(key2) then h needs a running time of O(N) to complete.

The probability that h(key1) == h(key2) is 1/N.

The breadth first traversal of a connected graph returns a spanning tree for that graph that contains every vertex. If the graph has weighted edges, which of the following modifications is the simplest that produces a minimum spanning tree for the graph of weighted edges. No modification is necessary because a breadth first traversal always returns a minimum spanning tree. The queue is replaced by a priority queue that keeps track of the total weight encountered by the current traversal plus each of the edges that connects a vertex to the current breadth first traversal. The queue is replaced by a priority queue that keeps track of the least-weight edge that connects a vertex to the current breadth first traversal. An ordinary breadth first traversal is run from each vertex (as its start vertex) and the resulting spanning tree with the least total weight is the minimum spanning tree.

The queue is replaced by a priority queue that keeps track of the least-weight edge that connects a vertex to the current breadth first traversal. A minimum spanning tree for a weighted graph can be found through a greedy breadth-first algorithm that simply chooses from the entire queue the least weight edge to add.

A breadth first traversal starting at vertex v1 of a graph can be used to find which ones of the following? The shortest path (in terms of # of edges) between vertex v1 and any other vertex in the graph. The shortest path (in terms of # of edges) between any two vertices in the graph. All of the above. None of the above.

The shortest path (in terms of # of edges) between vertex v1 and any other vertex in the graph.

What is re-hashing?

This occurs when the array used for a hash table fills up. You have to move all the values to a new, larger array. Because the array size has changed, you need to change the hash function to maintain SUHA, and then rehash every value into the new array.

What happens when you take the union of two disjoint sets that contain the same value?

Two different disjoint sets by definition can never share the same value. Disjoint sets represent a partitioning of unique values into subsets that do not have any items in common. That is, each value belongs to exactly one of the sets. This is why each element can be used as an array index to look up its "up-tree" parent, which leads to the set the element belongs to.

How can you efficiently implement a disjoint set? How can you ensure optimal unions? How can you make finds more efficient?

Use up-trees stored in an array. The index is the value of an element. An entry of -1 marks the representative (identity) element of a set, which is the root of its up-tree. Any other entry is the index of another element in the same set, the element's parent. So each element points to another element in the up-tree, and every element has a path to the identity element.
Smart union: instead of -1, the root can store either the negated height minus 1, that is -(height + 1), or the negated number of elements in the set. The height needs the extra 1 because a one-element tree has a height of 0, and you can't have -0. Tracking the height or size allows for efficient unions: point the root of the smaller tree to the root of the larger.
Smart find (path compression): you can be even more efficient by updating the parent of every node visited during a find operation so that it points directly to the identity element. Over time, the elements of an up-tree come to point directly to the identity element.
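The pieces above fit together roughly as follows. This is a sketch under stated assumptions: elements are 0-indexed (so only negative entries mark roots), the root stores the negated set size, and the class and method names are chosen for this example rather than taken from the lessons.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class DisjointSets {
public:
    // Every element starts as its own root; -1 encodes a set of size 1.
    explicit DisjointSets(int n) : up_(n, -1) {}

    // Returns the identity element of x's set, compressing the path on the way.
    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(up_.size()));
        if (up_[x] < 0) return x;            // negative entry: x is the root
        up_[x] = find(up_[x]);               // path compression
        return up_[x];
    }

    // Union by size: the root of the smaller set points to the root of the larger.
    void unite(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) return;
        if (up_[ra] > up_[rb]) std::swap(ra, rb);  // ra now has the more negative (larger) size
        up_[ra] += up_[rb];                  // combined size, still stored negated
        up_[rb] = ra;
    }

    int sizeOf(int x) { return -up_[find(x)]; }

private:
    std::vector<int> up_;
};
```

For instance, after uniting 1 with 3, 5 with 7, and then 3 with 7, all four elements report the same identity element and a set size of 4.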

How do you do a BFS of a graph? What is the runtime?

Use a queue. Maintain a list of visited nodes to avoid cycles. runtime is O(n+m), but m can be n^2 if there is the max number of edges in a simple graph

How do you do a DFS of a graph? What is the runtime?

Use a stack. Runtime is O(n+m), but m can be n^2 if there is the max number of edges in a simple graph. Edges are called discovery and back edges.

When computing the union of two disjoint sets represented as up-trees in an array, (using proper path compression) which of these strategies results in a better overall run time complexity than the other options? Always make the up-tree with fewer elements a subtree of the root of the up-tree with more elements. Always make the up-tree with a shorter height a subtree of the root of the up-tree with a larger height. The overall run time complexity is not affected by which up-tree is chosen to become a subtree of the other up-tree. Using either size or height strategies above results in the same overall run time complexity.

Using either size or height strategies above results in the same overall run time complexity.

Which of the following best describes "path compression" as described in the video lessons to accelerate disjoint set operations? (Here we say "parent pointer" to mean whatever form of indirection is used to refer from a child to its parent; this could be a literal pointer or it could be an array index as in the lectures.) When the root of an element's node is found, all of the descendants of the root have their parent pointer set to the root. When the root of the up-tree containing an element is found, both the element and its parent will always have their parent pointers set to point to the root node. When traversing the up-tree from an element to its root, if any elements in the traversal (including the first element, but excluding the root itself) do not point directly to the root as their parent yet, they will have their parent pointer changed to point directly to the root. When the root of the up-tree containing an element is found, the element and all of its siblings that share the same parent have their parent pointers reset to point to the root node.

When traversing the up-tree from an element to its root, if any elements in the traversal (including the first element, but excluding the root itself) do not point directly to the root as their parent yet, they will have their parent pointer changed to point directly to the root. That's right: Path compression only flattens the lineage of nodes in an up-tree from an element to the root, and not all of the elements in the up-tree every time. This has amortized benefits as the data structure is optimized over the process of several union and find operations

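How can you manage your runtime complexity when using linear probing or double hashing for collision handling in a hash table?

You have to manage the load factor α = n/N, where n is the number of stored items and N is the size of the array, because the expected number of probes depends on α and not on n alone. If you expand the array whenever the load factor reaches a chosen threshold, the operations keep an expected (amortized) O(1) runtime. The cost does not grow exponentially, but it does grow without bound as the load factor approaches 1: roughly 1/(1-α) probes for an unsuccessful search with double hashing, and roughly (1/2)(1 + 1/(1-α)^2) with linear probing.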
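To make the load-factor idea concrete, here is a minimal sketch of a linear-probing set of unsigned keys that doubles its array before the load factor would exceed 1/2. The class name `ProbingSet`, the 1/2 threshold, the doubling policy, and the `key % N` hash are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative open-addressing set with linear probing (names hypothetical).
class ProbingSet {
public:
    explicit ProbingSet(std::size_t capacity = 8) : slots_(capacity) {
        assert(capacity > 0);
    }

    void insert(unsigned key) {
        if (contains(key)) return;
        // Grow first if adding one item would push n/N above 1/2.
        if (2 * (count_ + 1) > slots_.size()) grow();
        place(key);
        ++count_;
    }

    bool contains(unsigned key) const {
        std::size_t i = key % slots_.size();
        while (slots_[i].used) {                 // an empty slot ends the probe
            if (slots_[i].key == key) return true;
            i = (i + 1) % slots_.size();         // linear probing: next slot
        }
        return false;
    }

    double load_factor() const {
        return static_cast<double>(count_) / static_cast<double>(slots_.size());
    }
    std::size_t capacity() const { return slots_.size(); }

private:
    struct Slot {
        unsigned key = 0;
        bool used = false;
    };
    std::vector<Slot> slots_;
    std::size_t count_ = 0;

    // Store key in the first free slot of its probe sequence.
    void place(unsigned key) {
        std::size_t i = key % slots_.size();
        while (slots_[i].used) i = (i + 1) % slots_.size();
        slots_[i].key = key;
        slots_[i].used = true;
    }

    // Double the array and reinsert every stored key.
    void grow() {
        std::vector<Slot> old = std::move(slots_);
        slots_.assign(old.size() * 2, Slot());
        for (const Slot& s : old)
            if (s.used) place(s.key);
    }
};
```

Because the table is never more than half full, every probe sequence reaches an empty slot, so `contains` and `place` always terminate.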

What is the find operation in a disjoint set?

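The find operation locates the set that contains a given element and returns that set's identity (representative) element. For example, find(4) finds the set with 4 in it and returns the identity of that set. As a result, calling find on any two elements of the same set returns the same value.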

According to video lesson 1.1.2, which of the following is a good hash function h(key) that translates any 32-bit unsigned integer key into an index into an 8-element array?
int h(uint key) { int index = 5; while (key--) index = (index + 5) % 8; return index; }
int h(uint key) { return key & 7; }
int h(uint key) { return rand() % 8; }
int h(uint key) { return max(key,7); }

int h(uint key) { return key & 7; }
This always generates the same output given the same input, and it has a uniform chance of collision. It also runs in constant time relative to the length of the input integer (that is, relative to the number of bits, without respect to the magnitude of the integer).
Note on the operators: an expression like "2 & 3" uses the bitwise-AND operator, which combines the two operands bit by bit using "AND" from Boolean logic. For example, with binary numbers, 10 AND 11 gives 10: for the first digit, 1 AND 1 yields 1, while for the second digit, 0 AND 1 yields 0. An expression like "4 % 8" uses the remainder operator, which gives the remainder from integer division; for example, 4 % 8 yields 4, which is the remainder of 4/8.
In binary, the number 7 is 0000...0111 (leading zeros followed by three 1 digits, because these place values represent 4+2+1). When you compute "key & 7", the result has leading zeros and its rightmost three digits are the same as those of key. The result is therefore always between 0 and 7, and for an unsigned key, "key & 7" gives the same result as "key % 8".
Bitwise operations like this can be somewhat faster than arithmetic operations, but you have to be careful about the specific data types and the computing platform you are compiling for. The trick works only when the right-hand value of the remainder is a power of two (so that the mask, one less than it, is all 1 bits in binary), and such tricks are not always portable from one system architecture to another.
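The equivalence between masking and the remainder can be checked with a small sketch. The function names `hash_mask`, `hash_mod`, and `is_power_of_two` are hypothetical, and the fixed-width type `std::uint32_t` stands in for the quiz's `uint`.

```cpp
#include <cassert>
#include <cstdint>

// Both functions map a 32-bit unsigned key to an index in [0, 7].
inline std::uint32_t hash_mask(std::uint32_t key) { return key & 7u; }  // keep low 3 bits
inline std::uint32_t hash_mod(std::uint32_t key)  { return key % 8u; }  // remainder by 8

// Masking with (size - 1) matches the remainder by size only when size
// is a power of two, which this helper tests.
inline bool is_power_of_two(std::uint32_t size) {
    return size != 0 && (size & (size - 1)) == 0;
}
```

For a table size that is not a power of two the two operations disagree. For example, `10u & 5u` is 0 while `10u % 6u` is 4.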

Recall that the iterated log function is denoted log*(n) and is defined to be 0 for n <= 1, and 1 + log*(log(n)) for n > 1. Let lg*(n) be this iterated log function computed using base 2 logarithms. Blue Waters, housed at the University of Illinois, is the fastest university supercomputer in the world. It can run about 2^53 (about 13 quadrillion) instructions in a second. There are about 2^11 seconds in half an hour, so Blue Waters would run 2^64 instructions in about half an hour. Which one of the following is equal to lg*(2^64)?

lg*(2^64) = 1 + lg*(64) = 1 + 1 + lg*(6) = 1 + 1 + 1 + lg*(~2.6) = 1 + 1 + 1 + 1 + lg*(~1.4) = 1 + 1 + 1 + 1 + 1 + lg*(~0.5) = 1 + 1 + 1 + 1 + 1 + 0 = 5
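The definition translates directly into a loop. This is a minimal sketch using `double`, so it only covers inputs a `double` can represent (2^64 fits, 2^65536 does not). The name `lg_star` is illustrative.

```cpp
#include <cassert>
#include <cmath>

// Iterated base-2 logarithm: 0 for n <= 1, otherwise 1 + lg*(lg(n)).
// Counts how many times log2 must be applied before the value drops to 1 or below.
int lg_star(double n) {
    int count = 0;
    while (n > 1.0) {
        n = std::log2(n);
        ++count;
    }
    return count;
}
```

Calling `lg_star(std::ldexp(1.0, 64))` walks through the values 64, 6, about 2.6, about 1.4, and about 0.5, the same chain as the derivation above.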

In C++, what's the difference between the std::map and the std::unordered_map?

std::map has the lower_bound(key) and upper_bound(key) methods, which return an iterator to the first element whose key is >= key and to the first element whose key is > key, respectively. std::map keeps its keys in sorted order and is typically implemented as a red-black tree. std::unordered_map is a hash table, so it has no ordering and the lower and upper bound methods don't exist. Both support operator[], erase, and insert.
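A short sketch of the difference follows. The helper names are hypothetical, and returning -1 for "no such key" is a choice made only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <unordered_map>

// Smallest key in m that is >= key, or -1 if there is none.
int first_key_at_least(const std::map<int, int>& m, int key) {
    auto it = m.lower_bound(key);
    return it == m.end() ? -1 : it->first;
}

// Smallest key in m that is > key, or -1 if there is none.
int first_key_after(const std::map<int, int>& m, int key) {
    auto it = m.upper_bound(key);
    return it == m.end() ? -1 : it->first;
}

// Works for both std::map<int, int> and std::unordered_map<int, int>,
// because operator[], insert, and erase exist on both containers.
template <class Map>
void exercise_shared_interface(Map& m) {
    m[1] = 10;          // operator[] inserts or overwrites
    m.insert({2, 20});  // insert adds the pair if the key is absent
    m.erase(1);         // erase by key
}
```

Trying to call `lower_bound` on a `std::unordered_map` does not compile, since a hash table keeps no key order to search along.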

How much farther can a cross-edge take you from the root?

No more than 1 farther: in a BFS, the two endpoints of a cross edge differ in distance from the root by at most 1.

What structure is formed by the discovery edges in a BFS?

spanning tree
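Both BFS facts above can be observed with a minimal sketch that labels edges during the traversal. It assumes a simple, undirected, connected graph stored as adjacency lists; `BfsResult` and `bfs` are illustrative names.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

struct BfsResult {
    std::vector<int> dist;                       // distance from the start vertex
    std::vector<std::pair<int, int>> discovery;  // edges that first reach a vertex
    std::vector<std::pair<int, int>> cross;      // edges to an already-visited vertex
};

// Assumes adj describes a simple, undirected, connected graph.
BfsResult bfs(const std::vector<std::vector<int>>& adj, int start) {
    BfsResult r;
    r.dist.assign(adj.size(), -1);               // -1 means "not visited yet"
    std::vector<int> parent(adj.size(), -1);
    std::queue<int> q;
    r.dist[start] = 0;
    q.push(start);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u]) {
            if (r.dist[v] == -1) {               // first visit: discovery edge
                r.dist[v] = r.dist[u] + 1;
                parent[v] = u;
                r.discovery.push_back({u, v});
                q.push(v);
            } else if (v != parent[u] && u < v) {
                // Visited and not the tree edge back to the parent: cross edge.
                // The u < v test records each undirected edge only once.
                r.cross.push_back({u, v});
            }
        }
    }
    return r;
}
```

On a connected graph with n vertices the discovery list has n - 1 edges, which is exactly the edge count of a spanning tree, and each recorded cross edge joins vertices whose `dist` values differ by at most 1.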

