Data and Algorithms Final


Disjoint Paths

"Disjoint paths" are the distinct ways to get from point A to point B. Disjoint paths can be found using max flow: - set capacity of each edge to 1 - find max flow: edges with flow make up the paths

Runtime of Scaling Max-Flow

O(|E|^2 log C).

Maintaining Red-Black BST properties ensures balance

** See lecture 19 for fixing a Red-Black BST after insertion

Runtime of Ford-Fulkerson

- Do a Breadth-First Search to find a path on the residual graph: O(|E|)
- Modify the residual graph to change edge capacities: O(|E|)
- At most C rounds of the algorithm (C is the total capacity of all edges coming out of the source node)

The final runtime is O(2|E|C), which is simplified to O(|E|C). (pseudopolynomial)

Open Pit Mining Problem

- Each unit of earth has a profit (possibly negative)
- Getting to the ore below the surface requires removing the dirt above
- Test drilling gives reasonable estimates of costs
- Plan an optimal mining operation

Best Approach:
- vertices for each choice, with cost/payout
- if costs > payout, a min cut cuts the payouts. If payout > costs, a min cut cuts all the paths you need to take.

Some examples of NP problems

- Is the number N composite (not prime)? The certificate would be the factors.
- A similar but harder problem is factoring. It's still checkable in poly time, so still in NP.
- Is there a path from s to t in this graph? (this is also in P - you don't need the certificate to find the path. You can just DFS or BFS)

Binary Flips with Potential Method

- Let the potential be the number of 1's: this starts at 0 and is always non-negative, so it's a valid "credit". The number of 1's also measures the amount of work a future operation might have to do.

NP-complete problems

- NP-complete problems are in both the NP-hard and NP categories
- If you could solve one in polynomial time, you could solve them all in polynomial time
- We currently believe that there are no polynomial solutions to any NP-complete problem. If all NP problems had poly time solutions, then "P = NP."

How to set up a Push-Relabel

- Set all heights h[v], flows f(u,v), and excesses e[v] to 0.
- Set the height of source s to |V|
- Put maximum flow on the neighbors of source s: set flows on these edges to maximum, and set the excesses of the neighbors appropriately. (This isn't really a 'push' because the height difference is |V|)
- While we can push or relabel: choose an operation and do it.

(Figure legend: red arrow = backwards edge; red number = excess flow; number in vertex = height)

Extracting the minimum from fibonacci trees

- Use an array to index the roots by degree
- Consolidate roots with the same degree
- Extract the min after consolidation is complete

Unioning/merging two fibonacci heaps

- combine the roots - make the minimum the smaller of the two mins

Deleting from a BST

- if no children, set the "node to delete" to null
- if one child, replace the "node to delete" with the child
- if two children, replace the "node to delete" with its successor
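
The three cases above can be sketched in Python (a minimal hypothetical `Node` class and recursive helpers, not from the lecture):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def delete(root, key):
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # no children: replace with null; one child: replace with that child
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # two children: replace with the successor (min of the right subtree)
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root
```

The successor always comes from the right subtree's leftmost node, so deleting it there hits one of the simpler cases.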

Rules for Red-Black BSTs

- the root must be black ("black root property")
- all leaves must be black
- all red nodes must have black children ("red parent property") - black nodes are allowed to have black children
- for all nodes, all paths to descendant leaves have the same number of black nodes ("black-height property")
- no two red nodes back-to-back
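
These rules can be checked mechanically. A sketch, using a hypothetical representation of my own where a node is a `(color, left, right)` tuple and `None` stands for a black null leaf:

```python
RED, BLACK = "red", "black"

def check(node):
    """Return the black-height of a valid subtree, else raise ValueError."""
    if node is None:
        return 1  # null leaves count as black
    color, left, right = node
    if color == RED:
        # red parent property: a red node may not have a red child
        for child in (left, right):
            if child is not None and child[0] == RED:
                raise ValueError("red node with a red child")
    lh, rh = check(left), check(right)
    if lh != rh:
        raise ValueError("black-height mismatch")
    return lh + (1 if color == BLACK else 0)

def is_valid_red_black(root):
    if root is not None and root[0] != BLACK:
        return False  # black root property
    try:
        check(root)
        return True
    except ValueError:
        return False
```

Note how the black-height property is what forces rough balance: every root-to-leaf path has the same count of black nodes, and reds can at most double a path's length.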

Two options for solving a max flow problem when your edge weights are very large:

1. Push-relabel lets us ignore edge weights
2. Ford-Fulkerson variant for a logarithmic runtime: Scaling Max-Flow

Three Kinds of Amortized Analysis

1. The Aggregate Method
2. The Accounting Method
3. The Potential Method

Each of these is just an analysis trick that will let us argue for tighter bounds.

Binary Heaps

A binary heap is a complete binary tree that has either the minimum or the maximum node as the root. A heap is not a sorted structure and can be regarded as partially ordered: there is no particular relationship among nodes on any given level, even among siblings. Binary heaps aren't so good when we have to merge - binomial heaps and fibonacci heaps are better

Binary Search Tree

A binary tree with the property that for all parent nodes, the left subtree contains only values less than the parent, and the right subtree contains only values greater than the parent.

Push-Relabel

A faster max flow with a runtime of O(|V|^2|E|).
- Flow can only be pushed downhill by 1
- Relabel raises the height of a vertex to the height of its lowest neighbor plus one
- Flow can only be pushed if a vertex is overflowing, the edge has enough capacity, and the height of the neighbor is downhill by 1

Scaling Max-Flow

A good option for when edge weights are very large. Augment through the biggest bottlenecks first, and save the small stuff for later. There are at most log_2(C) iterations of the outer loop, and each Δ-round takes at most 2|E| augmentations.

Ford-Fulkerson Algorithm

A greedy, optimal way to compute the maximum flow. As long as there is a path from the source to the drain, with available capacity on all edges in the path, we send flow along one of the paths, and make a backwards edge. Then we find another path, and so on. F-F builds a residual graph G'.

Binomial Heap

A linked list of binomial trees, each obeying the min-heap property.

Radix Trees/Tries

A search tree for strings (or other sequences) where nodes correspond to different possible prefixes. Used to find the string's place in the tree in time that's proportional to the string's length. Faster than binary search trees for strings. Hashmaps are ok, but you have to traverse the entire string before hashing, while with radix trees (tries) you can store values immediately. You can also sort a radix tree quickly.

What data structure allows us to maintain a median with O(log N) expected time to update and insert?

A skiplist

Binomial Tree

A tree comprised of other trees. Tree B_k has a height of k and 2^k nodes. The root has the most children. The "degree"/"rank" of a binomial tree B_k is k - the number of children of its root, which equals its height. Max degree of any node is log n.

When can you "relabel" in push-relabel?

A vertex can only be relabeled if it is overflowing and no neighbor (along an edge with remaining capacity) has a smaller height. At that point the vertex can be relabeled to the height of its smallest such neighbor plus one.

Using Network Flow to solve Bipartite Matching

Add a source and a drain, set all capacities to 1, and find the max flow.
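
A sketch of what that reduction computes. Since every capacity is 1, the max flow can be found directly with augmenting paths; the function name and adjacency-list format are my own:

```python
def max_bipartite_matching(adj, n_left, n_right):
    """adj[u] lists the right-side vertices adjacent to left vertex u."""
    match_right = [-1] * n_right  # match_right[v] = left partner of v, or -1

    def augment(u, seen):
        # try to find an augmenting path from left vertex u to a free right vertex
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # v is free, or v's current partner can be rematched elsewhere
                if match_right[v] == -1 or augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    return sum(augment(u, [False] * n_right) for u in range(n_left))
```

Each successful augmentation corresponds to one unit of flow from the added source to the added drain, so the final count equals the max flow.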

NP-Hard Problems

All problems that, if solved in polynomial time, would let us solve any problem in NP in polynomial time. We think there are no polynomial-time algorithms for solving any NP-Hard problems. (Example of NP-hard but not NP: "what will this program do if we run it")

Residual Graphs

Allows us to find the max flow by giving us the ability to reverse our decisions. Whenever we add flow to an edge, we can add a 'backwards edge' as an 'undo' option. So we simply hunt for forward paths, reversing each new edge, until we can no longer find any forward paths. The final flows are then equal to the weights of the backwards edges!

Amortized Cost using the Potential Method

Amortized Cost of an operation = (true cost) + (change in potential)

Potential Analysis of Extract Min on fibonacci trees

Amortized Cost of an operation = (true cost) + (change in potential)
Here the potential function is Φ(H) = t(H) + 2m(H). Let D(n) be the maximum degree of any root for a heap of size n.
- Actual cost is O(D(n) + t(H)) - one merge per original tree, plus one for each child of the extracted min
- Original potential: t(H) + 2m(H). After consolidation there are at most D(n) + 1 roots, so the potential drops by roughly t(H) - D(n)
So the final amortized cost is O(D(n)), which is actually O(log n)

How is Amortized Analysis different from the Average Cost?

Amortized analysis is different from the expected value/average cost because it is a worst-case guarantee over any sequence of operations - no probability is involved. Every now and then you will still incur a big cost, but the total over the sequence stays bounded.

What is the difference between Arrays and Arraylists?

Arrays are a fixed size, while ArrayLists can be added to. ArrayLists work by keeping a backing array larger than the current number of elements; when it fills up, all elements are recopied into a new array (typically twice the size). The occasional copy is expensive, but amortized over many appends each insert is O(1).

Doubly-Linked siblings help extract the minimum from Fibonacci trees

As with binomial heaps, lop off the min root and add its children to the root list. - consolidate roots with equal degree - Keep an array of roots indexed by degree: if there is a collision, merge (make the smaller one the parent of the larger) and try again at degree+1. This takes O(log n) amortized time.

What data structure allows us to minimize the number of times nodes must be accessed from a slow medium?

B-trees

Binomial Trees vs Fibonacci Trees

Compared with binomial heaps, the structure of a Fibonacci heap is more flexible: the trees do not have a prescribed shape, and in the extreme case the heap can have every element in a separate tree. Fibonacci heaps also have a better amortized running time. Both heaps are collections of heap-ordered trees.
- Min-heap property in each tree
- A merge/consolidate operation that combines similar trees
- Efficient union: O(log n) for binomial, O(1) for Fibonacci
- While binomial heaps keep things neat all the time, Fibonacci heaps leave things unordered and only "clean up" on ExtractMin
- Fibonacci heaps have great amortized asymptotic time, but be careful about using them in practice

Binary Flips with the Accounting method

Cost of setting a bit to 1: 2 (here is where we prepay)
Cost of setting a bit to 0: 0

We maintain a positive credit because as we count upwards in our binary number, each bit must flip to 1 before it flips back to 0. So for N increments where the charge is at worst 2 each, the worst total cost is 2N. Amortized we are looking at O(2N)/N = O(2), which is the same as O(1) constant time. Big improvement over N log N.
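
The prepaying scheme can be checked by simulation. A sketch (my own code) that counts every real flip while incrementing a binary counter, confirming the total stays within the 2-per-increment budget:

```python
def count_flips(n, bits=32):
    """Increment a binary counter n times; return the total number of bit flips."""
    counter = [0] * bits  # counter[i] is the i-th least significant bit
    flips = 0
    for _ in range(n):
        i = 0
        while i < bits and counter[i] == 1:
            counter[i] = 0   # charged 0 here: the reset was prepaid
            flips += 1
            i += 1
        if i < bits:
            counter[i] = 1   # charged 2: 1 for this flip, 1 banked for the reset
            flips += 1
    return flips
```

For example, counting from 0 to 8 costs 8 + 4 + 2 + 1 = 15 flips, comfortably under the 2 * 8 = 16 budget.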

What algorithm allows us to sort in linear time, as long as all the integers are positive?

Counting Sort

B-Tree

Extremely "wide" trees where internal nodes store the values between each of their children. Used in filesystems and databases to avoid costly disk seeks. Maintain a number of children between t and 2t at each node. The height of a B-Tree with N keys and minimum t children at each node is at most log base t of (N+1)/2.

What data structure allows us to implement a priority queue that can merge with another in constant time?

Fibonacci Heap

Insertion into a Radix Tree (Trie)

Find the longest common prefix with an already stored word, then branch where they diverge. May require the creation of a new node. Ex. Insert "That"
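
A minimal dict-of-dicts sketch of the idea (representation and naming are my own; a production trie would compress chains of single children):

```python
def trie_insert(trie, word):
    node = trie
    for ch in word:                      # walk the longest common prefix,
        node = node.setdefault(ch, {})   # creating new nodes where it diverges
    node["$"] = True                     # end-of-word marker

def trie_contains(trie, word):
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node
```

Both operations touch one node per character, which is the "time proportional to the string's length" property.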

Merging/Unioning two Binomial Heaps

First, "merge" the two lists of trees to create a single list of trees. Next, step through this list of trees and perform "addition" of the trees for each index k. - if there's only 1 tree at any given k, just leave it alone - if there's two trees for a k, make the tree with the smaller root the parent of the other This takes O(log N) since that's the number of roots.

Binary flips with Aggregate Method

Incrementing an N-bit binary counter N times would take O(N log N) in regular worst-case analysis (each increment can flip up to log N bits). But, if we look at it from an Aggregate point of view, N increments cause at most O(N) flips in total, and O(N)/N is equal to O(1) flips per increment.

When can you "push" in push-relabel?

Flow can only be pushed if a vertex is overflowing, the edge has enough capacity, and the height of the neighbor is downhill by 1.

What is the maximum number of relabels we might have to do in a Push-Relabel?

For V vertices, we would have to do, at most, V^2 relabels.

Saturating/Nonsaturating Pushes

In Push-Relabel, a saturating push is one that maxes out an edge. Intuitively, a nonsaturating push is one that gets rid of all a vertex's overflow without maxing out an edge.

How long does it take Ford-Fulkerson to find a min cut?

Ford-Fulkerson can find a min cut in pseudopolynomial O(|E|C) time. (Min cut is the same as max flow). Do a BFS or DFS on the residual graph after running Ford-Fulkerson, it's only O(|E|) time more.

Ford-Fulkerson Pseudocode

Ford-Fulkerson is optimal.
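
A sketch of the algorithm in Python, using BFS to find augmenting paths (the Edmonds-Karp variant); the dict-of-dicts graph representation and names are my own:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """capacity[u][v] is the capacity of edge u->v."""
    # residual[u][v] = remaining capacity, including 0-capacity backwards edges
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph: O(|E|)
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no forward path left: we have the max flow
        # find the bottleneck along the path, then update the residual graph
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck  # the backwards 'undo' edge
        flow += bottleneck
```

Each loop iteration is one "round": find a path, send flow, add backwards edges.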

What algorithm should you use to solve the project selection problem, where each project has a positive or a negative payoff/prerequisite?

Ford-Fulkerson or Push-Relabel

Decreasing a Node in a Fibonacci Tree

Given a node in a Fibonacci heap to decrease, decrease the key and check whether the key is now smaller than the parent. If it is, move the node to root level and unmark it. (If the node is smaller than the heap min, make it the min) - If the parent is not marked, mark it. - If the parent is already marked, move it to root level and unmark it, then recur to its parent, marking or recurring as necessary.

Red-Black BSTs

Helps to ensure that the tree is roughly balanced after all insertions and deletions, maintaining O(log n) search time. The root must always be black. Maintain an invariant that the number of black nodes on a path to a leaf is the same no matter which leaf you choose. Thus, no path to a leaf gets too long.

"Find" in a BST

If the current node is not the element you're looking for, search in the left if the node you're looking for is less than the current node, and search in the right if the node you're looking for is greater than the current node. If you hit a "null", return null - the element you're looking for is not in the BST.

Runtime of Bipartite Matching

If the maximum number of matches is N, the runtime of bipartite matching using max flow is O(|E|N), similar to Ford-Fulkerson's O(|E|C).

Multipop with the Aggregate method

If we have a stack of N items with the usual push and pop operations, and we want to pop k items, in the worst case this takes O(N). Running N operations using non-amortized analysis would then take O(N) * O(N) = O(N^2) time. But under aggregate amortized analysis this can't possibly take O(N^2), because you can't multipop all the elements N times in a row: each item is pushed once and popped at most once. With amortized analysis, we'd expect O(N)/N = O(1), or constant time for each operation, and we would expect N calls to take O(N) work.

Return the Max/Min in a BST

If you want the minimum - return the last node in the left branches. If you want the maximum - return the last node in the right branches.

Amortized analysis

Imagining "down payments" towards the infrequent costly operations, spread out over all the calls so that the worst case scenario is less of a hit.

"Insert" in a BST

Insert is the same as find, except for when you encounter a "null" you simply replace it with the element you're inserting. And if the element you're inserting already exists in the BST, make your inserted element the child of the existing element.

Potential Function for Inserting into a Binomial Heap

Let the potential function be Φ = the number of trees t in the binomial heap. Let m be the number of trees merged during the insert.
- The actual cost c of an insert is 1 + m (merge until finally inserting a tree with no partner)
- The potential change ΔΦ = Φ(after) - Φ(before) is 1 - m (plus one for the tree we add, minus one per merge)
Amortized cost = c + ΔΦ = (1 + m) + (1 - m) = 2.

Runtime of Disjoint Paths

Like Ford Fulkerson O(|E|C), but where C is the maximum number of disjoint paths. The maximum number of disjoint paths is the same as the max flow. So the runtime for Disjoint Paths is O(max_flow * E)

Loop Invariant for Push-Relabel

Loop invariant: The residual graph represents a valid preflow (flow with possible excess on vertices)

What data structure allows us to implement a priority queue with Log N time to extract the minimum?

Min Heap - keeps the minimum at the top.

Worst-case runtime of Iterating through all neighbors N of all vertices V

O(|E|), a special case.

What is the maximum limit Φ(G) can increase to in push-relabel?

O(|V|^2|E|). This is also Push-Relabel's running time.

Complexity Classes

P: The class of all problems solvable in polynomial time.
NP: The class of all problems checkable in polynomial time.
NP-Hard: All problems that, if solved in polynomial time, would let us solve any problem in NP in polynomial time!
NP-complete: Intersection of NP and NP-hard.

What Max Flow argument can ignore the magnitude of weights?

Push-Relabel. Useful for when edge weights are very large.

The Accounting Method of Amortized Analysis

Shift the costs from one operation to another. Can be thought of as a "credit" system, where you prepay for credits in advance, and then detract from your total credits with each operation. Overpaying for credits is fine, but you can't use more credits than you have (illegal to obtain a negative balance).

The Potential Method of Amortized Analysis

Similar to the Accounting Method where you shift the costs from one operation to another, but the Potential Method uses a "bank" of prepayments for analysis. the Potential method tracks what our remaining "balance" is. How can we find what cost to assign each operation? By measuring the amount of work the most expensive operation might have to do in the future. Amortized Cost of an operation = (true cost) + (change in potential)

Finding the minimum value in a Binomial Heap

Since all the binomial trees that make up a heap are organized as min-heaps, all you have to do is take the minimum of all the roots. Removing it, though, is a different story.

Cook-Levin Theorem

States that Boolean satisfiability (SAT) is NP-complete: any problem in NP can be reduced, in polynomial time, to deciding whether a Boolean formula is satisfiable. So a polynomial-time algorithm for this one yes/no question would make every NP problem solvable in polynomial time.

Finding "Successors" in a BST

Suppose we want to find the node with the next highest value (useful for iterating in a sorted order).
- If there's a right subtree, return its min.
- If there's no right subtree but we're a left child, return the parent.
- If we're a right child, iterate up until we're a left child; return the parent.
- All the way up, never a left child? We were the max; return null.
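
The cases above can be sketched with parent pointers (a hypothetical `Node` class of my own, with `key`/`left`/`right`/`parent` fields):

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None

def successor(node):
    if node.right is not None:
        # right subtree exists: its minimum is the successor
        node = node.right
        while node.left is not None:
            node = node.left
        return node
    # otherwise climb until we arrive from a left child; that parent is next
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent  # None means the original node was the max
```

Repeatedly calling `successor` starting from the minimum visits every key in sorted order.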

The Aggregate Method of Amortized Analysis

Take the total work and divide it by N number of calls - T(N)/N . Works well when each call has a different cost, or when the most expensive call can't occur on every iteration (ex - "delete everything in this data structure" can't happen twice in a row)

Amortized cost of adding to an ArrayList with Potential Method

The "copyarray" is double the size of your initial array (size N). So the amortized cost of adding a new element is 2N + N or 3N, so 3. Let Φ(N) = 2(N - array.length/2) (N is the number of items in the array) 0 at half full N when array full (1 + N) + (lose N potential, then gain 2) = 3 ** This doesn't really make any sense - review lecture 18

The Invariant in for how red-black BSTS are guaranteed to keep a tree balanced

The Invariant: - for any path from root to leaf, the number of black nodes is the same. The root must be black for this to be true.

The big O or the big Theta of doing multiple things

The big-O or big-Θ is the worse of the two.

When can the Push-Relabel algorithm terminate?

The algorithm can only terminate when there is no overflow on any vertex. At termination, the residual graph represents a valid preflow with no excess: a real flow.

When does Ford-Fulkerson terminate?

The algorithm can't do better than filling up all the edges coming out of the source node. C is the total capacity of all edges out of the source node, so there are at most C rounds of the algorithm.

Amortized cost of adding to an ArrayList with Accounting Method

The copy array is twice the size of the initial array, so right after a copy it is half full. We budget a cost of 3 for each insertion into the remaining half: 1 to insert the element itself, 1 to move it during the next copy, and 1 to help move one of the elements that were already present before the copy.

Certificate

The extra info an algorithm needs to solve a problem in polynomial time. If a yes-or-no decision problem has a certificate that can be used to prove a 'yes' in polynomial time, the problem is in NP. As long as you can check a problem in polynomial time, it is in NP.

"Max Flow equals Min Cut"

The minimum cut is the set of edges with the smallest total capacity which, when removed, separates the source from the drain. Its capacity equals the max flow.

Fibonacci Heap Potential

The potential of a Fibonacci heap is given by Potential = t + 2m where t is the number of trees in the Fibonacci heap, and m is the number of marked nodes.

NP Problems

The solver is allowed to try all possible choices at once, in parallel. If the problem is checkable in polynomial time, it is NP. The extra info possibly needed to solve a problem in polynomial time is called a certificate.

Node rule of flow

The total flow in must equal the total flow out, except at the source and the drain nodes.

What is Push-Relabel's runtime?

There are at most O(|V|^2|E|) nonsaturating pushes in Push-Relabel, so this is the runtime.

Decreasing and Deleting Nodes in Binomial Heaps

To decrease a key, just change the key and heapify-up as with normal heaps. To delete a node, decrease the value of the key to -infinity to bring it to the surface, then extract-min.

What is the maximum number of NONsaturating pushes we can do in a Push-Relabel?

To find the number of possible nonsaturating pushes, we can use a potential function. The heights:
- start at 0
- remain non-negative
- only ever increase
The number of nonsaturating pushes is bounded by the total summed amount of height the nodes grew by.

Inserting into a Binomial Heap

To insert a node into a binomial heap:
1. Make the inserted element a one-node tree B_0
2. Union the two binomial heaps
Worst case O(log n) - but we don't really have to do log n work EVERY time we insert something. The amortized cost of insert is O(1).

Insertion into a B-tree

To maintain the 2t children limit: on insertion, while recurring, if the child we will recur on is full, split it before entering. If the root is full, make a new root to put the median in before splitting. To split elsewhere, take the median value and make it a key in its parent, then remove the values after that and put them in their own child, separate from the values before. Once you reach a leaf, insert it in the correct order. If we're recurring down the tree doing this, the parent is guaranteed to have room for a new key. Deletion in B-trees is possible but too complicated to learn here.

Finding Disjoint Paths on an undirected graph

Turn the graph into a directed graph with edges going in both directions.

What is the maximum number of saturating pushes we can do in a Push-Relabel?

We can do O(|V||E|) saturating pushes

Radix Sort

We can sort strings in O(N) time (where N is the sum of all string lengths) using this strategy: for each string, insert it into a trie (linear time in the number of characters in the string). Once all strings are in, do a pre-order traversal (root, then subtrees in character order) of the trie to print in lexicographic order (sorting short strings before long, then alphabetically). This strategy also works for sequences of bits.
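
A sketch of the strategy with a plain dict-based trie (representation is my own); the pre-order walk emits a word at its end-of-word marker before descending into the children, which is what puts shorter strings first:

```python
def trie_sort(words):
    # insert every word: linear in the word's length
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = node.get("$", 0) + 1   # count duplicates

    # pre-order traversal: word ends first, then children in character order
    out = []
    def walk(node, prefix):
        out.extend([prefix] * node.get("$", 0))
        for ch in sorted(k for k in node if k != "$"):
            walk(node[ch], prefix + ch)
    walk(trie, "")
    return out
```

The `sorted` over a node's children is over at most the alphabet size, so the total time stays linear in the summed string lengths for a fixed alphabet.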

Why doesn't it work to send the flow down the pipes with the greatest capacity?

We cannot just greedily send flow down the biggest pipes, because sometimes more flow can get through if we split it across multiple edges, and we need the ability to reverse our decisions.

Multipop with Accounting method

We pre-assign a cost of 2 to push, a cost of 0 to pop, and a cost of 0 to multipop. So with push, we're essentially 'prepaying' for the other operations. So with the worst case of N push/pop/multipop calls, we get 2N = O(N) operations, or an amortized cost of O(2N)/N = O(2) = O(1) operations per call.
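
A sketch (my own code) instrumenting the stack to count true work; over any sequence of operations starting from n pushes, the total work stays within the prepaid budget of 2 per push:

```python
class MultipopStack:
    def __init__(self):
        self.items = []
        self.work = 0        # total true cost across all operations

    def push(self, x):
        self.items.append(x)
        self.work += 1       # accounting charge 2: 1 now, 1 banked for its pop

    def pop(self):
        self.work += 1       # paid for by the credit banked at push time
        return self.items.pop()

    def multipop(self, k):
        popped = []
        while self.items and len(popped) < k:
            popped.append(self.pop())
        return popped        # charge 0: every pop inside was prepaid
```

No matter how the pops and multipops are interleaved, each element is pushed once and popped at most once, so `work` never exceeds twice the number of pushes.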

What is a balanced search tree?

When, for every node, the heights of the left and right subtrees differ by at most 1, so the overall height stays O(log N).

How many keys can a depth 2 B-tree with 1000 keys at each node store?

With 1000 keys at each node, a depth 2 tree can store 1000 + 1001 * 1000 + 1001 * 1001 * 1000 = over 1 billion keys, needing only at most 2 seeks to retrieve any key
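
The arithmetic can be checked directly (each node holds 1000 keys, so it has 1001 children):

```python
# keys at depth 0, 1, and 2 of a B-tree with 1000 keys per node
keys_per_node = 1000
children = keys_per_node + 1
total = sum(keys_per_node * children ** level for level in range(3))
print(total)  # 1003003000 - over a billion keys
```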

Fibonacci Heaps

A Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than the binary heap and binomial heap.
- Find Minimum: O(1)
- Extract Minimum: O(log n)
- Insert: O(1)
- Delete: O(log n)
- Merge: O(1)
A Fibonacci heap is a collection of trees satisfying the minimum-heap property - the child is always greater than or equal to the parent - so the minimum is always at the root of one of the trees. Siblings are linked to each other in circular doubly-linked lists.

Amortize Definition

gradually eradicate a debt by putting money aside regularly. "Amortized analysis" is when you argue that the runtime (cost) is not as bad as it seems, because the operations rarely take that long - you're paying them off bit by bit.

Pseudopolynomial runtime vs Polynomial runtime for Max Flow

pseudopolynomial: O(|E|C) - good for small C
polynomial: O(|E|^2 log C)

Min Heap

the keys of parent nodes are less than or equal to those of the children and the lowest key is in the root node.
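
A quick sketch using Python's `heapq` module, which implements a binary min-heap on a plain list, so the lowest key sits at index 0 (the root):

```python
import heapq

heap = []
for key in [7, 2, 9, 1, 5]:
    heapq.heappush(heap, key)   # O(log n) per insert

assert heap[0] == 1             # the lowest key is at the root
smallest = heapq.heappop(heap)  # extract-min in O(log n)
```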

What is the runtime of : MergeSorting an Array of integers , reversing it in linear time, then printing all possible pairs of numbers from the array?

Θ(N^2)

Potential Function on Push-Relabel

Φ(G) = Σ h[v] over all nodes with excess flow e[v], where h[v] is the height. Φ(G) can be thought of as the amount of disorder in state G, or its distance from an ideal state.
- The function Φ(G) = Σ h[v] starts at 0 and remains non-negative
- A nonsaturating push decreases Φ(G) by at least 1: all the overflow flowed downhill toward a node with 1 less height, so the potential drops by 1
- We can therefore bound the number of nonsaturating pushes by the total amount Φ(G) ever rises
Φ(G) increases in two ways:
- relabels
- pushes that result in overflow

If I wanted to claim that a running time was no better than linear (but is possibly linear), which Greek letter should I use for my asymptotic bound?

Ω

