CSCI 3104 Review
Simplex
#worst case exponential, but often polynomial in practice Solves a linear program. Start in slack form. 1. Choose a non-basic variable v.e (the 'entering variable') with positive coefficient in obj function 2. Find constraint in which v.e has tightest positive bound (basic variable on left side of that constraint is 'leaving variable' v.l) 3. Solve for v.e in constraint with v.l 4. Substitute v.e into obj function and other constraints 5. New v.e form will replace the v.l form in the new slack form Repeat until all coefficients in obj function are non-positive, then solve for objective value
Dijkstra's (greedy)
#O((V+E)logV) with binary heap #O(VlogV+E) with Fibonacci heap #O(ElogV) on dense graphs Finds SSSP. Start at source. Repeatedly extract the vertex u with minimum SP estimate and relax all edges leaving u. Use priority queue. No negative edge weights allowed.
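A runnable Python sketch of the above using the standard library's binary heap (the adjacency-list format and function name are assumptions, not course code):

```python
import heapq

def dijkstra(adj, s):
    # adj: dict mapping vertex -> list of (neighbor, weight); no negative weights
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale entry for an already-finalized vertex
        for v, w in adj[u]:
            if d + w < dist[v]:  # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

Instead of a decrease-key operation, this pushes duplicate entries and skips stale ones on pop, a common Python idiom.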
Johnson's
#O((V^2) logV + VE) Finds all pairs shortest path. Uses reweighting to turn negative edge weights into positive edge weights. Reweighting must preserve shortest paths.
Chaining (hash table)
#O(1) for insert #O(1) for delete given a pointer to the element (doubly linked list) #O(1+alpha) expected, O(n) worst case for search Add all elements that hash to the same slot to a doubly linked list. Expected length of any chain is alpha = n/m (the load factor), where n is number of items and m is number of slots
Hash table
#O(1) for insert, search, and delete Look up table with hash function. A hash function deterministically maps items to slots.
Direct address table
#O(1) for insert, search, and delete Table where every slot corresponds to a certain key (no two elements have same key). Infeasible if you have many items to put into table, or a wide range of keys.
Ford Fulkerson
#O(E |f*|) where |f*| is the value of a max flow Gets max flow on graph G. While an augmenting path exists in the residual graph Gf: pick any augmenting path p in Gf, find the bottleneck edge (edge with smallest residual capacity). Add that bottleneck capacity to all forward edges of p in G, and subtract it from all backward edges of p in G. Adjust Gf accordingly.
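A sketch of the Edmonds-Karp variant (augmenting paths chosen by BFS, which bounds the run time at O(VE^2)); the dict-of-dicts capacity format and function name are assumptions, and reverse residual edges are created on the fly:

```python
from collections import deque

def max_flow(cap, s, t):
    # cap: dict of dicts of residual capacities, e.g. cap['s']['a'] = 3;
    # every vertex must appear as a key. Mutated in place.
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow  # no augmenting path left: flow is maximal
        # find the bottleneck residual capacity along the path
        b, v = float('inf'), t
        while parent[v] is not None:
            b = min(b, cap[parent[v]][v])
            v = parent[v]
        # augment: subtract from forward residuals, add to backward ones
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= b
            cap[v][u] = cap[v].get(u, 0) + b
            v = u
        flow += b
```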
Prim's
#O(ElogV) Finds an MST. Build a single tree A starting at some root r. Find a lightest edge (u,v) that connects A to an isolated vertex, add that edge and vertex. Repeat until all vertices added.
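A minimal Python sketch of Prim's with a heap of candidate edges (graph format and names are assumptions); lazily skips edges into vertices already in the tree:

```python
import heapq

def prim(adj, r):
    # adj: dict vertex -> list of (neighbor, weight); undirected graph,
    # so each edge is listed in both directions
    in_tree = {r}
    edges = [(w, r, v) for v, w in adj[r]]
    heapq.heapify(edges)
    mst = []
    while edges and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(edges)  # lightest edge leaving the tree
        if v in in_tree:
            continue
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(edges, (wx, v, x))
    return mst
```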
Kruskal's
#O(ElogV) #O(E alpha(V)) if edges already sorted Finds an MST. Every vertex starts as its own component. Sort all edges by weight (or use a min heap). Loop through sorted edges and merge two components if the edge is safe (connects two different components, i.e. creates no cycle).
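A sketch of Kruskal's with a simple union-find (path halving, no union by rank); vertex labels 0..n-1 and the edge-tuple format are assumptions:

```python
def kruskal(n, edges):
    # edges: list of (weight, u, v) with vertices 0..n-1
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):  # consider edges in order of weight
        ru, rv = find(u), find(v)
        if ru != rv:               # safe: joins two different components
            parent[ru] = rv
            mst.append((u, v, w))
    return mst
```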
Finding SCCs
#O(V+E) Kosaraju's: run DFS on G, recording finish times. Then run DFS on the transpose G^T (all edges reversed), visiting vertices in decreasing order of finish time; each DFS tree of the second pass is one SCC.
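A Python sketch of Kosaraju's two-pass method (adjacency-dict format and names are assumptions; the first pass is recursive, so very deep graphs would need an iterative version):

```python
def kosaraju_scc(adj):
    # adj: dict vertex -> list of neighbors; every vertex must be a key
    order, seen = [], set()

    def dfs1(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                dfs1(v)
        order.append(u)  # appended at finish time

    for u in adj:
        if u not in seen:
            dfs1(u)
    # build the transpose graph
    radj = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)
    sccs, seen = [], set()
    for u in reversed(order):  # decreasing finish time
        if u not in seen:
            comp, stack = [], [u]
            seen.add(u)
            while stack:       # iterative DFS on the transpose
                x = stack.pop()
                comp.append(x)
                for y in radj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            sccs.append(comp)  # one DFS tree = one SCC
    return sccs
```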
DAG SSSP algorithm
#O(V+E) Finds SSSP. Topologically sorts, and for each vertex u in topological order, relaxes every edge (u, v) where v is a neighbor of u.
Breadth first search
#O(V+E) Implement with queue
Depth first search
#O(V+E) Implement with stack. Used in topological sort.
Topological sorting
#O(V+E) Use DFS on a DAG. Orders vertices in decreasing order of finishing time.
Bellman-Ford
#O(VE) Finds SSSP. Allows negative edge weights. Relaxes all edges V-1 times, computing v.d and v.pi for all vertices. A final pass over all edges detects negative-weight cycles: if any edge can still be relaxed, one exists.
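A minimal Python sketch (input format and function name are assumptions); returns None when a negative-weight cycle is reachable:

```python
def bellman_ford(vertices, edges, s):
    # edges: list of (u, v, w) triples; negative weights allowed
    dist = {v: float('inf') for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):  # V-1 rounds of relaxation
        for u, v, w in edges:
            if dist[u] + w < dist[v]:   # relax edge (u, v)
                dist[v] = dist[u] + w
    # one more pass: any improvement means a negative-weight cycle
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist
```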
maxHeapify()
#O(logn) Sifts node i down until the max-heap property is restored, assuming both of i's subtrees are already max-heaps.
buildMaxHeap()
#O(n) for i = floor(n/2) down to 1: maxHeapify(A, i)
Floyd-Warshall (dynamic)
#O(n^3) (naive recursion without memoization is exponential) Finds all pairs shortest path. Utilizes intermediate vertices: iteration k allows only vertices 1..k as intermediates. Memoizes, and uses a predecessor matrix to recover paths.
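A sketch of the triple loop over intermediate vertices k (matrix-of-weights input is an assumption; the predecessor matrix is omitted for brevity):

```python
def floyd_warshall(W):
    # W: n x n matrix of edge weights; float('inf') if no edge, 0 on diagonal
    n = len(W)
    D = [row[:] for row in W]  # don't mutate the input
    for k in range(n):         # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D
```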
Heap sort
#O(nlogn) 1. buildMaxHeap 2. n-1 times: swap the root (max) with the last element of the heap, shrink the heap by one, and maxHeapify the root
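The three routines above in one Python sketch (0-indexed, so children of i are 2i+1 and 2i+2; names are assumptions):

```python
def max_heapify(A, i, n):
    # sift A[i] down within A[0:n], assuming both subtrees are max-heaps
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    if l < n and A[l] > A[largest]:
        largest = l
    if r < n and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, n)

def build_max_heap(A):
    # leaves are already heaps, so start at the last internal node
    for i in range(len(A) // 2 - 1, -1, -1):
        max_heapify(A, i, len(A))

def heap_sort(A):
    build_max_heap(A)
    for end in range(len(A) - 1, 0, -1):
        A[0], A[end] = A[end], A[0]  # move current max to the back
        max_heapify(A, 0, end)       # restore heap on the shrunk prefix
```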
Merge sort (divide and conquer)
#O(nlogn) Divide array A into two subarrays of equal length, repeat until size of each subarray is 1. Merge them back together in sorted fashion.
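A compact Python sketch of the divide/merge steps (returns a new sorted list rather than sorting in place, which is an implementation choice):

```python
def merge_sort(A):
    if len(A) <= 1:
        return A                       # base case: size-1 (or empty) array
    mid = len(A) // 2
    left, right = merge_sort(A[:mid]), merge_sort(A[mid:])
    # merge the two sorted halves
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]  # append whichever half remains
```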
Huffman encoding (greedy)
#O(nlogn) Lossless data compression via greedy algorithm. How to make huffman tree: take two characters with smallest values (lowest frequencies), add their values to get value of their parent, continue until all characters in tree. Going left in tree corresponds to a 0 in prefix code, going right corresponds to a 1.
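A sketch of tree construction with a min-heap of frequencies (tree representation and names are assumptions; an integer tiebreaker keeps the heap from comparing trees):

```python
import heapq

def huffman_codes(freq):
    # freq: dict character -> frequency
    # heap entries: (frequency, tiebreaker, tree); a tree is a leaf
    # character or a (left, right) pair
    heap = [(f, i, c) for i, (c, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # two smallest frequencies
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (t1, t2)))  # merged parent
        i += 1
    codes = {}

    def walk(tree, code):
        if isinstance(tree, tuple):
            walk(tree[0], code + '0')  # going left = 0
            walk(tree[1], code + '1')  # going right = 1
        else:
            codes[tree] = code or '0'  # single-character edge case

    walk(heap[0][2], '')
    return codes
```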
Binary search tree
#Theta(logn) expected, O(n) worst case for insert, search, delete Every left child < parent, every right child > parent. Expected height = logn, worst case height = n
Quick Sort
#worst case O(n^2) #best/average case O(nlogn) 1. choose pivot (random is best) 2. partition array into two subarrays s.t. numbers left of pivot are <= and numbers right of pivot are >= 3. call quicksort recursively on each subarray
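A sketch of the three steps with a random pivot and Lomuto partitioning (in-place; names are assumptions):

```python
import random

def quicksort(A, lo=0, hi=None):
    if hi is None:
        hi = len(A) - 1
    if lo >= hi:
        return
    # 1. choose a random pivot and move it to the end
    p = random.randint(lo, hi)
    A[p], A[hi] = A[hi], A[p]
    # 2. Lomuto partition: elements <= pivot end up left of index i
    pivot, i = A[hi], lo
    for j in range(lo, hi):
        if A[j] <= pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[hi] = A[hi], A[i]  # pivot lands in its final position i
    # 3. recurse on both sides
    quicksort(A, lo, i - 1)
    quicksort(A, i + 1, hi)
```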
Minimum cut (flow)
(s,t) cut with smallest capacity of all possible cuts. Its crossing edges form a minimum-capacity set whose removal disconnects s from t.
Steps for solving greedy algorithms
1. Cast optimization problem into a form where we make a choice and consequently leave a smaller subproblem. 2. Prove that the greedy choice is safe, i.e. prove that we can always achieve an optimal solution by making the greedy choice. 3. Prove that the problem exhibits optimal substructure, i.e. after making the greedy choice, the remaining subproblem has the property that combining the greedy choice and the optimal solution to the subproblem yields an optimal solution to the original problem.
Primal to dual forms (linear programming)
1. Change max to min 2. Change any <= to >= (except in non-negativity constraint) 3. Swap coefficients in obj function with constants in constraints
Three ways to solve a recurrence relation
1. Expansion/Unrolling 2. Master Theorem 3. Recursion Tree
Steps for solving dynamic programming problem
1. Figure out the structure of the problem (overlapping subproblems?) 2. Recurse through subproblems, storing local solutions as necessary. 3. Compute value of optimal solution, in bottom-up fashion. 4. Construct the actual optimal solution, not just the optimal value
When to use dynamic programming
1. Local choice needs to be made 2. Polynomial number of subproblems 3. Optimal substructure
Standard form (linear programming)
1. Objective function must be max (if not, multiply coefficients in obj function by -1) 2. All constraints must be <= (if not, multiply constraint by -1) 3. All variables must be in non-negativity constraint (if not, replace x with x'-x'' and add x' and x'' to non-negativity constraint) 4. No equalities (if there are, replace with >= and <=)
Slack form (linear programming)
1. Set obj function equal to some variable Z 2. Turn each inequality into an equality by introducing a new basic (slack) variable equal to the constant minus the sum of the non-basic terms 3. All variables are now subject to non-negativity constraint
Upper bound property (shortest path)
1. v.d >= d(s, v) 2. Once v.d reaches d(s, v), it never changes
Random fact 2 (graphs)
A directed graph G is acyclic iff a DFS on G produces no back edge
Strongly connected component
A maximal set of vertices s.t. you can get to any one vertex from any other vertex
Cut (of an undirected graph)
A partition of V into disjoint sets S and V-S
Vertex cover (of an undirected graph)
A subset V' of V s.t. every edge in the graph touches one of the vertices in V'.
Clique (graph)
A subset V' of V s.t. every vertex in V' is fully connected to every other vertex in V'.
Algorithm
A well defined computational procedure to transform some input(s) into some output(s)
2 ways to represent graphs
Adjacency list + adjacency matrix
Capacity constraint (flow)
Any edge cannot have a flow greater than its capacity.
Simple uniform hashing (hash table)
Any element is equally likely to be hashed into any of the m slots
Shortest path optimal substructure
Any subpath on a shortest path will itself be a shortest path. Shortest path cannot contain cycles.
Hoare partitioning
Choose a pivot, check values from both ends, working towards center. Generally better than Lomuto.
Lomuto partitioning
Choose a pivot, check values from left to right
Universal hashing (hash table)
Choose hash function randomly
Divide and conquer
Divide main problem into subproblems. Conquer (solve) each subproblem. Combine together.
Weak linear programming duality
Dual solution is an upper bound to the primal solution
Linearity of expectation
E(X + Y) = E(X) + E(Y), for any random variables X and Y (independence not required)
Expected value (discrete)
E(X) = Sum(x.value * x.prob) for all x in X where E(X) is expected value, and x is an element in the set X of possible options
Conservation of flow
Every non-source/sink vertex has a net flow of 0 (has the same amount of flow coming in and going out)
Recursion tree
Expand recursion into tree. Find total cost for each level, and sum these costs. Hope it turns into a geometric series (formula below). Sum = (r^(n+1) − 1) / (r − 1) .
Longest common subsequence (dynamic)
Find longest string of characters that occur in both X and Y with 0 or more characters in between. Has optimal substructure.
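A sketch of the standard DP table plus traceback to recover one LCS (string inputs and names are assumptions):

```python
def lcs(X, Y):
    m, n = len(X), len(Y)
    # c[i][j] = length of an LCS of X[:i] and Y[:j]
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                c[i][j] = c[i-1][j-1] + 1            # characters match
            else:
                c[i][j] = max(c[i-1][j], c[i][j-1])  # drop one character
    # walk back through the table to recover one LCS
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i-1] == Y[j-1]:
            out.append(X[i-1]); i -= 1; j -= 1
        elif c[i-1][j] >= c[i][j-1]:
            i -= 1
        else:
            j -= 1
    return ''.join(reversed(out))
```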
Properties of an algorithm
Finiteness: terminates after finite number of steps Definiteness: steps are precise and unambiguous Correctness: must correctly give output Efficiency: should be efficient
Triangle property (shortest path)
For all edges (u, v): d(s, v) <= d(s, u) + w(u, v)
Edge classifications
For any edge (u, v) Tree edge: If v is visited for first time Back edge: If v is ancestor of u Forward edge: If v is descendant of u Cross edge: None of the above
Single source shortest path
From source vertex, find minimum distance and predecessor for every other vertex. Related: single destination shortest path, single pair SP, all pairs SP. No negative edge weight cycles!
Graph
G = (V, E) = collection of vertices and edges
Finding optimal dual solution
Get final slack form from the primal (with n original variables). The non-basic variable subscripts in this form make up the set N. y.i = -1 * (coefficient of x.(n+i) in the final objective) if n+i is in N; otherwise y.i = 0.
Master theorem assumptions
Given T(n) = aT(n/b) + f(n): a is number of subproblems the original splits into b is size of subproblem f(n) is cost of each level of recursion a >= 1, b > 1
Subset-sum problem (NP)
Given a finite set S of positive integers and a target t>0, does there exist a subset S' whose elements sum to t? To prove NP hard, reduce instance of 3-CNF-SAT to subset-sum problem.
Is it a max flow?
Given a graph G - look at the residual graph Gf. If there are no feasible paths from s to t in Gf, then G has max flow. If there is a path from s to t in Gf (an augmenting path), G doesn't have max flow.
Residual graph (flow)
Given a graph G with flow f, the residual graph Gf has edge set Ef: every edge with positive residual capacity (unsaturated forward edges, plus reverse edges that can cancel existing flow)
Minimum spanning tree (greedy)
Given a weighted undirected graph, choose the set of edges that connects every vertex and minimizes the total edge weight. Has |V|-1 edges. Not necessarily unique. While not all vertices are a part of spanning tree, choose lightest safe edge (i.e. no cycles)
All pairs shortest path
Given weighted directed graph, find SP between all pairs of points. Running Dijkstra's from every source is O(V^3) (array implementation), running Bellman-Ford from every source is O(V^2 E), which is O(V^4) on dense graphs.
2 parts of a greedy algorithm
Greedy choice property: choosing local optima will result in global optimum Optimal substructure: an optimal global solution will contain optimal solutions to subproblems.
Fractional knapsack problem (greedy)
Have choice to steal fraction of an item. Greedy is optimal.
0 1 Knapsack problem (greedy)
Have choice to steal or not to steal an item. Greedy not optimal.
Karatsuba's (divide and conquer)
Speeds up multiplication of large integers (and polynomials). Turns the O(n^2) schoolbook method into O(n^(log2 3)), roughly O(n^1.585), by replacing four recursive multiplications with three.
Linear probing (hash table)
If a slot is full, step forward by some constant c (usually 1) until an empty slot is found: h(k, i) = (h(k) + c*i) mod m.
Quadratic probing (hash table)
If a slot is full, add some square to the hash value until empty slot is found (i.e. add 1, then 4, then 9...)
No path property (shortest path)
If d(s,v) = infinity, then v.d = infinity always
Convergence property (shortest path)
If s -> u -> v is a SP, and if u.d = d(s, u), then by calling relax(u, v, w), v.d = d(s, v)
Random fact 1 (graphs)
In a DFS of undirected graph G, every edge in G is either a tree edge or a back edge
NP complete (complexity)
Intersection of NP and NP hard. No polynomial-time algorithm has yet been found for any NP complete problem; if one can be solved in polynomial time, then all can. A problem is shown to be NP complete by verifying solutions in polynomial time (NP) and reducing an already known NP complete problem to it (NP hard). Applies to decision problems.
Vertex cover problem (NP)
Is there a vertex cover of size at most k? Show that it is NP by verifying a solution. Show it is NP hard by reducing an instance of the clique problem to a vertex cover problem.
Open addressing (hash table)
All items are stored in the table itself (no linked lists). If a slot is occupied, probe a sequence of other slots, determined by the hash function, until an empty one is found. Saves the memory chaining spends on linked-list pointers.
Skip list
Layers of linked lists, searches in binary fashion
Path relaxation property (shortest path)
Let p = (v0, v1, ..., vk) be a shortest path from s = v0 to vk. If we relax the edges of p in order ((v0,v1), then (v1,v2), ...), then vk.d = d(s, vk), even if other relaxations happen in between.
Flow network
Modeled by directed graph. Source vertex s and sink vertex t.
3-CNF-SAT (NP)
NP complete problem. Decide whether a 3-CNF formula is satisfiable (can evaluate to 1, i.e. true). k-CNF = k-conjunctive normal form: the AND of clauses, each the OR of exactly k variables or their negations
Boolean satisfiability
NP complete problem. Decides whether or not a string of boolean variables chained together with AND, OR, and NOT operations can return a 1 (true).
Clique problem (NP)
NP complete. Try to find size of maximum clique in a graph. 1. Show that it is NP (give it a sample answer and check in polynomial time) 2. Show it is NP hard (reduce 3-CNF-SAT, a NP complete problem, to clique problem)
NP (complexity)
Non-deterministic polynomial. A solution can be verified in polynomial time.
Linear programming duality
Optimal primal solution = optimal dual solution
Linear programming
Optimize a linear function (objective function) subject to some set of inequalities.
Independent events
If X and Y are independent: P(X = x and Y = y) = P(X = x)P(Y = y). Note: P(X = x or Y = y) = P(X = x) + P(Y = y) holds for mutually exclusive events, not independent ones; in general P(A or B) = P(A) + P(B) - P(A and B).
Traversing a BST
Pre order: root -> left -> right In order: left -> root -> right Post order: left -> right -> root
Polynomial time (complexity)
Run time of O(n^k) where k is some constant.
Weighted interval scheduling (dynamic)
Same as greedy activity scheduler but with weights. Has optimal substructure. OPT(j) = max(j.weight + OPT(p(j)), OPT(j-1)) where p(j) is the largest index i < j s.t. job i is compatible with j. Can do recursive or iterative solution.
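An iterative sketch of the recurrence, using binary search over finish times to compute p(j) (input format and names are assumptions):

```python
import bisect

def weighted_interval(jobs):
    # jobs: list of (start, finish, weight)
    jobs = sorted(jobs, key=lambda j: j[1])  # sort by finish time
    finishes = [j[1] for j in jobs]
    n = len(jobs)
    opt = [0] * (n + 1)  # opt[j] = best total weight using the first j jobs
    for j in range(1, n + 1):
        s, f, w = jobs[j - 1]
        # p(j): number of earlier jobs finishing no later than s,
        # i.e. the latest job compatible with job j
        p = bisect.bisect_right(finishes, s, 0, j - 1)
        opt[j] = max(w + opt[p], opt[j - 1])  # take job j, or skip it
    return opt[n]
```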
Reductions (complexity)
Shows that one problem is at least as hard as another. To prove A (the problem we want to classify) is NP hard, transform b, an instance of a known NP complete problem B, into a, an instance of A, s.t. the answers agree. The transformation must run in polynomial time. If A is also in NP, then A is NP complete.
Activity scheduling problem (greedy)
Sort activities by finish time. Always choose the activity with the earliest finish time whose start time is no earlier than the finish time of the previously chosen activity.
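The greedy rule above as a short Python sketch (input format and name are assumptions):

```python
def select_activities(activities):
    # activities: list of (start, finish) pairs
    chosen = []
    last_finish = float('-inf')
    for s, f in sorted(activities, key=lambda a: a[1]):  # by finish time
        if s >= last_finish:  # compatible with the last chosen activity
            chosen.append((s, f))
            last_finish = f
    return chosen
```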
Recurrence relations
Specify the nth value in a sequence given based on previous value(s) in the sequence
Number of bits to encode via Huffman encoding?
Sum(c.freq * c.codelength) for all c in set of characters C
Condensation graph (SCC)
Take a graph G and reduce any SCCs into a single vertex.
Light edge (crossing a cut)
The edge with minimum weight crossing a cut
Residual capacity (flow)
The residual capacity for an edge (u,v) is: (u,v).residual = (u,v).capacity - (u,v).flow if (u,v) in E (u,v).residual = (u,v).flow if (v,u) in E (u,v).residual = 0 otherwise
Max-flow min-cut theorem
The value of the maximum flow equals the value of the minimum cut.
Treap
Tree + heap. Every node has a key and a priority (which is randomly assigned). Keys obey the BST property, priorities obey the heap property. Expected height = logn.
Rod cutting (dynamic)
Try to get maximum revenue from cutting (or not cutting) a rod. Has optimal substructure. Naive way is O(2^n), smart way is Theta(n^2)
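The bottom-up Theta(n^2) version as a Python sketch (the prices-array convention is an assumption):

```python
def rod_cut(prices, n):
    # prices[i] = price of a rod piece of length i; prices[0] = 0
    r = [0] * (n + 1)  # r[j] = best revenue for a rod of length j
    for j in range(1, n + 1):
        # best first cut of length i, plus the optimal rest
        r[j] = max(prices[i] + r[j - i] for i in range(1, j + 1))
    return r[n]
```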
Double hashing (hash table)
Two hash functions, h1 and h2. If h1(k) hashes to a slot that is full, calculate h2(k) and use that as the offset.
Unrolling a recurrence
Unroll until you find a pattern. Apply base case (set contents of T from general pattern equal to contents of T from base case, solve for k). Plug k into general pattern, find big O.
Dynamic programming
Use memoization to store solutions to subproblems. Time/space trade off.
Max heap property
Value of parent node is greater than value of both child nodes.
Big Theta
c1g(n) ≤ f(n) ≤ c2g(n) ∀n ≥ n0 for some positive constants c1, c2, and n0
Big O
f(n) ≤ cg(n) ∀n ≥ n0 for some positive constants c and n0
Big Omega
f(n) ≥ cg(n) ∀n ≥ n0 for some positive constants c and n0
Division method (hash table)
h(k) = k mod m Remainder of k/m m should be a prime not too close to an exact power of 2
Multiplication method (hash table)
h(k) = floor(m(kA − floor(kA))), where 0 < A < 1. Multiply key k by some constant A, extract the fractional part of kA, multiply that by m, and finally floor the result.
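A quick sketch, using Knuth's suggested A = (sqrt(5) − 1)/2 ≈ 0.618 (the function name is an assumption):

```python
import math

def mult_hash(k, m, A=(math.sqrt(5) - 1) / 2):
    frac = (k * A) % 1          # fractional part of kA, in [0, 1)
    return math.floor(m * frac)  # scale to a slot index in [0, m)
```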