CPE 349: Midterm 2
Activity Selection Problem: Greedy Complexity
*O(N)* for the greedy selection itself; *O(N × log N)* for pre-sorting the activities by finish time.
optimization problem
A problem where the goal is to find a solution that gives the optimal/best result.
dynamic programming
An algorithm design technique that saves sub-problem results as they are computed and then later uses them.
What is the greedy choice property?
An algorithm exhibits the greedy choice property if the algorithm makes a greedy choice that always selects an element that belongs in some optimal solution.
prefix code
An encoding scheme where no code-word appears as a prefix of another one.
Huffman Coding Problem: Iterative Complexity
O(N × log N)
Activity Selection Problem: Dynamic Programming Complexity
O(N³)
Knapsack 0-1 Problem: Recursive Definition
V[ i, w ] =
• 0, if i = 0
• V[ i - 1, w ], if i > 0 and m_i > w
• max( V[ i - 1, w ], V[ i - 1, w - m_i ] + p_i ), if i > 0 and m_i ≤ w
What is the benefit to pre-sorting the nodes for the Huffman Code algorithm?
We can find the minimum frequency subtrees in O(log N) time instead of O(N) time for linear search. This results in an overall complexity of O(N × log N) instead of O(N²).
Activity Selection Problem: Iterative Approach
*ActivitySelection_iterative(s, f)*
n = s.length
A = { 1 }
k = 1
for i = 2 to n
--if s[ i ] >= f[ k ]
----A = A ∪ { i }
----k = i
return A
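The iterative pseudocode above can be sketched as runnable Python (0-based indices; the function name is illustrative). It assumes the activities are already pre-sorted by finish time, which is the algorithm's precondition.

```python
def activity_selection(start, finish):
    """Return 0-based indices of a max-size set of compatible activities."""
    selected = [0]        # greedily take the first (earliest-finishing) activity
    k = 0                 # index of the last activity added
    for i in range(1, len(start)):
        if start[i] >= finish[k]:   # compatible with the last selected activity
            selected.append(i)
            k = i
    return selected

# Example: six activities, already sorted by finish time
start  = [1, 3, 0, 5, 8, 5]
finish = [2, 4, 6, 7, 9, 9]
print(activity_selection(start, finish))  # → [0, 1, 3, 4]
```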
Huffman Coding Problem: Iterative Approach
*Huffman(C, F)*
n = C.length
Define PQ as a priority queue/heap
for i = 1 to n
--Create new leaf-node l with character C[ i ] and frequency F[ i ]
--PQ.insert(l)
for i = 1 to n - 1
--x = PQ.deleteMin
--y = PQ.deleteMin
--Create new node z
--z.left = x
--z.right = y
--z.frequency = x.frequency + y.frequency
--PQ.insert(z)
return PQ.deleteMin
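A runnable Python sketch of the iterative algorithm above, using the standard-library heapq module as the priority queue. The Node class and helper names are illustrative; ties in the heap are broken by an insertion counter so nodes themselves never need to be compared.

```python
import heapq

class Node:
    def __init__(self, freq, char=None, left=None, right=None):
        self.freq, self.char, self.left, self.right = freq, char, left, right

def huffman(chars, freqs):
    pq, counter = [], 0
    for c, f in zip(chars, freqs):
        heapq.heappush(pq, (f, counter, Node(f, char=c)))
        counter += 1
    for _ in range(len(chars) - 1):
        fx, _, x = heapq.heappop(pq)    # two minimum-frequency subtrees
        fy, _, y = heapq.heappop(pq)
        z = Node(fx + fy, left=x, right=y)   # "merge" under a new parent
        heapq.heappush(pq, (z.freq, counter, z))
        counter += 1
    return heapq.heappop(pq)[2]         # root of the Huffman tree

def codes(node, prefix=""):
    """Read the prefix code off the tree: left edge = 0, right edge = 1."""
    if node.char is not None:
        return {node.char: prefix or "0"}
    table = {}
    table.update(codes(node.left, prefix + "0"))
    table.update(codes(node.right, prefix + "1"))
    return table

root = huffman(list("abcdef"), [45, 13, 12, 16, 9, 5])
print(codes(root)["a"])  # → 0  (most frequent character, shortest code-word)
```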
Fractional Knapsack Problem: Iterative
*Knapsack_Fractional(W, M)*
Define X[ 1..M.length ] list and initialize to 0
i = 1
w = W
while w > 0 and i <= M.length
--if M[ i ] <= w
----X[ i ] = 1
----w = w - M[ i ]
----i = i + 1
--else
----X[ i ] = w / M[ i ]
----w = 0
return X
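The loop above can be sketched in runnable Python (names are illustrative). It assumes the item weights (and implicitly their values) are already sorted in decreasing order of value per unit weight, the greedy precondition; the loop is bounded by the item count so it also terminates when every item fits.

```python
def knapsack_fractional(W, weights):
    fractions = [0.0] * len(weights)
    w = W
    for i, m in enumerate(weights):
        if w <= 0:                # sack is full
            break
        if m <= w:
            fractions[i] = 1.0    # take the whole item
            w -= m
        else:
            fractions[i] = w / m  # take only the fraction that fits
            w = 0
    return fractions

print(knapsack_fractional(50, [10, 20, 30]))  # → [1.0, 1.0, 0.6666666666666666]
```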
Knapsack 0-1 Problem: Print Solution
*Knapsack_Problem(P, M, W)*
(opt, B) = Knapsack_Value(P, M, W)
print "The optimal value is " + opt
N = P.length
w = W
for i = N downto 1
--if B[ i, w ] == 1
----print i
----w = w - M[ i ]
Knapsack 0-1 Problem: Bottom-up Approach
*Knapsack_Value(p, m, W)*
N = p.length
Define V[ 0..N, 0..W ], B[ 1..N, 0..W ]
for w = 0 to W
--V[ 0, w ] = 0
for i = 1 to N
--for w = 0 to W
----if m[ i ] > w
------V[ i, w ] = V[ i - 1, w ]
------B[ i, w ] = 0
----else
------if V[ i - 1, w ] > V[ i - 1, w - m[ i ] ] + p[ i ]
--------V[ i, w ] = V[ i - 1, w ]
--------B[ i, w ] = 0
------else
--------V[ i, w ] = V[ i - 1, w - m[ i ] ] + p[ i ]
--------B[ i, w ] = 1
return ( V[ N, W ], B )
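A runnable Python sketch of the bottom-up fill above, with the read-back of chosen items from the B table appended (0-based indexing; function and variable names are illustrative).

```python
def knapsack_01(values, weights, W):
    n = len(values)
    V = [[0] * (W + 1) for _ in range(n + 1)]
    B = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            if weights[i - 1] > w:            # item i does not fit
                V[i][w] = V[i - 1][w]
            else:
                skip = V[i - 1][w]
                take = V[i - 1][w - weights[i - 1]] + values[i - 1]
                if skip > take:
                    V[i][w] = skip            # B stays 0: item not taken
                else:
                    V[i][w], B[i][w] = take, 1
    # Walk B backwards to read off which items were taken.
    items, w = [], W
    for i in range(n, 0, -1):
        if B[i][w] == 1:
            items.append(i - 1)
            w -= weights[i - 1]
    return V[n][W], items

value, items = knapsack_01([60, 100, 120], [10, 20, 30], 50)
print(value, sorted(items))  # → 220 [1, 2]
```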
Longest Common Subsequence Problem: Print Solution
*LCSProblem(X, Y)*
m = X.length, n = Y.length
(C, B) = LCS_DP(X, Y)
print "Length of LCS is " + C[ m, n ]
LCS_PrintSolution(B, X, m, n)

*LCS_PrintSolution(B, X, i, j)*
if i > 0 and j > 0
--if B[ i, j ] == "d"
----LCS_PrintSolution(B, X, i - 1, j - 1)
----print X[ i ]
--else if B[ i, j ] == "u"
----LCS_PrintSolution(B, X, i - 1, j)
--else
----LCS_PrintSolution(B, X, i, j - 1)
Longest Common Subsequence Problem: Bottom-up Approach
*LCS_DP(X, Y)*
m = X.length, n = Y.length
Define C[ 0..m, 0..n ], B[ 1..m, 1..n ]
for k = 0 to n
--C[ 0, k ] = 0
for k = 0 to m
--C[ k, 0 ] = 0
for i = 1 to m
--for j = 1 to n
----if X[ i ] == Y[ j ]
------C[ i, j ] = C[ i - 1, j - 1 ] + 1
------B[ i, j ] = "d"
----else
------if C[ i - 1, j ] > C[ i, j - 1 ]
--------C[ i, j ] = C[ i - 1, j ]
--------B[ i, j ] = "u"
------else
--------C[ i, j ] = C[ i, j - 1 ]
--------B[ i, j ] = "l"
return ( C, B )
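A runnable Python sketch of the bottom-up LCS fill above (names illustrative). Instead of printing recursively, it follows the "d"/"u"/"l" arrows in B back from the bottom-right corner and assembles the subsequence as a string.

```python
def lcs(X, Y):
    m, n = len(X), len(Y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    B = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
                B[i][j] = "d"                 # diagonal: characters match
            elif C[i - 1][j] > C[i][j - 1]:
                C[i][j] = C[i - 1][j]
                B[i][j] = "u"                 # up
            else:
                C[i][j] = C[i][j - 1]
                B[i][j] = "l"                 # left
    # Follow the arrows from B[m][n] back to a border, collecting matches.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if B[i][j] == "d":
            out.append(X[i - 1]); i -= 1; j -= 1
        elif B[i][j] == "u":
            i -= 1
        else:
            j -= 1
    return C[m][n], "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))  # → (4, 'BDAB')
```

Note that an LCS is not unique; with this tie-breaking (ties go left) the reconstruction yields "BDAB", another length-4 common subsequence of the same pair is "BCBA".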
Matrix-chain Multiplication Problem: Print Solution
*MatrixChainProblem(P)*
n = P.length - 1
(M, S) = MatrixChain_DP(P)
print "Optimal cost is " + M[ 1, n ]
MatrixChain_PrintSolution(S, 1, n)

*MatrixChain_PrintSolution(S, i, j)*
if i == j
--print "A" + i
else
--print "("
--MatrixChain_PrintSolution(S, i, S[ i, j ])
--MatrixChain_PrintSolution(S, S[ i, j ] + 1, j)
--print ")"
Matrix-chain Multiplication Problem: Bottom-up Approach
*MatrixChain_DP(P)*
n = P.length - 1
Define M[ 1..n, 1..n ], S[ 1..n, 1..n ]
for i = 1 to n
--M[ i, i ] = 0
for l = 2 to n
--for i = 1 to n - l + 1
----j = i + l - 1
----M[ i, j ] = ∞
----for k = i to j - 1
------x = M[ i, k ] + M[ k + 1, j ] + P[ i - 1 ] × P[ k ] × P[ j ]
------if x < M[ i, j ]
--------M[ i, j ] = x
--------S[ i, j ] = k
return ( M, S )
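A runnable Python sketch of the bottom-up algorithm above, with the parenthesization built as a string from the S table (names illustrative; tables use 1-based indices like the pseudocode, so row/column 0 is unused).

```python
import math

def matrix_chain(P):
    n = len(P) - 1                      # number of matrices in the chain
    M = [[0] * (n + 1) for _ in range(n + 1)]
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):           # chain length
        for i in range(1, n - l + 2):
            j = i + l - 1
            M[i][j] = math.inf
            for k in range(i, j):       # try every split point
                cost = M[i][k] + M[k + 1][j] + P[i - 1] * P[k] * P[j]
                if cost < M[i][j]:
                    M[i][j], S[i][j] = cost, k
    return M, S

def parenthesize(S, i, j):
    if i == j:
        return "A%d" % i
    k = S[i][j]                         # optimal split recorded during the fill
    return "(" + parenthesize(S, i, k) + parenthesize(S, k + 1, j) + ")"

P = [30, 35, 15, 5, 10, 20, 25]         # sizes of a six-matrix chain
M, S = matrix_chain(P)
print(M[1][6], parenthesize(S, 1, 6))   # → 15125 ((A1(A2A3))((A4A5)A6))
```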
Matrix-chain Multiplication Problem: Specification
*Problem*: For the given matrix-chain, parenthesize the product to achieve the lowest cost (i.e. the fewest scalar multiplications). *Input*: List P of n+1 integers [p_0 .. p_n] representing the sizes of n compatible matrices in a chain [A_1 .. A_n], where matrix A_i has size p_(i-1) × p_i. *Output*: The minimal cost of multiplying the matrix-chain and a parenthesization of the product that minimizes cost.
Activity Selection Problem: Specification
*Problem:* Find a max-size subset of compatible activities in a given set of activities. *Input:* A set of pairs representing the start and finish times of n activities in a set S = {(s_1, f_1), (s_2, f_2), ... , (s_n, f_n)}. *Output:* A set of numbers { i_1, i_2, ... , i_k }, where 0 < k ≤ n, representing the activities that are included in a max-size subset of mutually compatible activities of S.
Knapsack 0-1 Problem: Specification
*Problem:* For a given set of N items of different weights and values, find a subset of items that has the highest combined value and is within the given weight limit. *Input:* Weight limit W > 0 and two sequences of positive numbers: P = < p_1, p_2, ... , p_N > with the values and M = < m_1, m_2, ... , m_N > with the weights of the N items. *Output:* A sequence < i_1, i_2, ... , i_k > of indexes, where k ≤ N, representing the k items that have the highest combined value while their combined weight is ≤ W.
Longest Common Subsequence Problem: Specification
*Problem:* For the given two sequences, find the longest common subsequence. *Input:* Two sequences X = < x_1, x_2, ... , x_m > and Y = < y_1, y_2, ... , y_n > with m and n elements respectively. *Output:* A longest common subsequence Z = < z_1, z_2, ... , z_k > of X and Y, and its length k.
Huffman Coding Problem: Specification
*Problem:* Given a character-file, generate an optimal variable-length code by assigning shorter code-words to the most frequent characters. *Input:* A set of n 1-node subtrees for each character, where each node contains the character and the character's frequency. *Output:* A full binary tree representing the optimal prefix code for that character set.
Which structure can be used to represent a prefix code?
A *full binary tree* can be used to represent a prefix code. The leaves contain the characters used in the original data. The edges represent bits within the prefix code: if the edge connects to a left child, the edge is labeled with a 0; if the edge connects to a right child, the edge is labeled with a 1.
When is a prefix code considered optimal?
A prefix code is considered optimal when it *minimizes the number of bits needed to encode the file*. An optimal prefix code is always represented by a *full binary tree*: each node is either a leaf or has two children. The tree has *|C| leaves and |C| - 1 internal nodes*, where C is the set of original characters.
What does it mean for a problem to have an optimal substructure?
A problem exhibits optimal substructure if *the optimal solution to the problem contains optimal solutions to sub-problems*.
greedy algorithm
An algorithm that builds up a solution to an optimization problem step by step, at each step making a choice that gives the most immediate benefit (seems the best at the time). The hope is that the sequence of locally optimal choices will lead to a globally optimal solution.
When is an optimization problem eligible for a greedy solution?
An optimization problem is eligible for a greedy solution when the problem exhibits an *optimal substructure*.
Activity Selection Problem: Optimal Substructure
Assume A is an optimal solution to the activity selection problem: A is a max-size subset of mutually compatible activities of S. Take one activity a_k from A. Since the activities within A do not overlap, we can write A as A = A1 ∪ {a_k} ∪ A2, where A1 and A2 are disjoint subsets of A: A1 contains all activities in A that end before a_k starts, and A2 contains all activities in A that start after a_k finishes. From this definition of A, we can see that | A | = | A1 | + | A2 | + 1. Looking at the set S of activities, we observe that any activity that overlaps with a_k cannot be in the optimal solution A, so we eliminate those conflicts and keep the remaining activities of S. We split S into S1, the subset of activities that end before a_k starts, and S2, the subset of activities that start after a_k finishes. A1 is an optimal solution for S1 and A2 is an optimal solution for S2. If there were some subset X of S1 that is a better solution than A1, then | X | > | A1 |. Replacing A1 with X would give a better solution to the problem than A because | X | + | A2 | + 1 > | A1 | + | A2 | + 1. This is a contradiction because we assumed A is an optimal solution. Symmetrically, we can prove that A2 is an optimal solution for S2. Thus, an optimal solution of the problem is constructed from optimal solutions of sub-problems.
Longest Common Subsequence Problem: Optimal Substructure
Assume Z = < z_1, ... , z_k > is an optimal solution to LCS: Z is the longest common subsequence of X and Y. 1. *If x_m = y_n*, then z_k = x_m = y_n, and Z_(k - 1) must be a longest common subsequence of X_(m - 1) and Y_(n - 1). If there were a longer common subsequence W of X_(m - 1) and Y_(n - 1), then attaching the x_m = y_n element to W would give a common subsequence of X and Y longer than Z. This contradicts the assumption that Z is the longest common subsequence of X and Y. 2. *If x_m ≠ y_n*: a) If z_k ≠ x_m, then X's last element is irrelevant and Z is actually a common subsequence of X_(m - 1) and Y. If Z is not a longest common subsequence of X_(m - 1) and Y, then a longer common subsequence W of X_(m - 1) and Y would also be a longer common subsequence of X and Y. This contradicts the assumption that Z is the longest common subsequence of X and Y. b) If z_k ≠ y_n, then Y's last element is irrelevant and Z is actually a common subsequence of X and Y_(n - 1). If there were a longer common subsequence of X and Y_(n - 1), it would also be a longer common subsequence of X and Y, again contradicting the assumption.
What are the two methods of dynamic programming?
• Bottom-up (iterative) • Top-down (recursive, with memoization)
Matrix-chain Multiplication Problem: Computing an Optimal Solution
Fill in the diagonal of the M matrix with 0s. Start with the first empty spot in the left side of M for the diagonal that represents the matrix chains of length 2. Calculate the value at this spot with the recursive solution. Store the index k that provided the optimal solution in S[ i, j ] where i and j are the row and column of M that is being calculated. Continue this pattern with each empty spot in the diagonal, until you reach the right side of matrix M. Repeat for each diagonal until the desired value is found.
Knapsack 0-1 Problem: Computing an Optimal Solution
Fill in the first row of the V matrix with zeros. Start with the first column in the second row of the V matrix. Calculate the values of V with the recursive definition of the Knapsack 0-1 problem in row-major order. At each spot in V, fill in the same spot within matrix B with a 1 if the item is taken or a 0 if it is not.
Longest Common Subsequence Problem: Computing an Optimal Solution
Fill the first row and first column of matrix C with 0s. Starting with the first open spot in the top-left corner of C, calculate the value of that spot using the recursive definition of the LCS problem. Fill in the same spot in matrix B with the direction of the sub-problem solution that was used to calculate the current solution (i.e. "d" for the diagonal-value, "u" for up-value, "l" for left value). Continue calculating empty spots in C in row-major order (left to right).
Huffman Coding Problem: Precondition
For the Huffman code algorithm to be efficient, the input subtrees should be *sorted in increasing order by frequency*.
Activity Selection Problem: Computing the Solution
Let A be a set containing the first activity in the pre-sorted set S of activities. Iterate through S, starting at index 2, until you find an activity that starts at or after the finish time of the last activity added to A. If you find such an activity, add it to A. Repeat until there are no activities left in S to look at.
Activity Selection Problem: Greedy Choice Property
Let A be an optimal solution to the Activity Selection Problem, ordered by finish time. The first activity in A is some activity k belonging to the set S of activities. If k = 1, then the optimal solution A begins with the greedy choice. If k ≠ 1, then there is an optimal solution B that begins with activity 1: let B = (A - {k}) ∪ {1}. This means that: • Activity 1 finishes no later than activity k (f_1 ≤ f_k), therefore it is compatible with all the activities in B that were compatible with activity k before; B contains only mutually compatible activities. • B has the same number of activities as A. From this we can conclude that B is an optimal solution for the problem, just as A is. Once the greedy choice of selecting activity 1 is made, the problem reduces to finding an optimal solution for a sub-problem over those remaining activities in S that are compatible with activity 1. After each greedy choice, we are left with an optimization problem of the same form as the original problem.
Matrix-chain Multiplication Problem: Optimal Substructure
Let A_i..j indicate a chain of matrices A_i to A_ j where i < j. For any k where i ≤ k < j, the optimally parenthesized solution to A_i..j is the product of two sub-products that are the results of multiplying sub-chains A_i..k and A_(k+1 .. j). Each sub-product *must* be parenthesized optimally in order for the solution to be optimal. If the solution to one of the sub-problems is not optimal, then there is a "better" solution for that sub-problem. If we replace the current solution with the "better" solution, we will bring the overall cost of the problem down. This implies that the solution we were analyzing was not optimal to begin with, which contradicts our original statement.
Knapsack 0-1 Problem: Optimal Substructure
Let Knapsack(n, w) represent a subproblem of the Knapsack problem regarding N items with a weight limit of W, where n ≤ N and w ≤ W. Assume that we have an optimal solution for the Knapsack(n, w) problem. When we consider item n, it is either in the optimal solution or it is not. 1. If item n is not in the optimal solution, then this solution is also an optimal solution for Knapsack(n - 1, w). If there were a better solution for Knapsack(n - 1, w), then we could use it to build an even better solution for Knapsack(n, w). This contradicts our original assumption, therefore it cannot be true. 2. If item n is in the optimal solution, then this solution is obtained by adding item n to the solution of the Knapsack(n - 1, w - m_n) subproblem. If there were a better solution for Knapsack(n - 1, w - m_n), then we could use it to build an even better solution for Knapsack(n, w). This contradicts our original assumption.
Knapsack 0-1 Problem: Reading the Solution
Let N be the number of items we are interested in and W be the weight limit we have. Let matrix B contain the data for the optimal solution and array M contain the weight of each item. Read the value in B[ N, W ]. If the value is a 1, write down the value of N and set W = W - M[ N ]. If the value is a 0, do nothing. Subtract 1 from N, regardless of the value in B. Repeat this process until every item has been considered (N = 0).
Longest Common Subsequence Problem: Reading the Solution
Let m be the length of sequence X and n be the length of sequence Y. Start in slot B[ m, n ], reading the value present there. If the value is "d", take note of the character of X at the current row index, then move to the top-left spot diagonally adjacent in B. If the value is "u", move up one spot in B. If the value is "l", move one spot to the left within B. Repeat this process until you exit the valid bounds of the B matrix. The solution is the reverse order of the characters recorded from X at each "d" value found.
Fractional Knapsack Problem: Computing the Solution
Let there be N items available to be put in the knapsack. Let P be the array containing the value of each item and M the array containing the weight of each item. P and M are pre-sorted in decreasing order of value per unit weight. Take whole items in this order until the next item no longer fits, then take the fraction of that item that fills the remainder of the sack.
What are the limitations of greedy algorithms?
Not every greedy algorithm will guarantee an optimal solution: the greedy strategy does not work for all optimization problems.
Longest Common Subsequence Problem: Complexity
O(M × N)
Knapsack 0-1 Problem: Complexity
O(N × W)
Matrix-chain Multiplication Problem: Complexity
O(N³)
Huffman Coding Problem: Computing the Solution
Remove the two smallest subtrees from the set of input and "merge" them by providing a parent node. The parent node will have a frequency equal to the sum of the two child frequencies. The parent is inserted back into the set of input nodes, maintaining the increasing order by frequency. This process is repeated until there is a single node left in the set.
Fractional Knapsack Problem: Specification
Same as Knapsack 0-1, with the addition that now a fraction of an item can be taken.
Matrix-chain Multiplication Problem: Reading the Solution
The S matrix provides the k value that splits the chain of matrices. For a chain A_i..j, first poll S[ i, j ] for the optimal index k at which to split. Split the chain into its sub-products A_i..k and A_(k + 1 .. j), surrounding each with parentheses. Repeat this pattern with each sub-product until you are left with segments that are one matrix in size (i.e. A_1..1, A_2..2, ... , A_n..n).
When is it appropriate to use the dynamic programming approach on a problem?
The dynamic programming strategy can be used when the problem can be divided into *sub-problems* that are *overlapping*.
Why does the greedy algorithm strategy work for the Fractional Knapsack problem but not the Knapsack 0-1 problem?
The fractional knapsack problem exhibits the greedy choice property: since a fraction of an item can be taken, the greedy algorithm can always fill the sack to full capacity with the highest value-per-weight items. In the 0-1 case, taking an item whole may leave capacity unused that lower-density items would have filled for more total value, so the greedy choice does not always belong to an optimal solution.
What two conditions must be present for a greedy algorithm to be correct?
The greedy programming strategy is correct when the optimization problem has an *optimal substructure* and exhibits the *greedy choice property*.
What is the pre-condition of the greedy algorithm for the Activity Selection problem?
The precondition for the greedy algorithm for the activity selection problem is that *the activity set S must be sorted in order of their finish times*. Note that sorting n activities will take O(N × log N) time.
What is the time-memory tradeoff of dynamic programming?
The tradeoff of increasing memory complexity in order to reduce runtime complexity.
What is the benefit to pre-sorting the activities for the Activity Selection algorithm?
We can find the earliest finishing activity in O(1) time rather than O(N) for linear search. This makes the algorithm take O(N) time rather than O(N²).
What does it mean for a problem to have overlapping sub-problems?
We say that a Problem P has overlapping sub-problems if it has two sub-problems P1 and P2 that share a common sub-problem P'.
When can we apply dynamic programming to an optimization problem?
When the problem has *overlapping sub-problems* and *optimal substructure*.
Longest Common Subsequence Problem: Recursive Definition
c[ i, j ] =
• 0, if i = 0 or j = 0
• c[ i - 1, j - 1 ] + 1, if i > 0 and j > 0 and x_i = y_j
• max( c[ i - 1, j ], c[ i, j - 1 ] ), if i > 0 and j > 0 and x_i ≠ y_j
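This recursive definition can also be evaluated top-down with memoization, the second DP method named earlier. A Python sketch (names illustrative): functools.lru_cache saves each sub-problem result the first time it is computed, so the overlapping sub-problems are solved only once.

```python
from functools import lru_cache

def lcs_length(X, Y):
    @lru_cache(maxsize=None)        # memoize: each (i, j) computed once
    def c(i, j):
        if i == 0 or j == 0:
            return 0                # base case: an empty prefix
        if X[i - 1] == Y[j - 1]:
            return c(i - 1, j - 1) + 1          # last characters match
        return max(c(i - 1, j), c(i, j - 1))    # drop one last character
    return c(len(X), len(Y))

print(lcs_length("ABCBDAB", "BDCABA"))  # → 4
```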
Matrix-chain Multiplication Problem: Recursive Definition
m[ i, j ] =
• 0, if i = j
• min over i ≤ k < j of { m[ i, k ] + m[ k + 1, j ] + p_(i - 1) × p_k × p_j }, if i < j