Alg Des Ch09: Intractable Problems
What are 3 options for dealing with NP-complete problems?
1) Algorithms fast in the average case like backtracking algorithms with substantial pruning (they seek the exact global optimum but might be too slow on certain types of inputs); 2) Heuristics like simulated annealing or greedy approaches, which give no guarantees; and 3) Approximation algorithms, which are more problem-specific and guarantee that the answer is near the optimum (although a heuristic might still do better, like investing your money in stocks instead of leaving it in the bank at guaranteed 3% interest).
How can the Longest Increasing Subsequence problem be reduced to, and solved using, Edit Distance?
Edit distance computes the cost of transforming string S into string T using insertions, deletions, and substitutions. If we set T = sorted(S) and set the costs of insertions, deletions, and substitutions to 1, 1, and infinity, respectively, then the result is the cost of sorting S using only insertions and deletions. Dividing this cost by 2 gives the number of deletions, and subtracting that from the length of S gives the number of elements of S that were already in sorted, that is, increasing, order.
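A minimal Python sketch of this reduction (assuming the elements of S are distinct, so sorted order is strictly increasing):

    def lis_length(s):
        # Longest increasing subsequence length via edit distance:
        # T = sorted(S); insert/delete cost 1, substitution forbidden.
        t = sorted(s)
        n = len(s)
        INF = float("inf")
        dist = [[0] * (n + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dist[i][0] = i                      # delete all of s[:i]
            dist[0][i] = i                      # insert all of t[:i]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                match = dist[i-1][j-1] if s[i-1] == t[j-1] else INF
                dist[i][j] = min(match,             # characters already agree
                                 dist[i-1][j] + 1,  # delete s[i-1]
                                 dist[i][j-1] + 1)  # insert t[j-1]
        deletions = dist[n][n] // 2
        return n - deletions

    assert lis_length([3, 1, 4, 2, 5]) == 3    # e.g. 1, 2, 5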
How can the Independent Set Problem be reduced to the problem of finding a Clique within a graph G?
In ISP we look for a subset of vertices with no edges between any two of them. In a clique we insist that there be an edge between every two vertices. So we take the complement of the edges, meaning we create a graph where edges are present only if they were absent in the original. A clique of vertices in this graph means that no edges existed between them in the original, which is the definition of an independent set.
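A short sketch of the complementation step; find_clique below is a hypothetical clique finder standing in for whatever clique algorithm is available:

    from itertools import combinations

    def complement(vertices, edges):
        # Edge present exactly where the original graph has none.
        present = {frozenset(e) for e in edges}
        return [pair for pair in combinations(vertices, 2)
                if frozenset(pair) not in present]

    # An independent set of size k in G is a clique of size k in the
    # complement, so (with find_clique hypothetical):
    #   find_clique(vertices, complement(vertices, edges), k)
    # answers the Independent Set question.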
What theory addresses the question of whether an efficient algorithm exists to solve a given algorithmic problem?
NP-completeness.
How can an algorithm for finding the Convex Hull of a set of points be used to sort a set of numbers?
Place each number x at the point (x, x^2). Since y = x^2 is a convex function, every such point lies on the convex hull. Find the convex hull and read off the x values of the points in sequential order.
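A sketch of this reduction, using a gift-wrapping hull so the demonstration never itself calls sort (assumes distinct values):

    def cross(o, a, b):
        # Cross product of o->a and o->b; > 0 means b is left of o->a.
        return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])

    def convex_hull(points):
        # Gift-wrapping (Jarvis march), counterclockwise from the
        # leftmost point; deliberately never calls sort().
        start = min(points)
        hull, p = [], start
        while True:
            hull.append(p)
            q = points[0] if points[0] != p else points[1]
            for r in points:
                if r != p and cross(p, q, r) < 0:   # r is right of p->q
                    q = r
            p = q
            if p == start:
                return hull

    def sort_via_hull(nums):
        # Lift each x onto the parabola y = x^2; convexity puts every
        # point on the hull, and the counterclockwise walk from the
        # leftmost point visits them in increasing-x order.
        pts = [(x, x * x) for x in nums]
        if len(pts) < 2:
            return list(nums)
        return [x for x, _ in convex_hull(pts)]

    assert sort_via_hull([3, 1, 2]) == [1, 2, 3]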
What is the 3-SAT problem?
Same as Satisfiability, but each clause contains exactly 3 literals (variables or their complements).
What does Euclid's Algorithm calculate?
The Greatest Common Divisor (Factor) of two numbers.
Explain Euclid's algorithm finding GCD(A,B), where A>B.
The main idea of the algorithm is that if a number divides both A and B, then it must also divide their difference, since A can be expressed as A = [A-B] + B. This idea is used to iteratively reduce the size of the larger of the two numbers. Specifically, A is expressed as A = kB + R for some integers k and R, where R<B is the residual, which lets us write GCD(A,B) = GCD(B,R). This is repeated with B the new A and R the new B. Eventually R will divide B, at which point GCD(B,R) = R. And by the iteration, R = GCD(A,B).
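The iteration is short in code; a minimal sketch:

    def gcd(a, b):
        # Euclid: GCD(A, B) = GCD(B, A mod B); stop when the residual
        # is 0, i.e., when the smaller number divides the larger evenly.
        while b:
            a, b = b, a % b
        return a

    assert gcd(1071, 462) == 21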
If problem A is hard and we wish to show that problem B is also hard, should we reduce A to B or B to A? Skiena says this is tricky.
We wish to reduce A to B, meaning specify how to transform ANY input to A into an input to B. A solution to B would imply that A was not hard and thus violate our assumption. You would think that since A is hard, to show B is hard we just need to reduce B to A. But this just gives us a slow algorithm for solving B, not necessarily a lower bound.
If a reduction from problem A to problem B takes O(P(n)) time, what are 2 possible implications?
1) Algorithms for solving B can be applied to A, with translating and running times adding; and 2) an Omega lower bound on A can't be violated by translating to B, solving, and translating back. In other words, 1) if solving B is O(P'(n)), then solving A is O(P(n) + P'(n)); 2) if Omega(P'(n)) is a lower bound for A, then Omega(P'(n) - P(n)) must be an Omega lower bound for B, that way when you add in the cost of reducing you get back Omega(P'(n)) and don't violate it.
What are Skiena's 4 hard problems that he uses as sources when trying to prove the hardness of other problems? Why each?
1) Integer Partition for problems whose hardness seems to require using large numbers. 2) Vertex cover for any graph problem whose hardness depends upon "selection". 3) Hamiltonian path for graph problems whose hardness depends upon "ordering". And 4) 3-SAT when the others are not appropriate.
What are 7 tips when trying to prove that a given algorithmic problem is hard?
1) Look through Garey and Johnson's book for the problem. 2) Make your source problem as simple as possible. 3) Make your target problem as hard as possible. 4) Select the right source problem for the right reason. 5) Amplify the penalties for making the undesired selection. 6) Think strategically at a high level, then build gadgets to enforce tactics. And 7) when you get stuck, alternate between looking for an algorithm and a reduction.
What are 3 important things to note about using approximation algorithms?
1) Making a heuristic more complicated does not necessarily make it better; 2) A post-processing cleanup step can't hurt; 3) The important property of approximation algorithms is relating the size of the solution produced directly to a lower bound on the optimal solution, which therefore limits how badly we might perform. This relates back to (1), since the simpler the heuristic, the more easily we can relate it to the optimal solution. Bottom line: simple approximation algorithms are easier to analyze and many times are better; add post-processing steps for better in-practice results.
What are 3 important properties of the proof of the hardness of Integer Programming via reduction from 3-SAT?
1) The reduction preserved the structure of the problem but did not attempt to solve it, just change its format. 2) The possible IP instances that can result from the reduction are only a small subset of all IP instances. However, since some of them are hard, the general problem must be hard. Perhaps this means that IP is more flexible than 3-SAT. 3) The transformation captures the essence of why IP is hard, namely, satisfying a set of constraints is hard.
What is the Inequivalence of Programs with Assignments problem?
A Program is a sequence of assignments of the form x0 = if (x1 = x2) then x3 else x4; that is, x0 is assigned one of two variables based on whether two other variables are equal. The problem is: given a set of variables X, a set of possible values V, and two programs P1 and P2, does there exist a set of initial assignments of values from V to variables in X such that the programs produce different results, i.e., different sets of final assignments? Are the two programs equivalent or not?
What is a Clique?
A complete subgraph, meaning there is an edge between every pair of vertices. They are maximally dense.
What does it mean for algorithmic problems to be P or NP?
A problem is in class P ("polynomial") if a polynomial-time ("fast") algorithm has been discovered to solve it. A problem is in class NP ("Non-deterministic Polynomial-time" or "Not-necessarily Polynomial-time") if a given solution to the problem can be verified in polynomial time. All P problems are in NP, but it is not known whether all NP problems are in P. Phrased differently, P means that discovering a solution is fast; NP means that verifying a solution is fast. It is not known if solution discovery is fundamentally harder than solution verification. Most scientists believe discovery IS fundamentally harder, meaning class P does not equal class NP.
What is an algorithmic Reduction?
A translation of instances from one type of problem to instances of another such that the correctness of answers is preserved. They allow efficient algorithms, or proofs of their nonexistence, to be translated as well. Note that they don't have to be reversible: a reduction from A to B means that algorithms for solving B can be applied to A, but not the other way around. Note that if A reduces to B, meaning an algorithm for B can be used to solve A, then if A is hard then B must be hard as well. Otherwise the hardness of A could be subverted.
Why does a proof that the Hamiltonian Cycle problem (HCP) is hard also prove that TSP is hard?
Because HCP reduces to TSP; specifically, if an efficient algorithm for TSP existed, it could be used to solve HCP, which would violate our proof that HCP is hard.
How can a Hamiltonian Cycle be found on a graph G using a Traveling Salesman algorithm?
Construct a complete (edge between every pair of vertices) graph from the vertices in G, with original edges having weight 1 and added edges having weight 2. Find the minimum-cost TSP tour; if it has total weight n (the number of vertices in G), then it uses only original edges and is a Hamiltonian Cycle.
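A sketch of the construction, where solve_tsp is a hypothetical exact TSP routine taking a vertex count and a weight matrix and returning the optimal tour cost:

    def has_hamiltonian_cycle(vertices, edges, solve_tsp):
        # Complete the graph: original edges weigh 1, added edges 2.
        n = len(vertices)
        idx = {v: i for i, v in enumerate(vertices)}
        w = [[2] * n for _ in range(n)]
        for u, v in edges:
            w[idx[u]][idx[v]] = w[idx[v]][idx[u]] = 1
        # A tour of total weight n must use only original edges.
        return solve_tsp(n, w) == n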
What is a really simple approximation algorithm for solving the Maximum Directed Acyclic Subgraph Problem (MDASP) that contains at least half as many edges as the optimum?
Construct any permutation of the vertices and interpret it as a left-right ordering akin to topological sorting. Edges are then classified as pointing left to right or right to left, and neither subset can contain any cycles. Pick the larger one, which will contain at least half the edges, and therefore at least half the number of edges of the optimal solution. It is ok if the graph becomes disconnected or loses some vertices.
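A minimal sketch of this heuristic, assuming edges are given as directed (u, v) pairs:

    def acyclic_subgraph(vertices, edges):
        # Any permutation of the vertices works as the ordering.
        pos = {v: i for i, v in enumerate(vertices)}
        forward  = [(u, v) for u, v in edges if pos[u] < pos[v]]
        backward = [(u, v) for u, v in edges if pos[u] > pos[v]]
        # Each side is acyclic; the larger has >= half of all edges,
        # hence >= half as many as the optimal acyclic subgraph.
        return forward if len(forward) >= len(backward) else backward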
What is a Decision Problem and how can many algorithmic problems be phrased as Decision Problems?
Decision Problems are those with answers restricted to True and False. Problems can often be phrased as Decision Problems by asking "Does there exist...". For example, TSP can be phrased as "Does there exist a TSP tour with cost less than k?". If this problem is solved, translating back to regular TSP is straightforward.
How can Euclid's Algorithm be used to find the Least Common Multiple of two numbers A and B?
First note that max(A,B) <= LCM(A,B) <= AxB, with LCM(A,B) = A exactly when B divides A. Also note that A and B have prime factorizations, and the factorization of AxB is the product of them. If any prime factors are shared between the two factorizations, the duplicates can be divided out to yield a number still divisible by both A and B. The collection of these duplicate factors is exactly GCD(A,B). Therefore LCM(A,B) = AxB / GCD(A,B), and the GCD can be found using Euclid's Algorithm.
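As a quick sketch, using Python's built-in gcd:

    from math import gcd

    def lcm(a, b):
        # LCM(A, B) = A * B / GCD(A, B): divide out the shared factors.
        return a * b // gcd(a, b)

    assert lcm(4, 6) == 12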
Provide an approximation algorithm for Euclidean TSP using Minimum Spanning Trees (MST).
First note that if you remove an edge from a tour you get a tree, which shows that the weight of the MST is a lower bound on the weight of the optimal TSP tour. Therefore use an algorithm to find the MST, then perform a depth-first traversal. Such a tour will travel along each edge twice, meaning that this tour costs at most twice the optimal tour. However, we can shorten the tour further by not traveling along edges twice, but rather going directly to the next unvisited node in the tour when possible. The crucial property is that edge weights obey the triangle inequality.
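A sketch of the whole pipeline for points in the plane, using Prim's algorithm for the MST (any MST routine would do):

    from math import dist

    def approx_tsp(points):
        # Prim's algorithm builds the MST over pairwise distances.
        n = len(points)
        in_tree = [False] * n
        cost = [float("inf")] * n
        parent = [0] * n
        cost[0] = 0.0
        children = [[] for _ in range(n)]
        for _ in range(n):
            u = min((i for i in range(n) if not in_tree[i]),
                    key=lambda i: cost[i])
            in_tree[u] = True
            if u != 0:
                children[parent[u]].append(u)
            for v in range(n):
                if not in_tree[v]:
                    d = dist(points[u], points[v])
                    if d < cost[v]:
                        cost[v], parent[v] = d, u
        # Depth-first preorder = the doubled MST walk with repeated
        # vertices shortcut out (safe by the triangle inequality).
        tour, stack = [], [0]
        while stack:
            u = stack.pop()
            tour.append(u)
            stack.extend(reversed(children[u]))
        return tour   # visit in this order, then return to tour[0]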
Outline how 3-SAT inputs can be transformed to Vertex Cover Problem inputs.
For each boolean variable, create a pair of "upper" vertices with an edge between them. Since these edges don't share vertices, one vertex from each pair will have to be in the cover to cover them all. Remember, coverings are about the edges. For each 3-clause create a "lower" triangle with vertices labeled with the 3 literals. Then connect each vertex in these lower clause "gadgets" with the matching node in the upper vertex "gadgets". If there are n boolean variables and c clauses, then at least n + 2c nodes will be needed to cover everything: 2 nodes in each triangle and one for each vertex gadget pair. If a covering exists with exactly n + 2c nodes, then the 3-SAT input is satisfiable. Cross edges that are covered from above (by a node in the vertex "gadgets") indicate that the literal in that clause is True. Therefore at least 1 cross edge per triangle must be covered from above, which is equivalent to satisfiability.
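A sketch of the gadget construction, encoding literals as nonzero integers (negative = complemented) and assuming the three literals in each clause are distinct:

    def sat_to_vertex_cover(num_vars, clauses):
        edges = []
        for i in range(1, num_vars + 1):
            edges.append((("pos", i), ("neg", i)))    # variable gadget
        for j, clause in enumerate(clauses):
            a, b, c = (("clause", j, lit) for lit in clause)
            edges += [(a, b), (b, c), (a, c)]         # clause triangle
            for lit in clause:                        # cross edges
                side = "pos" if lit > 0 else "neg"
                edges.append((("clause", j, lit), (side, abs(lit))))
        k = num_vars + 2 * len(clauses)
        # Satisfiable iff this graph has a vertex cover of size k.
        return edges, k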
How can the Satisfiability Problem be reduced to 3-SAT?
For each clause, we want to produce a set of new clauses, containing new variables as needed, such that if a set of truth assignments satisfies the original clause, then it can be extended to satisfy the engendered ones as well. SAT implies 3-SAT. Further, if it is not possible to satisfy the original clause, then neither is it possible to satisfy the new clauses. NOT SAT implies NOT 3-SAT. Therefore SAT is equivalent to 3-SAT. Here's how. If C has 1 literal, we add 2 new variables to pad it to 3-clauses, but create 4 clauses with all sign combinations of the new variables. The only way all will be True is if the 1 original literal is True. Same for a 2-clause: add one new variable, but make two clauses, one with the variable and one with its complement. At least one original literal must be True to make both clauses True. For a 3-clause, leave it alone. k-clauses with k greater than 3 are the toughest. We create k-3 new variables and k-2 new clauses. Each new variable appears in one clause, and its complement appears in another. The original literals each appear in one clause. The new variables alone will not be able to make all the clauses True. At least one original literal needs to be True for the clauses to all come out True. The specific form of the clauses is best seen with an example. C(1,2,3,4,5) goes to C(1,2,v1), C(NOT v1, 3, v2), and C(NOT v2, 4, 5).
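A sketch of this case analysis, with literals as nonzero integers (negative = complement) and fresh() yielding unused variable numbers:

    from itertools import count

    def to_3sat(clause, fresh):
        # clause: list of nonzero ints; fresh(): next unused variable.
        k = len(clause)
        if k == 1:                      # pad with all sign patterns
            x, v1, v2 = clause[0], fresh(), fresh()
            return [[x, v1, v2], [x, v1, -v2], [x, -v1, v2], [x, -v1, -v2]]
        if k == 2:                      # pad with one variable, both signs
            v = fresh()
            return [clause + [v], clause + [-v]]
        if k == 3:
            return [clause]
        vs = [fresh() for _ in range(k - 3)]   # chain k-3 fresh variables
        out = [[clause[0], clause[1], vs[0]]]
        for i in range(k - 4):
            out.append([-vs[i], clause[i + 2], vs[i + 1]])
        out.append([-vs[-1], clause[-2], clause[-1]])
        return out

    fresh = count(6).__next__           # variables 1..5 already in use
    assert to_3sat([1, 2, 3, 4, 5], fresh) == \
        [[1, 2, 6], [-6, 3, 7], [-7, 4, 5]]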
What is the Uniconnected Subgraph problem?
Given a Directed Graph G with a set of Arcs (i.e. directed Edges) and an integer k, is there a subset of Arcs of size greater than or equal to k such that on this subgraph, there is at most one directed path between any pair of vertices?
What is the Set Cover problem?
Given a collection of subsets of some Universal set, what is the smallest subset of subsets whose union equals the universal set?
What is the Maximum Directed Acyclic Subgraph problem?
Given a directed graph, what is the largest possible subset of edges such that the resulting subgraph is acyclic?
What is the Vertex Cover problem?
Given a graph G and an integer k, is there a subset S of vertices from G of size less than or equal to k such that every edge in G contains at least one vertex in S?
What is the Independent Set Problem?
Given a graph G and an integer k, is there an independent set of k vertices in G? An independent set of vertices is one in which no two vertices share an edge.
What is the General Movie Scheduling Problem?
Given a set I of n SETS of intervals on the line and an integer k, does there exist a subset of at least k mutually nonoverlapping interval sets? Movies are allowed to have multiple disjoint filming periods. We seek to select at least k movies to film whose schedules don't overlap.
What is the Satisfiability (SAT) problem?
Given a set of boolean variables and a set of clauses that contain them (variables and their complements can appear in multiple clauses), does there exist a set of truth assignments for the variables such that each clause contains at least one true literal?
What is the Integer Programming problem? Intuition?
Given a set of integer variables V, a set of linear inequalities over V, a maximization function f(V), and an integer B, does there exist an assignment of integers to V such that all inequalities are true and f(V) is greater than or equal to B? Each linear inequality divides the solution space in half with a straight (i.e. linear) partition. For example, in 2D, each linear inequality refers to one side of a line. In 3D, each inequality is one side of a plane.
What is the Traveling Salesman Problem?
Given a weighted graph G and number k, does there exist a simple tour that visits each vertex of G without repetition whose total weight is at most k?
What is the Hamiltonian Cycle Problem?
Given an unweighted graph G, does there exist a simple tour that visits each vertex of G without repetition?
How can the Independent Set Problem be reduced to the General Movie Scheduling Problem?
In ISP, selected vertices cannot share an edge, just as in GMSP selected movies cannot share a time interval. Therefore vertices become movies, with the incident edges becoming the time intervals during which the movie shoots. Only 2 different movies are shooting during each time interval. By selecting a set of movies to film we are also selecting an independent set of vertices.
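A minimal sketch of the translation, mapping edge i to the (hypothetical) shooting interval (i, i+1):

    def isp_to_movies(vertices, edges):
        # Movie v shoots during the interval of every edge touching v;
        # interval i = (i, i+1) is shared by exactly two movies.
        movies = {v: [] for v in vertices}
        for i, (u, v) in enumerate(edges):
            movies[u].append((i, i + 1))
            movies[v].append((i, i + 1))
        return movies   # k nonoverlapping movies <=> independent set of size k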
What is the difference between NP-hard and NP-complete?
NP-hard problems can be even harder than NP-complete ones... and here's why. NP-hard problems are by definition at least as hard as any NP-complete problem, but they need not be in NP, meaning there may be no polynomial-time verification strategy. For example, in a two player game like chess, there is no polynomial-time strategy for verifying that, given an opening move, there exists a "perfect play" strategy that ensures a win (checkmate). Such verification strategies must exist for NP-complete problems, since they are in NP.
What is an approximation algorithm for the Set Cover Problem (SCP) that will perform within a factor of log n of the optimal? Outline where the log n factor comes from.
Repeatedly select the subset that contains the greatest number of as-yet-uncovered elements. Note that the additional number of elements covered at each stage forms a non-increasing sequence. Where is the log n factor? Suppose the greedy algorithm has selected k subsets and covered half the elements. The final subset selected could not have covered more elements than the average, which is n/(2k). Therefore within this smaller sub-problem with n/2 points, no set exists that can cover more than n/(2k) elements. Even an optimal covering will therefore need at least k subsets to cover the rest. Each time the number of uncovered elements gets halved, this lower (i.e. best-case) bound k can be recalculated, and the maximum such k forms a lower bound for the optimal solution. The greedy solution, however, selects at most max(k) subsets per halving, and there are at most log2(n) halvings, so it uses at most max(k) times log n subsets in total. The ratio between max(k) times log n and the optimal's lower bound max(k) is the log n factor.
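A minimal sketch of the greedy heuristic:

    def greedy_set_cover(universe, subsets):
        subsets = [set(s) for s in subsets]
        uncovered, cover = set(universe), []
        while uncovered:
            # Take the subset covering the most uncovered elements.
            best = max(subsets, key=lambda s: len(s & uncovered))
            if not best & uncovered:
                raise ValueError("universe is not coverable")
            cover.append(best)
            uncovered -= best
        return cover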
Provide an approximation algorithm for the Vertex Cover problem and explain why it is at most twice as large as the optimal?
Select an edge, add both its vertices to the cover, then delete the edge, along with all edges incident on the two vertices. Repeat on a new edge until none are left. Since edges are only deleted once they are covered, this procedure will produce a full cover. Note that the edges selected by the algorithm form a matching (they don't share any vertices). Any possible covering of the graph will need to include at least one vertex from each of these edges, while the approximation selected both. Therefore the smallest covering is at least half the size of this cover; the approximate cover is at most twice as large as optimal.
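A minimal sketch; scanning the edges and taking both endpoints of any still-uncovered edge performs the same deletions implicitly:

    def approx_vertex_cover(edges):
        cover = set()
        for u, v in edges:
            if u not in cover and v not in cover:   # edge still uncovered
                cover.update((u, v))                # take both endpoints
        return cover

    # Optimal here is {2, 3}; the approximation returns at most 4 vertices.
    assert len(approx_vertex_cover([(1, 2), (2, 3), (3, 4)])) <= 4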
Outline how to prove that the Inequivalence of Programs with Assignments problem (IPAP) is hard.
Since this isn't a graph or numerical problem, we will use 3-SAT as our source problem, and try to map instances of 3-SAT to instances of IPAP. To do so, we make P2 always produce sat = False for a variable sat, and we seek to construct P1 such that finding a set of initial assignments that produces sat = True is equivalent to finding an assignment of boolean variables that satisfies the 3-SAT clauses. To do so, we restrict our possible value set V to True and False, and have our variable set X contain a variable for every 3-SAT boolean variable and for every 3-SAT clause, as well as our global satisfiability variable "sat". For each clause variable, the program contains a sequence of assignments that check if that clause's literals are True. For example, if c1 = (x1, NOT x2), then we write c1 = if (x1 = True) then True else False; c1 = if (x2 = False) then True else c1. By chaining assignments like this, we check if clause 1 is satisfied. We do this for all clauses, and then perform a similar series of assignments to check if every clause is True. If so, then sat = True. If an initial series of assignments produces sat = True, the same assignments to just the boolean variables will solve the original 3-SAT problem. With this transformation from 3-SAT to IPAP, we know there can't be a fast algorithm for IPAP, or else we would have a fast algorithm for 3-SAT, which is believed impossible.
How can the Independent Set problem be reduced to the Vertex Cover problem, and vice versa?
Take complements. The idea is that if a vertex cover is found, the vertices NOT in the cover cannot be adjacent and hence form an independent set, otherwise their shared edge would not be "covered". The reverse idea is that if an independent set of vertices is found, every edge must contain at least one point NOT in the independent set, otherwise both points are in the independent set and they are not "independent". Therefore the complement of the independent set forms a vertex cover.
Outline how to prove that the Uniconnected Subgraph problem (USP) is hard.
This is a selection problem on a graph, so we use Vertex Cover (VCP) as our source problem. 2 dissimilarities are that USP seeks a subset of edges on a directed graph, while VCP seeks a subset of vertices on an undirected graph. We seek to transform the input graph for VCP into one appropriate for USP. To do so, we "split" each edge by adding a new vertex in the middle, with directed edges going from the new vertex to the two original endpoints. We also add a sink node, with edges going from each original vertex to the sink. Now all edges are directed, and you can only go from a new vertex to an old one, and then from an old one to the sink. With this transformation from VCP to USP, we seek to answer 1) what will a solution to USP on this graph look like, and 2) how will this correspond to a vertex cover in the original problem. We are assuming/skipping that this transformation can be done in polynomial time. To answer (1), note that there is only one path from an old node to the sink, and either 0 or 1 paths from a new node to an old node. The issue is that there are 2 paths from each new node to the sink, since each connects to 2 different old nodes. To reduce this to a single path (while removing the fewest edges, which is enforced by the choice of k) the program would delete the edge between one of the old nodes and the sink. In other words, "we must delete the outgoing arc from at least one of the two vertices defining each original edge." These deleted outgoing arcs correspond exactly to the included vertices in the Vertex Cover. At least one vertex will be included from each edge, meaning the edges are all covered.
How can 3-SAT be reduced to Integer Programming?
We map boolean variables to integer variables and clauses to constraints. Specifically, turn every boolean variable into two complementary integer variables which are 1 for True and 0 for False: we force each to be between 0 and 1 and we force their sum to be 1, i.e. for exactly one of them to be 1. Each clause then becomes a constraint, specifically that the sum of the variables be greater than or equal to 1. The objective function and its desired value don't matter.
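A sketch that emits the constraints as strings (literals again encoded as nonzero integers, negative = complemented):

    def sat_to_ip(num_vars, clauses):
        constraints = []
        for i in range(1, num_vars + 1):
            constraints.append(f"0 <= t{i} <= 1, 0 <= f{i} <= 1")  # 0/1 variables
            constraints.append(f"t{i} + f{i} = 1")   # exactly one of the pair is 1
        for clause in clauses:
            terms = [f"t{lit}" if lit > 0 else f"f{-lit}" for lit in clause]
            constraints.append(" + ".join(terms) + " >= 1")  # clause satisfied
        return constraints

    # sat_to_ip(3, [[1, -2, 3]]) includes the constraint "t1 + f2 + t3 >= 1".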