EECS 376 W22 #2
Las Vegas Algorithms: Randomized Algorithms
A randomized algorithm that always gives correct results but gambles with the resources (mostly time) used for computation. For example, Quicksort is O(n^2) in the worst case but usually runs in O(n log n).
Random Variable
A random variable X is a variable whose value is subject to variation due to chance. An RV is described by values {a1, ..., am} and probabilities {p1, ..., pm} such that:
For all i: Pr[X = ai] = pi, where 0 <= pi <= 1
p1 + p2 + ... + pm = 1
Polynomial Time Reduction
To show a problem (e.g., Independent Set) is NP-Complete, must show:
1. Independent Set is in NP
2. Some known NP-Complete problem A reduces to it: A <=p B (and thus all other NP-Complete problems reduce to it)
2.a. Create a mapping f that runs in polynomial time
2.b. Show that if an input is in language A, then its mapping is in B
2.c. Show that if the mapping is in B, then the original input is in A
2.d. Prove the mapping runs in polynomial time
KNAPSACK
NP-Complete problem (in its decision version). N items, each with a value and a weight; there is a weight capacity W on the knapsack. Goal: find a set of items of maximal total value subject to the weight capacity W.
Caesar Cipher
No longer need a key as long as the message: every ki = s (a fixed shift).
Encryption: [Es(m)]i = (mi + s) mod 26
Observation: letter frequencies remain the same, so the cipher is breakable by statistical analysis of the most common letters in English text.
Atlantic City Algorithm
One that returns a numeric value that, with high probability, is epsilon-close to the actual value (think of something similar to polling). A good example is the estimation of pi. You can amplify the success probability by taking the median of the values over multiple runs.
Fermat's Primality Test
Pick a at random with 1 <= a <= n - 1.
If a^(n-1) = 1 (mod n), say "prime"; otherwise say "composite".
The algorithm is efficient.
n is prime --> for every a the algorithm says "prime"
n is composite --> Pr[algorithm says "composite"] >= 0.50
We can make the success probability 0.999 by repetition.
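A minimal Python sketch of the test (the trial count is an illustrative choice; Python's built-in pow(a, n-1, n) performs the fast modular exponentiation described later):

```python
import random

def fermat_test(n, trials=20):
    """Fermat primality test: "composite" answers are always correct;
    "probably prime" answers err with probability roughly (1/2)^trials
    (ignoring Carmichael numbers)."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randint(1, n - 1)
        if pow(a, n - 1, n) != 1:   # built-in pow does repeated squaring mod n
            return False            # definitely composite
    return True                     # probably prime
```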
Functional/Search problems
Problems that need more than a YES/NO answer (e.g., given a graph, find a minimum vertex cover).
Types of search problems:
Maximization: Maximum Clique, KNAPSACK
Minimization: Minimum Vertex Cover, TSP
Exact: SAT, HAMPATH
Theorem: a search problem has an efficient algorithm iff its decision version does.
Ex: Does G have a clique of size k? How about k - 1? k - 2? ...
Reed Solomon Codes
RS[n,k]q is a linear [n, k, n-k+1]q code C: Zq^k --> Zq^n, where q is prime and q > n.
For a = (a0, ..., a(k-1)) ∈ Zq^k, define fa(y) = a0 + a1*y + a2*y^2 + ... + a(k-1)*y^(k-1), a polynomial of degree at most k - 1 over Zq.
Encoding: C(a) = (fa(1), fa(2), ..., fa(n)) -- the evaluations of fa.
Decoding: polynomial interpolation. An RS codeword can be decoded from any k of its coordinates (assuming no transmission errors occurred).
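A minimal sketch (not the course's reference code) of RS encoding over Zq, with q prime and q > n; the parameter values in the example are illustrative. Decoding by Lagrange interpolation is sketched in the secret-sharing card below, which uses the same idea to recover a polynomial from k points.

```python
def rs_encode(a, n, q):
    """Encode message a = (a0, ..., a_{k-1}) as (f_a(1), ..., f_a(n)) over Z_q."""
    def f_a(y):
        return sum(coef * pow(y, i, q) for i, coef in enumerate(a)) % q
    return [f_a(y) for y in range(1, n + 1)]

# Example: q = 7, k = 2, n = 5, message (2, 3), i.e. f(y) = 2 + 3y
# rs_encode([2, 3], 5, 7) == [5, 1, 4, 0, 3]
```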
Monte-Carlo Algorithms
"Short" runtime (efficient) but output could be incorrect. Output is correct with high probability. For example, primality testing (Fermat's), the chance of it being wrong is small
Secret Sharing: (k,n)-Threshold
A (k,n)-threshold scheme is a scheme to distribute a secret amongst a group of participants such that:
* no group of k - 1 (or fewer) participants can recover the secret
* any group of k (or more) participants can recover the secret
Motivation: majority for voting; special majority (75%) to change certain laws.
Fix a prime q > n (q should be a large prime). Let s ∈ Zq be the secret.
Pick a1, a2, ..., a(k-1) ∈ Zq at random and define the polynomial f(y) = s + a1*y + a2*y^2 + ... + a(k-1)*y^(k-1).
Notice the degree is k - 1, so k data points are needed to determine the polynomial and thus its constant term f(0), i.e. the secret s.
For each i = 1, ..., n: the share si is the point (i, f(i)).
Recovery: given k shares (= points), find the polynomial g(y) of degree at most k - 1 that passes through them and output g(0).
Correctness: there is a unique polynomial of degree at most k - 1 that passes through k points.
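A minimal sketch of the scheme in Python, assuming a prime q > n; the function names and example parameters are illustrative (pow(den, -1, q) needs Python 3.8+).

```python
import random

def share(s, k, n, q):
    """Split secret s into n shares; any k shares recover s."""
    coeffs = [s] + [random.randrange(q) for _ in range(k - 1)]   # f(0) = s
    def f(y):
        return sum(c * pow(y, i, q) for i, c in enumerate(coeffs)) % q
    return [(i, f(i)) for i in range(1, n + 1)]

def recover(shares, q):
    """Lagrange interpolation at y = 0 from any k shares (x_j, f(x_j))."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % q          # factor (0 - x_m)
                den = den * (xj - xm) % q      # factor (x_j - x_m)
        total = (total + yj * num * pow(den, -1, q)) % q
    return total

# Example: q = 31, secret 13, threshold 3 of 5 shares
# shares = share(13, 3, 5, 31); recover(shares[:3], 31) == 13
```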
3SAT
3CNF = a conjunction of clauses, (a v b v c) ^ (d v e v f) ^ ..., where every clause has exactly 3 literals.
3SAT = {ϕ | ϕ is a satisfiable 3CNF formula}
3SAT is polynomial time reducible to SAT.
3SAT is polynomial time reducible to VERTEX-COVER.
3SAT is in NP, and 3SAT is NP-Complete.
CLIQUE
A clique in an undirected graph is a subgraph in which every two nodes are connected by an edge. A k-clique is a clique with k nodes.
CLIQUE = {<G,k> | G is an undirected graph with a k-clique}
CLIQUE is in NP.
NP-Complete
A language A is NP-Complete if:
1) A is in NP
2) Every language in NP is polynomial time reducible to A
SET-COVER
Given a set of elements U and n sets Si ⊆ U, find the smallest collection of Si's that covers U (their union is U).
SET-COVER = {<U, S1, S2, ..., Sn, k> | Si ⊆ U and there exists a collection of size k that covers U}
SET-COVER is NP-Complete.
VERTEX-COVER is polynomial time reducible to SET-COVER (the reduction f turns graphs into sets).
Minimum Spanning Tree
A tree is a connected, undirected graph with no cycles. A minimum spanning tree (MST) is a spanning tree of minimum weight. There is a polynomial time algorithm for finding the MST of G.
weight(MST) <= weight(optimal tour), because if we remove one edge from the optimal tour we are left with a spanning tree.
VERTEX COVER
A vertex cover for a graph G = (V,E) is a set C ⊆ V s.t. for each edge (u,v) ∈ E, either u ∈ C or v ∈ C.
VERTEX-COVER = {<G,k> : G is an undirected graph that has a k-node vertex cover}
VERTEX-COVER is NP-Complete; 3SAT <=p VERTEX-COVER.
Two Sided Error Algorithms
Alg is a two-sided error algorithm for L if:
x ∈ L --> Pr[Alg accepts] >= 3/4
x ∉ L --> Pr[Alg accepts] <= 1/4
Both false positives and false negatives are possible, but in both cases the error is <= 1/4.
Can we make the error < 0.01? Yes, by repetition. (The 3/4 is chosen arbitrarily; we can take any 1/2 < c < 1:
x ∈ L --> Pr[Alg accepts] >= c
x ∉ L --> Pr[Alg accepts] <= 1 - c)
If L has a two-sided error algorithm then so does L's complement.
If P = NP then any language with an efficient randomized two-sided error algorithm is in P.
Graph Isomorphism
An isomorphism of two graphs G1 = (V1,E1) and G2 = (V2,E2) is a bijection between their vertex sets that preserves edges. That is, f: V1 --> V2 such that (u,v) ∈ E1 iff (f(u), f(v)) ∈ E2.
G1 and G2 are called isomorphic, denoted G1 ~= G2, if there is an isomorphism between them.
GI = {(G1, G2) | G1 and G2 are graphs s.t. G1 ~= G2}
GI ∈ NP; the certificate is the isomorphism.
GNI = {(G1, G2) | G1 and G2 are graphs s.t. G1 is not isomorphic to G2}
Is GNI in NP? We don't know -- what would the certificate be? Let's try an interactive proof system.
Las Vegas to Monte-Carlo
Any Las Vegas algorithm (always gives correct results) can be converted into a Monte-Carlo algorithm (correct with high probability) by bounding the time the algorithm can run.
Given: a Las Vegas (LV) algorithm with expected run time at most f(n).
Idea: stop the algorithm after alpha*f(n) time, for some alpha >= 1.
Error: we might stop too early.
Probability: Pr[T > alpha*f(n)] <= 1/alpha, where T is the run time of the algorithm (this bound is Markov's inequality).
This gives a Monte-Carlo algorithm with run time alpha*f(n) and error probability at most 1/alpha.
Cryptography
Applications of crypto:
1) Authentication: proving one's identity
2) Privacy/confidentiality: ensuring that no one can read the message except the intended receiver
3) Integrity: ensuring that the received message has not been altered in any way
Knapsack Relatively Greedy
As long as possible, add an item to the knapsack: take the item with the largest relative value (value/weight). Does not work well on its own.
Cook-Levin Theorem
For every language L in NP, there exists a polynomial time computable function that reduces L to SAT (hence SAT is NP-Complete).
Fast Modular Exponentiation and Repeated Squaring
Calculate a^b mod N when b is a power of 2:
a^b mod N = (a^(b/2) * a^(b/2)) mod N = ((a^(b/2) mod N) * (a^(b/2) mod N)) mod N
This step only works when b is divisible by 2, because we only deal with integers; for it to reduce all the way down to squares, b must be a power of 2:
a^b = a^(2^log(b)) = (((a^2)^2)^2)...
This is computable with O(log b) multiplications, since each step is only one squaring.
For general b, write b in binary and multiply together the repeated squares corresponding to its 1 bits.
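A minimal sketch of repeated squaring for a general exponent b (equivalent to Python's built-in pow(a, b, N)); the function name is an illustrative choice.

```python
def mod_exp(a, b, N):
    result = 1
    base = a % N
    while b > 0:
        if b & 1:                 # current bit of b is 1: multiply this square in
            result = (result * base) % N
        base = (base * base) % N  # square for the next bit
        b >>= 1
    return result

# mod_exp(3, 6, 7) == 1   (matches the Fermat's little theorem example below)
```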
Expectation of a Random Variable
EX[X] = the sum of all values multiplied by their corresponding probabilities.
Linearity of expectation: for every sequence of RVs X1, ..., Xn: EX[X1 + ... + Xn] = EX[X1] + ... + EX[Xn]
A sequence of RVs X1, ..., Xn is independent if for every b1, ..., bn: Pr[X1 = b1, ..., Xn = bn] = the product of all Pr[Xi = bi]
P (Efficiently decidable)
A language L is efficiently decidable if there exists a polynomial time TM M such that:
1) x is in L --> M(x) accepts
2) x is not in L --> M(x) rejects
P ⊆ NP
NP (Efficiently Verifiable)
A language L is efficiently verifiable if there exists a TM V(x, c), called a verifier, such that:
1) the runtime of V is polynomial in |x|
2) x is in L --> there exists a certificate c such that V(x, c) accepts
3) x is not in L --> for every certificate c, V(x, c) rejects
All NP languages are decidable; deciding an NP language takes at most exponential time.
Application of RSA: Signatures
Ensures the identity of the sender by running RSA "backwards".
Given public/private keys (e, d):
Signing message m: send (m, sig(m)) where sig(m) = m^d mod n.
Verify signature: check that sig(m)^e = m (mod n), i.e. (m^d)^e = m^(e*d) = m (mod n), since e*d = 1 (mod ϕ(n)).
So to forge a signature you either have to know the private key d or know how to solve a computationally hard problem.
Correctness and security follow, but this basic scheme has a flaw.
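A minimal sketch of "textbook" RSA signing and verification, assuming keys (e, d, n) like the toy ones generated in the RSA card below and a message m with 0 <= m < n. There is no hashing or padding here, which is one common weakness of the textbook scheme (possibly the flaw alluded to above).

```python
def sign(m, d, n):
    return pow(m, d, n)                  # sig(m) = m^d mod n

def verify(m, sig, e, n):
    return pow(sig, e, n) == m % n       # check sig^e = m (mod n)

# verify(m, sign(m, d, n), e, n) is True for any valid RSA key pair (e, d)
```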
Eupath
Eupath = {G | G has an Eulerian path}
An Eulerian path is a path that visits every edge exactly once.
Eupath is in NP. Eupath is also in P, while HAMPATH is NP-Complete.
If Eupath were NP-Complete then P = NP; it is not known whether Eupath is NP-Complete.
Proof for NP≠P
Find a language L that is in NP but not in P.
RSA
The first public key cryptosystem.
Protocol: public/private keys (e, d), secret message m.
System parameters: the modulus n = p * q, a product of two large primes.
Alice sends her public key e to Bob.
Bob sends the ciphertext c = m^e mod n to Alice.
Alice computes m' = c^d mod n using her private key d.
Why it works: m' = c^d = (m^e)^d = m^(e*d) = m^(1 + kϕ(n)) for some k, which equals m (mod n) by Euler's theorem.
Eve knows: n, e, m^e mod n. Eve does not know: p, q, d. Eve wants to learn: m.
Try every possible m? There are up to n candidates -- exponential in the input size O(log n).
Compute d given e and n, where e*d = 1 (mod ϕ(n))? Good news: a multiplicative inverse can be found efficiently given ϕ(n) using the extended Euclidean algorithm.
How hard is it to compute ϕ(n) = (p - 1)(q - 1), or in other words, given n, to find p and q? There is no known efficient algorithm for integer factorization. To break RSA this way, Eve would have to factor n, which is believed to be a computationally hard problem.
Generating pairs of public/private keys: given n, how do you generate a pair (e, d) with e*d = 1 (mod ϕ(n))? A central authority knows p, q and thus ϕ(n). Pick e with 2 < e < ϕ(n) such that gcd(e, ϕ(n)) = 1, and run the extended Euclidean algorithm (it computes x and y such that ax + by = gcd(a, b)) to get d.
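A toy end-to-end sketch of RSA key generation, encryption, and decryption with tiny primes (real RSA uses primes of 1024+ bits); the choices p = 61, q = 53, e = 7 are illustrative. pow(e, -1, phi) (Python 3.8+) computes the inverse via the extended Euclidean algorithm.

```python
from math import gcd

p, q = 61, 53
n = p * q                      # modulus, 3233
phi = (p - 1) * (q - 1)        # 3120

e = 7                          # public exponent, must be coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent: e*d = 1 (mod phi)

m = 1234                       # message, 0 <= m < n
c = pow(m, e, n)               # ciphertext: m^e mod n
m_prime = pow(c, d, n)         # decryption: c^d mod n
assert m_prime == m
```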
Fermat's Little Theorem
For any prime p and 0 < m < p: m^(p-1) = 1 (mod p)
For any prime p and 0 <= m < p: m^p = m (mod p)
Ex: 3^6 = 729 = 1 + 104 * 7, so 3^6 = 1 (mod 7), where m = 3 and p = 7
2^10 = 1024 = 1 + 93 * 11, so 2^10 = 1 (mod 11), where m = 2 and p = 11
Randomized Algorithm Property
For any ϕ in 3SAT there exists an efficient randomized algorithm that satisfies 7/8 of the clauses in expectation (a clause is unsatisfied only when all 3 of its literals get bad values, which happens with probability (1/2)^3 = 1/8 under a uniformly random assignment).
There exists a randomized algorithm that 1/2-approximates Max-Cut (yes, the local search algorithm works, but what else? Hint: assign variables / choose vertices at random).
Knapsack has an efficient algorithm for instances with "small" numbers.
α-approximation
Given a min/max search problem, we say that a solution x is an α-approximation if:
Minimization: Optimal solution <= value(x) <= α * Optimal solution, where α > 1
Maximization: α * Optimal solution <= value(x) <= Optimal solution, where α < 1
The closer α is to 1, the better the approximation ratio.
MAX-CUT
Given a graph G = (V,E), a cut (S,T) is a separation of the vertices V into two disjoint sets (S = V - T).
E(S,T) = {e = (u,v) : e ∈ E, u ∈ S, v ∈ T} = the set of edges with one endpoint in S and the other in T.
Size of cut (S,T) = |E(S,T)|.
Problem: given G, find a maximum cut, i.e. make the number of edges between S and T as large as possible.
Decision version: MAX-CUT = {<G,k> : G has a cut of size k}
Claim: MAX-CUT is NP-Complete.
Primality Testing
Given n, there is an algorithm (e.g. trial division) that decides if n is prime in O(sqrt(n)) time -- exponential in the input size O(log n).
An efficient algorithm should have runtime O(log^k(n)) for some constant k.
Does an efficient primality testing algorithm exist? Yes, e.g. Fermat's (randomized) primality test.
Verifying Polynomial Identities
Given two polynomials p and q, determine if they compute the same function.
Choose a at random such that 1 <= a <= 4d, where d is the maximum degree of p and q.
Suppose p(x) ≠ q(x): then Pr[p(a) ≠ q(a)] >= 3/4 when a is chosen at random.
Proof: consider r(x) = p(x) - q(x). By assumption r(x) is not the zero polynomial and has degree at most d, so r(x) has at most d roots.
Pr[p(a) = q(a)] = Pr[r(a) = 0] <= |{roots of r(x)}| / 4d <= d/4d = 1/4
Pr[p(a) ≠ q(a)] = 1 - Pr[p(a) = q(a)] >= 3/4
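A minimal sketch of the randomized identity test for polynomials given as coefficient lists (index i holds the coefficient of x^i); the names and trial count are illustrative choices.

```python
import random

def evaluate(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

def probably_equal(p, q, trials=10):
    d = max(len(p) - 1, len(q) - 1, 1)    # max degree (treat constants as degree 1)
    for _ in range(trials):
        a = random.randint(1, 4 * d)
        if evaluate(p, a) != evaluate(q, a):
            return False                   # definitely not the same polynomial
    return True                            # equal, with error probability <= (1/4)^trials

# probably_equal([0, 0, 1], [0, 0, 1])  -> True  (both are x^2)
# probably_equal([1, 1], [1, 2])        -> False (1 + x and 1 + 2x differ at every a >= 1)
```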
HAMCYCLE
HAMCYCLE = {G : G is a directed graph with a cycle that visits every vertex exactly once}
HAMPATH <=p HAMCYCLE. Proof: given <G, s, t>, define G' = G plus a new vertex u and the edges (t, u) and (u, s). Then <G, s, t> ∈ HAMPATH <--> G' ∈ HAMCYCLE.
HAMCYCLE is NP-Complete.
HAMPATH
HAMPATH = {<G, s, t> : G is a directed graph with a Hamiltonian path from s to t}
A Hamiltonian path is a path in G that starts at s, ends at t, and visits each vertex of G exactly once.
HAMPATH is NP-Complete.
Independent Set
If G = (V, E) is an undirected graph, an independent set of G is a subset of the nodes in which no two nodes share an edge. The decision problem asks whether a graph contains an independent set of a specified size.
INDEPENDENT-SET = {<G,k> | G is an undirected graph that has a k-node independent set}
Independent Set is NP-Complete; Independent Set reduces to Clique (take the complement graph).
Sampling Thm
If n >= ln(2m/delta) / (2*epsilon^2), then the sample size n is large enough to satisfy the "fine print": all m estimates are simultaneously within epsilon of their true values with probability at least 1 - delta. This follows from the combined Chernoff/Hoeffding bound plus a union bound over the m estimates.
Cryptosystem Security (2 types)
Information-theoretic (unconditional): Eve cannot learn the secret even with unbounded computational power.
Computational (conditional): in order to learn the secret, Eve will have to solve a computationally hard problem.
Union Bound
Let B1, B2, ..., Bk be "bad events". We want to bound the probability that any bad event occurs.
Union bound: Pr[B1 or B2 or ... or Bk] <= sum from i = 1 to k of Pr[Bi]
For two events: Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[A ∩ B] <= Pr[A] + Pr[B]
We can use the union bound even if the events are dependent.
Interactive Proof System
Interaction: multiple rounds of information exchange. Randomness: the parties are allowed random number generators.
Observation: let G be a graph and let H be a graph that results from renaming the vertices of G. Then H ~= G.
Idea: if G1 is not isomorphic to G2, then given a vertex renaming H of one of them, Merlin can tell which original graph it came from (is H ~= G1 or is H ~= G2?). Why? If H is a vertex renaming of G1 --> H ~= G1 but not G2; if H is a vertex renaming of G2 --> H ~= G2 but not G1.
Arthur on (G1, G2):
Pick i ∈ {1,2} at random.
Randomly rename the vertices of Gi to get H.
Ask Merlin for c such that H ~= Gc.
If c = i accept, else reject.
G1 not isomorphic to G2 --> Merlin knows the right answer c.
G1 ~= G2 --> Merlin doesn't know the right answer, so he can only guess; Merlin can cheat Arthur with probability <= 0.50.
Idea: repeat the protocol 10 times and accept iff c = i each time --> Merlin can cheat Arthur with probability < 0.001.
Theorem: for L ∈ NP there exists an interactive proof system where Merlin can convince Arthur that x is not in L with probability 0.999 (meaning if x is not in L, there is a high probability Merlin can correctly convince us of this). Examples: Vertex Cover: no vertex cover of size k exists. SAT: the formula is unsatisfiable. Dist: G represents a code with distance exactly d.
Why is this crypto? Eve could try to impersonate Alice by listening in on our interactions. How can we prevent Eve from impersonating Alice without encryption? We can use the protocol for GNI, since Eve does not learn anything from watching it.
Graph Isomorphism is not believed to be an NP-Complete problem; it is "almost" in P.
Baby-step Giant-step Algorithm for computing discrete logarithm (not polynomial time efficient)
Computes the discrete logarithm of an integer h (not divisible by p) with respect to a generator g of Zp:
g^(p-1) = 1 (mod p), and g^i ≠ 1 (mod p) for i = 1, 2, ..., p - 2.
Complexity O(sqrt(p) * log(p)), which is not polynomial time given the input size O(log(p)).
Example: p = 31, g = 3, and we want the discrete log of h = 6.
m = ceil(sqrt(31)) = 6, so we compute the baby steps g^i for i = 0, 1, ..., 5:
g^0 = 1, g^1 = 3, g^2 = 9, g^3 = 27, g^4 = 81 = 19, g^5 = 19*3 mod 31 = 26
Compute g^(-m) = 3^(-6) mod 31. Using the extended Euclidean algorithm, 3^(-6) = 21^6 = 2 (mod 31).
Giant steps: compute h*(g^(-m))^q = 6*2^q:
h*2^0 = 6, h*2^1 = 12, h*2^2 = 24, ..., h*2^4 = 6*2^4 = 3 (mod 31)
The collision between the two lists is at baby step i = 1 and giant step q = 4, so r = 1 and q = 4.
h = g^(q*m + r) = g^(4*6 + 1) = g^25 (mod p), thus 25 is the discrete log of 6.
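A minimal sketch of baby-step giant-step in Python; the function name is illustrative, and pow(g, -m, p) (Python 3.8+) computes the modular inverse power. It uses O(sqrt(p)) time and space, still exponential in the bit length of p.

```python
from math import isqrt

def bsgs(g, h, p):
    """Find x with g^x = h (mod p), if it exists."""
    m = isqrt(p - 1) + 1
    baby = {pow(g, i, p): i for i in range(m)}      # baby steps g^i
    factor = pow(g, -m, p)                          # g^(-m) mod p
    gamma = h % p
    for q in range(m):                              # giant steps h * (g^(-m))^q
        if gamma in baby:
            return q * m + baby[gamma]
        gamma = (gamma * factor) % p
    return None                                     # no solution

# bsgs(3, 6, 31) == 25   (the worked example above)
```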
LONG-PATH
LONG-PATH = {<G,k> : G has a simple path of length k}
LONG-PATH is in NP (a certificate is the path itself).
Reduction from HAMPATH to LONG-PATH: choose k so that a simple path of length k must visit every vertex, making the problem equivalent to solving HAMPATH.
LONG-PATH is NP-Hard (in fact NP-Complete).
Randomized Algorithms and NP: One-Sided Error Algorithms
Let L be a language and suppose there exists a one-sided error polynomial-time algorithm A with no false positives (i.e. it always rejects when x is not in the language). Then L ∈ NP.
There is an efficient verifier V for L: r is the certificate for x and represents a sequence of random choices that causes A to accept x.
"On input <x, r>:
1. Run A on input x, but use the string r to determine the choice whenever a random choice is called for.
2. If A accepts, accept. Otherwise, reject."
If x ∉ L then no sequence of choices r can ever cause A to accept, so V never accepts a string that is not in L.
This shows that if P = NP, then any language with an efficient randomized algorithm with no false positives also has an efficient deterministic algorithm: any such language is in NP and thus would also be in P. That is, we would be able to solve the problem without the help of a random number generator.
If L ∈ P, then there exists a one-sided error algorithm with no false positives and no false negatives, namely any polynomial-time decider for L.
Two Sided Error Randomized Algorithms
Let L be a language. A is a two-sided error algorithm for L if:
x ∈ L --> Pr[A accepts x] >= 3/4
x ∉ L --> Pr[A accepts x] <= 1/4
By repeating the algorithm (and taking a majority vote), you can amplify the probability of getting a correct answer to be arbitrarily close to 1:
Suppose x ∈ L --> Pr[A accepts x] >= c and x ∉ L --> Pr[A accepts x] <= 1 - c, where c is a real number s.t. 1/2 < c < 1. Then for any epsilon > 0 there exists a two-sided error polynomial time algorithm B s.t.
x ∈ L --> Pr[B accepts x] >= 1 - epsilon
x ∉ L --> Pr[B accepts x] <= epsilon
Types of Randomized Algorithms
Let L be a language and let Alg be a randomized algorithm for L.
We say Alg has no false positives if:
x ∈ L --> Pr[Alg accepts] >= 3/4
x ∉ L --> Alg always rejects
We say Alg has no false negatives if:
x ∈ L --> Alg always accepts
x ∉ L --> Pr[Alg rejects] >= 3/4
Claim: by repeating the algorithm several times, the error can be made arbitrarily small.
Deviation from Expectation
Let X = X1 + ... + Xn be the sum of independent indicator random variables with EX[Xi] = Pr[Xi = 1] = p, so EX[(1/n)X] = p. Let epsilon > 0. Then (Chernoff/Hoeffding bounds):
Upper tail: Pr[(1/n)X >= p + epsilon] <= e^(-2*epsilon^2*n)
Lower tail: Pr[(1/n)X <= p - epsilon] <= e^(-2*epsilon^2*n)
Combined: Pr[|(1/n)X - p| >= epsilon] <= 2*e^(-2*epsilon^2*n)
Secret Sharing: All or Nothing
Let Zm be the set of all secrets and let s ∈ Zm be a secret. Given n players, distribute a share si to each player i such that no single player (in fact no proper subset of players) can recover s, but s can be recovered given all shares.
For i = 1 to n - 1: pick si ∈ Zm at random.
sn = s - (s1 + s2 + ... + s(n-1)) (mod m)
Recover: s = s1 + s2 + ... + sn (mod m)
Correctness: s1 + s2 + ... + sn = s (mod m)
Security: any subset of shares of size <= n - 1 has the same distribution for every s.
Claim: for any secret s ∈ Zm, 1 <= i <= n, and a ∈ Zm: Pr[si = a] = 1/m
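A minimal sketch of all-or-nothing (additive) secret sharing over Zm; m and the function names are illustrative.

```python
import random

def share(s, n, m):
    shares = [random.randrange(m) for _ in range(n - 1)]
    shares.append((s - sum(shares)) % m)    # s_n = s - (s_1 + ... + s_{n-1}) mod m
    return shares

def recover(shares, m):
    return sum(shares) % m                  # s = s_1 + ... + s_n mod m

# Example: m = 26, secret 17, 5 players
# recover(share(17, 5, 26), 26) == 17
```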
One Time Pad
Let m = m1 m2 m3 ... mn be a message and let k = k1 k2 k3 ... kn be a secret key chosen uniformly at random.
Encryption: Ek(m) = (m1 + k1 mod 26), (m2 + k2 mod 26), ...
Decryption: Dk(c) = (c1 - k1 mod 26), (c2 - k2 mod 26), ...
Information-theoretic security (unconditional): Eve can't learn anything about the message without knowing k.
Downsides: the key has to be as long as the message, and you can't use the same key twice.
Application: used for online banking in Europe.
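A minimal sketch of a one-time pad over the 26-letter alphabet (uppercase messages only, an illustrative simplification); the key must be uniformly random, as long as the message, and never reused.

```python
import random
import string

A = string.ascii_uppercase

def encrypt(msg, key):
    return "".join(A[(A.index(m) + A.index(k)) % 26] for m, k in zip(msg, key))

def decrypt(ct, key):
    return "".join(A[(A.index(c) - A.index(k)) % 26] for c, k in zip(ct, key))

msg = "HELLOWORLD"
key = "".join(random.choice(A) for _ in msg)   # fresh uniform key per message
assert decrypt(encrypt(msg, key), key) == msg
```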
Randomized Protocol for EQ
Let x, y ∈ I ⊆ {0,1}^n, where Alice has x and Bob has y. Suppose that for all x, y ∈ I, either:
1. x = y, or
2. the Hamming distance dH(x, y) >= n/2
Protocol:
Alice: pick a random 1 <= i <= n and send (b, i) <-- (xi, i) to Bob. (This costs O(log n) bits for the index i and O(1) for the bit xi.)
Bob: if b = yi, output 1; otherwise output 0.
If x ≠ y then dH(x, y) >= n/2, so the randomly chosen position satisfies xi ≠ yi with probability >= 1/2.
We can transform the inputs to get a large distance using error correcting codes. For each n >= 1 there exists a binary (4n, n, 3n)-ECC C (code length 4n, distance 3n):
Alice: pick a random 1 <= i <= 4n and send (b, i) <-- ([C(x)]i, i) to Bob.
Bob: if b = [C(y)]i, output 1; otherwise output 0.
If x = y then [C(x)]i = [C(y)]i for all i --> the protocol always outputs 1.
If x ≠ y then Pr[[C(x)]i ≠ [C(y)]i] = dH(C(x), C(y)) / 4n >= 3n/4n = 3/4.
The protocol has no false negatives and is correct with probability >= 3/4.
Cost: log(4n) + 1 bits << n.
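A minimal sketch simulating one round of the basic protocol under the distance promise (a single function sees both inputs purely to simulate the outcome; in the real protocol Alice and Bob are separate parties).

```python
import random

def eq_protocol_round(x, y):
    """Alice holds bit string x, Bob holds y (equal length).
    Alice sends (i, x[i]); Bob outputs 1 iff the bit matches."""
    i = random.randrange(len(x))        # Alice's random index (log n bits)
    return 1 if x[i] == y[i] else 0     # Bob's answer (1 more bit)

# If x == y the round always outputs 1; under the promise dH(x, y) >= n/2,
# a round on unequal inputs outputs 0 with probability >= 1/2 (amplify by repeating).
```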
Max-3SAT
Max-3SAT: given a 3CNF formula, satisfy as many clauses as possible.
There exists an efficient randomized algorithm that, given a 3CNF formula with m clauses, satisfies 7/8 of the clauses in expectation: assign each variable at random.
For each clause 1 <= i <= m define Xi = 1 if the ith clause is satisfied, 0 otherwise, and let X = X1 + ... + Xm be the number of satisfied clauses. Then EX[Xi] = 7/8 and, by linearity, EX[X] = (7/8)m.
For any RV Z there exists a realization s.t. Z >= EX[Z]. Regarding X as a function of the assignment a, EX[X(a)] = (7/8)m, so there exists an assignment with X(a) >= (7/8)m.
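A minimal sketch of the random-assignment algorithm. The clause encoding is an illustrative choice: each clause is a list of nonzero ints, where literal v stands for variable |v| and a negative sign means negation.

```python
import random

def random_assignment_3sat(clauses, num_vars):
    assign = {v: random.choice([True, False]) for v in range(1, num_vars + 1)}
    satisfied = sum(
        any(assign[abs(lit)] == (lit > 0) for lit in clause)  # clause satisfied?
        for clause in clauses
    )
    return assign, satisfied   # in expectation, satisfied >= (7/8) * len(clauses)

# Example: (x1 v x2 v x3) ^ (~x1 v x2 v ~x3)
# random_assignment_3sat([[1, 2, 3], [-1, 2, -3]], 3)
```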
Traveling Salesman Problem
Minimization problem (so the approximation ratio is > 1).
TSP = {(G, k) : there exists a tour that visits all the vertices of G and has weight at most k}
TSP is NP-Complete.
Search version: given a complete, edge-weighted, undirected graph G (complete means an edge exists between every pair of vertices, so the graph is a clique on |V| vertices), find a Hamiltonian cycle of minimum weight, called the optimal tour.
You can use the 2-OPT algorithm for TSP in the special case where the edge weights satisfy the triangle inequality: w(v1, v2) <= w(v1, v3) + w(v3, v2).
Knapsack Smart Greedy
Run relatively-greedy and dumb-greedy and pick the better of the two solutions. This 1/2-approximates Knapsack: 1/2 * optimal <= our solution <= optimal. Pretty good job.
Local Search Algorithm for Max-cut
Start with S = ∅ and T = V.
1. If moving some u ∈ T to S or some v ∈ S to T increases the size of the cut, make the move.
2. Repeat while such a move is possible.
The algorithm is efficient and 1/2-approximates Max-Cut. It makes at most |E| iterations because each iteration increases the size of the cut by at least 1.
1. Let (S*, T*) be a max cut. Then |E(S*, T*)| <= |E|.
2. At termination, each vertex v has at least deg(v)/2 of its incident edges crossing the cut (otherwise moving v would improve the cut), so |E(S,T)| >= (1/2) * sum over v of deg(v)/2 = |E|/2 >= |E(S*,T*)|/2.
Therefore 1/2 * optimal <= our solution <= optimal.
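A minimal sketch of the local search, assuming the graph is given as n vertices (0..n-1) and a list of undirected edges; the recomputation of the cut size on each tentative move is a simplicity choice, not an efficiency one.

```python
def cut_size(side, edges):
    return sum(1 for u, v in edges if side[u] != side[v])

def local_search_maxcut(n, edges):
    side = [1] * n                           # start with S = {} (side 0), T = V (side 1)
    improved = True
    while improved:                          # at most |E| improving moves in total
        improved = False
        for v in range(n):
            before = cut_size(side, edges)
            side[v] ^= 1                     # try moving v to the other side
            if cut_size(side, edges) > before:
                improved = True              # keep the move
            else:
                side[v] ^= 1                 # undo
    S = [v for v in range(n) if side[v] == 0]
    T = [v for v in range(n) if side[v] == 1]
    return S, T, cut_size(side, edges)

# Example: the 4-cycle 0-1-2-3-0 has max cut 4
# local_search_maxcut(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```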
SAT
SAT = {ϕ | ϕ is a satisfiable boolean formula}
SAT is NP-Complete (by the Cook-Levin Theorem).
NP-Hard
Search problems cannot be "NP-Complete" because they don't return a YES/NO decision. We call a search problem "NP-Hard" if its decision version is NP-Complete. Equivalently, a problem is NP-Hard if an efficient algorithm for it implies P = NP. We can often design an efficient algorithm that "approximates" an NP-Hard search problem.
Proof for NP = P
Show that every language L in NP has an efficient decider (equivalently, show that some NP-Complete problem is in P).
Poly Time Computable Function
A function f is polynomial time computable if there is a TM M that, given string x as input, halts with f(x) on its tape in time polynomial in |x|.
A is polynomial time reducible to B (A <=p B) if a polynomial time computable function f exists such that:
x is in A --> f(x) is in B
x is not in A --> f(x) is not in B
Polynomial time reducible = efficiently mapping reducible.
If A is reducible to B and B is in P, then A is in P.
Knapsack Dumb Greedy
Take the single item with the largest value. Does not work well on its own.
Cover-Twice Algorithm
An approximation algorithm for the minimum vertex cover of G:
Init C = ∅
While G has at least one edge:
- Find an edge e = (u,v) in G
- Remove the edges covered by u and v from G
- C = C ∪ {u,v} (add both endpoints to C)
The algorithm is efficient and 2-approximates Min-VC.
Proof: let C* be a minimum vertex cover of G and let S be the set of edges e the algorithm picks.
1. The edges in S are pairwise disjoint, and any vertex cover must contain at least one endpoint of each --> |C*| >= |S|
2. The cover the algorithm outputs has |C| = 2|S| --> |C| <= 2|C*|
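A minimal sketch of cover-twice, assuming the graph is given as a set of undirected edges (u, v); the function name is illustrative.

```python
def cover_twice(edges):
    remaining = set(edges)
    cover = set()
    while remaining:
        u, v = next(iter(remaining))            # pick any remaining edge
        cover.update((u, v))                    # add both endpoints to the cover
        remaining = {(a, b) for a, b in remaining
                     if a not in (u, v) and b not in (u, v)}  # drop covered edges
    return cover

# Example: the path 0-1-2-3 has OPT = 2 ({1, 2});
# cover_twice({(0, 1), (1, 2), (2, 3)}) returns a cover of size <= 2 * OPT
```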
Heuristics
Two types:
1. Guaranteed to get the right answer, but may take a long time. Goal: "often" works fast. Example: SAT solvers for SAT instances that emerge from software verification.
2. Guaranteed to run fast, but may not get the right answer. Goal: "often" (approximately) correct. Example: greedy algorithms for TSP.
Downside: it is usually hard to prove interesting things about heuristics.
2-OPT for TSP
Build an MST and traverse it like a tour (walk each tree edge twice, then shortcut repeated vertices using the triangle inequality), so the total distance traveled is <= 2 * weight(MST). Since weight(MST) <= weight(optimal tour), we get optimal <= our solution <= 2 * optimal.
Markov's Inequality
What is the probability that a random variable X deviates far from its expectation?
Let X be a non-negative random variable. For any a > 0: Pr[X >= a] <= EX[X]/a
Equivalently: Pr[X >= c*EX[X]] <= 1/c
An Adversary
When B knows the strategy of A in advance and chooses an outcome such that B can always win, B is an adversary.
Indicator
Z is an indicator (or 0-1 random variable) if Pr[Z=1] = p and Pr[Z=0] = 1 - p.
Observation: EX[Z] = Pr[Z=1] = p
Diffie-Hellman
Zp = {0, 1, ..., p-1}. g ∈ Zp is a generator if for every x ≠ 0 in Zp there exists i ∈ N such that g^i = x. For every prime p, Zp has a generator.
Protocol: for a large prime p, let g be a generator of Zp.
Alice picks a ∈ Zp and sends x = g^a to Bob.
Bob picks b ∈ Zp and sends y = g^b to Alice.
Shared key k = g^(ab): Alice computes y^a, Bob computes x^b.
Eve knows: g, p, x = g^a and y = g^b. Eve doesn't know: a or b. Eve wants to learn: k = g^(ab).
Question: how can Eve learn k from g, p, x and y? Answer: Eve should somehow extract a (from g^a) and then compute y^a.
Question: how can Eve compute a given x = g^a, g and p? Answer: try consecutive powers g, g^2, ... until we hit x.
Question: how long will this take?
The Discrete Log Problem: let g be a generator of Zp and let x ≠ 0 be in Zp. Find i such that g^i = x. Similar to the "regular" log, but different: there always exists a solution with 1 <= i <= p - 1.
The runtime of the powering algorithm is O(p), BUT the input size is bitsize(p) + bitsize(g) = O(log p), so this is exponential runtime.
Question: can we do better? Answer: yes, there exists an algorithm with runtime O(sqrt(p) * log(p)) (baby-step giant-step), but no efficient O(log^k(p)) algorithm is known.
Assumption: no efficient algorithm for the discrete log problem exists. This is why Eve cannot efficiently compute a given x = g^a, g and p.
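A minimal sketch of the exchange with toy parameters (p = 31, g = 3, as in the baby-step giant-step example above); real deployments use primes with thousands of bits.

```python
import random

p, g = 31, 3                   # toy public parameters; g generates Z_31^*

a = random.randrange(1, p)     # Alice's secret exponent
b = random.randrange(1, p)     # Bob's secret exponent

x = pow(g, a, p)               # Alice sends x = g^a mod p
y = pow(g, b, p)               # Bob sends y = g^b mod p

k_alice = pow(y, a, p)         # Alice computes y^a = g^(ab)
k_bob = pow(x, b, p)           # Bob computes x^b = g^(ab)
assert k_alice == k_bob        # shared key
```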
Computing the value of pi
pi is the ratio of a circle's circumference to its diameter.
Task: compute pi accurately.
Inscribe a circle in a square, throw n darts at random, count how many darts k hit the circle, and output 4*k/n.
Why should it work? Pr[dart hits circle] = area(circle)/area(square) = pi*r^2 / (4*r^2) = pi/4.
If we throw many darts: k/n ≈ pi/4.
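A minimal Monte Carlo sketch of the dart-throwing estimate: sample points uniformly in the unit square and count hits inside the inscribed quarter circle (same pi/4 ratio as the full picture).

```python
import random

def estimate_pi(n):
    hits = sum(1 for _ in range(n)
               if random.random() ** 2 + random.random() ** 2 <= 1)
    return 4 * hits / n

# estimate_pi(1_000_000) is typically within about 0.01 of pi
```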