CS 6515: Graphs
Kruskal algorithm
1. Sort all edges E in non-decreasing order by weight.
2. Initialize X as an empty set.
3. Create a disjoint-set data structure with each vertex as its own set.
4. For each edge e = (v, w) in the sorted list: if Find(v) != Find(w), add e to X and Union(v, w) in the disjoint-set.
5. Return X.
Runtime: sorting the edges dominates at O(E log E), which is the same as O(E log V) because E <= V^2. The union-find operations that follow cost at most O(E log V) in total, so the overall runtime is O(E log V).
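The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `kruskal`, the 0..n-1 vertex labels, and the `(weight, u, v)` edge tuples are assumptions made for the example.

```python
def kruskal(n, edges):
    """Return (total_weight, mst_edges) for a connected undirected graph.

    Assumptions of this sketch: vertices are labeled 0..n-1 and
    edges is a list of (weight, u, v) tuples.
    """
    parent = list(range(n))  # each vertex starts as its own set
    rank = [0] * n

    def find(x):
        # Walk to the set representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the two sets; return False if they were already merged.
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    mst, total = [], 0
    for w, u, v in sorted(edges):  # non-decreasing order by weight
        if union(u, v):            # endpoints were in different sets
            mst.append((u, v, w))
            total += w
    return total, mst
```

Sorting is the O(E log E) step; the loop afterwards only does union-find operations.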
Minimum Spanning Tree (MST)
A Minimum Spanning Tree (MST) is a subset of the edges of a connected, undirected graph that connects all the vertices together, without any cycles, and with the minimum possible total edge weight.
What is a digraph?
A directed graph: every edge has a direction (u -> v). A digraph may or may not contain cycles.
Max-Flow Input, Graph Requirements, Output, Runtime
A greedy algorithm (Ford-Fulkerson, FF) to find max flow on networks. The algorithm repeatedly sends flow along paths from the source (starting node) to the sink (end node), provided there is available capacity on every edge of the path. It stops when no augmenting path with available capacity remains.
Input: G = (V, E); flow capacities c; source node s; sink node t
Output: Max flow
We have access to: the max flow of G, from which the final residual network can be trivially created.
Example use: We run FF on the flow network to get the maximum flow. We use this to construct the residual graph.
Graphs that can use FF: Directed graphs with edge capacities
Runtime: O(C * m), where C is the value of the maximum flow in the network (assuming integer capacities) and m is the number of edges
Sink in SCC vs Source in SCC
A sink SCC in a directed graph can consist of a single node or multiple nodes, anywhere from 1 to the total number of nodes in the graph. All nodes within the sink SCC are strongly connected, meaning there is a directed path between every pair of nodes within that SCC, but no edge leaves the sink SCC for a node outside it. A source SCC is the mirror image: no edge enters it from a node outside it.
spanning tree
A spanning tree is a subgraph of a given graph that includes all its vertices, is tree-structured (acyclic), and connects all the vertices together.
Let G = (V,E) be a connected, undirected, weighted graph. Which of the following conditions guarantee the MST is unique? All the weights are different numbers. All the weights are positive. There is exactly one edge of minimum weight. For every cut in the graph, the edge of minimum weight in that cut is unique.
All the weights are different numbers. For every cut in the graph, the edge of minimum weight in that cut is unique.
Pre/Post Numbers in Undirected Graphs
Ancestor and descendant relationships, as well as edge types, can be read off directly from pre and post numbers. For example, u is an ancestor of v in the DFS tree exactly when pre(u) < pre(v) < post(v) < post(u).
BFS: Input, Graph Requirements, Output, Runtime
BFS is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph) and explores the neighbor nodes at the present depth before moving on to nodes at the next depth level.
Input: G = (V, E); start vertex v in V
Output: dist[]: for every vertex u reachable from the starting vertex v, dist[u] is the shortest path distance from v to u; if no such path exists, dist[u] is infinity. prev[]: the vertex preceding u in the shortest path from v to reachable vertex u.
Graphs that can use BFS: Unweighted graphs; Undirected/Directed graphs
Runtime: O(m + n)
Bellman Ford - What can it solve?
Bellman-Ford algorithm finds the shortest paths from a single source vertex to all other vertices in a weighted graph, including those with negative weights.
Bellman-Ford Input, Graph Requirements, Output, Runtime
Bellman-Ford (BF) is used to derive the shortest path from s to all vertices in V. It does not find a path between all pairs of vertices in V. To do this, we would have to run BF |V| times. Negative weights are allowed.
Input: G = (V, E); start vertex s
Output: The shortest path from vertex s to all other vertices.
We have access to: negative weight cycle detection. We can compare T[n, *] to T[n - 1, *]. We can only find negative weight cycles that can be reached from starting vertex s.
Graphs that can use BF: Weighted graphs; Undirected/Directed graphs; CAN HAVE negative weights
Runtime: O(mn)
How to Write a Solution - Analyze the runtime
Can explain in words. Can use black box runtime. Must include the runtime of any modifications.
What is CNF
Conjunctive Normal Form
Dijkstras Tips
Considers edge weights.
Needs a source vertex.
O((n + m) log n) runtime.
Can also see if s can reach a vertex v (inf distance if it can't).
Does not work on negative weights.
2-SAT Algorithm
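Construct the implication graph G for f and find its SCCs. If any variable x and its negation not x lie in the same SCC, f is unsatisfiable. Otherwise, repeat until the graph is empty:
1. Take a sink SCC S and set every literal in S to True (if a literal is negated, set its variable to F).
2. Set the complementary SCC not S (which is a source) to False (if a literal is negated, set its variable to T).
3. Remove S and its complement not S.
Runtime: O(n + m)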
Blackbox Algorithms for Graph
DFS (which outputs connected components and a topological sort on a DAG; you also have access to the prev, pre, and post arrays), the Explore subroutine, and BFS. Dijkstra's algorithm to find the shortest distance from a source vertex to all other vertices; a path can be recovered by backtracking over the prev labels. Bellman-Ford and Floyd-Warshall to compute the shortest path when weights are allowed to be negative. SCCs, which outputs the strongly connected components and the metagraph of connected components. Kruskal's and Prim's algorithms to find an MST. Ford-Fulkerson and Edmonds-Karp to find max flow on networks. 2-SAT, which takes a CNF with all clauses of size ≤ 2 and returns a satisfying assignment if it exists.
Explain DFS, Input, Graph Requirements, Output, Runtime
DFS is an algorithm for traversing or searching tree or graph data structures. It starts at the root or an arbitrary node of a graph and explores as far as possible along each branch before backtracking.
Input: G = (V, E)
Output: ccnum[]; a topological sort on a DAG (Directed Acyclic Graph), where it takes O(1) to access the first or last vertex of the topological sorting. More specifically, it outputs: for an undirected graph G, the vertices labeled by connected component number (ccnum); for a directed graph G, a list of connected components.
We have access to: the prev, pre, and post arrays.
Graphs that can use DFS: Unweighted graphs; Undirected/Directed graphs; DAGs
Runtime: O(m + n)
Dijkstra Input, Graph Requirements, Output, Runtime
Dijkstra's algorithm is used to find the shortest distance from a source vertex to all other vertices. A path can be recovered by backtracking over the prev labels.
Input: G = (V, E); start vertex v in V
Output: dist[]: the shortest distance between vertex v and each reachable vertex u, or infinity if u is not reachable.
We have access to: prev[]: the vertex preceding u in the shortest path from v to reachable vertex u.
Graphs that can use Dijkstra's: Weighted graphs; Undirected/Directed graphs; NO negative weights
Runtime: O((m + n) log n), or O(m log n) if the graph is strongly connected
Which of the following graphs can be topologically sorted? Choose ALL that apply. Directed Acyclic Graphs Undirected Acyclic Graphs Directed Cyclic Graphs The metagraph of a Directed Cyclic Graph
Directed Acyclic Graphs The metagraph of a Directed Cyclic Graph
DFS on Directed Graph
In a directed graph, we don't use connected component numbers. Instead, we keep a counter, "clock", which advances every time we first visit a vertex and again every time we finish exploring a vertex. These counts are saved in the "pre" and "post" arrays for each vertex.
Please select all statements which are always true. Every directed acyclic graph must have at most one sink. Every directed acyclic graph must have at least one source. If we run DFS on a directed graph G (not necessarily acyclic), then the vertex with the highest post label is always in a source component of the metagraph of G. If we run DFS on a directed graph G (not necessarily acyclic), then the vertex with the highest post label is always in a sink component of the metagraph of G. If we run DFS on a directed graph G (not necessarily acyclic) starting at node s, which is in a source component of G, then s will have the highest post label. If we run DFS on a directed graph G (not necessarily acyclic) starting at node s, which is in a sink component of G, then s will have the highest post label.
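Every directed acyclic graph must have at least one source. If we run DFS on a directed graph G (not necessarily acyclic), then the vertex with the highest post label is always in a source component of the metagraph of G.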
How to Write a Solution - Prove Correctness
Explain how your problem is solved by the black boxes used. Explain how changing the inputs and/or outputs gets what you want.
Explain Explore, Input, Graph Requirements, Output, Runtime
Explore is used by DFS as a sub-routine. Very simply, it outlines how to get from one vertex to another along a path in a graph or a tree. A key point of Explore is that we move to the next available neighbor only after we have fully explored the current neighbor and everything reachable from it. It is the best algorithm for checking if start vertex v can reach certain vertices in G.
Input: G = (V, E); start vertex v in V
Output: visited[], which is set to TRUE for all vertices reachable from v.
We have access to: the ccnum array (ccnum[]); the visited array (visited[]); the prev array (prev[]), which holds the vertex visited before a given vertex. These are used by Explore and required for DFS.
Graphs that can use Explore: Unweighted graphs; Undirected/Directed graphs; DAGs
Runtime: O(m + n)
Floyd Warshall Input, Graph Requirements, Output, Runtime
FW is primarily used to find the shortest path from ALL nodes to all other nodes where negative weights are allowed.
Input: G = (V, E)
Output: The shortest path from all vertices to all other vertices
We have access to: negative weight cycle detection, by checking the diagonal entries (T[n, i, i] < 0 for some i).
Graphs that can use FW: Weighted graphs; Undirected/Directed graphs; CAN HAVE negative weights
Runtime: O(n^3)
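A minimal Python sketch of Floyd-Warshall with the diagonal check described above. The function name `floyd_warshall`, the 0..n-1 vertex labels, and the `(u, v, weight)` edge tuples are assumptions for the example; it keeps a single 2D table rather than the full T[k, i, j] table.

```python
def floyd_warshall(n, edges):
    """Return (dist, has_negative_cycle) for a directed weighted graph.

    Assumptions of this sketch: vertices are 0..n-1 and edges is a
    list of (u, v, weight) tuples.
    """
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the lightest parallel edge

    # Allow vertex k as an intermediate stop, one k at a time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    # A negative diagonal entry means some vertex lies on a negative cycle.
    has_negative_cycle = any(dist[i][i] < 0 for i in range(n))
    return dist, has_negative_cycle
```

The three nested loops over n vertices give the O(n^3) runtime.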
How to Write a Solution - Graphs
Figure out your Black Box. State the steps of your algorithm. Prove correctness. Analyze the runtime.
Kruskals / Prim
Finds an MST for an undirected, connected graph in O(m log n).
Kruskal's: Sort edges by weight. Grab the lightest available edge that will not create a cycle when added to the MST. Another way to look at this is to never add an edge if both endpoints are already merged.
Prim's: Start with an arbitrary vertex v and put it into a subtree S of included vertices. In each iteration, grow S by adding the lightest edge between a vertex in S and a vertex outside of S.
BFS Tips
Finds the shortest path in a graph: it tells us how many edges each vertex is from s (it doesn't consider edge weights).
Needs a source vertex.
O(n + m)
Can also see if s can reach a vertex v (inf distance if it can't).
Floyd Warshall - What can it solve?
Floyd-Warshall algorithm computes the shortest paths between all pairs of vertices in a weighted graph.
BFS of Graph (Graph, Start Vertex)
For all vertices v in V: dist(v) = min # edges from s to v (infinity if no path)
Find Source/Sink
Sinks: for all vertices, if a vertex has no outgoing edges, then it is a sink. You can find all sinks this way in O(n).
Sources: reverse the graph; for all vertices in the reversed graph, if a vertex has no outgoing edges, then it is a sink of the reversed graph. All sinks in the reversed graph are sources in the original. O(n + m)
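As a rough Python sketch of the same idea: instead of building the reversed graph explicitly, this version records which vertices have an incoming edge, which gives the same sources in O(n + m). The function name `sources_and_sinks` and the dict-of-lists adjacency format (every vertex present as a key) are assumptions for the example.

```python
def sources_and_sinks(adj):
    """Return (sources, sinks) of a directed graph.

    Assumption of this sketch: adj maps every vertex to the list of
    vertices it points to.
    """
    has_incoming = set()
    for neighbors in adj.values():
        has_incoming.update(neighbors)  # every edge target has an incoming edge

    sinks = [v for v in adj if not adj[v]]               # no outgoing edges
    sources = [v for v in adj if v not in has_incoming]  # no incoming edges
    return sources, sinks
```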
Reverse a Graph Algorithm
Function ReverseGraph(G):
1. Let G_R be an empty graph (this will store the reversed graph).
2. For each vertex v in G: add v to G_R with no outgoing edges.
3. For each vertex v in G: for each edge (v -> u) from v: add an edge (u -> v) to G_R.
4. Return G_R.
Runtime: O(V + E)
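The same procedure as a short Python sketch, assuming the graph is stored as a dict mapping each vertex to a list of its out-neighbors (the name `reverse_graph` and this representation are assumptions for the example).

```python
def reverse_graph(adj):
    """Return a new adjacency dict with every edge direction flipped."""
    # Step 2: every vertex appears in the reversed graph with no edges yet.
    rev = {v: [] for v in adj}
    # Step 3: each edge v -> u becomes u -> v.
    for v, neighbors in adj.items():
        for u in neighbors:
            rev.setdefault(u, []).append(v)
    return rev
```

Each vertex and each edge is touched once, matching the O(V + E) runtime.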
Strategies: SCC
Get the SCCs, and use them to get more information about your graph G.
Get the SCCs, and use them to determine pathing, connectivity, cycles, etc.
Strategies: DAG
Get a DAG, sort it, go down the ordering and look for more information.
Kosaraju-Sharir algorithm
Here's a step-by-step breakdown:
1. Reverse the Graph: Take the graph G and produce the reversed graph G^R by reversing the direction of all the edges.
2. Run DFS on the Reversed Graph G^R: This DFS traversal will provide you with post-order (or "finish times") for each vertex.
3. Order Vertices by Decreasing Post-Order: After the DFS on G^R completes, order all vertices by decreasing post-order (or finish time) number. This means the vertex that finished last will be first in this ordering.
4. Run DFS on the Original Graph G: Use the order produced in step 3. When you run DFS on G considering vertices in this specific order, each DFS tree you produce will correspond to a strongly connected component.
Its runtime is O(V + E).
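The four steps can be sketched in Python roughly as follows. The function name, the dict-of-lists graph format (every vertex present as a key), and the use of recursion (fine for small graphs only) are assumptions for this illustration.

```python
def strongly_connected_components(adj):
    """Return the SCCs of a directed graph as a list of vertex lists."""
    # Step 1: build the reversed graph.
    rev = {v: [] for v in adj}
    for v in adj:
        for u in adj[v]:
            rev[u].append(v)

    # Step 2: DFS on the reversed graph, recording vertices as they finish.
    visited = set()
    finish_order = []

    def explore_rev(v):
        visited.add(v)
        for u in rev[v]:
            if u not in visited:
                explore_rev(u)
        finish_order.append(v)  # appended in increasing post order

    for v in adj:
        if v not in visited:
            explore_rev(v)

    # Steps 3 and 4: DFS on the original graph in decreasing post order.
    visited.clear()
    components = []

    def explore(v, comp):
        visited.add(v)
        comp.append(v)
        for u in adj[v]:
            if u not in visited:
                explore(u, comp)

    for v in reversed(finish_order):
        if v not in visited:
            comp = []
            explore(v, comp)  # one DFS tree is one SCC
            components.append(comp)
    return components
```

With this ordering the components come out sink SCC first.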
Explore Algorithm
Input: G = (V, E) is a graph; v ∈ V
Output: visited(u) is set to true for all nodes u reachable from v
explore(G, v):
    visited(v) = true
    previsit(v)
    for each edge (v, u) ∈ E:
        if not visited(u): explore(u)
    postvisit(v)
Tip: The previsit and postvisit procedures are optional.
Runtime: O(E)
Topological Sort Requirements
Only works on DAGs: the first vertex (SCC) in the ordering is a source and the last is a sink.
Output - Prims vs Kruskal
Kruskal's: a list of edges that belong to the MST.
Prim's: a prev array that tells you which vertex each vertex is connected to in the MST.
Kruskal Algo Use Case
Kruskal's algorithm is used to find the minimum spanning tree of a connected, undirected graph, optimizing pathways or connections with the least possible total edge weight.
Kruskal Input, Graph Requirements, Output, Runtime
Kruskal's is one of the two algorithms used to find the Minimum Spanning Tree (MST) discussed in class.
Input: Connected, Undirected Graph G = (V, E) with edge weights w_e
Output: An MST defined by the edges X
Graphs that can use Kruskal's: Connected; Undirected; Weighted
Runtime: O(m log n)
2SAT algorithm
Lets you solve Boolean satisfiability problems in which every clause has at most two literals (2-CNF).
SCC algorithm Requirements
Only works on Directed graphs
How to Write a Solution - Algorithm Steps
NO PSEUDOCODE - use words.
What changes you may need for the input.
What black box you will feed your input to.
What changes you may need for the output.
Repeat with more black boxes as needed.
Does v with lowest post # always lie in a sink SCC?
No.
Checking, reading, or removing one vertex runtime
O(1) We can simply index into the array to find the vertex we want.
Iterating, checking, reading, removing, or otherwise working on all vertices runtime
O(n)
Checking, reading, or removing one edge runtime
O(n) or O(m) One is usually more correct than the other depending on what you are doing, but for the sake of simplicity, we will accept either when a single edge is involved. You must first index into the array for the start, then traverse the edge list to find the end.
Traversing, reversing, copying, subgraphing, or otherwise working with a full graph runtime
O(n+m)
Iterating, checking, reading, removing, or otherwise working on all edges (or subset) runtime
O(n+m) If you are working on all edges, you will need to check every vertex for its edges. So you must pay the cost of this check whether a vertex has edges or not.
Prims Input, Graph Requirements, Output, Runtime
Prim's algorithm is the second and final algorithm used to find MSTs as discussed in class.
Input: Connected, Undirected Graph G = (V, E) with edge weights w_e
Output: An MST defined by the prev[] array
Graphs that can use Prim's: Connected; Undirected; Weighted
Runtime: O(m log n)
Prims Algo Use Case
Prim's algorithm is used to find the minimum spanning tree of a connected, undirected graph, starting from an arbitrary vertex and greedily growing the tree by adding the nearest vertex with the smallest edge weight.
Prim's Algorithm
Prim's algorithm starts from an arbitrary vertex and repeatedly selects the edge with the smallest weight connecting a vertex in the tree to a vertex outside it, adding the edge and its outside endpoint to the tree. The process continues until all vertices are included in the tree. The runtime is O(V^2) with a basic implementation, O(E log V) with a binary-heap priority queue, and O(E + V log V) with a Fibonacci heap.
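A minimal Python sketch of the binary-heap variant, using the standard `heapq` module. The function name `prim`, the adjacency format (each vertex maps to a list of `(neighbor, weight)` pairs, with both directions listed for an undirected edge), and the returned `prev` dict are assumptions for the example.

```python
import heapq

def prim(adj, start):
    """Return (total_weight, prev) for a connected undirected graph.

    Assumption of this sketch: adj[v] is a list of (neighbor, weight)
    pairs and each undirected edge appears in both endpoint lists.
    """
    in_tree = {start}
    prev = {start: None}  # prev[v] is the tree vertex v was attached to
    total = 0
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)

    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)  # lightest edge leaving the tree so far
        if v in in_tree:
            continue                   # stale entry: both endpoints are inside
        in_tree.add(v)
        prev[v] = u
        total += w
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return total, prev
```

Each edge is pushed at most twice, so the heap work is O(E log E) = O(E log V).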
Floyd Warshall Input/Output
Provide the graph with its edges and weights; the algorithm returns the shortest path lengths between every pair of vertices. Floyd-Warshall takes O(V^3), and a follow-up minimum over its output can be found in O(V), leading to an O(V^3) runtime overall.
Bellman Ford Input / Output
Provide the graph, edges, weights, and source vertex; the algorithm returns shortest path lengths to all vertices or detects a negative cycle.
Strategies - Kruskals/ Prims
Remove an edge or certain edges and run Kruskal's or Prim's to determine what happens without the edge(s).
Make changes to the weights to keep some edges from being picked first.
Make changes to the weights to force some edges into being picked first.
Be very careful when changing weights; make sure you think it through. There is a difference between being picked last and not being picked at all.
Run either on a subset of G and then build on the partial MST.
Use the known MST properties (cut, cycle, n-1, etc.) to justify your correctness.
Strategies - BFS/ Dijkstras
Remove an edge or certain edges, and run either to determine what happens without the edges.
Do graph reversals (directed graphs) and then run either to determine pathing to a vertex from another vertex.
Never run Dijkstra's more than twice or it will increase the runtime.
Tip: Running Dijkstra's from every vertex is a bad idea.
Strategies: Traversals
Remove an edge, or certain edges, and run Explore or DFS to determine what happens without the edges.
Do graph reversals (directed graph) and then run Explore or DFS to determine pathing to a vertex instead of from a vertex.
Tip: Running Explore from every vertex is probably a bad idea.
Detect a cycle in a graph with e
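Remove e = (u, v) from G. Run Explore from u and check whether v is visited. If it is, output that there is a cycle containing e. Otherwise, output that there is no cycle containing e. (This is the undirected case; for a directed edge u -> v, run Explore from v and check whether u is visited.)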
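A small Python sketch of the undirected case. Rather than copying the graph without e, it skips e during the traversal, which has the same effect. The function name `edge_on_cycle`, the dict-of-lists adjacency format, and the assumption of a simple graph (no parallel edges) are all choices made for this example.

```python
def edge_on_cycle(adj, u, v):
    """Return True if the undirected edge {u, v} lies on some cycle.

    Assumptions of this sketch: adj maps each vertex to its neighbors,
    each undirected edge is listed in both directions, and there are
    no parallel edges.
    """
    visited = {u}
    stack = [u]
    while stack:  # iterative Explore from u
        x = stack.pop()
        for y in adj[x]:
            if (x == u and y == v) or (x == v and y == u):
                continue  # pretend edge e has been removed
            if y not in visited:
                visited.add(y)
                stack.append(y)
    # v is still reachable without e exactly when e closes a cycle.
    return v in visited
```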
What you can do for DFS to find all reachable vertices from s
Run DFS on the (undirected) graph, then for v ∈ V, find all v with the same ccnum as s. These are the vertices reachable from s.
Dag Source Vertex/Sink Vertex in DAG
Source vertex = no incoming edges; the vertex with the highest post order number is always a source.
Sink vertex = no outgoing edges; the vertex with the lowest post order number is always a sink.
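One way to see this is to topologically sort a DAG by decreasing post number: the first vertex in the result is a source and the last is a sink. A minimal Python sketch, assuming a dict-of-lists DAG with every vertex as a key (the function name `topological_sort` is an assumption, and recursion limits it to small graphs):

```python
def topological_sort(adj):
    """Return the vertices of a DAG ordered by decreasing post number."""
    visited = set()
    order = []

    def explore(v):
        visited.add(v)
        for u in adj[v]:
            if u not in visited:
                explore(u)
        order.append(v)  # v finishes here, so this is increasing post order

    for v in adj:
        if v not in visited:
            explore(v)

    order.reverse()  # highest post number first
    return order
```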
What is SCC?
Strongly Connected Components. A component is strongly connected if, for every pair of vertices u and v in it, there is a path from u to v and a path from v back to u. A single vertex also forms an SCC on its own when no other vertex is mutually reachable with it (for example, a sink vertex).
CNF Input, Graph Requirements, Output, Runtime
The 2-SAT problem is to determine whether there exists an assignment to the variables of a given Boolean formula in 2-CNF (conjunctive normal form) such that the formula evaluates to true. The algorithm for solving 2-SAT uses graph theory: it constructs an implication graph and checks, via its SCCs, whether any variable lies in the same SCC as its negation. If none does, the formula is satisfiable.
Input: A Boolean formula in 2-CNF, represented as a set of clauses where each clause is a disjunction of at most two literals.
Output: A Boolean value indicating whether the given 2-CNF formula is satisfiable. If it is satisfiable, the algorithm may also provide a satisfying assignment of variables.
Graphs that can use 2-SAT: Directed graphs. The implication graph is inherently directed since each implication (¬x → y) has a direction.
Runtime: O(m + n), where m is the number of clauses in the 2-CNF formula and n is the number of variables. This runtime stems from the linear runtime of SCC-finding algorithms and the construction of the implication graph.
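A compact Python sketch of this approach, under stated assumptions: variables are numbered 1..num_vars, a literal is the integer i for x_i or -i for its negation, and each clause is a pair of literals (a one-literal clause can be written as (a, a)). The function name `solve_2sat` is hypothetical. Instead of repeatedly removing sink SCCs, it numbers the SCCs sink-first and sets a literal to true when its SCC is closer to the sink than its negation's SCC, which is a common equivalent shortcut.

```python
def solve_2sat(num_vars, clauses):
    """Return {var: bool} satisfying all clauses, or None if unsatisfiable."""
    literals = [l for i in range(1, num_vars + 1) for l in (i, -i)]
    adj = {l: [] for l in literals}
    rev = {l: [] for l in literals}
    for a, b in clauses:
        # (a or b) gives the implications (not a -> b) and (not b -> a).
        for src, dst in ((-a, b), (-b, a)):
            adj[src].append(dst)
            rev[dst].append(src)

    # DFS on the reversed graph to get a finishing order.
    seen, finish = set(), []

    def explore_rev(v):
        seen.add(v)
        for u in rev[v]:
            if u not in seen:
                explore_rev(u)
        finish.append(v)

    for l in literals:
        if l not in seen:
            explore_rev(l)

    # DFS on the original graph in decreasing finish order:
    # components are numbered starting from a sink SCC.
    comp = {}

    def assign(v, c):
        comp[v] = c
        for u in adj[v]:
            if u not in comp:
                assign(u, c)

    count = 0
    for l in reversed(finish):
        if l not in comp:
            assign(l, count)
            count += 1

    result = {}
    for i in range(1, num_vars + 1):
        if comp[i] == comp[-i]:
            return None  # x_i and not x_i are in the same SCC
        result[i] = comp[i] < comp[-i]  # the literal nearer the sink is true
    return result
```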
Edmonds Karp Input, Graph Requirements, Output, Runtime
The Edmonds-Karp (EK) algorithm is used to determine the maximum flow in a network. It is the Ford-Fulkerson method with one distinct difference: each augmenting path must be a shortest path (fewest edges) with available capacity, found with BFS on the residual graph, treating every edge as weight 1.
Input: G = (V, E); flow capacities c; source node s; sink node t
Output: Max flow
We have access to: the max flow of G, from which the final residual network can be trivially created.
Example use: We run EK on the flow network to get the maximum flow. We use this to construct the residual graph.
Graphs that can use EK: Directed graphs with edge capacities
Runtime: O(nm^2), where n is the number of vertices and m is the number of edges
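A minimal Python sketch of the BFS-based augmenting loop. The function name `edmonds_karp` and the nested-dict capacity format (`capacity[u][v]` is the capacity of edge u -> v) are assumptions for the example; it returns both the flow value and the final residual capacities.

```python
from collections import deque

def edmonds_karp(capacity, s, t):
    """Return (max_flow, residual) for a directed flow network."""
    # Build residual capacities, adding a 0-capacity reverse edge per edge.
    residual = {s: {}, t: {}}
    for u in capacity:
        for v, c in capacity[u].items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    max_flow = 0
    while True:
        # BFS finds an augmenting path with the fewest edges.
        prev = {s: None}
        queue = deque([s])
        while queue and t not in prev:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in prev:
                    prev[v] = u
                    queue.append(v)
        if t not in prev:
            return max_flow, residual  # no augmenting path remains

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = t
        while prev[v] is not None:
            bottleneck = min(bottleneck, residual[prev[v]][v])
            v = prev[v]

        # Push flow: reduce forward capacity, increase reverse capacity.
        v = t
        while prev[v] is not None:
            u = prev[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        max_flow += bottleneck
```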
SCC Input, Graph Requirements, Output, Runtime
The SCC algorithm is used to determine the strongly connected components as well as the meta-graph of connected components in a given directed graph.
Input: G = (V, E)
Output: the meta-graph (a DAG) that contains the connected components; a topological sorting of the meta-graph, with a source SCC first and a sink SCC last.
We have access to: ccnum[], the strongly connected component number of each vertex, produced by the 2nd DFS run.
Graphs that can use SCC: Directed graphs
Runtime: O(m + n)
Topological Ordering?
A topological ordering is a list where every vertex appears after all vertices it depends on. For the image attached, these would be the topological orderings: XYZWU, XYZUW, XYUZW
Undirected Graph Edges
Tree edges are actually part of the DFS forest. Forward edges lead from a node to a nonchild descendant in the DFS tree. Back edges lead to an ancestor in the DFS tree. Cross edges lead to neither descendant nor ancestor; they therefore lead to a node that has already been completely explored (that is, already postvisited). Note that forward and cross edges arise only in directed graphs; DFS on an undirected graph produces only tree edges and back edges.
DFS on Undirected Graph
In an undirected graph, cc and ccnum(Z) relate to determining the connected components of the graph. "ccnum" is an array returned by DFS which tells us the connected component number of every vertex. "cc" is a counter used within DFS to track which connected component number the algorithm is on and should currently assign to the vertices. The "cc" counter only moves up by one if there are more vertices left to visit after fully exploring from a particular vertex.
Blackbox to find path from S to T
Use Explore
Dijkstra's algorithm (BFS-Based)
Uses a priority queue.
Input: G = (V, E); source vertex; length of edge > 0 for every edge
Output: for all vertices v, dist(v) = length of the shortest path from the source to v
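A minimal Python sketch using the standard `heapq` module as the priority queue. The function name `dijkstra` and the adjacency format (each vertex maps to a list of `(neighbor, length)` pairs, every vertex present as a key) are assumptions for the example. Since `heapq` has no decrease-key operation, this version pushes duplicate entries and skips the stale ones.

```python
import heapq

def dijkstra(adj, s):
    """Return (dist, prev) for non-negative edge lengths."""
    dist = {v: float("inf") for v in adj}
    prev = {v: None for v in adj}
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter distance
        for v, length in adj[u]:
            nd = d + length
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = u  # remember the predecessor to recover the path
                heapq.heappush(heap, (nd, v))
    return dist, prev
```

Unreachable vertices keep a distance of infinity, which is how reachability from the source can be read off the output.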
Graphs (Floyd Warshall) - Requirements/ What can it not solve?
What are the requirements? The graph can be directed or undirected, and the algorithm requires knowledge of all its edges and their weights. What can it not solve? It's not efficient for single-source shortest paths in sparse graphs, there are faster algorithms for graphs without negative weights, and shortest paths are not well defined when the graph has a negative cycle (the algorithm can only detect it).
Graphs (Bellman-Ford) - Requirements/ What can it not solve?
What are the requirements? A weighted graph and a source vertex; the algorithm requires knowledge of all edges and their weights. A vertex must be reachable from the source to receive a finite distance. What can it not solve? It is not the efficient choice for shortest path problems on graphs with non-negative edge weights (Dijkstra's is faster), and it cannot produce correct shortest paths for graphs with negative cycles reachable from the source (it can only detect them).
How to find a sink and a source?
While running the Explore algorithm, pass in a source array and a sink array. The sink array is empty when passed in: while looping, if a vertex has no outgoing edges, add it to the sink array. The source array starts with every vertex: as each Explore call runs, remove a vertex from it whenever that vertex is the endpoint of an edge from another vertex.
Does v with the highest post # always lie in a source scc?
Yes
How to find SCC sink and SCC source?
Reverse the graph. Run DFS on the reversed graph. Order the vertices by post clock number. Run DFS on the original graph, using the ordering by post clock number from the reversed graph. While running the Explore algorithm, pass in a source array and a sink array. The sink array is empty when passed in: while looping, if there are no outgoing edges, add the vertex to the sink array. The source array starts with every vertex: as each Explore call runs, remove a vertex from it whenever that vertex is the endpoint of an edge from another vertex.
DFS (Graph) Algorithm
dfs(G):
    for all v ∈ V:
        visited(v) = false
    for all v ∈ V:
        if not visited(v): explore(v)
Runtime: O(|V| + |E|)
SCC Algorithm
Input: directed graph G = (V, E) in adjacency list form
1. Construct the reverse graph.
2. Run DFS on the reverse graph.
3. Order V (the vertices) by post # in decreasing order.
4. Run the undirected connected components algorithm on G, processing the vertices in that order.
Previsit(Vertex)
previsit(v):
    ccnum[v] = cc
Background info: used to check if a graph is connected and, more generally, to assign each node v an integer ccnum[v] identifying the connected component to which it belongs. cc needs to be initialized to zero and incremented each time the DFS procedure calls explore.
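A short Python sketch of how this previsit fits into DFS on an undirected graph. The function name `connected_components` and the dict-of-lists adjacency format are assumptions for the example; the previsit step is inlined as the first line of `explore`.

```python
def connected_components(adj):
    """Return ccnum: vertex -> connected component number (starting at 1)."""
    ccnum = {}
    cc = 0  # initialized to zero, as described above

    def explore(v):
        ccnum[v] = cc  # previsit(v)
        for u in adj[v]:
            if u not in ccnum:
                explore(u)

    for v in adj:
        if v not in ccnum:
            cc += 1  # incremented each time DFS calls explore
            explore(v)
    return ccnum
```

The graph is connected exactly when every vertex ends up with component number 1.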
Time Orderings: with Previsit(vertex) and Postvisit(vertex)
previsit(v):
    pre[v] = clock
    clock = clock + 1
postvisit(v):
    post[v] = clock
    clock = clock + 1
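A minimal Python sketch showing these two procedures inside DFS. The function name `dfs_with_clock`, the dict-of-lists adjacency format, and starting the clock at 1 are assumptions for the example.

```python
def dfs_with_clock(adj):
    """Return (pre, post) numbers for every vertex of the graph."""
    pre, post = {}, {}
    clock = 1  # assumed starting value for this sketch

    def explore(v):
        nonlocal clock
        pre[v] = clock   # previsit(v)
        clock += 1
        for u in adj[v]:
            if u not in pre:  # pre doubles as the visited set
                explore(u)
        post[v] = clock  # postvisit(v)
        clock += 1

    for v in adj:
        if v not in pre:
            explore(v)
    return pre, post
```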
Bellman-Ford Algorithm
procedure shortest-paths(G, l, s)
Input: Directed graph G = (V, E); edge lengths {l_e : e ∈ E} with no negative cycles; vertex s ∈ V
Output: For all vertices u reachable from s, dist(u) is set to the distance from s to u.
for all u ∈ V:
    dist(u) = ∞
    prev(u) = nil
dist(s) = 0
repeat |V| − 1 times:
    for all e ∈ E:
        update(e)
The runtime of the Bellman-Ford algorithm is O(V × E), where V is the number of vertices and E is the number of edges in the graph. This is because the algorithm relaxes all the edges V − 1 times in the worst case.
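A minimal Python sketch of the procedure, with the update step written out as the usual relaxation dist(v) = min(dist(v), dist(u) + l(u, v)). The function name `bellman_ford`, the `(u, v, length)` edge tuples, and the extra pass for detecting a negative cycle reachable from s are assumptions added for the example.

```python
def bellman_ford(vertices, edges, s):
    """Return (dist, prev, has_negative_cycle).

    Assumption of this sketch: edges is a list of (u, v, length) tuples
    for directed edges u -> v.
    """
    dist = {v: float("inf") for v in vertices}
    prev = {v: None for v in vertices}
    dist[s] = 0

    for _ in range(len(vertices) - 1):  # repeat |V| - 1 times
        for u, v, length in edges:      # update(e) for every edge
            if dist[u] + length < dist[v]:
                dist[v] = dist[u] + length
                prev[v] = u

    # If any edge can still be relaxed, a negative cycle is reachable from s.
    has_negative_cycle = any(dist[u] + length < dist[v] for u, v, length in edges)
    return dist, prev, has_negative_cycle
```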
BFS with Queue
procedure bfs(G, s)
Input: Graph G = (V, E), directed or undirected; vertex s ∈ V
Output: For all vertices u reachable from s, dist(u) is set to the distance from s to u.
for all u ∈ V:
    dist(u) = ∞
dist(s) = 0
Q = [s] (queue containing just s)
while Q is not empty:
    u = eject(Q)
    for all edges (u, v) ∈ E:
        if dist(v) = ∞:
            inject(Q, v)
            dist(v) = dist(u) + 1
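The same procedure as a runnable Python sketch, using `collections.deque` as the queue. The function name `bfs`, the dict-of-lists adjacency format, and the extra `prev` dict for path recovery are assumptions for the example.

```python
from collections import deque

def bfs(adj, s):
    """Return (dist, prev), where dist counts edges on a shortest path from s."""
    dist = {v: float("inf") for v in adj}
    prev = {v: None for v in adj}
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()               # eject(Q)
        for v in adj[u]:
            if dist[v] == float("inf"):   # v has not been discovered yet
                queue.append(v)           # inject(Q, v)
                dist[v] = dist[u] + 1
                prev[v] = u
    return dist, prev
```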
Kruskal Algo(From Book)
procedure kruskal(G, w)
Input: A connected undirected graph G = (V, E) with edge weights w_e
Output: A minimum spanning tree defined by the edges X
for all u ∈ V:
    makeset(u)
X = {}
Sort the edges E by weight
for all edges {u, v} ∈ E, in increasing order of weight:
    if find(u) ≠ find(v):
        add edge {u, v} to X
        union(u, v)
What are the SCC, in this graph? Graph is in the definition.
{A},{B,E},{C,F,G},{D},{H,I,J,K,L}