Graphing Algorithms Final Exam
graph G { a -- b; b -- c; b -- d; } What does this graph look like?
          c
        /
a --- b
        \
          d
Steps for maximum bipartite matching using max-flow
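1. Add a source and a sink; add an edge from the source to each node on one side and from each node on the other side to the sink
2. Give every edge a capacity of 1
3. Add residual edges
4. Find augmenting paths until none remain; the saturated edges between the two sides form the matching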
Louvain Steps
1. Assign each node to its own cluster
2. For each node, calculate the change in modularity from moving it into each neighboring community
3. If the best change is positive, move the node into that community
4. Repeat until a full pass over all nodes in the graph makes no change to the clustering
5. Create a new graph in which all nodes of each community become a single node, keeping multi-edges (a multigraph)
6. Repeat from step 2 on the new graph until there is no change
KNN Search Steps
1. If the heap already holds k points (k is the number of neighbors being looked for) and no point in this node can be closer to the query than Q.peek, return
2. Check whether the node is a leaf
3. If it is, loop through its data and add every point that is closer than Q.peek, popping the farthest point whenever the heap grows past k
4. If it is not a leaf, call KNN_search on both the left and right child
Lloyd's Algorithm Steps
1. Choose k cluster centers randomly
2. Assign each data point to the cluster center it is closest to
3. Update each cluster center to the mean location of the data points assigned to it
4. Repeat steps 2 and 3 until the assignments stop changing
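The steps above can be sketched in plain Python. This is a minimal illustration, not a tuned implementation: the function name `lloyd`, the `iterations` cap, and the `seed` parameter are assumptions made for the example, and points are assumed to be equal-length tuples of numbers.

```python
import random

def lloyd(points, k, iterations=100, seed=0):
    """Sketch of Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # step 1: random initial centers
    assignment = None
    for _ in range(iterations):
        # step 2: assign each point to its nearest center (squared distance)
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centers[c])))
            for pt in points
        ]
        if new_assignment == assignment:         # nothing moved: converged
            break
        assignment = new_assignment
        # step 3: move each center to the mean of its assigned points
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:                          # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment
```

The result depends on the random initial centers, which is why the algorithm is usually restarted several times.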
How to find minimum cut using max flow?
1. Find the max flow
2. Traverse breadth-first from s without crossing saturated edges
3. The cut is every edge that leads from a reached node to an unreached node
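A sketch of this procedure, assuming the graph is given as a nested dict `{u: {v: capacity}}` (a representation chosen for this example). It computes the max flow with BFS augmenting paths, then reuses the final BFS, which could no longer reach t, to read off the cut.

```python
from collections import deque

def min_cut(capacity, s, t):
    """capacity: {u: {v: cap}}. Returns (max flow value, sorted list of cut edges)."""
    # residual[u][v] is the remaining capacity; reverse edges start at 0
    residual = {s: {}, t: {}}
    for u, nbrs in capacity.items():
        for v, cap in nbrs.items():
            residual.setdefault(u, {}).setdefault(v, 0)
            residual.setdefault(v, {}).setdefault(u, 0)
            residual[u][v] += cap

    def bfs():
        # parents of every node reachable from s along unsaturated edges
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs()
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:             # walk back from t to s
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

    reachable = set(parent)                      # nodes still reachable from s
    cut = sorted((u, v) for u, nbrs in capacity.items() for v in nbrs
                 if u in reachable and v not in reachable)
    return flow, cut
```

The capacities of the returned cut edges add up to the max flow value, as the max-flow min-cut theorem says they should.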
Kmeans++
1. The first cluster center is chosen at random from the data points
2. Each cluster center after that is chosen at random among the data points, weighted by the square of each point's distance to its closest already-chosen cluster center
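A minimal sketch of this seeding rule. The name `kmeans_pp_init` and the `seed` parameter are made up for the example, and the data is assumed to contain at least k distinct points.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers from points (tuples) using k-means++ weighting."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]               # first center: uniform at random
    while len(centers) < k:
        # weight of each point = squared distance to its closest chosen center
        weights = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

A point that is already a center has weight 0, so it cannot be picked again, and far-away points are the most likely next centers.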
Ball Tree steps
1. If the data has length 1, it is a leaf node: create a new ball tree with the single point as its center
2. Otherwise create a new ball tree, set its center and radius based on the data, and split the data in two to create the left and right ball trees
Spectral Clustering
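1. Compute the graph Laplacian L = D - A, where D is the diagonal matrix with the row sums of A (the node degrees) as its values
2. The number of eigenvalues of L equal to 0 is the number of connected components, which is why the eigenvectors of the k lowest eigenvalues are used to find k clusters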
Color optimization: Local Search
1. Start with some coloring
2. Make a local move if it improves the solution
3. Use an indirect objective function so that local moves work toward deleting a color
Graph coloring Greedy Steps
1. Take the nodes in some order
2. Give each node the smallest color that is not currently used by any of its neighbors
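A short sketch of the greedy procedure, assuming the graph is an adjacency dict and colors are the integers 0, 1, 2, ... (both choices are just for illustration).

```python
def greedy_coloring(adj, order=None):
    """adj: {node: [neighbors]}. Returns {node: color index}."""
    colors = {}
    for node in (order or list(adj)):
        # colors already taken by neighbors that have been colored
        used = {colors[n] for n in adj[node] if n in colors}
        c = 0
        while c in used:                         # smallest color not in use
            c += 1
        colors[node] = c
    return colors

# The graph from the first card: a -- b; b -- c; b -- d
star = {'a': ['b'], 'b': ['a', 'c', 'd'], 'c': ['b'], 'd': ['b']}
print(greedy_coloring(star))
```

The number of colors used depends on the order: on the path a-b-c-d, the order a, b, c, d uses 2 colors while the order a, d, b, c uses 3.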
A face is bounded by
At least 3 edges (in a simple planar graph)
Every planar graph can be colored with at most how many colors?
4
hill-climbing strategy
A commonly used strategy in problem solving. If people use this strategy, then whenever their efforts toward solving a problem give them a choice, they will choose the option that carries them closer to the goal.
What is Rand Index used for?
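A metric that tells how similar or how different two clusterings are: (a + b) / (a + b + c + d), counted over all pairs of data points. a: same cluster in both clusterings. b: different clusters in both. c: same in the first, different in the second. d: different in the first, same in the second.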
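The formula can be checked with a few lines of Python. This sketch assumes each clustering is given as a list of labels, one per data point; the function name is chosen for the example.

```python
from itertools import combinations

def rand_index(labels1, labels2):
    """Fraction of point pairs on which the two clusterings agree."""
    agree = total = 0
    for i, j in combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        agree += (same1 == same2)                # counts the a and b pairs
        total += 1                               # a + b + c + d
    return agree / total

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))    # same grouping, labels renamed
```

Only the grouping matters, not the label names, so the call above compares two identical clusterings.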
Edmonds-Karp Algorithm
A variation of the Ford-Fulkerson algorithm that uses BFS instead of DFS to find augmenting paths, which ensures strongly polynomial time: O(VE^2)
What is Graph Coloring?
Assigning a color to each node such that no two adjacent nodes are the same color. Example: Sudoku can be modeled as a graph coloring problem.
Clustree
Creates a tree/graph showing how clusters split as the number of clusters grows, to visualize the best number of clusters. It is a bad sign when a new cluster is formed from points of two different clusters.
silhouette score
Measures how well each data point fits its cluster: each point should be close to the other points in the same cluster and far from points in different clusters. Ranges from -1 to 1; higher is better.
Face
Each region that a planar drawing of the graph subdivides the plane into
kmeans
Find k cluster centers in multi-dimensional space such that the sum of the squared distances from data points to their closest cluster center is minimized. The problem is NP-hard.
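What does Karger's algorithm do?
Finds a global min cut: a smallest set of edges whose removal increases the number of connected components. It is randomized, so it is run many times and the smallest cut found is kept.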
Ball trees
The center and radius of each ball give a quick test for whether a region of the space has to be searched or not
What does Louvain method do?
Graph clustering, also known as community detection
Planar Graph
Graph that can be drawn in the plane without crossing edges
Which approach does the Louvain method use?
Greedy agglomerative clustering approach in which every node starts out as its own cluster
KNN Search
Holds a max heap of the closest points found so far. It must be a max heap because when you pop, you want to remove the farthest of the current closest points so a closer one can replace it.
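A sketch of the bounded max heap on its own, using a brute-force scan instead of a ball tree so the heap logic is easy to see. Python's `heapq` is a min heap, so distances are stored negated; the function name is an assumption for this example.

```python
import heapq

def k_nearest(points, query, k):
    """Return the k points closest to query as sorted (distance, point) pairs."""
    heap = []   # entries are (-distance, point); heap[0] is the farthest kept point
    for p in points:
        d = sum((a - b) ** 2 for a, b in zip(p, query)) ** 0.5
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif d < -heap[0][0]:
            # closer than the farthest kept point: pop that one, push this one
            heapq.heapreplace(heap, (-d, p))
    return sorted((-nd, p) for nd, p in heap)
```

In the ball tree search, `-heap[0][0]` plays the role of Q.peek: it is the distance a candidate has to beat.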
A* and its comparison to Dijkstra
In Dijkstra's, nodes are expanded in order of distance from the source node, whereas in A*, nodes are expanded in order of distance from the source node plus a heuristic estimate of the remaining distance to the goal.
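The difference is a single term in the priority, which a small sketch can make concrete. The function below is illustrative: the name `search`, the graph format, and the `heuristic` parameter are assumptions, and the heuristic is assumed to be consistent (it never overestimates and obeys the triangle inequality).

```python
import heapq

def search(graph, start, goal, heuristic=lambda n: 0):
    """graph: {node: [(neighbor, weight), ...]}.
    Returns (cost to goal, list of nodes in the order they were expanded).
    Nodes must be comparable (e.g. strings) so heap ties can be broken."""
    dist = {start: 0}
    frontier = [(heuristic(start), start)]
    expanded, done = [], set()
    while frontier:
        _, node = heapq.heappop(frontier)
        if node in done:
            continue
        done.add(node)
        expanded.append(node)
        if node == goal:
            return dist[node], expanded
        for nbr, w in graph[node]:
            nd = dist[node] + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Dijkstra: priority is nd; A*: priority is nd + heuristic
                heapq.heappush(frontier, (nd + heuristic(nbr), nbr))
    return None, expanded
```

With the default heuristic of 0 this is Dijkstra's algorithm; passing an estimate of the distance to the goal turns it into A*, which usually expands fewer nodes that lead away from the goal.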
Hopcroft-Karp Pseudocode
M <- empty set
repeat
    G' <- alternating level graph of G with respect to M
    P <- maximal set of vertex-disjoint shortest augmenting paths in G'
    M <- M xor edges in P
until P is empty
Perfect Matching
A matching that covers every vertex of the graph
Chromatic number
Minimum number of colors with which a graph can be properly colored
What does Lloyd's Algorithm do?
Heuristically optimizes the kmeans objective; it converges to a local optimum, not necessarily the global one
Strongly Polynomial
An algorithm whose running time relies only on the number of input data points, not on their values, is said to be strongly polynomial
Weakly Polynomial Algorithm
Relies on the values of the input data rather than just the number of input data points
DSatur Heuristic
Similar to greedy, but the next node to color is always the uncolored node whose neighbors already use the largest number of different colors
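A compact sketch of the heuristic, assuming an adjacency dict and integer colors. Breaking saturation ties by degree is a common convention and is an assumption here, not something stated on the card.

```python
def dsatur(adj):
    """adj: {node: [neighbors]}. Returns {node: color index}."""
    colors = {}

    def saturation(n):
        # number of different colors already used by n's neighbors
        return len({colors[m] for m in adj[n] if m in colors})

    while len(colors) < len(adj):
        # most saturated uncolored node, ties broken by degree
        node = max((n for n in adj if n not in colors),
                   key=lambda n: (saturation(n), len(adj[n])))
        used = {colors[m] for m in adj[node] if m in colors}
        c = 0
        while c in used:                         # smallest color not in use
            c += 1
        colors[node] = c
    return colors
```

Compared with plain greedy, the only change is how the next node is picked; the color assigned is still the smallest one not used by a neighbor.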
Graph Face
Space that is enclosed by the graph edges
Modularity formula
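Sum over communities c of (edges within c) - E(edges within c), where E is the number expected if connections were random. Normalized form: Q = sum over c of [ L_c / m - (d_c / (2m))^2 ], where m is the total number of edges, L_c is the number of edges within c, and d_c is the total degree of the nodes in c.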
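As a worked example, here is a sketch that computes the standard normalized modularity Q = sum over c of [ L_c / m - (d_c / (2m))^2 ] for an unweighted, undirected graph. The edge-list input and the `community` dict are assumptions made for the example.

```python
def modularity(edges, community):
    """edges: list of (u, v) pairs; community: {node: community label}."""
    m = len(edges)
    inside = {}   # L_c: number of edges with both ends in community c
    degree = {}   # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Two triangles joined by a single edge, one community per triangle
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}
print(modularity(edges, split))
```

For this split, each community has L_c = 3 and d_c = 7 with m = 7, so Q = 2 * (3/7 - 1/4) = 5/14. Putting every node in one community gives Q = 0.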
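Kempe Chains
A Kempe chain is a maximal connected set of nodes colored with just two colors. Swapping the two colors inside a chain keeps the coloring valid and can free up a color for a neighboring node.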
Why is using DFS for Ford-Fulkerson not optimal?
DFS tends to find longer augmenting paths, and the longer the path, the more likely it is to use an edge with a small bottleneck value, so each augmentation adds little flow
Divisive
Top down approach
Augmenting Path (matching)
A path from an unmatched node to an unmatched node that alternates between edges not in the current matching and edges in the current matching
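To show how such a path is used, here is a sketch of the simple one-path-at-a-time augmenting algorithm for bipartite matching. This is not Hopcroft-Karp, which finds many shortest augmenting paths per round; the function name and input format are assumptions for the example.

```python
def max_bipartite_matching(adj):
    """adj: {left_node: [right_nodes]}. Returns {right_node: left_node}."""
    match = {}   # right node -> left node it is currently matched to

    def augment(u, visited):
        # try to find an augmenting path starting at left node u
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is unmatched, or its current partner can be re-matched elsewhere;
            # either way, flip the edges along the path so u gets v
            if v not in match or augment(match[v], visited):
                match[v] = u
                return True
        return False

    for u in adj:
        augment(u, set())
    return match

print(max_bipartite_matching({'a': ['x', 'y'], 'b': ['x'], 'c': ['y', 'z']}))
```

Each successful call flips an alternating path, so the matching grows by exactly one edge.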
Karger's steps
Works by the contraction operation. To contract an edge, merge the two nodes incident on that edge into a single node, deleting self loops but retaining multi-edges. Edges are selected at random and contracted until only 2 nodes remain; the edges between those 2 nodes are the cut.
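A sketch of one contraction run plus the usual repetition. The function names, the number of trials, and the seed are assumptions made for the example, and the input graph is assumed to be connected.

```python
import random

def karger_cut(edges, rng):
    """One contraction run. edges: list of (u, v). Returns the size of the cut found."""
    label = {}                                   # node -> super-node containing it
    for u, v in edges:
        label[u] = u
        label[v] = v
    edges = list(edges)
    remaining = len(label)
    while remaining > 2:
        u, v = rng.choice(edges)                 # random edge (multi-edges count)
        a, b = label[u], label[v]
        for n in label:                          # contract: merge b into a
            if label[n] == b:
                label[n] = a
        remaining -= 1
        # delete self loops, retain multi-edges
        edges = [(x, y) for x, y in edges if label[x] != label[y]]
    return len(edges)

def min_cut_estimate(edges, trials=200, seed=0):
    """Repeat the randomized run and keep the smallest cut seen."""
    rng = random.Random(seed)
    return min(karger_cut(edges, rng) for _ in range(trials))
```

A single run can miss the minimum cut, so the answer is only correct with high probability; more trials make a miss less likely.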
s-t cut
A set of edges whose removal disconnects every path from s to t
flow graph
directed graph with source and sink nodes where edges have a flow and capacity
Beam Search
At each stage in your sub-solution, you branch in a BFS manner and then keep only the top k scoring sub-solutions (or the sub-solutions with the best lower bound) for the next step. If k is sufficiently large, the sub-solution that leads to the global solution is likely to be retained.
Agglomerative
bottom up approach
Problem with Lloyd's algorithm
Can fall into local optima quite easily. To overcome this, restart the algorithm several times, picking different random initial cluster centers each time. Then pick the final clustering that minimizes the sum of squared distances from observations to their cluster centers.
K Colorable
Can be properly colored with at most k colors
Alternating Level graph
Created by a breadth-first search starting at the unmatched nodes in set A, alternating between edges not in the matching and edges in the matching. Ends at the first level in which we find at least one unmatched node.
saturated edge
edge where capacity - flow = 0
Different ways to determine the number of clusters in your data?
elbow plots, silhouette score, and clustree
Branch and bound
eliminates groups of trees from consideration upon discovering that all their members are worse than the best tree found so far
Why are planar graphs sparse?
For a given number of nodes, the number of edges a planar graph can have is limited: edges <= 3v - 6 (for v >= 3)
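The bound gives a quick necessary test for planarity. The helper below is a hypothetical name for this check; it can rule a graph out but never prove that it is planar.

```python
def passes_planar_edge_bound(v, e):
    """Necessary (not sufficient) condition for a simple graph with v >= 3 nodes."""
    return e <= 3 * v - 6

print(passes_planar_edge_bound(5, 10))   # K5: 10 edges, bound is 9
print(passes_planar_edge_bound(6, 9))    # K3,3: 9 edges, bound is 12
```

K5 fails the test, so it cannot be planar. K3,3 passes even though it is not planar, which shows the bound alone is not enough.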
residual edge
For every directed edge <x, y>, if there is not an edge <y, x>, then we create it with capacity 0 and flow 0.
Ford-Fulkerson pseudocode
for each edge x, y, e in G.edges:
    e.flow <- 0
    if edge y, x is not in G.edges:
        G.add_edge(y, x, capacity = 0, flow = 0)
while there is a path p from s to t such that every edge has capacity - flow > 0:
    bottleneck_value = min(capacity - flow among all edges in path p)
    for each edge x, y, e in path p:
        e.flow = e.flow + bottleneck_value
        backwards_edge = G.edges.get(y, x)
        backwards_edge.flow = backwards_edge.flow - bottleneck_value
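The pseudocode can be turned into a runnable sketch. The version below finds each path with BFS (the Edmonds-Karp choice) and stores capacities and flows in dicts keyed by `(x, y)`; both decisions, and the function name, are assumptions for this example.

```python
from collections import deque

def max_flow(capacity, s, t):
    """capacity: {(x, y): cap}. Returns (flow value, {(x, y): flow on original edges})."""
    cap = dict(capacity)
    for (x, y) in capacity:
        cap.setdefault((y, x), 0)                # residual edge with capacity 0
    flow = {e: 0 for e in cap}
    out = {}
    for (x, y) in cap:
        out.setdefault(x, []).append(y)

    def find_path():
        # BFS along edges with capacity - flow > 0; returns the edges of a path or None
        parent = {s: None}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in out.get(x, []):
                if y not in parent and cap[(x, y)] - flow[(x, y)] > 0:
                    parent[y] = x
                    if y == t:
                        path = []
                        while parent[y] is not None:
                            path.append((parent[y], y))
                            y = parent[y]
                        return path
                    queue.append(y)
        return None

    total = 0
    while True:
        path = find_path()
        if path is None:
            break
        bottleneck = min(cap[e] - flow[e] for e in path)
        for (x, y) in path:
            flow[(x, y)] += bottleneck
            flow[(y, x)] -= bottleneck           # backwards edge
        total += bottleneck
    return total, {e: f for e, f in flow.items() if e in capacity}
```

Subtracting the bottleneck from the backwards edge is what lets a later path undo part of an earlier one.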
Intuition of Louvain
A good clustering of a graph is one in which there are more edges between nodes within the same cluster than we would expect if the connections were random (modularity)
Kuratowski's theorem
A graph is planar if and only if it does not contain a subgraph that is a subdivision of K5 or K3,3
Feasible
A solution is feasible if it obeys the constraints of the problem
What does the Hopcroft-Karp algorithm solve?
maximum cardinality matching for unweighted bipartite graphs
Max Flow min cut theorem
The maximum value of an s-t flow is equal to the minimum capacity over all s-t cuts
What does the Hungarian method solve?
The maximum-weight perfect matching problem on a weighted bipartite graph (the assignment problem)
Augmenting path (flow)
path from source to sink along edges that have not been saturated
Hierarchical clustering
Seeks either to build up clusters by combining data points and clusters that are close to one another into larger and larger clusters, or to find splits of the dataset, recursively splitting clusters until each data point is its own cluster.
Adjusted Rand Index
The Rand index corrected for chance: it subtracts off the level of similarity expected between two random clusterings
Feasibility
A search may travel through infeasible solutions, giving them a penalty in the objective function or accepting them with some probability, as long as it ends up with a feasible one.
Elbow Plots
The x axis is the number of clusters and the y axis is the loss function. The loss function can be modularity or the sum of squared distances. The loss always goes down as you increase the number of clusters, but it decreases by much less once you go past the true number of clusters.