Graph Algorithms
COMPSCI 163
- Graph Representation
- BFS/DFS
- Minimum Spanning Trees
- Topological Ordering
- Shortest and Longest Paths
- Widest Paths
- Euler Tours
- Traveling Salesman Problem
- Cliques and Coloring
- Graph Coloring
- Chordal Graphs
- Perfect Graphs
- Interval Graphs and Path Decomposition
- FPT
- Flow
Graph Representation
Want the following out of a graph data structure: - Count vertices: O(1) - Iterate through all vertices: O(1) per vertex - Associate information with vertices: O(1) per get/set - Iterate through all edges: O(1) per edge - Get degree of vertex: O(1) - Count all edges: O(1) - Iterate through edges in/out of a vertex: O(1) per edge - Associate information with edges: O(1) per get/set - Test whether two vertices are connected by an edge: O(1)
Decorator Pattern and Adjacency Lists
Possible solutions:
- Dictionary, with vertices as keys
- Number vertices and index an array using those numbers
- Separate variables for vertex objects for each piece of information
- One dictionary variable per vertex with keys for each piece of information
- An adjacency list uses a dictionary of vertices as keys and the list of outgoing neighbors as values
- There are lots of different variations on this structure; different objects for vertices and edges
- The Python-style dictionary representation satisfies the time constraints needed for an efficient PageRank algorithm
- Some operations are not supported; e.g. looping through incoming edges can be slow
BFS/DFS
- Implicit graph: A graph whose edges and vertices are not already given and must be found through an algorithm (like BFS)
- A path is an alternating sequence of endpoints and edges that does not allow repetition
- A walk is the same as a path but allows repetition
- Reachability: Does there exist a path between two nodes? If so, then they are reachable from each other
Minimum Spanning Trees
- Given an undirected graph, find a tree that connects all vertices and minimizes the total weight
- Disconnected graphs will output a spanning forest as opposed to a tree
- Minimize and maximize are the same problem; simply make weights negative
Applications
- Build an optimal power grid/transportation routes with minimal total construction cost
- Build a computer network with maximum badwidth - maximum spanning tree
- The minimum spanning tree has the smallest possible bottle neck out of all trees; maximum has the largest
Properties
- Assumes that there are no “tied” edges, and if they do have the same edge, then use a consistent tiebreaking method
-
Cycle property: If C is a cycle in the original graph G, then the heaviest edge in this cycle cannot be in the MST
- In other words: If C is any cycle and e is its heaviest edge, then an MST that contains e will not be the MST
-
Cut property: If we cut the vertices of the graph into any two subsets of vertices X and G - X, and e is the lihtest edge with endpoints in both subsets, then e must be in the MST
- Can be used to derive algorithms
- Path property: For any two vertices x and y, the MST path from x to y has the minimum possible weight for its heaviest edge
Algorithms
- Most algorithms to find an MST takes O(n log n) time and utilize the cut property
- Algorithms have been found that take close to linear time or expected linear time but randomized
Jarnik’s Algorithm (Prim-Dijkstra)
- Start with some vertex S; this will be the start of the MST, named T
- Repeat until T contains all vertices:
- Partition the graph into T and G - T
- Find the minimum-weight edge e connecting T and G - T
- Add e to T (as well as its corresponding vertex)
- Can be implemented via a priority queue
- Store vertices not yet in T by the weight of the minimum-weight edge that would connect it to T
- Update the priority queue each time you update T by looping through the edges
- Using a binary heap, all find/remove operations take O(log n) time, so the total time is O(m log n) because each edge is looped over twice
- A Fibonacci heap can get this down to O(m + n log n)
Kruskal’s Algorithm
- Start by creating a forest of one-vertex trees
- Sort the edges by weight
- For each edge, if it connects two different trees, conenct the trees and add the edge to the MST
- O(m log m) for sorting, O(m) to iterate through list; O(m log m) total
- Can use the union-find data structure for near-constant lookups of forests
Boruvka’s Algorithm (Sollin)
- Start by creating a forest of one-vertex trees
- Label vertices by their tree
- While there is more than one tree, find the minimum-weight outgoing edge from each forest
- Add all of these minimum-weight edges to the T and connect the forests using them
- The number of trees will go down by a factor of 2 or more, so we will do this iteration log n times
- Looping through all m edges per iteration means the runtime is O(m log n)
Hybrid Algorithm
- Run Boruvka’s algorithm until the number of trees is less than or equal to n / log n
- Run Jarnik’s algorithm with Fiboniacci heaps, using a priority queue on the trees as opposed to the vertices
- O(m log log n) from Boruvka and O(m) from Jarnik means that the total runtime is O(m log log n)
Topological Ordering
- Topological ordering: A list of all vertices in G such that, for every edge $x \rightarrow y$, x is before y in the list
- Topological sorting: An algorithm to compute a topological ordering
- Graphs that have cycles cannot have a topological ordering
- Thus, only directed acyclic graphs can have a topological ordering
- Can use a simple greedy algorithm; add vertices with no incoming edges to the ordering, and remove its outgoing edges from the graph
- If there is ever a point where there are no “ready” vertices but there are still vertices left to process, then there is a cycle
- To find the cycle: go to a remaining vertex and walk forward until a vertex is repeated; the sequence between the repititions is the cycle
Shortest and Longest Paths
- Shortest path can found by BFS; longest path is NP-hard because it includes finding a Hamiltonian path
- When there exists negative paths, then shortest path is NP-hard
- Shortest walk is not possible if the graph has a negative cycle; can keep walking around the cycle and reducing total weight
- For finding shortest paths in directed graphs:
- If graph is DAG, use topological ordering (O(m))
- If graph has nonnegative edge lengths, use Dijkstra’s (close to linear time)
- Works on undirected graphs
- If graph has negative edges but no negative cycle, use Bellman-Ford algorithm (O(mn))
- Can also detect negative cycles if one exists
- Bellman-Ford is optimal and can be improved based on recent results
- Dijkstra’s is “universally optimal” for a fixed graph with variable, positive weights
Critical Path Scheduling (PERT method)
- Input: a project consisting of multiple tasks that take different amounts of time
- Some tasks can be done simultaneously; others rely on earlier tasks to be completed
- Goal: Find a schedule for the tasks that finishes the project in the lowest possible time
- AKA longest path finder
- Can use an activity-on-edge graph; tasks are converted to edges, and vertices are project milestones
- Constraints are represented by directed edges and have zero weight
- The critical path is the longest path in this graph from start to end and is the minimum possible time for the schedule
- Assign times to vertices by making sure that the difference between stages is greater than or equal to the length of the edge between them
- To calculate the optimal schedule, we can use a topological ordering
- Set the path length of the starting vertex to 0
- Find the largest incoming edge $u\rightarrow v$ and set the path length of v to $L(u) + weight(u \rightarrow v)$
- Can find the smallest incoming edge to get the shortest paths
Shortest Path Trees and Relaxation Algorithms
- In graphs without negative cycles, we can generate a tree of shortest paths
- If s is the starting vertex, then other vertices will have parents that are the vertices that come before them in their shortest path
- e.g. if the shortest path from x to c is $x \rightarrow b \rightarrow c$, then b is a parent of c in the tree
- Input: A graph with edge lengths and a starting vertex
- Output: Tree of shortest paths from s to all other reachable vertices
- Two decorations per vertex: the parent and distance from start vertex
- General idea: use the two decorations and update them when shorter paths are found
- Invariants
- D is the length of some path to x that is greater than or equal to the correct value
- P is the second-to-last vertex on a path of length ≤D
- Initialize by setting P(x) = None and D(x) = 0 if the vertex is s; otherwise D(x) = infinity
- For each edge uv, check whether the distance to u + the length of uv is less than the current shortest path for v, then update if so
- Known as relaxing an edge
def relax(u, v):
if D[u] + length(edge uv) < D[v]:
D[v] = D[u] + length(edge uv)
P[v] = u
- If the graph is a DAG:
- Simple algorithm is to go through vertices in topological order and relaxing each vertices’ edges
- Total time is O(m)
initialize D, P
for v in topological order:
for incoming edges uv:
relax(u, v)
Bellman-Ford Algorithm
- Uses relaxation paradigm and takes O(mn) time
initialize D, P
repeat n-1 times:
for each edge uv in the whole graph:
relax(u, v)
- Can be improved based on ordering of nodes or stopping early; still results in same big O
Dijkstra’s Algorithm
- Topologically sorts the shortest path tree by sorting vertices by distance
- Uses a priority queue to order on the fly
initialize D, P
make priority queue Q of vertices, prioritized by D[v]
while Q is non-empty:
find and remove minimum-priority vertex v in Q
for each edge vw:
relax(vw)
- Takes O(m log n) with a binary heap or O(m + n log n) with a Fibonacci heap
Changing Weights without Changing Optimum Paths
- We can add another decoration to the edges, a height h(x), to reweight the edges
- Each edge has two properties: length and height
- New length: $l_h(u\rightarrow v) = l(u\rightarrow v) - h(u) + h(v)$
- Downhill edges have decreased length, uphill edges have increased length
- Heights can eliminate negative edges and allow Dijkstra’s to work
- Can now use Johnson’s and Suurballe’s algorithm
- Won’t get rid of negative cycles
- Can also change the order of computation
- A* algorithm uses this fact
Johnson’s Algorithm
- Input: A graph G with negative edges but not negative cycles
- Output: Shortest paths between all pairs of vertices
- Add a start vertex s that is connected to all other vertices by zero-weight edges
- Use Bellman-Ford only once to compute the shortest paths from s to all other vertices
- Reweight using the absolute value of distance from s as the height (guarantees that all edges are non-negative)
- Use Dijkstra from each starting vertex in the reweighted graph
- BF takes O(mn), reweighting takes linear time, Dijkstra is run n times; $O(mn + n^2 \text{log } n)$ time
Suurballe’s Algorithm
- Input: A weighted directed graph and two endpoint vertices, s and t
- Output: Find two paths from s to t such that the paths don’t share edges and minimize the sum of the length of both
- Can find two shortest paths by finding one shortest path, then adding “reverse” edges with negative weights from the first shortest path
- If both shortest paths take the same edge, then the edge will be cancelled out
- Use Dijkstra to find single-source paths and distances from s
- Reweight the edges if they belong to a shortest path to equal 0
- Find shortest path P from s to t and reverse its edges
- Find a new shortest path in the modified graph
- Remove any edges visited in both paths and reconnect the paths
- Two runs of Dijkstra, linear time to reweight and reverse; total runtime of O(m + log n)
A*
- Want to traverse paths to destination with Dijkstra without going down bad paths
- Idea: choose a height function for each vertex v that is easily to calculate and accurately estimates the distance from v to t
- Must be an underestimate so paths to t eventually go downhill
- Can’t decrease too quickly; height between two vertices differs by at most the length of the edge between them
- AKA needs to be admissible
- Reweight the graph by height function
- Edges will be non-negative because of our height function
- Edges with smaller estimates appear shorter
- Run Dijkstra on this reweighted graph
- Same runtime as Dijkstra, but with linear time added to reweight
Widest Paths
- Input: A graph with weighted edges
- Output: Path from start to end with the maximum minimum weight (i.e. the path with the highest throughput, as given by the minimum edge weight on the path)
Linear Algorithm
- For undirected graphs: if there is a single source and a single destination, the path can be found in linear time
- Find median edge weight
- If a path can be made using heavier-than-median edges, get rid of lighter edges and recurse
- If a path cannot be made in this fashion, collapse connected components of wide edges into “supervertices” and recurse; combine edges
- Linear time because $\Sigma \frac{m}{2^{i+1}} = O(m)$
- For directed graphs: we can use an algorithm that divides edges into subsets and runs linear time Dijkstra on these subsets in order to get a runtime of O(m log*n)
Dijkstra
- We can modify the relax algorithm to prioritize widths of paths
def relax(u, v):
if v in Q and min(D[u], length(uv)) > D[v]:
D[v] = min(D[u], length(uv))
P[v] = u
- Same runtime as regular Dijkstra’s: O(m + n log n)
- If edges are already sorted, then this algorithm runs in linear time using an integer priority queue
All-Pairs Directed Widest Paths
- Sort all the edges before finding paths (O(m log m))
- Run linear Dijkstra on each vertex (O(mn))
- Total runtime: O(mn + m log m)
- Can also use fast matrix multiplication if the number of edges is large in $O(n^{2.688})$ time
Applications: Voting Systems
- Instant runoff voting: Eliminate the candidate with the fewest amount of voters who have them as their favorite until a candidate has a majority
- Borda count: With c candidates, give candidates points based on their positioning in a voter’s preference ordering (first = c-1, second = c-2, last = 0); candidate with the most points wins
- Desirable properties
- Majority: Candidate with a majority of first-place votes wins
- Weak majority: Candidate with all first-place votes wins
- Symmetry: All voters are treated equally
- Weak Symmetry: There is no super-voter whose preference decides the election
- No spoilers: If candidate X does not win the election, then removing X does not change the outcome (think third-parties)
- Impossible to satisfy weak symmetry, weak majority, and no spoilers
Condorcet Property / Schulze Method
-
Condorcet Property: If there is a candidate X who wins every head-to-head comparison, then X should win the election
- If a cycle exists, then there is no apparent winner (i.e. A beats B, B beats C, C beats A)
- Generate a graph where an edge from candidate U to V is weighted with the number of voters who prefer U to V
- Relabel each edge X to Y with the width of the widest path from X to Y in the original graph
- There is guaranteed to be a winner in this graph
- With randomized quickselect, this method takes $O(candidates^2 \cdot \log candidates)$
Euler Tours
- Defined as a walk through a graph such that each edge is visited once
- Note that, for any finite graph, there should be an even number of odd-degree vertices, as the sum of degrees should be even
- A graph with no odd vertices has an Euler tour that starts and ends at the same vertex
- A graph with two odd vertices has an Euler tour that starts at one of the odd vertices and ends at the other
Fleury’s Algortihm
- Greedy algorithm: Keep adding edges without adding an edge that connects two uncovered parts of the graph
- Basic implementation is $O(m^2)$; check all edges when adding each edge
- Advanced implementations have better times, but still not linear
Hierholzer’s Algorithm
- Keep adding edges without checking whether or not we add an edge that connects to two uncovered parts of the graph
- After getting stuck, return to a vertex with uncovered edges and start a new tour
- Splice the original tour with the new tour to create an Euler tour
- Linear time complexity; each edge is visited once in either the original or a offshoot tour
Traveling Salesman Problem
- Input: A connected undirected graph with nonzero positive edge lengths
- Goal: Find a walk that visits each vertex at least once and returns to the start index, minimizing the sum of edge lengths
- Can also be written as a distance problem
- Input: A symmetric positive matrix of distances between points that satisfies the triangle inequality
- Goal: Find a cylcic order of points that uses each point exactly once, minimizing the sum of distances between points
- These two representations can be converted to each other
- TSP is NP-hard, and even with unit edge lengths, it is NP-complete
- Approximation ratio of 2 can be found with MSTs; theoretical best AR is $\frac{123}{122}$, but the best AR found is $\frac{3}{2} - \frac{1}{10^{36}}$
- These approximations run in polynomial time
2-Approximation
- Use the graph version so repeated vertices are allowed
- Find a MST and make a copy of each edge such that all degrees are now even
- Take an Euler tour of this graph and return it as the approximate TSP tour
3/2-Approximation
- Construct an MST
- Identify the odd vertices in the MST
- The number of odd vertices will be even by the handshaking lemma
- Connect them in pairs using the shortest paths from the original graph, choosing the pairings that have the minimal sum of path lengths
- Combine the pair paths and the MST edges into one graph; return the Euler tour as the TSP tour
Exact DP Algorithm
- Basic algorithm takes O(n!) time (try all permutations)
- Dynamic programming algorithm takes $O(2^nn^2)$ time
- Preprocess the graph to get the all pairs shortest paths distance matrix
- Choose some starting vertex s
- Compute, for every subset of A of remaining vertices, and every vertex in A, the best path from s to v covering all vertices in A
- Choose v such that the path through all vertices to v + one more path back from v towards s has the minimum total length
Algorithm to compute the third step:
L[A, v] is a table containing the length of the shortest tour starting at s, ending at v, covering all vertices in A
V(G) is the set of all vertices
for subset A in 1, 2, ..., 2^(n-1) - 1:
for v in A:
if A - v is empty:
L[A, v] = D[n - 1, v]
else:
L[A, v] = min{L[A - v, u] + D[u, v] for u in A - v}
return min{L[V(G) - s, v] + D[v, n - 1] for v in V(G) - s}
Cliques and Coloring
Centrality
-
Centrality: A measure of how central (AKA how well-connected) a node is
- There exist various measures of centrality, such as degree and pagerank
-
Closeness centrality: $\frac{1}{\text{average distance to other vertices}}$
- Closeness centrality can be calculated by computing the all-pairs shortest paths and averaging the distances; O(mn)
- Approximate algorithm exists where k vertices are chosen and we calculate the distances to these k vertices in O(km)
- Graphs with the small world property (all distances ≤ D) leads to lower variance; setting k to $D \log n / \varepsilon^2$ gives a $(1+\varepsilon)$-approximation in time $O(\frac{m}{\varepsilon^2} \log n)$
-
Betweenness centrality attempts to find a vertex that is on many shortest paths
- Formula: $\sum_{u, w\neq v} \frac{\text{No. of shortest paths from u to w through v}}{\text{No. of shortest paths from u to w}}$
- Optimal algorithm takes O(mn) time
Cores
- A k-core of a graph is the largest subgraph in which all vertices have a degree ≥ k
- Can be found by deleting vertices of degree < k until there are none left
Algorithm for computing all cores simultaneously:
def cores(G):
core = dict()
current_core = 0
while G is non-empty:
v = a minimum-degree vertex
current_core = max(current_core, degree(v))
core[v] = current_core
delete V from G
return core
- Can be made efficient by decorating each vertex with the number of uncovered neighbors (intially its degree) and updating + maintaining a dictionary with degrees as keys and lists of unprocessed nodes as values
- Takes O(sum of all degrees) = O(m) time
Degeneracy
- Many definitions; one is the largest k for which a nonempty k-core exists
Clique
- A clique is a subset of a graph for which all pairs are connected by an edge
- Maximum clique is the biggest clique in the graph; a maximal clique is a clique to which no more vertices can be added while maintaining the clique property
- Independent set problem and clique problem are equivalent; add edges between vertices that are unconnected, and remove existing edges
- NP-hard to find the maximum clique
- There exist various algorithms to find all maximal cliques
- The Moon-Moser Theorem states that the maximum number of maximal cliques is $\leq 3^{n/3}$
Bron-Kerbosch Algorithm
- Uses recursive backtracking with three sets
- R is the set of vertices containing the current clique being built
- P is the set of potential new vertices (that are adjacent to ones in R)
- X is the set of vertices that have already been added to a clique and should not be explored
- Optimizations can be made
- Choose a pivot $p \in P \cup X$ with the smallest number of non-neighbors in P and only loop over these neighbors
- p’s neighbors will also be in a found clique, so the only vertices that can be added to R are its non-neighbors
- Use the reverse degeneracy ordering so we don’t have to loop over so many vertices
- Choose a pivot $p \in P \cup X$ with the smallest number of non-neighbors in P and only loop over these neighbors
- Overall run time is $O(dn3^{d/3})$
Graph Coloring
- Input: A graph
- Output: A coloring of vertices in the graph such that no neighboring vertices have the same color
- Four color theorem: If a planar graph can be drawn without crossing edges, then it can be colored with at most four colors
- Two color graphs are bipartite and are easy to test for, but testing for a 3-coloring is NP-complete
Greedy Algorithm
- Order the vertices
- Number the available colors (1, 2, 3, …)
- For each vertex in the ordering, give it the lowest-numbered color that is not used by one (or more) of its neighbor
- O(degree) per node, O(m) total runtime
- If vertices are ordered in degeneracy ordering, then the greedy algorithm will use at most d + 1 colors
def first_unused(v):
used = { color(w) for w in neighbors of v }
for c in 1, 2, ...
if c not in used: return c
- This algorithm is optimal for interval graphs (using a left-to-right ordering)
Register Allocation and Strahler Numbers
- Problem: Calculate an expression using the minimum number of registers at any given time
- If two expressions A and B require a and b registers respectively, then we will need to store the answer to one of them while doing the other
- If A is calculated first, then max(a, b+1) registers are use; otherwise, max(b, a+1) registers are used
- Calculate intensive expressions first to save the +1 constant
- Bottom-up calculation of optimal ordering
- If a node x is a leaf, then the number of calculations R(x) is 1
- If x has two children y and z, then evaulate the one with the larger subtree first, so R(x) = max(R(y), R(z))
- If R(y) = R(z), then R(x) = R(y) + 1 = R(z) + 1
- This bottom-up calculation also produces the Strahler numbers of nodes in a tree
- This problem can be written as a graph coloring problem
- Vertices are the local variables of the intermediate code
- Edges are local variables that need to be stored at the same time
- Colors are available registers
Chordal Graphs
- A graph is a chordal graph if and only if one of the following conditions are satisfied:
- Every cycle longer than three vertices has a chord, or another edge connecting two vertices of the cycle
- It has an elimination ordering: An ordering of the vertices where the later neighbors of every vertex form a clique
- For every two vertices x and y and every minimal subset S whose deletion would separate x from y, S is a clique
- Definition 1 is analagous to a graph with no holes, where a hole is a cycle of at least length 4 that does not have a chord
Elimination Ordering Algorithm (Lexicographic BFS)
- If G has an elimination ordering, so does G - v
- Slow, basic algorithm: find vertices whose neighbors form a clique and delete it
- We can check validity of an elimination order in linear time; for each vertex v, check that all later neighbors of v are neighbors of w (the neighbor of v that is next in the ordering)
- This can be done using BFS
- Lexicographic ordering: Given an ordering of vertices, define $pred_i(v)$ to be the ith neighbor of v or v itself if there are fewer than i neighbors earlier than v; order these vertices based on the tuple ${pred_1(v), pred_2(v), …, pred_n(v)}$
- Algorithm for finding lexicographic order
- Maintain a sequence of sets of unprocessed vertices, initially in one big set
- While this sequence is nonempty:
- Let $S_i$ be the first set in the sequence and v be any vertex in this set
- Remove v from $S_i$ (and remove $S_i$ if it becomes empty)
- Output v; let N(v) be the set of neighbors of v
- For each set $S_j$ in the sequence, split $S_j$ into two sets $S_j \cap N(v)$ and $S_j \backslash N(v)$
- We can use dynamic sets, linked lists, and dictionaries to get $O(\vert N(v)\vert)$ time
Perfect Graphs
- A graph is perfect if the optimal number of colors equals the number of vertices in the largest clique
- Deleting any vertex in this graph will produce another perfect graph
- This property is true for interval, chordal, and bipartite graphs
- Deleting any vertex in this graph will produce another perfect graph
-
Strong Perfect Graph Theorem: A graph G is perfect if and only if it has no odd-length holes AND its complement has no odd-length holes
- Implies the “weak perfect graph theorem” that complement of perfect graphs are perfect
- Perfect graphs can be found in polynomial time ($O(n^9)$)
- An optimal method for coloring and finding cliques in perfect graphs takes polynomial time but is complex
- Interval graphs $\in$ chordal graphs $\in$ perfect graphs
Interval Graphs and Path Decomposition
- An interval graph is defined as a graph with nodes representing intervals and edges representing overlaps of two intervals
- We can rewrite an interval graph using bags where each interval segment is labeled by the intervals that overlap during it
- Get rid of bags that are not subsets of larger bags
- The intervals can be recovered from the path decomposition
- This is known as the interval graph path decomposition
- Properties of the interval graph path decomposition
- Each vertex belongs to a consecutive sequence of bags
- Edges are pairs of vertices that both appear in the same bag
- The largest clique is the biggest bag
- Each edge has both its endpoints in some bag
-
Path decomposition: get rid of the intervals and use a sequence of bags with the following properties
- Each vertex appears in a consecutive subsequence of bags
- Every edge has ≥ 1 bag containing both endpoints
- One graph can have many different path decompositions
- The width of a decomposition is the size of the biggest bag minus 1
- The pathwidth of a graph is the smallest width of any decomposition
- The pathwidth of a graph can also be defined as the smallest possible max clique size (minus one) of an interval graph containing G
- Vertices of a clique have overlapping subsequences of bags
- Some bag contains the whole clique
Algorithms
Fast clique finding with a low-width path decomposition:
for each bag in the path decomposition:
for each subset of the bag:
if it's a clique bigger than the best found so far
remember it as max clique
- Outer loop is O(n), inner loop is $O(2^w)$, so $O(n2^w)$ time
Fast coloring:
- We can use a dynamic programming algorithm to color a graph fast
for each bag B in left-to-right order, and each coloring C with ≤ k colors:
let B' be the bag just before B
Good(B, C) = true if C is compatible with a good coloring of B'
- G has a k-coloring if and only if its last bad has a good coloring
- Time takes $O(nk^{w+1})$
Trees and Pathwidths
- Trees are easy to find cliques and easy to color but can have large pathwidth - must be a way to decompose them
-
Tree decomposition involves making a tree of bags instead of a sequence of bags
- Every vertex must belong to a connected subtree of bags
- Every edge must have both endpoints in at least one bag
-
Treewidth: the smallest width of any tree decomposition, or the smallest possible max clique size (minus one) of a chordal graph that contains G
- The tree decomposition can be used in the algorithms that use path decompositions but will work better because more graphs have smaller treewidth
FPT
- Fixed parameter tractable: Time is polynomial n with a fixed exponent, but may depend badly on width
- Courcelle’s theorem: There exists an FPT algorithm for any existence or min-weight/max-weight optimization problem on classes of graphs that can be described as a logical formula involving vertices, edges, sets of vertices and edges, and adjacency
Flow
- Flow can be used in many applications; playoff elimination, traffic engineering, etc.
-
Maximum flow is the question of determining the maximum flow out of a starting vertex s into an ending vertex t
- The flow into and out of a vertex should be equal (aside from the start and end vertices)
- Properties of maximum flow
- Can be converted to a linear program and solved; polynomial time
- A maximum flow will always exist, even with irrational numbers
- Integrality: If all the capacities are integers, then there exists a maximum flow in which all flow amounts are integers
- The max flow problem is equivalent to the minimum cut problem
- Min-cut: find the minimum sum of capacities flowing from one subset of the graph to another, where subsets are found by “cutting” the graph in the plane
- For every flow network, the maximum flow is equal to the minimum cut
- For every flow, either there is an equal cut or we can increase the flow; max flow ≥ min cut
Greedy Algorithm
- Start with a flow that is not maximum (all-zero flow)
- If we ever get stuck, we can find a cut with a capacity equal to the flow amount
- Since max flow ≤ min cut ≤ this new cut, and our flow equals our cut, then we have found the maximum flow
Augmenting Path
- Residual graph: A graph with backwards edges containing the capacities of the edge in the current graph minus the flow of the incoming vertex
- While the residual graph has a path $P$ from $s$ to $t$:
- Find the width $w$ of $P$
- Increase flow along $P$ by $w$ units
- Recompute the residual graph
- When the algorithm terminates, it finds a minimum cut (and a maximum flow)
- Stops when there are no residual paths from s to t
- Form a cut of the graph into two subsets $S$ and $T$
- $S$ consists of everything that can be reached from $s$ by a residual path; $T$ is everything else
Matching
-
Bipartite graph: A graph where each edge has a red and blue vertex for its endpoints
- Can test for bipartiteness in linear time
- Matching: A set of edges that don’t touch each other (no two matched edges share a vertex)
- Independent set: A set of vertices that don’t touch each other (no two vertices share an edge)
- Konig’s theorem: Let $M$ be the number of edges in a maximum matching and $I$ be the number of vertices in the maximum independent set; in bipartite graphs, $M+I=n$
- A bipartite maximum matching can be written as a flow problem
- Add vertices s connected to all red vertices and t connected to all blue vertices
- Direct edges $s \rightarrow red \rightarrow blue \rightarrow t$ with capacity 1
- Find the integer maximum flow
- If the found flow is not maximum, then the augmented path will be the maximum flow; guaranteed to find the maximum matching
- Time bounds: $O(mn)$ for basic algorithm, $O(m\sqrt{n})$ for Hopcroft-Karp-Karzanov, $O(m^{1+\varepsilon})$ for any $\varepsilon > 0$ (Chen)
Complete Bipartite Matching
-
Complete bipartite graph: All edges that go from one independent set to another are included
- $K_{a,b}$ refers to a complete bipartite graph with $a$ vertices on one side, $b$ on the other
- Balanced means that the graph has the same number on each side: $K_{n/2,n/2}$
-
Perfect matching: A matching that matches every vertex
- In a balanced complete bipartite graph, every matching can be turned into a perfect matching
- Matchings can be described a permutations
-
Assignment problem: Find a minimum-weight maximum matching in a weighted bipartite graph
- In many applications, the graph is complete bipartite and maximum matchings are perfect matchings
- Can turn the question into: Find a minimum-weight perfect matching in a weighted bipartite graph
-
Hungarian algorithm: Solves assignment problem
- Simple steps: Start from an empty matching, then find a minimum-weight alternating path $n/2$ times
- The weight of an alternating path is how much it increases the weight of a matching
- Complex steps: Adjust the weight (original weight - heights of both endpoints) of each edge such that they’re all non-negative, use Dijkstra’s to find the shortest alternating paths
- Runtime: $O(nm + n^2)$ (run Dijkstra’s n/2 times)
Full algorithm:
-
Initialize heights to make adjusted weights ≥ 0
-
Repeat n/2 times:
- Add an artificial start vertex s, with edges of weight 0 to all unmatched red vertices, direct all unmatched edges red-to-blue and all matched edges blue-to-red
- Use Dijkstra’s to find the adjusted distances from s to all other vertices, including the shortest alternating path
- Adjust heights: subtract distance at red vertices, add distance at blue
- Use the shortest alternating path to increase the size of matching