EDA T8 — Graphs

Ilario Bonacina

This presentation is an adaptation of material from Antoni Lozano and Conrado Martinez

Agenda

Last theory class

  • Graphs
    • Intro/Notation
    • Representation
  • Depth First Search
  • Topological Ordering

Today

  • BFS
  • Dijkstra
  • Minimum Spanning Trees

Before we start: Questions on previous material?

Breadth-First Search (BFS)

BFS

In a Breadth-First Search (BFS) (cat: recorregut en amplada, esp: recorrido en anchura) of a graph G

  • we start from some vertex s
  • we visit all vertices in the connected component of s in increasing distance from s, layer by layer

BFS in words

  • When a vertex is visited, all its adjacent non-visited vertices are put into a queue of vertices yet to be visited.

Note

If d= the distance to s of the last visited vertex,

  • all vertices of G at distance < d from s have already been visited
  • the queue contains only vertices at distance d or d+1 and the ones at distance d in the queue precede the vertices at distance d+1.

BFS code

#include <queue>

list<int> bfs (const graph& G) {
  int n = G.size();
  list<int> L;
  queue<int> Q;
  vector<boolean> vis(n, false);
  for (int u = 0; u < n; ++u) {
    Q.push(u);
    while (not Q.empty()) {
      int v = Q.front(); 
      Q.pop();
      if (not vis[v]) {
        vis[v] = true; 
        L.push_back(v);
        for (int w : G[v]) {
          Q.push(w);
        } 
      } 
    } 
  }
  return L;
}
1
bfs return the list of visited vertices
2
bfs is the same as an iterative depth-first search, but using a queue instead of a stack (see last class)
3
here we assume the graph is implemented with an adjacency list
Cost of BFS
  • each vertex is added and removed from the queue Q exactly once
  • for each vertex we go through its adjacent vertices and push them into the queue, if not visited

Hence the total cost of BFS is

\begin{align*} \text{BFS-cost}&=\sum_v(\Theta(1)+\Theta(\deg(v))) \\ &=\Theta(|V|)+\Theta(|E|) \end{align*}

Visualization of a BFS

Jutge exercises on BFS

6 degrees of separation

Let G be a connected graph

The diameter of G is the maximal distance between a pair of vertices.

The center of G is the vertex such that the maximal distance to any other vertex is minimal.

We can compute the diameter using a BFS starting from each vertex in G. The cost is \Theta(n(n+m)) = \Theta(nm), since the graph is connected m\geq n−1.

Many graphs/networks arising in many areas, specially in social sciences are large, but their diameter is small. \to 6-degrees of separation

Erdős number
Kevin Bacon number

Shortest Paths

Weighted graphs

A weighted directed graph G= (V,E) is a directed graph where the weights are given by a function w\colon E\to \mathbb R.

We assume the weights to be always positive.

Examples of weighted graphs?



Given two vertices u,v in G, their distance1 d_G(u,v) in the weighted directed graph G is the minimum weight of a path from u to v, where the weight of a path is taken as the sum of the weights of the edges in it.



Weighted shortest path

Input
a weighted directed graph G and two vertices vertex s,t in it
Output
a shortest path from s to t

Why do we assume the weights to be non-negative?

If weights were allowed to be negative then the directed graph can contain negative weight cycles, and then shortest paths are not well defined.



Dijkstra’s algorithm (1959) solves this problem finding all shortest paths from s to all other vertices in G.

Dijkstra’s algorithm

Dijkstra’s algorithm works iteratively.

The input is a weighted directed graph G=(V,E) and a vertex s.

For the moment we focus on returning just the minimum distance from s to any other vertex.

The algorithm will return a vector D, where D[u] is the minimum distance of u from the starting point s.

On each iteration the set of vertices is partitioned in two parts:

  • the visited ones and
  • the candidates, those that we haven’t been visited yet.

We want to maintain the invariant that

  1. For all visited vertices u, D[u] is the shortest distance from s to u,
  2. For all candidate vertices u, D[u] is the weight of the shortest path from s to u which passes exclusively through visited vertices, i.e., only the last vertex in the path, u, is a candidate.

At each iteration we

  • select a candidate vertex u,
  • update D and
  • mark u as visited.

When all vertices have been visited, we have the desired answer.

Which candidate vertex do we select on each iteration?

Choosing as candidate u with minimal D[u] allows to maintain the invariant.

Why?

Let u be a candidate vertex with minimal D[u]. By the invariant, D[u] is minimal among all paths from s to u passing only through visited vertices.

Now we mark u as visited, so D[u] must be the smallest also among all paths from s to u.

If there were a shortest path from s to u going through a candidate vertex x, the weight of that path would be D[x] + d_G(x,u) < D[u]. Since d_G(x,u) \geq 0 we would have D[x] < D[u], and that’s a contradiction since by assumption D[u] is minimal among all candidate vertices.

We need now to consider how to maintain the second part of the invariant. Consider a candidate vertex v. Since u becomes a visited vertex in the current iteration, we may have a new shortest path from s to v going only through visited vertices that now include u. A simple reasoning shows that if such shortest path includes u then u must be the immediate predecessor of v in the path (assuming the contrary leads to a contradiction).

It follows that D[v] can change if and only if v is a successor of u; in particular, if D[v] >D[u] + w\ , where w is the weight of the edge (u,v), then D[v] must be updated D[v] := D[u] + w. The condition above is never true if v has already been visited.

See a run of Dijkstra’s algorithm!

How to get the paths?

To compute the shortest paths (not just their weights) we can use the following key observation:

  • if the shortest path from s to u goes through x then the subsequence going from s to x must be a shortest path from s to x.

We can hence compute an (implicit) shortest paths tree.

#include<queue>
#include<limits.h>

using vertex = int;
using graph = vector<vector<pair<vertex,double>>>;
using P = pair<double,vertex>;
using Pvector = pair<vector<int>,vector<int>>

Pvector dijkstra(const graph& G, vertex s) {
  int n = G.size();
  d = vector<double>(n, INT_MAX);
  d[s] = 0;
  p = vector<int>(n, -1);
  vector<boolean> visited(n, false);
  priority_queue<P, vector<P>, greater<P>> Q;
  Q.push({0, s});
  while (not Q.empty()) {
    int u = Q.top().second; 
    Q.pop();
    if (not visited[u]) {
      visited[u] = true;
      for (auto a : G[u]) {
        vertex v = a.first;
        double c = a.second;
        if (d[v] > d[u] + c) {
          d[v] = d[u] + c;
          p[v] = u;
          Q.push({d[v], v});
        } 
      } 
    } 
  } 
  return {d,p};
}
1
for the adjacency list of a weighted directed graph
2
distance, vertex
3
d stores all the distances from s
4
p stores implicitly a shortest path tree
  • Why in a P the weight comes first?
  • What would happen if we used priority_queue<P> Q;?
  • When we do Q.push({d[v],v}); the element {k,v} (for some k) was already in the priority queue, why there is no need to remove it?

Cost of Dijkstra’s Algorithm

If all operations in the priority queue have cost \mathcal O(\log n), then the cost of dijkstra is

\begin{align*} \text{cost}(n,m)&=\sum_{v\in V}(\mathcal O(\log n) + (1+\textsf{out-degree}(v))) \\ &=\mathcal O(n\log n)+\mathcal O(\log n)\sum_{v\in V}\textsf{out-degree}(v)\\ &=\mathcal O(n\log n)+\mathcal O(m\log n)=\mathcal O((n+m)\log n)\ . \end{align*}

The cost of the initializations is \mathcal O(n\log n). Then the total cost is \mathcal O((n+m)\log n).

The worst-case is actually \Theta((n+m)\log n) but the cost in practice is usually smaller.

Jutge

Minimum Spanning Trees

Given a weighted undirected connected graph G= (V,E), with weights w: E \to \mathbb R, a minimum spanning tree (MST) (cat: arbre d’expansió mínim) of G is a spanning tree T with total weight w(T)=\sum_{e\in E(T)}w(e) minimum among all possible spanning trees of G.

There are many different algorithms to compute MSTs.

All them follow the following greedy scheme.



A \gets \emptyset \qquad(A are the edges in the tree so far)
C \gets E \qquad(C are the candidates edges to add to the tree)
while |A|\neq |V(G)|−1 do
\qquad Select an edge e\in C that does not close a cycle in T=(V,A)
\qquad A\gets A\cup\{e\}
\qquad C \gets C - \{e\}
end while

Definition 1 (Promising set of edges) A subset of edges A\subseteq E(G) is promising if and only if

  1. A does not contains a cycle
  2. A is a subset of the edges of some MST of G

Definition 2 (Cut in a Graph) A cut in a graph G is a partition of the set of vertices V in the two subsets C, and its complement V-\{C\}. An edge e respects a cut if both ends of e belong to the same set, otherwise, we say that e crosses the cut.

MST Theorem

Theorem 1 (MST Theorem) Let A be a promising set of edges that respects some cut of G. Let e be an edge of minimum weight among those crossing the cut. Then A\cup\{e\} is promising.

Proof

Let A' be the set of edges of some MST T' such that A\subseteq A'(this is well defined, as A is assumed to be promising). As A respects the cut there must exist at least one edge e'\in A' crossing the cut, otherwise T' would not be connected.

If e' is of minimum weight then A\cup\{e'\} is promising and we are done.

Otherwise, if to T we add the edge e we get a cycle crossing the cut in e and say e''. Exchanging in T e'' for e we get a new spanning tree of smaller weight. Hence also a MST.

The MST theorem gives us a recipe to design MST algorithms:

  • start with an empty set of edges A;
  • for each iteration, define what is the cut of G, then select the edge with minimum weight among those crossing the cut and add the edge to A.

Since A respects the cut and e crosses it, the addition of e to A cannot create a cycle. Any algorithm following the scheme above is automatically guaranteed correct.

Jarník-Prim’s Algorithm

Jarník-Prim algorithm is a greedy algorithm that given an input a weighted undirected connected graph finds a minimum spanning tree in it.

In this algorithm we maintain a subset of visited vertices;

  • each iteration of the algorithm selects the edge of minimum weight that joins a visited vertex u with a non-visited vertex v.

Such edge cannot create a cycle, and it is added to the current set of edges for the spanning tree; furthermore vertex v is marked as visited.

Visualize Jarník-Prim’s algorithm

#include<vector>
#include<queue>

using edge = pair<int,int>;
using wedge = pair<double,edge>;//weighted edge
using graph = vector<list<wedge>>;

void Prim(const graph& G, list<edge>& MST, double& MST_weight) {
  int n = G.size();
  vector<bool> visited(n, false);
  priority_queue<wedge,vector<wedge>,greater<wedge>> Candidates;

  visited[0] = true;
  MST_weight = 0.0;

  for (wedge e : G[0]) Candidates.push(e);

  while (MST.size() < n-1) {
    int weight = Candidates.top().first;
    edge uv = Candidates.top().second;
    Candidates.pop();
    int v = uv.second;
    if (not visited[v]) {
      visited[v] = true;
      MST.push_back(uv); 
      MST_weight += weight;
      for (wedge e : G[e]) Candidates.push(e);
    }
  }
}

The cost of Jarník-Prim’s algorithm depends how do we implement the selection of the minimum weight edge. If we use a priority queue we get as total cost \Theta(m\log n).

Jutge

Recap pre-partial exam

April 3rd: partial exam on the following topics

  • asymptotic notation
  • cost of iterative and recursive algorithms
  • Master Theorems
    • additive recurrences
    • multiplicative recurrences
  • binary search
  • mathematical algorithms
    • Strassen
    • Karatsuba
  • sorting algorithms
    • selection-sort
    • insertion-sort
    • merge-sort
    • quick-sort (and quickselect)
  • Hash tables
  • Binary Search Trees
    • AVL trees

Questions?