Graphs come up all over computer science. They're models of social networks (who knows whom; who emailed whom), maps (which roads intersect, where do highways go), and call graphs (which function calls which).
Note that "graphs" in computer science typically mean things that have nodes and edges, not bar charts or pie graphs or histograms or other graphs that people use to summarize statistics.
Graphs are made up of nodes and edges.
A node or vertex or point is a thing in the graph.
An edge or arrow or link or transition connects two nodes. Edges are relationships between things.
In a directed graph a/k/a a digraph, edges are interpreted directionally, i.e., a node $a$ may relate to a node $b$, but $b$ may not relate back to $a$.
We can model a directed graph $G$ as a pair of sets, $V$ and $E$, where $V$ is a set of nodes and $E \subseteq V \times V$.
That is, in a directed graph, edges are merely a binary relation on nodes.
Note that we could have a graph with no edges (i.e., $E = \emptyset$); we could also have a complete graph, where $E = V \times V$, i.e., every node is related to every other node.
In a directed graph, edges $E$ are a binary relation on vertices $V$. So you can have an edge from $a$ to $b$ (i.e., $(a,b) \in E$) and one from $b$ to $a$ (i.e., $(b,a) \in E$), but that's it: you can't have two different edges from $a$ to $b$ or back.
By making edges into a relation, i.e., a set of pairs, there can be no multiple edges in directed graphs. Graphs that allow multiple edges between the same pair of nodes do exist, and people do care about them; they're called multigraphs. We won't talk about them more.
If a node has an edge to itself, we call that edge a self loop. A self loop on a node $a$ corresponds to $(a,a) \in E$. Sometimes people only want to consider graphs without self loops; if so, they'll specify.
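As a concrete (if hypothetical) illustration of this modeling choice, here is a directed graph as a pair of Python sets, with a helper that finds self loops:

```python
# A sketch of a directed graph as a pair of sets:
# V is the set of nodes, E is a binary relation on V.
V = {"a", "b", "c"}
E = {("a", "b"), ("b", "a"), ("c", "c")}  # ("c", "c") is a self loop

# Because E is a set of pairs, duplicate edges are impossible by construction.
assert E <= {(u, v) for u in V for v in V}  # E is a subset of V x V

def self_loops(V, E):
    """Nodes a with (a, a) in E."""
    return {v for v in V if (v, v) in E}
```

Note that both $(a,b)$ and $(b,a)$ fit in the set at once, but a second copy of $(a,b)$ cannot.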
Undirected graphs don't care about edge direction: if $a$ is related to $b$, then $b$ is related to $a$. It's possible to model undirected graphs as a relation $E \subseteq V \times V$ where $E$ is symmetric.
An undirected graph with no self loops is called a simple graph. We can define them formally by saying that $E$ is symmetric but irreflexive or antireflexive, i.e., $(a,a) \not\in E$ for every $a \in V$.
Every directed graph has an underlying undirected one: just erase the arrows on the edges, i.e., symmetrize the relations.
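The "erase the arrows" step is just closing the edge relation under symmetry; a one-function sketch:

```python
def underlying_undirected(E):
    """Symmetrize a directed edge relation: keep every edge in both directions."""
    return E | {(v, u) for (u, v) in E}
```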
The degree of a node is the number of edges going in or out. People also talk about in-degree or out-degree to count the number of edges going into or out of a node, respectively.
Formally:
\[ \newcommand{\deg}{\operatorname{deg}} \newcommand{\din}{\deg_\mathsf{in}} \newcommand{\dout}{\deg_\mathsf{out}} \din(v) = |\{ v' \mid (v',v) \in E \}| \hspace{1.5em} \dout(v) = |\{ v' \mid (v,v') \in E \}| \hspace{1.5em} \deg(v) = \din(v) + \dout(v) \]
People will also write $\delta(v)$ to mean $\deg(v)$, and $\delta^+(v)$ to mean $\dout(v)$ and $\delta^-(v)$ to mean $\din(v)$.
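The formal definitions above translate directly into Python over the edge relation (a sketch, assuming the set-of-pairs representation):

```python
def deg_in(v, E):
    """|{v' : (v', v) in E}|"""
    return len({u for (u, w) in E if w == v})

def deg_out(v, E):
    """|{v' : (v, v') in E}|"""
    return len({w for (u, w) in E if u == v})

def deg(v, E):
    """deg(v) = deg_in(v) + deg_out(v); a self loop counts once in each."""
    return deg_in(v, E) + deg_out(v, E)
```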
A path in a graph is a sequence of edges that link up end to end, i.e., zero or more edges $(v_1, v_2), (v_2, v_3), \dots, (v_{n-1}, v_n)$ is a path from $v_1$ to $v_n$. People might also write the path in terms of the nodes it traverses, i.e., $v_1 v_2 v_3 \dots v_{n-1} v_n$. A path is simple if no node occurs more than once in such a list.
There is such a thing as the empty path, which---depending on who you're talking to---may differ from a path that never leaves a given node.
A node $u$ is reachable from $v$ if there exists a path from $v$ to $u$. It's a theorem that if $u$ is reachable from $v$, then there exists a simple path from $v$ to $u$. People say that every node is reachable from itself.
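Reachability is easy to compute by following edges outward from the start node; here is a sketch (a standard graph search, not something defined in these notes). Note that it reports each node as reachable from itself:

```python
def reachable(v, E):
    """The set of nodes reachable from v, following edge direction."""
    seen, frontier = {v}, [v]
    while frontier:
        u = frontier.pop()
        for (x, y) in E:
            if x == u and y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen
```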
A circuit or a cycle is a path that ends where it starts, i.e., the whole path is a loop. Circuits are never simple paths. There is, however, the concept of a simple circuit, which is a circuit that doesn't repeat any node other than its first/last one.
Note that you can have loops in other places on a path, but to be a circuit or cycle you need the whole path to start and end at the same place. A path with a loop on the end is sometimes called a lasso.
People care about a variety of special graphs.
Given a relation $R \subseteq A \times A$, we can interpret $R$ as a directed graph where elements of $A$ are the nodes and $R$ gives us the edges.
One might also restrict the nodes to those elements of $A$ that ever get mentioned, saying that the vertices of the graph are $V = \{ a \in A \mid \exists b \in A, (a,b) \in R \vee (b,a) \in R \}$. Note that $R \subseteq V \times V$ by definition.
Using graphs to think about relations yields nice insights, e.g., reflexive relations have self-loops at every node in the graph.
A graph is connected (or weakly connected) when every node is reachable from every other node, ignoring direction. A directed graph is connected iff its underlying undirected graph is connected.
A directed graph is strongly connected when every node is reachable from every other node, respecting direction.
The weak/strong distinction only matters for directed graphs.
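Under the set-of-pairs representation, both notions reduce to reachability checks; here is a sketch (`reach` is the same search as before, repeated so this stands alone):

```python
def reach(v, E):
    """Everything reachable from v, following edge direction."""
    seen, frontier = {v}, [v]
    while frontier:
        u = frontier.pop()
        for (x, y) in E:
            if x == u and y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

def strongly_connected(V, E):
    """Every node reaches every other node, respecting direction."""
    return all(reach(v, E) == set(V) for v in V)

def weakly_connected(V, E):
    """Connected once arrow directions are erased (symmetrized)."""
    sym = E | {(y, x) for (x, y) in E}
    return bool(V) and reach(next(iter(V)), sym) == set(V)
```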
We say a graph is bipartite when (a) we can partition its vertices $V$ into two disjoint sets $V_1$ and $V_2$ such that $V = V_1 \cup V_2$ and $V_1 \cap V_2 = \emptyset$, and (b) edges only go from one partition to the other, i.e., $E \subseteq (V_1 \times V_2) \cup (V_2 \times V_1)$.
To put it another way: you should be able to divide the vertices into two teams, with relationships happening only between the teams, never within them.
One common use of bipartite graphs is to model graphs on heterogeneous vertices, e.g., let $V_1 = \texttt{list}(\mathbb{N})$ and $V_2 = \mathbb{N}$, and say $E = \{ (l, \texttt{length}(l)) \mid l \in V_1 \}$.
Another common use of bipartite graphs is for matching, i.e., pairing up members of one group with another.
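A standard way to test whether a graph is bipartite (a technique assumed here, not defined in these notes) is to try to 2-color the nodes so every edge crosses between the two colors:

```python
def is_bipartite(V, E):
    """Try to 2-color V so that every edge joins the two color classes.
    Treats E as undirected by adding each edge in both directions."""
    adj = {v: set() for v in V}
    for (u, w) in E:
        adj[u].add(w)
        adj[w].add(u)
    color = {}
    for start in V:          # handle each connected component
        if start in color:
            continue
        color[start] = 0
        frontier = [start]
        while frontier:
            u = frontier.pop()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]  # neighbors get the other team
                    frontier.append(w)
                elif color[w] == color[u]:
                    return False             # edge inside a team: not bipartite
    return True
```

If the coloring succeeds, the two color classes are exactly the $V_1$ and $V_2$ of the definition.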
A graph is complete when every vertex is connected to every other vertex. The complete graph on $i$ nodes, with an edge between every pair of distinct nodes, is named $K_i$.
People are typically using simple graphs when they talk about complete graphs.
There are also complete bipartite graphs $K_{i,j}$ where $V_1$ has $i$ nodes and $V_2$ has $j$ nodes, and every node from $V_1$ hits every node from $V_2$.
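Both families are easy to build in the set-of-pairs representation; a sketch (keeping in mind that this representation stores each undirected edge as two ordered pairs, so $K_n$ yields $n(n-1)$ pairs and $K_{i,j}$ yields $2ij$):

```python
def complete(n):
    """K_n: edges between every pair of distinct nodes (no self loops)."""
    V = set(range(n))
    return V, {(u, v) for u in V for v in V if u != v}

def complete_bipartite(i, j):
    """K_{i,j}: every node of V1 hits every node of V2 (and back)."""
    V1 = {("L", a) for a in range(i)}  # tag the sides so they stay disjoint
    V2 = {("R", b) for b in range(j)}
    cross = {(u, v) for u in V1 for v in V2}
    return V1, V2, cross | {(v, u) for (u, v) in cross}
```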
A planar graph is a graph that can be drawn with no edge crossings, i.e., a graph that can be drawn on the plane.
Planar graphs are good models of subdivisions of a map into, e.g., states, provinces, countries, ecological zones.
A graph is a tree if it is connected and there are no undirected circuits in the graph.
We've already seen trees: every inductive data type is a tree, which may explain some of our early notations for drawing data types.
One may designate a certain vertex as the root of a tree. A tree is $m$-ary if nodes have no more than $m$ children. A tree is a full $m$-ary tree if every non-leaf node has exactly $m$ children.
A directed acyclic graph a/k/a a DAG is a directed graph where there are no directed circuits in the graph. DAGs are a very useful modeling tool, as they capture the idea of a "dependency chart". For example, we might have the following recipe for peanut butter and jelly:
Steps 1, 2, and 3 depend on step 0. Steps 2 and 3 depend on step 1. Step 4 depends on both 2 and 3, and step 5 depends on step 4. Step 6 depends on step 5 (or maybe step 4, depending on your PBJ persuasion).
We can model this as a DAG, where arrows all point downwards:
Notice that this graph isn't a tree, since 0-1-2-0 forms an undirected circuit. But it is acyclic if you consider edge directions. Any strict ordering relation will have a DAG as its relational graph.
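The payoff of the dependency-chart view is that a DAG can always be flattened into a step-by-step order. Here is a sketch that encodes the PBJ dependencies described above as edges (an edge $(a,b)$ meaning step $b$ depends on step $a$) and recovers a valid order with Kahn's algorithm, a standard technique not covered in these notes:

```python
# PBJ dependency DAG: (a, b) means step b depends on step a.
E = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6)}
V = set(range(7))

def topo_order(V, E):
    """Kahn's algorithm: repeatedly peel off a node with no incoming edges."""
    indeg = {v: 0 for v in V}
    for (_, w) in E:
        indeg[w] += 1
    order = []
    ready = [v for v in V if indeg[v] == 0]  # steps with no prerequisites
    while ready:
        u = ready.pop()
        order.append(u)
        for (x, y) in E:
            if x == u:
                indeg[y] -= 1
                if indeg[y] == 0:
                    ready.append(y)
    assert len(order) == len(V), "graph has a directed circuit"
    return order
```

Every prerequisite ends up before the steps that depend on it; if the graph had a directed circuit, no such order would exist, which is exactly why dependency charts must be DAGs.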