We've defined relations like $\le$ in Coq... what are they like in mathematics?
The conventional account of relations is that they are sets: a unary relation or "property" is just the set of elements that satisfy the relation; a binary relation is a set of pairs; and an $n$-ary relation is a set of $n$-tuples.
So we might say, concretely, that unary relation $\mathsf{isPrime}$ is just the set $\{ 2, 3, 5, 7, 11, \dots \}$, while the binary relation $\ge$ is the set $\{ (0,0), (1, 0), (2, 0), \dots, (1, 1), (2, 1), \dots \}$. We've already seen ternary relations: say, the mystery relation $R$ from day 16.
Here's an example of a binary relation we can use throughout: $B \subseteq \mathsf{RPS} \times \mathsf{RPS}$ defines the "beats" relation for rock/paper/scissors:
\[ B = \{ (\mathsf{rock}, \mathsf{scissors}), (\mathsf{scissors}, \mathsf{paper}), (\mathsf{paper}, \mathsf{rock}) \} \]
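In Coq, we might render $B$ as an inductive proposition---a sketch, with illustrative type and constructor names (not necessarily the ones used earlier in the course):
\begin{verbatim}
Inductive rps : Type := rock | paper | scissors.

(* beats b1 b2 means "b1 beats b2"; one constructor per tuple of B. *)
Inductive beats : rps -> rps -> Prop :=
| beats_rock_scissors  : beats rock scissors
| beats_scissors_paper : beats scissors paper
| beats_paper_rock     : beats paper rock.
\end{verbatim}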
For a unary relation $R$---a relation on one thing---we might write:
\[ R(x) \qquad \text{or} \qquad x \in R \]
to say that the relation holds of $x$. For a binary relation $S$, we might write:
\[ x \mathrel{S} y \qquad \text{or} \qquad (x,y) \in S \]
to say that $x$ is related to $y$ in $S$. The tuple membership notation using $\in$ is universally understood, but there are many different notations people use.
Concretely, we might say that $\mathsf{rock} \mathrel{B} \mathsf{scissors}$ or $(\mathsf{rock}, \mathsf{scissors}) \in B$ to say that $\mathsf{rock}$ beats $\mathsf{scissors}$.
To get nicer typesetting, use \verb|\mathrel{P}| to tell $\LaTeX$ to interpret $P$ as a relation. Compare the spacing: \verb|x P y| gives $x P y$, while \verb|x \mathrel{P} y| gives $x \mathrel{P} y$. You can even use a macro, writing \verb|\newcommand{\P}{\mathrel{P}}| to define a new command \verb|\P|. Check out \verb|\mathbin| (for binary operators, like $+$) and \verb|\mathord| (to treat an operator or relation as an ordinary symbol), too.
Each relation has a type; whenever you introduce a relation, you should give its type.
Unary relations are essentially subsets, i.e., $\mathsf{isPrime} \subseteq \mathbb{N}$ and $\mathsf{isPrime} \in \mathcal{P}(\mathbb{N})$ are two ways to give a type to $\mathsf{isPrime}$.
Binary (and on through higher arity) relations are subsets of a product set. So $\mathord{\ge} \subseteq \mathbb{N} \times \mathbb{N}$, and the mystery relation $R \subseteq \mathbb{N} \times \mathbb{N} \times \mathbb{N}$. Equivalently, we might say that $\mathord{\ge} \in \mathcal{P}(\mathbb{N} \times \mathbb{N})$ or that $R \in 2^{\mathbb{N} \times \mathbb{N} \times \mathbb{N}}$.
I gently prefer the $R \subseteq \dots$ notation to the various powerset notations.
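In Coq, giving a relation its type looks like giving it a type ending in \texttt{Prop}. A sketch (the standard library's \texttt{Coq.Relations.Relation\_Definitions} defines \texttt{relation} this way; \texttt{as\_pairs} is just an illustrative name for the subset-of-a-product reading):
\begin{verbatim}
Definition relation (A : Type) : Type := A -> A -> Prop.

(* A relation on A determines a predicate on pairs, i.e., a "subset" of A * A. *)
Definition as_pairs {A : Type} (R : relation A) : A * A -> Prop :=
  fun p => R (fst p) (snd p).
\end{verbatim}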
In Coq, we use inductive definitions to define relations. How does that work here? Let's recall the definition of \texttt{sorted}:
\begin{verbatim}
Inductive sorted: list nat -> Prop :=
| sorted_nil : sorted []
| sorted_1 (x:nat) : sorted [x]
| sorted_cons (x y:nat) (l:list nat) (Hxy : x <= y) (H : sorted (y::l)) : sorted (x::y::l).
\end{verbatim}
We can turn the inductive Coq definition into an inductively defined unary relation $\newcommand{\sorted}{\textsf{sorted}}\sorted \subseteq \texttt{list}(\mathbb{N})$ as follows:
\begin{itemize}
\item (\texttt{sorted\_nil}) $\newcommand{\nil}{\texttt{[]}}\nil \in \sorted$
\item (\texttt{sorted\_1}) $\texttt{[$x$]} \in \sorted$ for all $x \in \mathbb{N}$
\item (\texttt{sorted\_cons}) if $y::l \in \sorted$ and $x \le y$ then $x::y::l \in \sorted$
\end{itemize}
Notice how the axioms \texttt{sorted\_nil} and \texttt{sorted\_1} just become assertions about some members of the $\sorted$ relation, but the inference rule \texttt{sorted\_cons} turns into an if/then statement about membership.
We could also use function notation, writing:
\begin{itemize}
\item (\texttt{sorted\_nil}) $\sorted(\nil)$
\item (\texttt{sorted\_1}) $\sorted(\texttt{[$x$]})$ for all $x \in \mathbb{N}$
\item (\texttt{sorted\_cons}) if $\sorted(y::l)$ and $x \le y$ then $\sorted(x::y::l)$
\end{itemize}
Or we could even just use English: the empty list is sorted; any single-element list is sorted; and if $x \le y$ and $y::l$ is sorted, then so is $x::y::l$.
All three of these approaches are good, common ways to define relations.
What about a binary relation, like $\newcommand{\perm}{\textsf{permutation}}\perm$? The idea is the same, but the syntax changes to account for being a binary relation. First, recall the Coq definition:
\begin{verbatim}
Inductive Permutation {A : Type} : list A -> list A -> Prop :=
| perm_nil: Permutation [] []
| perm_skip (x:A) (l l':list A) (H : Permutation l l') : Permutation (x::l) (x::l')
| perm_swap (x y:A) (l:list A) : Permutation (y::x::l) (x::y::l)
| perm_trans (l1 l2 l3 : list A) (H12 : Permutation l1 l2) (H23 : Permutation l2 l3)
    : Permutation l1 l3.
\end{verbatim}
Here's a way to define it directly using tuples, as a relation $\perm \subseteq \texttt{list}(A) \times \texttt{list}(A)$:
\begin{itemize}
\item (\texttt{perm\_nil}) $(\nil, \nil) \in \perm$
\item (\texttt{perm\_skip}) if $(l, l') \in \perm$ then $(x::l, x::l') \in \perm$
\item (\texttt{perm\_swap}) $(y::x::l, x::y::l) \in \perm$
\item (\texttt{perm\_trans}) if $(l_1, l_2) \in \perm$ and $(l_2, l_3) \in \perm$ then $(l_1, l_3) \in \perm$
\end{itemize}
It's not uncommon for binary relations to use infix notation. We might define a relation $\newcommand{\P}{\mathrel{P}}\mathord{\P} \subseteq \texttt{list}(A) \times \texttt{list}(A)$ for any set $A$ using the same rules, written infix:
\begin{itemize}
\item $\nil \P \nil$
\item if $l \P l'$ then $x::l \P x::l'$
\item $y::x::l \P x::y::l$
\item if $l_1 \P l_2$ and $l_2 \P l_3$ then $l_1 \P l_3$
\end{itemize}
Here, we'd read $x \P y$ as "$x$ permutes $y$" or "$x$ is a permutation of $y$".
We've already seen several properties of relations---let's meet their formal, set-theoretic definitions.
For the first three of these properties, we'll talk about relations "on" a set, i.e., relations which relate a set to itself, i.e., $R \subseteq A \times A$.
A relation $R \subseteq A \times A$ is reflexive when it relates every element of $A$ to itself. Here are some examples of reflexive relations: $=$, $\le$, and $\ge$ on $\mathbb{N}$, and $\perm$ on lists (every list is a permutation of itself).
And here are some relations that aren't reflexive: $<$, $>$, $\ne$, and our beats relation $B$ (nothing beats itself).
Formally, we say $R \subseteq A \times A$ is reflexive iff $\forall a \in A, (a,a) \in R$.
A relation $R \subseteq A \times A$ is symmetric when you can swap places. Here are some examples of symmetric relations: $=$ and $\ne$ on $\mathbb{N}$, and $\perm$ on lists.
And here are some relations that aren't symmetric: $\le$, $<$, and $B$ ($\mathsf{rock}$ beats $\mathsf{scissors}$, but not vice versa).
Formally, we say $R \subseteq A \times A$ is symmetric iff $\forall a, b \in A, (a,b) \in R \Rightarrow (b,a) \in R$. Notice that we don't bother writing an if-and-only-if---one direction is enough, because $a$ and $b$ range over arbitrary elements from the set $A$.
A relation $R \subseteq A \times A$ is antisymmetric when symmetry only occurs for equal things. That is, formally, $\forall a, b \in A, (a,b) \in R \wedge (b,a) \in R \Rightarrow a = b$. Partial orders like $\le$ and $\ge$ are classic examples of antisymmetric relations.
A relation $R \subseteq A \times A$ is transitive when you can combine hops. Here are some examples of transitive relations: $=$, $\le$, and $<$ on $\mathbb{N}$, and $\perm$ on lists.
And here are some relations that aren't transitive: $\ne$ ($0 \ne 1$ and $1 \ne 0$, but not $0 \ne 0$) and $B$ ($\mathsf{rock}$ beats $\mathsf{scissors}$ and $\mathsf{scissors}$ beats $\mathsf{paper}$, but $\mathsf{rock}$ doesn't beat $\mathsf{paper}$).
Formally, we say $R \subseteq A \times A$ is transitive iff $\forall a, b, c \in A, (a,b) \in R \wedge (b,c) \in R \Rightarrow (a,c) \in R$.
Finally, a relation $R \subseteq A \times B$ is deterministic when everything is related to at most one thing. Here are some examples of deterministic relations: $=$, the successor relation $\{ (n, S(n)) \mid n \in \mathbb{N} \}$, and our beats relation $B$ (each move beats exactly one other move).
And here are some that are not deterministic: $<$, $\ne$, and $\perm$.
Formally, we say that $R \subseteq A \times B$ is deterministic when: \[ \forall a \in A, b_1, b_2 \in B, (a, b_1) \in R \wedge (a, b_2) \in R \Rightarrow b_1 = b_2. \]
We can talk about higher arity relations being deterministic, too: we treat the last position specially. So $Q \subseteq A \times B \times C \times D$ is deterministic when $(a,b,c,d_1) \in Q$ and $(a,b,c,d_2) \in Q$ implies $d_1 = d_2$.
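In Coq, we'd phrase these properties as predicates on relations. Here's a sketch (names are ours; the standard library's \texttt{Coq.Relations.Relation\_Definitions} has similar definitions for the first four):
\begin{verbatim}
Definition reflexive {A : Type} (R : A -> A -> Prop) : Prop :=
  forall a : A, R a a.

Definition symmetric {A : Type} (R : A -> A -> Prop) : Prop :=
  forall a b : A, R a b -> R b a.

Definition antisymmetric {A : Type} (R : A -> A -> Prop) : Prop :=
  forall a b : A, R a b -> R b a -> a = b.

Definition transitive {A : Type} (R : A -> A -> Prop) : Prop :=
  forall a b c : A, R a b -> R b c -> R a c.

Definition deterministic {A B : Type} (R : A -> B -> Prop) : Prop :=
  forall (a : A) (b1 b2 : B), R a b1 -> R a b2 -> b1 = b2.
\end{verbatim}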
You might be surprised to see equality in the list of deterministic relations. How can equality be deterministic when $4 = 2 + 2$ but $4 = 3 + 1$?
The real question here is: what even is equality? Equality on the natural numbers is defined as:
\[ E \doteq \{ (n,n) \mid n \in \mathbb{N} \} \]
where we say $n = m$ iff $(n,m) \in E$. (We write $\doteq$ in the definition to emphasize the difference between the equals sign doing the defining and the $=$ symbol being defined.)
If you wanted to talk about equality on some other set---say, the set of arithmetic expressions that evaluate to the same answer---then indeed that notion of equality is not deterministic.
Given a relation $R \subseteq A \times B$, we can talk about its inverse, written $R^{-1} \subseteq B \times A$, defined as $R^{-1} = \{ (b,a) \mid (a,b) \in R \}$. To put it another way:
\[ a \mathrel{R} b \Leftrightarrow b \mathrel{R^{-1}} a \]
Concretely, the inverse of $\le$ is $\ge$, i.e., $a \ge b$ iff $b \le a$. The inverse of our $B$ relation above is:
\[ B^{-1} = \{ (\mathsf{scissors},\mathsf{rock}), (\mathsf{paper},\mathsf{scissors}), (\mathsf{rock},\mathsf{paper}) \} \]
We might call $B^{-1}$ the "loses to" relation.
The notation here is an analogy to $x^{-1} = \frac{1}{x}$, i.e., the multiplicative inverse of $x$. We'll learn more about inverses when we look at inverses of functions.
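In Coq, taking the inverse is just flipping the arguments; a sketch (\texttt{inverse} is our own name here, and the standard library's \texttt{flip} from \texttt{Coq.Program.Basics} captures essentially the same idea):
\begin{verbatim}
Definition inverse {A B : Type} (R : A -> B -> Prop) : B -> A -> Prop :=
  fun b a => R a b.
\end{verbatim}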
Relations can be composed or chained together. Suppose we had the following two relations: $R \subseteq \mathsf{City} \times \mathsf{Dish}$ and $S \subseteq \mathsf{Dish} \times \mathsf{Ingredient}$. We define them as:
\[ \begin{array}{rcl} R &=& \{ (\mathsf{Vienna}, \mathsf{schnitzel}), (\mathsf{Philadelphia}, \mathsf{cheesesteak}), (\mathsf{Puebla}, \mathsf{al\ pastor}) \} \\ S &=& \{ (\mathsf{schnitzel}, \mathsf{veal}), (\mathsf{schnitzel}, \mathsf{pork}), \\ && \phantom{\{} (\mathsf{cheesesteak}, \mathsf{wiz}), (\mathsf{cheesesteak}, \mathsf{provolone}), (\mathsf{hargao}, \mathsf{shrimp}) \} \\ \end{array} \]
We can compose these two relations together to get a relation between cities and ingredients: think of it as using the dish as a transitive "hop", a mathematical layover flight. We'd compose them to find a new relation:
\[ T = \{ (\mathsf{Vienna}, \mathsf{veal}), (\mathsf{Vienna}, \mathsf{pork}), (\mathsf{Philadelphia}, \mathsf{wiz}), (\mathsf{Philadelphia}, \mathsf{provolone}) \} \]
We say that $T = R;S$ or $T = S \circ R$. These differing notations with differing orderings are indeed confusing. Here the types of $R$ and $S$ make it unambiguous---there's only one way to compose these two relations. If the types make it ambiguous, always check which way folks mean! I tend to prefer the $R;S$ notation, but we'll meet $\circ$ again.
Formally, we define composition as:
\[ R;S = \{ (a,c) \mid (a,b) \in R, (b, c) \in S \} \]
Use this definition to check for yourself that $T = R;S$. Notice that some things are left unrelated: $S$ doesn't relate $\mathsf{al\ pastor}$ to anything, and $R$ doesn't relate anything to $\mathsf{hargao}$... so $\mathsf{Puebla}$ and $\mathsf{shrimp}$ are left out of $T$.
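Here's a sketch of relation composition in Coq (the name \texttt{comp} and the propositional encoding of relations are our own choices):
\begin{verbatim}
(* R ; S : hop from a to c through some intermediate b. *)
Definition comp {A B C : Type}
  (R : A -> B -> Prop) (S : B -> C -> Prop) : A -> C -> Prop :=
  fun a c => exists b : B, R a b /\ S b c.
\end{verbatim}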
It's not uncommon to have a "starter" relation that one defines and then modifies in some way. Since relations are just sets, all of the set operations apply---union, intersection, difference, and restriction (a/k/a set builder notation, $\{(a,b) \in R \mid P(a, b)\}$) are all meaningful and useful things to do to relations.
There are three so-called closure operations that take a relation and produce a similar relation that also enjoys a useful relational property.
Given a relation $R \subseteq A \times A$, we can make it reflexive by taking the reflexive closure, i.e., taking $R$ and adding in some tuples to make sure that our new relation is reflexive.
For example, the reflexive closure of $<$ is $\le$ and the reflexive closure of $>$ is $\ge$. The reflexive closure of $\ne$ is the total relation---the relation that relates everything.
Formally, we define the reflexive closure of $R \subseteq A \times A$ as $R \cup \{ (a,a) \mid a \in A \}$.
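In Coq, we could sketch the reflexive closure as an inductive relation (constructor names here are illustrative):
\begin{verbatim}
Inductive refl_closure {A : Type} (R : A -> A -> Prop) : A -> A -> Prop :=
| rc_orig (a b : A) (H : R a b) : refl_closure R a b   (* everything in R   *)
| rc_refl (a : A)               : refl_closure R a a.  (* plus the diagonal *)
\end{verbatim}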
People sometimes call $\Delta_A = \{ (a,a) \mid a \in A \}$ the diagonal relation on $A$. Why?
One way to visualize a relation $R \subseteq A \times A$ is as a (possibly infinite) matrix. Let the rows and columns correspond to elements of $A$ in some fixed order; the entry in the $i$th row and the $j$th column is $\top$ iff $(a_i, a_j) \in R$.
Here's how we'd write $B$ for $\mathsf{RPS}$:
\[ \begin{array}{c|ccc} & \mathsf{rock} & \mathsf{paper} & \mathsf{scissors} \\ \hline \mathsf{rock} & \bot & \bot & \top \\ \mathsf{paper} & \top & \bot & \bot \\ \mathsf{scissors} & \bot & \top & \bot \\ \end{array} \]
Notice that $\mathsf{paper}$ beats $\mathsf{rock}$, so the row for $\mathsf{paper}$ has $\top$ in the column for $\mathsf{rock}$.
Now, suppose we have some arbitrary set $A$. Here's $\Delta_A$:
\[ \begin{array}{c|cccc} & a_1 & a_2 & a_3 & \dots \\ \hline a_1 & \top & \bot & \bot & \dots \\ a_2 & \bot & \top & \bot & \dots \\ a_3 & \bot & \bot & \top & \dots \\ \vdots & \vdots & \vdots & \vdots & \ddots \\ \end{array} \]
The diagonal of the matrix for $\Delta_A$ is all $\top$ and everything else is $\bot$!
Question: if $R$ is already reflexive, does taking the reflexive closure produce the same relation or a different one?
Question: does it make sense to take the reflexive closure of a relation $R \subseteq A \times B$?
Given a relation $R \subseteq A \times A$, we can generate a similar relation that enjoys symmetry by adding in $R$'s inverse. For example, the symmetric closure of $\le$ is the total relation; the symmetric closure of $<$ is $\ne$.
Formally, we define the symmetric closure of a relation $R \subseteq A \times A$ as $R \cup R^{-1}$.
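A Coq sketch of the symmetric closure, in the same style as before (names illustrative):
\begin{verbatim}
Inductive sym_closure {A : Type} (R : A -> A -> Prop) : A -> A -> Prop :=
| sc_orig (a b : A) (H : R a b) : sym_closure R a b   (* everything in R  *)
| sc_flip (a b : A) (H : R b a) : sym_closure R a b.  (* plus R's inverse *)
\end{verbatim}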
Question: how would you prove that the symmetric closure produces symmetric relations?
Question: what is the symmetric closure of $B \subseteq \mathsf{RPS} \times \mathsf{RPS}?$
Question: if $R$ is already symmetric, does taking the symmetric closure produce the same relation or a different one?
Question: does it make sense to take the symmetric closure of a relation $R \subseteq A \times B$?
Finally, we can take a relation $R \subseteq A \times A$ and produce a similar relation that's transitive by iteratively composing $R$ with itself, i.e., taking $R \cup (R;R) \cup (R;R;R) \cup \dots$.
Let $A = \{ (\top, \bot), (\bot, \top) \}$; that is, $A \subseteq \texttt{bool} \times \texttt{bool}$ is the relation corresponding to \texttt{negb}, boolean negation.
What is $A;A$? It's defined as $\{ (x,z) \mid (x,y) \in A \wedge (y,z) \in A \}$. On the one hand, we could say $x = \top$ and $y=\bot$ (first tuple) and $z=\top$ (second tuple). So $(\top, \top) \in A;A$. On the other hand, we could say $x = \bot$ and $y = \top$ (second tuple) and $z = \bot$ (first tuple). So $(\bot, \bot) \in A;A$.
We've looked at all pairs of tuples in $A$, so $A;A = \{ (\top,\top), (\bot, \bot) \} = \Delta_{\texttt{bool}}$. To compute the transitive closure of $A$, we really need to compute $\bigcup_{1 \le i} A^i = A^1 \cup A^2 \cup A^3 \cup \dots = A \cup (A;A) \cup (A;A;A) \cup \dots$.
We've already got the first two parts of that infinite union: $A$ and $A;A$. $A \cup A;A = \{ (\top, \bot), (\bot, \top) \} \cup \{ (\top,\top), (\bot, \bot) \} = \texttt{bool} \times \texttt{bool}$. Okay, we can stop---we've saturated, there's no way to add anything more.
While we formally define transitive closure using an infinite union, there's a clean operational intuition for finding the transitive closure of some relation $R$. Pick a tuple $(a,b) \in R$. Find every tuple of the form $(b,c) \in R$ for some $c$, and now add $(a,c)$ to $R$. Keep this process up until you can't find any new tuples---and then you're done!
What we've just described is called a least fixed point operation. They turn up all over computer science.
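In Coq, an inductive definition is exactly such a least fixed point. Here's a sketch of the transitive closure as an inductive relation (the standard library's \texttt{Coq.Relations.Relation\_Operators} has a similar \texttt{clos\_trans}; our constructor names are illustrative):
\begin{verbatim}
Inductive trans_closure {A : Type} (R : A -> A -> Prop) : A -> A -> Prop :=
| t_step (a b : A)   (H : R a b) : trans_closure R a b
| t_hop  (a b c : A) (H1 : trans_closure R a b) (H2 : trans_closure R b c)
    : trans_closure R a c.
\end{verbatim}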
For example, the transitive closure of the successor relation $N = \{ (n, S(n)) \mid n \in \mathbb{N} \} \subseteq \mathbb{N} \times \mathbb{N}$ is $<$. How come? Here's a matrix for $N$ itself:
\[ \begin{array}{c|cccccc} & 0 & 1 & 2 & 3 & 4 & \dots \\ \hline 0 & \bot & \top & \bot & \bot & \bot & \dots \\ 1 & \bot & \bot & \top & \bot & \bot & \dots \\ 2 & \bot & \bot & \bot & \top & \bot & \dots \\ 3 & \bot & \bot & \bot & \bot & \top & \dots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\ \end{array} \]
Here's $N;N$:
\[ \begin{array}{c|cccccc} & 0 & 1 & 2 & 3 & 4 & \dots \\ \hline 0 & \bot & \bot & \top & \bot & \bot & \dots \\ 1 & \bot & \bot & \bot & \top & \bot & \dots \\ 2 & \bot & \bot & \bot & \bot & \top & \dots \\ 3 & \bot & \bot & \bot & \bot & \bot & \dots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\ \end{array} \]
And here's $N;N;N$:
\[ \begin{array}{c|cccccc} & 0 & 1 & 2 & 3 & 4 & \dots \\ \hline 0 & \bot & \bot & \bot & \top & \bot & \dots \\ 1 & \bot & \bot & \bot & \bot & \top & \dots \\ 2 & \bot & \bot & \bot & \bot & \bot & \dots \\ 3 & \bot & \bot & \bot & \bot & \bot & \dots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\ \end{array} \]
Notice that with each composition, the $\top$'s "march" one step further to the right: composing $N$ with itself $m$ times relates $n$ to $n+m$. Taking the union over all such compositions relates each $n$ to every strictly larger number---which is exactly $<$.
People use the following notation:
\[ \begin{array}{rcl} R^0 &=& \Delta_A \\ R^1 &=& R \\ R^{n+1} &=& R;R^n \\ \end{array} \]
That is, $R^n$ is $R$ composed with itself $n$ times.
And then we formally define transitive closure as:
\[ R^+ = \bigcup_{1 \le i} R^i \]
People also define the reflexive, transitive closure as:
\[ R^* = \bigcup_{0 \le i} R^i = \Delta_A \cup R^+ \]
The star in $R^*$ is called "Kleene star", after Stephen Cole Kleene. You'll learn more about it in CS 101!
Question: if $R$ is already transitive, does taking the transitive closure produce the same relation or a different one?
Question: does it make sense to take the transitive closure of a relation $R \subseteq A \times B$?
We've spent most of this chapter understanding relations as a special kind of set. It turns out that we can understand a function as a special kind of relation. Recall our "beats" relation, $B \subseteq \mathsf{RPS} \times \mathsf{RPS}$:
\[ B = \{ (\mathsf{rock}, \mathsf{scissors}), (\mathsf{scissors}, \mathsf{paper}), (\mathsf{paper}, \mathsf{rock}) \} \]
We say $\mathsf{rock} \mathrel{B} \mathsf{scissors}$ to say that $\mathsf{rock}$ beats $\mathsf{scissors}$. But we could also think of $B$ as a function: given an element $x$ of $\mathsf{RPS}$, our relation $B$ identifies a unique other element that $x$ beats. We might even go so far as to say $B(\mathsf{rock}) = \mathsf{scissors}$, using function notation.
But we have to be careful---not every relation is a function! For example, the $<$ relation isn't a function: $0 < 1$ and $0 < 2$.
Which relations are functions, and what justifies using the function notation $f(x)$? Two characteristics are critical:
Recall that a relation $R \subseteq A \times B$ is deterministic when: \[ \forall a \in A, b_1, b_2 \in B, (a, b_1) \in R \wedge (a, b_2) \in R \Rightarrow b_1 = b_2. \]
To be considered a function, a relation must be deterministic. Such relations are also called functional, i.e., they can be treated like functions. That is, a relation $R \subseteq A \times B$ is a function from $A$ to $B$ when each $a \in A$ is related to at most one element in $B$.
So $B$ is a function, and so are the successor relation ($\{ (n, S(n)) \mid n \in \mathbb{N} \}$) and the predecessor relation ($\{ (S(n), n) \mid n \in \mathbb{N} \}$). So is $=$---it relates each $n$ to exactly one thing, namely $n$ itself. The relation that relates a list to its length is a function, but the relation that relates a length to every list of that length isn't. None of $\ne$, $<$, or $\perm$ are functions.
We said a functional relation $R \subseteq A \times B$ relates each element in $A$ to at most one element in $B$. Two of our examples---the $B$ relation on $\mathsf{RPS}$ and the successor relation on $\mathbb{N}$---relate every element of $A$ to some element of $B$. But the predecessor relation doesn't relate $0$ to anything.
We say that $B$ and successor are total functions, while predecessor is a partial function. Formally, a relation $R \subseteq A \times B$ is total when:
\[ \forall a \in A, \exists b \in B, (a, b) \in R \]
Some of the nomenclature here is confusing: a relation "is total" when it relates everything to at least one thing; the "total relation" relates everything to everything.
When a relation $R \subseteq A \times B$ is deterministic and total, we can justify writing (1) $R : A \rightarrow B$ to say that $R$ is a function, and (2) $R(a)$ to uniquely determine an element $b \in B$ such that $(a, b) \in R$. Why?
Since $R$ is total, we know that there is at least one $b \in B$ such that $(a, b) \in R$. Since $R$ is deterministic, we know that there is at most one such $b \in B$. So there must be exactly one $b \in B$ such that $(a,b) \in R$. We say that $R(a)$ denotes that unique $b$.
People will sometimes write $\exists^! b \in B, (a,b) \in R$ to mean "there exists exactly one". You can think of this notation as:
\[ \exists^! x \in S, P(x) \equiv \exists x \in S, \left[ P(x) \wedge \forall y \in S, P(y) \Rightarrow x =y \right] \]
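Here's a Coq sketch of that reasoning (the names \texttt{total} and \texttt{exactly\_one} are ours, and \texttt{deterministic} repeats the earlier sketch; the conclusion is the unfolded form of $\exists^!$):
\begin{verbatim}
Definition deterministic {A B : Type} (R : A -> B -> Prop) : Prop :=
  forall (a : A) (b1 b2 : B), R a b1 -> R a b2 -> b1 = b2.

Definition total {A B : Type} (R : A -> B -> Prop) : Prop :=
  forall a : A, exists b : B, R a b.

(* A deterministic, total relation relates each input to exactly one output. *)
Lemma exactly_one {A B : Type} (R : A -> B -> Prop) :
  deterministic R -> total R ->
  forall a : A, exists b : B, R a b /\ (forall b' : B, R a b' -> b = b').
Proof.
  intros Hdet Htot a.
  destruct (Htot a) as [b Hb].
  exists b. split.
  - exact Hb.
  - intros b' Hb'. apply (Hdet a); assumption.
Qed.
\end{verbatim}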
One final note: the general convention is that $f$ and $g$ and the like are common function variable names, while $R$ and $S$ are common relation names. When we start defining relations and then showing that they're functions, we'll end up with confusing utterances like $R : A \rightarrow B$. There's no avoiding it for now, but when you're doing your own work, you can try to plan names that are appropriate (i.e., lowercase ones for functions and uppercase ones for relations).
If we write $A \rightarrow B$ as $B^A$ (as we saw in day 24), we can see an interesting fact: a function from $A$ to $B$ is an $A$-fold product of $B$'s, i.e., for each element of $A$ we have a $B$. Concretely, a function $f : \texttt{bool} \rightarrow \mathbb{N}$ can be described as an element of $\mathbb{N} \times \mathbb{N}$: the number for $f(\top)$ and the number for $f(\bot)$. That is, we could model the function
\[ f(b) = \begin{cases} 47 & b = \top \\ 54 & b = \bot \\ \end{cases} \]
as the relation $f = \{ (\top, 47), (\bot, 54) \} \subseteq \texttt{bool} \times \mathbb{N}$. But we can also see that $f = (47, 54) \in \mathbb{N}^{\texttt{bool}}$ using the ordering $\top, \bot$. Or we might write a mapping, as in $f = \langle \top \mapsto 47, \bot \mapsto 54 \rangle$. More generally:
\[ A \rightarrow B \;=\; B^A \;=\; \underbrace{B \times B \times \dots \times B}_{\text{$|A|$ times}} \]
So we might say:
\[ f = (b_1, b_2, \dots, b_{|A|}) = \langle a_1 \mapsto b_1, a_2 \mapsto b_2, \dots, a_{|A|} \mapsto b_{|A|} \rangle \]
The first notation simply fixes some order for the elements of $A$; the second notation is a bit more explicit, using angle brackets to differentiate this $A$-wise mapping from an ordinary tuple. (Folks use square or curly brackets, too.) You can think of this mapping as a record or object, where the fields are named by the elements of $A$.
In any case, if $A$ is infinite, the dots above can only be informally interpreted... but maybe this intuition helps you!
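As a small Coq sketch of the same idea (the names \texttt{f} and \texttt{f\_as\_pair} are illustrative; Coq's \texttt{true} and \texttt{false} play the roles of $\top$ and $\bot$):
\begin{verbatim}
Definition f (b : bool) : nat := if b then 47 else 54.

(* The same function, packaged as a pair, ordering the booleans true, false. *)
Definition f_as_pair : nat * nat := (f true, f false).

Example f_as_pair_eq : f_as_pair = (47, 54). Proof. reflexivity. Qed.
\end{verbatim}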
There are some key terms for thinking about functions that are useful to have in your pocket: domain and co-domain, and pre-image and image.
When $f : A \rightarrow B$, we say that $f$'s domain is $A$ and its codomain is $B$. People write $\operatorname{dom}(f)$ to mean the domain and $\operatorname{cod}(f)$ to mean the codomain. So $B : \mathsf{RPS} \rightarrow \mathsf{RPS}$ has $\operatorname{dom}(B) = \operatorname{cod}(B) = \mathsf{RPS}$, while $\texttt{sum} : \texttt{list}(\mathbb{N}) \rightarrow \mathbb{N}$ has domain $\texttt{list}(\mathbb{N})$ and codomain $\mathbb{N}$.
Recall the successor relation, $\{ (n, S(n)) \mid n \in \mathbb{N} \}$. Let's call it a function, $\mathit{succ}$. (Now, note that $\mathit{succ}(n) = S(n)$, so this definition doesn't get us anywhere we couldn't go already---but let's ignore that for now!)
The image of $\mathit{succ}$ is the set of things that it maps to, i.e., $\mathrm{Image}(\mathit{succ}) = \{ m \mid \exists n, \mathit{succ}(n) = m \}$. Here, the image of $\mathit{succ}$ is the positives: there is no $n$ such that $\mathit{succ}(n) = 0$!
Formally, we say that:
\[ \mathrm{Image}(f) = \{ y \in \operatorname{cod}(f) \mid \exists x \in \operatorname{dom}(f), f(x) = y \} \]
Observe that $\mathrm{Image}(f) \subseteq \operatorname{cod}(f)$ by definition.
Consider the function $\texttt{sum} : \texttt{list}(\mathbb{N}) \rightarrow \mathbb{N}$. Which lists $l$ yield $\texttt{sum}(l) = 0$? Well, the empty list $\texttt{[]}$ is surely one. But so is $\texttt{[$0$]}$, and $\texttt{[$0$, $0$]}$, and so on.
Another way to phrase our question is to ask what the pre-image (a/k/a the inverse image) of $\{ 0 \}$ is for $\texttt{sum}$. For a given set $B \subseteq \operatorname{cod}(f)$, the pre-image of $B$ under $f$ is the set of inputs which will cause $f$ to produce an element in $B$. Formally:
\[ \mathrm{Pre}_f(B) = \{ x \in \operatorname{dom}(f) \mid f(x) \in B \} \]
So $\mathrm{Pre}_{\texttt{sum}}(\{ 0 \}) = \{ l \mid \forall x, x \text{ in } l \Rightarrow x = 0 \}$.
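If we model subsets as predicates, image and pre-image have short Coq sketches (names are ours; these are for total Coq functions):
\begin{verbatim}
Definition image {A B : Type} (f : A -> B) (y : B) : Prop :=
  exists x : A, f x = y.

Definition preimage {A B : Type} (f : A -> B) (P : B -> Prop) (x : A) : Prop :=
  P (f x).
\end{verbatim}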
Question: It's a theorem that $\mathrm{Pre}_f(\operatorname{cod}(f)) = \operatorname{dom}(f)$ iff $f$ is a total function. How would you prove it?
You may have heard the word "range" when talking about a function. There's a problem, though: some people think a function's range is its codomain, and some think a function's range is its image. Rather than have to constantly clarify, it's better to simply avoid the word and use more precise language.
There are a number of important properties of functions. There are three in particular we'll talk about: partiality/totality, injectivity, and surjectivity.
We learned about partial functions already: a function $f : A \rightarrow B$ is called total when $f(x)$ is defined for all $x \in A$. A function is partial when it isn't total; we often write $f : A \rightharpoonup B$ to say that $f$ is partial.
We've already met many partial functions: $\texttt{pred} : \mathbb{N} \rightharpoonup \mathbb{N}$ is a partial function, as is subtraction on naturals. But note that in Coq, every function is total---we made $\texttt{pred}$ total by saying $\texttt{pred}(0) = 0$. In math, folks are usually content to leave $\texttt{pred}(0)$ undefined.
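One common way to model a partial function in Coq (where every function must be total) is to return an \texttt{option}; a sketch, with \texttt{pred\_option} as an illustrative name:
\begin{verbatim}
Definition pred_option (n : nat) : option nat :=
  match n with
  | O => None        (* pred is undefined at 0 *)
  | S m => Some m
  end.
\end{verbatim}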
Undefinedness is a hazard to navigation. If a function $g$ is partial, then you must always be careful: if you say $g(x)$, you need to know that $g$ is defined for $x$!
Other examples of partial functions are: square root (not defined on negative numbers, at least for the reals $\mathbb{R}$), division (not defined when its second argument is 0), the \texttt{head} and \texttt{tail} functions on lists (not defined on empty lists), and the \texttt{mean} function on lists (not defined on empty lists). You'll meet many others as you learn more!
Confusingly, some people say total functions are a subset of the partial functions, i.e., every total function is also a partial function. I find that confusing!
A function is injective (from Latin for "throwing in") when no two distinct inputs yield the same output. Some examples of injective functions are successor and (true) predecessor. Some non-injective functions are $f(x) = x^2$ (since $f(-5) = f(5)$) or the Coq predecessor function (which says \texttt{pred 1 = pred 0 = 0}).
Formally, a function $f : A \rightarrow B$ is injective when:
\[ \forall a_1, a_2 \in A, f(a_1) = f(a_2) \Rightarrow a_1 = a_2 \]
Note that we've seen a very similar but not identical property in determinism. A relation is deterministic when no input has two outputs; a function is injective when no output has two inputs.
People will say that an injective function is one-to-one or 1:1, meaning that each (one) output is hit by at most one (one) input.
A function is surjective (from Latin for "throwing on") when everything in the codomain is a possible output, i.e., there's no part of the codomain that the function can't 'hit'. Examples of surjective functions are predecessor (true or Coq's truncating), and list length. Some non-surjective functions are successor (0 isn't the successor of any natural) and absolute value on the integers, $\mathbb{Z}$, or reals, $\mathbb{R}$ (absolute value doesn't hit negative numbers).
Formally, a function $f : A \rightarrow B$ is surjective when:
\[ \forall b \in B, \exists a \in A, f(a) = b \]
Or, to put it another way, a function is onto when its codomain and its image coincide.
Note that we've seen a very similar but not identical property in totality. A relation is total when every input has an output; a function is surjective when every output has an input. In both cases, the existentials are of the "at least one" form.
People will say that a surjective function is onto, meaning that the domain is mapped 'onto' the codomain in its entirety.
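In Coq, for a (total) function \texttt{f : A -> B}, these two properties might be sketched as follows (the names are ours):
\begin{verbatim}
Definition injective {A B : Type} (f : A -> B) : Prop :=
  forall a1 a2 : A, f a1 = f a2 -> a1 = a2.

Definition surjective {A B : Type} (f : A -> B) : Prop :=
  forall b : B, exists a : A, f a = b.
\end{verbatim}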
A function is bijective (from Latin for "throwing both"... but it's really just bi- "both" plus the common -jective "throwing") when it's both injective and surjective.
Bijective functions are also called invertible functions, isomorphisms (from Greek isos "same, equal", morphos "shape, form"), or---and this is most confusing---a one-to-one correspondence, not to be confused with a function being "one to one". (Best to know about but not use this form.)
An example of a bijection on $\mathbb{N}$ is the following $\mathit{swap}$ function:
\[ \mathit{swap}(n) = \begin{cases} n+1 & n \text{ even} \\ n-1 & n \text{ odd} \\ \end{cases} \]
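Here's a sketch of $\mathit{swap}$ in Coq, using the standard library's \texttt{Nat.even} (the name \texttt{swap} is ours):
\begin{verbatim}
Definition swap (n : nat) : nat :=
  if Nat.even n then S n else pred n.

Example swap_0 : swap 0 = 1. Proof. reflexivity. Qed.
Example swap_3 : swap 3 = 2. Proof. reflexivity. Qed.
\end{verbatim}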
Question: How would you prove that $\mathit{swap}$ was bijective?
We've already seen relation composition. If functions are relations, what does composition of functions mean? Let's consider two functions on the naturals, $f, g : \mathbb{N} \rightarrow \mathbb{N}$:
\[ \begin{array}{rcl} f(x) &=& x + 1 \\ g(y) &=& 2\cdot y \\ \end{array} \]
Using composition, we can build a new function, $h : \mathbb{N} \rightarrow \mathbb{N}$ where $h(z) = 2 \cdot z + 1$. We could simply say that:
\[ h = f \circ g = g ; f\]
(We use $\circ$ rather than $;$, as the former notation is slightly more common on functions.) How does that work? Recall the definition of composition:
\[ S \circ R = R;S = \{ (a,c) \mid (a,b) \in R, (b, c) \in S \} \]
Let's compute the definition of $h$:
\[ \begin{array}{rcl} h &=& \{ (a,c) \mid (a,b) \in g, (b, c) \in f \} \\ &=& \{ (a,c) \mid g(a) = b, f(b) = c \} \\ &=& \{ (a,c) \mid 2 \cdot a = b, b + 1 = c \} \\ &=& \{ (a,b + 1) \mid 2 \cdot a = b \} \\ &=& \{ (a,(2 \cdot a) + 1) \} \\ \end{array} \]
That is, $h(z) = 2 \cdot z + 1$!
We don't need a special definition for function composition---it's just a special case of relation composition---but it may be helpful to have it spelled out. Formally, if $f : B \rightarrow C$ and $g : A \rightarrow B$, then we define the composition of $f$ and $g$, written $f \circ g$ or $g;f$, as:
\[ (f \circ g)(x) = f(g(x)) \]
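A Coq sketch of the same idea, along with the worked example above (the standard library has a similar \texttt{compose} in \texttt{Coq.Program.Basics}; the primed names are just to avoid clashing with the math):
\begin{verbatim}
Definition compose {A B C : Type} (g : B -> C) (f : A -> B) : A -> C :=
  fun x => g (f x).

Definition f' (x : nat) : nat := x + 1.
Definition g' (y : nat) : nat := 2 * y.

(* h z = 2 * z + 1 *)
Definition h : nat -> nat := compose f' g'.

Example h_3 : h 3 = 7. Proof. reflexivity. Qed.
\end{verbatim}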
Question: It's a theorem that $\texttt{map}(f, \texttt{map}(g, l)) = \texttt{map}(f \circ g, l)$. How might you prove it?
One final concept: function inverses. We've defined inverse relations as mere swapping: $R^{-1} = \{ (b,a) \mid (a,b) \in R \}$, so we might expect $f^{-1}(x) = y$ when $f(y) = x$.
The $\mathsf{RPS}$ beats relation, $B$, is a function and has an inverse: the loses-to function. What is $f^{-1}$, and when does it exist?
If $f : A \rightarrow B$, then $f$'s inverse, written $f^{-1} : B \rightarrow A$ is a function that does the "opposite" of $f$. In particular, $(f \circ f^{-1})(x) = x$, i.e., a function and its inverse 'cancel' each other and compose to form the identity.
Recall the function $\mathit{swap}$ (from our discussion of bijectivity)---$\mathit{swap}$ is its own inverse. Similarly, \texttt{negb} is its own inverse, as are \texttt{complement} and \texttt{reverse}... and any other involutive function. We've stumbled on an interesting fact: involutive functions---like \texttt{negb}, \texttt{complement}, and \texttt{reverse}---are their own inverses.
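For instance, here's the fact for \texttt{negb} in Coq (the standard library proves the same thing as \texttt{Bool.negb\_involutive}; we use a primed name to avoid a clash):
\begin{verbatim}
Lemma negb_involutive' : forall b : bool, negb (negb b) = b.
Proof. intros b; destruct b; reflexivity. Qed.
\end{verbatim}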
Oddly, the inverse itself is involutive: if $f^{-1} = g$, then $g^{-1} = f$. To put it another way, $(f^{-1})^{-1} = f$.
Consider the constant-five function, $g : \mathbb{N} \rightarrow \mathbb{N}$, where $g(x) = 5$ for all $x$. Could $g$ have an inverse? It's impossible: $g(0) = 5$ and $g(1) = 5$... what could $g^{-1}(5)$ be? It couldn't even be the case that $g^{-1}$ is deterministic! Worse still, what would $g^{-1}(3)$ be? Since there is no $x$ such that $g(x) = 3$, it can't be the case that $g^{-1}$ is total, either.
Question: Before you read the next two sections... when do you think a function has an inverse? If you have an idea, test it with a few examples.
To see which functions have inverses, we'll take a detour through interesting function properties. Let's recall the four we're interested in most for a given function $f : A \rightarrow B$, rewriting determinism and totality to use function notation:
\begin{itemize}
\item $f$ is deterministic: $\forall a \in A, b_1, b_2 \in B, f(a) = b_1 \wedge f(a) = b_2 \Rightarrow b_1 = b_2$
\item $f$ is total: $\forall a \in A, \exists b \in B, f(a) = b$
\item $f$ is injective: $\forall a_1, a_2 \in A, f(a_1) = f(a_2) \Rightarrow a_1 = a_2$
\item $f$ is surjective: $\forall b \in B, \exists a \in A, f(a) = b$
\end{itemize}
We can pair them up: determinism and injectivity; totality and surjectivity. Where determinism uniquely determines the codomain, injectivity uniquely determines the domain. Where totality covers the domain, surjectivity covers the codomain. There's a mirror image quality here---a duality.
Theorem: If $f$ is injective, then $f^{-1}$ is a deterministic relation (and vice versa).
Theorem: If $f$ is surjective, then $f^{-1}$ is a total relation (and vice versa).
How might you prove these theorems? There are some interesting consequences: injections have inverses, but those inverses might be partial (because the original function might not hit everything in its codomain); surjections have inverses, but those inverses might not be functions (because they might not be deterministic).
Suppose we have $f : A \rightarrow B$, bijective. That means that $f$ is deterministic and total (since it's a function!) as well as injective and surjective. What do we know about $f^{-1}$?
The two theorems above tell us quite a bit:
\begin{itemize}
\item since $f$ is injective, $f^{-1}$ is deterministic;
\item since $f$ is surjective, $f^{-1}$ is total;
\item since $f$ is deterministic, $f^{-1}$ is injective; and
\item since $f$ is total, $f^{-1}$ is surjective.
\end{itemize}
That is, $f^{-1}$ is also deterministic, total, injective, and surjective---$f^{-1} : B \rightarrow A$ is also a bijection.
Bijections are called invertible functions because they necessarily have inverses which are themselves bijections.