Set membership

A set is a collection of zero or more distinct elements.

Let's unpack that. A set is a collection, i.e., it's a grouping or mass of things. A set has zero or more elements---that is, the empty set has zero things in it, while other sets could have many more. And the elements in a set are distinct, i.e., there's no notion of something occurring more than once in a set. A given element of a set occurs at most once, because having two occurrences of the same element in a set would mean those two elements wouldn't be distinct.

Mathematicians often think of sets in terms of an "is in" predicate to denote set membership, written $\in$. We say $x \in A$ to say that $x$ is an element of $A$. (Notice the convention that elements use lowercase names and sets use uppercase ones.)

So, $\top \in \texttt{bool}$ and $5 \in \mathbb{N}$ are true statements about the booleans and natural numbers, respectively.

We'll be using set membership as our most basic concept for sets, and building up from there with a variety of rules and axioms. For example, here's an axiom characterizing the empty set, written $\emptyset$.

\[ \forall x, x \not\in \emptyset \]

That is, for all $x$, it is not the case that $x$ is in the empty set.

Defining sets

We've already mentioned $\emptyset$, the empty set with no elements. In Coq, we defined sets using the Inductive keyword, but set theory lets us define things more flexibly.

The simplest way to define a set is by enumerating its elements. We might say:

\[ \texttt{bool} = \{ \top, \bot \} \]

To say that the booleans comprise true ($\top$) and false ($\bot$). Or we might say:

\[ \mathsf{RPS} = \{ \mathsf{rock}, \mathsf{paper}, \mathsf{scissors} \} \]

To define the set of valid moves in the game "rock-paper-scissors". The general convention is that you can denote a finite set by its elements, writing them in curly braces:

\[ A = \{ a_1, a_2, \dots, a_n \} \]

Says that the set $A$ has $n$ elements, $a_1$ through $a_n$.

One other useful set is the unit set, i.e.,:

\[ \mathsf{unit} = \{ \mathsf{tt} \} \]

Where $\mathsf{tt}$ is some designated value, and $\mathsf{unit}$ is the set with exactly one element in it.

Infinite sets

While it's easy enough to define a finite set by enumerating its elements, infinite sets are trickier: who has time to enumerate infinitely many things?

We really need a finite description of infinite sets. There are a few approaches.

Inductive definitions

First, you can follow Coq's lead and use inductive definitions. In Coq we say:

Inductive nat : Type :=
  | O
  | S (n : nat).

To define the naturals inductively. In the language of set theory, we might say something like the following.

Definition: Let $\mathbb{N}$, the natural numbers, be the smallest set such that:

  1. $O \in \mathbb{N}$, and
  2. $S(n) \in \mathbb{N}$ whenever $n \in \mathbb{N}$.

Notice the parts of the definition. First, we give the name ($\mathbb{N}$) and an explanation in English of what it means. Then we say it's the smallest set that obeys the two rules we give, namely, that $O \in \mathbb{N}$ and $S(n) \in \mathbb{N}$ when $n \in \mathbb{N}$.

Why do we say "smallest" set? Well, we wouldn't want a set like:

\[ \mathbb{M} = \{ O, S(O), S(S(O)), \dots, \top \} \]

Who invited $\top$?! Sure, the rules generate $O$, and $S(O)$, and $S(S(O))$, and all of the other naturals we've come to know---but we don't want anything extra!

It may or may not help to see the definition above as a set of inference rules telling us which elements are in $\mathbb{N}$, as in:

\[ \frac{}{O \in \mathbb{N}} ~~~~~~~~ \frac{n \in \mathbb{N}}{S(n) \in \mathbb{N}} \]

Notice that we don't need to specify "smallest" when we use inference rules, since they're always interpreted in that way.

We can define lots of sets in this inductive way. Here's bit strings:

Definition: a bitstring is either:

  1. the empty string $\varepsilon$, or
  2. $b \cdot s$, a bit $b \in \{ 0, 1 \}$ followed by a bitstring $s$.
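
In Coq, we might render this definition as an inductive type, following the pattern we used for nat. This is just a sketch; the constructor names empty, bit0, and bit1 are our own choice:

Inductive bitstring : Type :=
  | empty                 (* the empty string *)
  | bit0 (s : bitstring)  (* a 0 followed by a bitstring *)
  | bit1 (s : bitstring). (* a 1 followed by a bitstring *)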

Function spaces

Without knowing it, we've worked with other infinite sets throughout the course: sets of functions! The type $A \rightarrow B$ denotes the set of functions that take an $A$ as input and produce a $B$. Such sets of functions are often called function spaces, particularly in linear algebra and analysis.

Function spaces aren't defined inductively. In Coq, they're defined constructively, by literally writing functions. In set theory, they're defined on top of relations---we'll talk about this in greater detail on day 25, which is all about relations and functions, and on day 26, where we'll use functions to talk about the sizes of sets.

While we're talking about function spaces in the "infinite sets" section, remember not every function space is infinite! Consider the set of functions on booleans, i.e., $\texttt{bool} \rightarrow \texttt{bool}$. There are only four functions:

  1. the identity function, $f(b) = b$;
  2. negation, $f(b) = \neg b$;
  3. the constant-$\top$ function, $f(b) = \top$; and
  4. the constant-$\bot$ function, $f(b) = \bot$.

How do we know there are only four? The clearest answer is via truth table. Any truth table for $f : \texttt{bool} \rightarrow \texttt{bool}$ has the form:

\[ \begin{array}{c|c} b & f(b) \\ \hline \top & b_1 \\ \bot & b_2 \\ \end{array} \]

where $f(b) = \begin{cases} b_1 & b = \top \\ b_2 & b = \bot \\ \end{cases}$.

There are two possibilities for $b_1$ and two possibilities for $b_2$; since $2 \cdot 2 = 4$, there can only be four ways to fill in the table. A quick check will show you those ways correspond to the four we enumerated above.
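
Here are those four functions, written out in Coq. (A sketch: the names are ours, and negb is the standard library's boolean negation.)

Definition id_bool (b : bool) : bool := b.          (* identity *)
Definition not_bool (b : bool) : bool := negb b.    (* negation *)
Definition const_true (b : bool) : bool := true.    (* constantly true *)
Definition const_false (b : bool) : bool := false.  (* constantly false *)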

Most function spaces are infinite. For example, $\mathsf{unit} \rightarrow \mathbb{N}$ is infinite; for each $n \in \mathbb{N}$, there is a function $g_n(x) = n$. (A word about this notational convention: we write $g_n$ to mean "the function $g$ for the given number $n$".) That is, this function space is in some sense identical to the naturals themselves.

Other definitions

It's common to define infinite sets in more informal ways, too. For example, we might define the rational numbers, or fractions, $\mathbb{Q}$ (for _quotient_), as in:

Let the rationals $\mathbb{Q}$ be defined as the set of fractions $\frac{a}{b}$ where $b \ne 0$ and $b$ does not divide $a$.

Or, more ambitiously, we can (not quite accurately) define the real numbers $\mathbb{R}$:

A real number $n \in \mathbb{R}$ is of the form $n = d_0 d_1 d_2 \dots d_i . d_{i+1} d_{i+2} \dots$ for some non-empty, possibly infinite sequence of digits $d_i \in \{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 \}$.

Notice that these definitions rely on a little more human interpretation than other ways of defining sets. In fact, our definition of the reals isn't quite right: we'd treat $0$ and $00$ and $00.00$ as different numbers, when not only are they the same, the number $0.99999\dots$ is the same as $1.0$. The reals are... very confusing, and defining them correctly requires some subtle work.

Set builder notation

Finally, we can build sets by "filtering" an existing set with a predicate. For example, to identify the set of even natural numbers, we might say:

\[ E = \{ n \in \mathbb{N} \mid \exists k \in \mathbb{N}, n = 2 \cdot k \} \]

To specify the positives, we might write:

\[ \mathbb{N}^+ ~~~ = ~~~ \{ n \in \mathbb{N} \mid n > 0 \} \]

There are many equivalent definitions:

\[ \begin{array}{rcl} \mathbb{N}^+ &=& \{ n \in \mathbb{N} \mid n \ne 0 \} \\ &=& \{ S(n) \mid n \in \mathbb{N} \} \\ \end{array} \]

The general format is $A = \{ x \in B \mid P(x) \}$, meaning "those $x$ in the set $B$ such that $P(x)$".
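
As an aside, Coq has a suggestively similar notation: the subset type { x : B | P x }. Here's a sketch of the evens from above, with a name of our own choosing:

Definition even_nat := { n : nat | exists k, n = 2 * k }.

An element of even_nat packages up a number n together with a proof that n is even; the predicate travels with the data.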

For example, we can use set builder notation to define the rationals (where $a \mid b$ means "$a$ divides $b$"):

\[ \mathbb{Q} = \left\{ \frac{a}{b} \mid a \in \mathbb{N}, b \in \mathbb{N}^+, b \not\mid a \right\} \]

Common set operations

Back on day 8, when we implemented finite set representations on lists (in two ways) and trees, we implemented a variety of operations and predicates. Here we meet them again, set theoretically.

Union

The union of two sets is their combination. We write $A \cup B$ to denote the union, so:

\[ \texttt{bool} \cup \mathsf{RPS} = \{ \top, \bot, \mathsf{rock}, \mathsf{paper}, \mathsf{scissors} \} \]

Since sets can't have duplicates, union sometimes has no visible effect:

\[ \texttt{bool} \cup \{ \top \} = \texttt{bool} \]

We can axiomatize union in terms of disjunction, i.e.:

\[ A \cup B = \{ x \mid x \in A \vee x \in B \} \]

This axiomatization is particularly useful when we do element-wise proofs.

There's no "insertion" operation on sets in mathematics. Instead, use union and a singleton set, e.g.:

\[ \texttt{bool} \cup \{ \mathsf{maybe} \} = \{ \top, \bot, \mathsf{maybe} \} \]

People sometimes write $\uplus$ to mean disjoint union, i.e., a union where we really believe we're adding new elements. That is, $\texttt{bool} \uplus \{ \mathsf{maybe} \} = \{ \top, \bot, \mathsf{maybe} \}$ as above, but $\texttt{bool} \uplus \{ \top \}$ would be ill-defined (because $\top \in \texttt{bool}$ already). There's no need to use $\uplus$, but it can make your intentions clearer.

Intersection

The intersection of two sets is the set of elements they have in common. We write $A \cap B$ to denote the intersection, so:

\[ \mathsf{RPS} \cap \{ \mathsf{rock}, \mathsf{roll} \} = \{ \mathsf{rock} \} \]

We can axiomatize intersection in terms of conjunction, i.e.:

\[ A \cap B = \{ x \mid x \in A \wedge x \in B \} \]

Difference

Finally, set difference finds the things that are in one set but not the other. There are two common notations for set difference: $A - B$ and $A \setminus B$. I prefer the latter. For example:

\[ \mathsf{RPS} \setminus \{ \mathsf{scissors} \} = \{ \mathsf{rock}, \mathsf{paper} \} \]

If the sets have nothing in common, then set difference has no effect. For example:

\[ \mathsf{RPS} \setminus \texttt{bool} = \mathsf{RPS} \]

We can axiomatize set difference using a negation:

\[ A \setminus B = \{ x \mid x \in A \wedge x \not\in B \} \]
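
To connect back to day 8: here's a sketch of how the union, intersection, and difference axiomatizations play out for list-based sets. We assume duplicate-free lists of naturals, and the function names are our own:

Require Import List Arith.
Import ListNotations.

(* member x s is true exactly when x is in s *)
Fixpoint member (x : nat) (s : list nat) : bool :=
  match s with
  | [] => false
  | y :: s' => orb (Nat.eqb x y) (member x s')
  end.

(* union: everything in a, plus whatever in b isn't already in a *)
Definition union (a b : list nat) : list nat :=
  a ++ filter (fun x => negb (member x a)) b.

(* intersection: keep x in a when x is in b, too *)
Definition inter (a b : list nat) : list nat :=
  filter (fun x => member x b) a.

(* difference: keep x in a when x is not in b *)
Definition diff (a b : list nat) : list nat :=
  filter (fun x => negb (member x b)) a.

Note how inter and diff transcribe their axioms directly: the filter predicate is exactly the conjunct that picks out the right elements. Union is sneakier---appending the not-already-present elements of b plays the role of the disjunction.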

Set complement (and universes of discourse)

The complement of a set comprises those things not in the set. Set complement is a useful concept, but it only makes sense within some universe of discourse.

What, for example, is the complement of the set $\texttt{bool}$? Is it the set containing, well, everything but $\top$ and $\bot$? What things are those? That sounds like a big set! To avoid wild generalities, one should only use set complement when it's clear what the set of all possibilities is.

For example, we might say that our universe of discourse is the 5Cs:

\[ C = \{ \mathsf{CM}, \mathsf{HM}, \mathsf{PO}, \mathsf{PZ}, \mathsf{SC} \} \]

Let the set $\mathsf{CMS} = \{ \mathsf{CM}, \mathsf{HM}, \mathsf{SC} \}$. The complement of $\mathsf{CMS}$ is written with an overline, as $\overline{\mathsf{CMS}}$; others write it with a superscript $C$, as in $\mathsf{CMS}^C$:

\[ \overline{\mathsf{CMS}} = \{ \mathsf{PO}, \mathsf{PZ} \} \]

To be concrete, if we're working in a universe of discourse $U$, the complement of a set $A$ is defined in terms of set difference:

\[ \overline{A} = U \setminus A = \{ x \mid x \in U, x \not\in A \} \]

We'll see at the end of the chapter that universes of discourse are very important.

Products and tuples

In Coq, we defined tuples a/k/a pairs a/k/a the product type a/k/a the Cartesian product a/k/a the timesy boi (okay that last one isn't real) using the prod type:

Inductive prod (X Y : Type) : Type :=
| pair (x : X) (y : Y).

Arguments pair {X} {Y} _ _.

Notation "( x , y )" := (pair x y).

Notation "X * Y" := (prod X Y) : type_scope.

We define the Cartesian product similarly in set theory, using $\times$ instead of *:

\[ A \times B = \{ (x, y) \mid x \in A, y \in B \} \]

Notice that the comma is treated like an 'and', i.e., $\wedge$. The product of two sets is the set of pairs drawn from those sets. To be concrete (taking $\texttt{base} = \{ \texttt{A}, \texttt{C}, \texttt{G}, \texttt{T} \}$, the DNA bases):

\[ \texttt{bool} \times \texttt{base} = \{ (\top, \texttt{A}), (\top, \texttt{C}), (\top, \texttt{G}), (\top, \texttt{T}), (\bot, \texttt{A}), (\bot, \texttt{C}), (\bot, \texttt{G}), (\bot, \texttt{T}) \} \]

It's common to talk about products of more than two sets, i.e., the product of three sets produces triples, the product of four sets produces 4-tuples, and so on---with $n$-fold products producing $n$-tuples. Concretely:

\[ \begin{array}{rl} \texttt{bool} \times \mathsf{RPS} \times \mathsf{unit} = & \{ (\top, \mathsf{rock}, \mathsf{tt}), (\top, \mathsf{paper}, \mathsf{tt}), (\top, \mathsf{scissors}, \mathsf{tt}), \\ & \phantom{\{} (\bot, \mathsf{rock}, \mathsf{tt}), (\bot, \mathsf{paper}, \mathsf{tt}), (\bot, \mathsf{scissors}, \mathsf{tt}) \} \\ \end{array} \]

Extracting things from tuples

When working with tuples in Coq, we defined functions fst and snd to get their elements out, i.e., fst (true, G) = true. In mathematics, there are two common notations for pulling values out of tuples: $\pi_i$ projects the $i$th element out of a tuple, using 1-based indices; $e.i$ means the same thing. So:

\[ \pi_1 (\top, \mathsf{G}) = (\top, \mathsf{G}).1 = \top \]

The power set

The power set of a given set is the set of all of its subsets. There are two common notations for the power set of $A$: $\mathcal{P}(A)$ and $2^A$. I prefer the former. As a concrete example:

\[ \mathcal{P}(\texttt{bool}) = \{ \emptyset, \{ \top \}, \{ \bot \}, \{ \top, \bot \} \} \]

Note that if $A$ is a set of some particular type of element, then $\mathcal{P}(A)$ is a set of sets of those elements. We can define the power set in general as:

\[ \mathcal{P}(A) = \{ B \mid B \subseteq A \} \]

So we can say equivalently that $B \subseteq A$ or that $B \in \mathcal{P}(A)$.
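
For a finite set represented as a list (day 8 style), we can compute the power set directly. A sketch, with a function name of our own:

Require Import List.
Import ListNotations.

(* every subset of x :: s' either omits x (those in p) or includes it *)
Fixpoint powerset (s : list nat) : list (list nat) :=
  match s with
  | [] => [[]]
  | x :: s' =>
      let p := powerset s' in
      p ++ map (cons x) p
  end.

(* Compute powerset [1; 2].
   = [[]; [2]; [1]; [1; 2]] *)

Each element doubles the count, which is one way to see that an $n$-element set has $2^n$ subsets.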

Exponential notation

You might wonder... where did the $2^A$ notation come from? There are two sources.

First, and more prosaically, if $A$ has $n$ elements, then its power set has $2^n$ elements (the Binomial Theorem rears its head!).

Second, in some circles, people write the type $A \rightarrow B$ as $B^A$. (People do all kinds of things.) So $2^A$ is the set of functions $A \rightarrow 2$, where $2$ is a fancy name for $\texttt{bool}$ (because there are two booleans; similarly, people sometimes call $\mathsf{unit}$ the set $1$). Why is $A \rightarrow \texttt{bool}$ a way to think of the set of subsets?

If $B \subseteq A$, then $B$ can be thought of as a function $A \rightarrow \texttt{bool}$ that takes an element $x \in A$ and returns $\top$ if $x \in B$ (and $\bot$ if not). These so-called characteristic functions are a bit like bitstrings. If there are $n$ elements of $A$, there are $2^n$ such characteristic functions.
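
To make that concrete in Coq, here's the characteristic function of the evens as a subset of the naturals (a sketch; the name chi_even is ours, and Nat.even is from the standard library):

(* chi_even n = true exactly when n is in the set of even naturals *)
Definition chi_even (n : nat) : bool := Nat.even n.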

Relations on sets

In addition to the various operations on sets, there are important relations on sets. We defined some on day 8. Let's get reacquainted.

Subset

We say $A$ is a subset of $B$, written $A \subseteq B$ (or $A \subset B$), when every element of $A$ is an element of $B$---but not necessarily vice versa.

For example, the empty set $\emptyset$ is a subset of every set. The natural numbers $\mathbb{N}$ are a subset of the integers $\mathbb{Z} = \{ 0, 1, -1, 2, -2, \dots \}$.

We axiomatize subset as follows:

\[ A \subseteq B \Leftrightarrow \forall x \in A, x \in B \]
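
If we encode sets as predicates in Coq (one hypothetical encoding among several), this axiomatization transcribes almost literally:

(* A is a subset of B when every x in A is also in B *)
Definition Subset (A B : nat -> Prop) : Prop :=
  forall x, A x -> B x.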

Proper subsets

Every set is a subset of itself---i.e., $A \subseteq A$. But just as we sometimes want "less than $<$" and we sometimes want "less than or equal to $\le$", we sometimes want to say that $A$ is a subset of $B$ in a nontrivial way, i.e., $B$ contains things $A$ does not! That is, $A$ is a proper subset of $B$.

The least ambiguous notation for proper subsets is $A \subsetneq B$. Some folks will write $A \subset B$ to mean proper subset, but others will write $A \subset B$ to mean a possibly equal subset---confusing! It's best to be explicit.

We can axiomatize proper subsets in a few ways; here are two useful versions:

\[ \begin{array}{rcl} A \subsetneq B &\Leftrightarrow& A \subseteq B \wedge A \ne B \\ &\Leftrightarrow& (\forall x \in A, x \in B) \wedge (\exists y \in B, y \not\in A) \end{array} \]

Equality

Sets are equal when they contain the same elements. We use the conventional notation, writing $A = B$ to mean that the sets $A$ and $B$ are equal. We can axiomatize it as follows:

\[ \begin{array}{rcl} A = B &\Leftrightarrow& A \subseteq B \wedge B \subseteq A \\ &\Leftrightarrow& \forall x, x \in A \Leftrightarrow x \in B \\ \end{array} \]

Disjointness

Two sets are disjoint when they have no elements in common. There's no particular notation for disjoint sets---people express that $A$ and $B$ are disjoint by writing $A \cap B = \emptyset$.

For example, $\texttt{bool}$ and $\mathbb{N}$ are disjoint, but $\mathbb{N}$ and $\mathbb{Z}$ are not (because every natural number is also an integer). $\mathsf{RPS}$ and $\{ \mathsf{rock}, \mathsf{roll} \}$ are not disjoint, because they have a common element---$\mathsf{rock}$.

Partitions

People talk about partitioning a set by dividing it into disjoint, non-empty subsets that compose the whole. For example, we can partition the naturals into the evens ($E = \{ n \in \mathbb{N} \mid \exists k, n = 2k \}$) and odds ($O = \{ n \in \mathbb{N} \mid \exists k, n = 2k + 1 \}$). We have $E \cap O = \emptyset$, i.e., the evens and odds are disjoint; we also have $\mathbb{N} = E \cup O$ (or even $E \uplus O$, if we wanted to emphasize the disjointness).

More formally, the non-empty sets $A_1, \dots, A_k$ form a $k$-partition of the set $B$ if $A_i \cap A_j = \emptyset$ when $i \ne j$ and $B = \bigcup_{i=1}^k A_i = A_1 \cup \dots \cup A_k$. (The big-union sign $\bigcup$ captures iterated union the same way $\sum$ captures iterated addition.)

Proofs with sets

There are two core approaches to proving properties of sets: you can think about their elements, or you can reason algebraically, with equations between sets.

Element-wise proofs

Element-wise proofs can be tedious, but are often the simplest and most straightforward way to prove properties of sets.

The general idea is to reduce any proposition involving sets to propositions about membership, $\in$. Here's an example.

Theorem: if $A \cap B = A$ then $A \subseteq B$.

Proof: Let sets $A$ and $B$ be given such that $A \cap B = A$. We must prove that $A \subseteq B$, i.e., that if $x \in A$, then $x \in B$.

Let $x \in A$ be given; we must show $x \in B$. Since $A \cap B = A$, we know that $y \in A \cap B$ iff $y \in A$. So it must be the case that $x \in A \cap B$, i.e., $x \in A$ and $x \in B$, i.e., $x \in B$---as desired.

QED.

In that proof, we could even drop the intermediate line "i.e., $x \in A$ and $x \in B$". In any case, notice how the core approach is to unfold definitions down to some logical statement involving set membership: here, set equality turned into mutual inclusion and intersection turned into a conjunction.
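
If you like, you can replay this element-wise proof in Coq. Here's a sketch, encoding sets as predicates and rendering the set equality $A \cap B = A$ element-wise, as an iff:

Theorem inter_eq_implies_subset :
  forall (A B : nat -> Prop),
    (forall x, A x /\ B x <-> A x) ->  (* A ∩ B = A, element-wise *)
    (forall x, A x -> B x).            (* A ⊆ B *)
Proof.
  intros A B H x HA.
  destruct (H x) as [_ Hback].    (* keep the direction A x -> A x /\ B x *)
  destruct (Hback HA) as [_ HB].  (* from x ∈ A, get x ∈ A ∩ B, then x ∈ B *)
  exact HB.
Qed.

Each tactic matches a step of the prose proof: the destructs unfold the equality into mutual implication and the intersection into a conjunction.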

Algebraic proofs

Algebraic proofs make use of various (in)equalities to prove things about sets. Here's a list of worthwhile identities, drawn from Wikipedia, where $A$ and $B$ and $C$ are arbitrary sets and $U$ is the universe of discourse:

\[ \begin{array}{rclcrclr} A \cup B &=& B \cup A & ~~~~~~ & A \cap B &=& B \cap A & ~~~\text{commutativity} \\ A \cup (B \cup C) &=& (A \cup B) \cup C && A \cap (B \cap C) &=& (A \cap B) \cap C & \text{associativity} \\ A \cup (B \cap C) &=& (A \cup B) \cap (A \cup C) && A \cap (B \cup C) &=& (A \cap B) \cup (A \cap C) & \text{distributivity} \\ A \cup \emptyset &=& A && A \cap U &=& A & \text{identity} \\ A \cup \overline{A} &=& U && A \cap \overline{A} &=& \emptyset & \text{complement} \\ \end{array} \]

These laws are sufficient to deduce every valid equation in the algebra of sets; there are many other useful properties, but these laws are all you need. Here's an example proof.

Theorem: union is idempotent, i.e., $A \cup A = A$.

Proof: We compute: \[ \begin{array}{rclr} A \cup A &=& (A \cup A) \cap U & \text{$\cap$ identity} \\ &=& (A \cup A) \cap (A \cup \overline{A}) & \text{$\cup$ complement} \\ &=& A \cup (A \cap \overline{A}) & \text{$\cup$ distributivity} \\ &=& A \cup \emptyset & \text{$\cap$ complement} \\ &=& A & \text{$\cup$ identity} \\ &&& \square \\ \end{array} \]

Algebraic proofs can be succinct and clear, but they can be hard to find! They're also a bit trickier to use when working with subset, in which case the following theorem is a useful one.

Theorem: The following are all equivalent:

  1. $A \subseteq B$
  2. $A \cap B = A$
  3. $A \cup B = B$
  4. $A \setminus B = \emptyset$
  5. $\overline{B} \subseteq \overline{A}$

How would you prove this? (We already showed that (2) implies (1) above. A common technique is to show that (1) implies (2) implies (3) implies (4) implies (5) implies (1). By closing the loop---in whatever order---we know that each property implies each other one, and so they're all equivalent... and you've only done five proofs. If you showed (1) iff (2) and (1) iff (3) and so on separately for each property, you'd do eight proofs!)

Finally, there are two particularly important laws named after Augustus De Morgan:

Theorem: De Morgan's laws:

  1. $\overline{A \cup B} = \overline{A} \cap \overline{B}$
  2. $\overline{A \cap B} = \overline{A} \cup \overline{B}$

How would you prove these equations?

Russell's Paradox

We've been working in what's called naive set theory. Everything we've said and proved is valid and okay, but we're actually in a dangerous spot---naive set theory isn't a consistent place to work! The following is merely an interesting interlude, and not a core part of the course. It presages some ideas you'll see in CS 101 (and, to a lesser extent, when we talk about countability on day 26).

To see why, we'll start with a story about a barber. Then we'll meet Bertrand Russell, who broke everything and was then kind enough to fix it (with help from Alfred North Whitehead).

The Barber 💈

Claremont has quite a few barbershops and hair salons, but in the town of Logicville, there's just one barber. The barber in Logicville shaves only and exactly the people who do not shave themselves.

So far, so good. If you move to Logicville, you have to decide to either (a) shave yourself, or (b) have the barber do it.

Who shaves the barber?

We've reached an impasse: the barber can't shave himself (he shaves only people who don't shave themselves), but he also can't not shave himself (he shaves everyone who doesn't shave themselves). What gives?

The Paradox 😱

Let the set $Y = \{ X \mid X \not\in X \}$, i.e., let $Y$ be a set containing those sets $X$ which do not contain themselves. You might balk at sets holding other sets, but we've already seen that in the power set---it's not a big deal.

Here's a question: is $Y \in Y$? If $Y \in Y$, then $Y$ fails $Y$'s defining condition, so $Y \not\in Y$. But if $Y \not\in Y$, then $Y$ satisfies the condition, so $Y \in Y$. Either way, we have a contradiction.

The Resolution 🤝

What are we going to do about this set $Y$? Neither $Y$ nor the barber seems logically acceptable---we've reached a contradictory state where a proposition can be neither true nor false, which is Bad News™. To avoid paradox, $Y$'s definition needs to be outlawed. It's time for the sheriff of Logicville to take that barber down. 🤠

There are two core issues. First, we are considering the very possibility that a set could be a member of itself. Second, our definition of $Y$ doesn't specify any notion of universe.

Russell and Whitehead's solution was to invent the theory of types in general and the ramified hierarchy of types in particular. We can define a set like $Y$, but it's ill typed to consider $Y$ as a member of itself: each level of the type hierarchy can only reference things beneath it.

Type theory is a neat solution to both issues. The hierarchy simply outlaws the possibility that $X \in X$ for any $X$. It also ensures that we work in a sort of "terrace" of universes, with each level only talking about levels beneath it.

Russell and Whitehead's theory isn't the conventional foundation for mathematics today---most people use Zermelo--Fraenkel set theory with the axiom of choice, also called ZFC. While ZFC isn't a type theory, it does use a "cumulative hierarchy" to the same effect.

These foundational issues are not the everyday fodder of mathematicians or computer scientists, but they are nevertheless very important. The story of 20th-century mathematics is one of radical reimagining and revision. After Russell's paradox, numerous other foundational results had a huge influence on the direction of mathematics: Gödel's completeness and incompleteness theorems, Turing's proof of undecidability, and Cohen's proof of the independence of the axiom of choice. And the work is still going on: there's plenty of active research on new foundations in interesting type theories (homotopy type theory and univalence, in particular).

You'll learn about some of these important ideas in CS 101!