Catalan Numbers
Definition, Recurrence, Generating Function, and Combinatorial Interpretations
Basic
1. Overview
The Catalan numbers form one of the most frequently appearing sequences in combinatorics. Many seemingly unrelated combinatorial problems — correctly balanced parentheses, the structure of binary trees, triangulations of convex polygons — are all enumerated by the same sequence, displaying a striking universality.
The history of this sequence goes back to the 18th century. In 1751, Leonhard Euler studied the number of ways to triangulate a convex polygon with diagonals and discussed this problem with Johann Andreas von Segner in correspondence. Segner derived a recurrence for the sequence in 1758. Later, in 1838, the Belgian mathematician Eugène Charles Catalan studied the sequence systematically, and his name has been attached to it ever since.
The Catalan numbers are defined by the following closed form:
$$ C_n = \dfrac{1}{n+1}\binom{2n}{n} $$The fact that this simple expression enumerates over 200 distinct combinatorial objects reflects deep structural properties of combinatorics.
2. Definition
Definition (Catalan numbers)
For a non-negative integer $n \ge 0$, the $n$-th Catalan number $C_n$ is defined by
$$ C_n = \dfrac{1}{n+1}\binom{2n}{n} = \dfrac{(2n)!}{(n+1)!\,n!} $$Let us compute the first few values:
| $n$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| $C_n$ | 1 | 1 | 2 | 5 | 14 | 42 | 132 | 429 | 1430 |
Example
Compute $C_4$:
$$ C_4 = \dfrac{1}{5}\binom{8}{4} = \dfrac{1}{5} \cdot \dfrac{8!}{4!\,4!} = \dfrac{1}{5} \cdot 70 = 14 $$Another equivalent expression frequently used is
$$ C_n = \binom{2n}{n} - \binom{2n}{n+1} $$This form arises naturally from the reflection-principle proof given later.
3. Recurrence
Theorem (Catalan recurrence)
The Catalan numbers satisfy
$$ C_{n+1} = \displaystyle\sum_{i=0}^{n} C_i \, C_{n-i}, \qquad C_0 = 1 $$Proof
We use the fact that $C_{n+1}$ counts the number of valid sequences of $n+1$ pairs of parentheses.
The first symbol of a valid sequence of $n+1$ pairs of parentheses must be an opening parenthesis "(". We split into cases according to the position of the closing parenthesis ")" that matches this first "(".
Write the sequence as $w_1 w_2 \cdots w_{2(n+1)}$. Then $w_1 = \text{"("}$, and its matching closing parenthesis lies at position $w_{2i+2}$ for some $0 \le i \le n$. The sequence then has the structure
$$ \underbrace{(\;\overbrace{u_1 u_2 \cdots u_{2i}}^{i \text{ pairs}}\;)}_{\text{first pair and its interior}}\;\underbrace{v_1 v_2 \cdots v_{2(n-i)}}_{(n-i) \text{ pairs}} $$Here:
- The inner part $u_1 \cdots u_{2i}$ must be a valid sequence of $i$ pairs of parentheses, giving $C_i$ possibilities.
- The outer (right-hand) part $v_1 \cdots v_{2(n-i)}$ must be a valid sequence of $n - i$ pairs, giving $C_{n-i}$ possibilities.
Bijection check. Given any valid sequence of $n+1$ pairs of parentheses, the position of the closing parenthesis matching the first "(" is uniquely determined. Conversely, fix $i$ and choose a valid sequence $\alpha$ of $i$ pairs and a valid sequence $\beta$ of $n-i$ pairs; then $(\alpha)\beta$ is a valid sequence of $n+1$ pairs whose first "(" is matched by ")" at position $2i+2$. This correspondence is clearly one-to-one and onto.
Hence the count for each value of $i$ is $C_i \cdot C_{n-i}$, and summing over $i = 0, 1, \ldots, n$ gives
$$ C_{n+1} = \displaystyle\sum_{i=0}^{n} C_i \, C_{n-i} $$as required. $\square$
Let us verify the recurrence concretely:
- $C_1 = C_0 \cdot C_0 = 1$
- $C_2 = C_0 C_1 + C_1 C_0 = 1 + 1 = 2$
- $C_3 = C_0 C_2 + C_1 C_1 + C_2 C_0 = 2 + 1 + 2 = 5$
- $C_4 = C_0 C_3 + C_1 C_2 + C_2 C_1 + C_3 C_0 = 5 + 2 + 2 + 5 = 14$
Another useful closed-form recurrence is
$$ C_{n+1} = \dfrac{2(2n+1)}{n+2}\, C_n $$which follows directly from the closed form and is convenient for sequential computation.
4. Generating Function
Theorem (generating function of the Catalan numbers)
The ordinary generating function $\displaystyle C(x) = \displaystyle\sum_{n=0}^{\infty} C_n x^n$ is given by
$$ C(x) = \dfrac{1 - \sqrt{1 - 4x}}{2x} $$with radius of convergence $|x| \le \dfrac{1}{4}$.
Derivation
Multiply both sides of the recurrence $\displaystyle C_{n+1} = \displaystyle\sum_{i=0}^{n} C_i C_{n-i}$ by $x^{n+1}$ and sum over $n = 0, 1, 2, \ldots$. The right-hand side is a Cauchy product (convolution), so
$$ C(x) - 1 = x \cdot C(x)^2 $$that is, $C(x)$ satisfies the quadratic equation
$$ x \, C(x)^2 - C(x) + 1 = 0 $$By the quadratic formula,
$$ C(x) = \dfrac{1 \pm \sqrt{1 - 4x}}{2x} $$The branch satisfying $C(0) = C_0 = 1$ is the one with the minus sign (this can be verified by L'Hôpital's rule as $x \to 0$), giving
$$ C(x) = \dfrac{1 - \sqrt{1 - 4x}}{2x} $$The closed form can also be recovered from the generating function. Using the expansion
$$ \sqrt{1-4x} = \displaystyle\sum_{n=0}^{\infty} \binom{1/2}{n} (-4x)^n $$and extracting the coefficient of $x^n$ in $C(x)$ recovers $\displaystyle C_n = \dfrac{1}{n+1}\binom{2n}{n}$.
5. Combinatorial Interpretations
The most remarkable property of the Catalan numbers is that they enumerate a wide variety of combinatorial objects in a unified way. Six representative interpretations are given below.
5.1 Balanced parentheses
Problem
How many ways can $n$ pairs of parentheses be balanced correctly?
$C_n$ equals the number of valid sequences consisting of $n$ opening parentheses "(" and $n$ closing parentheses ")".
$n = 3$: $C_3 = 5$ sequences
((()))
(()())
(())()
()(())
()()()
5.2 Number of binary trees
Problem
How many ordered binary trees are there with $n + 1$ leaves (external nodes)?
$C_n$ equals the number of distinct ordered full binary trees with $n$ internal nodes (equivalently, $n+1$ leaves).
This can be understood from the recursive structure: assigning $i$ internal nodes to the left subtree and $n - 1 - i$ to the right subtree gives
$$ \displaystyle\sum_{i=0}^{n-1} C_i \, C_{n-1-i} = C_n $$which matches the recurrence.
$n = 3$: $C_3 = 5$ binary trees
There are 5 ordered full binary trees with 3 internal nodes, in one-to-one correspondence with the parenthesizations.
5.3 Triangulations of convex polygons
Problem
How many ways are there to triangulate a convex $(n+2)$-gon using non-crossing diagonals?
$C_n$ equals the number of triangulations of a convex $(n+2)$-gon. This is the problem Euler originally considered and is the historical origin of the Catalan numbers.
Examples
- $n = 1$: a triangle needs no further partition. $C_1 = 1$.
- $n = 2$: a quadrilateral admits $C_2 = 2$ diagonal partitions.
- $n = 3$: a pentagon admits $C_3 = 5$ triangulations.
- $n = 4$: a hexagon admits $C_4 = 14$ triangulations.
Fixing one edge and considering which triangle contains that edge yields a recursive count, recovering the Catalan recurrence.
5.4 Lattice paths (Dyck paths)
Problem
How many monotonic lattice paths are there from $(0, 0)$ to $(n, n)$ that do not cross the diagonal $y = x$ (i.e. satisfy $y \le x$ throughout)?
$C_n$ equals the number of monotonic lattice paths from $(0, 0)$ to $(n, n)$ (using only right and up steps) that stay on or below the diagonal $y = x$.
An equivalent formulation is in terms of Dyck paths: lattice paths of $2n$ steps consisting of "U" (up) and "D" (down) such that the $y$-coordinate never becomes negative.
$n = 3$: $C_3 = 5$ Dyck paths
UUUDDD
UUDUDD
UUDDUD
UDUDUD
UDUUDD
5.5 Noncrossing partitions
Problem
How many noncrossing partitions are there of the set $\{1, 2, \ldots, n\}$?
$C_n$ equals the number of noncrossing partitions of $\{1, 2, \ldots, n\}$. A noncrossing partition is a partition in which, when $1, 2, \ldots, n$ are placed on a circle, the convex hulls of the blocks do not intersect each other.
$n = 3$: $C_3 = 5$ partitions
The noncrossing partitions of $\{1, 2, 3\}$ are:
- $\{\{1\}, \{2\}, \{3\}\}$
- $\{\{1, 2\}, \{3\}\}$
- $\{\{1\}, \{2, 3\}\}$
- $\{\{1, 3\}, \{2\}\}$
- $\{\{1, 2, 3\}\}$
(Note: $\{1, 3\}$ and $\{2\}$ do not cross on the circle and are allowed. By contrast, in the four-element case $\{1, 3\}$ and $\{2, 4\}$ cross, so this partition is not noncrossing.)
5.6 Mountain ranges
The number of mountain ranges of $2n$ steps (alternating up "/" and down "\" without going below the ground) is also $C_n$. This is essentially the same object as a Dyck path, but visualized as a mountain skyline.
6. Proof via the Reflection Principle
We now prove the closed form $\displaystyle C_n = \dfrac{1}{n+1}\binom{2n}{n}$ using the reflection principle (also known as André's reflection method), one of the most elegant proofs based on the lattice-path interpretation.
Theorem
The number of monotonic lattice paths from $(0, 0)$ to $(n, n)$ that never strictly cross the diagonal $y = x$ is
$$ C_n = \binom{2n}{n} - \binom{2n}{n+1} = \dfrac{1}{n+1}\binom{2n}{n} $$Proof
The total number of monotonic lattice paths from $(0, 0)$ to $(n, n)$ is $\displaystyle\binom{2n}{n}$ (one chooses the $n$ right-steps out of $2n$ total steps).
It suffices to subtract the number of "bad paths" (paths that cross the diagonal). A bad path is one that first touches the line $y = x + 1$ at some point.
The reflection argument. Let $P$ be the first point at which a bad path touches the line $y = x + 1$. Reflect the part of the path after $P$ across the line $y = x + 1$ (swap right-steps and up-steps). Under this reflection, bad paths are in bijection with monotonic lattice paths from $(0, 0)$ to $(-1, n+1)$ — equivalently (after relabeling), to $(n-1, n+1)$.
The number of monotonic lattice paths from $(0, 0)$ to $(n-1, n+1)$ is $\displaystyle\binom{2n}{n+1}$.
Since this correspondence is a bijection, the number of good paths is
$$ C_n = \binom{2n}{n} - \binom{2n}{n+1} $$Rearranging,
$$ C_n = \binom{2n}{n} - \binom{2n}{n+1} = \binom{2n}{n}\left(1 - \dfrac{n}{n+1}\right) = \dfrac{1}{n+1}\binom{2n}{n} $$as claimed. $\square$
This proof is also closely related to the ballot problem. If candidate $A$ receives $a$ votes and candidate $B$ receives $b$ votes with $a > b$, the probability that $A$ leads throughout the count is $\displaystyle\dfrac{a - b}{a + b}$. In the case $a = b = n$, the number of vote orderings in which $A$ is always at least as far ahead as $B$ (ties allowed) corresponds to $C_n$.
7. Asymptotic Formula
Theorem (asymptotics of the Catalan numbers)
As $n \to \infty$,
$$ C_n \sim \dfrac{4^n}{n^{3/2}\sqrt{\pi}} $$Derivation
We use Stirling's approximation $\displaystyle n! \sim \sqrt{2\pi n} \left(\dfrac{n}{e}\right)^n$.
$$ \binom{2n}{n} = \dfrac{(2n)!}{(n!)^2} \sim \dfrac{\sqrt{4\pi n}\left(\dfrac{2n}{e}\right)^{2n}}{2\pi n \left(\dfrac{n}{e}\right)^{2n}} = \dfrac{4^n}{\sqrt{\pi n}} $$Hence
$$ C_n = \dfrac{1}{n+1}\binom{2n}{n} \sim \dfrac{1}{n} \cdot \dfrac{4^n}{\sqrt{\pi n}} = \dfrac{4^n}{n^{3/2}\sqrt{\pi}} $$This result shows that Catalan numbers grow exponentially as $4^n$ with a polynomial correction factor $n^{-3/2}$. Let us check the approximation accuracy numerically:
| $n$ | $C_n$ (exact) | $\displaystyle\dfrac{4^n}{n^{3/2}\sqrt{\pi}}$ (approx.) | Relative error |
|---|---|---|---|
| 5 | 42 | 45.6 | 8.6% |
| 10 | 16796 | 17734 | 5.6% |
| 20 | 6564120420 | 6748772354 | 2.8% |
| 50 | $1.978 \times 10^{28}$ | $2.000 \times 10^{28}$ | 1.1% |
The accuracy improves as $n$ grows.
References
- Stanley, R. P. (2015). Catalan Numbers. Cambridge University Press.
- Stanley, R. P. (2012). Enumerative Combinatorics, Vol. 2, Exercise 6.19. Cambridge University Press.
- Graham, R. L., Knuth, D. E., & Patashnik, O. (1994). Concrete Mathematics (2nd ed.). Addison-Wesley.