Catalan Numbers

Definition, Recurrence, Generating Function, and Combinatorial Interpretations

Basic

1. Overview

The Catalan numbers form one of the most frequently appearing sequences in combinatorics. Many seemingly unrelated combinatorial problems — correctly balanced parentheses, the structure of binary trees, triangulations of convex polygons — are all enumerated by the same sequence, displaying a striking universality.

The history of this sequence goes back to the 18th century. In 1751, Leonhard Euler studied the number of ways to triangulate a convex polygon with diagonals and discussed this problem with Johann Andreas von Segner in correspondence. Segner derived a recurrence for the sequence in 1758. Later, in 1838, the Belgian mathematician Eugène Charles Catalan studied the sequence systematically, and his name has been attached to it ever since.

The Catalan numbers are defined by the following closed form:

$$ C_n = \dfrac{1}{n+1}\binom{2n}{n} $$

The fact that this simple expression enumerates over 200 distinct combinatorial objects reflects deep structural properties of combinatorics.

2. Definition

Definition (Catalan numbers)

For a non-negative integer $n \ge 0$, the $n$-th Catalan number $C_n$ is defined by

$$ C_n = \dfrac{1}{n+1}\binom{2n}{n} = \dfrac{(2n)!}{(n+1)!\,n!} $$

Let us compute the first few values:

$n$ 012345678
$C_n$ 112514421324291430

Example

Compute $C_4$:

$$ C_4 = \dfrac{1}{5}\binom{8}{4} = \dfrac{1}{5} \cdot \dfrac{8!}{4!\,4!} = \dfrac{1}{5} \cdot 70 = 14 $$

Another equivalent expression frequently used is

$$ C_n = \binom{2n}{n} - \binom{2n}{n+1} $$

This form arises naturally from the reflection-principle proof given later.

3. Recurrence

Theorem (Catalan recurrence)

The Catalan numbers satisfy

$$ C_{n+1} = \displaystyle\sum_{i=0}^{n} C_i \, C_{n-i}, \qquad C_0 = 1 $$

Proof

We use the fact that $C_{n+1}$ counts the number of valid sequences of $n+1$ pairs of parentheses.

The first symbol of a valid sequence of $n+1$ pairs of parentheses must be an opening parenthesis "(". We split into cases according to the position of the closing parenthesis ")" that matches this first "(".

Write the sequence as $w_1 w_2 \cdots w_{2(n+1)}$. Then $w_1 = \text{"("}$, and its matching closing parenthesis lies at position $w_{2i+2}$ for some $0 \le i \le n$. The sequence then has the structure

$$ \underbrace{(\;\overbrace{u_1 u_2 \cdots u_{2i}}^{i \text{ pairs}}\;)}_{\text{first pair and its interior}}\;\underbrace{v_1 v_2 \cdots v_{2(n-i)}}_{(n-i) \text{ pairs}} $$

Here:

  • The inner part $u_1 \cdots u_{2i}$ must be a valid sequence of $i$ pairs of parentheses, giving $C_i$ possibilities.
  • The outer (right-hand) part $v_1 \cdots v_{2(n-i)}$ must be a valid sequence of $n - i$ pairs, giving $C_{n-i}$ possibilities.

Bijection check. Given any valid sequence of $n+1$ pairs of parentheses, the position of the closing parenthesis matching the first "(" is uniquely determined. Conversely, fix $i$ and choose a valid sequence $\alpha$ of $i$ pairs and a valid sequence $\beta$ of $n-i$ pairs; then $(\alpha)\beta$ is a valid sequence of $n+1$ pairs whose first "(" is matched by ")" at position $2i+2$. This correspondence is clearly one-to-one and onto.

Hence the count for each value of $i$ is $C_i \cdot C_{n-i}$, and summing over $i = 0, 1, \ldots, n$ gives

$$ C_{n+1} = \displaystyle\sum_{i=0}^{n} C_i \, C_{n-i} $$

as required. $\square$

Let us verify the recurrence concretely:

  • $C_1 = C_0 \cdot C_0 = 1$
  • $C_2 = C_0 C_1 + C_1 C_0 = 1 + 1 = 2$
  • $C_3 = C_0 C_2 + C_1 C_1 + C_2 C_0 = 2 + 1 + 2 = 5$
  • $C_4 = C_0 C_3 + C_1 C_2 + C_2 C_1 + C_3 C_0 = 5 + 2 + 2 + 5 = 14$

Another useful closed-form recurrence is

$$ C_{n+1} = \dfrac{2(2n+1)}{n+2}\, C_n $$

which follows directly from the closed form and is convenient for sequential computation.

4. Generating Function

Theorem (generating function of the Catalan numbers)

The ordinary generating function $\displaystyle C(x) = \displaystyle\sum_{n=0}^{\infty} C_n x^n$ is given by

$$ C(x) = \dfrac{1 - \sqrt{1 - 4x}}{2x} $$

with radius of convergence $|x| \le \dfrac{1}{4}$.

Derivation

Multiply both sides of the recurrence $\displaystyle C_{n+1} = \displaystyle\sum_{i=0}^{n} C_i C_{n-i}$ by $x^{n+1}$ and sum over $n = 0, 1, 2, \ldots$. The right-hand side is a Cauchy product (convolution), so

$$ C(x) - 1 = x \cdot C(x)^2 $$

that is, $C(x)$ satisfies the quadratic equation

$$ x \, C(x)^2 - C(x) + 1 = 0 $$

By the quadratic formula,

$$ C(x) = \dfrac{1 \pm \sqrt{1 - 4x}}{2x} $$

The branch satisfying $C(0) = C_0 = 1$ is the one with the minus sign (this can be verified by L'Hôpital's rule as $x \to 0$), giving

$$ C(x) = \dfrac{1 - \sqrt{1 - 4x}}{2x} $$

The closed form can also be recovered from the generating function. Using the expansion

$$ \sqrt{1-4x} = \displaystyle\sum_{n=0}^{\infty} \binom{1/2}{n} (-4x)^n $$

and extracting the coefficient of $x^n$ in $C(x)$ recovers $\displaystyle C_n = \dfrac{1}{n+1}\binom{2n}{n}$.

5. Combinatorial Interpretations

The most remarkable property of the Catalan numbers is that they enumerate a wide variety of combinatorial objects in a unified way. Six representative interpretations are given below.

5.1 Balanced parentheses

Problem

How many ways can $n$ pairs of parentheses be balanced correctly?

$C_n$ equals the number of valid sequences consisting of $n$ opening parentheses "(" and $n$ closing parentheses ")".

$n = 3$: $C_3 = 5$ sequences

((()))(()())(())()()(())()()()

5.2 Number of binary trees

Problem

How many ordered binary trees are there with $n + 1$ leaves (external nodes)?

$C_n$ equals the number of distinct ordered full binary trees with $n$ internal nodes (equivalently, $n+1$ leaves).

This can be understood from the recursive structure: assigning $i$ internal nodes to the left subtree and $n - 1 - i$ to the right subtree gives

$$ \displaystyle\sum_{i=0}^{n-1} C_i \, C_{n-1-i} = C_n $$

which matches the recurrence.

$n = 3$: $C_3 = 5$ binary trees

There are 5 ordered full binary trees with 3 internal nodes, in one-to-one correspondence with the parenthesizations.

Five ordered full binary trees for n=3
Fig. 1: The 5 ordered full binary trees for $n=3$ (blue dots: internal nodes; open circles: leaves).

5.3 Triangulations of convex polygons

Problem

How many ways are there to triangulate a convex $(n+2)$-gon using non-crossing diagonals?

$C_n$ equals the number of triangulations of a convex $(n+2)$-gon. This is the problem Euler originally considered and is the historical origin of the Catalan numbers.

Examples

  • $n = 1$: a triangle needs no further partition. $C_1 = 1$.
  • $n = 2$: a quadrilateral admits $C_2 = 2$ diagonal partitions.
  • $n = 3$: a pentagon admits $C_3 = 5$ triangulations.
  • $n = 4$: a hexagon admits $C_4 = 14$ triangulations.
Five triangulations of a pentagon (n=3) 12 34 5 12 34 5 12 34 5 12 34 5 12 34 5
Fig. 2: The $C_3 = 5$ triangulations of a regular pentagon ($n=3$); blue dashed lines are the diagonals.

Fixing one edge and considering which triangle contains that edge yields a recursive count, recovering the Catalan recurrence.

5.4 Lattice paths (Dyck paths)

Problem

How many monotonic lattice paths are there from $(0, 0)$ to $(n, n)$ that do not cross the diagonal $y = x$ (i.e. satisfy $y \le x$ throughout)?

$C_n$ equals the number of monotonic lattice paths from $(0, 0)$ to $(n, n)$ (using only right and up steps) that stay on or below the diagonal $y = x$.

An equivalent formulation is in terms of Dyck paths: lattice paths of $2n$ steps consisting of "U" (up) and "D" (down) such that the $y$-coordinate never becomes negative.

$n = 3$: $C_3 = 5$ Dyck paths

UUUDDDUUDUDDUUDDUDUDUDUDUDUUDD

Five Dyck paths for n=3 UUUDDD UUDUDD UUDDUD UDUDUD UDUUDD
Fig. 3: The 5 Dyck paths for $n=3$ (blue: path; dashed line: ground $y=0$).

5.5 Noncrossing partitions

Problem

How many noncrossing partitions are there of the set $\{1, 2, \ldots, n\}$?

$C_n$ equals the number of noncrossing partitions of $\{1, 2, \ldots, n\}$. A noncrossing partition is a partition in which, when $1, 2, \ldots, n$ are placed on a circle, the convex hulls of the blocks do not intersect each other.

$n = 3$: $C_3 = 5$ partitions

The noncrossing partitions of $\{1, 2, 3\}$ are:

  • $\{\{1\}, \{2\}, \{3\}\}$
  • $\{\{1, 2\}, \{3\}\}$
  • $\{\{1\}, \{2, 3\}\}$
  • $\{\{1, 3\}, \{2\}\}$
  • $\{\{1, 2, 3\}\}$
Noncrossing vs crossing partitions 1 2 3 4 Noncrossing 1 2 3 4 Crossing
Fig. 4: Noncrossing vs crossing partitions ($n=4$). Left: $\{1,2\},\{3,4\}$ does not cross. Right: $\{1,3\},\{2,4\}$ crosses.

(Note: $\{1, 3\}$ and $\{2\}$ do not cross on the circle and are allowed. By contrast, in the four-element case $\{1, 3\}$ and $\{2, 4\}$ cross, so this partition is not noncrossing.)

5.6 Mountain ranges

The number of mountain ranges of $2n$ steps (alternating up "/" and down "\" without going below the ground) is also $C_n$. This is essentially the same object as a Dyck path, but visualized as a mountain skyline.

6. Proof via the Reflection Principle

We now prove the closed form $\displaystyle C_n = \dfrac{1}{n+1}\binom{2n}{n}$ using the reflection principle (also known as André's reflection method), one of the most elegant proofs based on the lattice-path interpretation.

Theorem

The number of monotonic lattice paths from $(0, 0)$ to $(n, n)$ that never strictly cross the diagonal $y = x$ is

$$ C_n = \binom{2n}{n} - \binom{2n}{n+1} = \dfrac{1}{n+1}\binom{2n}{n} $$

Proof

The total number of monotonic lattice paths from $(0, 0)$ to $(n, n)$ is $\displaystyle\binom{2n}{n}$ (one chooses the $n$ right-steps out of $2n$ total steps).

It suffices to subtract the number of "bad paths" (paths that cross the diagonal). A bad path is one that first touches the line $y = x + 1$ at some point.

The reflection argument. Let $P$ be the first point at which a bad path touches the line $y = x + 1$. Reflect the part of the path after $P$ across the line $y = x + 1$ (swap right-steps and up-steps). Under this reflection, bad paths are in bijection with monotonic lattice paths from $(0, 0)$ to $(-1, n+1)$ — equivalently (after relabeling), to $(n-1, n+1)$.

Illustration of the reflection principle 0 1 2 3 4 0 1 2 3 4 5 y=x y=x+1 (0,0) P (4,4) (3,5) Good path Bad path After reflection
Fig. 5: Illustration of the reflection principle ($n=4$). At the first point $P$ where the bad path (blue) touches $y=x+1$, reflection sends the endpoint from $(4,4)$ to $(3,5)$.

The number of monotonic lattice paths from $(0, 0)$ to $(n-1, n+1)$ is $\displaystyle\binom{2n}{n+1}$.

Since this correspondence is a bijection, the number of good paths is

$$ C_n = \binom{2n}{n} - \binom{2n}{n+1} $$

Rearranging,

$$ C_n = \binom{2n}{n} - \binom{2n}{n+1} = \binom{2n}{n}\left(1 - \dfrac{n}{n+1}\right) = \dfrac{1}{n+1}\binom{2n}{n} $$

as claimed. $\square$

This proof is also closely related to the ballot problem. If candidate $A$ receives $a$ votes and candidate $B$ receives $b$ votes with $a > b$, the probability that $A$ leads throughout the count is $\displaystyle\dfrac{a - b}{a + b}$. In the case $a = b = n$, the number of vote orderings in which $A$ is always at least as far ahead as $B$ (ties allowed) corresponds to $C_n$.

7. Asymptotic Formula

Theorem (asymptotics of the Catalan numbers)

As $n \to \infty$,

$$ C_n \sim \dfrac{4^n}{n^{3/2}\sqrt{\pi}} $$

Derivation

We use Stirling's approximation $\displaystyle n! \sim \sqrt{2\pi n} \left(\dfrac{n}{e}\right)^n$.

$$ \binom{2n}{n} = \dfrac{(2n)!}{(n!)^2} \sim \dfrac{\sqrt{4\pi n}\left(\dfrac{2n}{e}\right)^{2n}}{2\pi n \left(\dfrac{n}{e}\right)^{2n}} = \dfrac{4^n}{\sqrt{\pi n}} $$

Hence

$$ C_n = \dfrac{1}{n+1}\binom{2n}{n} \sim \dfrac{1}{n} \cdot \dfrac{4^n}{\sqrt{\pi n}} = \dfrac{4^n}{n^{3/2}\sqrt{\pi}} $$

This result shows that Catalan numbers grow exponentially as $4^n$ with a polynomial correction factor $n^{-3/2}$. Let us check the approximation accuracy numerically:

$n$ $C_n$ (exact) $\displaystyle\dfrac{4^n}{n^{3/2}\sqrt{\pi}}$ (approx.) Relative error
54245.68.6%
1016796177345.6%
20656412042067487723542.8%
50$1.978 \times 10^{28}$$2.000 \times 10^{28}$1.1%

The accuracy improves as $n$ grows.

References

  • Stanley, R. P. (2015). Catalan Numbers. Cambridge University Press.
  • Stanley, R. P. (2012). Enumerative Combinatorics, Vol. 2, Exercise 6.19. Cambridge University Press.
  • Graham, R. L., Knuth, D. E., & Patashnik, O. (1994). Concrete Mathematics (2nd ed.). Addison-Wesley.