Which algorithm does calx use to compute large factorials?

For n > 228, calx uses a sieve-based prime factorization combined with tournament multiplication (binary splitting). First, all primes up to n are enumerated via the Sieve of Eratosthenes. Then Legendre's formula determines each prime's exponent in n!. The resulting prime powers are combined using balanced binary splitting, which keeps operand sizes roughly equal and maximizes the efficiency of fast multiplication algorithms like Karatsuba and NTT.

Why is tournament multiplication faster than sequential multiplication for factorials?

Sequential multiplication like ((a*b)*c)*d causes one operand to grow rapidly while the other stays small, producing unbalanced multiplications that cannot fully benefit from sub-quadratic algorithms. Tournament multiplication pairs up operands of similar size at each level, ensuring balanced multiplications throughout. This maximizes the efficiency of Karatsuba, Toom-Cook, and NTT.

How does calx compute binomial coefficients efficiently?

calx uses the multiplicative formula C(n,k) = n*(n-1)*...*(n-k+1) / (1*2*...*k) with symmetry reduction (replacing k with n-k when k > n/2). Each step r_{i+1} = r_i * (n-i) / (i+1) is guaranteed to yield an integer, so only exact integer arithmetic is needed — no floating-point or rational numbers.

Int Factorial & Combinatorics Algorithms

Overview

The IntCombinatorics class provides factorial $n!$, double factorial $n!!$, and binomial coefficient $\binom{n}{k}$ as static methods. All return arbitrary-precision Int values, yielding exact results for inputs of any size.

The factorial algorithm is automatically selected based on the magnitude of $n$. For small $n$, a precomputed table returns results instantly; for mid-range $n$, a simple loop is used; and for large $n$, a sieve combined with tournament multiplication provides a high-speed algorithm.

For API usage, see Int API Reference — Combinatorial Functions.

Factorial Algorithm Selection

calx switches between the following factorial algorithms based on the value of $n$. The thresholds were determined through empirical benchmarks.

Range of $n$	Algorithm	Complexity
$n \leq 100$	Table lookup	$O(1)$
$101 \leq n \leq 228$	Simple loop	$O(n \times M(n))$
$n > 228$	Sieve + tournament multiplication	$O(n \log n \times M(n))$

Here $M(n)$ denotes the cost of a multiplication at the size corresponding to $n!$. By Stirling's approximation $\log(n!) \approx n \log n$, the word count of $n!$ grows roughly proportionally to $n$.

Table Lookup

A compile-time precomputed table holds the values of $n!$ for $n = 0$ through $n = 100$. Factorials in this range are returned in $O(1)$.

The magnitude of $100!$ is:

$$100! \approx 9.3 \times 10^{157}$$

This is about 158 decimal digits, fitting in roughly 9 words of 64 bits each. The entire table's memory footprint is tiny and easily fits in cache.

Although $n!$ fits in a 64-bit integer for $n \leq 20$, calx uniformly returns Int, so table lookup is used for the entire $n \leq 100$ range.

Simple Loop

For $101 \leq n \leq 228$, a straightforward sequential multiplication computes $n!$.

$$n! = 1 \times 2 \times 3 \times \cdots \times n$$

Each step performs a "large multi-precision integer $\times$ small integer (1 word)" multiplication. This is far lighter than a full multi-precision-by-multi-precision multiplication: multiplying an $n$-word integer by a 1-word value costs only $O(n)$.

The total loop complexity is $O(n \times M(n))$, where $M(n)$ here refers to the "large integer $\times$ small integer" cost, effectively $O(n)$. However, the cost of each multiplication grows linearly as the result accumulates.

For $n \leq 228$, the overhead of the sieve method (prime sieve construction, exponentiation, tournament recursion) exceeds the simplicity of the sequential loop, making the loop the fastest option. The threshold of 228 was determined by benchmarks.

Sieve + Tournament Multiplication

For $n > 228$, a sieve-based prime factorization method combined with tournament multiplication (binary splitting) is used.

Prime Factorization Representation

The prime factorization of $n!$ is:

$$n! = \prod_{p \leq n,\ p \text{ prime}} p^{v_p(n!)}$$

The exponent $v_p(n!)$ for each prime $p$ is given by Legendre's formula:

$$v_p(n!) = \sum_{k=1}^{\infty} \left\lfloor \frac{n}{p^k} \right\rfloor$$

This sum terminates when $p^k > n$, so it requires only finitely many steps.

Algorithm Steps

Enumerate all primes up to $n$ using the Sieve of Eratosthenes.
For each prime $p$, compute the exponent $v_p(n!)$ via Legendre's formula.
Compute each prime power $p^{v_p(n!)}$.
Combine the resulting array of prime powers via tournament multiplication.

Tournament Multiplication (Binary Splitting)

A naive sequential loop $((a \times b) \times c) \times d$ causes the product to grow rapidly on one side, resulting in unbalanced multiplications. Tournament multiplication recursively splits the array in half and pairs up elements:

$$(a \times b) \times (c \times d)$$

This keeps operand sizes roughly equal at each level, maximizing the efficiency of fast multiplication algorithms such as Karatsuba and Toom-Cook. Unbalanced multiplications (a huge number $\times$ a small number) cannot fully exploit the divide-and-conquer structure of sub-quadratic algorithms, so maintaining balanced operand sizes is critical.

Why the Sieve Method is Superior

While the simple loop computes $1 \times 2 \times 3 \times \cdots \times n$ sequentially, the sieve method offers the following advantages:

Elimination of redundant multiplications: Instead of multiplying composite numbers individually (e.g., $6 = 2 \times 3$), their prime factors are consolidated into exponents.
Balanced operand sizes: Tournament multiplication keeps operands evenly sized, allowing fast multiplication algorithms to operate at peak efficiency.

Complexity: $O(n \log n \times M(n))$

When $M(n)$ is super-linear (Karatsuba or higher), the size-balancing effect of tournament multiplication becomes pronounced, outperforming the simple loop's $O(n \times M(n))$.

Double Factorial

The double factorial $n!!$ is the product obtained by multiplying from $n$ downward in steps of 2.

For even $n$:

$$n!! = n \times (n-2) \times (n-4) \times \cdots \times 4 \times 2$$

For odd $n$:

$$n!! = n \times (n-2) \times (n-4) \times \cdots \times 3 \times 1$$

For example, $10!! = 10 \times 8 \times 6 \times 4 \times 2 = 3840$ and $9!! = 9 \times 7 \times 5 \times 3 \times 1 = 945$.

The double factorial is implemented as a simple loop. Since $n!!$ is substantially smaller than $n!$ (roughly half the number of factors), the overhead of sieve and tournament multiplication is rarely justified, and a simple loop delivers sufficient performance.

Binomial Coefficient

The binomial coefficient $\binom{n}{k}$ is defined as:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

However, computing three full factorials and dividing is highly inefficient because the intermediate values are far larger than the final result. For instance, $\binom{1000}{500}$ has about 299 digits, but $1000!$ has about 2,568 digits.

Multiplicative Formula

Instead, the following multiplicative formula is used:

$$\binom{n}{k} = \frac{n \times (n-1) \times \cdots \times (n-k+1)}{1 \times 2 \times \cdots \times k}$$

This involves $k$ numerator terms and $k$ denominator terms, bypassing full factorials entirely.

Exploiting Symmetry

Since $\binom{n}{k} = \binom{n}{n-k}$, when $k > n/2$, $k$ is replaced with $n - k$ before computation. This can reduce the number of multiplications by up to half. For example, $\binom{100}{95}$ is computed as $\binom{100}{5}$, requiring only 5 multiply-divide steps instead of 95.

Sequential Multiply-Divide

The computation proceeds sequentially as:

$$r_0 = 1, \quad r_{i+1} = r_i \times \frac{n - i}{i + 1} \quad (i = 0, 1, \ldots, k-1)$$

Each step $r_i \times (n - i) / (i + 1)$ is guaranteed to yield an integer. This is a consequence of the combinatorial property that $r_i = \binom{n}{i}$ is always an integer. Therefore, the computation requires only integer multiplication and exact division — no floating-point or rational arithmetic is needed.

Design Decisions

Rationale for the Threshold at 228

The sieve + tournament multiplication method incurs overhead from prime sieve construction, exponentiation of each prime, and the tournament recursion. For small $n$, this overhead exceeds the cost of the simple sequential loop.

$n = 228$ is the empirical break-even point: beyond this value, the advantages of the sieve method (elimination of redundant multiplications and balanced operand sizes) outweigh the overhead.

Importance of Tournament Multiplication

The prime factorization of $10000!$ via the sieve yields approximately 1,229 prime powers (the number of primes up to 10,000). These prime powers vary enormously in size (e.g., $2^{9995}$ is huge while $9973^1$ is small). Sequential multiplication would cause one side to grow rapidly, producing unbalanced operations. Tournament multiplication maintains balanced operand sizes through pairing, maximizing the efficiency of fast multiplication algorithms.