Why did calx choose the Chudnovsky series for pi computation?

The Chudnovsky series converges at approximately 14.18 digits per term, making it among the fastest known series. Combined with Binary Splitting and FFT multiplication, it achieves O(M(n) log n) complexity and is empirically faster than the AGM method.

What is Binary Splitting?

Binary Splitting is a technique that recursively divides the N terms of a series in half and accumulates intermediate results using integer arithmetic. Only one Float division is performed at the final stage, minimizing floating-point operations. Integer multiplications of O(M(n)) are performed at each level with depth O(log N), yielding overall complexity O(M(n) log N).

How is circular dependency in constant computation avoided?

All constant computations use only the four basic arithmetic operations and sqrt, without relying on transcendental functions such as exp or log. For example, log 2 is computed via Binary Splitting of the atanh series, not by calling the log function. The only exception is the computation of gamma, which uses log(n), but this does not depend on gamma itself, so no circular dependency arises.

Float Mathematical Constant Algorithms

Overview

The calx Float class provides the following 7 mathematical constants at arbitrary precision:

Constant	API	Algorithm
$\pi$	`Float::pi(prec)`	Chudnovsky series + Binary Splitting
$e$	`Float::e(prec)`	Factorial series + Binary Splitting
$\log 2$	`Float::log2(prec)`	atanh series + Binary Splitting
$\log 10$	`Float::log10(prec)`	atanh series + Binary Splitting
$\gamma$ (Euler-Mascheroni)	`Float::euler(prec)`	Brent-McMillan method
$G$ (Catalan)	`Float::catalan(prec)`	Euler's fast-convergence formula + Binary Splitting
$\sqrt{2}$	`Float::sqrt2(prec)`	Reduction to integer sqrtRem

Three common design principles underlie all constant computations:

thread_local cache: avoids recomputation at the same precision. No mutex required
Binary Splitting (BS): recursively divides the series using integer arithmetic, performing only one Float division at the final stage. Complexity is $O(M(n) \cdot \log N)$
Circular dependency avoidance: constant computations use only basic arithmetic + sqrt, without depending on transcendental functions such as exp/log

$\pi$ — Chudnovsky Series

calx evaluates the Chudnovsky brothers (1988) series via Binary Splitting. Compared to the AGM (Arithmetic-Geometric Mean) method, Binary Splitting pairs well with FFT multiplication and is empirically faster, hence the choice of the Chudnovsky series.

Formula

$$\pi = 426880 \sqrt{10005} \cdot \frac{Q}{T}$$

where $T/Q$ is given by the series:

$$\frac{T}{Q} = \sum_{k=0}^{N} (-1)^k \cdot \frac{(6k)! \cdot (A + Bk)}{(3k)! \cdot (k!)^3 \cdot C^{3k}}$$

The parameters are $A = 13591409$, $B = 545140134$, $C = 640320$. The series converges at approximately 14.18 digits per term, among the fastest known $\pi$ series.

Structure of Binary Splitting

Why Binary Splitting Is Needed

The Chudnovsky series is truncated at $N$ terms. Naive sequential summation yields correct results, but each addition involves multiprecision Float arithmetic, costing $O(N \cdot M(n))$ overall. Binary Splitting performs all intermediate computation in integers (Int), with only a single Float division at the end, improving the complexity to $O(M(n) \cdot \log N)$.

Expressing Each Term as Integer Functions

Decomposing the $k$-th term of the Chudnovsky series into numerator and denominator yields three integer functions:

Function	Definition	Role
$p(k)$	$(6k-5)(6k-4)(6k-3)(6k-2)(6k-1)(6k)$	Numerator factorial part: 6 consecutive factors of $(6k)!$
$q(k)$	$(3k-2)(3k-1)(3k) \cdot k^3 \cdot C^3$	Denominator: 3 factors of $(3k)!$ $\times$ factor of $(k!)^3$ $\times$ factor of $C^{3k}$
$t(k)$	$(-1)^k \cdot p(k) \cdot (A + B \cdot k)$	Full numerator: sign $\times$ factorial part $\times$ linear term

$p(k)$ is "the factor multiplied into the numerator when advancing from one term to the next", and $q(k)$ is "the factor multiplied into the denominator when advancing from one term to the next". In other words, the ratio of adjacent terms $a_{k}/a_{k-1}$ equals $p(k)/q(k)$ times sign and linear factors.

Meaning of the Triple $(P,\; Q,\; T)$

To maintain the partial sum of an interval $[a, b)$ as integers, define the following three quantities:

$P_{a,b} = \prod_{k=a}^{b-1} p(k)$ — product of numerator factors in the interval
$Q_{a,b} = \prod_{k=a}^{b-1} q(k)$ — product of denominator factors in the interval
$T_{a,b}$ — partial sum of the interval (the numerator when expressed with common denominator $Q_{a,b}$)

$T_{a,b}$ represents the numerator of the partial series sum when $Q_{a,b}$ is used as a common denominator. That is:

$$\frac{T_{a,b}}{Q_{a,b}} = \sum_{k=a}^{b-1} \frac{t(k)}{q(a) \cdot q(a+1) \cdots q(k)}$$

$T$ is defined so that this equality holds.

Derivation of the Recursive Merge

Split the interval $[a, b)$ at midpoint $m = \lfloor (a+b)/2 \rfloor$. The formulas for reconstructing the whole from left half $[a, m)$ and right half $[m, b)$ are:

$$P_{a,b} = P_{a,m} \cdot P_{m,b}$$

$$Q_{a,b} = Q_{a,m} \cdot Q_{m,b}$$

$$T_{a,b} = T_{a,m} \cdot Q_{m,b} + P_{a,m} \cdot T_{m,b}$$

$P$ and $Q$ are products, so splitting merely requires multiplying the halves together. The merge of $T$ is the core and can be understood as follows:

$T_{a,m} \cdot Q_{m,b}$: Cross-multiplying the left-half sum with the right-half denominator. The left half was computed with denominator $Q_{a,m}$, but the full denominator is $Q_{a,m} \cdot Q_{m,b}$, so $T_{a,m}$ is multiplied by $Q_{m,b}$ to equalize denominators
$P_{a,m} \cdot T_{m,b}$: Adjusting the right-half sum by the left-half numerator factors. Each right-half term "starts from term $m$", so $p(a) \cdots p(m-1) = P_{a,m}$ is multiplied into the numerator

Leaf Nodes (Single-Term Case)

When $b - a = 1$ (a single term in the interval), the recursion bottoms out. The $k$-th term is computed directly:

$k = 0$: $P = 1$, $Q = 1$, $T = A = 13591409$
$k \geq 1$: $P = p(k)$, $Q = q(k)$, $T = t(k) = (-1)^k \cdot p(k) \cdot (A + Bk)$

Concrete Example: $N = 4$ Terms

Splitting 4 terms of the series:

              [0, 4)
             /      \
         [0, 2)    [2, 4)
         /   \      /   \
      [0,1) [1,2) [2,3) [3,4)
        ↓     ↓     ↓     ↓
       k=0   k=1   k=2   k=3

First, $(P, Q, T)$ is computed at each leaf node. Then, bottom-up merging is performed. From the final $(P_{0,4},\; Q_{0,4},\; T_{0,4})$:

$$\pi = 426880 \sqrt{10005} \cdot \frac{Q_{0,4}}{T_{0,4}}$$

is obtained via a single Float division. $P_{0,4}$ is not needed for the final result, so the computation of $P$ is omitted at the top-level merge.

Optimizations

4-way recursive unrolling: The standard 2-way split (left/right) is unrolled by 2 levels, processing 4 subproblems in a single function call. This reduces function call overhead and increases parallelism from 2 to 4
Parallel Binary Splitting: When the term count exceeds BS_PARALLEL_THRESHOLD = 1000, left and right subproblems are computed in separate threads. For 1M digits, approximately 71K terms are needed, and the top ~6 levels are parallelized

$e$ — Factorial Series

Napier's constant $e$ is computed via the series:

$$e = \sum_{k=0}^{N} \frac{1}{k!}$$

Structure of Binary Splitting

In this series $p(k) = 1$ (always 1), so tracking $P$ is unnecessary, and the recursion uses only the pair $(Q,\; T)$:

$$q(k) = k + 1$$

$$Q_{a,b} = Q_{a,m} \cdot Q_{m,b}$$

$$T_{a,b} = T_{a,m} \cdot Q_{m,b} + T_{m,b}$$

The final result $e = 1 + T_{0,N} / Q_{0,N}$ is obtained via a single Float division.

Optimizations

factorial_bs applies the same 4-way unrolling and parallel merge as the Chudnovsky series for $\pi$. Since $P$ is omitted, the number of multiplications is smaller, making this a lighter computation than $\pi$.

$\log 2$, $\log 10$ — atanh Series

Logarithmic constants are computed without calling the log function, instead evaluating the atanh series expansion via Binary Splitting. This completely avoids circular dependencies.

Formula

The series expansion of the inverse hyperbolic tangent is:

$$\operatorname{atanh}\!\left(\frac{1}{n}\right) = \frac{1}{n} \sum_{k=0}^{N} \frac{1}{(2k+1) \cdot n^{2k}}$$

Using the relation $\log\!\left(\frac{1+x}{1-x}\right) = 2 \operatorname{atanh}(x)$, the following Machin-like formulas are derived:

$$\log 2 = 2 \operatorname{atanh}\!\left(\frac{1}{3}\right)$$

$$\log 10 = 6 \operatorname{atanh}\!\left(\frac{1}{3}\right) + 2 \operatorname{atanh}\!\left(\frac{1}{9}\right)$$

Structure of Binary Splitting

For the series $\operatorname{atanh}(1/n)$:

$$p(k) = 2k + 1$$

$$q(k) = (2k + 3) \cdot n^2$$

$(P,\; Q,\; T)$ are merged recursively, with a single Float division at the final stage. For $\log 10$, two atanh series are evaluated independently and added together.

$\log 2$ and $\log 10$ are computed using only arithmetic operations, without calling Float::log(). Conversely, Float::log() may internally reference $\log 2$ and $\log 10$, so this independence is essential for avoiding circular references.

$\gamma$ — Brent-McMillan Method

The Euler-Mascheroni constant $\gamma$ is defined by a slowly converging limit:

$$\gamma = \lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \ln n \right) \approx 0.5772156649\ldots$$

Brent & McMillan (1980) provided a fast-converging formula using modified Bessel functions.

Algorithm

Choose a sufficiently large parameter $n$ ($n \approx p / \log 2$, where $p$ is the target bit count), and evaluate the following two series via Binary Splitting:

$I = I_0(2n)$: series of the modified Bessel function of the first kind
$S$: series of $I$ weighted by harmonic numbers

The final result is

$$\gamma = \frac{S}{I} - \log(n)$$

where $\log(n)$ can be computed via Float::log() since $n$ is an integer (no circular dependency on $\gamma$ arises).

The Brent-McMillan method is the most computationally expensive among the 7 constants. The number of Bessel function series terms is $O(p)$, and combined with the Binary Splitting depth of $O(\log p)$, the overall complexity is $O(M(p) \cdot \log^2 p)$.

Catalan's Constant $G$

Catalan's constant is defined by the series:

$$G = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2} = 1 - \frac{1}{9} + \frac{1}{25} - \frac{1}{49} + \cdots \approx 0.9159655941\ldots$$

This series converges too slowly for direct use. Instead, Euler's fast-convergence formula is evaluated via Binary Splitting.

Fast-Convergence Formula

Euler's formula involves central binomial coefficients, yielding exponentially fast convergence per term. Binary Splitting recursively merges $(P,\; Q,\; T)$, with a single Float division at the final stage.

As with other constants, only arithmetic operations are used, with no dependence on transcendental functions.

$\sqrt{2}$

$\sqrt{2}$ is computed directly via Float::sqrt(Float(2)) without series expansion.

Internally, dispatch depends on precision:

Low precision ($\leq 64$ bit): sqrt_newton_1 — std::sqrt + 1 Newton step + intrinsic
High precision: Zimmermann's dc_sqrtrem — reduction to integer square root

See Float Square Root Algorithm and Int Square Root Algorithm for details.

thread_local Cache

To avoid recomputing constants, each constant has a thread_local cache.

Cache Structure

For each constant, two thread_local variables are maintained:

thread_local Float cached_pi;
thread_local int  cached_pi_prec = 0;

Cache Behavior

Requested precision ≤ cached precision: return the cached value rounded to the requested precision
Requested precision > cached precision: re-evaluate the series and update the cache

Advantages of thread_local

No mutex required: each thread has its own independent cache, so no inter-thread contention occurs
Lock-free: no atomic operations or mutual exclusion are needed for cache reads and writes
Trade-off: caches are not shared between threads, so when multiple threads request the same constant, each computes it independently. However, since constant computation is typically performed only once, this overhead is small

Complexity of Binary Splitting

Binary Splitting is an efficient evaluation technique for series $\sum_{k=0}^{N} a_k / b_k$, and is the foundation of all constant computations in calx (except $\sqrt{2}$).

Principle

The $N$-term series is recursively split in half, with integer multiplication at each level:

Split interval $[0, N)$ into $[0, m)$ and $[m, N)$ ($m = \lfloor N/2 \rfloor$)
Recursively compute partial sums as integer tuples $(P, Q, T)$
Merge via integer multiplication and addition: $T_{0,N} = T_{0,m} \cdot Q_{m,N} + P_{0,m} \cdot T_{m,N}$
At leaves (single terms), $P$, $Q$, $T$ are computed directly from constant terms
Only one Float division at the final stage: $\text{result} = T_{0,N} / Q_{0,N}$

Complexity Analysis

The recursion depth is $O(\log N)$, and $O(M(n))$ integer multiplications are performed at each level (where $M(n)$ is the cost of $n$-bit integer multiplication). The overall complexity is therefore:

$$O(M(n) \cdot \log N)$$

Only one floating-point division is performed at the final stage, yielding a significant efficiency gain over the naive approach of accumulating the series in Float, which costs $O(N \cdot M(n))$.

Advantages of Staying in Integer Arithmetic

The core insight of Binary Splitting is that all intermediate computation is performed in integers. Floating-point arithmetic accumulates rounding error at each step, but integer arithmetic is exact, so no error is introduced in intermediate results. Rounding occurs only at the single final division, simplifying precision management and facilitating correct rounding.

References

Chudnovsky, D. V. and Chudnovsky, G. V. (1988). "Approximations and Complex Multiplication According to Ramanujan". In: Ramanujan Revisited. Academic Press, pp. 375–472.
Brent, R. P. and McMillan, E. M. (1980). "Some New Algorithms for High-Precision Computation of Euler's Constant". Mathematics of Computation, 34(149), pp. 305–312.
Haible, B. and Papanikolaou, T. (1998). "Fast Multiprecision Evaluation of Series of Rational Numbers". Algorithmic Number Theory (ANTS-III). LNCS 1423, pp. 338–350.
Brent, R. P. and Zimmermann, P. (2010). Modern Computer Arithmetic. Cambridge University Press. Chapter 4.8: Binary Splitting.
Borwein, J. M. and Borwein, P. B. (1987). Pi and the AGM. Wiley-Interscience.

Related article: Float Square Root Algorithm (used in the computation of $\sqrt{2}$)