Which GCD algorithms does calx use?

calx uses a two-tier approach: Lehmer GCD for small to medium sizes, and recursive HGCD (Half-GCD, based on Möller 2008) for 126 words or more. This combination achieves performance on par with GMP across all size ranges.

What is the complexity of HGCD?

HGCD has a complexity of O(M(n) log n), where M(n) is the cost of n-word multiplication. With NTT multiplication this becomes O(n log²n log log n), making it asymptotically the fastest known GCD algorithm.

Does calx support the Extended Euclidean Algorithm?

Yes. extendedGcd(a, b, x, y) computes the GCD and Bezout coefficients (x, y satisfying ax + by = gcd(a, b)) simultaneously using the same HGCD infrastructure. It achieves 52x speedup over a naive O(n²) implementation at 5,000 digits.

Int GCD Algorithms

Overview

The GCD in calx is implemented as a two-tier Lehmer GCD + recursive HGCD (Half-GCD) scheme. For small sizes it uses Lehmer GCD, and switches to HGCD (Möller 2008) for 126 words or more.

gcd(a, b) — Computes the greatest common divisor using Lehmer GCD + HGCD. Outperforms GMP at small to medium sizes and matches it at large sizes.
extendedGcd(a, b, x, y) — Computes $\gcd(a, b)$ and the Bezout coefficients $x$, $y$ ($ax + by = \gcd(a, b)$) simultaneously using the same HGCD infrastructure. Efficiently implemented via SignedMpn cofactor tracking.

Complexity: Lehmer GCD is $O(n^2)$; HGCD is $O(M(n) \log n)$ (where $M(n)$ is the multiplication cost).

Performance (vs. GMP mpz_gcd): 0.99x at 16K bits (on par), 1.31x at 4K bits. extendedGcd also achieves 0.99x at 16K bits.

For usage details, see Int API Reference — GCD / LCM.

Lehmer GCD

Lehmer GCD is an improved version of the Euclidean algorithm. It uses only the top 1–2 words of the arbitrary-precision integers to predict multiple quotient steps at once, greatly reducing the number of full-precision divisions.

Basic Principle

Approximate quotients are computed from the leading digits of the arbitrary-precision integers $a$ and $b$. As long as the approximation remains valid, the algorithm advances without full-precision arithmetic. When the approximation becomes unreliable, it falls back to full-precision operations.

Extract the top 1 word (64 bits) of $a$ and $b$ as $\hat{a}$ and $\hat{b}$.
Run the Euclidean algorithm on $\hat{a}$ and $\hat{b}$, accumulating the transformation matrix $\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$.
Stop when $\hat{b}$ becomes too small or the approximation accuracy can no longer be guaranteed.
Apply the full-precision update: $a' = \alpha \cdot a + \beta \cdot b$, $b' = \gamma \cdot a + \delta \cdot b$.
If $b' \neq 0$, go back to step 1.

Complexity: $O(n^2)$ (same order as the Euclidean algorithm, but with an improved constant factor).

HGCD (Half-GCD)

HGCD is a divide-and-conquer algorithm that reduces the GCD complexity to $O(M(n) \log n)$ (where $M(n)$ is the cost of $n$-word multiplication). In calx, it is based on Möller (2008) and is activated for 126 words or more.

Basic Principle

An $n$-word GCD problem is reduced to an $n/2$-word HGCD problem and solved recursively. At each level of recursion, a transformation matrix is computed and merged bottom-up.

Recursively call HGCD on the upper half of $a$ and $b$.
Apply the resulting transformation matrix to the full $a$ and $b$.
Call HGCD again on the remaining portion.
Compose the two transformation matrices and return.

Complexity: $O(M(n) \log n)$. With NTT multiplication this becomes $O(n \log^2 n \log \log n)$.

Threshold and Performance

The crossover threshold between Lehmer GCD and HGCD is 126 words (approximately 2,400 digits). This break-even point was determined by benchmarking — it is where the overhead of HGCD recursion and matrix multiplication drops below the cost of Lehmer's linear scan.

Performance comparison with GMP mpz_gcd (calx / GMP; 1.0x means equal):

4K bits (1,233 digits): 1.31x (calx slightly slower — Int construction overhead)
16K bits (4,932 digits): 0.99x (on par)
64K bits (19,728 digits): 1.27x

HGCD in extendedGcd

The extended GCD (Bezout coefficient computation) uses the same HGCD infrastructure. With SignedMpn-based cofactor tracking and phys_swap for buffer coherence, it achieves speedups of 4.6x at 100 digits, 33x at 500 digits, and 52x at 5,000 digits over the naive $O(n^2)$ approach.

Extended Euclidean Algorithm

The Extended Euclidean Algorithm computes $\gcd(a, b)$ while simultaneously finding integers $x$ and $y$ (Bezout coefficients) satisfying:

$$ax + by = \gcd(a, b)$$

This identity is known as Bezout's identity and shows that $\gcd(a, b)$ can be expressed as an integer linear combination of $a$ and $b$.

HGCD-Based Extended GCD

The extended GCD in calx uses the same Lehmer + HGCD algorithm as gcd(), simultaneously tracking the Bezout coefficients (cofactors) at each step. Cofactors are managed using SignedMpn (signed arbitrary-precision arrays), with phys_swap providing buffer coherence guarantees for efficient implementation.

Algorithm

Starting from $(r_0, x_0, y_0) = (a, 1, 0)$ and $(r_1, x_1, y_1) = (b, 0, 1)$, the HGCD steps update $x$ and $y$ simultaneously with the transformation matrix:

$$q_i = \lfloor r_{i-1} / r_i \rfloor$$

$$r_{i+1} = r_{i-1} - q_i \cdot r_i$$

$$x_{i+1} = x_{i-1} - q_i \cdot x_i$$

$$y_{i+1} = y_{i-1} - q_i \cdot y_i$$

The last nonzero $r_i$ is $\gcd(a, b)$, and the corresponding $x_i$, $y_i$ are the Bezout coefficients.

Applications

Modular inverse: When $\gcd(a, m) = 1$, the $x$ satisfying $ax \equiv 1 \pmod{m}$ can be found via the extended GCD. inverseMod() uses this internally.
Linear Diophantine equations: Integer solutions to $ax + by = c$ can be constructed using the extended GCD when $\gcd(a, b) \mid c$.

Complexity: $O(M(n) \log n)$ (HGCD-based). Compared to GMP mpz_gcdext: 0.99x at 16K bits (on par). Compared to naive extended Euclidean $O(n^2)$: 52x faster at 5,000 digits.

LCM Computation

The least common multiple (LCM) is computed from the GCD as follows:

$$\text{lcm}(a, b) = \frac{|a| \cdot |b|}{\gcd(a, b)}$$

However, to avoid intermediate overflow, the implementation performs the division first:

$$\text{lcm}(a, b) = |a| \times \left( \frac{|b|}{\gcd(a, b)} \right)$$

Since $\gcd(a, b)$ is guaranteed to divide $b$, $|b| / \gcd(a, b)$ can be computed exactly by integer division. By performing the division first, the unnecessarily large intermediate product $|a| \times |b|$ is avoided, reducing both memory usage and computation time.

Design Decisions

Why the Two-Tier Lehmer + HGCD Approach

Several GCD algorithm candidates exist:

Euclidean algorithm: $O(n^2)$, but requires a full-precision division at every step, making it slow.
Binary GCD: $O(n^2)$, operates using only shifts and subtractions. 30–50% faster than the Euclidean algorithm, but slower than HGCD at large sizes.
Lehmer GCD: $O(n^2)$, predicts multiple steps from top words. Fastest at small to medium sizes.
HGCD: $O(M(n) \log n)$, divide-and-conquer. Asymptotically fastest at large sizes.

calx adopts the Lehmer (small sizes) + HGCD (large sizes) two-tier approach. Binary GCD is simple but slower than Lehmer, and the Euclidean algorithm incurs expensive full-precision divisions. The Lehmer + HGCD combination achieves performance on par with or better than GMP across all size ranges.

References

Möller, N. (2008). “On Schönhage’s algorithm and subquadratic integer GCD computation”. Mathematics of Computation, 77(261), 589–607.
Stein, J. (1967). “Computational problems associated with Racah algebra”. Journal of Computational Physics, 1(3), 397–405.
Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997. §4.5.2.