Int GCD Algorithms

Overview

The GCD in calx is implemented as a two-tier scheme: Lehmer GCD at small sizes, switching to the recursive HGCD (Half-GCD) of Möller (2008) at 126 words and above.

  • gcd(a, b) — Computes the greatest common divisor using Lehmer GCD + HGCD. Outperforms GMP at small to medium sizes and matches it at large sizes.
  • extendedGcd(a, b, x, y) — Computes $\gcd(a, b)$ and the Bézout coefficients $x$, $y$ ($ax + by = \gcd(a, b)$) simultaneously using the same HGCD infrastructure. Efficiently implemented via SignedMpn cofactor tracking.

Complexity: Lehmer GCD is $O(n^2)$; HGCD is $O(M(n) \log n)$ (where $M(n)$ is the multiplication cost).

Performance (vs. GMP mpz_gcd): 0.99x at 16K bits (on par), 1.31x at 4K bits. extendedGcd also achieves 0.99x at 16K bits.

For usage details, see Int API Reference — GCD / LCM.

Lehmer GCD

Lehmer GCD is an improved version of the Euclidean algorithm. It uses only the top 1–2 words of the arbitrary-precision integers to predict multiple quotient steps at once, greatly reducing the number of full-precision divisions.

Basic Principle

Approximate quotients are computed from the leading digits of the arbitrary-precision integers $a$ and $b$. As long as the approximation remains valid, the algorithm advances without full-precision arithmetic. When the approximation becomes unreliable, it falls back to full-precision operations.

  1. Extract the topmost word (64 bits) of $a$ and $b$ as $\hat{a}$ and $\hat{b}$.
  2. Run the Euclidean algorithm on $\hat{a}$ and $\hat{b}$, accumulating the transformation matrix $\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$.
  3. Stop when $\hat{b}$ becomes too small or the approximation accuracy can no longer be guaranteed.
  4. Apply the full-precision update: $a' = \alpha \cdot a + \beta \cdot b$, $b' = \gamma \cdot a + \delta \cdot b$.
  5. If $b' \neq 0$, go back to step 1.
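The loop above can be sketched in Python. This is a minimal single-word illustration in the style of Knuth's Algorithm L, not the calx implementation; the names and the final `math.gcd` handoff are simplifications:

```python
import math

WORD_BITS = 64

def lehmer_gcd(u, v):
    """Single-word Lehmer GCD (after Knuth, Algorithm 4.5.2 L).
    Illustrative sketch only -- not the calx implementation."""
    u, v = abs(u), abs(v)
    if u < v:
        u, v = v, u
    while v >> WORD_BITS:                      # v still wider than one word
        # Step 1: top word of u, and v shifted by the same amount.
        shift = u.bit_length() - WORD_BITS
        uh, vh = u >> shift, v >> shift
        # Step 2: single-word Euclid, accumulating the matrix (A B / C D).
        A, B, C, D = 1, 0, 0, 1
        while vh + C != 0 and vh + D != 0:
            # Lehmer's test: the quotient is certain only while both
            # bounds agree (step 3 stops when they diverge).
            q = (uh + A) // (vh + C)
            if q != (uh + B) // (vh + D):
                break
            A, C = C, A - q * C
            B, D = D, B - q * D
            uh, vh = vh, uh - q * vh
        if B == 0:
            # No single-word progress: one full-precision division instead.
            u, v = v, u % v
        else:
            # Step 4: full-precision matrix update; both results are
            # genuine remainders of the true sequence, hence nonnegative.
            u, v = A * u + B * v, C * u + D * v
    return math.gcd(u, v)                      # operands fit in words: finish cheaply
```

When the quotient test passes, the accumulated matrix corresponds to a prefix of the true Euclidean remainder sequence, which is why the full-precision update in step 4 is exact.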

Complexity: $O(n^2)$ (same order as the Euclidean algorithm, but with an improved constant factor).

HGCD (Half-GCD)

HGCD is a divide-and-conquer algorithm that reduces the GCD complexity to $O(M(n) \log n)$ (where $M(n)$ is the cost of $n$-word multiplication). In calx, it is based on Möller (2008) and is activated for 126 words or more.

Basic Principle

An $n$-word GCD problem is reduced to an $n/2$-word HGCD problem and solved recursively. At each level of recursion, a transformation matrix is computed and merged bottom-up.

  1. Recursively call HGCD on the upper half of $a$ and $b$.
  2. Apply the resulting transformation matrix to the full $a$ and $b$.
  3. Call HGCD again on the remaining portion.
  4. Compose the two transformation matrices and return.
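The matrix bookkeeping of steps 2 and 4 can be illustrated with a quadratic stand-in for the recursive call. This is a sketch only — the real HGCD obtains the same matrix subquadratically by recursing on the top halves, and none of these names come from calx:

```python
def half_reduce(a, b):
    """Quadratic stand-in for HGCD: reduce (a, b) by Euclidean steps until
    b has at most half the bits of a, returning the 2x2 matrix M with
    (a', b')^T = M (a, b)^T.  Real HGCD builds the same M recursively."""
    target = a.bit_length() // 2
    m = (1, 0, 0, 1)                       # identity matrix (m00, m01, m10, m11)
    while b and b.bit_length() > target:
        q = a // b
        a, b = b, a - q * b
        # Compose the step matrix [[0, 1], [1, -q]] on the left (step 4).
        m = (m[2], m[3], m[0] - q * m[2], m[1] - q * m[3])
    return a, b, m

def hgcd_style_gcd(a, b):
    """GCD driver in the shape of the HGCD scheme: halve, apply, repeat."""
    a, b = abs(a), abs(b)
    if a < b:
        a, b = b, a
    while b:
        a2, b2, _ = half_reduce(a, b)
        if (a2, b2) == (a, b):             # b already small: one plain step
            a2, b2 = b, a % b
        a, b = a2, b2
    return a
```

The returned matrix satisfies $(a', b')^T = M (a, b)^T$ exactly, which is the invariant that lets the recursive algorithm apply a half-size result to the full operands (step 2) and compose matrices bottom-up (step 4).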

Complexity: $O(M(n) \log n)$. With NTT multiplication this becomes $O(n \log^2 n \log \log n)$.

Threshold and Performance

The crossover threshold between Lehmer GCD and HGCD is 126 words (approximately 2,400 digits). This break-even point was determined by benchmarking — it is where the overhead of HGCD recursion and matrix multiplication drops below the cost of Lehmer's linear scan.

Performance comparison with GMP mpz_gcd (calx / GMP; 1.0x means equal):

  • 4K bits (1,233 digits): 1.31x (calx slightly slower — Int construction overhead)
  • 16K bits (4,932 digits): 0.99x (on par)
  • 64K bits (19,728 digits): 1.27x

HGCD in extendedGcd

The extended GCD (Bézout coefficient computation) uses the same HGCD infrastructure. With SignedMpn-based cofactor tracking and phys_swap for buffer coherence, it achieves speedups of 4.6x at 100 digits, 33x at 500 digits, and 52x at 5,000 digits over the naive $O(n^2)$ approach.

Extended Euclidean Algorithm

The Extended Euclidean Algorithm computes $\gcd(a, b)$ while simultaneously finding integers $x$ and $y$ (Bézout coefficients) satisfying:

$$ax + by = \gcd(a, b)$$

This identity is known as Bézout's identity and shows that $\gcd(a, b)$ can be expressed as an integer linear combination of $a$ and $b$.

HGCD-Based Extended GCD

The extended GCD in calx uses the same Lehmer + HGCD algorithm as gcd(), tracking the Bézout coefficients (cofactors) alongside each step. Cofactors are managed as SignedMpn (signed arbitrary-precision arrays), with phys_swap providing the buffer-coherence guarantees needed for an efficient implementation.

Algorithm

Starting from $(r_0, x_0, y_0) = (a, 1, 0)$ and $(r_1, x_1, y_1) = (b, 0, 1)$, the HGCD steps update $x$ and $y$ simultaneously with the transformation matrix:

$$q_i = \lfloor r_{i-1} / r_i \rfloor$$

$$r_{i+1} = r_{i-1} - q_i \cdot r_i$$

$$x_{i+1} = x_{i-1} - q_i \cdot x_i$$

$$y_{i+1} = y_{i-1} - q_i \cdot y_i$$

The last nonzero $r_i$ is $\gcd(a, b)$, and the corresponding $x_i$, $y_i$ are the Bézout coefficients.
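The recurrences translate directly into an iterative sketch (illustrative Python, not the SignedMpn-based calx code):

```python
def extended_gcd(a, b):
    """Extended Euclid following the recurrences above: returns (g, x, y)
    with a*x + b*y == g == gcd(a, b)."""
    r0, x0, y0 = a, 1, 0        # (r_0, x_0, y_0) = (a, 1, 0)
    r1, x1, y1 = b, 0, 1        # (r_1, x_1, y_1) = (b, 0, 1)
    while r1:
        q = r0 // r1            # q_i = floor(r_{i-1} / r_i)
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0           # last nonzero remainder and its cofactors
```

For example, `extended_gcd(240, 46)` returns a triple `(2, x, y)` with $240x + 46y = 2$.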

Applications

  • Modular inverse: When $\gcd(a, m) = 1$, the $x$ satisfying $ax \equiv 1 \pmod{m}$ can be found via the extended GCD. inverseMod() uses this internally.
  • Linear Diophantine equations: Integer solutions to $ax + by = c$ can be constructed using the extended GCD when $\gcd(a, b) \mid c$.
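Both applications follow mechanically from the identity. A sketch — the helper names here are illustrative and are not calx's inverseMod() API:

```python
def extended_gcd(a, b):
    # Compact extended Euclid: returns (g, x, y) with a*x + b*y == g.
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def inverse_mod(a, m):
    """Modular inverse via extended GCD: x with a*x = 1 (mod m)."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return x % m

def solve_diophantine(a, b, c):
    """One integer solution (x, y) of a*x + b*y = c, when gcd(a, b) | c."""
    g, x, y = extended_gcd(a, b)
    if c % g:
        raise ValueError("no integer solutions")
    k = c // g
    return x * k, y * k
```

For instance, `inverse_mod(3, 7)` yields 5, since $3 \cdot 5 = 15 \equiv 1 \pmod{7}$.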

Complexity: $O(M(n) \log n)$ (HGCD-based). Compared to GMP mpz_gcdext: 0.99x at 16K bits (on par). Compared to naive extended Euclidean $O(n^2)$: 52x faster at 5,000 digits.

LCM Computation

The least common multiple (LCM) is computed from the GCD as follows:

$$\text{lcm}(a, b) = \frac{|a| \cdot |b|}{\gcd(a, b)}$$

However, to avoid an unnecessarily large intermediate product, the implementation performs the division first:

$$\text{lcm}(a, b) = |a| \times \left( \frac{|b|}{\gcd(a, b)} \right)$$

Since $\gcd(a, b)$ is guaranteed to divide $b$, $|b| / \gcd(a, b)$ can be computed exactly by integer division. By performing the division first, the unnecessarily large intermediate product $|a| \times |b|$ is avoided, reducing both memory usage and computation time.
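The divide-first form is a one-liner (sketch; `math.gcd` stands in for calx's gcd):

```python
import math

def lcm(a, b):
    """LCM via GCD, dividing first so the intermediate stays small."""
    if a == 0 or b == 0:
        return 0                     # lcm(0, x) is conventionally 0
    # Exact division: gcd(a, b) always divides b.
    return abs(a) * (abs(b) // math.gcd(a, b))
```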

Design Decisions

Why the Two-Tier Lehmer + HGCD Approach

Several GCD algorithm candidates exist:

  • Euclidean algorithm: $O(n^2)$, but requires a full-precision division at every step, making it slow.
  • Binary GCD: $O(n^2)$, operates using only shifts and subtractions. 30–50% faster than the Euclidean algorithm, but slower than HGCD at large sizes.
  • Lehmer GCD: $O(n^2)$, predicts multiple steps from top words. Fastest at small to medium sizes.
  • HGCD: $O(M(n) \log n)$, divide-and-conquer. Asymptotically fastest at large sizes.

calx adopts the Lehmer (small sizes) + HGCD (large sizes) two-tier approach. Binary GCD is simple but slower than Lehmer, and the Euclidean algorithm incurs expensive full-precision divisions. The Lehmer + HGCD combination achieves performance on par with or better than GMP across all size ranges.
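For comparison, the binary GCD mentioned above (Stein 1967) needs only shifts and subtractions — a sketch, not code from calx:

```python
def binary_gcd(a, b):
    """Binary (Stein) GCD: no division, only shifts and subtraction."""
    a, b = abs(a), abs(b)
    if a == 0:
        return b
    if b == 0:
        return a
    # Factor out the common power of two shared by a and b.
    shift = ((a | b) & -(a | b)).bit_length() - 1
    a >>= (a & -a).bit_length() - 1          # make a odd
    while b:
        b >>= (b & -b).bit_length() - 1      # make b odd
        if a > b:
            a, b = b, a                      # keep a <= b
        b -= a                               # odd - odd is even; loop re-shifts
    return a << shift
```

Its simplicity is the appeal, but each pass removes only a bounded number of bits, which is why it stays $O(n^2)$ and loses to Lehmer's multi-step quotient prediction.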

References

  • Möller, N. (2008). “On Schönhage’s algorithm and subquadratic integer GCD computation”. Mathematics of Computation, 77(261), 589–607.
  • Stein, J. (1967). “Computational problems associated with Racah algebra”. Journal of Computational Physics, 1(3), 397–405.
  • Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997. §4.5.2.