Int GCD Algorithms
Overview
The GCD in calx is implemented as a two-tier Lehmer GCD + recursive HGCD (Half-GCD) scheme. For small sizes it uses Lehmer GCD, and switches to HGCD (Möller 2008) for 126 words or more.
gcd(a, b)— Computes the greatest common divisor using Lehmer GCD + HGCD. Outperforms GMP at small to medium sizes and matches it at large sizes.extendedGcd(a, b, x, y)— Computes $\gcd(a, b)$ and the Bezout coefficients $x$, $y$ ($ax + by = \gcd(a, b)$) simultaneously using the same HGCD infrastructure. Efficiently implemented via SignedMpn cofactor tracking.
Complexity: Lehmer GCD is $O(n^2)$; HGCD is $O(M(n) \log n)$ (where $M(n)$ is the multiplication cost).
Performance (vs. GMP mpz_gcd): 0.99x at 16K bits (on par), 1.31x at 4K bits. extendedGcd also achieves 0.99x at 16K bits.
For usage details, see Int API Reference — GCD / LCM.
Lehmer GCD
Lehmer GCD is an improved version of the Euclidean algorithm. It uses only the top 1–2 words of the arbitrary-precision integers to predict multiple quotient steps at once, greatly reducing the number of full-precision divisions.
Basic Principle
Approximate quotients are computed from the leading digits of the arbitrary-precision integers $a$ and $b$. As long as the approximation remains valid, the algorithm advances without full-precision arithmetic. When the approximation becomes unreliable, it falls back to full-precision operations.
- Extract the top 1 word (64 bits) of $a$ and $b$ as $\hat{a}$ and $\hat{b}$.
- Run the Euclidean algorithm on $\hat{a}$ and $\hat{b}$, accumulating the transformation matrix $\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$.
- Stop when $\hat{b}$ becomes too small or the approximation accuracy can no longer be guaranteed.
- Apply the full-precision update: $a' = \alpha \cdot a + \beta \cdot b$, $b' = \gamma \cdot a + \delta \cdot b$.
- If $b' \neq 0$, go back to step 1.
Complexity: $O(n^2)$ (same order as the Euclidean algorithm, but with an improved constant factor).
HGCD (Half-GCD)
HGCD is a divide-and-conquer algorithm that reduces the GCD complexity to $O(M(n) \log n)$ (where $M(n)$ is the cost of $n$-word multiplication). In calx, it is based on Möller (2008) and is activated for 126 words or more.
Basic Principle
An $n$-word GCD problem is reduced to an $n/2$-word HGCD problem and solved recursively. At each level of recursion, a transformation matrix is computed and merged bottom-up.
- Recursively call HGCD on the upper half of $a$ and $b$.
- Apply the resulting transformation matrix to the full $a$ and $b$.
- Call HGCD again on the remaining portion.
- Compose the two transformation matrices and return.
Complexity: $O(M(n) \log n)$. With NTT multiplication this becomes $O(n \log^2 n \log \log n)$.
Threshold and Performance
The crossover threshold between Lehmer GCD and HGCD is 126 words (approximately 2,400 digits). This break-even point was determined by benchmarking — it is where the overhead of HGCD recursion and matrix multiplication drops below the cost of Lehmer's linear scan.
Performance comparison with GMP mpz_gcd (calx / GMP; 1.0x means equal):
- 4K bits (1,233 digits): 1.31x (calx slightly slower — Int construction overhead)
- 16K bits (4,932 digits): 0.99x (on par)
- 64K bits (19,728 digits): 1.27x
HGCD in extendedGcd
The extended GCD (Bezout coefficient computation) uses the same HGCD infrastructure. With SignedMpn-based cofactor tracking and phys_swap for buffer coherence, it achieves speedups of 4.6x at 100 digits, 33x at 500 digits, and 52x at 5,000 digits over the naive $O(n^2)$ approach.
Related articles: Binary GCD (Stein's Algorithm) | Advanced Arbitrary-Precision Integer Operations
Extended Euclidean Algorithm
The Extended Euclidean Algorithm computes $\gcd(a, b)$ while simultaneously finding integers $x$ and $y$ (Bezout coefficients) satisfying:
$$ax + by = \gcd(a, b)$$
This identity is known as Bezout's identity and shows that $\gcd(a, b)$ can be expressed as an integer linear combination of $a$ and $b$.
HGCD-Based Extended GCD
The extended GCD in calx uses the same Lehmer + HGCD algorithm as gcd(),
simultaneously tracking the Bezout coefficients (cofactors) at each step.
Cofactors are managed using SignedMpn (signed arbitrary-precision arrays),
with phys_swap providing buffer coherence guarantees for efficient implementation.
Algorithm
Starting from $(r_0, x_0, y_0) = (a, 1, 0)$ and $(r_1, x_1, y_1) = (b, 0, 1)$, the HGCD steps update $x$ and $y$ simultaneously with the transformation matrix:
$$q_i = \lfloor r_{i-1} / r_i \rfloor$$
$$r_{i+1} = r_{i-1} - q_i \cdot r_i$$
$$x_{i+1} = x_{i-1} - q_i \cdot x_i$$
$$y_{i+1} = y_{i-1} - q_i \cdot y_i$$
The last nonzero $r_i$ is $\gcd(a, b)$, and the corresponding $x_i$, $y_i$ are the Bezout coefficients.
Applications
- Modular inverse:
When $\gcd(a, m) = 1$, the $x$ satisfying $ax \equiv 1 \pmod{m}$
can be found via the extended GCD.
inverseMod()uses this internally. - Linear Diophantine equations: Integer solutions to $ax + by = c$ can be constructed using the extended GCD when $\gcd(a, b) \mid c$.
Complexity: $O(M(n) \log n)$ (HGCD-based). Compared to GMP mpz_gcdext: 0.99x at 16K bits (on par). Compared to naive extended Euclidean $O(n^2)$: 52x faster at 5,000 digits.
LCM Computation
The least common multiple (LCM) is computed from the GCD as follows:
$$\text{lcm}(a, b) = \frac{|a| \cdot |b|}{\gcd(a, b)}$$
However, to avoid intermediate overflow, the implementation performs the division first:
$$\text{lcm}(a, b) = |a| \times \left( \frac{|b|}{\gcd(a, b)} \right)$$
Since $\gcd(a, b)$ is guaranteed to divide $b$, $|b| / \gcd(a, b)$ can be computed exactly by integer division. By performing the division first, the unnecessarily large intermediate product $|a| \times |b|$ is avoided, reducing both memory usage and computation time.
Design Decisions
Why the Two-Tier Lehmer + HGCD Approach
Several GCD algorithm candidates exist:
- Euclidean algorithm: $O(n^2)$, but requires a full-precision division at every step, making it slow.
- Binary GCD: $O(n^2)$, operates using only shifts and subtractions. 30–50% faster than the Euclidean algorithm, but slower than HGCD at large sizes.
- Lehmer GCD: $O(n^2)$, predicts multiple steps from top words. Fastest at small to medium sizes.
- HGCD: $O(M(n) \log n)$, divide-and-conquer. Asymptotically fastest at large sizes.
calx adopts the Lehmer (small sizes) + HGCD (large sizes) two-tier approach. Binary GCD is simple but slower than Lehmer, and the Euclidean algorithm incurs expensive full-precision divisions. The Lehmer + HGCD combination achieves performance on par with or better than GMP across all size ranges.
References
- Möller, N. (2008). “On Schönhage’s algorithm and subquadratic integer GCD computation”. Mathematics of Computation, 77(261), 589–607.
- Stein, J. (1967). “Computational problems associated with Racah algebra”. Journal of Computational Physics, 1(3), 397–405.
- Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997. §4.5.2.