How does calx compute modular exponentiation?

calx uses Montgomery multiplication with a left-to-right sliding window method. For odd moduli, Montgomery representation eliminates expensive divisions by replacing them with shifts and additions (REDC). The window width is dynamically chosen based on exponent bit length, ranging from 3 to 7.

How does calx compute modular inverses?

calx provides two strategies: the Extended Euclidean Algorithm (default, works for any modulus) and Fermat's little theorem (when the modulus is known to be prime, computing a^(p-2) mod p via powMod). Users select the strategy via the is_prime parameter of inverseMod().

Int Exponentiation & Modular Arithmetic

Q: What is binary exponentiation?

Binary exponentiation (also known as exponentiation by squaring) computes a^e in O(log e) multiplications by scanning the binary representation of the exponent. Each bit triggers a squaring step, and if the bit is 1, an additional multiplication by the base is performed.

Overview

The calx Int class provides the following exponentiation and modular operations:

pow(base, exp) — Computes the power $a^e$
powMod(base, exp, mod) — Computes the modular power $a^e \bmod m$
inverseMod(a, m) — Computes the modular inverse $a^{-1} \bmod m$

All of these are built on binary exponentiation (exponentiation by squaring). By scanning the binary representation of the exponent $e$ and combining "squaring" with "multiplication", the power is computed in $O(\log e)$ multiplications.

In calx, operator^ is defined as the exponentiation operator. In standard C++, ^ denotes bitwise XOR, but for calx's Int, bitwise XOR is available through bitwiseXor().

For usage details, see Int API Reference — Exponentiation and Int API Reference — Modular Arithmetic.

Binary Exponentiation

Binary exponentiation is the fundamental algorithm for computing powers. The exponent $e$ is expressed in binary, and "squaring" and "multiplication" are repeated for each bit.

Basic Idea

Writing the exponent $e$ in binary as $e = (b_{k-1}\, b_{k-2}\, \cdots\, b_1\, b_0)_2$, the power can be decomposed as:

$$a^e = a^{\sum_{i} b_i \cdot 2^i} = \prod_{b_i = 1} a^{2^i}$$

Since $a^{2^i}$ is obtained by squaring $a$ a total of $i$ times, this requires only $O(\log e)$ multiplications, compared to the naive $O(e)$ approach of multiplying $e$ times.

Left-to-Right Scan

pow() uses a left-to-right (MSB to LSB) scan. The algorithm proceeds as follows:

Initialize $r \leftarrow 1$
Scan the bits of $e$ from the most significant to the least significant
For each bit: $r \leftarrow r^2$ (squaring)
If the bit is 1: $r \leftarrow r \times a$ (multiply by the base)

Worked Example: $2^{10}$

Since $10 = (1010)_2$, we scan 4 bits from the MSB:

Step	Bit	Operation	Value of $r$
1	1	$r = 1^2 \times 2$	$2$
2	0	$r = 2^2$	$4$
3	1	$r = 4^2 \times 2$	$32$
4	0	$r = 32^2$	$1024$

Result: $2^{10} = 1024$. The maximum number of multiplications is $2 \lfloor \log_2 e \rfloor$.

Complexity

Complexity: $O(\log e \cdot M(n))$

Here $e$ is the exponent and $M(n)$ is the cost of $n$-word multiplication. Note that intermediate results grow rapidly during exponentiation. The number of digits in $a^e$ is approximately $e \times \log_{10}(a)$; for example, with $a = 10^{100}$ (100 digits) and $e = 1000$, the result has about 100,000 digits. As the intermediate value grows at each step, the multiplication cost becomes heavier in the later steps.

Modular Exponentiation (powMod)

powMod(a, e, m) computes $a^e \bmod m$. This is one of the most frequently used operations in cryptography and number-theoretic algorithms.

Algorithm: Montgomery Modular Exponentiation

When $m$ is odd, calx uses Montgomery multiplication-based modular exponentiation. Montgomery multiplication performs modular multiplication without division, replacing the division needed for $a \times b \bmod m$ with shifts and additions.

The key idea is to use the "Montgomery representation" $\tilde{a} = a \cdot R \bmod m$, where $R = 2^{64n}$ ($n$ is the word count). Montgomery multiplication $\text{Mont}(\tilde{a}, \tilde{b})$ computes $\tilde{a} \cdot \tilde{b} \cdot R^{-1} \bmod m$ without division (REDC).

Sliding Window Method

The exponent is scanned using a left-to-right sliding window method. The bit string of the exponent is scanned with a window of width $w$, and a table of odd powers $\{a^1, a^3, a^5, \ldots, a^{2^{w-1}-1}\}$ is precomputed. The window width is dynamically selected based on the exponent's bit length:

$\leq 64$ bits: $w = 3$
$\leq 256$ bits: $w = 4$
$\leq 1024$ bits: $w = 5$
$\leq 4096$ bits: $w = 6$
$> 4096$ bits: $w = 7$

MASM Specialized Kernels

For $n = 2, 4, 8$ word Montgomery multiplication and squaring, dedicated MASM assembly kernels are used (mpn_mont_mul_4, mpn_mont_sqr_4, etc.). These form dual carry chains using MULX/ADCX/ADOX instructions and keep operands register-resident to minimize memory access. For $n \leq 8$, all buffers are placed on the stack with zero heap allocation.

Preventing Intermediate Value Explosion

The key to modular exponentiation is reducing modulo $m$ at every step. If one naively computes $a^e$ first and then takes $\bmod m$, the intermediate values grow to astronomical sizes. Montgomery multiplication automatically reduces results below $m$ at each step via REDC.

Negative Exponents

When $e < 0$, powMod computes:

$$a^{-|e|} \bmod m = (a^{|e|} \bmod m)^{-1} \bmod m = \text{inverseMod}(a^{|e|} \bmod m,\; m)$$

That is, it first computes $a^{|e|} \bmod m$, then finds the modular inverse of the result.

Applications

RSA encryption: Encryption $c = m^e \bmod n$, decryption $m = c^d \bmod n$
Diffie-Hellman key exchange: Computing $g^a \bmod p$
Miller-Rabin primality test: Computing $a^d \bmod n$ (used by calx's isPrime())

Complexity: $O(\log e \cdot M(n))$, where $n$ is the word count of $m$

Modular Inverse (inverseMod)

inverseMod(a, m) finds $x$ satisfying $a \cdot x \equiv 1 \pmod{m}$. The inverse exists if and only if $\gcd(a, m) = 1$.

Strategy 1: Extended Euclidean Algorithm (Default)

When is_prime = false (the default), the Extended Euclidean Algorithm is used.

The Extended Euclidean Algorithm finds $x$ and $y$ such that $\gcd(a, m) = a \cdot x + m \cdot y$. When $\gcd(a, m) = 1$, we have $a \cdot x + m \cdot y = 1$, and taking both sides modulo $m$ yields $a \cdot x \equiv 1 \pmod{m}$.

This approach works regardless of whether $m$ is prime or composite, making it the most general method. Its complexity is $O(\log^2 m)$ (same as GCD), and it is very fast for small $m$.

Strategy 2: Fermat's Little Theorem (`is_prime = true`)

When is_prime = true is specified, Fermat's little theorem is used:

$$a^{p-1} \equiv 1 \pmod{p} \quad \Rightarrow \quad a^{-1} \equiv a^{p-2} \pmod{p}$$

When $m$ is known to be prime, one simply computes $a^{m-2} \bmod m$ via powMod. This avoids the GCD computation, and for very large primes $m$, powMod with NTT multiplication can be faster than the Extended Euclidean Algorithm.

Warning: Specifying is_prime = true when $m$ is not actually prime will produce incorrect results. The caller must guarantee the primality of $m$.

Preconditions

The inverse does not exist unless $\gcd(a, m) = 1$. Behavior is undefined when this condition is not satisfied.

Modulus Normalization

The C++ % operator preserves the sign of the dividend. For example, -7 % 5 = -2. This differs from the mathematical definition of the remainder.

The calx IntModular::mod always returns a non-negative remainder:

$$\text{mod}(a, m) = r \quad \text{where} \quad r \in [0, |m|)$$

Expression	C++ `%`	calx `mod`
$-7 \bmod 5$	$-2$	$3$
$7 \bmod 5$	$2$	$2$
$-7 \bmod{-5}$	$-2$	$3$

In mathematics, the remainder is always normalized to the range $[0, |m|)$. calx follows this mathematical definition, guaranteeing that the result of any modular operation is always non-negative. This ensures that powMod and inverseMod also always return non-negative results.

Design Decisions

Why pow Uses Left-to-Right Scan

pow() uses a left-to-right scan. In the left-to-right approach the result is built from larger values, giving a slightly more local memory access pattern and improving cache efficiency. Additionally, the right-to-left approach requires updating both the base and the result at each step, whereas left-to-right only needs to update the result.

Why powMod Uses Montgomery + Sliding Window

powMod() uses Montgomery multiplication with the left-to-right sliding window method. Montgomery multiplication avoids division, using REDC (Montgomery reduction) to perform the modular reduction at each step via shifts and additions. The sliding window method requires fewer multiplications than the fixed-window or naive binary exponentiation.

When $m$ is even, Montgomery representation is not applicable, so the implementation falls back to conventional right-to-left binary exponentiation.

Effect of n=2,4,8 Specialized Kernels

For 256-bit (n=4) modular exponentiation, the Montgomery loop executes hundreds of times, so the per-iteration cost of mont_mul/mont_sqr directly impacts overall performance. The n=4 specialized MASM kernels have lower function call overhead than the generic path and keep operands register-resident to avoid L1 cache misses.

Why Two Strategies for inverseMod

The Extended Euclidean Algorithm (HGCD-based) is general-purpose, but the Fermat's little theorem approach can completely avoid GCD computation when $m$ is prime. Especially for very large primes $m$, powMod leveraging NTT multiplication can be more efficient. Providing both strategies enables the optimal choice for each use case.

References

Montgomery, P. L. (1985). “Modular multiplication without trial division”. Mathematics of Computation, 44(170), 519–521.
Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997. §4.6.3.
Cohen, H. A Course in Computational Algebraic Number Theory. Springer, 1993.