Int Exponentiation & Modular Arithmetic
Overview
The calx Int class provides the following exponentiation and modular operations:
pow(base, exp)— Computes the power $a^e$powMod(base, exp, mod)— Computes the modular power $a^e \bmod m$inverseMod(a, m)— Computes the modular inverse $a^{-1} \bmod m$
All of these are built on binary exponentiation (exponentiation by squaring). By scanning the binary representation of the exponent $e$ and combining "squaring" with "multiplication", the power is computed in $O(\log e)$ multiplications.
In calx, operator^ is defined as the exponentiation operator.
In standard C++, ^ denotes bitwise XOR, but
for calx's Int, bitwise XOR is available through bitwiseXor().
For usage details, see Int API Reference — Exponentiation and Int API Reference — Modular Arithmetic.
Binary Exponentiation
Binary exponentiation is the fundamental algorithm for computing powers. The exponent $e$ is expressed in binary, and "squaring" and "multiplication" are repeated for each bit.
Basic Idea
Writing the exponent $e$ in binary as $e = (b_{k-1}\, b_{k-2}\, \cdots\, b_1\, b_0)_2$, the power can be decomposed as:
$$a^e = a^{\sum_{i} b_i \cdot 2^i} = \prod_{b_i = 1} a^{2^i}$$
Since $a^{2^i}$ is obtained by squaring $a$ a total of $i$ times, this requires only $O(\log e)$ multiplications, compared to the naive $O(e)$ approach of multiplying $e$ times.
Left-to-Right Scan
pow() uses a left-to-right (MSB to LSB) scan.
The algorithm proceeds as follows:
- Initialize $r \leftarrow 1$
- Scan the bits of $e$ from the most significant to the least significant
- For each bit: $r \leftarrow r^2$ (squaring)
- If the bit is 1: $r \leftarrow r \times a$ (multiply by the base)
Worked Example: $2^{10}$
Since $10 = (1010)_2$, we scan 4 bits from the MSB:
| Step | Bit | Operation | Value of $r$ |
|---|---|---|---|
| 1 | 1 | $r = 1^2 \times 2$ | $2$ |
| 2 | 0 | $r = 2^2$ | $4$ |
| 3 | 1 | $r = 4^2 \times 2$ | $32$ |
| 4 | 0 | $r = 32^2$ | $1024$ |
Result: $2^{10} = 1024$. The maximum number of multiplications is $2 \lfloor \log_2 e \rfloor$.
Complexity
Complexity: $O(\log e \cdot M(n))$
Here $e$ is the exponent and $M(n)$ is the cost of $n$-word multiplication. Note that intermediate results grow rapidly during exponentiation. The number of digits in $a^e$ is approximately $e \times \log_{10}(a)$; for example, with $a = 10^{100}$ (100 digits) and $e = 1000$, the result has about 100,000 digits. As the intermediate value grows at each step, the multiplication cost becomes heavier in the later steps.
Modular Exponentiation (powMod)
powMod(a, e, m) computes $a^e \bmod m$.
This is one of the most frequently used operations in cryptography and number-theoretic algorithms.
Algorithm: Montgomery Modular Exponentiation
When $m$ is odd, calx uses Montgomery multiplication-based modular exponentiation. Montgomery multiplication performs modular multiplication without division, replacing the division needed for $a \times b \bmod m$ with shifts and additions.
The key idea is to use the "Montgomery representation" $\tilde{a} = a \cdot R \bmod m$, where $R = 2^{64n}$ ($n$ is the word count). Montgomery multiplication $\text{Mont}(\tilde{a}, \tilde{b})$ computes $\tilde{a} \cdot \tilde{b} \cdot R^{-1} \bmod m$ without division (REDC).
Sliding Window Method
The exponent is scanned using a left-to-right sliding window method. The bit string of the exponent is scanned with a window of width $w$, and a table of odd powers $\{a^1, a^3, a^5, \ldots, a^{2^{w-1}-1}\}$ is precomputed. The window width is dynamically selected based on the exponent's bit length:
- $\leq 64$ bits: $w = 3$
- $\leq 256$ bits: $w = 4$
- $\leq 1024$ bits: $w = 5$
- $\leq 4096$ bits: $w = 6$
- $> 4096$ bits: $w = 7$
MASM Specialized Kernels
For $n = 2, 4, 8$ word Montgomery multiplication and squaring, dedicated MASM assembly kernels are used
(mpn_mont_mul_4, mpn_mont_sqr_4, etc.).
These form dual carry chains using MULX/ADCX/ADOX instructions
and keep operands register-resident to minimize memory access.
For $n \leq 8$, all buffers are placed on the stack with zero heap allocation.
Preventing Intermediate Value Explosion
The key to modular exponentiation is reducing modulo $m$ at every step. If one naively computes $a^e$ first and then takes $\bmod m$, the intermediate values grow to astronomical sizes. Montgomery multiplication automatically reduces results below $m$ at each step via REDC.
Negative Exponents
When $e < 0$, powMod computes:
$$a^{-|e|} \bmod m = (a^{|e|} \bmod m)^{-1} \bmod m = \text{inverseMod}(a^{|e|} \bmod m,\; m)$$
That is, it first computes $a^{|e|} \bmod m$, then finds the modular inverse of the result.
Applications
- RSA encryption: Encryption $c = m^e \bmod n$, decryption $m = c^d \bmod n$
- Diffie-Hellman key exchange: Computing $g^a \bmod p$
- Miller-Rabin primality test: Computing $a^d \bmod n$ (used by calx's
isPrime())
Complexity: $O(\log e \cdot M(n))$, where $n$ is the word count of $m$
Modular Inverse (inverseMod)
inverseMod(a, m) finds $x$ satisfying $a \cdot x \equiv 1 \pmod{m}$.
The inverse exists if and only if $\gcd(a, m) = 1$.
Strategy 1: Extended Euclidean Algorithm (Default)
When is_prime = false (the default), the Extended Euclidean Algorithm is used.
The Extended Euclidean Algorithm finds $x$ and $y$ such that $\gcd(a, m) = a \cdot x + m \cdot y$. When $\gcd(a, m) = 1$, we have $a \cdot x + m \cdot y = 1$, and taking both sides modulo $m$ yields $a \cdot x \equiv 1 \pmod{m}$.
This approach works regardless of whether $m$ is prime or composite, making it the most general method. Its complexity is $O(\log^2 m)$ (same as GCD), and it is very fast for small $m$.
Strategy 2: Fermat's Little Theorem (is_prime = true)
When is_prime = true is specified, Fermat's little theorem is used:
$$a^{p-1} \equiv 1 \pmod{p} \quad \Rightarrow \quad a^{-1} \equiv a^{p-2} \pmod{p}$$
When $m$ is known to be prime, one simply computes $a^{m-2} \bmod m$ via powMod.
This avoids the GCD computation, and for very large primes $m$, powMod with NTT multiplication
can be faster than the Extended Euclidean Algorithm.
Warning: Specifying is_prime = true when $m$ is not actually prime
will produce incorrect results. The caller must guarantee the primality of $m$.
Preconditions
The inverse does not exist unless $\gcd(a, m) = 1$. Behavior is undefined when this condition is not satisfied.
Related article: Advanced Arbitrary-Precision Integer Algorithms
Modulus Normalization
The C++ % operator preserves the sign of the dividend.
For example, -7 % 5 = -2.
This differs from the mathematical definition of the remainder.
The calx IntModular::mod always returns a non-negative remainder:
$$\text{mod}(a, m) = r \quad \text{where} \quad r \in [0, |m|)$$
| Expression | C++ % | calx mod |
|---|---|---|
| $-7 \bmod 5$ | $-2$ | $3$ |
| $7 \bmod 5$ | $2$ | $2$ |
| $-7 \bmod{-5}$ | $-2$ | $3$ |
In mathematics, the remainder is always normalized to the range $[0, |m|)$.
calx follows this mathematical definition,
guaranteeing that the result of any modular operation is always non-negative.
This ensures that powMod and inverseMod also always return non-negative results.
Design Decisions
Why pow Uses Left-to-Right Scan
pow() uses a left-to-right scan.
In the left-to-right approach the result is built from larger values,
giving a slightly more local memory access pattern and improving cache efficiency.
Additionally, the right-to-left approach requires updating both the base and the result at each step,
whereas left-to-right only needs to update the result.
Why powMod Uses Montgomery + Sliding Window
powMod() uses Montgomery multiplication with the left-to-right sliding window method.
Montgomery multiplication avoids division, using REDC (Montgomery reduction) to perform
the modular reduction at each step via shifts and additions.
The sliding window method requires fewer multiplications than the fixed-window or naive binary exponentiation.
When $m$ is even, Montgomery representation is not applicable, so the implementation falls back to conventional right-to-left binary exponentiation.
Effect of n=2,4,8 Specialized Kernels
For 256-bit (n=4) modular exponentiation, the Montgomery loop executes hundreds of times, so the per-iteration cost of mont_mul/mont_sqr directly impacts overall performance. The n=4 specialized MASM kernels have lower function call overhead than the generic path and keep operands register-resident to avoid L1 cache misses.
Why Two Strategies for inverseMod
The Extended Euclidean Algorithm (HGCD-based) is general-purpose,
but the Fermat's little theorem approach can completely avoid GCD computation when $m$ is prime.
Especially for very large primes $m$,
powMod leveraging NTT multiplication can be more efficient.
Providing both strategies enables the optimal choice for each use case.
References
- Montgomery, P. L. (1985). “Modular multiplication without trial division”. Mathematics of Computation, 44(170), 519–521.
- Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997. §4.6.3.
- Cohen, H. A Course in Computational Algebraic Number Theory. Springer, 1993.