Int Square Root Algorithm
Overview
For an integer $n \geq 0$, compute $s = \lfloor \sqrt{n} \rfloor$, i.e., the largest integer $s$ satisfying $s^2 \leq n < (s+1)^2$.
| Function | Meaning |
|---|---|
sqrt(n) | $\lfloor \sqrt{n} \rfloor$ (integer part) |
sqrtRem(n) | $\lfloor \sqrt{n} \rfloor$ and the remainder $r = n - s^2$ |
isSquare(n) | Determines whether $n$ is a perfect square (3-layer filter) |
Internally, the following mpn-level routines are dispatched:
$$\boxed{\text{Int::sqrtRem}} \;\xrightarrow{\text{mpn}}\; \begin{cases} \text{sqrtrem1} & \text{(1 word)} \\ \text{sqrtrem2} & \text{(2 words)} \\ \text{dc\_sqrtrem} & \text{(3+ words)} \end{cases}$$
For API usage, see Int API Reference — Square Root.
Algorithm Selection
calx automatically selects an algorithm based on the number of words in the input.
Zimmermann's divide-and-conquer method (dc_sqrtrem) serves as the base algorithm
for all sizes, while 1–2 word special cases are handled by dedicated fast routines.
| Input Size | Algorithm | Complexity |
|---|---|---|
| $1$ word (64 bit) | sqrtrem1 (table lookup + Newton) | $O(1)$ |
| $2$ words (128 bit) | sqrtrem2 (sqrtrem1 + division extension) | $O(1)$ |
| $\geq 3$ words | dc_sqrtrem (Zimmermann recursion) | $O(M(n))$ |
Here $M(n)$ denotes the cost of multiplying two $n$-word integers, and 1 word = 64 bits.
An earlier implementation used the Babylonian method (Newton's method) for mid-range sizes, but with the introduction of the Zimmermann algorithm, all sizes now achieve $O(M(n))$. The Babylonian method is $O(M(n) \log n)$, which is asymptotically slower than Zimmermann for large inputs.
sqrtrem1 (1-Word Square Root)
Computes the square root and remainder of a normalized 1-word value ($a_0 \geq 2^{62}$). It uses no division at all; instead, it obtains the root via inverse square root table lookup + two Newton iterations.
Algorithm
- Table lookup:
A precomputed 384-entry table
invsqrttabprovides an 8-bit approximation of $1/\sqrt{a_0}$. - Newton iteration 1: Doubles the inverse square root precision to ~16 bits.
- Newton iteration 2: Doubles the inverse square root precision to ~32 bits.
- Forward root: Obtains the square root via $s \approx a_0 \cdot (1/\sqrt{a_0})$, then applies a final correction from the remainder.
Computing the inverse square root $1/\sqrt{a_0}$ first and then converting via $s = a_0 \cdot (1/\sqrt{a_0})$ is done because the Newton iteration requires no division ($x_{k+1} = x_k (3 - a x_k^2) / 2$ uses only multiplications).
sqrtrem2 (2-Word Square Root)
Computes the square root of a normalized 2-word value ($n_1 \geq 2^{62}$).
It applies sqrtrem1 to the upper word and extends the precision with a single division step.
- Apply
sqrtrem1(n_1)to get the upper-word square root $s_0$ and remainder $r_0$ - Divide $[r_0, n_0]$ by $2s_0$ to obtain the quotient $q$
- $s = s_0 \cdot 2^{32} + q$ is the 2-word square root candidate
- Subtract $q^2$ from the remainder; if negative, decrement $s$ by 1 to correct
The result is a 64-bit root, and in-place computation is supported.
Karatsuba Square Root (Zimmermann)
A recursive square root algorithm published by Paul Zimmermann in 1999. It reduces the square root computation to the same order as a single multiplication: $O(M(n))$.
Basic Idea
The $n$-word input $N$ is split into four blocks:
$$N = a_3 B^3 + a_2 B^2 + a_1 B + a_0, \quad B = 2^{n/4 \cdot 64}$$
The algorithm proceeds in 7 steps:
- Recursion: Recursively apply
dc_sqrtremto the upper half $[a_3, a_2]$ to compute the square root $s'$ and remainder $r'$ - Remainder carry adjustment: Handle the carry from the recursive remainder
- Division: Divide the concatenation of $r'$ and $a_1$ by $2s'$ (using Burnikel-Ziegler division) to obtain quotient $q$ and remainder $u$
- Quotient extraction: Process the upper bits of $q$ and form $s = s' \cdot B + q_{\text{low}}$
- Odd quotient correction: If $q$ is odd, adjust the remainder
- Square subtraction: Compute the remainder $r = u \cdot B + a_0 - q_{\text{low}}^2$
- Negative remainder correction: If $r < 0$, set $r \mathrel{+}= 2s - 1$ and $s \mathrel{-}= 1$
Role of sqrtRem
The Zimmermann algorithm returns both the square root $s$ and the remainder $r$ simultaneously:
$$N = s^2 + r, \quad 0 \leq r \leq 2s$$
By returning the remainder, the result of the recursive call can be used directly in the outer step. This design of returning the "root and remainder" pair is the key to making the recursion efficient.
Complexity Intuition
At each level of the recursion, a size-$n$ division ($O(M(n))$) and a size-$n/2$ squaring ($O(M(n/2))$) are performed. The recurrence for the cost $T(n)$ is:
$$T(n) = T(n/2) + O(M(n))$$
If $M(n) \geq cn$ (at least linear), the Master theorem gives $T(n) = O(M(n))$. That is, the square root can be computed at the same cost as a single multiplication.
isSquare (Perfect Square Detection)
Determines whether $n$ is a perfect square (i.e., whether an integer $k$ exists such that $n = k^2$).
Simply calling sqrtRem and checking whether the remainder is zero costs $O(M(n))$,
so calx uses a GMP-style three-layer filter to quickly reject non-squares.
Filter Structure
| Layer | Method | Cost | Rejection Rate |
|---|---|---|---|
| Layer 1 | $n \bmod 256$ bitmask | $O(1)$ | 82.81% |
| Layer 2 | $n \bmod (2^{48}-1)$ + small-prime residue check | $O(n)$ | additional ~15% |
| Layer 3 | Full sqrtRem + verification | $O(M(n))$ | all remaining |
Layer 1: mod 256 Filter
The remainder of a perfect square $k^2$ modulo $256$ can take only 44 out of 256 possible values. By looking up the least significant byte in a bitmask table, 82.81% of non-squares are rejected in $O(1)$.
Layer 2: Small-Prime Residue Filter
mod_34lsub1 efficiently computes $n \bmod (2^{48} - 1)$ and
performs quadratic residue tests against small primes such as $3, 5, 7, 13, 17, 97$.
Since these are factors of $2^{48} - 1$, no additional modular reduction is needed.
Layer 3: Full Verification
Only inputs that pass all filters are subjected to the full sqrtRem computation
to verify whether the remainder $r = 0$.
Across all three layers, the combined rejection rate is 99.62%,
reducing the average cost for non-squares to $O(n)$.
References
- Zimmermann, P. (1999). "Karatsuba Square Root". INRIA Research Report RR-3805.
- Brent, R. P. and Zimmermann, P. (2010). Modern Computer Arithmetic. Cambridge University Press. Chapter 1.5: Square Root.
- GMP Documentation. "Integer Roots". https://gmplib.org/manual/
Related articles: Int Square Root & Nth Root (details on nthRoot) / Float Square Root (integer reduction & correct rounding) / Zimmermann's Karatsuba Square Root