Which algorithm does calx use for integer square root?

calx automatically selects an algorithm based on the input size. For 1 word (64 bit), it uses an inverse square root table lookup + two Newton iterations (sqrtrem1). For 2 words, it extends sqrtrem1 with a division step (sqrtrem2). For 3 or more words, it uses Zimmermann's recursive divide-and-conquer (dc_sqrtrem). The complexity is O(M(n)), the same order as a single multiplication.

What is Zimmermann's Karatsuba square root?

It is a recursive square root algorithm published by Paul Zimmermann in 1999. The input is split into four blocks; the algorithm recursively computes the square root of the upper half, then refines the result via division and squaring correction, computing both the root and the remainder simultaneously. Its complexity is O(M(n)), which is asymptotically faster than Newton's method at O(M(n) log n).

How is isSquare (perfect square detection) optimized?

A GMP-style three-layer filter rejects 99.62% of non-squares without computing sqrtrem. Layer 1 uses a mod-256 bitmask (rejecting 82.81%). Layer 2 uses mod (2^48-1) with small-prime quadratic residue checks. Only layer 3 performs the full sqrtrem verification.

Int Square Root Algorithm

Overview

For an integer $n \geq 0$, compute $s = \lfloor \sqrt{n} \rfloor$, i.e., the largest integer $s$ satisfying $s^2 \leq n < (s+1)^2$.

Function	Meaning
`sqrt(n)`	$\lfloor \sqrt{n} \rfloor$ (integer part)
`sqrtRem(n)`	$\lfloor \sqrt{n} \rfloor$ and the remainder $r = n - s^2$
`isSquare(n)`	Determines whether $n$ is a perfect square (3-layer filter)

Internally, the following mpn-level routines are dispatched:

$$\boxed{\text{Int::sqrtRem}} \;\xrightarrow{\text{mpn}}\; \begin{cases} \text{sqrtrem1} & \text{(1 word)} \\ \text{sqrtrem2} & \text{(2 words)} \\ \text{dc\_sqrtrem} & \text{(3+ words)} \end{cases}$$

For API usage, see Int API Reference — Square Root.

Algorithm Selection

calx automatically selects an algorithm based on the number of words in the input. Zimmermann's divide-and-conquer method (dc_sqrtrem) serves as the base algorithm for all sizes, while 1–2 word special cases are handled by dedicated fast routines.

Input Size	Algorithm	Complexity
$1$ word (64 bit)	sqrtrem1 (table lookup + Newton)	$O(1)$
$2$ words (128 bit)	sqrtrem2 (sqrtrem1 + division extension)	$O(1)$
$\geq 3$ words	dc_sqrtrem (Zimmermann recursion)	$O(M(n))$

Here $M(n)$ denotes the cost of multiplying two $n$-word integers, and 1 word = 64 bits.

An earlier implementation used the Babylonian method (Newton's method) for mid-range sizes, but with the introduction of the Zimmermann algorithm, all sizes now achieve $O(M(n))$. The Babylonian method is $O(M(n) \log n)$, which is asymptotically slower than Zimmermann for large inputs.

sqrtrem1 (1-Word Square Root)

Computes the square root and remainder of a normalized 1-word value ($a_0 \geq 2^{62}$). It uses no division at all; instead, it obtains the root via inverse square root table lookup + two Newton iterations.

Algorithm

Table lookup: A precomputed 384-entry table invsqrttab provides an 8-bit approximation of $1/\sqrt{a_0}$.
Newton iteration 1: Doubles the inverse square root precision to ~16 bits.
Newton iteration 2: Doubles the inverse square root precision to ~32 bits.
Forward root: Obtains the square root via $s \approx a_0 \cdot (1/\sqrt{a_0})$, then applies a final correction from the remainder.

Computing the inverse square root $1/\sqrt{a_0}$ first and then converting via $s = a_0 \cdot (1/\sqrt{a_0})$ is done because the Newton iteration requires no division ($x_{k+1} = x_k (3 - a x_k^2) / 2$ uses only multiplications).

sqrtrem2 (2-Word Square Root)

Computes the square root of a normalized 2-word value ($n_1 \geq 2^{62}$). It applies sqrtrem1 to the upper word and extends the precision with a single division step.

Apply sqrtrem1(n_1) to get the upper-word square root $s_0$ and remainder $r_0$
Divide $[r_0, n_0]$ by $2s_0$ to obtain the quotient $q$
$s = s_0 \cdot 2^{32} + q$ is the 2-word square root candidate
Subtract $q^2$ from the remainder; if negative, decrement $s$ by 1 to correct

The result is a 64-bit root, and in-place computation is supported.

Karatsuba Square Root (Zimmermann)

A recursive square root algorithm published by Paul Zimmermann in 1999. It reduces the square root computation to the same order as a single multiplication: $O(M(n))$.

Basic Idea

The $n$-word input $N$ is split into four blocks:

$$N = a_3 B^3 + a_2 B^2 + a_1 B + a_0, \quad B = 2^{n/4 \cdot 64}$$

The algorithm proceeds in 7 steps:

Recursion: Recursively apply dc_sqrtrem to the upper half $[a_3, a_2]$ to compute the square root $s'$ and remainder $r'$
Remainder carry adjustment: Handle the carry from the recursive remainder
Division: Divide the concatenation of $r'$ and $a_1$ by $2s'$ (using Burnikel-Ziegler division) to obtain quotient $q$ and remainder $u$
Quotient extraction: Process the upper bits of $q$ and form $s = s' \cdot B + q_{\text{low}}$
Odd quotient correction: If $q$ is odd, adjust the remainder
Square subtraction: Compute the remainder $r = u \cdot B + a_0 - q_{\text{low}}^2$
Negative remainder correction: If $r < 0$, set $r \mathrel{+}= 2s - 1$ and $s \mathrel{-}= 1$

Role of sqrtRem

The Zimmermann algorithm returns both the square root $s$ and the remainder $r$ simultaneously:

$$N = s^2 + r, \quad 0 \leq r \leq 2s$$

By returning the remainder, the result of the recursive call can be used directly in the outer step. This design of returning the "root and remainder" pair is the key to making the recursion efficient.

Complexity Intuition

At each level of the recursion, a size-$n$ division ($O(M(n))$) and a size-$n/2$ squaring ($O(M(n/2))$) are performed. The recurrence for the cost $T(n)$ is:

$$T(n) = T(n/2) + O(M(n))$$

If $M(n) \geq cn$ (at least linear), the Master theorem gives $T(n) = O(M(n))$. That is, the square root can be computed at the same cost as a single multiplication.

isSquare (Perfect Square Detection)

Determines whether $n$ is a perfect square (i.e., whether an integer $k$ exists such that $n = k^2$). Simply calling sqrtRem and checking whether the remainder is zero costs $O(M(n))$, so calx uses a GMP-style three-layer filter to quickly reject non-squares.

Filter Structure

Layer	Method	Cost	Rejection Rate
Layer 1	$n \bmod 256$ bitmask	$O(1)$	82.81%
Layer 2	$n \bmod (2^{48}-1)$ + small-prime residue check	$O(n)$	additional ~15%
Layer 3	Full sqrtRem + verification	$O(M(n))$	all remaining

Layer 1: mod 256 Filter

The remainder of a perfect square $k^2$ modulo $256$ can take only 44 out of 256 possible values. By looking up the least significant byte in a bitmask table, 82.81% of non-squares are rejected in $O(1)$.

Layer 2: Small-Prime Residue Filter

mod_34lsub1 efficiently computes $n \bmod (2^{48} - 1)$ and performs quadratic residue tests against small primes such as $3, 5, 7, 13, 17, 97$. Since these are factors of $2^{48} - 1$, no additional modular reduction is needed.

Layer 3: Full Verification

Only inputs that pass all filters are subjected to the full sqrtRem computation to verify whether the remainder $r = 0$. Across all three layers, the combined rejection rate is 99.62%, reducing the average cost for non-squares to $O(n)$.

References

Zimmermann, P. (1999). "Karatsuba Square Root". INRIA Research Report RR-3805.
Brent, R. P. and Zimmermann, P. (2010). Modern Computer Arithmetic. Cambridge University Press. Chapter 1.5: Square Root.
GMP Documentation. "Integer Roots". https://gmplib.org/manual/

Related articles: Int Square Root & Nth Root (details on nthRoot) / Float Square Root (integer reduction & correct rounding) / Zimmermann's Karatsuba Square Root