Int Square Root & Nth Root

Overview

The Int class provides the following root-related operations, implemented as static methods of IntSqrt:

  • Integer square root: Computes $\lfloor \sqrt{n} \rfloor$
  • Integer nth root: Computes $\lfloor n^{1/k} \rfloor$
  • Perfect square detection: Quickly determines whether $n$ is a perfect square (isSquare)

All operations automatically select the optimal algorithm based on the operand size. Users never need to choose an algorithm explicitly.

For API usage, see Int API Reference — Square Root & Nth Root.

Square Root Algorithm Selection

calx automatically selects from three algorithm tiers based on the input size.

| Size | Algorithm | Complexity |
|------|-----------|------------|
| MSB $< 52$ bits | Double-precision floating point | $O(1)$ |
| $< 17$ words | Babylonian method (Newton variant) | $O(n \cdot M(n))$ |
| $\geq 17$ words | Karatsuba square root (Zimmermann) | $O(M(n))$ |

Here $M(n)$ denotes the cost of multiplying two $n$-word integers. 1 word = 64 bits, so 17 words corresponds to 1,088 bits (about 327 decimal digits).

For inputs of 52 bits or fewer, hardware double-precision arithmetic yields exact results, so std::sqrt is used directly.
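The fast path can be sketched as follows. This is an illustration, not calx's actual code, and `isqrtSmall` is a hypothetical name: the hardware result is exact for inputs below $2^{52}$, but a one-step correction in each direction cheaply guards against any rounding at the boundary.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Fast-path sketch for inputs below 2^52: hardware double sqrt is
// already exact here; the two loops are a defensive fix-up that runs
// at most once each if rounding ever nudged the result off by one.
uint64_t isqrtSmall(uint64_t n) {
    uint64_t s = (uint64_t)std::sqrt((double)n);
    while (s * s > n) --s;                  // round-down correction
    while ((s + 1) * (s + 1) <= n) ++s;     // round-up correction
    return s;
}
```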

Babylonian Method (Newton's Method)

A classical iterative method for square roots originating from ancient Babylon. It is equivalent to Newton's method on $f(x) = x^2 - n$ and converges via the recurrence:

$$x_{k+1} = \frac{x_k + n / x_k}{2}$$

Quadratic convergence: The number of correct digits doubles at each iteration. For example, 4 digits of accuracy become 8 at the next step, then 16, and so on.

Initial Approximation

Convergence speed depends heavily on the quality of the initial approximation. calx computes the initial value using hardware std::sqrt on the most significant word, providing about 52 bits of accuracy and minimizing the number of iterations.

Termination Condition for Integer Square Root

While real-valued Newton's method stops when $|x_{k+1} - x_k|$ is sufficiently small, integer square root requires a different termination condition. The iteration stops when $x_{k+1} \geq x_k$, at which point $x_k$ equals $\lfloor \sqrt{n} \rfloor$. This is based on the fact that the sequence $\{x_k\}$ is monotonically decreasing toward $\lfloor \sqrt{n} \rfloor$.
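The recurrence and the integer termination rule can be illustrated at word scale. The sketch below (hypothetical `isqrtNewton`, not calx's API) starts from a hardware estimate nudged upward so the sequence is guaranteed to descend onto $\lfloor \sqrt{n} \rfloor$:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Word-sized illustration of the Babylonian iteration with the integer
// termination rule: stop as soon as x_{k+1} >= x_k.
uint64_t isqrtNewton(uint64_t n) {
    if (n == 0) return 0;
    // Hardware sqrt supplies ~52 accurate bits; the +2 nudge guarantees
    // the starting value is >= floor(sqrt(n)), so descent is monotone.
    uint64_t x = (uint64_t)std::sqrt((double)n) + 2;
    uint64_t y = (x + n / x) / 2;           // x_{k+1} = (x_k + n/x_k) / 2
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;                               // x == floor(sqrt(n))
}
```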

Complexity

Complexity: $O(n \cdot M(n))$

At first glance, each iteration performs one $n$-word division and the number of iterations is $O(\log n)$, suggesting $O(M(n) \log n)$. However, each division itself costs on the order of $M(n)$, and the sizes of the intermediate values change from iteration to iteration, so a precise accounting yields the $O(n \cdot M(n))$ bound. This is asymptotically slower than the Karatsuba square root's $O(M(n))$, but for small inputs the simplicity and low constant factor of this method make it faster.

Karatsuba Square Root (Zimmermann)

A recursive square root algorithm published by Paul Zimmermann in 1999. It reduces the square root computation to a multiplication-sized operation, achieving the optimal asymptotic complexity of $O(M(n))$. This means the square root can be computed in the same order as a single multiplication.

Basic Idea

The input $n$, occupying $w$ words, is split into four blocks of $w/4$ words each (each block spans $16w$ bits, hence the base $B$ below):

$$n = a_3 \cdot B^3 + a_2 \cdot B^2 + a_1 \cdot B + a_0 \quad (B = 2^{16w})$$

The square root of the upper half $a_3 \cdot B + a_2$ is computed recursively, then refined to the full square root via a single multiplication-sized division.

sqrtRem

The Zimmermann algorithm returns both the square root $s$ and the remainder $r$ simultaneously:

$$n = s^2 + r, \quad 0 \leq r \leq 2s$$

By returning the remainder, the result of the recursive call can be used directly in the outer step. This design of returning the "root and remainder" pair is the key to making the recursion efficient.
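The recursion can be demonstrated at word scale. The sketch below is illustrative, not calx's implementation: names are hypothetical, it uses the GCC/Clang extensions `__int128` and `__builtin_clzll`, and 16-bit blocks play the role of the $w/4$-word blocks, so one Zimmermann step on a `uint64_t` recurses into a 32-bit base case.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Base case: exact floor square root and remainder of a 32-bit value.
static std::pair<uint64_t, uint64_t> sqrtRem32(uint32_t n) {
    uint64_t s = (uint64_t)std::sqrt((double)n);
    while (s * s > n) --s;                       // guard against FP rounding
    while ((s + 1) * (s + 1) <= n) ++s;
    return {s, n - s * s};
}

// One Zimmermann step; requires the top bit region of n to be set
// (normalized input), which guarantees the single correction suffices.
static std::pair<uint64_t, uint64_t> sqrtRemNorm(uint64_t n) {
    uint64_t a1 = (n >> 16) & 0xFFFF, a0 = n & 0xFFFF;
    auto [sp, rp] = sqrtRem32((uint32_t)(n >> 32)); // root of a3*B + a2
    uint64_t num = (rp << 16) | a1;
    uint64_t q = num / (2 * sp), u = num % (2 * sp);
    uint64_t s = (sp << 16) + q;
    __int128 r = ((__int128)u << 16) + a0 - (__int128)q * q;
    if (r < 0) { r += 2 * (__int128)s - 1; --s; }   // single correction step
    return {s, (uint64_t)r};
}

// Full sqrtRem: normalize by an even shift, recurse, then undo the shift.
std::pair<uint64_t, uint64_t> sqrtRem64(uint64_t n) {
    if (n < 4) return {n ? 1u : 0u, n ? n - 1 : 0};
    int shift = __builtin_clzll(n) & ~1;         // even, so the root shifts by half
    auto [s, r] = sqrtRemNorm(n << shift);
    (void)r;                                      // remainder of the shifted input
    s >>= shift / 2;
    return {s, n - s * s};
}
```

The bignum version works the same way, except that each block is $w/4$ words and the recursive call returns a multi-word root/remainder pair.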

Complexity

Complexity: $O(M(n))$ — the same asymptotic cost as a single multiplication

Used for inputs of 17 or more words. Compared with the Babylonian method's $O(n \cdot M(n))$, the gap widens as the input grows larger.

Perfect Square Detection (isSquare)

Determines whether $n$ is a perfect square. The naive approach of computing the square root and verifying is costly for large integers. calx uses modular filtering to reject the vast majority of non-squares without computing the square root at all.

Modular Filter

A perfect square must be a quadratic residue modulo any modulus $m$. By detecting that $n \bmod m$ is not a quadratic residue, non-squares can be identified without computing the square root.

calx applies a three-layer filter inspired by GMP's perfsqr.c:

  1. Filter 1: mod 256 — Tests the least significant byte against a quadratic residue table. Rejects approximately 82% of non-squares in $O(1)$.
  2. Filter 2: mod_34lsub1 — Computes the limb sum modulo $2^{24} - 1 = 16{,}777{,}215$ and checks quadratic residues for small primes (3, 5, 7, 11, 13, etc.). Rejects an additional ~97.8% in $O(n)$.
  3. Filter 3: sqrtRem — Only candidates passing the filters undergo the actual square root computation to verify $s^2 = n$.

Combining the two modular filters, over 99.9% of non-squares are rejected without ever computing a square root.

The mod 256 table occupies just 32 bytes (a bit table) and stays resident in cache. The mod_34lsub1 scan traverses all limbs once, simultaneously testing multiple small primes.
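The first filter plus the final verification can be sketched on a single word (hypothetical `isSquare64`, not calx's API; the limb-sum filter is omitted since it only pays off on multi-word inputs):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Filter 1 table: marks the 44 quadratic residues modulo 256.
static const std::array<bool, 256> kSqMod256 = [] {
    std::array<bool, 256> t{};
    for (int k = 0; k < 256; ++k) t[(k * k) & 255] = true;
    return t;
}();

bool isSquare64(uint64_t n) {
    if (!kSqMod256[n & 255]) return false;   // rejects ~82% of non-squares in O(1)
    // Survivors fall through to the actual square-root verification.
    uint64_t s = (uint64_t)std::sqrt((double)n);
    for (uint64_t c = (s > 2 ? s - 2 : 0); c <= s + 2; ++c)  // FP rounding window
        if ((__int128)c * c == n) return true;
    return false;
}
```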

Nth Root

Computes $\lfloor A^{1/n} \rfloor$, the integer nth root of $A$. Newton's method on $f(x) = x^n - A$ is used:

$$x_{k+1} = \frac{(n-1) \cdot x_k + A / x_k^{n-1}}{n}$$
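The recurrence can be sketched at word scale. This is an illustration with hypothetical names, not calx's implementation: the quotient $\lfloor A / x^{n-1} \rfloor$ is formed by nested floor division (mathematically exact and overflow-free), and a short fix-up pins the exact floor, since integer Newton can land one off.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// true iff x^n > A; nested floor division gives floor(A / x^n), which is
// zero exactly when x^n exceeds A, without ever overflowing.
static bool powExceeds(uint64_t x, unsigned n, uint64_t A) {
    uint64_t q = A;
    for (unsigned i = 0; i < n && q > 0; ++i) q /= x;
    return q == 0;
}

// Word-sized sketch of the Newton recurrence for floor(A^(1/n)).
uint64_t nthRoot64(uint64_t A, unsigned n) {
    if (A < 2 || n == 1) return A;
    // Floating-point estimate nudged upward so iteration starts above the root.
    uint64_t x = (uint64_t)std::pow((double)A, 1.0 / n) + 2;
    while (true) {
        uint64_t q = A;                       // q = floor(A / x^(n-1))
        for (unsigned i = 0; i + 1 < n && q > 0; ++i) q /= x;
        uint64_t y = ((n - 1) * x + q) / n;   // Newton step
        if (y >= x) break;
        x = y;
    }
    while (powExceeds(x, n, A)) --x;          // fix-up: ensure x^n <= A
    while (!powExceeds(x + 1, n, A)) ++x;     // and (x+1)^n > A
    return x;
}
```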

Precision-Doubling Technique

Taking advantage of Newton's method's quadratic convergence, the working precision is doubled at each iteration. Early iterations run at low precision, and the precision grows as the method converges.

Specifically:

  1. Determine the final required precision $P$
  2. Estimate the number of iterations as $\lceil \log_2 P \rceil$
  3. Run the first iteration at precision $P / 2^k$, the next at $P / 2^{k-1}$, ..., and the last at full precision $P$

Since the early iterations deal with small numbers, they are lightweight. The bulk of the computational cost is concentrated in the final few iterations. This approach dramatically reduces total cost compared with running all iterations at full precision.
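The schedule described above can be sketched as follows. `precisionSchedule` is a hypothetical helper, not part of calx, and it assumes (per the "Initial Approximation" design) that halving stops once it would dip below the ~52 bits the floating-point starting value already provides:

```cpp
#include <cassert>
#include <vector>

// Build the precision schedule for a target of P bits: halve backwards
// from P until the next halving would fall below the ~52 accurate bits
// of the initial approximation, then return the list low-to-high.
std::vector<long> precisionSchedule(long P) {
    std::vector<long> desc{P};
    while (desc.back() / 2 > 52)
        desc.push_back(desc.back() / 2);
    return {desc.rbegin(), desc.rend()};      // ascending: low precision first
}
```

For example, a 1000-bit target yields the schedule 62, 125, 250, 500, 1000: only the last step runs at full precision.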

Initial Approximation

The nth root of the most significant word is computed in floating point to serve as the initial approximation. This provides about 52 bits of accuracy, ensuring sufficient precision for the first step of the precision-doubling scheme.

Complexity

Complexity: $O(M(n) \cdot \log n)$

With precision doubling, per-iteration costs grow geometrically, so the total is a constant multiple of the cost of the final full-precision iterations. Each iteration also evaluates $x_k^{n-1}$, which takes $O(\log n)$ multiplications by binary powering, giving $O(M(n) \cdot \log n)$ overall.

nthRootRem

Like the square root, an nthRootRem function is provided that returns the nth root and remainder as a pair:

$$A = s^n + r, \quad 0 \leq r \leq (s+1)^n - s^n - 1$$

Design Decisions

Why the Zimmermann Threshold is 17 Words

The Karatsuba square root is a recursive algorithm with splitting and merging overhead. For inputs below 17 words, the Babylonian method's simple iteration is faster than paying that overhead. The threshold of 17 words (1,088 bits) was determined through empirical benchmarks.

Why isSquare Uses Modular Filtering

Computing the square root costs at least $O(M(n))$, while the mod 256 filter runs in $O(1)$ and the limb-sum filter in a single $O(n)$ pass, both far cheaper than a square root computation. In the typical use case where non-squares vastly outnumber perfect squares, filtering reduces the number of square root computations by a factor of over 1,000, yielding a dramatic overall performance improvement.

References

  • Zimmermann, P. (1999). "Karatsuba Square Root". INRIA Research Report.
  • Bernstein, D. J. "Detecting perfect powers in essentially linear time".
  • Crandall, R. and Pomerance, C. Prime Numbers: A Computational Perspective. Springer, 2005.