Int Division Algorithms

Overview

Division of Int values automatically selects the optimal algorithm based on the word count of the divisor. There is no need for the user to choose explicitly. For small divisors, the low-overhead Schoolbook division is used; as the divisor grows larger, the implementation switches to Burnikel-Ziegler, which leverages fast multiplication. When the dividend is extremely large relative to the divisor (an unbalanced case), Newton reciprocal iteration is employed.

One word = 64 bits, corresponding to approximately 19 decimal digits.

For API usage, see Int API Reference — Arithmetic Operators. Additionally, specialized division variants such as floorDiv, ceilDiv, and divExact are available through IntOps.

Algorithm Selection and Thresholds

calx determines thresholds based on empirical benchmarks. The following table shows the algorithm selection for $a \div b$ with respect to the size of the divisor $b$ and its relationship to the dividend $a$.

Case	Algorithm	Complexity
1 word $\div$ 1 word	Native 64-bit division	$O(1)$
Divisor $< 64$ words	Schoolbook (Knuth Algorithm D)	$O(nm)$
Divisor $\geq 64$ words	Burnikel-Ziegler	$O(M(n) \log n)$
Unbalanced (dividend $\gg$ divisor)	Newton reciprocal iteration	$O(M(n))$
Power-of-two divisor	Right shift	$O(n)$

Here $n$ is the word count of the dividend, $m$ is the word count of the divisor, and $M(n)$ is the cost of multiplying two $n$-word numbers.

Unlike multiplication, asymmetry between operands is fundamental to division. The size ratio of dividend to divisor significantly affects algorithm selection.

Thresholds are based on benchmarks on the target hardware (x86-64, AVX2). Optimal thresholds may differ on other architectures.

Schoolbook Division (Knuth Algorithm D)

This is the classical long division algorithm described in Knuth's The Art of Computer Programming Vol. 2, §4.3.1, performed in units of 64-bit words.

Normalization

Before starting the division, both the dividend and divisor are left-shifted so that the most significant bit of the divisor's most significant word (MSW) becomes 1. This normalization improves the accuracy of the trial quotient estimate and limits the number of correction steps to at most 2.

$$d' = d \cdot 2^s, \quad a' = a \cdot 2^s \quad (s = \text{clz}(d_{\text{MSW}}))$$

Here $\text{clz}$ denotes the count of leading zeros.

Trial Quotient Estimation

At each iteration, the trial quotient $\hat{q}$ is obtained by dividing the top two words of the dividend by the most significant word of the divisor:

$$\hat{q} = \left\lfloor \frac{a'_{j+n} \cdot B + a'_{j+n-1}}{d'_{n-1}} \right\rfloor \quad (B = 2^{64})$$

Thanks to normalization, $\hat{q}$ satisfies $q_j \leq \hat{q} \leq q_j + 2$ with respect to the true quotient $q_j$. Therefore, the correct quotient is obtained after at most 2 correction subtractions.

Complexity: $O(nm)$ ($n$ = word count of the dividend, $m$ = word count of the divisor)

When the divisor is less than 64 words, the overhead of divide-and-conquer exceeds the cost of Schoolbook, making Schoolbook the fastest option.

Burnikel-Ziegler Division

This is a division algorithm published by Burnikel and Ziegler in 1998 that leverages fast multiplication. By reducing division to multiplication, it runs faster than Schoolbook division for large operands.

Basic Idea

The problem of dividing a $2n$-word dividend by an $n$-word divisor is split into subproblems of roughly $3n/2$ words, which are solved recursively. At each recursive step, multiplication (Karatsuba / Toom-Cook / NTT) is used to verify the quotient and remainder, rather than Schoolbook division.

Specifically, the dividend $a$ is split into an upper block $a_1$ and a lower block $a_0$. First, $a_1$ is divided by the divisor $d$ to obtain a partial quotient $q_1$ and remainder $r_1$. Then the concatenation of $r_1$ and $a_0$ is divided by $d$ to obtain $q_0$ and $r_0$. The final quotient is $q = q_1 \cdot B^k + q_0$, and the remainder is $r = r_0$.

Complexity: $O(M(n) \log n)$ ($M(n)$ is the cost of multiplying two $n$-word numbers)

This algorithm is used when the divisor is 64 words or more. Because it can leverage fast multiplication algorithms (Karatsuba / Toom-Cook / NTT), it significantly outperforms Schoolbook division for large divisors.

Newton Reciprocal Iteration

This algorithm is specialized for unbalanced cases where the dividend is extremely large relative to the divisor. It computes an approximate reciprocal $1/d$ of the divisor $d$ via Newton's method, then obtains the quotient by multiplying with the dividend.

Newton Iteration

Newton's method applied to finding the root of $f(x) = 1/x - d$ yields the following iteration:

$$x_{k+1} = 2x_k - d \cdot x_k^2$$

This iteration converges quadratically — that is, the number of correct bits doubles with each step. The initial value is computed as a low-precision reciprocal from the top few words of the divisor, and the iteration continues until the target precision is reached.

Computing the Quotient

Once a sufficiently precise reciprocal $x \approx 1/d$ is obtained, the quotient is computed as:

$$q \approx a \times x$$

The remainder is then calculated as $r = a - q \cdot d$, and the quotient is adjusted if necessary to ensure $0 \leq r < d$.

Complexity: $O(M(n))$ — equivalent to a single multiplication

The implementation follows GMP's mpn_ni_invertappr (mu_div_qr). Since computing the reciprocal itself incurs significant cost, this method is used only when the size ratio of dividend to divisor is sufficiently large (dynamic threshold).

Division Variants

In addition to the standard truncated division (operator/), calx provides the following division variants through IntOps. For API details, see the Int API Reference.

floorDiv — Floor Division

$$\text{floorDiv}(a, b) = \lfloor a / b \rfloor$$

This is equivalent to Python's // operator. While C++'s operator/ truncates toward zero, floorDiv always rounds toward negative infinity. The results differ when negative operands are involved:

$$(-7) / 2 = -3 \quad (\text{C++ truncation}), \quad \text{floorDiv}(-7, 2) = -4 \quad (\text{floor division})$$

ceilDiv — Ceiling Division

$$\text{ceilDiv}(a, b) = \lceil a / b \rceil$$

Division that rounds toward positive infinity. This is useful in scenarios such as computing memory allocation sizes — determining "how many units are needed at minimum."

$$\text{ceilDiv}(10, 3) = 4, \quad \text{ceilDiv}(9, 3) = 3$$

divExact — Exact Division

Used when it is known in advance that the dividend is exactly divisible by the divisor. Because the remainder verification is skipped, it runs faster than regular division.

The result is undefined if the input is not exactly divisible.

Design Decisions

Why the Schoolbook / Burnikel-Ziegler Threshold Is 64 Words

Burnikel-Ziegler uses multiplication internally, which introduces overhead. For small divisors, the $O(nm)$ Schoolbook algorithm has a lower constant factor. Benchmarks on x86-64 showed that 64 words (approximately 1,200 digits) is the crossover point.

Why Newton Reciprocal Iteration Is Limited to Unbalanced Cases

Although Newton reciprocal iteration is asymptotically the fastest at $O(M(n))$, computing the reciprocal itself costs the equivalent of several multiplications. When the dividend and divisor are of similar size, the overhead of the reciprocal computation exceeds the cost of the division itself. The reciprocal approach is beneficial in cases such as:

The dividend is significantly larger than the divisor (the reciprocal computation cost becomes relatively small compared to the overall division)
Dividing repeatedly by the same divisor (the reciprocal can be reused)

calx dynamically evaluates the size ratio of dividend to divisor and switches to Newton only when it is advantageous.

References

Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997. §4.3.1.
Burnikel, C. and Ziegler, J. (1998). "Fast Recursive Division". Max-Planck-Institut für Informatik Research Report, MPI-I-98-1-022.
GMP Development Team. GNU Multiple Precision Arithmetic Library. https://gmplib.org/