Int Division Algorithms
Overview
Division of Int values automatically selects the optimal algorithm based on the word count of the divisor.
There is no need for the user to choose explicitly.
For small divisors, the low-overhead Schoolbook division is used;
as the divisor grows larger, the implementation switches to Burnikel-Ziegler, which leverages fast multiplication.
When the dividend is extremely large relative to the divisor (an unbalanced case), Newton reciprocal iteration is employed.
One word = 64 bits, corresponding to approximately 19 decimal digits.
For API usage, see Int API Reference — Arithmetic Operators.
Additionally, specialized division variants such as floorDiv, ceilDiv, and divExact
are available through IntOps.
Algorithm Selection and Thresholds
calx determines thresholds based on empirical benchmarks. The following table shows the algorithm selection for $a \div b$ with respect to the size of the divisor $b$ and its relationship to the dividend $a$.
| Case | Algorithm | Complexity |
|---|---|---|
| 1 word $\div$ 1 word | Native 64-bit division | $O(1)$ |
| Divisor $< 64$ words | Schoolbook (Knuth Algorithm D) | $O(nm)$ |
| Divisor $\geq 64$ words | Burnikel-Ziegler | $O(M(n) \log n)$ |
| Unbalanced (dividend $\gg$ divisor) | Newton reciprocal iteration | $O(M(n))$ |
| Power-of-two divisor | Right shift | $O(n)$ |
Here $n$ is the word count of the dividend, $m$ is the word count of the divisor, and $M(n)$ is the cost of multiplying two $n$-word numbers.
Unlike multiplication, asymmetry between operands is fundamental to division. The size ratio of dividend to divisor significantly affects algorithm selection.
Thresholds are based on benchmarks on the target hardware (x86-64, AVX2). Optimal thresholds may differ on other architectures.
Schoolbook Division (Knuth Algorithm D)
This is the classical long division algorithm described in Knuth's The Art of Computer Programming Vol. 2, §4.3.1, performed in units of 64-bit words.
Normalization
Before starting the division, both the dividend and divisor are left-shifted so that the most significant bit of the divisor's most significant word (MSW) becomes 1. This normalization improves the accuracy of the trial quotient estimate and limits the number of correction steps to at most 2.
$$d' = d \cdot 2^s, \quad a' = a \cdot 2^s \quad (s = \text{clz}(d_{\text{MSW}}))$$
Here $\text{clz}$ denotes the count of leading zeros.
Trial Quotient Estimation
At each iteration, the trial quotient $\hat{q}$ is obtained by dividing the top two words of the dividend by the most significant word of the divisor:
$$\hat{q} = \left\lfloor \frac{a'_{j+n} \cdot B + a'_{j+n-1}}{d'_{n-1}} \right\rfloor \quad (B = 2^{64})$$
Thanks to normalization, $\hat{q}$ satisfies $q_j \leq \hat{q} \leq q_j + 2$ with respect to the true quotient $q_j$. Therefore, the correct quotient is obtained after at most 2 correction subtractions.
Complexity: $O(nm)$ ($n$ = word count of the dividend, $m$ = word count of the divisor)
When the divisor is less than 64 words, the overhead of divide-and-conquer exceeds the cost of Schoolbook, making Schoolbook the fastest option.
Related article: Basic Operations on Arbitrary-Precision Integers
Burnikel-Ziegler Division
This is a division algorithm published by Burnikel and Ziegler in 1998 that leverages fast multiplication. By reducing division to multiplication, it runs faster than Schoolbook division for large operands.
Basic Idea
The problem of dividing a $2n$-word dividend by an $n$-word divisor is split into subproblems of roughly $3n/2$ words, which are solved recursively. At each recursive step, multiplication (Karatsuba / Toom-Cook / NTT) is used to verify the quotient and remainder, rather than Schoolbook division.
Specifically, the dividend $a$ is split into an upper block $a_1$ and a lower block $a_0$. First, $a_1$ is divided by the divisor $d$ to obtain a partial quotient $q_1$ and remainder $r_1$. Then the concatenation of $r_1$ and $a_0$ is divided by $d$ to obtain $q_0$ and $r_0$. The final quotient is $q = q_1 \cdot B^k + q_0$, and the remainder is $r = r_0$.
Complexity: $O(M(n) \log n)$ ($M(n)$ is the cost of multiplying two $n$-word numbers)
This algorithm is used when the divisor is 64 words or more. Because it can leverage fast multiplication algorithms (Karatsuba / Toom-Cook / NTT), it significantly outperforms Schoolbook division for large divisors.
Related article: Advanced Operations on Arbitrary-Precision Integers
Newton Reciprocal Iteration
This algorithm is specialized for unbalanced cases where the dividend is extremely large relative to the divisor. It computes an approximate reciprocal $1/d$ of the divisor $d$ via Newton's method, then obtains the quotient by multiplying with the dividend.
Newton Iteration
Newton's method applied to finding the root of $f(x) = 1/x - d$ yields the following iteration:
$$x_{k+1} = 2x_k - d \cdot x_k^2$$
This iteration converges quadratically — that is, the number of correct bits doubles with each step. The initial value is computed as a low-precision reciprocal from the top few words of the divisor, and the iteration continues until the target precision is reached.
Computing the Quotient
Once a sufficiently precise reciprocal $x \approx 1/d$ is obtained, the quotient is computed as:
$$q \approx a \times x$$
The remainder is then calculated as $r = a - q \cdot d$, and the quotient is adjusted if necessary to ensure $0 \leq r < d$.
Complexity: $O(M(n))$ — equivalent to a single multiplication
The implementation follows GMP's mpn_ni_invertappr (mu_div_qr).
Since computing the reciprocal itself incurs significant cost,
this method is used only when the size ratio of dividend to divisor is sufficiently large (dynamic threshold).
Related article: Advanced Operations on Arbitrary-Precision Integers
Division Variants
In addition to the standard truncated division (operator/),
calx provides the following division variants through IntOps.
For API details, see the Int API Reference.
floorDiv — Floor Division
$$\text{floorDiv}(a, b) = \lfloor a / b \rfloor$$
This is equivalent to Python's // operator.
While C++'s operator/ truncates toward zero,
floorDiv always rounds toward negative infinity.
The results differ when negative operands are involved:
$$(-7) / 2 = -3 \quad (\text{C++ truncation}), \quad \text{floorDiv}(-7, 2) = -4 \quad (\text{floor division})$$
ceilDiv — Ceiling Division
$$\text{ceilDiv}(a, b) = \lceil a / b \rceil$$
Division that rounds toward positive infinity. This is useful in scenarios such as computing memory allocation sizes — determining "how many units are needed at minimum."
$$\text{ceilDiv}(10, 3) = 4, \quad \text{ceilDiv}(9, 3) = 3$$
divExact — Exact Division
Used when it is known in advance that the dividend is exactly divisible by the divisor. Because the remainder verification is skipped, it runs faster than regular division.
The result is undefined if the input is not exactly divisible.
Design Decisions
Why the Schoolbook / Burnikel-Ziegler Threshold Is 64 Words
Burnikel-Ziegler uses multiplication internally, which introduces overhead. For small divisors, the $O(nm)$ Schoolbook algorithm has a lower constant factor. Benchmarks on x86-64 showed that 64 words (approximately 1,200 digits) is the crossover point.
Why Newton Reciprocal Iteration Is Limited to Unbalanced Cases
Although Newton reciprocal iteration is asymptotically the fastest at $O(M(n))$, computing the reciprocal itself costs the equivalent of several multiplications. When the dividend and divisor are of similar size, the overhead of the reciprocal computation exceeds the cost of the division itself. The reciprocal approach is beneficial in cases such as:
- The dividend is significantly larger than the divisor (the reciprocal computation cost becomes relatively small compared to the overall division)
- Dividing repeatedly by the same divisor (the reciprocal can be reused)
calx dynamically evaluates the size ratio of dividend to divisor and switches to Newton only when it is advantageous.
References
- Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997. §4.3.1.
- Burnikel, C. and Ziegler, J. (1998). "Fast Recursive Division". Max-Planck-Institut für Informatik Research Report, MPI-I-98-1-022.
- GMP Development Team. GNU Multiple Precision Arithmetic Library. https://gmplib.org/