Polynomial Factorization
Overview
Factoring an integer-coefficient polynomial $f \in \mathbb{Z}[x]$ into irreducible factors is one of the central operations in computer algebra. sangi implements it as a pipeline:
- Content extraction — split off the GCD of all coefficients.
- Square-free decomposition — Yun's algorithm.
- Finite-field factorization — Cantor-Zassenhaus (DDF + EDF).
- Hensel lifting — from $\mathrm{GF}(p)$ to $\mathbb{Z}$.
- Combination recovery — Zassenhaus enumeration or van Hoeij LLL.
Related API: factorize, squareFreeFactorization.
Pipeline at a glance
A typical call to factorize(f) unfolds as:
$$f = c_0 \cdot \mathrm{prim}(f) \;\to\; \prod_i f_i^{e_i} \;\to\; \prod_{i,j} g_{i,j}.$$
Here $c_0$ is the content (GCD of all coefficients), $\mathrm{prim}(f)$ is the primitive part, $f_i^{e_i}$ are the Yun factors, and $g_{i,j}$ are the irreducible factors obtained by factoring each $f_i$. The final form is $f = c_0 \prod_{i,j} g_{i,j}^{e_i}$.
Related articles: Factorization over finite fields / Factorization over the integers
Yun's Square-Free Decomposition
Decompose $f = \prod_i f_i^i$ where each $f_i$ is square-free and the $f_i$ are pairwise coprime. Yun's algorithm (1976) refines the naive "$\gcd(f, f')$ then divide" loop.
Algorithm
Assume $\mathrm{char}(K) = 0$ or $\mathrm{char}(K) \nmid \deg f$:
- $g := \gcd(f, f')$, $C_1 := f/g$, $D_1 := f'/g - C_1'$.
- Iterate: $a_i := \gcd(C_i, D_i)$, $C_{i+1} := C_i / a_i$, $D_{i+1} := D_i / a_i - C_{i+1}'$.
- Output $f_i = a_i$. Continue until $C$ becomes a constant.
Complexity: dominated by GCD computations; polynomial in the input size when built on top of SR-PRS.
sangi implementation
squareFreeFactorization(f) implements Yun on top of the SR-PRS-based intPolyGcd.
It does not test irreducibility, so the output $f_i$ are passed to the next stage for irreducible factorization.
Factorization over GF(p)
Reducing a square-free $\mathbb{Z}[x]$ factor modulo a prime $p$ gives a square-free polynomial over $\mathrm{GF}(p)[x]$ (as long as $p \nmid \mathrm{disc}(f)$). Its irreducible decomposition is computed via linear algebra (Berlekamp) or probabilistic GCDs (Cantor-Zassenhaus).
Berlekamp's method (1967)
Consider the Frobenius map $\phi: x \mapsto x^p$ acting on $\mathbb{F}_p[x]/(f)$ and compute a basis of its eigenspace (the Berlekamp subalgebra) via linear algebra. For each basis vector $v(x)$, iterating $\gcd(f, v(x) - c)$ over $c \in \mathrm{GF}(p)$ peels off the irreducible factors. Best when $p$ is small.
Cantor-Zassenhaus (1981)
A two-stage probabilistic algorithm:
- DDF (distinct-degree factorization): for $d = 1, 2, \ldots$, compute $\gcd(f, x^{p^d} - x)$ to peel off the product of all irreducible factors of degree $d$.
- EDF (equal-degree factorization): with a random polynomial $h(x)$, compute $\gcd(f, h^{(p^d-1)/2} - 1)$ to split the same-degree product with probability $\approx 1/2$.
CZ outperforms Berlekamp's linear algebra ($O(n^3)$) when $p$ is large, and is used in sangi's fast path.
Related article: Factorization over finite fields
Hensel Lifting
Hensel lifting takes the factorization $f \equiv f_1 f_2 \cdots f_r \pmod{p}$ and doubles its modulus iteratively to reach $f \equiv F_1 F_2 \cdots F_r \pmod{p^k}$. The target $k$ is chosen from the Mignotte bound $\|f\|_\infty \cdot \binom{n}{n/2}$ so that $p^k$ uniquely determines the integer factor coefficients.
Linear Hensel (one factor pair)
If $f \equiv u v \pmod{p^k}$ with $\gcd(u, v) = 1$ in $\mathrm{GF}(p)$, then lift to $\bmod p^{2k}$ via:
$$U = u + p^k \cdot s, \quad V = v + p^k \cdot t.$$
$s, t$ are linear corrections derived from the Bezout coefficients. Each iteration doubles the precision ($p^k \to p^{2k}$), so $\log k$ iterations suffice.
Multi-factor Hensel (binary tree)
For $r$ factors, organize them into a binary tree and lift one level at a time from leaves toward the root. Total cost is roughly $O(M(n p^k) \log r)$. sangi's fast path uses this binary-tree Hensel.
Related article: Factorization over the integers
Zassenhaus Combination Recovery
After Hensel lifting we have $f \equiv F_1 F_2 \cdots F_r \pmod{p^k}$, but the true integer factor over $\mathbb{Z}[x]$ may correspond to a subset product of the $F_i$ (an irreducible factor over $\mathbb{Z}[x]$ can split into multiple factors over $\mathrm{GF}(p)$). Choosing the correct subset is the remaining problem.
Zassenhaus enumerates the $2^r$ subsets in order of increasing size and tries to divide $f$ by the (normalized) subset product mod $p^k$:
- Size-1 subsets: $r$ candidates; each $F_i$ is tested as a trial divisor.
- Size-2: $\binom{r}{2}$ candidates; trial-divide by $F_i F_j$.
- ...up to size $\lfloor r/2 \rfloor$.
The enumeration is tractable when $r$ is small ($\leq 12$ or so).
sangi's factorize uses this path for $r \leq 12$.
Van Hoeij's LLL Method (Hard Cases)
For adversarial inputs such as Swinnerton-Dyer polynomials $\prod (x \pm \sqrt{p_1} \pm \sqrt{p_2} \pm \cdots)$, $r$ grows with the degree and Zassenhaus's $2^r$ enumeration becomes infeasible. Van Hoeij (2002) recast the subset selection as a shortest-vector problem in an integer lattice, achieving polynomial time.
Idea
Build a lattice whose rows encode $\log$-like data of the Hensel factors $F_i$ and apply LLL basis reduction (Lenstra-Lenstra-Lovász, 1982) to find short vectors. Short vectors correspond to $0/1$ combinations, which directly reveal the true integer factor selection.
Complexity: $O(n^{O(1)})$ — polynomial time, depending on LLL precision parameters.
sangi's factorize switches to this path for $r > 12$.
The LLL implementation is bundled in linalg/LLL.hpp, so no additional linkage is required.
Related article: Factorization over the integers — Hensel lifting and LLL
factorize API Structure
sangi exposes a single factorize(f) entry point that dispatches internally between rational-root searches, Cantor-Zassenhaus, Hensel lifting, Zassenhaus enumeration, and van Hoeij LLL based on input degree and the Hensel factor count $r$.
Supported coefficient types are int, int64_t, Int, and Rational; the full pipeline is available without extra linkage.
| Stage | Path | Notes |
|---|---|---|
| Small inputs ($\deg \lesssim 10$) | rational-root theorem + trial division | fast path when coefficients are small |
| Factoring over $\mathbb{F}_p$ | Cantor-Zassenhaus | probabilistic, expected $O(n^2 \log p)$ |
| Lift to $\mathbb{Z}$ | Hensel lifting | $O(M(n L) \log L)$ |
| Combination recovery, $r \leq 12$ | Zassenhaus enumeration | $2^r$ subsets |
| Combination recovery, $r > 12$ | van Hoeij LLL | polynomial time, bundled linalg/LLL.hpp |
Int and Rational coefficients are delegated to the int64_t or Int code path internally.
Complexity and Practical Limits
| Stage | Complexity | Notes |
|---|---|---|
| Content + rational roots | $O(n \cdot d(c_0) \cdot d(c_n))$ | $d$ counts divisors |
| Yun square-free | $O(n^2)$ polynomial ops | via SR-PRS GCD |
| Cantor-Zassenhaus | $O(n^2 \log p)$ expected | probabilistic |
| Hensel lifting | $O(M(n L) \log L)$ | $L = k \log p$ |
| Zassenhaus enumeration | $O(2^r \cdot n)$ | practical for $r \leq 12$ |
| van Hoeij LLL | polynomial time | required for $r > 12$ |
In practice, $\deg f \leq 100$ with coefficient bit-length $\leq 1024$ completes under a second for most inputs. Swinnerton-Dyer-style adversarial inputs break the Zassenhaus path and require van Hoeij.
Design Decisions
Single factorize API
factorize() is a single entry point that ships the full pipeline — Cantor-Zassenhaus, Hensel lifting, Zassenhaus enumeration, and van Hoeij LLL — without separate build tiers or extra linkage.
Users only choose the coefficient type and degree of the input polynomial.
Why $r \leq 12$ for Zassenhaus
$2^r$ enumeration is 4,096 trials at $r = 12$, 65,536 at $r = 16$, and a million at $r = 20$. With trial-division cost $O(n L)$ dominating, empirical measurements place the break-even with van Hoeij LLL near $r = 12$. Beyond that, the LLL setup pays off and the polynomial-time path wins.
References
- Berlekamp, E. R. (1967). "Factoring polynomials over finite fields". Bell System Technical Journal, 46(8), 1853–1859.
- Cantor, D. G. and Zassenhaus, H. (1981). "A new algorithm for factoring polynomials over finite fields". Mathematics of Computation, 36(154), 587–592.
- Yun, D. Y. Y. (1976). "On square-free decomposition algorithms". SYMSAC '76, 26–35.
- Zassenhaus, H. (1969). "On Hensel factorization, I". Journal of Number Theory, 1(3), 291–311.
- van Hoeij, M. (2002). "Factoring polynomials and the knapsack problem". Journal of Number Theory, 95(2), 167–189.
- Lenstra, A. K., Lenstra, H. W., and Lovász, L. (1982). "Factoring polynomials with rational coefficients". Mathematische Annalen, 261(4), 515–534.