Polynomial Arithmetic Algorithms

Overview

The sangi Polynomial<T> template accepts an arbitrary numerical coefficient type (double, Rational, Complex<T>, Int, etc.). Basic arithmetic, evaluation, differentiation, and integration are all header-only.

For the API surface see Polynomial API. This page focuses on the internal representation and the choice of arithmetic algorithms.

Internal Representation (dense, ascending order)

Polynomial<T> stores coefficients in a dense std::vector<T>. Indices are in ascending power order: coeffs_[i] holds the coefficient $c_i$ of $x^i$.

$$p(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$$

Dense vs Sparse

There are two common representations:

  • Dense: all coefficients (including zeros) stored contiguously. Coefficient access, addition, and Horner evaluation are $O(1)$ index operations.
  • Sparse: only non-zero terms stored as (exponent, coefficient) pairs. Favorable for very sparse polynomials such as $x^{10^6} + 1$.

sangi uses dense storage. The target workloads (root finding, factorization, calculus, orthogonal polynomials) have degrees in the tens to low hundreds, where dense contiguous memory wins on cache locality, vectorization, and direct interfacing with FFT. Extremely sparse polynomials can be handled by RationalFunction or MultivariatePolynomial, which use sparse internals.

Why ascending order

Mapping $c_i$ to index $i$ has several practical benefits:

  • Shifting $p(x)$ by $x^k$ becomes a push_back at the tail rather than a left shift at the head — amortized $O(1)$.
  • The index convention matches FFT/NTT polynomial multiplication, so no conversion layer is needed.
  • Differentiation reduces to "multiply the coefficient at index $i$ by $i$ and place it at index $i-1$".

Descending order (the classroom Horner convention) is used only for human-readable output, never internally.

Addition and Subtraction

The sum of $p = \sum c_i x^i$ and $q = \sum d_i x^i$ is obtained by adding coefficients of the same degree:

$$(p + q)(x) = \sum_{i=0}^{\max(\deg p, \deg q)} (c_i + d_i) x^i$$

Complexity: $O(\max(n, m))$

The result vector is resized to the longer of the two and then passed through normalize() to strip trailing zero coefficients and determine the actual degree. Subtraction is symmetric.

Multiplication — Naive, Karatsuba, FFT

Polynomial multiplication is the convolution of coefficient sequences:

$$(p \cdot q)(x) = \sum_{k=0}^{n+m} \left( \sum_{i+j=k} c_i d_j \right) x^k$$

Naive Multiplication (default)

A direct $O(nm)$ double loop. With degrees in the tens to low hundreds, the constant factor is small enough to be the fastest path in practice. Polynomial::operator* uses this by default.

Complexity: $O(nm)$ coefficient multiplications and additions

Karatsuba Polynomial Multiplication

Split the polynomial into high and low halves and reduce four half-size products to three. Karatsuba on polynomials applies to the number of terms and gives $O(n^{\log_2 3}) = O(n^{1.585})$. When $T$ is Int or Rational and the degree is large enough, Karatsuba beats the naive loop.

Karatsuba on coefficient bit-length (in Int) and Karatsuba on term count (in Polynomial) compose independently: for large degrees with large coefficients, both savings multiply.

FFT Polynomial Multiplication

Treating the coefficient sequence as a signal, multiplication becomes pointwise product in the transform domain:

$$p \cdot q = \mathcal{F}^{-1}\!\left( \mathcal{F}(p) \odot \mathcal{F}(q) \right)$$

Complexity: $O(n \log n)$

Useful when $T$ is double or Complex<double> and the degree is large. For typical Polynomial workloads (degree of a few hundreds or less) the FFT setup cost rarely beats the naive loop. To preserve exact integer coefficients without rounding, the NTT (number-theoretic transform) path can be used instead.

For why sangi uses NTT for arbitrary-precision integer multiplication and the implementation details, see Int Multiplication — NTT. The notion of "length" differs between polynomials (term count) and integers (word count), so the two paths and thresholds are managed separately.

Evaluation by Horner's Method

Naively evaluating $p(x) = c_0 + c_1 x + \cdots + c_n x^n$ by computing each $x^k$ separately requires $O(n^2)$ multiplications, or $O(n)$ with precomputed powers but still $O(n)$ extra multiplications. Horner's method rewrites it as a nested form needing exactly $n$ multiplications and $n$ additions:

$$p(x) = c_0 + x \bigl( c_1 + x \bigl( c_2 + x ( \cdots + x \, c_n) \bigr) \bigr)$$

sangi's Polynomial::operator()(x) uses this scheme. The loop runs in descending order:

T result = coeffs_[n];
for (size_t i = n; i-- > 0; )
    result = result * x + coeffs_[i];
return result;

Complexity: $n$ multiplications + $n$ additions (i.e. $O(n)$ coefficient operations)

When x is a polynomial or matrix

The argument x need not be of type $T$. Passing a Polynomial<T> performs polynomial composition $p(q(x))$, and passing a matrix produces the matrix polynomial $p(A)$. Horner's recurrence only requires associative multiplication, so the same code works for all of these.

Estrin's scheme (a high-parallelism variant)

evaluateEstrin(x) first computes $x^2, x^4, x^8, \ldots$, groups coefficients into $c_{2i} + c_{2i+1} x$ pairs, and combines them via powers. It uses slightly more multiplications than Horner but has a shallower dependency chain, which benefits SIMD and pipelined execution. Selection between Horner and Estrin is explicit.

Synthetic Division

When the divisor is a linear factor $x - r$, polynomial division collapses to a forward recurrence on the coefficient list. This is synthetic division (Ruffini-Horner):

$$p(x) = (x - r) \cdot q(x) + p(r)$$

The quotient $q(x) = \sum_{i=0}^{n-1} b_i x^i$ satisfies the recurrence

$$b_{n-1} = c_n, \quad b_{i-1} = c_i + r \cdot b_i \quad (i = n-1, \ldots, 1),$$

and the remainder is $p(r) = c_0 + r \cdot b_0$. The recurrence is exactly Horner evaluation: evaluation and division use the same procedure.

Complexity: $O(n)$

The factorization pipeline uses synthetic division to remove $(x - r)$ when $r$ is a candidate root produced by the rational root theorem. The general-purpose divmod() performs $O(n \cdot m)$ long division but dispatches to synthetic division internally when the divisor is linear.

polyder / polyint and Automatic Differentiation

Differentiation and integration of polynomials are simple coefficient transformations.

Differentiation (polyder)

$$p(x) = \sum_{i=0}^n c_i x^i \;\Rightarrow\; p'(x) = \sum_{i=1}^n i \cdot c_i \, x^{i-1}$$

sangi's derivative() shifts the coefficient vector down by one and multiplies each element by its index — an $O(n)$ operation. For the $n$-th derivative, derivative(n) multiplies each $c_i$ by $i!/(i-n)!$ and drops the leading $n$ entries in a single pass.

Integration (polyint)

$$\int p(x)\,dx = C + \sum_{i=0}^n \frac{c_i}{i+1} x^{i+1}$$

sangi's integral() fixes $C = 0$, prepends a zero to the coefficient vector, and scales each entry by $1/(i+1)$. With $T = $ Rational the result is exact; with $T = $ double it is floating-point.

Connection to automatic differentiation

When $p(x_0)$ and $p'(x_0)$ are needed simultaneously, a doubled Horner loop computes both in one pass:

T fx = coeffs_[n], dfx = T{0};
for (size_t i = n; i-- > 0; ) {
    dfx = dfx * x + fx;       // advance p'
    fx  = fx  * x + coeffs_[i]; // advance p
}
// fx = p(x), dfx = p'(x)

This is precisely the first-order forward-mode automatic differentiation specialized to Horner. The inner loops of Newton's and Halley's methods use this directly to compute $p(x)/p'(x)$ in a single pass.

One can equivalently call derivative() followed by two operator() invocations, but the doubled-Horner form avoids the intermediate allocation and is preferred inside iterative solvers.

Pseudo-Division and Pseudo-Remainder

For GCD and factorization of integer-coefficient polynomials, we want quotients and remainders to stay integral. Ordinary polynomial division assumes a coefficient field ($\mathbb{Q}$ or $\mathbb{R}$) and is not closed over $\mathbb{Z}[x]$.

Pseudo-division pre-multiplies the dividend by a power of the divisor's leading coefficient to keep the result integral:

$$\mathrm{lc}(g)^{\deg f - \deg g + 1} \cdot f \;=\; q \cdot g + r, \quad \deg r < \deg g.$$

Example: divide $f = x^2$ by $g = 2x + 1$ in $\mathbb{Z}[x]$. Multiplying $f$ by $\mathrm{lc}(g)^{2-1+1} = 4$:

$$4 x^2 = (2x - 1)(2x + 1) + 1.$$

Pseudo-quotient $q = 2x - 1$ and pseudo-remainder $r = 1$ are both integral.

Subresultant PRS

A naive pseudo-remainder sequence has exponentially growing coefficients because the leading-coefficient power accumulates at every step. The subresultant PRS (subresultant pseudo-remainder sequence) divides out an explicit common factor at each step, keeping coefficient growth polynomial. sangi's intPolyGcd() uses this algorithm, and pseudoRemainder() is exposed as its building block.

Related articles: Polynomial GCD / Resultant

Design Decisions

Unifying dense + ascending order

Polynomials are foundational in computer algebra, numerical analysis, and signal processing. Divergent representations across modules incur conversion costs. sangi uses dense + ascending power order everywhere — Polynomial, orthogonal polynomial generators, Chebyshev interpolation, FFT, and root finding all share the same coeffs_ convention.

Why naive multiplication by default

Karatsuba and FFT only beat naive $O(nm)$ multiplication once the degree is large enough, and the precise crossover depends on $T$ and the cost of $T \times T$. For the typical Polynomial uses (inner loops of root finding/factorization, three-term recurrences for orthogonal polynomials) the degree stays well below the crossover, so the default operator* takes the naive path. FFT convolution is still callable via math/fft when needed.

References

  • Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997.
  • von zur Gathen, J. and Gerhard, J. Modern Computer Algebra. 3rd ed. Cambridge University Press, 2013.
  • Brown, W. S. (1971). "On Euclid's algorithm and the computation of polynomial greatest common divisors". Journal of the ACM, 18(4), 478–504.