Complex Arithmetic and Representation
Overview
Complex<T> is implemented as a simple two-member structure
holding the real part re and the imaginary part im directly.
The template parameter T accepts any type satisfying the ComplexScalar concept,
including double, float, Int, Float, and Rational.
Arithmetic dispatch depends on the characteristics of T.
For multiprecision types, where a real multiplication is far more expensive than a real addition,
Strassen's three-real-multiplication trick is used.
For floating-point types, Smith's method guards complex division against intermediate overflow.
For API usage, see Complex API Reference — Arithmetic Operators.
Internal Representation and Memory Layout
The class body consists of two public member variables only.
template <ComplexScalar T>
struct Complex {
T re;
T im;
// ...
};
There are no virtual functions or extra metadata, so $\operatorname{sizeof}(\texttt{Complex}\langle T\rangle) = 2 \operatorname{sizeof}(T)$.
The following static_assert checks verify the layout at compile time:
sizeof(Complex<T>) == 2 * sizeof(T)offsetof(Complex<T>, re) == 0offsetof(Complex<T>, im) == sizeof(T)
This layout guarantee allows an array of Complex<double> to be reinterpreted
as an array of std::complex<double> or C99 _Complex double
via reinterpret_cast.
This is the requirement for zero-copy interoperation with C libraries such as FFTW.
Related articles: Definition and Basic Properties of Complex Numbers / The Complex Plane
Addition and Subtraction
Addition and subtraction are simple component-wise operations.
$$(a+bi) \pm (c+di) = (a \pm c) + (b \pm d) i$$
Each operation issues two real additions (or subtractions) in T.
For T = double, the compiler may fold these into a single SSE / AVX packed-double instruction.
For multiprecision types (Int, Float), the T add/sub is invoked directly
and runs in linear time $O(\max(\text{words}(a), \text{words}(c)))$ for each part.
Complexity: two real additions or subtractions.
Scalar mixed operations $z \pm s$ act on the real part only:
$$(a+bi) \pm s = (a \pm s) + b i$$
The implementation takes the path z.re += s so that no copy or assignment is performed
on the imaginary part's T object.
Multiplication — Strassen's Three-Real-Multiplication Trick
Naive Four-Multiplication Form
The textbook formula uses four real multiplications and two real additions:
$$(a+bi)(c+di) = (ac - bd) + (ad + bc) i$$
For T = double / float, sangi uses this naive form.
Floating-point multiplication has comparable throughput to floating-point addition,
so spending two extra additions to save one multiplication is rarely advantageous.
Karatsuba Symmetric Form (Three Real Multiplications)
For multiprecision types (Int, Float, Rational),
a real multiplication can cost tens to hundreds of times more than a real addition.
Trading additions for multiplications becomes a net win,
and sangi applies Karatsuba's algorithm to the degree-one polynomial $a + b X$ with $X = i$, $X^2 = -1$ in the following symmetric three-multiplication form:
$$\begin{aligned} ac &= a \cdot c, \\ bd &= b \cdot d, \\ (a+bi)(c+di) &= (ac - bd) \;+\; \bigl((a+b)(c+d) - ac - bd\bigr)\,i. \end{aligned}$$
Three real multiplications ($ac$, $bd$, $(a+b)(c+d)$) and five real additions reproduce the product. This form is mathematically equivalent to Strassen's $k_1 = (a+b)c$, $k_2 = a(d-c)$, $k_3 = b(c+d)$ identity; sangi prefers the symmetric layout because additions distribute evenly across the real and imaginary parts.
Identity Verification
Expanding confirms equality:
$$\begin{aligned} \mathrm{Re} &= ac - bd, \\ \mathrm{Im} &= (a+b)(c+d) - ac - bd = ac + ad + bc + bd - ac - bd = ad + bc. \end{aligned}$$
Complexity: three real multiplications and five real additions. With $M(n)$ for multiplication and $A(n)$ for addition, the total is $3 M(n) + 5 A(n)$. Since $A(n) = O(n)$ and $M(n)$ is super-linear above the Karatsuba threshold, this is smaller than $4 M(n) + 2 A(n)$.
Division — Smith's Method
Naive Formula and Overflow
The textbook formula is:
$$\frac{a+bi}{c+di} = \frac{(ac + bd) + (bc - ad) i}{c^2 + d^2}$$
For T = double, computing $c^2 + d^2$ directly can overflow when $|c|$, $|d|$ are large,
or underflow when they are small.
For instance, $c = d = 10^{200}$ yields $c^2 + d^2 = 2 \times 10^{400}$, beyond the range of double.
Smith's Method
When T is a floating-point type, sangi uses Smith's (1962) method.
By normalizing with the larger of $|c|$ and $|d|$, every intermediate value is kept within the input magnitudes.
If $|c| \geq |d|$:
$$r = d / c, \quad t = c + d \, r, \quad \mathrm{re} = (a + b r) / t, \quad \mathrm{im} = (b - a r) / t.$$
If $|c| < |d|$:
$$r = c / d, \quad t = d + c \, r, \quad \mathrm{re} = (a r + b) / t, \quad \mathrm{im} = (b r - a) / t.$$
In either branch, $|r| \leq 1$ and $t$ stays on the scale of $\max(|c|, |d|)$. The total cost is one branch, four divisions, three multiplications, and three additions or subtractions.
Multiprecision Types
For T = Int / Float / Rational, the notion of overflow differs.
Int is arbitrary precision, and Float has a vast exponent range,
so $c^2 + d^2$ can be computed safely.
In this case sangi uses the naive formula with $\mathrm{denom} = c^2 + d^2$.
The implementation calls denom = normSq(w) and constructs
the real part $(a c + b d) / \mathrm{denom}$ and the imaginary part $(b c - a d) / \mathrm{denom}$.
If $c + di = 0$, the result is NaN; no exception is thrown and the program does not abort.
Smith, R. L. (1962). "Algorithm 116: Complex division". Communications of the ACM, 5 (8), 435.
Absolute Value and Norm — hypot Safety
normSq
The squared norm is computed straightforwardly:
$$|z|^2 = a^2 + b^2$$
This is a common building block (e.g. in FFT) and is faster than the absolute value because no square root is taken.
abs
The absolute value $|z| = \sqrt{a^2 + b^2}$, computed naively as sqrt(a*a + b*b),
can overflow or underflow at the $a^2$ or $b^2$ stage.
For floating-point T, sangi calls std::hypot(a, b).
This function is required by IEEE 754 to compute $|z|$ without intermediate overflow.
Typical implementations normalize by $m = \max(|a|, |b|)$ and return $m \sqrt{(a/m)^2 + (b/m)^2}$.
For multiprecision T, sangi implements the same normalization strategy
by combining multiprecision abs() / normSq() and sqrt().
Because Float can address an enormous exponent range, ordinary scientific computation rarely overflows,
but routing the denominator $c^2 + d^2$ through the same square-root path keeps the code path consistent.
Argument and Conjugation
arg
The argument is computed via atan2:
$$\arg(z) = \operatorname{atan2}(b, a) \in (-\pi, \pi]$$
The principal value is returned, with $\arg(0) = 0$ following the atan2(0, 0) convention.
The branch cut lies on the negative real axis ($a < 0$, $b = 0$).
$\arg \to \pi$ as $b \to 0^+$ and $\arg \to -\pi$ as $b \to 0^-$.
conj
Conjugation $\bar{z} = a - bi$ simply flips the sign of the imaginary part — $O(1)$.
$$|z|^2 = z \cdot \bar{z}$$
could be used to compute the squared norm, but sangi prefers the symmetric $a^2 + b^2$ form because it avoids catastrophic cancellation between real-part and imaginary-part contributions.
Related articles: Complex Plane — Magnitude and Argument / Branch Cuts
Polar Form and expI
polar(r, θ)
Builds a complex number from polar coordinates:
$$\operatorname{polar}(r, \theta) = r \cos\theta + i \, r \sin\theta$$
The real part $r \cos\theta$ and the imaginary part $r \sin\theta$ are computed and
a Complex<T> is constructed directly.
For multiprecision types, T::sin and T::cos are invoked,
triggering precision-controlled series expansion and range reduction internally.
expI(ω) — FFT Twiddle Factors
For the common case $e^{i\omega} = \cos\omega + i \sin\omega$ in FFT and NTT,
a dedicated function avoids the two multiplications by $r = 1$ that polar would perform.
$$\operatorname{expI}(\omega) = e^{i\omega} = \cos\omega + i \sin\omega$$
The twiddle factors $W_N^k = e^{-2\pi i k/N}$ used in FFT are obtained as:
for (int k = 0; k < N; ++k) {
auto W_k = expI(-2.0 * M_PI * k / N); // k-th N-th root of unity
}
Euler's Formula
At $\omega = \pi$, expI(M_PI) ≈ -1 + 0i.
Floating-point rounding leaves a small imaginary residue,
but multiprecision types reach $-1$ with high precision because
T::pi(precision) and the high-precision trigonometric functions cooperate.
Related articles: Euler's Formula / Complex Exponential and Trigonometric Functions
Design Decisions
Why the Multiplication Form Differs Between Floating-Point and Multiprecision
In floating-point arithmetic, multiplication and addition have similar throughput,
and modern x86 / ARM CPUs may even fuse multiply-add into a single FMA instruction.
In this setting Strassen's trade-off (one fewer multiplication, three more additions) yields no net benefit
and may even be slightly slower.
Therefore T = double / float uses the naive four-multiplication form.
For multiprecision types, a real multiplication is $10^1$ to $10^3$ times more expensive than a real addition, and reducing the multiplication count dominates the cost balance. In that regime Strassen's three-multiplication form always wins.
Branch Cost in Smith's Method
Smith's division branches once on $|c|$ vs $|d|$. This branch is well predicted in practice (typical inputs are heavily skewed to one side), and the resulting pipeline impact on modern CPUs is negligible compared with the catastrophic cost of overflow in the alternative.
NaN Safety
$z / 0$ returns IEEE 754 NaN rather than throwing.
This matches the behaviour of std::complex and Python cmath.
Multiprecision types also propagate NaN, so a zero divisor anywhere in a computation chain does not abort the program.
References
- Smith, R. L. (1962). "Algorithm 116: Complex division". Communications of the ACM, 5 (8), 435.
- Karatsuba, A. and Ofman, Yu. (1962). "Multiplication of Multidigit Numbers on Automata". Doklady Akademii Nauk SSSR, 145, 293–294.
- Knuth, D. E. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley, 1997.