FFT — Fast Fourier Transform

Overview

Computes the Discrete Fourier Transform (DFT) and its inverse efficiently for arbitrary-length input.

$$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-2\pi i \, kn/N}, \qquad k = 0, \dots, N{-}1$$
  • Mixed-radix — radix-2, 3, 5 butterfly stages handle $N = 2^a \cdot 3^b \cdot 5^c$ in $O(N \log N)$
  • Bluestein — arbitrary lengths via the Chirp-Z transform
  • SIMD optimized — AVX2/SSE auto-dispatch for power-of-two sizes
  • Real FFT — exploits conjugate symmetry, returning only $N/2+1$ bins
  • 2D / 3D FFT — sequential 1D FFT along each axis
  • NTT — Number Theoretic Transform for ModularInt
  • Utilities — DCT, DST, Hilbert transform, Walsh-Hadamard, Hartley, Hankel
  • Batch FFT — multi-signal simultaneous processing (AVX2 interleaved), stereo FFT

All APIs reside in the calx namespace (batch functions in calx::simd_fft).

FFTDivMode

Enumeration specifying the normalization mode for FFTEngine.

enum class FFTDivMode {
    None,       // No normalization
    Forward,    // Normalize forward transform by 1/N
    Inverse,    // Normalize inverse transform by 1/N (default)
    Both        // Normalize both by 1/sqrt(N)
};
ValueForwardInverseUse case
None$1$$1$Performance-critical, manual normalization
Forward$1/N$$1$Spectrum analysis (amplitudes are physical quantities)
Inverse$1$$1/N$Mathematical convention (default)
Both$1/\sqrt{N}$$1/\sqrt{N}$Unitary (Parseval's theorem)

Note: FFT<T> and RealFFT<Real> always use Inverse convention (1/N on inverse). Only FFTEngine supports arbitrary normalization modes.

FFT<T, Real> Class

Static FFT API for complex and ModularInt types.

template<typename T, typename Real = double>
class FFT;

T may be std::complex<float>, std::complex<double>, or ModularInt<P>. Real specifies the floating-point type for twiddle factors.

is_valid_size / next_valid_size

static bool is_valid_size(int n);
static int  next_valid_size(int n);
FunctionDescription
is_valid_sizeReturns true if $n = 2^a \cdot 3^b \cdot 5^c$ (eligible for the mixed-radix path).
next_valid_sizeReturns the smallest valid size $\geq n$.

Note: Non-valid sizes can still be passed to fft() (falls back to Bluestein).

fft

static void fft(std::span<T> data);
ParameterTypeDescription
datastd::span<T>Input/output array (in-place transform)

Forward DFT. No normalization. Automatically dispatches based on size:

  • $2^k$ — SIMD-optimized radix-2 (float/double), scalar radix-2 (ModularInt)
  • $2^a \cdot 3^b \cdot 5^c$ — mixed-radix
  • Other — Bluestein (Chirp-Z)

ifft

static void ifft(std::span<T> data);

Inverse DFT. Normalized by $1/N$. For complex types, implemented as conjugate + FFT + conjugate + $1/N$.

convolve

static std::vector<T> convolve(std::span<const T> a, std::span<const T> b);
ParameterTypeDescription
astd::span<const T>First polynomial
bstd::span<const T>Second polynomial

Returns: Convolution result of size a.size() + b.size() - 1. Internally pads to the next power of two.

Usage: Type T must be a complex type (e.g. std::complex<double>); intended for polynomial multiplication or complex-coefficient convolution. For convolution of real waveforms (float / double), use RealFFT::convolve instead (FIR filtering, impulse-response application, etc.).

ntt / intt / convolve_ntt (deprecated)

Deprecated: These methods are deprecated. Use the Ntt class instead. Ntt provides 5-smooth sizes, AVX2 Montgomery optimization, and Bluestein arbitrary-length support.

template<int P>
[[deprecated]] static void ntt(std::span<ModularInt<P>> data, ModularInt<P> primitive_root);

template<int P>
[[deprecated]] static void intt(std::span<ModularInt<P>> data, ModularInt<P> primitive_root);

template<int P>
[[deprecated]] static std::vector<ModularInt<P>> convolve_ntt(
    std::span<const ModularInt<P>> a, std::span<const ModularInt<P>> b,
    ModularInt<P> primitive_root);

Ntt Class — Number Theoretic Transform

Number Theoretic Transform (NTT) over 64-bit integers. Internally uses AVX2 Montgomery NTT pipeline. Supports 5-smooth sizes ($N = 2^a \cdot 3^b \cdot 5^c$, $b,c \in \{0,1\}$) and arbitrary sizes via Bluestein algorithm.

#include <math/fft/ntt.hpp>
// Link: calx_ntt
class Ntt;

Prime Struct

struct Prime {
    uint64_t p;          // NTT-friendly prime
    uint64_t g;          // Primitive root
    size_t maxNttSize;   // Maximum NTT length = 2^47
};

Recommended Primes

FunctionPrime $p$RootNotes
prime1()$30 \times 2^{47} + 1$19$30 = 2 \times 3 \times 5$
prime2()$345 \times 2^{47} + 1$13$345 = 3 \times 5 \times 23$
prime3()$420 \times 2^{47} + 1$17$420 = 4 \times 3 \times 5 \times 7$

All three primes satisfy $2^{47} \mid (p - 1)$, supporting NTT of up to $2^{47}$ points. Coefficient range is $[0, p)$ where $p \approx 2^{62}$.

forward / inverse

static void forward(uint64_t* data, size_t N, const Prime& prime);
static void inverse(uint64_t* data, size_t N, const Prime& prime);
ParameterTypeDescription
datauint64_t*In-place input/output array
Nsize_tArray size (requires $N \mid (p - 1)$)
primeconst Prime&NTT prime (e.g. prime1())

5-smooth sizes are handled via mixed-radix NTT. Non-smooth sizes fall back to Bluestein (Chirp-Z) algorithm. inverse includes $1/N$ scaling.

Throws: std::invalid_argument if $N \nmid (p - 1)$.

pointwiseMul

static void pointwiseMul(uint64_t* r, const uint64_t* a, const uint64_t* b,
                          size_t N, const Prime& prime);

Element-wise modular multiplication: $r[i] = (a[i] \times b[i]) \bmod p$. Used for convolution in NTT domain.

polyMul

static void polyMul(uint64_t* r, const uint64_t* a, size_t an,
                     const uint64_t* b, size_t bn, const Prime& prime);
ParameterTypeDescription
ruint64_t*Output array (size $\geq an + bn$)
a, bconst uint64_t*Input polynomial coefficients
an, bnsize_tNumber of terms in each polynomial

Polynomial multiplication $\bmod p$. Uses schoolbook for small sizes ($an + bn \leq 16$), NTT otherwise. Montgomery transform is applied automatically.

Size Utilities

FunctionDescription
nextSmoothSize(n)Returns the smallest 5-smooth number $\geq n$
isSmooth(n)Tests whether $n$ is 5-smooth
isValidSize(N, prime)Tests whether $N \mid (p - 1)$

Example

#include <math/fft/ntt.hpp>
#include <vector>
#include <iostream>

int main() {
    using calx::Ntt;
    auto prime = Ntt::prime1();

    // Polynomial multiplication: (1 + 2x) × (3 + 4x) = 3 + 10x + 8x²
    uint64_t a[] = {1, 2};
    uint64_t b[] = {3, 4};
    uint64_t r[4] = {};
    Ntt::polyMul(r, a, 2, b, 2, prime);

    for (int i = 0; i < 3; ++i)
        std::cout << r[i] << " ";   // 3 10 8
    std::cout << std::endl;

    // Manual NTT: forward → pointwise multiply → inverse
    size_t N = 4;  // zero-padded power of 2
    std::vector<uint64_t> fa(N, 0), fb(N, 0);
    fa[0] = 1; fa[1] = 2;
    fb[0] = 3; fb[1] = 4;

    Ntt::forward(fa.data(), N, prime);
    Ntt::forward(fb.data(), N, prime);
    Ntt::pointwiseMul(fa.data(), fa.data(), fb.data(), N, prime);
    Ntt::inverse(fa.data(), N, prime);

    for (size_t i = 0; i < 3; ++i)
        std::cout << fa[i] << " ";   // 3 10 8
    std::cout << std::endl;
}

RealFFT<Real> Class

Static FFT API specialized for real-valued data. Exploits conjugate symmetry to return only $N/2+1$ bins.

template<typename Real>
class RealFFT;

fft

static std::vector<std::complex<Real>> fft(std::span<const Real> data);
ParameterTypeDescription
datastd::span<const Real>Real input of size $N$

Returns: Complex vector of size $N/2+1$.

ifft

static std::vector<Real> ifft(std::span<const std::complex<Real>> spectrum,
                               int output_size);
ParameterTypeDescription
spectrumstd::span<const std::complex<Real>>Frequency domain ($N/2+1$ bins)
output_sizeintOutput size $N$

Returns: Real vector of size $N$. Normalized by $1/N$.

Throws: MathError if spectrum.size() != output_size / 2 + 1.

convolve

static std::vector<Real> convolve(std::span<const Real> a, std::span<const Real> b);

Real-valued convolution using RealFFT::fft internally.

FFTEngine<Real> Class

Instance-based FFT wrapper that holds a normalization mode (FFTDivMode).

template<typename Real = double>
class FFTEngine;

Constructor

explicit FFTEngine(int nfft, FFTDivMode mode = FFTDivMode::Forward);
ParameterTypeDescription
nfftintFFT size (must be a power of two)
modeFFTDivModeNormalization mode (default: Forward)

Throws: std::invalid_argument if nfft is not a power of two.

Member Functions

FunctionSignatureDescription
sizeint size() constReturns the FFT size
transformvector<complex> transform(const vector<complex>&)Complex forward FFT
inversevector<complex> inverse(const vector<complex>&)Complex inverse FFT
real_transformvector<complex> real_transform(const vector<Real>&)Real forward FFT ($N \to N/2+1$)
real_inversevector<Real> real_inverse(const vector<complex>&)Real inverse FFT ($N/2+1 \to N$)

Throws std::invalid_argument if the input size does not match nfft.

2D FFT

Two-dimensional Fourier transform via sequential 1D FFTs along rows then columns. Matrix<complex>-based.

#include <math/fft/fft2d.hpp>

fft2d

template<typename Real = double>
Matrix<std::complex<Real>> fft2d(const Matrix<std::complex<Real>>& input);

2D FFT (row-wise then column-wise). Output has the same dimensions as input.

ifft2d

template<typename Real = double>
Matrix<std::complex<Real>> ifft2d(const Matrix<std::complex<Real>>& input);

2D inverse FFT. The $1/N$ normalization is applied automatically by the underlying FFT::ifft on each axis.

fft2d_real

template<typename Real = double>
Matrix<std::complex<Real>> fft2d_real(const Matrix<Real>& input);

Converts a real matrix to complex, then applies fft2d.

3D FFT

Three-dimensional Fourier transform via sequential 1D FFTs along $x \to y \to z$ axes.

#include <math/fft/fft3d.hpp>

Tensor3D<T>

template<typename T>
class Tensor3D;

3D tensor container. Row-major layout: data[z * ny * nx + y * nx + x].

MemberDescription
Tensor3D(nx, ny, nz)Construct with given dimensions
Tensor3D(nx, ny, nz, val)Construct with initial value
nx(), ny(), nz()Dimension along each axis
size()Total element count ($nx \times ny \times nz$)
operator()(x, y, z)Element access
data()Pointer to the internal buffer

fft3d / ifft3d

template<typename Real = double>
Tensor3D<std::complex<Real>> fft3d(const Tensor3D<std::complex<Real>>& input);

template<typename Real = double>
Tensor3D<std::complex<Real>> ifft3d(const Tensor3D<std::complex<Real>>& input);

fft3d_real

template<typename Real = double>
Tensor3D<std::complex<Real>> fft3d_real(const Tensor3D<Real>& input);

fftshift3d

template<typename T>
Tensor3D<T> fftshift3d(const Tensor3D<T>& input);

Shifts the DC component to the center of the tensor by half-shifting along each axis.

Utilities

#include <math/fft/fft_utils.hpp>

Spectrum Analysis

FunctionDescription
amplitude_spectrum(spectrum)Computes the amplitude spectrum $|X[k]|$
phase_spectrum(spectrum)Computes the phase spectrum $\arg(X[k])$ as the principal value $(-\pi, \pi]$ (wrapped)
unwrap_phase(phase, tol=π)1D phase unwrapping (equivalent to NumPy np.unwrap)
reconstruct_spectrum(amp, phase)Reconstructs a complex spectrum from amplitude and phase
template<typename Real>
std::vector<Real> amplitude_spectrum(std::span<const std::complex<Real>> spectrum);

template<typename Real>
std::vector<Real> phase_spectrum(std::span<const std::complex<Real>> spectrum);

template<typename Real>
std::vector<Real> unwrap_phase(std::span<const Real> phase,
                                Real tol = std::numbers::pi_v<Real>);

template<typename Real>
std::vector<std::complex<Real>> reconstruct_spectrum(
    std::span<const Real> amplitude, std::span<const Real> phase);

Phase unwrapping: phase_spectrum returns the principal value of std::arg(), so the phase response wraps at $\pm\pi$. Apply unwrap_phase when a continuous phase is required (group delay $-d\phi/d\omega$, minimum-phase estimation, complex cepstrum, etc.). tol is the threshold on adjacent-sample phase differences beyond which a $2\pi$ offset is applied (default: $\pi$).

auto spec  = RealFFT<double>::fft(signal);
auto phase = phase_spectrum<double>(spec);       // wrapped principal values
auto cont  = unwrap_phase<double>(phase);        // continuous phase
// group delay: approximate by differencing -dφ/dω

Polynomials

template<typename T>
std::vector<T> polynomial_multiply(std::span<const T> a, std::span<const T> b);

template<typename T, typename U>
auto evaluate_polynomial(std::span<const T> poly, const U& x);

polynomial_multiply delegates to FFT::convolve. evaluate_polynomial uses Horner's method.

DCT / IDCT

template<typename Real>
std::vector<Real> dct(std::span<const Real> data);    // DCT-II

template<typename Real>
std::vector<Real> idct(std::span<const Real> coeffs);  // DCT-III (IDCT)

template<typename Real>
std::vector<Real> fast_dct(std::span<const Real> data); // Alias for dct

Normalized DCT-II: $X[k] = c(k) \sum_{i=0}^{N-1} x[i] \cos\bigl(\pi(i+0.5)k/N\bigr)$, where $c(0) = \sqrt{1/N}$, $c(k) = \sqrt{2/N}$ for $k > 0$.

When $N$ is an FFT-compatible size ($2^a \cdot 3^b \cdot 5^c$), uses the Makhoul method at $O(N \log N)$. Otherwise falls back to direct $O(N^2)$ computation.

DST / IDST

template<typename Real>
std::vector<Real> dst(std::span<const Real> data);     // DST-II

template<typename Real>
std::vector<Real> idst(std::span<const Real> coeffs);   // DST-III (IDST)

Normalized DST-II. Currently uses direct $O(N^2)$ computation.

Hilbert Transform

template<typename Real>
std::vector<std::complex<Real>> analyticSignal(std::span<const Real> data);

template<typename Real>
std::vector<Real> hilbertTransform(std::span<const Real> data);

template<typename Real>
std::vector<Real> instantaneousAmplitude(std::span<const Real> data);

template<typename Real>
std::vector<Real> instantaneousFrequency(std::span<const Real> data,
                                          Real sampleRate = Real(1));
FunctionDescription
analyticSignalConstructs the analytic signal $z[n] = x[n] + j \cdot \mathcal{H}\{x\}[n]$
hilbertTransformHilbert transform (imaginary part of the analytic signal)
instantaneousAmplitudeInstantaneous amplitude $|z[n]|$ (envelope)
instantaneousFrequencyInstantaneous frequency $\frac{d}{dt}\arg(z[n]) \cdot f_s / 2\pi$

Walsh-Hadamard Transform

template<typename Real>
std::vector<Real> walshHadamard(std::span<const Real> data);

template<typename Real>
std::vector<Real> iwalshHadamard(std::span<const Real> coeffs);

template<typename Real>
void walshHadamardInplace(std::vector<Real>& data);

Fast Walsh-Hadamard Transform. Size must be a power of two. $O(N \log N)$ butterfly algorithm. Self-inverse with $1/N$ normalization.

Hartley Transform

template<typename Real>
std::vector<Real> hartley(std::span<const Real> data);

template<typename Real>
std::vector<Real> ihartley(std::span<const Real> coeffs);

$H[k] = \sum_{n=0}^{N-1} x[n] \cdot \mathrm{cas}(2\pi nk/N)$, where $\mathrm{cas}(\theta) = \cos\theta + \sin\theta$. Real input produces real output. Computed via FFT. Self-inverse with $1/N$ normalization.

Hankel Transform

template<typename Real>
struct HankelTransform {
    HankelTransform(int n, Real radius, Real order = Real(0));

    std::vector<Real> r;    // Sample points
    std::vector<Real> k;    // Frequency points

    std::vector<Real> transform(std::span<const Real> f) const;
    std::vector<Real> inverse(std::span<const Real> F) const;
};

Discrete Hankel Transform (GSL gsl_dht equivalent). Supports arbitrary order $\nu$. For $\nu = 0$, a fast path using McMahon asymptotics and $J_0$/$J_1$ approximations is used.

NTT Utilities

template<int P>
std::optional<ModularInt<P>> find_ntt_primitive_root(int n);

Searches for a primitive root suitable for NTT of size $n$ (power of two). Returns std::nullopt if none found.

Batch FFT

#include <math/fft/fft_batch.hpp>
namespace calx::simd_fft { ... }

Processes multiple same-sized signals simultaneously. Interleaves 4 signals for AVX2 butterfly operations.

batchFFT / batchIFFT

void batchFFT(std::complex<float>** signals, int count, int fftSize);
void batchIFFT(std::complex<float>** signals, int count, int fftSize);
ParameterTypeDescription
signalscomplex<float>**Array of signal pointers (each signal is updated in-place)
countintNumber of signals
fftSizeintFFT size (power of two)

Processes 4 signals at a time using AVX2 batch butterflies. Remaining signals are processed with standard simd_fft::fft(). Twiddle tables are cached in thread_local storage.

stereoFFT

void stereoFFT(
    const float* interleaved, int fftSize,
    const float* window,
    std::complex<float>* outL,
    std::complex<float>* outR);
ParameterTypeDescription
interleavedconst float*Stereo interleaved data {L0,R0,L1,R1,...} (fftSize*2 floats)
fftSizeintFFT size
windowconst float*Window function (fftSize floats), or nullptr for no window
outLcomplex<float>*L channel spectrum (fftSize/2+1 bins)
outRcomplex<float>*R channel spectrum (fftSize/2+1 bins)

Splits interleaved stereo audio into L and R channel spectra using a single complex FFT. Exploits Hermitian symmetry for separation.

Examples

Basic FFT

#include <math/fft/fft.hpp>
#include <iostream>
#include <complex>
#include <vector>

int main() {
    using C = std::complex<double>;

    // 8-point FFT
    std::vector<C> signal = {1, 2, 3, 4, 4, 3, 2, 1};
    calx::FFT<C>::fft(signal);

    std::cout << "FFT result:" << std::endl;
    for (auto& x : signal)
        std::cout << "  " << x << std::endl;

    // Inverse FFT to recover original
    calx::FFT<C>::ifft(signal);

    std::cout << "IFFT result:" << std::endl;
    for (auto& x : signal)
        std::cout << "  " << x.real() << std::endl;
}

Spectrum Analysis

#include <math/fft/fft.hpp>
#include <math/fft/fft_utils.hpp>
#include <vector>
#include <cmath>
#include <iostream>

int main() {
    constexpr int N = 1024;
    constexpr double fs = 44100.0;
    constexpr double f0 = 440.0;

    // Generate 440 Hz sine wave
    std::vector<double> signal(N);
    for (int i = 0; i < N; ++i)
        signal[i] = std::sin(2.0 * M_PI * f0 * i / fs);

    // Real FFT
    auto spectrum = calx::RealFFT<double>::fft(signal);

    // Amplitude spectrum
    auto amp = calx::amplitude_spectrum<double>(spectrum);

    // Find peak frequency
    int peak = 0;
    for (int k = 1; k < (int)amp.size(); ++k)
        if (amp[k] > amp[peak]) peak = k;

    double peakFreq = peak * fs / N;
    std::cout << "Peak frequency: " << peakFreq << " Hz" << std::endl;
}

Convolution (Polynomial Multiplication)

#include <math/fft/fft.hpp>
#include <iostream>

int main() {
    using C = std::complex<double>;

    // p(x) = 1 + 2x + 3x^2
    // q(x) = 4 + 5x
    std::vector<C> p = {1, 2, 3};
    std::vector<C> q = {4, 5};

    auto result = calx::FFT<C>::convolve(p, q);
    // result = {4, 13, 22, 15}  (4 + 13x + 22x^2 + 15x^3)

    for (auto& c : result)
        std::cout << c.real() << " ";
    std::cout << std::endl;
}

2D FFT

#include <math/fft/fft2d.hpp>
#include <math/core/matrix.hpp>
#include <iostream>

int main() {
    using C = std::complex<double>;

    // 4x4 matrix
    calx::Matrix<C> img(4, 4);
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            img(r, c) = C(r + c, 0);

    // Forward 2D FFT
    auto freq = calx::fft2d(img);

    // Inverse 2D FFT
    auto recovered = calx::ifft2d(freq);

    std::cout << "Recovered (0,0): " << recovered(0, 0).real() << std::endl;
}

DCT (Discrete Cosine Transform)

#include <math/fft/fft_utils.hpp>
#include <vector>
#include <iostream>

int main() {
    std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};

    auto coeffs = calx::dct<double>(data);
    auto recovered = calx::idct<double>(coeffs);

    std::cout << "DCT coefficients:";
    for (auto v : coeffs) std::cout << " " << v;
    std::cout << std::endl;

    std::cout << "Recovered:";
    for (auto v : recovered) std::cout << " " << v;
    std::cout << std::endl;
}

Hilbert Transform (Envelope Detection)

#include <math/fft/fft_utils.hpp>
#include <vector>
#include <cmath>
#include <iostream>

int main() {
    constexpr int N = 256;

    // AM signal: (1 + 0.5*cos(2pi*5*t)) * cos(2pi*50*t)
    std::vector<double> signal(N);
    for (int i = 0; i < N; ++i) {
        double t = double(i) / N;
        signal[i] = (1.0 + 0.5 * std::cos(2 * M_PI * 5 * t))
                   * std::cos(2 * M_PI * 50 * t);
    }

    // Instantaneous amplitude (envelope)
    auto envelope = calx::instantaneousAmplitude<double>(signal);

    std::cout << "Envelope[0]: " << envelope[0] << std::endl;
}

Related Mathematical Background

The following articles explain the mathematical concepts underlying the FFT module.