FFT — Fast Fourier Transform
Overview
Computes the Discrete Fourier Transform (DFT) and its inverse efficiently for arbitrary-length input.
$$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-2\pi i \, kn/N}, \qquad k = 0, \dots, N{-}1$$- Mixed-radix — radix-2, 3, 5 butterfly stages handle $N = 2^a \cdot 3^b \cdot 5^c$ in $O(N \log N)$
- Bluestein — arbitrary lengths via the Chirp-Z transform
- SIMD optimized — AVX2/SSE auto-dispatch for power-of-two sizes
- Real FFT — exploits conjugate symmetry, returning only $N/2+1$ bins
- 2D / 3D FFT — sequential 1D FFT along each axis
- NTT — Number Theoretic Transform for ModularInt
- Utilities — DCT, DST, Hilbert transform, Walsh-Hadamard, Hartley, Hankel
- Batch FFT — multi-signal simultaneous processing (AVX2 interleaved), stereo FFT
All APIs reside in the calx namespace (batch functions in calx::simd_fft).
Header
#include <math/fft/fft.hpp> // FFT, RealFFT, FFTEngine, FFTDivMode
#include <math/fft/ntt.hpp> // Ntt (Number Theoretic Transform)
#include <math/fft/fft2d.hpp> // fft2d, ifft2d, fft2d_real
#include <math/fft/fft3d.hpp> // Tensor3D, fft3d, ifft3d, fft3d_real, fftshift3d
#include <math/fft/fft_utils.hpp> // DCT, DST, Hilbert, Walsh-Hadamard, Hartley, Hankel
#include <math/fft/fft_batch.hpp> // batchFFT, batchIFFT, stereoFFT
Linking
| Library | Required for |
|---|---|
calx_fft | Using FFT / RealFFT / FFTEngine with float / double |
calx_ntt | Using the Ntt class |
// CMake
target_link_libraries(your_target PRIVATE calx_fft calx_ntt)
FFTDivMode
Enumeration specifying the normalization mode for FFTEngine.
enum class FFTDivMode {
None, // No normalization
Forward, // Normalize forward transform by 1/N
Inverse, // Normalize inverse transform by 1/N (default)
Both // Normalize both by 1/sqrt(N)
};
| Value | Forward | Inverse | Use case |
|---|---|---|---|
None | $1$ | $1$ | Performance-critical, manual normalization |
Forward | $1/N$ | $1$ | Spectrum analysis (amplitudes are physical quantities) |
Inverse | $1$ | $1/N$ | Mathematical convention (default) |
Both | $1/\sqrt{N}$ | $1/\sqrt{N}$ | Unitary (Parseval's theorem) |
Note: FFT<T> and RealFFT<Real> always use Inverse convention (1/N on inverse).
Only FFTEngine supports arbitrary normalization modes.
FFT<T, Real> Class
Static FFT API for complex and ModularInt types.
template<typename T, typename Real = double>
class FFT;
T may be std::complex<float>, std::complex<double>, or ModularInt<P>.
Real specifies the floating-point type for twiddle factors.
is_valid_size / next_valid_size
static bool is_valid_size(int n);
static int next_valid_size(int n);
| Function | Description |
|---|---|
is_valid_size | Returns true if $n = 2^a \cdot 3^b \cdot 5^c$ (eligible for the mixed-radix path). |
next_valid_size | Returns the smallest valid size $\geq n$. |
Note: Non-valid sizes can still be passed to fft() (falls back to Bluestein).
fft
static void fft(std::span<T> data);
| Parameter | Type | Description |
|---|---|---|
data | std::span<T> | Input/output array (in-place transform) |
Forward DFT. No normalization. Automatically dispatches based on size:
- $2^k$ — SIMD-optimized radix-2 (float/double), scalar radix-2 (ModularInt)
- $2^a \cdot 3^b \cdot 5^c$ — mixed-radix
- Other — Bluestein (Chirp-Z)
ifft
static void ifft(std::span<T> data);
Inverse DFT. Normalized by $1/N$. For complex types, implemented as conjugate + FFT + conjugate + $1/N$.
convolve
static std::vector<T> convolve(std::span<const T> a, std::span<const T> b);
| Parameter | Type | Description |
|---|---|---|
a | std::span<const T> | First polynomial |
b | std::span<const T> | Second polynomial |
Returns: Convolution result of size a.size() + b.size() - 1. Internally pads to the next power of two.
Usage: Type T must be a complex type (e.g. std::complex<double>);
intended for polynomial multiplication or complex-coefficient convolution.
For convolution of real waveforms (float / double),
use RealFFT::convolve instead
(FIR filtering, impulse-response application, etc.).
ntt / intt / convolve_ntt (deprecated)
Deprecated: These methods are deprecated. Use the Ntt class instead.
Ntt provides 5-smooth sizes, AVX2 Montgomery optimization, and Bluestein arbitrary-length support.
template<int P>
[[deprecated]] static void ntt(std::span<ModularInt<P>> data, ModularInt<P> primitive_root);
template<int P>
[[deprecated]] static void intt(std::span<ModularInt<P>> data, ModularInt<P> primitive_root);
template<int P>
[[deprecated]] static std::vector<ModularInt<P>> convolve_ntt(
std::span<const ModularInt<P>> a, std::span<const ModularInt<P>> b,
ModularInt<P> primitive_root);
Ntt Class — Number Theoretic Transform
Number Theoretic Transform (NTT) over 64-bit integers. Internally uses AVX2 Montgomery NTT pipeline. Supports 5-smooth sizes ($N = 2^a \cdot 3^b \cdot 5^c$, $b,c \in \{0,1\}$) and arbitrary sizes via Bluestein algorithm.
#include <math/fft/ntt.hpp>
// Link: calx_ntt
class Ntt;
Prime Struct
struct Prime {
uint64_t p; // NTT-friendly prime
uint64_t g; // Primitive root
size_t maxNttSize; // Maximum NTT length = 2^47
};
Recommended Primes
| Function | Prime $p$ | Root | Notes |
|---|---|---|---|
prime1() | $30 \times 2^{47} + 1$ | 19 | $30 = 2 \times 3 \times 5$ |
prime2() | $345 \times 2^{47} + 1$ | 13 | $345 = 3 \times 5 \times 23$ |
prime3() | $420 \times 2^{47} + 1$ | 17 | $420 = 4 \times 3 \times 5 \times 7$ |
All three primes satisfy $2^{47} \mid (p - 1)$, supporting NTT of up to $2^{47}$ points. Coefficient range is $[0, p)$ where $p \approx 2^{62}$.
forward / inverse
static void forward(uint64_t* data, size_t N, const Prime& prime);
static void inverse(uint64_t* data, size_t N, const Prime& prime);
| Parameter | Type | Description |
|---|---|---|
data | uint64_t* | In-place input/output array |
N | size_t | Array size (requires $N \mid (p - 1)$) |
prime | const Prime& | NTT prime (e.g. prime1()) |
5-smooth sizes are handled via mixed-radix NTT. Non-smooth sizes fall back to Bluestein (Chirp-Z) algorithm.
inverse includes $1/N$ scaling.
Throws: std::invalid_argument if $N \nmid (p - 1)$.
pointwiseMul
static void pointwiseMul(uint64_t* r, const uint64_t* a, const uint64_t* b,
size_t N, const Prime& prime);
Element-wise modular multiplication: $r[i] = (a[i] \times b[i]) \bmod p$. Used for convolution in NTT domain.
polyMul
static void polyMul(uint64_t* r, const uint64_t* a, size_t an,
const uint64_t* b, size_t bn, const Prime& prime);
| Parameter | Type | Description |
|---|---|---|
r | uint64_t* | Output array (size $\geq an + bn$) |
a, b | const uint64_t* | Input polynomial coefficients |
an, bn | size_t | Number of terms in each polynomial |
Polynomial multiplication $\bmod p$. Uses schoolbook for small sizes ($an + bn \leq 16$), NTT otherwise. Montgomery transform is applied automatically.
Size Utilities
| Function | Description |
|---|---|
nextSmoothSize(n) | Returns the smallest 5-smooth number $\geq n$ |
isSmooth(n) | Tests whether $n$ is 5-smooth |
isValidSize(N, prime) | Tests whether $N \mid (p - 1)$ |
Example
#include <math/fft/ntt.hpp>
#include <vector>
#include <iostream>
int main() {
using calx::Ntt;
auto prime = Ntt::prime1();
// Polynomial multiplication: (1 + 2x) × (3 + 4x) = 3 + 10x + 8x²
uint64_t a[] = {1, 2};
uint64_t b[] = {3, 4};
uint64_t r[4] = {};
Ntt::polyMul(r, a, 2, b, 2, prime);
for (int i = 0; i < 3; ++i)
std::cout << r[i] << " "; // 3 10 8
std::cout << std::endl;
// Manual NTT: forward → pointwise multiply → inverse
size_t N = 4; // zero-padded power of 2
std::vector<uint64_t> fa(N, 0), fb(N, 0);
fa[0] = 1; fa[1] = 2;
fb[0] = 3; fb[1] = 4;
Ntt::forward(fa.data(), N, prime);
Ntt::forward(fb.data(), N, prime);
Ntt::pointwiseMul(fa.data(), fa.data(), fb.data(), N, prime);
Ntt::inverse(fa.data(), N, prime);
for (size_t i = 0; i < 3; ++i)
std::cout << fa[i] << " "; // 3 10 8
std::cout << std::endl;
}
RealFFT<Real> Class
Static FFT API specialized for real-valued data. Exploits conjugate symmetry to return only $N/2+1$ bins.
template<typename Real>
class RealFFT;
fft
static std::vector<std::complex<Real>> fft(std::span<const Real> data);
| Parameter | Type | Description |
|---|---|---|
data | std::span<const Real> | Real input of size $N$ |
Returns: Complex vector of size $N/2+1$.
ifft
static std::vector<Real> ifft(std::span<const std::complex<Real>> spectrum,
int output_size);
| Parameter | Type | Description |
|---|---|---|
spectrum | std::span<const std::complex<Real>> | Frequency domain ($N/2+1$ bins) |
output_size | int | Output size $N$ |
Returns: Real vector of size $N$. Normalized by $1/N$.
Throws: MathError if spectrum.size() != output_size / 2 + 1.
convolve
static std::vector<Real> convolve(std::span<const Real> a, std::span<const Real> b);
Real-valued convolution using RealFFT::fft internally.
FFTEngine<Real> Class
Instance-based FFT wrapper that holds a normalization mode (FFTDivMode).
template<typename Real = double>
class FFTEngine;
Constructor
explicit FFTEngine(int nfft, FFTDivMode mode = FFTDivMode::Forward);
| Parameter | Type | Description |
|---|---|---|
nfft | int | FFT size (must be a power of two) |
mode | FFTDivMode | Normalization mode (default: Forward) |
Throws: std::invalid_argument if nfft is not a power of two.
Member Functions
| Function | Signature | Description |
|---|---|---|
size | int size() const | Returns the FFT size |
transform | vector<complex> transform(const vector<complex>&) | Complex forward FFT |
inverse | vector<complex> inverse(const vector<complex>&) | Complex inverse FFT |
real_transform | vector<complex> real_transform(const vector<Real>&) | Real forward FFT ($N \to N/2+1$) |
real_inverse | vector<Real> real_inverse(const vector<complex>&) | Real inverse FFT ($N/2+1 \to N$) |
Throws std::invalid_argument if the input size does not match nfft.
2D FFT
Two-dimensional Fourier transform via sequential 1D FFTs along rows then columns. Matrix<complex>-based.
#include <math/fft/fft2d.hpp>
fft2d
template<typename Real = double>
Matrix<std::complex<Real>> fft2d(const Matrix<std::complex<Real>>& input);
2D FFT (row-wise then column-wise). Output has the same dimensions as input.
ifft2d
template<typename Real = double>
Matrix<std::complex<Real>> ifft2d(const Matrix<std::complex<Real>>& input);
2D inverse FFT. The $1/N$ normalization is applied automatically by the underlying FFT::ifft on each axis.
fft2d_real
template<typename Real = double>
Matrix<std::complex<Real>> fft2d_real(const Matrix<Real>& input);
Converts a real matrix to complex, then applies fft2d.
3D FFT
Three-dimensional Fourier transform via sequential 1D FFTs along $x \to y \to z$ axes.
#include <math/fft/fft3d.hpp>
Tensor3D<T>
template<typename T>
class Tensor3D;
3D tensor container. Row-major layout: data[z * ny * nx + y * nx + x].
| Member | Description |
|---|---|
Tensor3D(nx, ny, nz) | Construct with given dimensions |
Tensor3D(nx, ny, nz, val) | Construct with initial value |
nx(), ny(), nz() | Dimension along each axis |
size() | Total element count ($nx \times ny \times nz$) |
operator()(x, y, z) | Element access |
data() | Pointer to the internal buffer |
fft3d / ifft3d
template<typename Real = double>
Tensor3D<std::complex<Real>> fft3d(const Tensor3D<std::complex<Real>>& input);
template<typename Real = double>
Tensor3D<std::complex<Real>> ifft3d(const Tensor3D<std::complex<Real>>& input);
fft3d_real
template<typename Real = double>
Tensor3D<std::complex<Real>> fft3d_real(const Tensor3D<Real>& input);
fftshift3d
template<typename T>
Tensor3D<T> fftshift3d(const Tensor3D<T>& input);
Shifts the DC component to the center of the tensor by half-shifting along each axis.
Utilities
#include <math/fft/fft_utils.hpp>
Spectrum Analysis
| Function | Description |
|---|---|
amplitude_spectrum(spectrum) | Computes the amplitude spectrum $|X[k]|$ |
phase_spectrum(spectrum) | Computes the phase spectrum $\arg(X[k])$ as the principal value $(-\pi, \pi]$ (wrapped) |
unwrap_phase(phase, tol=π) | 1D phase unwrapping (equivalent to NumPy np.unwrap) |
reconstruct_spectrum(amp, phase) | Reconstructs a complex spectrum from amplitude and phase |
template<typename Real>
std::vector<Real> amplitude_spectrum(std::span<const std::complex<Real>> spectrum);
template<typename Real>
std::vector<Real> phase_spectrum(std::span<const std::complex<Real>> spectrum);
template<typename Real>
std::vector<Real> unwrap_phase(std::span<const Real> phase,
Real tol = std::numbers::pi_v<Real>);
template<typename Real>
std::vector<std::complex<Real>> reconstruct_spectrum(
std::span<const Real> amplitude, std::span<const Real> phase);
Phase unwrapping: phase_spectrum returns the principal value of std::arg(),
so the phase response wraps at $\pm\pi$. Apply unwrap_phase when a continuous phase is required
(group delay $-d\phi/d\omega$, minimum-phase estimation, complex cepstrum, etc.).
tol is the threshold on adjacent-sample phase differences beyond which a $2\pi$ offset is applied (default: $\pi$).
auto spec = RealFFT<double>::fft(signal);
auto phase = phase_spectrum<double>(spec); // wrapped principal values
auto cont = unwrap_phase<double>(phase); // continuous phase
// group delay: approximate by differencing -dφ/dω
Polynomials
template<typename T>
std::vector<T> polynomial_multiply(std::span<const T> a, std::span<const T> b);
template<typename T, typename U>
auto evaluate_polynomial(std::span<const T> poly, const U& x);
polynomial_multiply delegates to FFT::convolve. evaluate_polynomial uses Horner's method.
DCT / IDCT
template<typename Real>
std::vector<Real> dct(std::span<const Real> data); // DCT-II
template<typename Real>
std::vector<Real> idct(std::span<const Real> coeffs); // DCT-III (IDCT)
template<typename Real>
std::vector<Real> fast_dct(std::span<const Real> data); // Alias for dct
Normalized DCT-II: $X[k] = c(k) \sum_{i=0}^{N-1} x[i] \cos\bigl(\pi(i+0.5)k/N\bigr)$, where $c(0) = \sqrt{1/N}$, $c(k) = \sqrt{2/N}$ for $k > 0$.
When $N$ is an FFT-compatible size ($2^a \cdot 3^b \cdot 5^c$), uses the Makhoul method at $O(N \log N)$. Otherwise falls back to direct $O(N^2)$ computation.
DST / IDST
template<typename Real>
std::vector<Real> dst(std::span<const Real> data); // DST-II
template<typename Real>
std::vector<Real> idst(std::span<const Real> coeffs); // DST-III (IDST)
Normalized DST-II. Currently uses direct $O(N^2)$ computation.
Hilbert Transform
template<typename Real>
std::vector<std::complex<Real>> analyticSignal(std::span<const Real> data);
template<typename Real>
std::vector<Real> hilbertTransform(std::span<const Real> data);
template<typename Real>
std::vector<Real> instantaneousAmplitude(std::span<const Real> data);
template<typename Real>
std::vector<Real> instantaneousFrequency(std::span<const Real> data,
Real sampleRate = Real(1));
| Function | Description |
|---|---|
analyticSignal | Constructs the analytic signal $z[n] = x[n] + j \cdot \mathcal{H}\{x\}[n]$ |
hilbertTransform | Hilbert transform (imaginary part of the analytic signal) |
instantaneousAmplitude | Instantaneous amplitude $|z[n]|$ (envelope) |
instantaneousFrequency | Instantaneous frequency $\frac{d}{dt}\arg(z[n]) \cdot f_s / 2\pi$ |
Walsh-Hadamard Transform
template<typename Real>
std::vector<Real> walshHadamard(std::span<const Real> data);
template<typename Real>
std::vector<Real> iwalshHadamard(std::span<const Real> coeffs);
template<typename Real>
void walshHadamardInplace(std::vector<Real>& data);
Fast Walsh-Hadamard Transform. Size must be a power of two. $O(N \log N)$ butterfly algorithm. Self-inverse with $1/N$ normalization.
Hartley Transform
template<typename Real>
std::vector<Real> hartley(std::span<const Real> data);
template<typename Real>
std::vector<Real> ihartley(std::span<const Real> coeffs);
$H[k] = \sum_{n=0}^{N-1} x[n] \cdot \mathrm{cas}(2\pi nk/N)$, where $\mathrm{cas}(\theta) = \cos\theta + \sin\theta$. Real input produces real output. Computed via FFT. Self-inverse with $1/N$ normalization.
Hankel Transform
template<typename Real>
struct HankelTransform {
HankelTransform(int n, Real radius, Real order = Real(0));
std::vector<Real> r; // Sample points
std::vector<Real> k; // Frequency points
std::vector<Real> transform(std::span<const Real> f) const;
std::vector<Real> inverse(std::span<const Real> F) const;
};
Discrete Hankel Transform (GSL gsl_dht equivalent). Supports arbitrary order $\nu$. For $\nu = 0$, a fast path using McMahon asymptotics and $J_0$/$J_1$ approximations is used.
NTT Utilities
template<int P>
std::optional<ModularInt<P>> find_ntt_primitive_root(int n);
Searches for a primitive root suitable for NTT of size $n$ (power of two). Returns std::nullopt if none found.
Batch FFT
#include <math/fft/fft_batch.hpp>
namespace calx::simd_fft { ... }
Processes multiple same-sized signals simultaneously. Interleaves 4 signals for AVX2 butterfly operations.
batchFFT / batchIFFT
void batchFFT(std::complex<float>** signals, int count, int fftSize);
void batchIFFT(std::complex<float>** signals, int count, int fftSize);
| Parameter | Type | Description |
|---|---|---|
signals | complex<float>** | Array of signal pointers (each signal is updated in-place) |
count | int | Number of signals |
fftSize | int | FFT size (power of two) |
Processes 4 signals at a time using AVX2 batch butterflies. Remaining signals are processed with standard simd_fft::fft(). Twiddle tables are cached in thread_local storage.
stereoFFT
void stereoFFT(
const float* interleaved, int fftSize,
const float* window,
std::complex<float>* outL,
std::complex<float>* outR);
| Parameter | Type | Description |
|---|---|---|
interleaved | const float* | Stereo interleaved data {L0,R0,L1,R1,...} (fftSize*2 floats) |
fftSize | int | FFT size |
window | const float* | Window function (fftSize floats), or nullptr for no window |
outL | complex<float>* | L channel spectrum (fftSize/2+1 bins) |
outR | complex<float>* | R channel spectrum (fftSize/2+1 bins) |
Splits interleaved stereo audio into L and R channel spectra using a single complex FFT. Exploits Hermitian symmetry for separation.
Examples
Basic FFT
#include <math/fft/fft.hpp>
#include <iostream>
#include <complex>
#include <vector>
int main() {
using C = std::complex<double>;
// 8-point FFT
std::vector<C> signal = {1, 2, 3, 4, 4, 3, 2, 1};
calx::FFT<C>::fft(signal);
std::cout << "FFT result:" << std::endl;
for (auto& x : signal)
std::cout << " " << x << std::endl;
// Inverse FFT to recover original
calx::FFT<C>::ifft(signal);
std::cout << "IFFT result:" << std::endl;
for (auto& x : signal)
std::cout << " " << x.real() << std::endl;
}
Spectrum Analysis
#include <math/fft/fft.hpp>
#include <math/fft/fft_utils.hpp>
#include <vector>
#include <cmath>
#include <iostream>
int main() {
constexpr int N = 1024;
constexpr double fs = 44100.0;
constexpr double f0 = 440.0;
// Generate 440 Hz sine wave
std::vector<double> signal(N);
for (int i = 0; i < N; ++i)
signal[i] = std::sin(2.0 * M_PI * f0 * i / fs);
// Real FFT
auto spectrum = calx::RealFFT<double>::fft(signal);
// Amplitude spectrum
auto amp = calx::amplitude_spectrum<double>(spectrum);
// Find peak frequency
int peak = 0;
for (int k = 1; k < (int)amp.size(); ++k)
if (amp[k] > amp[peak]) peak = k;
double peakFreq = peak * fs / N;
std::cout << "Peak frequency: " << peakFreq << " Hz" << std::endl;
}
Convolution (Polynomial Multiplication)
#include <math/fft/fft.hpp>
#include <iostream>
int main() {
using C = std::complex<double>;
// p(x) = 1 + 2x + 3x^2
// q(x) = 4 + 5x
std::vector<C> p = {1, 2, 3};
std::vector<C> q = {4, 5};
auto result = calx::FFT<C>::convolve(p, q);
// result = {4, 13, 22, 15} (4 + 13x + 22x^2 + 15x^3)
for (auto& c : result)
std::cout << c.real() << " ";
std::cout << std::endl;
}
2D FFT
#include <math/fft/fft2d.hpp>
#include <math/core/matrix.hpp>
#include <iostream>
int main() {
using C = std::complex<double>;
// 4x4 matrix
calx::Matrix<C> img(4, 4);
for (int r = 0; r < 4; ++r)
for (int c = 0; c < 4; ++c)
img(r, c) = C(r + c, 0);
// Forward 2D FFT
auto freq = calx::fft2d(img);
// Inverse 2D FFT
auto recovered = calx::ifft2d(freq);
std::cout << "Recovered (0,0): " << recovered(0, 0).real() << std::endl;
}
DCT (Discrete Cosine Transform)
#include <math/fft/fft_utils.hpp>
#include <vector>
#include <iostream>
int main() {
std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
auto coeffs = calx::dct<double>(data);
auto recovered = calx::idct<double>(coeffs);
std::cout << "DCT coefficients:";
for (auto v : coeffs) std::cout << " " << v;
std::cout << std::endl;
std::cout << "Recovered:";
for (auto v : recovered) std::cout << " " << v;
std::cout << std::endl;
}
Hilbert Transform (Envelope Detection)
#include <math/fft/fft_utils.hpp>
#include <vector>
#include <cmath>
#include <iostream>
int main() {
constexpr int N = 256;
// AM signal: (1 + 0.5*cos(2pi*5*t)) * cos(2pi*50*t)
std::vector<double> signal(N);
for (int i = 0; i < N; ++i) {
double t = double(i) / N;
signal[i] = (1.0 + 0.5 * std::cos(2 * M_PI * 5 * t))
* std::cos(2 * M_PI * 50 * t);
}
// Instantaneous amplitude (envelope)
auto envelope = calx::instantaneousAmplitude<double>(signal);
std::cout << "Envelope[0]: " << envelope[0] << std::endl;
}
Related Mathematical Background
The following articles explain the mathematical concepts underlying the FFT module.
- Fast Fourier Transform — Cooley-Tukey algorithm, complexity analysis
- Fast Multiplication Algorithms — NTT, Schönhage-Strassen