Sampling Theorem (Shannon's Theorem)
The Fundamental Theorem Underpinning the Digitization of Analog Signals
1. Statement of the Theorem
The sampling theorem provides the theoretical foundation for representing and processing continuous-time signals in digital form. It was rigorously formulated in the context of information theory by C. E. Shannon (1949), and is therefore also known as Shannon's theorem. However, equivalent results had been obtained independently by E. T. Whittaker (1915), H. Nyquist (1928), and V. A. Kotelnikov (1933). In Japan, Isao Someya (染谷勲, 1949) derived the theorem independently in the same year as Shannon and treated it systematically, including extensions of the sampling theorem, in his book Hakei Densou (波形傳送, "Waveform Transmission"). Because Japanese academia was isolated from the West in the post-war period, this is recognized as a completely independent discovery. Internationally, the result is sometimes called the Whittaker–Kotelnikov–Shannon–Someya theorem (WKSS), while in Japan it is known as the Nyquist–Shannon theorem or the Shannon–Someya theorem.
Theorem (Sampling Theorem).
Let $f(t)$ be a continuous-time signal that is band-limited, meaning its Fourier transform $F(f)$ vanishes identically for $|f| > B$, where $B$ is the highest frequency [Hz] present in the signal. If the sampling frequency $f_s$ satisfies
$$f_s > 2B$$
then $f(t)$ can be perfectly reconstructed from the sample values $\{f(nT)\}_{n \in \mathbb{Z}}$, where $T = 1/f_s$ is the sampling interval.
Caution: Critical Sampling at $f_s = 2B$
At equality $f_s = 2B$ (exactly the Nyquist rate), a signal component at frequency $B$ may not be correctly reconstructed. With sampling interval $T = 1/(2B)$, the samples of a general sinusoid $A\cos(2\pi Bt + \phi)$ at frequency $B$ are
$$A\cos\!\left(2\pi B \cdot \frac{n}{2B} + \phi\right) = A\cos(n\pi + \phi)$$
Applying the addition formula:
$$= A\bigl[\cos(n\pi)\cos\phi - \underbrace{\sin(n\pi)}_{=\,0}\sin\phi\bigr] = A(-1)^n\cos\phi$$
Since $\sin(n\pi) = 0$ for every integer $n$, the sine term vanishes regardless of the phase $\phi$ and only the cosine term survives. For example, if $\phi = 0$ (pure cosine), the sample values are $A(-1)^n$, which correctly captures the signal; but if $\phi = \pi/2$ (a pure sine, up to sign), all sample values are zero. More generally, the samples depend only on the product $A\cos\phi$, so amplitude and phase cannot be separated. In other words, the odd-function component (sin component) at the Nyquist frequency is fundamentally unrecoverable.
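The critical-sampling case can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `nyquist_samples` and the chosen amplitudes and phases are illustrative assumptions, not part of the theorem.

```python
import math

def nyquist_samples(amplitude, phase, count):
    """Samples of A*cos(2*pi*B*t + phase) taken at t = n/(2B), n = 0..count-1."""
    # At t = n/(2B) the argument 2*pi*B*t reduces to n*pi, so B drops out.
    return [amplitude * math.cos(n * math.pi + phase) for n in range(count)]

cosine_case = nyquist_samples(1.0, 0.0, 6)          # phi = 0
sine_case = nyquist_samples(1.0, math.pi / 2, 6)    # phi = pi/2
ambiguous = nyquist_samples(2.0, math.pi / 3, 6)    # A = 2, phi = 60 degrees

print([round(x, 6) for x in cosine_case])
print([round(x, 6) for x in sine_case])
print([round(x, 6) for x in ambiguous])
```

Up to floating-point rounding, the second list is all zeros, and the third list coincides with the first because $2\cos(\pi/3) = 1$: two different sinusoids at frequency $B$ produce the same samples.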
Intuitively, the theorem states: "If you sample faster than twice the highest frequency present in the signal, no information is lost."
1.1 Nyquist Rate vs. Nyquist Frequency
Two important concepts related to the sampling theorem are distinguished as follows:
| Term | Definition | Meaning |
|---|---|---|
| Nyquist rate | $f_{\text{Nyquist}} = 2B$ | The lower bound on the sampling frequency; $f_s$ must exceed it to faithfully reconstruct the signal |
| Nyquist frequency | $f_N = f_s / 2$ | The highest frequency that can be represented at a given sampling frequency |
The sampling theorem condition $f_s > 2B$ is equivalent to stating that the Nyquist frequency $f_N = f_s/2$ exceeds the signal's highest frequency $B$.
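The two quantities are easy to confuse, so a small sketch may help. The helper names below are hypothetical and the numeric values are only examples.

```python
def nyquist_rate(bandwidth_hz):
    """Lower bound 2B that the sampling frequency must exceed."""
    return 2.0 * bandwidth_hz

def nyquist_frequency(sampling_hz):
    """Highest representable frequency f_s / 2 at a given sampling frequency."""
    return sampling_hz / 2.0

def satisfies_sampling_theorem(sampling_hz, bandwidth_hz):
    """True when f_s > 2B, which is the same as f_s / 2 > B."""
    return sampling_hz > nyquist_rate(bandwidth_hz)

print(nyquist_rate(20_000.0))                          # property of the signal
print(nyquist_frequency(44_100.0))                     # property of the sampler
print(satisfies_sampling_theorem(44_100.0, 20_000.0))
print(satisfies_sampling_theorem(40_000.0, 20_000.0))  # equality is not enough
```

The Nyquist rate is a property of the signal, while the Nyquist frequency is a property of the sampling system.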
2. Proof Outline via Fourier Analysis
The core of the proof is the fact that sampling in the time domain corresponds to periodic repetition of the spectrum in the frequency domain.
2.1 Properties of the Dirac Delta Function
The proof relies on the following properties of the Dirac delta function $\delta(t)$.
Property 1 (Sifting Property). For any continuous function $f(t)$,
$$f(t)\,\delta(t - t_0) = f(t_0)\,\delta(t - t_0)$$
This holds because $\delta(t - t_0)$ is zero for $t \neq t_0$, so $f(t)$ can be replaced by its value at $t = t_0$. Integrating both sides, the delta function extracts only the value at $t = t_0$ from the integrand:
$$\int_{-\infty}^{\infty} f(t)\,\delta(t - t_0)\,dt = f(t_0)$$
Property 2 (Fourier Transform of the Delta Function). Here the Fourier transform of a function $g(t)$ is defined using ordinary frequency $f$ [Hz]: $\displaystyle G(f) = \int_{-\infty}^{\infty} g(t)\,e^{-j2\pi ft}\,dt$. Then
$$\mathcal{F}[\delta(t - t_0)] = \int_{-\infty}^{\infty} \delta(t - t_0)\,e^{-j2\pi ft}\,dt = e^{-j2\pi f t_0}$$
In particular, when $t_0 = 0$, $\mathcal{F}[\delta(t)] = 1$ (a flat spectrum with equal magnitude at all frequencies).
Property 3 (Fourier Transform of the Dirac Comb). The Fourier transform of a Dirac comb with period $T$ is a Dirac comb with period $f_s = 1/T$:
$$\mathcal{F}\!\left[\sum_{n=-\infty}^{\infty} \delta(t - nT)\right] = f_s \sum_{k=-\infty}^{\infty} \delta(f - k f_s)$$
Derivation of Property 3
The Dirac comb $\delta_T(t) = \displaystyle\sum_{n=-\infty}^{\infty} \delta(t - nT)$ is periodic with period $T$, so it can be expanded as a Fourier series:
$$\delta_T(t) = \sum_{n=-\infty}^{\infty} C_n \, e^{j 2\pi n f_s t}$$
To find the Fourier coefficients $C_n$: on the interval $-T/2 \leq t \leq T/2$, $\delta_T(t) = \delta(t)$, so
$$C_n = \frac{1}{T}\int_{-T/2}^{T/2} \delta_T(t)\, e^{-j2\pi n f_s t}\, dt = \frac{1}{T}\int_{-T/2}^{T/2} \delta(t)\, e^{-j2\pi n f_s t}\, dt$$
By Property 1 (the sifting property), $\displaystyle\int \delta(t)\, g(t)\, dt = g(0)$, so
$$C_n = \frac{1}{T}\, e^{-j2\pi n f_s \cdot 0} = \frac{1}{T}$$
That is, $C_n = 1/T$ for all $n$. Therefore
$$\delta_T(t) = \frac{1}{T}\sum_{n=-\infty}^{\infty} e^{j 2\pi n f_s t}$$
The Fourier transform of each term is $\mathcal{F}[e^{j2\pi f_0 t}] = \delta(f - f_0)$, so
$$\mathcal{F}[\delta_T(t)] = \frac{1}{T}\sum_{n=-\infty}^{\infty} \delta(f - n f_s) = f_s \sum_{n=-\infty}^{\infty} \delta(f - n f_s) \qquad \square$$
2.2 Mathematical Representation of Sampling
Sampling a continuous signal $f(t)$ at intervals $T = 1/f_s$ is equivalent to multiplying $f(t)$ by a Dirac comb. Applying Property 1 to each term:
$$f_s(t) = f(t) \cdot \sum_{n=-\infty}^{\infty} \delta(t - nT) = \sum_{n=-\infty}^{\infty} f(nT)\, \delta(t - nT)$$
The second equality uses $f(t)\,\delta(t - nT) = f(nT)\,\delta(t - nT)$ (Property 1). That is, the sampled signal $f_s(t)$ consists of delta functions at each sample point $t = nT$, each with amplitude $f(nT)$.
2.3 Effect in the Frequency Domain
We compute the Fourier transform of $f_s(t)$. In the ordinary-frequency Fourier transform, multiplication in the time domain corresponds directly to convolution in the frequency domain ($\mathcal{F}[g \cdot h] = G * H$). Therefore
$$F_s(f) = F(f) * \left[f_s \sum_{k=-\infty}^{\infty} \delta(f - k f_s)\right]$$
where Property 3 has been used. Since convolution with a delta function is a shift (the integral form of Property 1):
$$F_s(f) = f_s \sum_{k=-\infty}^{\infty} F(f - k f_s)$$
That is, the original spectrum $F(f)$ is repeated at intervals of $f_s$.
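One consequence of this periodic repetition is that complex tones whose frequencies differ by an integer multiple of $f_s$ produce identical samples. The sketch below illustrates this with the standard library only; the sampling frequency of 1000 Hz, the tone at 100 Hz, and the helper names are arbitrary assumptions.

```python
import cmath

def sample_tone(freq_hz, sampling_hz, count):
    """Samples of exp(j*2*pi*freq_hz*t) taken at t = n / sampling_hz."""
    return [
        cmath.exp(2j * cmath.pi * freq_hz * n / sampling_hz)
        for n in range(count)
    ]

def max_difference(a, b):
    """Largest magnitude of the element-wise difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

fs = 1000.0                                          # assumed sampling frequency [Hz]
base = sample_tone(100.0, fs, 8)
shifted = sample_tone(100.0 + fs, fs, 8)             # k = 1
shifted_back = sample_tone(100.0 - 2 * fs, fs, 8)    # k = -2

print(max_difference(base, shifted))
print(max_difference(base, shifted_back))
```

Both differences are at the level of floating-point rounding, so the samples alone cannot tell the three tones apart.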
2.4 Principle of Recovery
When $f_s > 2B$ holds, the repeated spectral copies do not overlap. Therefore, by applying an ideal low-pass filter with cutoff frequency $f_s/2$, the original spectrum $F(f)$ can be perfectly extracted. This is the essence of the sampling theorem.
3. Reconstruction Formula (Sinc Interpolation)
Writing the "application of an ideal low-pass filter" described above in the time domain yields the Whittaker–Shannon interpolation formula:
Reconstruction Formula.
$$f(t) = \sum_{n=-\infty}^{\infty} f(nT)\, \operatorname{sinc}\!\left(\frac{t - nT}{T}\right)$$
where $T = 1/f_s$ is the sampling interval and $\operatorname{sinc}(x) = \dfrac{\sin(\pi x)}{\pi x}$ is the normalized sinc function.
This formula interpolates between sample points by superposing sinc functions weighted by each sample value $f(nT)$. The sinc function is the impulse response of the ideal low-pass filter, and satisfies $\operatorname{sinc}(0) = 1$ and $\operatorname{sinc}(n) = 0$ for any nonzero integer $n$. Therefore, at $t = mT$, only the $n = m$ term contributes $f(mT) \cdot 1$, while all other terms vanish, so the original values are exactly reproduced at the sample points.
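The interpolation formula can be tried out with a truncated sum. This is a rough sketch rather than a practical reconstructor: the test signal $\operatorname{sinc}^2(3t)$ (band-limited to 3 Hz), the sampling frequency of 8 Hz, and the truncation to 401 samples are all assumptions chosen for illustration.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_index, period, t):
    """Truncated sum of f(nT) * sinc(t/T - n) over the available samples."""
    return sum(
        value * sinc(t / period - (first_index + i))
        for i, value in enumerate(samples)
    )

def signal(t):
    """Assumed test signal: sinc(3t)^2, whose spectrum vanishes above 3 Hz."""
    return sinc(3.0 * t) ** 2

fs = 8.0                      # satisfies f_s > 2B with B = 3 Hz
T = 1.0 / fs
first = -200                  # samples n = -200 .. 200
samples = [signal(n * T) for n in range(first, -first + 1)]

at_sample = reconstruct(samples, first, T, 1 * T)    # t on the sampling grid
between = reconstruct(samples, first, T, 0.05)       # t between two samples

print(at_sample, signal(1 * T))
print(between, signal(0.05))
```

On the grid only one term contributes, so the value is reproduced essentially exactly; between samples the small remaining error comes from truncating the infinite sum.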
3.1 Derivation of the Reconstruction Formula
Derivation
Step 1: Spectrum of the sampled signal.
As shown in Section 2, the spectrum of the sampled signal is
$$F_s(f) = f_s \sum_{k=-\infty}^{\infty} F(f - k f_s)$$
When $f_s > 2B$, only the $k = 0$ term contributes in the range $|f| < f_s/2$, so
$$F_s(f) = f_s \, F(f) \qquad (|f| < f_s/2)$$
Step 2: Recovery via ideal low-pass filter.
Apply an ideal low-pass filter $H(f)$ with cutoff frequency $f_s/2$ and gain $T = 1/f_s$. The gain $T$ is needed because Step 1 shows $F_s(f) = f_s F(f)$, so we must multiply by $1/f_s = T$ to recover the original spectrum $F(f)$:
$$H(f) = \begin{cases} T & (|f| \leq f_s/2) \\ 0 & (|f| > f_s/2) \end{cases}$$
The recovered spectrum is
$$F(f) = H(f) \cdot F_s(f) = T \cdot f_s \, F(f) = F(f) \qquad (|f| < f_s/2)$$
Since $F(f) = 0$ for $|f| > B$ and $f_s/2 > B$, both sides are zero for $|f| \geq f_s/2$, so this holds for all frequencies.
Step 3: Transform back to the time domain.
Taking the inverse Fourier transform of $H(f) \cdot F_s(f)$, multiplication becomes convolution in the time domain:
$$f(t) = h(t) * f_s(t)$$
We compute the inverse Fourier transform of $H(f)$ (the impulse response of the ideal LPF):
\begin{align} h(t) &= \int_{-f_s/2}^{f_s/2} T \, e^{j2\pi ft}\,df \\ &= T \left[\frac{e^{j2\pi ft}}{j2\pi t}\right]_{-f_s/2}^{f_s/2} \\ &= \frac{T}{j2\pi t}\left(e^{j\pi f_s t} - e^{-j\pi f_s t}\right) \end{align}
Applying Euler's formula $e^{j\theta} - e^{-j\theta} = 2j\sin\theta$,
\begin{align} h(t) &= \frac{T}{j2\pi t} \cdot 2j\sin(\pi f_s t) \\ &= T \cdot \frac{\sin(\pi f_s t)}{\pi t} \end{align}
Substituting $f_s = 1/T$ and simplifying,
$$h(t) = \frac{\sin(\pi t/T)}{\pi t/T} = \operatorname{sinc}\!\left(\frac{t}{T}\right)$$
Convolving with $f_s(t) = \sum_n f(nT)\,\delta(t - nT)$, using the definition of convolution and Property 1:
\begin{align} f(t) &= h(t) * f_s(t) = \int_{-\infty}^{\infty} h(t - \tau)\, f_s(\tau)\, d\tau \\ &= \int_{-\infty}^{\infty} h(t - \tau) \sum_{n=-\infty}^{\infty} f(nT)\,\delta(\tau - nT)\, d\tau \\ &= \sum_{n=-\infty}^{\infty} f(nT) \underbrace{\int_{-\infty}^{\infty} h(t - \tau)\,\delta(\tau - nT)\, d\tau}_{= h(t - nT)} \\ &= \sum_{n=-\infty}^{\infty} f(nT)\, \operatorname{sinc}\!\left(\frac{t - nT}{T}\right) \qquad \square \end{align}
3.2 Convergence Conditions
For the series above to converge pointwise to $f(t)$, it suffices that $f$ be band-limited ($F(f) = 0$ for $|f| > B$) and $f \in L^2(\mathbb{R})$ ($L^2$ convergence is guaranteed by Plancherel's theorem). In practice, if $f$ is continuous and $\sum |f(nT)|^2 < \infty$, the series converges uniformly.
4. Aliasing
When the sampling theorem condition $f_s > 2B$ is not satisfied—that is, when the sampling frequency is insufficient—the spectral copies in the frequency domain overlap. This is called aliasing.
When aliasing occurs, frequency components above the Nyquist frequency $f_s/2$ appear as spurious low-frequency components below $f_s/2$. Specifically, a signal component at frequency $f$ folds back to $|f - k f_s|$, where $k$ is the integer that places the result in the range $[0, f_s/2]$.
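The folding rule can be written as a short function. This is a minimal sketch; the name `folded_frequency` and the example tones at a 44.1 kHz sampling frequency are assumptions for illustration.

```python
import math

def folded_frequency(freq_hz, sampling_hz):
    """Frequency in [0, f_s/2] at which a real tone at freq_hz appears after sampling."""
    # Pick the integer k that brings f - k*f_s closest to zero.
    k = math.floor(freq_hz / sampling_hz + 0.5)
    return abs(freq_hz - k * sampling_hz)

fs = 44_100.0
for tone in (1_000.0, 25_000.0, 43_100.0, 45_100.0):
    print(tone, "->", folded_frequency(tone, fs))
```

Tones at 43.1 kHz and 45.1 kHz both land on 1 kHz, which is why they cannot be distinguished from a genuine 1 kHz component once sampled.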
4.1 Intuitive Examples of Aliasing
Video example: The phenomenon where helicopter rotors or car wheels appear to rotate slowly in reverse in movies and on television is a classic example of temporal aliasing. Because the camera's frame rate (sampling frequency) is less than twice the rotor's rotation frequency, the high-speed rotation folds back below the Nyquist frequency and is perceived as slow reverse rotation.
Fluorescent light and fan example: Fluorescent lights flicker 100 times per second (East Japan, 50 Hz mains) or 120 times per second (West Japan, 60 Hz mains). This flickering acts as a kind of strobe (temporal sampling), so when a fan's blade frequency is close to the flicker frequency, the blades appear to rotate slowly or in reverse. For example, if a 3-blade fan rotates at 27 revolutions per second (1620 rpm), the blade pattern frequency is $27 \times 3 = 81$ Hz. Under East Japan fluorescent lighting (100 Hz sampling), the pattern advances by 0.81 of a blade spacing between flashes, which is indistinguishable from stepping back by 0.19 of a spacing, so it appears as a $100 - 81 = 19$ Hz pattern rotating in reverse. If the speed increases to 35 rps ($35 \times 3 = 105$ Hz), it appears as a $105 - 100 = 5$ Hz pattern rotating slowly forward. With LED lighting, which either does not flicker or flickers at very high frequency, this phenomenon is much less likely to occur.
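Treating each flicker as one sample, the apparent blade motion can be sketched with a signed alias frequency. In this sketch a negative result is read as apparent rotation opposite to the true direction; the function names are hypothetical, and the model assumes identical blades and ideal instantaneous flashes.

```python
import math

def blade_pattern_hz(revs_per_second, blades):
    """Rate at which the blade pattern repeats for a fan with identical blades."""
    return revs_per_second * blades

def apparent_frequency(pattern_hz, flicker_hz):
    """Signed alias of the pattern frequency, in [-flicker_hz/2, flicker_hz/2)."""
    k = math.floor(pattern_hz / flicker_hz + 0.5)
    return pattern_hz - k * flicker_hz

for rps in (27, 35):
    pattern = blade_pattern_hz(rps, 3)
    for flicker in (100.0, 120.0):
        print(rps, "rps under", flicker, "Hz flicker:",
              apparent_frequency(pattern, flicker), "Hz")
```

Under this convention the 81 Hz pattern under 100 Hz flicker gives a negative value (apparent reverse motion), while the 105 Hz pattern gives a small positive value (slow apparent forward motion).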
5. Practical Considerations
5.1 Anti-Aliasing Filter
Real-world signals are often not strictly band-limited. Therefore, an anti-aliasing filter (an analog low-pass filter) is inserted before sampling to remove components above the Nyquist frequency. This ensures that the input signal to the ADC (analog-to-digital converter) satisfies the band-limiting condition.
Ideally, a rectangular (brick-wall) filter characteristic is desired, but this is physically unrealizable. In practice, approximate low-pass filters such as Butterworth or Chebyshev filters are used. Since a transition band (the frequency range between the passband edge and the Nyquist frequency) is needed, $f_s$ is set somewhat higher than $2B$.
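To get a feel for why the transition band matters, the sketch below estimates the Butterworth order needed for a given attenuation at the Nyquist frequency, using the standard magnitude response $|H(f)|^2 = 1/(1 + (f/f_c)^{2n})$. Placing the $-3$ dB point $f_c$ at the passband edge, the 60 dB target, and the function name are assumptions made for this illustration.

```python
import math

def butterworth_order(passband_edge_hz, stopband_edge_hz, attenuation_db):
    """Smallest Butterworth order giving attenuation_db of loss at stopband_edge_hz.

    Assumes the -3 dB cutoff f_c sits at passband_edge_hz and solves
    1 + (f_stop / f_c)^(2n) >= 10^(attenuation_db / 10) for n.
    """
    ratio = stopband_edge_hz / passband_edge_hz
    n = math.log10(10 ** (attenuation_db / 10) - 1) / (2 * math.log10(ratio))
    return math.ceil(n)

# 20 kHz passband, 60 dB down at the Nyquist frequency of each sampling rate.
print(butterworth_order(20_000.0, 44_100.0 / 2, 60.0))    # f_s = 44.1 kHz
print(butterworth_order(20_000.0, 176_400.0 / 2, 60.0))   # f_s = 176.4 kHz
```

With these assumptions the narrow transition band at 44.1 kHz calls for an impractically high analog filter order, while a four times higher sampling frequency brings it down to a single-digit order.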
5.2 Oversampling
Oversampling means using a sampling frequency significantly higher than the Nyquist rate. Oversampling offers the following advantages:
- Anti-aliasing filter design becomes easier (the transition band can be made wider)
- Quantization noise is spread over a wider bandwidth, improving the in-band SNR
- Combined with noise shaping ($\Sigma\Delta$ modulation), high effective resolution can be achieved
For example, audio $\Sigma\Delta$ ADCs employ 64× or 128× oversampling and can deliver 24-bit output resolution even with a 1-bit quantizer.
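The SNR benefit can be estimated with the usual textbook approximation for an ideal order-$L$ noise shaper, in which in-band quantization noise power scales as $\pi^{2L} / \bigl((2L+1)\,\mathrm{OSR}^{2L+1}\bigr)$. The sketch below assumes white quantization noise and an ideal decimation filter, so real converters will fall short of these figures; the function names are hypothetical.

```python
import math

def snr_gain_db(osr, order=0):
    """Approximate in-band SNR gain for oversampling ratio `osr`.

    order = 0 is plain oversampling; order = L models ideal L-th order
    noise shaping under a white-quantization-noise assumption.
    """
    gain = (2 * order + 1) * osr ** (2 * order + 1) / math.pi ** (2 * order)
    return 10 * math.log10(gain)

def extra_bits(gain_db):
    """Convert an SNR gain to equivalent bits, using about 6.02 dB per bit."""
    return gain_db / 6.02

for order in (0, 1, 2):
    gain = snr_gain_db(64, order)
    print("order", order, ":", round(gain, 1), "dB,", round(extra_bits(gain), 1), "bits")
```

Without noise shaping, each fourfold increase in sampling frequency buys only about 6 dB (one bit), which is why high-resolution converters combine oversampling with noise shaping.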
5.3 Practical Reconstruction
The theoretical reconstruction formula (sinc interpolation) requires an infinitely long sinc function, making it impractical to use directly. At the output of a DAC (digital-to-analog converter), a zero-order hold (staircase output) followed by a reconstruction filter (analog low-pass filter) is the standard approach. In the digital domain, finite-length approximate interpolations such as polynomial interpolation and spline interpolation are widely used.
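One reason the reconstruction filter matters is that a zero-order hold is not flat in the passband: its magnitude response follows $|\operatorname{sinc}(f/f_s)|$. The sketch below evaluates this droop for an assumed 44.1 kHz DAC; the function name is hypothetical.

```python
import math

def zoh_gain_db(freq_hz, sampling_hz):
    """Magnitude of an ideal zero-order hold, |sinc(f / f_s)|, in dB relative to DC."""
    x = freq_hz / sampling_hz
    if x == 0.0:
        return 0.0
    return 20 * math.log10(abs(math.sin(math.pi * x) / (math.pi * x)))

fs = 44_100.0
for freq in (1_000.0, 10_000.0, 20_000.0, fs / 2):
    print(freq, "Hz:", round(zoh_gain_db(freq, fs), 2), "dB")
```

The loss grows toward the Nyquist frequency, reaching roughly 3 to 4 dB at the top of the audio band, so practical designs compensate for it in the digital or analog filter.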
6. Applications
6.1 Digital Audio (CD: 44.1 kHz)
The human audible range is approximately 20 Hz – 20 kHz. By the sampling theorem, the sampling frequency must exceed 40 kHz to correctly sample signals up to 20 kHz. The CD sampling rate of 44.1 kHz adds approximately 10% headroom above this theoretical bound to accommodate the transition band of the anti-aliasing filter. Historically, compatibility with video-based digital audio recording was also a factor in the choice: storing 3 samples per video line yields 44,100 samples per second for both NTSC (245 usable lines per field × 60 fields/s × 3 samples/line) and PAL (294 usable lines per field × 50 fields/s × 3 samples/line).
Modern high-resolution audio uses sampling rates of 96 kHz or 192 kHz. The primary motivation is not to preserve ultrasonic components beyond the audible range, but rather to ease anti-aliasing filter design and improve temporal resolution.
6.2 Digital Imaging
The sampling theorem also applies to image sampling (pixelization). The pixel pitch of the sensor (with a Bayer array, the pitch of each color channel) determines the spatial sampling frequency, while the optical MTF (modulation transfer function) of the lens limits the spatial frequencies that reach the sensor; components exceeding the spatial Nyquist frequency appear as moiré (spatial aliasing). In digital cameras, an optical low-pass filter (OLPF) serves as the anti-aliasing mechanism.
6.3 ADC (Analog-to-Digital Converter)
In ADC design, the sampling theorem governs the relationship between input bandwidth and sampling clock. Successive approximation register (SAR) ADCs typically operate near the Nyquist rate, while $\Sigma\Delta$ ADCs combine substantial oversampling with decimation to achieve high resolution. In communications, bandpass sampling chooses the sampling rate according to the signal bandwidth rather than the highest frequency, enabling direct digitization of RF signals.
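For bandpass sampling, the commonly quoted condition for a real signal occupying $[f_L, f_H]$ with bandwidth $B = f_H - f_L$ is $2f_H/n \leq f_s \leq 2f_L/(n-1)$ for an integer $1 \leq n \leq \lfloor f_H/B \rfloor$. The sketch below lists these ranges; the function name and the 20 to 25 MHz example band are assumptions, and the range endpoints are limiting cases that leave no guard band.

```python
import math

def bandpass_sampling_ranges(f_low_hz, f_high_hz):
    """Candidate alias-free sampling ranges for a real signal in [f_low, f_high].

    Returns (n, lower, upper) tuples with 2*f_high/n <= f_s <= 2*f_low/(n - 1);
    n = 1 is ordinary lowpass sampling and has no upper limit.
    """
    bandwidth = f_high_hz - f_low_hz
    n_max = math.floor(f_high_hz / bandwidth)
    ranges = []
    for n in range(1, n_max + 1):
        lower = 2 * f_high_hz / n
        upper = math.inf if n == 1 else 2 * f_low_hz / (n - 1)
        ranges.append((n, lower, upper))
    return ranges

for n, lower, upper in bandpass_sampling_ranges(20e6, 25e6):
    print(n, lower / 1e6, "MHz to", upper / 1e6, "MHz")
```

In this example the lowest admissible rate is 10 MHz, twice the 5 MHz bandwidth, even though the signal itself sits at 20 to 25 MHz; the allowed ranges nevertheless depend on where the band is located, not on the bandwidth alone.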
7. Frequently Asked Questions (FAQ)
Q1. What is the Sampling Theorem (Shannon's theorem)?
The sampling theorem states that a band-limited signal (one containing frequency components only up to $B$ Hz) can be perfectly reconstructed from its samples if the sampling frequency satisfies $f_s > 2B$. Reconstruction uses the sinc interpolation formula. This theorem is fundamental to all analog-to-digital conversion in digital audio, image processing, communications engineering, and beyond.
Q2. What is aliasing?
Aliasing is a phenomenon that occurs when the condition $f_s > 2B$ is not met. Spectral copies overlap in the frequency domain, and components above the Nyquist frequency fold back into lower frequencies. Once aliasing has occurred, it cannot be removed by post-processing. Prevention methods include applying an anti-aliasing filter (low-pass filter) before sampling, or using a sufficiently high sampling frequency.
Q3. Why is the CD sampling rate 44.1 kHz?
The theoretical minimum for the 20 kHz upper limit of human hearing is 40 kHz; 44.1 kHz adds headroom for the anti-aliasing filter's transition band. See Section 6.1 for details.
References
- C. E. Shannon, "Communication in the Presence of Noise," Proceedings of the IRE, vol. 37, no. 1, pp. 10–21, 1949.
- I. Someya (染谷勲), Hakei Densou (波形傳送, "Waveform Transmission"), Shukyosha, Tokyo, 1949. (Koushuu Kagaku Ronsou, vol. 3; National Diet Library catalog)
- H. Ogawa, "A memorial tribute to Isao Someya, 1915–2007," Sampling Theory in Signal and Image Processing, vol. 7, no. 3, pp. 227–228, 2008.
- A. V. Oppenheim, R. W. Schafer, Discrete-Time Signal Processing, 3rd ed., Pearson, 2010. (Ch. 4, "Sampling of Continuous-Time Signals")
- Wikipedia: Nyquist–Shannon sampling theorem
- Wikipedia: Aliasing