Overflow and Underflow

Goal

Understand the definitions of overflow and underflow, learn how IEEE 754 handles them and the role of denormalized numbers, and master typical occurrence scenarios and avoidance techniques.

Prerequisites

Basic structure of IEEE 754 floating-point numbers

1. Overflow

Overflow occurs when the absolute value of an arithmetic result exceeds the maximum representable floating-point number.

$$|x| > x_{\max} \quad \Rightarrow \quad \text{Overflow}$$

| Precision | Maximum $x_{\max}$ | Max exponent |
|---|---|---|
| Single (float32) | $\approx 3.4 \times 10^{38}$ | $2^{127}$ |
| Double (float64) | $\approx 1.8 \times 10^{308}$ | $2^{1023}$ |

In IEEE 754, under the default round-to-nearest mode, an overflowed result is set to $\pm\infty$ with the sign of the exact result. Arithmetic with $\infty$ follows rules such as $\infty + 1 = \infty$, $\infty \times 2 = \infty$, and $\infty - \infty = \text{NaN}$.
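As a small illustration of these rules, the following sketch uses Python, whose `float` type is an IEEE 754 double on common platforms (an assumption about the runtime, not something the standard guarantees). The variable names are chosen only for this example.

```python
import math
import sys

x_max = sys.float_info.max               # largest finite double, about 1.8e308

overflowed = x_max * 2.0                 # exact result exceeds x_max
print(overflowed)                        # inf
print(-x_max * 2.0)                      # -inf

# Arithmetic on infinities follows the rules listed above.
print(math.inf + 1.0)                    # inf
print(math.inf * 2.0)                    # inf
print(math.isnan(math.inf - math.inf))   # True
```

Note that Python float multiplication returns infinity silently, while some other operations (for example `math.exp`) raise `OverflowError` instead.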

2. Underflow

Underflow occurs when the absolute value of a nonzero result is smaller than the minimum normalized number.

$$0 < |x| < x_{\min} \quad \Rightarrow \quad \text{Underflow}$$

| Precision | Min normalized $x_{\min}$ | Min denormalized |
|---|---|---|
| Single (float32) | $\approx 1.2 \times 10^{-38}$ | $\approx 1.4 \times 10^{-45}$ |
| Double (float64) | $\approx 2.2 \times 10^{-308}$ | $\approx 5.0 \times 10^{-324}$ |

Underflow is often less critical than overflow. IEEE 754 gradual underflow ensures results approach zero by gradually losing precision rather than being flushed to zero suddenly.
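Gradual underflow can be observed directly. The sketch below assumes Python floats are IEEE 754 doubles and that the runtime has not enabled a flush-to-zero mode. It repeatedly halves $x_{\min}$ and counts the steps until the value becomes exactly zero.

```python
import sys

x_min = sys.float_info.min       # smallest positive normalized double, ~2.2e-308
smallest_subnormal = 5e-324      # 2**-1074, the smallest positive double

half = x_min / 2.0               # below x_min, yet still nonzero (subnormal)
print(half > 0.0)                # True
print(smallest_subnormal / 2.0)  # 0.0: nothing representable lies in between

# Halve x_min until it reaches zero. Each step drops one significand bit.
x, steps = x_min, 0
while x > 0.0:
    x /= 2.0
    steps += 1
print(steps)                     # 53
```

The 53 steps correspond to the 52 subnormal bit positions below $x_{\min}$ plus the final halving that rounds to zero.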

3. Handling in IEEE 754

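[Figure 1: number line from $-\infty$ to $+\infty$ marking $0$, $\pm x_{\min}$, and $\pm x_{\max}$. Subnormal numbers (gradual underflow) lie between $0$ and $\pm x_{\min}$, normalized numbers between $\pm x_{\min}$ and $\pm x_{\max}$, and overflow occurs beyond $\pm x_{\max}$.]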
Figure 1. Representable range of floating-point numbers (symmetric about zero). Near zero, subnormal numbers provide gradual underflow. Beyond $\pm x_{\max}$, results overflow to $\pm\infty$.

Special values in IEEE 754:

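  • $\pm\infty$: Result of overflow. Dividing a nonzero finite number by zero also yields infinity ($1/0 = +\infty$, $-1/0 = -\infty$), although IEEE 754 signals this as a separate division-by-zero exception rather than as overflow.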
  • NaN (Not a Number): Result of undefined operations such as $0/0$, $\infty - \infty$, $\sqrt{-1}$.
  • $\pm 0$: Positive and negative zero are distinguished. $1/(+0) = +\infty$, $1/(-0) = -\infty$.
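The special values above can be inspected with a short Python sketch (assuming Python floats are IEEE 754 doubles). One caveat specific to Python: float division by zero raises `ZeroDivisionError` rather than returning an infinity, and `math.sqrt` of a negative number raises `ValueError` rather than returning NaN, so the example produces NaN through $\infty - \infty$ instead.

```python
import math

nan = math.inf - math.inf            # an invalid operation yields NaN
print(math.isnan(nan))               # True
print(nan == nan)                    # False: NaN is unequal even to itself

pos_zero, neg_zero = 0.0, -0.0
print(pos_zero == neg_zero)          # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))  # -1.0: the sign bit is still present

# Python-specific caveat: float division by zero raises an exception
# instead of returning an infinity.
try:
    1.0 / neg_zero
    raised = False
except ZeroDivisionError:
    raised = True
print(raised)                        # True
```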

4. Denormalized Numbers (Subnormals)

Denormalized numbers (subnormal numbers) represent values smaller than $x_{\min}$ by setting the implicit leading bit of the significand to 0 when the exponent is at its minimum value.

Denormalized numbers guarantee the important property $x - y = 0 \Leftrightarrow x = y$ (this property breaks with flush-to-zero underflow).

However, denormalized numbers carry fewer significant bits than normalized numbers, and on some processors arithmetic involving them is much slower (reportedly by 100x or more), which is a performance concern.
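The property $x - y = 0 \Leftrightarrow x = y$ can be checked on a concrete pair. In this sketch (assuming IEEE 754 doubles with subnormals enabled, which is the usual default for Python), `x` and `y` are distinct normalized numbers whose difference is smaller than $x_{\min}$.

```python
import sys

x_min = sys.float_info.min     # smallest positive normalized double
x = 1.5 * x_min                # exactly representable
y = x_min

diff = x - y                   # exactly 0.5 * x_min, which is subnormal
print(x != y)                  # True
print(diff != 0.0)             # True: the difference survives as a subnormal
print(diff < x_min)            # True
```

Under a flush-to-zero mode, `diff` would instead be 0 even though `x != y`, which is the breakage mentioned above.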

5. Typical Occurrence Scenarios

Overflow

  • Factorials: $170! \approx 7.26 \times 10^{306}$ is within double precision range, but $171! \approx 1.24 \times 10^{309}$ overflows.
  • Exponential function: $e^{709} \approx 8.2 \times 10^{307}$ is within range, but $e^{710}$ overflows.
  • Vector norms: In $\|x\|_2 = \sqrt{\sum x_i^2}$, large $x_i$ values can cause $x_i^2$ to overflow.
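The factorial and exponential thresholds listed above can be checked with the following sketch. It relies on a Python-specific behavior: converting an oversized integer to `float` and calling `math.exp` on a large argument both raise `OverflowError` rather than returning infinity. The helper `overflows` is a hypothetical name introduced only for this example.

```python
import math

def overflows(compute):
    """Return True if calling compute() raises OverflowError."""
    try:
        compute()
    except OverflowError:
        return True
    return False

print(overflows(lambda: float(math.factorial(170))))   # False
print(overflows(lambda: float(math.factorial(171))))   # True
print(overflows(lambda: math.exp(709.0)))              # False
print(overflows(lambda: math.exp(710.0)))              # True

# Working with log(n!) keeps the value small: lgamma(n + 1) = log(n!).
log_171_factorial = math.lgamma(172.0)
print(log_171_factorial)                               # roughly 711.7
```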

Underflow

  • Probability products: Products of many small probabilities easily underflow. Compute in log-space (log-probabilities).
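  • Exponential decay: In double precision, $e^{-x}$ becomes subnormal once $x$ exceeds about 708.4 and rounds to zero once $x$ exceeds about 745.1.
  • Gaussian density: $e^{-x^2/2}$ at distant points is extremely small. In double precision it rounds to zero for $|x|$ above roughly 38.6.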
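A short sketch of the exponential and Gaussian cases, assuming IEEE 754 doubles. The function `std_normal_log_pdf` is a hypothetical helper written for this example. It evaluates the logarithm of the standard normal density directly, which stays finite where the density itself underflows.

```python
import math

print(math.exp(-745.0) > 0.0)     # True: rounds to the smallest subnormal
print(math.exp(-746.0))           # 0.0: underflows to zero

def std_normal_log_pdf(x):
    """Log of the standard normal density, computed without calling exp."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

x = 40.0
density_kernel = math.exp(-0.5 * x * x)   # e^-800 underflows
log_density = std_normal_log_pdf(x)       # finite, about -800.92
print(density_kernel)                     # 0.0
print(log_density)
```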

6. Avoidance Techniques

Log-Scale Computation

Convert products to sums: compute $\sum \log p_i$ instead of $\prod p_i$. Apply $\exp$ only when the final result is needed.
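As an illustration, the sketch below multiplies 100 probabilities of $10^{-5}$ each. The exact product is $10^{-500}$, far below the smallest double, so the naive product collapses to zero while the sum of logarithms remains an ordinary number. The data are made up for the example.

```python
import math

probs = [1e-5] * 100           # illustrative values only

# Naive product: underflows to exactly 0.0 partway through the loop.
naive = 1.0
for p in probs:
    naive *= p
print(naive)                   # 0.0

# Log-space: a sum of moderate negative numbers.
log_prob = sum(math.log(p) for p in probs)
print(log_prob)                # about -1151.29
```

Comparisons between models or hypotheses can usually be carried out on `log_prob` directly, so the final `exp` is often unnecessary.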

Log-Sum-Exp Trick

To compute $\log\left(\sum_i e^{x_i}\right)$ stably, let $M = \max_i x_i$:

$$\log\left(\sum_i e^{x_i}\right) = M + \log\left(\sum_i e^{x_i - M}\right)$$

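Since $x_i - M \le 0$, every term satisfies $e^{x_i-M} \le 1$, which prevents overflow. The term with $x_i = M$ equals exactly 1, so the sum is at least 1 and the logarithm is well defined even if other terms underflow to zero.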
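A minimal sketch of the identity in Python follows. The function name `log_sum_exp` and the handling of infinite maxima are choices made for this example, and the input is assumed to be a non-empty sequence of floats without NaN.

```python
import math

def log_sum_exp(xs):
    """Compute log(sum(exp(x) for x in xs)) by shifting with M = max(xs)."""
    m = max(xs)
    if math.isinf(m):
        # All inputs are -inf (result -inf) or some input is +inf (result +inf).
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

# exp(1000) alone would overflow, but the shifted form is fine.
print(log_sum_exp([1000.0, 1000.0]))     # about 1000.693
print(log_sum_exp([-1000.0, -1000.0]))   # about -999.307
```

The second call shows that the same shift also protects against underflow: without it, every term would round to zero and the logarithm would fail.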

Scaling (Normalization)

When computing the vector norm $\|x\|_2$, divide by $m = \max_i |x_i|$ first:

$$\|x\|_2 = m \sqrt{\sum_i (x_i / m)^2}$$

Since $(x_i/m)^2 \le 1$, overflow is prevented (when $m = 0$ the norm is simply 0). The reference BLAS routine dnrm2, which is distributed with LAPACK, uses a related scaling scheme.
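The scaling formula can be sketched as follows. This is a plain transcription of the formula for illustration, not the algorithm of any particular library, and `scaled_norm` is a hypothetical name. The special cases for $m = 0$ and infinite $m$ are added assumptions to avoid dividing by zero or computing $\infty / \infty$.

```python
import math

def scaled_norm(xs):
    """Euclidean norm computed as m * sqrt(sum((x / m)^2)), m = max |x|."""
    m = max((abs(x) for x in xs), default=0.0)
    if m == 0.0 or math.isinf(m):
        return m
    total = 0.0
    for x in xs:
        r = x / m              # |r| <= 1, so r * r cannot overflow
        total += r * r
    return m * math.sqrt(total)

v = [3e200, 4e200]
naive = math.sqrt(sum(x * x for x in v))   # squares overflow to inf
print(naive)                               # inf
print(scaled_norm(v))                      # about 5e200
```

For everyday use, Python's standard library function `math.hypot` (which accepts any number of coordinates since Python 3.8) computes the same quantity.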

Formula Transformation

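Transforming $e^a / e^b$ to $e^{a-b}$, or $\sqrt{ab}$ to $\sqrt{a} \cdot \sqrt{b}$, can avoid intermediate overflow/underflow: in each rewritten form, the large or small intermediate quantity ($e^a$, $e^b$, or the product $ab$) is never formed.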
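The exponential case can be illustrated with a short sketch. The values of `a` and `b` are arbitrary choices for this example, picked so that $e^a$ and $e^b$ each exceed the double range while $e^{a-b}$ is an ordinary number. Recall that Python's `math.exp` raises `OverflowError` on overflow.

```python
import math

a, b = 800.0, 795.0            # illustrative values only

# Naive form: exp(800) is out of range, so the quotient is never reached.
try:
    naive = math.exp(a) / math.exp(b)
except OverflowError:
    naive = None
print(naive)                   # None

# Transformed form: only exp(5) is evaluated.
stable = math.exp(a - b)
print(stable)                  # about 148.41
```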

7. Frequently Asked Questions

Q1. What is overflow?

Overflow occurs when the absolute value of an arithmetic result exceeds the maximum representable floating-point number. In IEEE 754, the result becomes $\pm\infty$. The maximum double precision value is approximately $1.8 \times 10^{308}$.

Q2. What is underflow?

Underflow occurs when the absolute value of a nonzero result is smaller than the minimum normalized number. IEEE 754 gradual underflow (denormalized numbers) reduces precision gradually but avoids sudden flushing to zero.

Q3. How can overflow and underflow be avoided?

Key methods include log-scale computation (log-sum-exp), formula transformations, scaling/normalization, and extended precision arithmetic.

8. References

  • Wikipedia, "Arithmetic overflow" (Japanese)
  • Wikipedia, "Arithmetic underflow" (English)
  • Wikipedia, "Denormal number" (English)
  • D. Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys, vol. 23, no. 1, pp. 5–48, 1991.
  • IEEE 754-2019, IEEE Standard for Floating-Point Arithmetic, IEEE, 2019.