Chapter 10: Interval Arithmetic

Interval Arithmetic

History and Motivation

In numerical computation, floating-point arithmetic incurs a rounding error at every step. For instance, IEEE 754 double precision introduces a relative error of about $10^{-16}$ at each operation, and it is far from obvious how far this error accumulates after thousands of steps. Traditionally, this problem was addressed by manual error analysis: one estimated the rounding error of each operation individually and combined them to derive an upper bound on the final error. However, this method is laborious and prone to human mistakes, and is impractical for large-scale numerical computations.

Ramon Moore and the Birth of Interval Arithmetic

In the 1960s, Ramon E. Moore systematically proposed interval arithmetic in his doctoral dissertation and in his book "Interval Analysis" (1966). The central idea is simple: instead of a single floating-point number, one carries out computations using an interval that is guaranteed to contain the true value. If each operation is defined so that intervals are propagated correctly, the final interval automatically provides an upper bound on the error.

The motivations behind interval arithmetic can be summarised as follows:

  • Automation: it makes manual error analysis unnecessary; the computer itself tracks the error bounds.
  • Rigour: the resulting interval is mathematically guaranteed to contain the true value.
  • Generality: it extends beyond the four basic operations to elementary functions and to the solution of equations.
  • Verification: it can deliver mathematically rigorous existence and non-existence proofs for the results of numerical computations.

Since Moore, interval arithmetic has developed vigorously, with influential textbooks by Alefeld & Herzberger (1983), Neumaier (1990), and Tucker (2011), among others. It is now used as a foundational technology for computer-assisted proofs in a wide range of areas, including existence proofs for solutions of differential equations and the analysis of chaotic dynamical systems.

Definition of Interval Arithmetic

Basic Concepts

Definition (Closed Interval)

For real numbers $\underline{x}, \overline{x}$ with $\underline{x} \leq \overline{x}$, the set

$$\mathbf{x} = [\underline{x}, \overline{x}] = \{ x \in \mathbb{R} \mid \underline{x} \leq x \leq \overline{x} \}$$

is called a closed interval. We call $\underline{x}$ the lower endpoint (infimum) and $\overline{x}$ the upper endpoint (supremum). The set of all closed intervals is denoted by $\mathbb{IR}$.

Definition (Quantities Associated with an Interval)

For an interval $\mathbf{x} = [\underline{x}, \overline{x}]$, define:

  • Width: $w(\mathbf{x}) = \overline{x} - \underline{x}$
  • Midpoint: $\text{mid}(\mathbf{x}) = \dfrac{\underline{x} + \overline{x}}{2}$
  • Radius: $\text{rad}(\mathbf{x}) = \dfrac{w(\mathbf{x})}{2} = \dfrac{\overline{x} - \underline{x}}{2}$
  • Absolute value (magnitude): $|\mathbf{x}| = \max(|\underline{x}|, |\overline{x}|)$
  • Mignitude: $\langle\mathbf{x}\rangle = \begin{cases} \min(|\underline{x}|, |\overline{x}|) & \text{if } 0 \notin \mathbf{x} \\ 0 & \text{otherwise} \end{cases}$

In particular, when $w(\mathbf{x}) = 0$, the interval $\mathbf{x}$ is a point interval (thin interval) and may be identified with the ordinary real number.

The Four Basic Operations

The four basic operations on intervals are defined as the smallest interval containing the set of all attainable values of the operation. Formally, for a binary operation $\circ \in \{+, -, \times, /\}$, we set

$$\mathbf{x} \circ \mathbf{y} = \{ x \circ y \mid x \in \mathbf{x}, \, y \in \mathbf{y} \}$$

and define $\mathbf{x} \circ \mathbf{y}$ as the smallest interval hull containing this set. Explicit formulas for each operation follow.

Definition (Interval Addition)

$$[\underline{x}, \overline{x}] + [\underline{y}, \overline{y}] = [\underline{x} + \underline{y},\; \overline{x} + \overline{y}]$$

Proof

Take any $a \in [\underline{x}, \overline{x}]$ and $b \in [\underline{y}, \overline{y}]$. By definition $\underline{x} \leq a \leq \overline{x}$ and $\underline{y} \leq b \leq \overline{y}$, so adding these inequalities gives

$$\underline{x} + \underline{y} \leq a + b \leq \overline{x} + \overline{y}.$$

Hence $a + b \in [\underline{x} + \underline{y}, \overline{x} + \overline{y}]$. Conversely, the lower endpoint is attained at $a = \underline{x}, b = \underline{y}$ and the upper endpoint at $a = \overline{x}, b = \overline{y}$, so this is the smallest enclosing interval. $\square$

Definition (Interval Subtraction)

$$[\underline{x}, \overline{x}] - [\underline{y}, \overline{y}] = [\underline{x} - \overline{y},\; \overline{x} - \underline{y}]$$

Proof

For $a \in [\underline{x}, \overline{x}]$ and $b \in [\underline{y}, \overline{y}]$ we have $-\overline{y} \leq -b \leq -\underline{y}$, hence

$$\underline{x} - \overline{y} \leq a - b \leq \overline{x} - \underline{y}.$$

Equality at the endpoints is attained in the same way, so this is the smallest enclosing interval. $\square$

Definition (Interval Multiplication)

Let $S = \{\underline{x}\,\underline{y},\; \underline{x}\,\overline{y},\; \overline{x}\,\underline{y},\; \overline{x}\,\overline{y}\}$. Then

$$[\underline{x}, \overline{x}] \times [\underline{y}, \overline{y}] = [\min S,\; \max S].$$

Proof

Since $f(a, b) = ab$ is linear in each variable separately, its maximum and minimum on the rectangle $[\underline{x}, \overline{x}] \times [\underline{y}, \overline{y}]$ are attained at one of the four vertices (combinations of endpoints): $(\underline{x}, \underline{y})$, $(\underline{x}, \overline{y})$, $(\overline{x}, \underline{y})$, $(\overline{x}, \overline{y})$. The minimum and maximum of $S$ therefore give the lower and upper endpoints. $\square$

Special Cases of Multiplication

When the signs of both intervals are determined, it is unnecessary to compute all four products:

  • If $\underline{x} \geq 0$ and $\underline{y} \geq 0$ (both non-negative): $\mathbf{x} \times \mathbf{y} = [\underline{x}\,\underline{y},\; \overline{x}\,\overline{y}]$
  • If $\overline{x} \leq 0$ and $\overline{y} \leq 0$ (both non-positive): $\mathbf{x} \times \mathbf{y} = [\overline{x}\,\overline{y},\; \underline{x}\,\underline{y}]$
  • If $\underline{x} \geq 0$ and $\overline{y} \leq 0$ ($\mathbf{x}$ non-negative, $\mathbf{y}$ non-positive): $\mathbf{x} \times \mathbf{y} = [\overline{x}\,\underline{y},\; \underline{x}\,\overline{y}]$

Implementations branch on the signs and skip the unnecessary multiplications for efficiency.

Definition (Interval Division)

When $0 \notin [\underline{y}, \overline{y}]$:

$$[\underline{x}, \overline{x}] \;/\; [\underline{y}, \overline{y}] = [\underline{x}, \overline{x}] \times \left[\dfrac{1}{\overline{y}},\; \dfrac{1}{\underline{y}}\right]$$

Proof

When $0 \notin [\underline{y}, \overline{y}]$, any $b \in [\underline{y}, \overline{y}]$ satisfies $b \neq 0$. The function $g(b) = 1/b$ is monotone on any interval not containing $0$:

  • If $\underline{y} > 0$, then $g$ is monotonically decreasing and $1/b \in [1/\overline{y}, 1/\underline{y}]$.
  • If $\overline{y} < 0$, then $g$ is monotonically decreasing and $1/b \in [1/\overline{y}, 1/\underline{y}]$.

In either case $\{1/b \mid b \in [\underline{y}, \overline{y}]\} = [1/\overline{y}, 1/\underline{y}]$, so division reduces to multiplication by the interval of reciprocals. $\square$

On Division by Zero

When $0 \in [\underline{y}, \overline{y}]$, the reciprocal $1/b$ blows up as $b \to 0$, so in ordinary interval arithmetic division is left undefined. Extended interval arithmetic introduces extended intervals containing $[-\infty, +\infty]$ to handle this case. For instance, one sets $[1,2] / [-1, 1] = [-\infty, +\infty]$, returning the whole real line. More precisely, when $\underline{y} = 0$ and $\overline{y} > 0$ one may return a half-infinite interval such as $[1,2] / [0, 1] = [1, +\infty]$.

Concrete Numerical Examples

Example: Addition

$$[1.5, 2.5] + [3.0, 4.0] = [1.5 + 3.0,\; 2.5 + 4.0] = [4.5, 6.5]$$

Example: Multiplication

Let $\mathbf{x} = [1.5, 2.5]$, $\mathbf{y} = [3.0, 4.0]$. Both intervals are positive, so

$$[1.5, 2.5] \times [3.0, 4.0] = [1.5 \times 3.0,\; 2.5 \times 4.0] = [4.5, 10.0].$$

Example: Multiplication with Mixed Signs

Let $\mathbf{x} = [-1, 2]$, $\mathbf{y} = [-3, 4]$. The four products of endpoints are:

$$S = \{(-1)(-3),\; (-1)(4),\; (2)(-3),\; (2)(4)\} = \{3, -4, -6, 8\}$$ $$[-1, 2] \times [-3, 4] = [\min S, \max S] = [-6, 8]$$

The Inclusion Theorem of Interval Arithmetic

Theorem (Fundamental Theorem of Interval Arithmetic: Inclusion)

Let $f(x_1, \ldots, x_n)$ be a rational expression in the variables $x_1, \ldots, x_n$ (an expression built solely from the four basic operations). Let $\mathbf{f}(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ denote the interval extension of $f$ obtained by replacing every operation by the corresponding interval operation.

If $x_i \in \mathbf{x}_i$ for each $i = 1, \ldots, n$ (and no division by zero occurs), then

$$f(x_1, \ldots, x_n) \in \mathbf{f}(\mathbf{x}_1, \ldots, \mathbf{x}_n).$$

Proof

We argue by induction on the structure of $f$.

Base case: For $f(x_i) = x_i$ the inclusion $x_i \in \mathbf{x}_i$ is immediate. For $f = c$ (a constant), $c \in [c, c]$ is also immediate.

Inductive step: For $f = g \circ h$ ($\circ \in \{+,-,\times,/\}$), the induction hypothesis gives $g(x_1, \ldots, x_n) \in \mathbf{g}(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ and $h(x_1, \ldots, x_n) \in \mathbf{h}(\mathbf{x}_1, \ldots, \mathbf{x}_n)$. By the definition of interval arithmetic, $a \in \mathbf{a}$ and $b \in \mathbf{b}$ imply $a \circ b \in \mathbf{a} \circ \mathbf{b}$, so

$$f(x_1, \ldots, x_n) = g(\ldots) \circ h(\ldots) \in \mathbf{g}(\ldots) \circ \mathbf{h}(\ldots) = \mathbf{f}(\mathbf{x}_1, \ldots, \mathbf{x}_n).$$

$\square$

Remark: Meaning of Inclusion

The inclusion property guarantees that the true value is always contained in the resulting interval, but it does not in general guarantee that the resulting interval is the smallest (tightest) possible. When the same variable occurs several times, each occurrence is treated independently, so the resulting interval can be larger than the true range of values. This is the "dependency problem", to be discussed in the next section.

The Dependency Problem

While interval arithmetic guarantees inclusion, it suffers from an intrinsic overestimation problem. This phenomenon is called the dependency problem (or wrapping effect), and arises because interval arithmetic treats multiple occurrences of the same variable as independent.

Concrete Examples

Example 1: Overestimation of $x - x$

For $x \in [0, 1]$ we actually have $x - x = 0$ for every $x$. In interval arithmetic, however,

$$[0, 1] - [0, 1] = [0 - 1,\; 1 - 0] = [-1, 1].$$

The resulting interval has width $2$ and grossly overestimates the true range $\{0\}$. The reason is that in the subtraction $\mathbf{x} - \mathbf{y}$ the information "$\mathbf{x}$ and $\mathbf{y}$ are the same variable" is lost, so the combination of taking the maximum of $\mathbf{x}$ and the minimum of $\mathbf{y}$ ($1 - 0 = 1$) is admitted.

Example 2: Result Depends on the Form of the Expression

For $x \in [2, 3]$, consider $f(x) = x^2 - 2x + 1 = (x-1)^2$. The true range is $[1, 4]$.

Direct evaluation ($x$ occurs three times):

$$\begin{align} [2,3]^2 - 2 \cdot [2,3] + 1 &= [4, 9] - [4, 6] + [1, 1] \\ &= [4-6,\; 9-4] + [1, 1] \\ &= [-2, 5] + [1, 1] \\ &= [-1, 6] \end{align}$$

The result $[-1, 6]$ has width $7$ and overestimates the true range $[1, 4]$.

Evaluation after rearrangement ($x$ occurs only once):

$$([2,3] - [1,1])^2 = [1, 2]^2 = [1, 4]$$

This coincides with the true range $[1, 4]$. Reducing the number of occurrences of $x$ to one eliminates the dependency problem entirely.

Root Cause

The fundamental cause of the dependency problem is that interval arithmetic treats each occurrence of each variable as an independent variable. Mathematically, interval arithmetic faithfully computes the set operation

$$\mathbf{x} \circ \mathbf{y} = \{ x \circ y \mid x \in \mathbf{x}, \, y \in \mathbf{y} \}$$

but has no mechanism to incorporate the constraint "$x$ and $y$ are in fact the same value of the same variable". The more occurrences of the same variable in an expression, the larger the overestimation tends to be.

Remedies

1. Rewriting the Expression

As shown in Example 2 above, transforming the expression into a mathematically equivalent form that reduces the number of variable occurrences can mitigate overestimation. In particular, rewriting $f(x) = x^2 - 2x + 1$ as $f(x) = (x-1)^2$ makes $x$ appear only once, eliminating the dependency problem. However, such rewriting is not always possible for every expression.

2. Affine Arithmetic

Definition (Affine Form)

An affine form is an expression of the type

$$\hat{x} = x_0 + x_1 \varepsilon_1 + x_2 \varepsilon_2 + \cdots + x_n \varepsilon_n$$

where $x_0 \in \mathbb{R}$ is the central value, $x_i \in \mathbb{R}$ ($i \geq 1$) are the deviation coefficients, and $\varepsilon_i \in [-1, 1]$ are independent variables called noise symbols.

The key idea of affine arithmetic is that shared noise symbols across different quantities encode their mutual dependence. For example, if $\hat{x} = 5 + 2\varepsilon_1$ and $\hat{y} = \hat{x}$, then $\hat{x} - \hat{y} = (5 + 2\varepsilon_1) - (5 + 2\varepsilon_1) = 0$, so $x - x$, which is overestimated in ordinary interval arithmetic, is correctly computed as $0$.

The Four Operations of Affine Arithmetic

The rules for $\hat{x} = x_0 + \displaystyle\sum_i x_i \varepsilon_i$ and $\hat{y} = y_0 + \displaystyle\sum_i y_i \varepsilon_i$ are as follows.

Operation Resulting Affine Form Accuracy
$\hat{x} + \hat{y}$ $(x_0 + y_0) + \displaystyle\sum_i (x_i + y_i)\varepsilon_i$ Exact (no new symbol needed)
$\alpha \hat{x} + \beta$ $(\alpha x_0 + \beta) + \displaystyle\sum_i \alpha x_i \varepsilon_i$ Exact
$\hat{x} \cdot \hat{y}$ $x_0 y_0 + \displaystyle\sum_i (x_0 y_i + y_0 x_i)\varepsilon_i + z_{n+1}\varepsilon_{n+1}$ Approximate (the quadratic part is collected into a new symbol)

In multiplication the quadratic term $\displaystyle\sum_{i,j} x_i y_j \varepsilon_i \varepsilon_j$ cannot be expressed as an affine form, so its absolute value is bounded above by $z_{n+1} = \displaystyle\sum_i |x_i| \cdot \displaystyle\sum_j |y_j|$ and absorbed into a new noise symbol $\varepsilon_{n+1}$.

Affine Arithmetic vs Interval Arithmetic

Compute $f(x) = x^2 - x$ for $x \in [2, 4]$.

  • Interval arithmetic: $[2,4]^2 - [2,4] = [4,16] - [2,4] = [0, 14]$ (an overestimate of the true range $[2, 12]$).
  • Affine arithmetic: put $\hat{x} = 3 + \varepsilon_1$. Then $\hat{x}^2 \approx 9 + 6\varepsilon_1 + \varepsilon_2$ (with $|\varepsilon_2| \leq 1$), so $\hat{x}^2 - \hat{x} = 6 + 5\varepsilon_1 + \varepsilon_2 \in [0, 12]$, which is more accurate.

Limitations and Refinements of Affine Arithmetic

Affine arithmetic tracks linear dependence exactly, but in nonlinear operations a new noise symbol is introduced, so the number of symbols grows in long computations. Improved variants such as Modified Affine Arithmetic (MAA) and Generalised Affine Arithmetic (GAA) have been proposed, and they improve symbol management in nonlinear operations.

3. Taylor Models

A Taylor model represents the value of a function as a pair consisting of a Taylor polynomial and an interval remainder:

$$f(x) \in T_k(x) + \mathbf{r}$$

where $T_k(x)$ is a Taylor polynomial of degree $k$ and $\mathbf{r}$ is an interval enclosing the remainder. The polynomial part captures the behaviour of the function precisely, while only the remainder is enclosed by an interval, drastically suppressing overestimation compared to plain interval arithmetic. The framework was systematised by Makino & Berz (1996) and is applied in beam physics and celestial mechanics.

Operations on Taylor Models

For Taylor models $\mathcal{T}_f = (T_f, \mathbf{r}_f)$ and $\mathcal{T}_g = (T_g, \mathbf{r}_g)$, the operations are as follows.

  • Addition: $\mathcal{T}_f + \mathcal{T}_g = (T_f + T_g,\; \mathbf{r}_f + \mathbf{r}_g)$.
  • Multiplication: keep the terms of $T_f \cdot T_g$ of degree $\leq k$ in the polynomial part, and add a range enclosing the terms of degree $\geq k+1$ to the interval remainder.
  • Composition: compute the Taylor expansion of $f(g(x))$ up to degree $k$ via automatic differentiation, and enclose the remainder in an interval.
Method Order of Overestimation Computational Cost Typical Use
Interval arithmetic $O(w(\mathbf{x}))$ Minimal Basic operations on narrow intervals
Affine arithmetic $O(w(\mathbf{x})^2)$ Moderate Highly linear computations
Degree-$k$ Taylor model $O(w(\mathbf{x})^{k+1})$ $O(n^k)$ Long-time integration of ODEs, global optimisation

Applications of Taylor Models

  • Verified solution of ODEs: update the Taylor model step by step to guarantee existence and uniqueness of the solution while integrating over long times (COSY INFINITY, CAPD).
  • Global optimisation: efficiently compute lower bounds of the objective from the polynomial part of the Taylor model to accelerate branch-and-bound methods.
  • Celestial mechanics: propagate orbital uncertainties of asteroids via Taylor models to rigorously evaluate collision probabilities.

4. Mean Value Form

Mean Value Form

If $f$ is differentiable and $m = \text{mid}(\mathbf{x})$, then

$$f(\mathbf{x}) \subseteq f(m) + f'(\mathbf{x}) \cdot (\mathbf{x} - m).$$

The right-hand side is computed by ordinary interval arithmetic. When the width of $\mathbf{x}$ is small, the width of $(\mathbf{x} - m)$ is about $w(\mathbf{x})/2$, so the result is often tighter than the direct interval evaluation.

Justification

By the mean value theorem, for every $x \in \mathbf{x}$ there exists some $\xi \in \mathbf{x}$ such that $f(x) = f(m) + f'(\xi)(x - m)$. Since $\xi \in \mathbf{x}$ we have $f'(\xi) \in f'(\mathbf{x})$, and since $x - m \in \mathbf{x} - m$,

$$f(x) = f(m) + f'(\xi)(x - m) \in f(m) + f'(\mathbf{x}) \cdot (\mathbf{x} - m).$$

$\square$

The Interval Newton Method

Review of the Ordinary Newton Method

The Newton method for finding a root $x^*$ of $f(x^*) = 0$ starts from an initial value $x_0$ and iterates

$$x_{n+1} = x_n - \dfrac{f(x_n)}{f'(x_n)}.$$

The classical Newton method converges quickly (quadratically), but has the following limitations:

  • It may not converge unless the initial value is sufficiently close to a root.
  • Even when it converges, it can miss other roots.
  • It provides no facility for proving the existence of a root.

The Interval Newton Operator

Definition (Interval Newton Operator)

Let $f$ be continuously differentiable, $\mathbf{x} = [\underline{x}, \overline{x}]$, and $m = \text{mid}(\mathbf{x})$. The interval Newton operator is defined by

$$N(\mathbf{x}) = m - \dfrac{f(m)}{f'(\mathbf{x})}$$

where $f'(\mathbf{x})$ is the interval evaluation of $f'$ on $\mathbf{x}$ and the division is interval division.

The iteration consists of intersecting this operator with the original interval:

$$\mathbf{x}_{\text{new}} = \mathbf{x} \cap N(\mathbf{x}).$$

Three Possible Outcomes

The interval Newton operator produces one of three possible outcomes, and this is where the real power of the method lies.

Theorem (Interval Newton Decision Test)

Let $f$ be continuously differentiable on $\mathbf{x}$ and assume $0 \notin f'(\mathbf{x})$. Then:

  1. If $N(\mathbf{x}) \cap \mathbf{x} = \emptyset$: no root of $f$ exists in $\mathbf{x}$ (verified non-existence).
  2. If $N(\mathbf{x}) \subseteq \text{int}(\mathbf{x})$ (strict containment): there exists exactly one root of $f$ in $\mathbf{x}$ (verified existence and uniqueness).
  3. Otherwise: existence of a root is undetermined, but the interval is shrunk.

Sketch of Proof

Case 1 (non-existence):

Assume that $f(x^*) = 0$ with $x^* \in \mathbf{x}$. By the mean value theorem, there exists $\xi \in \mathbf{x}$ with

$$f(m) = f(m) - f(x^*) = f'(\xi)(m - x^*),$$ $$x^* = m - \dfrac{f(m)}{f'(\xi)}.$$

Since $\xi \in \mathbf{x}$ we have $f'(\xi) \in f'(\mathbf{x})$, so $x^* \in m - f(m)/f'(\mathbf{x}) = N(\mathbf{x})$. At the same time $x^* \in \mathbf{x}$, hence $x^* \in N(\mathbf{x}) \cap \mathbf{x}$. Taking the contrapositive, $N(\mathbf{x}) \cap \mathbf{x} = \emptyset$ implies that no root of $f$ exists in $\mathbf{x}$.

Case 2 (existence and uniqueness):

When $N(\mathbf{x}) \subseteq \text{int}(\mathbf{x})$, one shows that the mapping $g(x) = x - f(x)/f'(c)$ (with $c$ a suitable representative of $f'(\mathbf{x})$) is a contraction from $\mathbf{x}$ into $\mathbf{x}$. By Brouwer's fixed point theorem (or the contraction mapping principle), $\mathbf{x}$ contains exactly one fixed point, that is, exactly one root of $f$. $\square$

The Power of the Interval Newton Method

The ordinary Newton method is merely a "numerical procedure" that converges to a root, whereas the interval Newton method is a mathematical proof tool capable of proving existence and proving non-existence of roots. This is an exceptional property in numerical analysis, and the method has become a cornerstone of computer-assisted proof.

Worked Example

Example: Finding the Root of $f(x) = x^2 - 2$ in $[1, 2]$

Here $f(x) = x^2 - 2$ and $f'(x) = 2x$. Take the initial interval $\mathbf{x} = [1, 2]$.

Iteration 1:

  • $m = \text{mid}([1, 2]) = 1.5$
  • $f(m) = f(1.5) = 2.25 - 2 = 0.25$
  • $f'(\mathbf{x}) = 2 \cdot [1, 2] = [2, 4]$
  • $N([1, 2]) = 1.5 - \dfrac{0.25}{[2, 4]} = 1.5 - [0.0625, 0.125] = [1.375, 1.4375]$

Since $[1.375, 1.4375] \subset [1, 2]$ (strict containment), case 2 of the theorem applies, and the existence of exactly one root of $f$ in $[1, 2]$ is proved.

Indeed, $\sqrt{2} \approx 1.41421356\ldots$ lies in $[1.375, 1.4375]$.

Iteration 2: continue with $\mathbf{x} = [1.375, 1.4375]$.

  • $m = 1.40625$
  • $f(m) = 1.40625^2 - 2 = 1.97754 - 2 = -0.02246$
  • $f'(\mathbf{x}) = 2 \cdot [1.375, 1.4375] = [2.75, 2.875]$
  • $N(\mathbf{x}) = 1.40625 - \dfrac{-0.02246}{[2.75, 2.875]} = 1.40625 + [0.00781, 0.00817] = [1.41406, 1.41442]$

The width has shrunk from $0.0625$ to $0.00036$, rapidly converging to $\sqrt{2}$.

Combination with the Bisection Method

In practice the interval Newton method is combined with bisection. One splits the search interval in two and applies the interval Newton method to each subinterval:

  1. Bisect the initial interval $\mathbf{x}$ into $\mathbf{x}_L$ and $\mathbf{x}_R$.
  2. Compute the interval Newton operator on each subinterval.
  3. If $N(\mathbf{x}_i) \cap \mathbf{x}_i = \emptyset$, discard the subinterval (non-existence has been proved).
  4. If $N(\mathbf{x}_i) \subseteq \text{int}(\mathbf{x}_i)$, conclude that the subinterval contains exactly one root.
  5. Otherwise, bisect further and recurse.

This procedure finds all roots in a given interval without missing any, and the existence of each root is proved mathematically. It enables a global and rigorous root search that is not available with the ordinary Newton method.

Interval Extensions of Elementary Functions

In addition to the four basic operations, one needs to apply elementary functions (powers, exponentials and logarithms, trigonometric functions, etc.) to intervals. The underlying principle is the same: compute the smallest interval enclosing the range of the function. For monotonic functions only endpoint evaluation is required, while for non-monotonic functions (such as trigonometric functions) one must take account of extrema inside the interval.

The Monotonic Case

Theorem (Interval Extension of a Monotonic Function)

If $f$ is monotonically increasing on $[\underline{x}, \overline{x}]$, then

$$f([\underline{x}, \overline{x}]) = [f(\underline{x}), f(\overline{x})].$$

If $f$ is monotonically decreasing, then

$$f([\underline{x}, \overline{x}]) = [f(\overline{x}), f(\underline{x})].$$

The theorem follows immediately from monotonicity together with the intermediate value theorem. The exponential $\exp$, logarithm $\log$, and hyperbolic functions such as $\sinh$ are monotonic on their entire domains, so the smallest enclosing interval is obtained from endpoint evaluation alone.

Example: Interval Extension of the Exponential

Since $\exp$ is monotonically increasing,

$$\exp([1, 2]) = [\exp(1), \exp(2)] = [2.71828\ldots, 7.38905\ldots].$$

In a floating-point implementation, the lower endpoint is rounded towards $-\infty$ and the upper endpoint towards $+\infty$.

The Power $x^n$

Interval Power

The computation of an integer power $[\underline{x}, \overline{x}]^n$ depends on the parity of $n$ and on the sign of the interval:

  • If $n$ is odd: $x^n$ is monotonically increasing, so the result is $[\underline{x}^n, \overline{x}^n]$.
  • If $n$ is even: $x^n$ attains its minimum at $x = 0$, so the following case analysis is needed:
    • $\underline{x} \geq 0$ (entirely non-negative): $[\underline{x}^n, \overline{x}^n]$.
    • $\overline{x} \leq 0$ (entirely non-positive): $[\overline{x}^n, \underline{x}^n]$.
    • $\underline{x} < 0 < \overline{x}$ (straddling zero): $[0, \max(\underline{x}^n, \overline{x}^n)]$.

Example: $[-1, 2]^2$ vs $[-1, 2] \times [-1, 2]$

Interpreted as the power $x^2$ ($x$ identical in the two factors):

$$[-1, 2]^2 = [0, \max((-1)^2, 2^2)] = [0, 4]$$

Interpreted as the multiplication $\mathbf{x} \times \mathbf{x}$ (ignoring dependency):

$$[-1, 2] \times [-1, 2] = [\min\{1, -2, -2, 4\}, \max\{1, -2, -2, 4\}] = [-2, 4]$$

The power formula matches the true range $[0, 4]$, while the multiplication formula does not. This is a textbook instance of the dependency problem caused by treating the two occurrences of $x$ in $x \times x$ as independent. Interval arithmetic libraries therefore implement $x^2$ via a dedicated power function rather than multiplication, to avoid this overestimation.

Trigonometric Functions

Trigonometric functions are non-monotonic, so their interval extension requires more elaborate processing. We explain the case of $\sin$.

Interval Extension of $\sin$

To compute $\sin([\underline{x}, \overline{x}])$, one must check whether the interval $[\underline{x}, \overline{x}]$ contains an extremum of $\sin$ (a point $x = \pi/2 + k\pi$, $k \in \mathbb{Z}$):

  1. Check whether $[\underline{x}, \overline{x}]$ contains a local maximum ($\sin = 1$).
  2. Check whether $[\underline{x}, \overline{x}]$ contains a local minimum ($\sin = -1$).
  3. If no extremum is contained, the decision can be made from the endpoint values alone ($\sin$ is monotonic on the relevant subinterval).
  4. The upper endpoint is $\max(\sin\underline{x}, \sin\overline{x})$ (and $1$ if a local maximum is contained).
  5. The lower endpoint is $\min(\sin\underline{x}, \sin\overline{x})$ (and $-1$ if a local minimum is contained).

Example: $\sin([0, \pi])$

The interval $[0, \pi]$ contains the local maximum at $x = \pi/2$. The endpoint values are $\sin(0) = 0$ and $\sin(\pi) = 0$, and the value at the maximum is $\sin(\pi/2) = 1$. Therefore

$$\sin([0, \pi]) = [\min(0, 0), 1] = [0, 1].$$

If one used only the endpoints, the incorrect result $[\min(0,0), \max(0,0)] = [0, 0]$ would be obtained. This shows that detecting extrema inside the interval is essential.

Implementation Note

In the interval extension of trigonometric functions, the position of extrema depends on the value of $\pi$, which is irrational and cannot be represented exactly. The implementation must therefore use an interval enclosure $[\underline{\pi}, \overline{\pi}]$ of $\pi$ for a safe decision. This "$\pi$ problem" is one of the important technical challenges in implementing interval arithmetic libraries.

Interval Matrix Arithmetic

The extension of interval arithmetic from scalars to vectors and matrices is the foundation of many applications, including the verified solution of linear systems (Chapter 11). An interval matrix is a matrix whose entries are intervals.

Definition (Interval Vector and Interval Matrix)

An interval vector $[\mathbf{x}] = ([\underline{x}_i, \overline{x}_i])_{i=1}^n \in \mathbb{IR}^n$ is a vector whose entries are intervals. Geometrically, it represents an axis-aligned box in $\mathbb{R}^n$.

An interval matrix $[\mathbf{A}] = ([\underline{a}_{ij}, \overline{a}_{ij}])_{i,j} \in \mathbb{IR}^{m \times n}$ is a matrix whose entries are intervals.

Operations on Interval Matrices

Addition and multiplication of interval matrices are obtained by decomposing the ordinary matrix operations into scalar operations and replacing each scalar operation by an interval operation.

Interval Matrix Multiplication

For $[\mathbf{A}] \in \mathbb{IR}^{m \times p}$ and $[\mathbf{B}] \in \mathbb{IR}^{p \times n}$:

$$([\mathbf{A}][\mathbf{B}])_{ij} = \displaystyle\sum_{k=1}^{p} [\mathbf{A}]_{ik} \cdot [\mathbf{B}]_{kj}$$

Each entry is computed as an interval sum of $p$ interval products. Inclusion follows automatically from the scalar case.

Overestimation

As in the scalar case, interval matrix multiplication suffers from overestimation due to the dependency problem. The inner product $\displaystyle\sum_{k} a_{ik} b_{kj}$ is computed term by term, so the overestimation accumulates.

Multiplication of $n \times n$ interval matrices costs $O(n^3)$ interval multiplications and additions, roughly four times the cost of ordinary matrix multiplication (because each interval operation requires two floating-point operations).

Interval Linear Systems

For an interval matrix $[\mathbf{A}] \in \mathbb{IR}^{n \times n}$ and an interval vector $[\mathbf{b}] \in \mathbb{IR}^n$, consider the solution set

$$\Sigma([\mathbf{A}], [\mathbf{b}]) = \{ \mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{b} \text{ for some } A \in [\mathbf{A}], \, \mathbf{b} \in [\mathbf{b}] \}.$$

Structure of the Solution Set

In general the solution set $\Sigma$ is not convex, so enclosing it by an interval vector (box) introduces overestimation. The Oettli–Prager theorem (1964) gives the following characterisation of the solution set:

Theorem (Oettli–Prager, 1964)

Let $A_c = \text{mid}([\mathbf{A}])$, $\Delta = \text{rad}([\mathbf{A}])$, $b_c = \text{mid}([\mathbf{b}])$, $\delta = \text{rad}([\mathbf{b}])$. Then $\mathbf{x} \in \Sigma([\mathbf{A}], [\mathbf{b}])$ if and only if

$$|A_c \mathbf{x} - b_c| \leq \Delta |\mathbf{x}| + \delta$$

(the inequality holding component-wise). Here $|A_c \mathbf{x} - b_c|$ is the component-wise absolute value of the residual.

The theorem shows that the solution set is characterised by a residual condition within the uncertainty bounds of the matrix and the right-hand side. The direct computation of the smallest enclosing interval for the solution set of an interval linear system is known to be NP-hard, and in practice the Krawczyk and Rump methods discussed in Chapter 11 are used.

Constraint Propagation

Constraint propagation is a technique used in combination with interval arithmetic, which contracts intervals using equality and inequality constraints. It plays an essential role in global optimisation, the solution of nonlinear equations, and robust control design.

Basic Idea

Suppose the constraint $z = x + y$ is given for variables $x, y, z$, and the current intervals are $x \in [1, 5]$, $y \in [2, 6]$, $z \in [4, 8]$. Forward evaluation by interval arithmetic gives $z \in [1+2, 5+6] = [3, 11]$, and intersecting with the known interval $[4, 8]$ contracts to $z \in [4, 8]$.

Furthermore, one can use $z \in [4, 8]$ and $y \in [2, 6]$ to contract the interval of $x$ in the reverse direction. From $x = z - y$, $x \in [4-6, 8-2] = [-2, 6]$, and intersecting with the known $x \in [1, 5]$ gives $x \in [1, 5]$ (no contraction in this case). Similarly, from $y = z - x$, $y \in [4-5, 8-1] = [-1, 7] \cap [2, 6] = [2, 6]$.

Constraint Propagation Algorithm

Given a set of constraints $\{c_1, c_2, \ldots, c_m\}$ and initial intervals $[\mathbf{x}_1], \ldots, [\mathbf{x}_n]$ of the variables:

  1. Enqueue all constraints.
  2. Dequeue a constraint $c_k$ and contract the intervals of the involved variables in both the forward and reverse directions.
  3. If any variable's interval has been contracted, enqueue every other constraint involving that variable.
  4. Stop when the queue is empty or when the contractions fall below a threshold.
  5. If any variable's interval becomes empty, the constraint system is declared infeasible (no solution).

Contraction Rules for the Basic Operations

Reverse Propagation for the Four Operations

Contraction rules for each variable in the constraint $z = x \circ y$:

Constraint Contraction of $z$ Contraction of $x$ Contraction of $y$
$z = x + y$ $[z] \cap ([x] + [y])$ $[x] \cap ([z] - [y])$ $[y] \cap ([z] - [x])$
$z = x \times y$ $[z] \cap ([x] \times [y])$ $[x] \cap ([z] / [y])$ $[y] \cap ([z] / [x])$
$z = x^2$ $[z] \cap [x]^2$ $[x] \cap ([\pm\sqrt{\underline{z}}, \pm\sqrt{\overline{z}}])$ (taking signs into account)

When the reverse propagation of multiplication uses division, note that extended interval division is needed if $0 \in [y]$.

Application: Accelerating Global Optimisation

Example: Contracting the Search Space by Constraint Propagation

Problem: minimise $f(x, y) = x^2 + y^2$ subject to $x + y \geq 3$, $x \in [0, 5]$, $y \in [0, 5]$.

Constraint propagation: from $x + y \geq 3$ we get $x \geq 3 - y \geq 3 - 5 = -2$ (no change, given $x \geq 0$); similarly $y \geq 3 - x \geq 3 - 5 = -2$ (no change).

Suppose an upper bound $f^*_{\text{UB}}$ of $10$ for the objective is known. Then $x^2 \leq 10$ gives $x \in [0, \sqrt{10}] \cap [0, 5] = [0, 3.162]$, and similarly $y \in [0, 3.162]$.

Combining $x + y \geq 3$ with $y \leq 3.162$ yields $x \geq 3 - 3.162 = -0.162$ (no change), and $x \leq 3.162$ yields $y \geq 3 - 3.162 = -0.162$ (no change).

The search space has been reduced from $[0,5]^2$ (area $25$) to $[0, 3.162]^2$ (area $10$). Combined with an interval branch-and-bound algorithm, this significantly accelerates the search.

The HC4 Algorithm

HC4 (Benhamou et al., 1999) is a practical constraint propagation algorithm. It alternates forward evaluation and reverse contraction along the syntax tree (DAG) of the constraint expression. Each constraint can be propagated once in $O(k)$ time (where $k$ is the size of the constraint expression), and the procedure is iterated until a fixed point is reached. It is widely implemented as the core of constraint satisfaction problem (CSP) solvers.

Directed Rounding

Implementation Challenges

To ensure the inclusion property of interval arithmetic rigorously on a computer, one must control the direction of rounding when representing the endpoints of an interval as floating-point numbers. For example, the true value of $[1, 1] / [3, 3]$ is $[1/3, 1/3]$, but $1/3$ cannot be represented exactly in a finite-precision floating-point format. If the ordinary "round to nearest" mode is used, the lower and upper endpoints may be rounded to the same floating-point number, and the true value $1/3$ may fail to lie in the resulting interval.

Definition (Directed Rounding)

Interval arithmetic implementations use the following directed rounding:

  • For the lower endpoint: round towards $-\infty$ (round toward $-\infty$, floor): $$\underline{z} = \text{RD}(f(\underline{x}, \overline{x}, \underline{y}, \overline{y}))$$
  • For the upper endpoint: round towards $+\infty$ (round toward $+\infty$, ceiling): $$\overline{z} = \text{RU}(g(\underline{x}, \overline{x}, \underline{y}, \overline{y}))$$

Here $\text{RD}$ denotes rounding down and $\text{RU}$ rounding up. This guarantees that the floating-point interval $[\underline{z}, \overline{z}]$ encloses the true result interval.

IEEE 754 and the Rounding Modes

The IEEE 754 floating-point standard defines four rounding modes:

  1. Round to nearest, ties to even: the default rounding mode.
  2. Round toward $+\infty$: used for upper endpoints of intervals.
  3. Round toward $-\infty$: used for lower endpoints of intervals.
  4. Round toward zero: truncation.

Interval arithmetic mainly uses modes 2 and 3.

Hardware Support

Most modern CPUs are able to switch rounding modes by modifying the control word of the FPU (floating-point unit). For example, on the x86 architecture, the rounding mode can be controlled via the MXCSR register of the SSE.

However, switching the rounding mode incurs overhead, and frequent switching can stall the pipeline. Practical interval arithmetic libraries therefore apply optimisations such as:

  • Compute all lower endpoints together (in round-toward-$-\infty$ mode), then all upper endpoints together (in round-toward-$+\infty$ mode).
  • Alternatively, perform all computations in round-up mode and obtain the lower endpoints by sign reversal (using $\text{RD}(a + b) = -\text{RU}(-a - b)$).

Software-Based Alternatives

When hardware control of the rounding mode is unavailable, or when greater portability is required, software-based techniques can be used:

  • 1 ULP addition/subtraction: add or subtract 1 ULP (unit in the last place) from the result to guarantee inclusion. Subtract 1 ULP from the lower endpoint and add 1 ULP to the upper endpoint. This introduces overestimation but preserves correctness.
  • Integration with MPFR: the MPFR library guarantees correctly rounded results in arbitrary-precision floating-point arithmetic and lets one specify the rounding mode per operation. The interval library mpfi uses MPFR as a back end to provide arbitrary-precision interval arithmetic.

Visualisation of Interval Arithmetic

Inclusion of values by interval arithmetic Addition: [1.5, 2.5] + [3.0, 4.0] = [4.5, 6.5] x y x+y 1.5 2.5 3.0 4.0 + 4.5 6.5 x+y is guaranteed to lie in this interval Inclusion guarantee x ∈ [a,b], y ∈ [c,d] ⇒ x+y ∈ [a+c, b+d] Dependency problem: x - x (x ∈ [0, 1]) True range {0} Interval result -1 1 Overestimation

Interval Arithmetic Libraries

Implementing interval arithmetic correctly requires considerable technical effort, including control of directed rounding and rigorous interval extensions of many elementary functions. We list below some of the major interval arithmetic libraries.

Arb

A ball arithmetic library developed by Fredrik Johansson. Intervals are represented as a pair of midpoint and radius $x = (m, r)$, where $m$ is an arbitrary-precision floating-point number and $r$ is a fixed-precision positive floating-point number. The true value is contained in $[m - r, m + r]$.

This representation is symmetric and in many cases more efficient than an endpoint representation. Arb offers a wide range of features, including ball arithmetic for complex numbers, polynomials, matrices, and special functions, and is integrated with SageMath and Nemo (the computer algebra system in Julia). It is implemented in C and uses GMP/MPFR/FLINT as back ends.

INTLAB

A verified computing toolbox for MATLAB/Octave developed by Siegfried M. Rump. In addition to interval arithmetic, it provides automatic differentiation, verified solution of linear and nonlinear equations, eigenvalue enclosures, and other facilities for verified numerical computation. It is widely used both in teaching and in research.

Boost.Interval (C++)

An interval arithmetic library provided as part of the Boost C++ libraries. A template-based design allows the underlying type (float, double, etc.) to be chosen flexibly. The rounding policy is also a template parameter, making it a standard option for interval arithmetic in C++.

filib++

A C++ library designed for fast interval arithmetic. It exploits IEEE 754 rounding mode switching efficiently and provides fast interval evaluations of the four basic operations together with elementary functions. It is used in applications where performance is critical.

mpfi

An arbitrary-precision interval arithmetic library built on MPFR. It directly exploits MPFR's correctly rounded operations and directed rounding modes to provide arbitrary-precision interval arithmetic. Each endpoint of an interval is represented as an MPFR arbitrary-precision floating-point number. It is well-suited to computations requiring high precision (rigorous enclosures of constants, evaluation of remainder terms in series, etc.).

Library Selection Guide

  • Arbitrary precision required: Arb or mpfi.
  • MATLAB/Octave environment: INTLAB.
  • Fast double-precision arithmetic in C++: Boost.Interval or filib++.
  • Broad mathematical functionality (special functions, polynomial operations, etc.): Arb.
  • Verified numerical computation (linear systems, eigenvalue problems, etc.): INTLAB.

Summary

  • History and motivation: proposed by Moore in the 1960s. A framework for automatically and rigorously tracking round-off error in floating-point arithmetic.
  • Definition of interval arithmetic: extends the four basic operations to intervals; the inclusion theorem guarantees that the true value lies in the resulting interval.
  • Dependency problem: multiple occurrences of the same variable cause overestimation. Remedies include rewriting the expression, affine arithmetic, Taylor models, and the mean value form.
  • Interval Newton method: can rigorously prove existence and non-existence of roots, based on Brouwer's fixed point theorem. Combined with bisection it enables global root search.
  • Interval extensions of elementary functions: monotonic functions reduce to endpoint evaluation, whereas $x^n$ and trigonometric functions require detection of extrema within the interval. Implementing $x^2$ as a dedicated function avoids the dependency-based overestimation.
  • Interval matrix arithmetic: the solution set of an interval linear system is in general non-convex and is characterised by the Oettli–Prager theorem. Computing the smallest enclosing interval is NP-hard, but the Krawczyk and Rump methods give practical enclosures.
  • Constraint propagation: contracts intervals in both directions using equality and inequality constraints. Indispensable for accelerating global optimisation and constraint satisfaction problems.
  • Directed rounding: exploits the IEEE 754 rounding modes to guarantee inclusion in a floating-point environment. Integration with MPFR is ideal.
  • Major libraries: Arb (ball arithmetic), INTLAB (MATLAB), Boost.Interval (C++), filib++ (performance-oriented), mpfi (arbitrary precision).