Float Trigonometric Function Algorithms

Overview

The calx Float type provides the following 10 trigonometric and hyperbolic functions:

| Category | Functions |
|---|---|
| Trigonometric | sin, cos, tan |
| Inverse trigonometric | asin, acos, atan, atan2 |
| Hyperbolic | sinh, cosh, tanh |

The fundamental strategy is a three-stage pipeline common to all functions:

  1. Argument reduction: shrink the input to a small range where the Taylor series converges quickly
  2. Taylor series: evaluate the series using Paterson-Stockmeyer evaluation
  3. Doubling reconstruction: repeatedly apply double-angle formulas to restore the original scale

Dispatch is performed based on input precision, using fast paths for low precision:

| Precision | Method |
|---|---|
| $\leq 19$ digits (64-bit) | Q128 fixed-point fast path |
| $\leq 38$ digits (128-bit) | Q256 fixed-point or lightweight Float |
| $> 38$ digits | RawFloat + Paterson-Stockmeyer Taylor |

Inverse trigonometric functions reduce to atan:

  • $\operatorname{asin}(x) = \operatorname{atan}\!\bigl(x / \sqrt{1 - x^2}\bigr)$
  • $\operatorname{acos}(x) = \pi/2 - \operatorname{asin}(x)$

sin and cos

Argument Reduction

The input $x$ is reduced to the range $[0, \pi/2]$:

  1. Reduction by $2\pi$: $x \bmod 2\pi$ brings $x$ into $[0, 2\pi)$
  2. Quadrant determination: $[0, 2\pi)$ is divided into 4 quadrants to determine sign and function correspondence
  3. Reduction to $[0, \pi/2]$: symmetry is exploited to project into $[0, \pi/2]$

The quadrant also determines whether to use the sin Taylor series or the cos Taylor series. In the 1st and 3rd quadrants, the sin Taylor series is used; in the 2nd and 4th quadrants, the cos Taylor series is used.
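In fixed double precision the three reduction steps can be sketched as follows. The helper name `sin_reduced` is illustrative, and `std::sin`/`std::cos` stand in for the Taylor evaluators that the real code dispatches to:

```cpp
#include <cassert>
#include <cmath>

// Sketch of quadrant-based reduction for sin, in double precision.
// Assumes x >= 0; calx works on arbitrary-precision values instead.
double sin_reduced(double x) {
    const double pi = 3.14159265358979323846;
    double r = std::fmod(x, 2.0 * pi);                // step 1: x mod 2*pi
    int quadrant = static_cast<int>(r / (pi / 2.0));  // step 2: quadrant 0..3
    double t = r - quadrant * (pi / 2.0);             // step 3: t in [0, pi/2)
    switch (quadrant) {
        case 0:  return  std::sin(t);  // 1st quadrant: sin series
        case 1:  return  std::cos(t);  // 2nd quadrant: cos series
        case 2:  return -std::sin(t);  // 3rd quadrant: sin series, negated
        default: return -std::cos(t);  // 4th quadrant: cos series, negated
    }
}
```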

Taylor Series

With $u = x^2$:

$$\sin(x) = x \cdot \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{(2k+1)!} = x \left(1 - \frac{u}{3!} + \frac{u^2}{5!} - \cdots \right)$$

$$\cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{(2k)!} = 1 - \frac{u}{2!} + \frac{u^2}{4!} - \cdots$$

Paterson-Stockmeyer Evaluation

Evaluating a degree-$n$ polynomial with Horner's method requires $O(n)$ multiplications, but the Paterson-Stockmeyer method reduces this to $O(\sqrt{n})$.

The polynomial $P(u) = c_0 + c_1 u + c_2 u^2 + \cdots + c_n u^n$ is partitioned into blocks of size $l$. First, a power table $u, u^2, \ldots, u^{l-1}$ is built, and $u^l$ serves as the base for Horner-style evaluation of the outer polynomial:

$$P(u) = \bigl((\cdots(B_{s} \cdot u^l + B_{s-1}) \cdot u^l + B_{s-2}) \cdots\bigr) \cdot u^l + B_0$$

Each block $B_i = c_{il} + c_{il+1} u + \cdots + c_{il+l-1} u^{l-1}$ can be evaluated in $O(l)$ multiplications using the precomputed power table. The overall cost is $O(l + n/l) = O(\sqrt{n})$ multiplications (minimized when $l = \sqrt{n}$).
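The scheme above can be sketched in double precision; the real code applies the same block structure to arbitrary-precision operands, and the function name here is illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Paterson-Stockmeyer evaluation of P(u) = sum_i c[i] * u^i.
double paterson_stockmeyer(const std::vector<double>& c, double u) {
    std::size_t n = c.size();
    if (n == 0) return 0.0;
    std::size_t l =
        static_cast<std::size_t>(std::ceil(std::sqrt(static_cast<double>(n))));
    // Power table u^0 .. u^(l-1), plus u^l as the outer Horner base.
    std::vector<double> pw(l + 1, 1.0);
    for (std::size_t i = 1; i <= l; ++i) pw[i] = pw[i - 1] * u;
    // Outer Horner loop over blocks of l coefficients, highest block first.
    double acc = 0.0;
    std::size_t blocks = (n + l - 1) / l;
    for (std::size_t b = blocks; b-- > 0; ) {
        double block = 0.0;  // B_b evaluated via the power table
        for (std::size_t j = 0; j < l && b * l + j < n; ++j)
            block += c[b * l + j] * pw[j];
        acc = acc * pw[l] + block;
    }
    return acc;
}
```

For example, $1 + 2u + 3u^2 + 4u^3$ at $u = 2$ evaluates in two blocks of size 2: $(3 + 4u)\,u^2 + (1 + 2u)$.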

Computing sin via cos (High-Precision Path)

Instead of directly expanding sin as a Taylor series, the following procedure is used:

  1. Compute $\cos(\theta)$ via cosDoubling (a single Taylor series evaluation)
  2. Obtain $\sin(\theta) = \sqrt{1 - \cos^2(\theta)}$

Since only a single Taylor series evaluation is required, this is faster than independently expanding sin and cos.

Catastrophic cancellation countermeasure: When $|\theta|$ is small, $\cos(\theta) \approx 1$, causing significant loss of effective bits in $1 - \cos^2(\theta)$. To prevent this, guard bits are added proportional to precision.

Doubling Reconstruction

After evaluating the Taylor series at the $K$-times halved angle $\theta / 2^K$, the double-angle formula is applied $K$ times to reconstruct the original scale.

Double-Angle Formulas

$$\sin(2\theta) = 2 \sin(\theta) \cos(\theta)$$

$$\cos(2\theta) = 2\cos^2(\theta) - 1$$

cosDoubling

When reconstructing cos only:

  1. Halve the argument $K$ times: $\theta' = \theta / 2^K$
  2. Compute $\cos(\theta')$ via cosTaylor
  3. Iterate $K$ times: $c \leftarrow 2c^2 - 1$

Each step requires only one squaring plus constant-time operations; the squaring benefits from mpn::square's symmetry optimization.
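A double-precision sketch of the three steps; the halving count `K` and the number of Taylor terms are illustrative choices, not the values calx selects:

```cpp
#include <cassert>
#include <cmath>

// cosDoubling sketch: halve K times, evaluate a short cos Taylor sum,
// then apply c <- 2c^2 - 1 K times to recover cos(theta).
double cos_doubling(double theta, int K = 8) {
    double t = std::ldexp(theta, -K);  // step 1: theta / 2^K
    double u = t * t;                  // step 2: cosTaylor on the tiny angle
    double c = 1.0 - u / 2.0 + u * u / 24.0 - u * u * u / 720.0;
    for (int i = 0; i < K; ++i)        // step 3: double-angle formula
        c = 2.0 * c * c - 1.0;
    return c;
}
```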

sinCosDoubling

When simultaneously reconstructing the $(\sin, \cos)$ pair:

  1. Halve the argument $K$ times: $\theta' = \theta / 2^K$
  2. Compute $(\sin(\theta'), \cos(\theta'))$ via sinTaylor and cosTaylor
  3. Iterate $K$ times: $s' \leftarrow 2sc$, $c' \leftarrow 2c^2 - 1$, $(s, c) \leftarrow (s', c')$
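The paired reconstruction, again sketched with plain doubles and illustrative term counts:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// sinCosDoubling sketch: evaluate both series at theta / 2^K, then apply
// both double-angle formulas K times to the (sin, cos) pair.
std::pair<double, double> sincos_doubling(double theta, int K = 8) {
    double t = std::ldexp(theta, -K);
    double u = t * t;
    double s = t * (1.0 - u / 6.0 + u * u / 120.0);  // sinTaylor
    double c = 1.0 - u / 2.0 + u * u / 24.0;         // cosTaylor
    for (int i = 0; i < K; ++i) {
        double s2 = 2.0 * s * c;        // sin(2t) = 2 sin(t) cos(t)
        double c2 = 2.0 * c * c - 1.0;  // cos(2t) = 2 cos^2(t) - 1
        s = s2;
        c = c2;
    }
    return {s, c};
}
```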

Q128/Q256 Fast Paths

sincos_small ($\leq 19$ digits)

Completely bypasses Float object construction, computing in Q128 fixed-point arithmetic.

  1. Argument reduction: $z = x - k \cdot (\pi/2)$, with $k$ the nearest integer to $x / (\pi/2)$, reduces the argument to $|z| < \pi/4$
  2. Tiny value handling: when $|z| < 2^{-64}$, return $\sin(z) \approx z$, $\cos(z) \approx 1$ immediately
  3. Versine for cancellation avoidance: Instead of computing $\cos(z)$ directly, compute the versine $v$: $$v = 1 - \cos(z) = \frac{z^2}{2!} - \frac{z^4}{4!} + \frac{z^6}{6!} - \cdots$$ Since $|z| < \pi/4$ implies $\cos(z) > 1/\sqrt{2} \approx 0.707$, $\cos(z) = 1 - v$ can always be represented accurately in 2 words
  4. Q128 Taylor: evaluate sin and versine coefficients in Q128 (128-bit fixed-point)
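The versine idea can be demonstrated in double precision (Q128 fixed-point is replaced by `double` here, so only the series structure is shown, not the cancellation-free fixed-point summation):

```cpp
#include <cassert>
#include <cmath>

// Versine-based cos sketch: sum v = 1 - cos(z) directly from its Taylor
// terms, which are all small, instead of forming cos(z) and subtracting
// two nearly equal quantities. Term count is an illustrative choice.
double cos_via_versine(double z) {
    double u = z * z;
    double v = u / 2.0          // u/2!
             - u * u / 24.0     // u^2/4!
             + u * u * u / 720.0        // u^3/6!
             - u * u * u * u / 40320.0; // u^4/8!
    return 1.0 - v;
}
```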

38-Digit Path

At $\leq 38$ digits of precision, sinTaylor / cosTaylor are called directly with a lightweight Float. The number of terms needed for convergence is small enough that Paterson-Stockmeyer overhead is unnecessary, and simple Horner evaluation suffices.

atan

Argument Reduction

Argument reduction for atan proceeds in two stages:

  1. Reciprocal transformation: when $|x| > 1$, $$\operatorname{atan}(x) = \frac{\pi}{2} - \operatorname{atan}\!\left(\frac{1}{x}\right)$$ (stated for $x > 0$; negative inputs follow from atan being odd) reduces the problem to $|x| < 1$
  2. Argument halving: the following identity is applied $K$ times to further shrink the argument: $$\operatorname{atan}(x) = 2 \cdot \operatorname{atan}\!\left(\frac{x}{1 + \sqrt{1 + x^2}}\right)$$
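Both stages can be sketched in double precision; the halving count `K` and the Taylor term count are illustrative:

```cpp
#include <cassert>
#include <cmath>

// atan via two-stage reduction: reciprocal step for |x| > 1, then K
// applications of the halving identity, a short Taylor sum, and K
// doublings of the result (atan(x) = 2^K * atan(x_K)).
double atan_reduced(double x) {
    const double pi = 3.14159265358979323846;
    if (x < 0.0) return -atan_reduced(-x);             // odd function
    if (x > 1.0) return pi / 2.0 - atan_reduced(1.0 / x);
    const int K = 8;
    for (int i = 0; i < K; ++i)                        // halving identity
        x = x / (1.0 + std::sqrt(1.0 + x * x));
    double u = x * x;                                  // tiny-argument Taylor
    double t = x * (1.0 - u / 3.0 + u * u / 5.0 - u * u * u / 7.0);
    return std::ldexp(t, K);                           // undo the K halvings
}
```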

Taylor Series

With $u = x^2$:

$$\operatorname{atan}(x) = x \cdot \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{2k+1} = x \left(1 - \frac{u}{3} + \frac{u^2}{5} - \cdots \right)$$

This is an odd-degree series with the same structure as sin/cos, so Paterson-Stockmeyer evaluation applies directly.

Q128 Fast Path ($\leq 19$ digits)

An initial approximation $y_0$ with $\sim$53-bit precision is obtained via std::atan(double), then a single Newton correction is applied:

$$y_1 = y_0 + \frac{x \cos(y_0) - \sin(y_0)}{\cos(y_0) + x \sin(y_0)}$$

$\sin(y_0)$ and $\cos(y_0)$ are computed simultaneously using Q128 Taylor + doubling reconstruction. One Newton step doubles the precision, reaching 19 digits.
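The correction formula is the Newton step for $f(y) = \sin(y) - x\cos(y) = 0$, whose derivative is $\cos(y) + x\sin(y)$. A double-precision sketch follows; since a 53-bit seed already saturates `double`, the sketch deliberately uses a float seed ($\sim$24 bits) so the quadratic convergence is observable, and `std::sin`/`std::cos` stand in for the Q128 sincos:

```cpp
#include <cassert>
#include <cmath>

// One Newton step for y = atan(x), starting from a low-precision seed.
double atan_newton(double x) {
    double y0 = std::atan(static_cast<float>(x));  // ~24-bit seed
    double s = std::sin(y0);
    double c = std::cos(y0);
    // y1 = y0 - f(y0)/f'(y0) with f(y) = sin(y) - x cos(y)
    return y0 + (x * c - s) / (c + x * s);
}
```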

Q256 Fast Path ($\leq 38$ digits)

Same structure as the Q128 path, using Q256 sincos + Newton correction to achieve 38-digit precision.

Inverse Trigonometric Functions

Inverse trigonometric functions reduce to atan.

asin

$$\operatorname{asin}(x) = \operatorname{atan}\!\left(\frac{x}{\sqrt{1 - x^2}}\right)$$

Special values: $\operatorname{asin}(\pm 1) = \pm\pi/2$.

When $|x|$ is close to 1, computing $\sqrt{1 - x^2}$ suffers from catastrophic cancellation, so guard bits are added to maintain precision.
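In fixed double precision there is no guard-bit widening, but factoring $1 - x^2$ as $(1-x)(1+x)$ is a standard way to soften the same cancellation; the sketch below uses that, with `std::atan` standing in for calx's atan:

```cpp
#include <cassert>
#include <cmath>

// asin via atan, with the endpoint special values handled first.
double asin_via_atan(double x) {
    const double pi = 3.14159265358979323846;
    if (x == 1.0)  return  pi / 2.0;
    if (x == -1.0) return -pi / 2.0;
    double r = std::sqrt((1.0 - x) * (1.0 + x));  // sqrt(1 - x^2)
    return std::atan(x / r);
}
```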

acos

$$\operatorname{acos}(x) = \frac{\pi}{2} - \operatorname{asin}(x)$$

Special values: $\operatorname{acos}(1) = 0$, $\operatorname{acos}(-1) = \pi$.

atan2

$\operatorname{atan2}(y, x)$ is the four-quadrant inverse tangent function. After determining the quadrant, it reduces to $\operatorname{atan}(y/x)$:

| Condition | Result |
|---|---|
| $x > 0$ | $\operatorname{atan}(y/x)$ |
| $x < 0,\; y \geq 0$ | $\operatorname{atan}(y/x) + \pi$ |
| $x < 0,\; y < 0$ | $\operatorname{atan}(y/x) - \pi$ |
| $x = 0,\; y > 0$ | $\pi/2$ |
| $x = 0,\; y < 0$ | $-\pi/2$ |

Q128/Q256 fast paths apply similarly.
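The condition table translates directly into code; the double-precision sketch below uses `std::atan` in place of calx's atan, and the $x = 0,\, y = 0$ case (which the table leaves unspecified) arbitrarily returns 0:

```cpp
#include <cassert>
#include <cmath>

// Four-quadrant reduction of atan2(y, x) to atan(y/x).
double atan2_sketch(double y, double x) {
    const double pi = 3.14159265358979323846;
    if (x > 0.0)             return std::atan(y / x);
    if (x < 0.0 && y >= 0.0) return std::atan(y / x) + pi;
    if (x < 0.0)             return std::atan(y / x) - pi;
    if (y > 0.0)             return  pi / 2.0;  // x == 0, y > 0
    if (y < 0.0)             return -pi / 2.0;  // x == 0, y < 0
    return 0.0;  // x == 0, y == 0: not covered by the table
}
```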

Hyperbolic Functions

Taylor Series

The Taylor series for hyperbolic functions are identical to those of trigonometric functions but without alternating signs:

$$\sinh(x) = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots = x \cdot \sum_{k=0}^{\infty} \frac{u^k}{(2k+1)!}, \quad u = x^2$$

$$\cosh(x) = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots = \sum_{k=0}^{\infty} \frac{u^k}{(2k)!}, \quad u = x^2$$

Since signs do not alternate, all terms are positive, and catastrophic cancellation does not occur.

sinhCoshDoubling

Argument halving and doubling reconstruction are performed just as with trigonometric functions. The double-angle formulas:

$$\sinh(2t) = 2 \sinh(t) \cosh(t)$$

$$\cosh(2t) = 2\cosh^2(t) - 1$$

The $(\sinh, \cosh)$ pair is simultaneously reconstructed via $K$ doubling steps. The structure is identical to sinCosDoubling.

Precision-Based Dispatch

| Precision | Method | Rationale |
|---|---|---|
| $\leq 19$ digits | Q128 fixed-point (sinh_small) | Q128 exp is fast |
| $> 19$ digits | sinhCoshDoubling | Taylor + $K$ doublings (no division required) |

  • 19 digits or fewer: Q128 fixed-point exp enables fast computation.
  • 20 digits or more: sinhCoshDoubling (Taylor + doubling reconstruction) is used. The exp-based approach $\sinh(x) = (e^x - e^{-x})/2$ requires computing $e^{-x}$ via Float division (Newton reciprocal iteration), making it slower than sinhCoshDoubling.

tanh is computed as $\tanh(x) = \sinh(x) / \cosh(x)$. Since sinhCoshDoubling simultaneously yields the $(\sinh, \cosh)$ pair, only one division is added.
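A double-precision sketch of sinhCoshDoubling plus the single extra division for tanh; `K` and the term counts are illustrative choices:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// sinhCoshDoubling sketch: short all-positive Taylor sums at t / 2^K,
// then K hyperbolic double-angle steps on the (sinh, cosh) pair.
std::pair<double, double> sinhcosh_doubling(double t, int K = 8) {
    double h = std::ldexp(t, -K);                    // t / 2^K
    double u = h * h;
    double s = h * (1.0 + u / 6.0 + u * u / 120.0);  // sinh series
    double c = 1.0 + u / 2.0 + u * u / 24.0;         // cosh series
    for (int i = 0; i < K; ++i) {
        double s2 = 2.0 * s * c;        // sinh(2t) = 2 sinh(t) cosh(t)
        double c2 = 2.0 * c * c - 1.0;  // cosh(2t) = 2 cosh^2(t) - 1
        s = s2;
        c = c2;
    }
    return {s, c};
}

double tanh_sketch(double t) {
    std::pair<double, double> sc = sinhcosh_doubling(t);
    return sc.first / sc.second;  // the single added division
}
```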

Design Decisions

Taylor Series vs. AGM-Based Methods

AGM (Arithmetic-Geometric Mean) based methods are known for trigonometric computation, but calx consistently employs Taylor series with argument reduction/doubling reconstruction. The reasons are as follows:

  • Performance in practical precision ranges: AGM is asymptotically $O(M(n) \log n)$ (where $M(n)$ is $n$-bit multiplication cost), but has large constant factors. In practical precision ranges (hundreds to tens of thousands of digits), Taylor + Paterson-Stockmeyer is faster
  • Code sharing: sin, cos, atan, sinh, and cosh all share the same structure (argument reduction, Paterson-Stockmeyer Taylor, doubling reconstruction), enabling shared internal power-table construction and polynomial evaluation routines
  • Consistency with fast paths: Q128/Q256 fast paths also use Taylor series, so result continuity across precision thresholds is naturally guaranteed

References

  • Brent, R. P. and Zimmermann, P. (2010). Modern Computer Arithmetic. Cambridge University Press. Chapter 4: Elementary and Special Functions.
  • Muller, J.-M. et al. (2018). Handbook of Floating-Point Arithmetic. 2nd ed. Birkhäuser. Chapter 11: Evaluating Floating-Point Elementary Functions.
  • Paterson, M. S. and Stockmeyer, L. J. (1973). "On the Number of Nonscalar Multiplications Necessary to Evaluate Polynomials". SIAM J. Comput., 2(1), 60–66.

Related article: Float Square Root Algorithm (integer reduction for sqrt, correct rounding)