Float Trigonometric Function Algorithms

Overview

The calx Float type provides the following 10 trigonometric and hyperbolic functions:

| Category | Functions |
|---|---|
| Trigonometric | sin, cos, tan |
| Inverse trigonometric | asin, acos, atan, atan2 |
| Hyperbolic | sinh, cosh, tanh |

The fundamental strategy is a three-stage pipeline common to all functions:

  1. Argument reduction: shrink the input to a small range where the Taylor series converges quickly
  2. Taylor series: evaluate the series using Paterson-Stockmeyer evaluation
  3. Doubling reconstruction: repeatedly apply double-angle formulas to restore the original scale

Dispatch is performed based on input precision, using fast paths for low precision:

| Precision | Method |
|---|---|
| $\leq 19$ digits (64-bit) | Q128 fixed-point fast path |
| $\leq 38$ digits (128-bit) | Q256 fixed-point or lightweight Float |
| $> 38$ digits | RawFloat + Paterson-Stockmeyer Taylor |

Inverse trigonometric functions reduce to atan:

  • $\operatorname{asin}(x) = \operatorname{atan}\!\bigl(x / \sqrt{1 - x^2}\bigr)$
  • $\operatorname{acos}(x) = \pi/2 - \operatorname{asin}(x)$

sin and cos

Argument Reduction

The input $x$ is reduced to the range $[0, \pi/2]$:

  1. Reduction by $2\pi$: $x \bmod 2\pi$ brings $x$ into $[0, 2\pi)$
  2. Quadrant determination: $[0, 2\pi)$ is divided into 4 quadrants to determine sign and function correspondence
  3. Reduction to $[0, \pi/2]$: symmetry is exploited to project into $[0, \pi/2]$

The quadrant also determines whether to use the sin Taylor series or the cos Taylor series. In the 1st and 3rd quadrants, the sin Taylor series is used; in the 2nd and 4th quadrants, the cos Taylor series is used.
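In fixed double precision the three reduction steps can be sketched as follows. The helper name `sin_reduced` is illustrative, and `std::sin`/`std::cos` stand in for the Taylor evaluators that the real code dispatches to:

```cpp
#include <cassert>
#include <cmath>

// Sketch of quadrant-based reduction for sin, in double precision.
// Assumes x >= 0; calx works on arbitrary-precision values instead.
double sin_reduced(double x) {
    const double pi = 3.14159265358979323846;
    double r = std::fmod(x, 2.0 * pi);                // step 1: x mod 2*pi
    int quadrant = static_cast<int>(r / (pi / 2.0));  // step 2: quadrant 0..3
    double t = r - quadrant * (pi / 2.0);             // step 3: t in [0, pi/2)
    switch (quadrant) {
        case 0:  return  std::sin(t);  // 1st quadrant: sin series
        case 1:  return  std::cos(t);  // 2nd quadrant: cos series
        case 2:  return -std::sin(t);  // 3rd quadrant: sin series, negated
        default: return -std::cos(t);  // 4th quadrant: cos series, negated
    }
}
```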

Taylor Series

With $u = x^2$:

$$\sin(x) = x \cdot \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{(2k+1)!} = x \left(1 - \frac{u}{3!} + \frac{u^2}{5!} - \cdots \right)$$

$$\cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{(2k)!} = 1 - \frac{u}{2!} + \frac{u^2}{4!} - \cdots$$

Paterson-Stockmeyer Evaluation

Evaluating a degree-$n$ polynomial with Horner's method requires $O(n)$ multiplications, but the Paterson-Stockmeyer method reduces this to $O(\sqrt{n})$.

The polynomial $P(u) = c_0 + c_1 u + c_2 u^2 + \cdots + c_n u^n$ is partitioned into blocks of size $l$. First, a power table $u, u^2, \ldots, u^{l-1}$ is built, and $u^l$ serves as the base for Horner-style evaluation of the outer polynomial:

$$P(u) = \bigl((\cdots(B_{s} \cdot u^l + B_{s-1}) \cdot u^l + B_{s-2}) \cdots\bigr) \cdot u^l + B_0$$

Each block $B_i = c_{il} + c_{il+1} u + \cdots + c_{il+l-1} u^{l-1}$ can be evaluated in $O(l)$ multiplications using the precomputed power table. The overall cost is $O(l + n/l) = O(\sqrt{n})$ multiplications (minimized when $l = \sqrt{n}$).
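The scheme above can be sketched in double precision; the real code applies the same block structure to arbitrary-precision operands, and the function name here is illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Paterson-Stockmeyer evaluation of P(u) = sum_i c[i] * u^i.
double paterson_stockmeyer(const std::vector<double>& c, double u) {
    std::size_t n = c.size();
    if (n == 0) return 0.0;
    std::size_t l =
        static_cast<std::size_t>(std::ceil(std::sqrt(static_cast<double>(n))));
    // Power table u^0 .. u^(l-1), plus u^l as the outer Horner base.
    std::vector<double> pw(l + 1, 1.0);
    for (std::size_t i = 1; i <= l; ++i) pw[i] = pw[i - 1] * u;
    // Outer Horner loop over blocks of l coefficients, highest block first.
    double acc = 0.0;
    std::size_t blocks = (n + l - 1) / l;
    for (std::size_t b = blocks; b-- > 0; ) {
        double block = 0.0;  // B_b evaluated via the power table
        for (std::size_t j = 0; j < l && b * l + j < n; ++j)
            block += c[b * l + j] * pw[j];
        acc = acc * pw[l] + block;
    }
    return acc;
}
```

For example, $1 + 2u + 3u^2 + 4u^3$ at $u = 2$ evaluates in two blocks of size 2: $(3 + 4u)\,u^2 + (1 + 2u)$.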

Computing sin via cos (High-Precision Path)

Instead of directly expanding sin as a Taylor series, the following procedure is used:

  1. Compute $\cos(\theta)$ via cosDoubling (a single Taylor series evaluation)
  2. Obtain $\sin(\theta) = \sqrt{1 - \cos^2(\theta)}$

Since only a single Taylor series evaluation is required, this is faster than independently expanding sin and cos.

Catastrophic cancellation countermeasure: When $|\theta|$ is small, $\cos(\theta) \approx 1$, causing significant loss of effective bits in $1 - \cos^2(\theta)$. To prevent this, guard bits are added proportional to precision.

Doubling Reconstruction

After evaluating the Taylor series at the $K$-times halved angle $\theta / 2^K$, the double-angle formula is applied $K$ times to reconstruct the original scale.

Double-Angle Formulas

$$\sin(2\theta) = 2 \sin(\theta) \cos(\theta)$$

$$\cos(2\theta) = 2\cos^2(\theta) - 1$$

cosDoubling

When reconstructing cos only:

  1. Halve the argument $K$ times: $\theta' = \theta / 2^K$
  2. Compute $\cos(\theta')$ via cosTaylor
  3. Iterate $K$ times: $c \leftarrow 2c^2 - 1$

Each step requires only one squaring plus constant-time operations; the squaring benefits from mpn::square's symmetry optimization.
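A double-precision sketch of the three steps; the halving count `K` and the number of Taylor terms are illustrative choices, not the values calx selects:

```cpp
#include <cassert>
#include <cmath>

// cosDoubling sketch: halve K times, evaluate a short cos Taylor sum,
// then apply c <- 2c^2 - 1 K times to recover cos(theta).
double cos_doubling(double theta, int K = 8) {
    double t = std::ldexp(theta, -K);  // step 1: theta / 2^K
    double u = t * t;                  // step 2: cosTaylor on the tiny angle
    double c = 1.0 - u / 2.0 + u * u / 24.0 - u * u * u / 720.0;
    for (int i = 0; i < K; ++i)        // step 3: double-angle formula
        c = 2.0 * c * c - 1.0;
    return c;
}
```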

sinCosDoubling

When simultaneously reconstructing the $(\sin, \cos)$ pair:

  1. Halve the argument $K$ times: $\theta' = \theta / 2^K$
  2. Compute $(\sin(\theta'), \cos(\theta'))$ via sinTaylor and cosTaylor
  3. Iterate $K$ times: $s' \leftarrow 2sc$, $c' \leftarrow 2c^2 - 1$, $(s, c) \leftarrow (s', c')$
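The paired reconstruction, again sketched with plain doubles and illustrative term counts:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// sinCosDoubling sketch: evaluate both series at theta / 2^K, then apply
// both double-angle formulas K times to the (sin, cos) pair.
std::pair<double, double> sincos_doubling(double theta, int K = 8) {
    double t = std::ldexp(theta, -K);
    double u = t * t;
    double s = t * (1.0 - u / 6.0 + u * u / 120.0);  // sinTaylor
    double c = 1.0 - u / 2.0 + u * u / 24.0;         // cosTaylor
    for (int i = 0; i < K; ++i) {
        double s2 = 2.0 * s * c;        // sin(2t) = 2 sin(t) cos(t)
        double c2 = 2.0 * c * c - 1.0;  // cos(2t) = 2 cos^2(t) - 1
        s = s2;
        c = c2;
    }
    return {s, c};
}
```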

Q128/Q256 Fast Paths

sincos_small ($\leq 19$ digits)

Completely bypasses Float object construction, computing in Q128 fixed-point arithmetic.

  1. Argument reduction: $z = x - k \cdot (\pi/2)$, with $k$ the nearest integer to $x / (\pi/2)$, reduces the argument to $|z| < \pi/4$
  2. Tiny value handling: when $|z| < 2^{-64}$, return $\sin(z) \approx z$, $\cos(z) \approx 1$ immediately
  3. Versine for cancellation avoidance: Instead of computing $\cos(z)$ directly, compute the versine $v$: $$v = 1 - \cos(z) = \frac{z^2}{2!} - \frac{z^4}{4!} + \frac{z^6}{6!} - \cdots$$ Since $|z| < \pi/4$ implies $\cos(z) > 1/\sqrt{2} \approx 0.707$, $\cos(z) = 1 - v$ can always be represented accurately in 2 words
  4. Q128 Taylor: evaluate sin and versine coefficients in Q128 (128-bit fixed-point)
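The versine idea can be demonstrated in double precision (Q128 fixed-point is replaced by `double` here, so only the series structure is shown, not the cancellation-free fixed-point summation):

```cpp
#include <cassert>
#include <cmath>

// Versine-based cos sketch: sum v = 1 - cos(z) directly from its Taylor
// terms, which are all small, instead of forming cos(z) and subtracting
// two nearly equal quantities. Term count is an illustrative choice.
double cos_via_versine(double z) {
    double u = z * z;
    double v = u / 2.0          // u/2!
             - u * u / 24.0     // u^2/4!
             + u * u * u / 720.0        // u^3/6!
             - u * u * u * u / 40320.0; // u^4/8!
    return 1.0 - v;
}
```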

38-Digit Path

At $\leq 38$ digits of precision, sinTaylor / cosTaylor are called directly with a lightweight Float. The number of terms needed for convergence is small enough that Paterson-Stockmeyer overhead is unnecessary, and simple Horner evaluation suffices.

atan

Argument Reduction

Argument reduction for atan proceeds in two stages:

  1. Reciprocal transformation: when $|x| > 1$, $$\operatorname{atan}(x) = \frac{\pi}{2} - \operatorname{atan}\!\left(\frac{1}{x}\right)$$ (stated for $x > 0$; negative inputs follow from atan being odd) reduces the problem to $|x| < 1$
  2. Argument halving: the following identity is applied $K$ times to further shrink the argument: $$\operatorname{atan}(x) = 2 \cdot \operatorname{atan}\!\left(\frac{x}{1 + \sqrt{1 + x^2}}\right)$$
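Both stages can be sketched in double precision; the halving count `K` and the Taylor term count are illustrative:

```cpp
#include <cassert>
#include <cmath>

// atan via two-stage reduction: reciprocal step for |x| > 1, then K
// applications of the halving identity, a short Taylor sum, and K
// doublings of the result (atan(x) = 2^K * atan(x_K)).
double atan_reduced(double x) {
    const double pi = 3.14159265358979323846;
    if (x < 0.0) return -atan_reduced(-x);             // odd function
    if (x > 1.0) return pi / 2.0 - atan_reduced(1.0 / x);
    const int K = 8;
    for (int i = 0; i < K; ++i)                        // halving identity
        x = x / (1.0 + std::sqrt(1.0 + x * x));
    double u = x * x;                                  // tiny-argument Taylor
    double t = x * (1.0 - u / 3.0 + u * u / 5.0 - u * u * u / 7.0);
    return std::ldexp(t, K);                           // undo the K halvings
}
```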

Taylor Series

With $u = x^2$:

$$\operatorname{atan}(x) = x \cdot \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{2k+1} = x \left(1 - \frac{u}{3} + \frac{u^2}{5} - \cdots \right)$$

This is an odd-degree series with the same structure as sin/cos, so Paterson-Stockmeyer evaluation applies directly.

Q128 Fast Path ($\leq 19$ digits)

An initial approximation $y_0$ with $\sim$53-bit precision is obtained via std::atan(double), then a single Newton correction is applied:

$$y_1 = y_0 + \frac{x \cos(y_0) - \sin(y_0)}{\cos(y_0) + x \sin(y_0)}$$

$\sin(y_0)$ and $\cos(y_0)$ are computed simultaneously using Q128 Taylor + doubling reconstruction. One Newton step doubles the precision, reaching 19 digits.
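The correction formula is the Newton step for $f(y) = \sin(y) - x\cos(y) = 0$, whose derivative is $\cos(y) + x\sin(y)$. A double-precision sketch follows; since a 53-bit seed already saturates `double`, the sketch deliberately uses a float seed ($\sim$24 bits) so the quadratic convergence is observable, and `std::sin`/`std::cos` stand in for the Q128 sincos:

```cpp
#include <cassert>
#include <cmath>

// One Newton step for y = atan(x), starting from a low-precision seed.
double atan_newton(double x) {
    double y0 = std::atan(static_cast<float>(x));  // ~24-bit seed
    double s = std::sin(y0);
    double c = std::cos(y0);
    // y1 = y0 - f(y0)/f'(y0) with f(y) = sin(y) - x cos(y)
    return y0 + (x * c - s) / (c + x * s);
}
```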

Q256 Fast Path ($\leq 38$ digits)

Same structure as the Q128 path, using Q256 sincos + Newton correction to achieve 38-digit precision.

Inverse Trigonometric Functions

Inverse trigonometric functions reduce to atan.

asin

$$\operatorname{asin}(x) = \operatorname{atan}\!\left(\frac{x}{\sqrt{1 - x^2}}\right)$$

Special values: $\operatorname{asin}(\pm 1) = \pm\pi/2$.

When $|x|$ is close to 1, computing $\sqrt{1 - x^2}$ suffers from catastrophic cancellation, so guard bits are added to maintain precision.
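In fixed double precision there is no guard-bit widening, but factoring $1 - x^2$ as $(1-x)(1+x)$ is a standard way to soften the same cancellation; the sketch below uses that, with `std::atan` standing in for calx's atan:

```cpp
#include <cassert>
#include <cmath>

// asin via atan, with the endpoint special values handled first.
double asin_via_atan(double x) {
    const double pi = 3.14159265358979323846;
    if (x == 1.0)  return  pi / 2.0;
    if (x == -1.0) return -pi / 2.0;
    double r = std::sqrt((1.0 - x) * (1.0 + x));  // sqrt(1 - x^2)
    return std::atan(x / r);
}
```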

acos

$$\operatorname{acos}(x) = \frac{\pi}{2} - \operatorname{asin}(x)$$

Special values: $\operatorname{acos}(1) = 0$, $\operatorname{acos}(-1) = \pi$.

atan2

$\operatorname{atan2}(y, x)$ is the four-quadrant inverse tangent function. After determining the quadrant, it reduces to $\operatorname{atan}(y/x)$:

| Condition | Result |
|---|---|
| $x > 0$ | $\operatorname{atan}(y/x)$ |
| $x < 0,\; y \geq 0$ | $\operatorname{atan}(y/x) + \pi$ |
| $x < 0,\; y < 0$ | $\operatorname{atan}(y/x) - \pi$ |
| $x = 0,\; y > 0$ | $\pi/2$ |
| $x = 0,\; y < 0$ | $-\pi/2$ |

Q128/Q256 fast paths apply similarly.
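The condition table translates directly into code; the double-precision sketch below uses `std::atan` in place of calx's atan, and the $x = 0,\, y = 0$ case (which the table leaves unspecified) arbitrarily returns 0:

```cpp
#include <cassert>
#include <cmath>

// Four-quadrant reduction of atan2(y, x) to atan(y/x).
double atan2_sketch(double y, double x) {
    const double pi = 3.14159265358979323846;
    if (x > 0.0)             return std::atan(y / x);
    if (x < 0.0 && y >= 0.0) return std::atan(y / x) + pi;
    if (x < 0.0)             return std::atan(y / x) - pi;
    if (y > 0.0)             return  pi / 2.0;  // x == 0, y > 0
    if (y < 0.0)             return -pi / 2.0;  // x == 0, y < 0
    return 0.0;  // x == 0, y == 0: not covered by the table
}
```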

Hyperbolic Functions

Taylor Series

The Taylor series for hyperbolic functions are identical to those of trigonometric functions but without alternating signs:

$$\sinh(x) = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots = x \cdot \sum_{k=0}^{\infty} \frac{u^k}{(2k+1)!}, \quad u = x^2$$

$$\cosh(x) = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots = \sum_{k=0}^{\infty} \frac{u^k}{(2k)!}, \quad u = x^2$$

Since signs do not alternate, all terms are positive, and catastrophic cancellation does not occur.

sinhCoshDoubling

Argument halving and doubling reconstruction are performed just as with trigonometric functions. The double-angle formulas:

$$\sinh(2t) = 2 \sinh(t) \cosh(t)$$

$$\cosh(2t) = 2\cosh^2(t) - 1$$

The $(\sinh, \cosh)$ pair is simultaneously reconstructed via $K$ doubling steps. The structure is identical to sinCosDoubling.

Precision-Based Dispatch

| Precision | Method | Rationale |
|---|---|---|
| $\leq 19$ digits | Q128 fixed-point (sinh_small) | Q128 exp is fast |
| $> 19$ digits | sinhCoshDoubling | Taylor + $K$ doublings (no division required) |

  • 19 digits or fewer: Q128 fixed-point exp enables fast computation.
  • 20 digits or more: sinhCoshDoubling (Taylor + doubling reconstruction) is used. The exp-based approach $\sinh(x) = (e^x - e^{-x})/2$ requires computing $e^{-x}$ via Float division (Newton reciprocal iteration), making it slower than sinhCoshDoubling.

tanh is computed as $\tanh(x) = \sinh(x) / \cosh(x)$. Since sinhCoshDoubling simultaneously yields the $(\sinh, \cosh)$ pair, only one division is added.
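A double-precision sketch of sinhCoshDoubling plus the single extra division for tanh; `K` and the term counts are illustrative choices:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// sinhCoshDoubling sketch: short all-positive Taylor sums at t / 2^K,
// then K hyperbolic double-angle steps on the (sinh, cosh) pair.
std::pair<double, double> sinhcosh_doubling(double t, int K = 8) {
    double h = std::ldexp(t, -K);                    // t / 2^K
    double u = h * h;
    double s = h * (1.0 + u / 6.0 + u * u / 120.0);  // sinh series
    double c = 1.0 + u / 2.0 + u * u / 24.0;         // cosh series
    for (int i = 0; i < K; ++i) {
        double s2 = 2.0 * s * c;        // sinh(2t) = 2 sinh(t) cosh(t)
        double c2 = 2.0 * c * c - 1.0;  // cosh(2t) = 2 cosh^2(t) - 1
        s = s2;
        c = c2;
    }
    return {s, c};
}

double tanh_sketch(double t) {
    std::pair<double, double> sc = sinhcosh_doubling(t);
    return sc.first / sc.second;  // the single added division
}
```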

Design Decisions

Taylor Series vs. AGM-Based Methods

AGM (Arithmetic-Geometric Mean) based methods are known for trigonometric computation, but calx consistently employs Taylor series with argument reduction/doubling reconstruction. The reasons are as follows:

  • Performance in practical precision ranges: AGM is asymptotically $O(M(n) \log n)$ (where $M(n)$ is $n$-bit multiplication cost), but has large constant factors. In practical precision ranges (hundreds to tens of thousands of digits), Taylor + Paterson-Stockmeyer is faster
  • Code sharing: sin, cos, atan, sinh, and cosh all share the same structure (argument reduction, Paterson-Stockmeyer Taylor, doubling reconstruction), enabling shared internal power-table construction and polynomial evaluation routines
  • Consistency with fast paths: Q128/Q256 fast paths also use Taylor series, so result continuity across precision thresholds is naturally guaranteed

References

  • Brent, R. P. and Zimmermann, P. (2010). Modern Computer Arithmetic. Cambridge University Press. Chapter 4: Elementary and Special Functions.
  • Muller, J.-M. et al. (2018). Handbook of Floating-Point Arithmetic. 2nd ed. Birkhäuser. Chapter 11: Evaluating Floating-Point Elementary Functions.
  • Paterson, M. S. and Stockmeyer, L. J. (1973). "On the Number of Nonscalar Multiplications Necessary to Evaluate Polynomials". SIAM J. Comput., 2(1), 60–66.

Related article: Float Square Root Algorithm (integer reduction for sqrt, correct rounding)