Float Trigonometric Function Algorithms
Overview
The calx Float type provides the following ten trigonometric and hyperbolic functions:
| Category | Functions |
|---|---|
| Trigonometric | sin, cos, tan |
| Inverse trigonometric | asin, acos, atan, atan2 |
| Hyperbolic | sinh, cosh, tanh |
The fundamental strategy is a three-stage pipeline common to all functions:
- Argument reduction: shrink the input to a small range where the Taylor series converges quickly
- Taylor series: evaluate the series with the Paterson-Stockmeyer scheme
- Doubling reconstruction: repeatedly apply double-angle formulas to restore the original scale
Dispatch is performed based on input precision, using fast paths for low precision:
| Precision | Method |
|---|---|
| $\leq 19$ digits (64-bit) | Q128 fixed-point fast path |
| $\leq 38$ digits (128-bit) | Q256 fixed-point or lightweight Float |
| $> 38$ digits | RawFloat + Paterson-Stockmeyer Taylor |
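As a rough illustration, the dispatch reduces to a threshold check on the requested digit count. The enum and function names below are hypothetical, chosen only to mirror the table, not calx's internal API:

```cpp
#include <cstdint>

// Illustrative dispatcher sketch; names are hypothetical, not calx's API.
enum class SinCosPath { Q128Fixed, Q256Fixed, RawFloatPS };

SinCosPath pickSinCosPath(std::int64_t precisionDigits) {
    if (precisionDigits <= 19) return SinCosPath::Q128Fixed;  // 64-bit fast path
    if (precisionDigits <= 38) return SinCosPath::Q256Fixed;  // or lightweight Float
    return SinCosPath::RawFloatPS;  // RawFloat + Paterson-Stockmeyer Taylor
}
```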
Inverse trigonometric functions reduce to atan:
- $\operatorname{asin}(x) = \operatorname{atan}\!\bigl(x / \sqrt{1 - x^2}\bigr)$
- $\operatorname{acos}(x) = \pi/2 - \operatorname{asin}(x)$
sin and cos
Argument Reduction
The input $x$ is reduced to the range $[0, \pi/2]$:
- Reduction by $2\pi$: $x \bmod 2\pi$ brings $x$ into $[0, 2\pi)$
- Quadrant determination: $[0, 2\pi)$ is divided into 4 quadrants to determine sign and function correspondence
- Reduction to $[0, \pi/2]$: symmetry is exploited to project into $[0, \pi/2]$
The quadrant also determines whether to use the sin Taylor series or the cos Taylor series. In the 1st and 3rd quadrants, the sin Taylor series is used; in the 2nd and 4th quadrants, the cos Taylor series is used.
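A double-precision sketch of this reduction (illustrative only; calx performs it on Float values, and boundary rounding at exact multiples of $\pi/2$ is ignored here):

```cpp
#include <cmath>

// Sketch: reduce x into [0, 2*pi), pick the quadrant, select series and sign.
// std::sin / std::cos stand in for the sin / cos Taylor series.
double sinByQuadrant(double x) {
    const double pi = 3.141592653589793238;
    const double two_pi = 2.0 * pi, half_pi = 0.5 * pi;
    double y = std::fmod(x, two_pi);        // into (-2*pi, 2*pi)
    if (y < 0.0) y += two_pi;               // into [0, 2*pi)
    int q = static_cast<int>(y / half_pi);  // quadrant index 0..3
    double r = y - q * half_pi;             // residual in [0, pi/2)
    switch (q) {
        case 0:  return  std::sin(r);       // 1st quadrant: sin series
        case 1:  return  std::cos(r);       // 2nd quadrant: cos series
        case 2:  return -std::sin(r);       // 3rd quadrant: sin series, negated
        default: return -std::cos(r);       // 4th quadrant: cos series, negated
    }
}
```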
Taylor Series
With $u = x^2$:
$$\sin(x) = x \cdot \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{(2k+1)!} = x \left(1 - \frac{u}{3!} + \frac{u^2}{5!} - \cdots \right)$$
$$\cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{(2k)!} = 1 - \frac{u}{2!} + \frac{u^2}{4!} - \cdots$$
Paterson-Stockmeyer Evaluation
Evaluating a degree-$n$ polynomial with Horner's method requires $O(n)$ full-precision multiplications; the Paterson-Stockmeyer method reduces this to $O(\sqrt{n})$.
The polynomial $P(u) = c_0 + c_1 u + c_2 u^2 + \cdots + c_n u^n$ is partitioned into blocks of size $l$. First, a power table $u, u^2, \ldots, u^{l-1}$ is built, and $u^l$ serves as the base for Horner-style evaluation of the outer polynomial:
$$P(u) = \bigl((\cdots(B_{s} \cdot u^l + B_{s-1}) \cdot u^l + B_{s-2}) \cdots\bigr) \cdot u^l + B_0$$
Each block $B_i = c_{il} + c_{il+1} u + \cdots + c_{il+l-1} u^{l-1}$ is a linear combination of the precomputed powers, costing only cheap multiplications by the (rational) series coefficients. The expensive full-precision multiplications are the $O(l)$ power-table constructions plus the $O(n/l)$ outer Horner steps, for a total of $O(l + n/l) = O(\sqrt{n})$ (minimized when $l = \sqrt{n}$).
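A minimal double-precision sketch of the scheme (calx applies the same structure to arbitrary-precision values; the function name is illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Paterson-Stockmeyer sketch: evaluate P(u) = sum c[k] * u^k with block
// size l ~ sqrt(n). Build the power table u^1..u^l once, then run Horner
// over the blocks with base u^l.
double patersonStockmeyer(const std::vector<double>& c, double u) {
    const int n = static_cast<int>(c.size());
    const int l = std::max(1, static_cast<int>(std::lround(std::sqrt(static_cast<double>(n)))));
    std::vector<double> pw(l + 1, 1.0);         // pw[j] = u^j
    for (int j = 1; j <= l; ++j) pw[j] = pw[j - 1] * u;
    double result = 0.0;
    for (int b = (n - 1) / l; b >= 0; --b) {    // outer Horner: B_s .. B_0
        double block = 0.0;                     // B_b = sum_j c[b*l+j] * u^j
        for (int j = 0; j < l; ++j) {
            const int k = b * l + j;
            if (k < n) block += c[k] * pw[j];
        }
        result = result * pw[l] + block;
    }
    return result;
}
```

For the sin series above, for instance, the coefficients would be $c_k = (-1)^k / (2k+1)!$ and the result is multiplied by $x$.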
Computing sin via cos (High-Precision Path)
Instead of directly expanding sin as a Taylor series, the following procedure is used:
- Compute $\cos(\theta)$ via cosDoubling (a single Taylor series evaluation)
- Obtain $\sin(\theta) = \sqrt{1 - \cos^2(\theta)}$
Since only a single Taylor series evaluation is required, this is faster than independently expanding sin and cos.
Catastrophic cancellation countermeasure: When $|\theta|$ is small, $\cos(\theta) \approx 1$, causing significant loss of effective bits in $1 - \cos^2(\theta)$. To prevent this, guard bits are added proportional to precision.
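A sketch of the idea in double precision, assuming the argument is already reduced to $[0, \pi/2]$ (calx works at extended precision and widens by guard bits before the subtraction):

```cpp
#include <cmath>

// One cos evaluation, then sin by Pythagoras. Since cos(theta) ~ 1 for
// small theta, the subtraction 1 - c*c cancels leading bits; calx adds
// guard bits before this step, which this double sketch cannot show.
double sinViaCos(double theta) {       // assumes theta in [0, pi/2]
    const double c = std::cos(theta);  // stands in for cosDoubling
    return std::sqrt(1.0 - c * c);     // sin(theta) >= 0 on [0, pi/2]
}
```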
Doubling Reconstruction
After evaluating the Taylor series at the $K$-times halved angle $\theta / 2^K$, the double-angle formula is applied $K$ times to reconstruct the original scale.
Double-Angle Formulas
$$\sin(2\theta) = 2 \sin(\theta) \cos(\theta)$$
$$\cos(2\theta) = 2\cos^2(\theta) - 1$$
cosDoubling
When reconstructing cos only:
- Halve the argument $K$ times: $\theta' = \theta / 2^K$
- Compute $\cos(\theta')$ via cosTaylor
- Iterate $K$ times: $c \leftarrow 2c^2 - 1$
Each step requires only one squaring plus constant-time operations; the squaring benefits from
mpn::square's symmetry optimization.
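A double-precision sketch of the loop (std::cos stands in for cosTaylor; the function name is illustrative):

```cpp
#include <cmath>

// cosDoubling sketch: halve the angle K times, evaluate the series near
// zero, then apply cos(2t) = 2 cos^2(t) - 1 exactly K times.
double cosDoublingSketch(double theta, int K) {
    double t = std::ldexp(theta, -K);   // theta' = theta / 2^K
    double c = std::cos(t);             // stands in for cosTaylor
    for (int i = 0; i < K; ++i)
        c = 2.0 * c * c - 1.0;          // one squaring per step (mpn::square in calx)
    return c;
}
```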
sinCosDoubling
When simultaneously reconstructing the $(\sin, \cos)$ pair:
- Halve the argument $K$ times: $\theta' = \theta / 2^K$
- Compute $(\sin(\theta'), \cos(\theta'))$ via sinTaylor and cosTaylor
- Iterate $K$ times: $s' \leftarrow 2sc$, $c' \leftarrow 2c^2 - 1$, $(s, c) \leftarrow (s', c')$
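The same structure with the pair update, again with standard-library calls standing in for the Taylor routines:

```cpp
#include <cmath>
#include <utility>

// sinCosDoubling sketch: reconstruct the (sin, cos) pair with K doublings.
std::pair<double, double> sinCosDoublingSketch(double theta, int K) {
    double t = std::ldexp(theta, -K);         // theta' = theta / 2^K
    double s = std::sin(t);                   // stands in for sinTaylor
    double c = std::cos(t);                   // stands in for cosTaylor
    for (int i = 0; i < K; ++i) {
        const double s2 = 2.0 * s * c;        // sin(2t) = 2 sin(t) cos(t)
        const double c2 = 2.0 * c * c - 1.0;  // cos(2t) = 2 cos^2(t) - 1
        s = s2;
        c = c2;
    }
    return {s, c};
}
```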
Q128/Q256 Fast Paths
sincos_small ($\leq 19$ digits)
Completely bypasses Float object construction, computing in Q128 fixed-point arithmetic.
- Argument reduction: $z = x - k \cdot (\pi/2)$, with $k$ the nearest integer to $x / (\pi/2)$, yields $|z| \leq \pi/4$
- Tiny value handling: when $|z| < 2^{-64}$, return $\sin(z) \approx z$, $\cos(z) \approx 1$ immediately
- Versine for cancellation avoidance: Instead of computing $\cos(z)$ directly, compute the versine $v$: $$v = 1 - \cos(z) = \frac{z^2}{2!} - \frac{z^4}{4!} + \frac{z^6}{6!} - \cdots$$ Since $|z| < \pi/4$ implies $\cos(z) > 1/\sqrt{2} \approx 0.707$, $\cos(z) = 1 - v$ can always be represented accurately in 2 words
- Q128 Taylor: evaluate sin and versine coefficients in Q128 (128-bit fixed-point)
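A double-precision sketch of the versine accumulation (calx runs this in Q128 fixed point; the termination threshold here is illustrative):

```cpp
#include <cmath>

// Accumulate v = 1 - cos(z) from its series so the leading 1 never cancels.
double cosViaVersine(double z) {       // assumes |z| < pi/4
    const double u = z * z;
    double term = u / 2.0;             // z^2 / 2!
    double v = 0.0;
    for (int k = 1; std::fabs(term) > 1e-20; ++k) {
        v += term;                     // v = z^2/2! - z^4/4! + z^6/6! - ...
        term *= -u / ((2.0 * k + 1.0) * (2.0 * k + 2.0));
    }
    return 1.0 - v;                    // cos(z) = 1 - v, with v < 1 - 1/sqrt(2)
}
```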
38-Digit Path
At $\leq 38$ digits of precision, sinTaylor / cosTaylor are called directly with a lightweight Float. The number of terms needed for convergence is small enough that Paterson-Stockmeyer overhead is unnecessary, and simple Horner evaluation suffices.
atan
Argument Reduction
Argument reduction for atan proceeds in two stages:
- Reciprocal transformation: the odd symmetry $\operatorname{atan}(-x) = -\operatorname{atan}(x)$ first reduces to $x \geq 0$; then, for $x > 1$, $$\operatorname{atan}(x) = \frac{\pi}{2} - \operatorname{atan}\!\left(\frac{1}{x}\right)$$ reduces the argument into $(0, 1)$
- Argument halving: the following identity is applied $K$ times to further shrink the argument: $$\operatorname{atan}(x) = 2 \cdot \operatorname{atan}\!\left(\frac{x}{1 + \sqrt{1 + x^2}}\right)$$
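Both reduction stages, sketched in double precision with std::atan standing in for the Taylor evaluation; the $K$ halvings are undone by a final multiplication by $2^K$:

```cpp
#include <cmath>

// atan reduction sketch: odd symmetry, reciprocal step, then K halvings.
double atanReducedSketch(double x, int K) {
    const double half_pi = 1.5707963267948966;
    const double sign = (x < 0.0) ? -1.0 : 1.0;   // atan is odd
    x = std::fabs(x);
    const bool recip = x > 1.0;
    if (recip) x = 1.0 / x;                       // now 0 <= x <= 1
    for (int i = 0; i < K; ++i)                   // each step roughly halves x
        x = x / (1.0 + std::sqrt(1.0 + x * x));
    double y = std::ldexp(std::atan(x), K);       // series stand-in, then * 2^K
    if (recip) y = half_pi - y;                   // undo the reciprocal step
    return sign * y;
}
```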
Taylor Series
With $u = x^2$:
$$\operatorname{atan}(x) = x \cdot \sum_{k=0}^{\infty} \frac{(-1)^k \, u^k}{2k+1} = x \left(1 - \frac{u}{3} + \frac{u^2}{5} - \cdots \right)$$
This is an odd-degree series with the same structure as sin/cos, so Paterson-Stockmeyer evaluation applies directly.
Q128 Fast Path ($\leq 19$ digits)
An initial approximation $y_0$ with $\sim$53-bit precision is obtained via std::atan(double),
then a single Newton correction is applied:
$$y_1 = y_0 + \frac{x \cos(y_0) - \sin(y_0)}{\cos(y_0) + x \sin(y_0)}$$
$\sin(y_0)$ and $\cos(y_0)$ are computed simultaneously using Q128 Taylor + doubling reconstruction. One Newton step doubles the precision, reaching 19 digits.
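A double-precision sketch of the correction step (doubles stand in for Q128 values; in pure double precision the seed is already fully accurate, so this only illustrates the data flow):

```cpp
#include <cmath>

// Refine the ~53-bit seed y0 with the correction formula from the text.
double atanCorrected(double x) {
    const double y0 = std::atan(x);                   // initial approximation
    const double s = std::sin(y0), c = std::cos(y0);  // Q128 sincos in calx
    return y0 + (x * c - s) / (c + x * s);            // y1 ~ atan(x)
}
```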
Q256 Fast Path ($\leq 38$ digits)
Same structure as the Q128 path, using Q256 sincos + Newton correction to achieve 38-digit precision.
Inverse Trigonometric Functions
Inverse trigonometric functions reduce to atan.
asin
$$\operatorname{asin}(x) = \operatorname{atan}\!\left(\frac{x}{\sqrt{1 - x^2}}\right)$$
Special values: $\operatorname{asin}(\pm 1) = \pm\pi/2$.
When $|x|$ is close to 1, computing $\sqrt{1 - x^2}$ suffers from catastrophic cancellation, so guard bits are added to maintain precision.
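A double-precision sketch (calx evaluates the same formula at extended precision; the guard-bit handling is omitted here):

```cpp
#include <cmath>

// asin via atan, with the special values handled up front.
double asinViaAtan(double x) {
    const double half_pi = 1.5707963267948966;
    if (x == 1.0)  return  half_pi;   // asin(1)  =  pi/2
    if (x == -1.0) return -half_pi;   // asin(-1) = -pi/2
    return std::atan(x / std::sqrt(1.0 - x * x));
}
```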
acos
$$\operatorname{acos}(x) = \frac{\pi}{2} - \operatorname{asin}(x)$$
Special values: $\operatorname{acos}(1) = 0$, $\operatorname{acos}(-1) = \pi$.
atan2
$\operatorname{atan2}(y, x)$ is the four-quadrant inverse tangent function. After determining the quadrant, it reduces to $\operatorname{atan}(y/x)$:
| Condition | Result |
|---|---|
| $x > 0$ | $\operatorname{atan}(y/x)$ |
| $x < 0,\; y \geq 0$ | $\operatorname{atan}(y/x) + \pi$ |
| $x < 0,\; y < 0$ | $\operatorname{atan}(y/x) - \pi$ |
| $x = 0,\; y > 0$ | $\pi / 2$ |
| $x = 0,\; y < 0$ | $-\pi / 2$ |
Q128/Q256 fast paths apply similarly.
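A sketch implementing the table (the $x = y = 0$ convention of returning 0 is this sketch's choice; the table leaves it unspecified):

```cpp
#include <cmath>

// Quadrant fixup for atan2 following the table above.
double atan2Sketch(double y, double x) {
    const double pi = 3.141592653589793238;
    if (x > 0.0) return std::atan(y / x);
    if (x < 0.0) return std::atan(y / x) + (y >= 0.0 ? pi : -pi);
    if (y > 0.0) return  pi / 2.0;    // x == 0
    if (y < 0.0) return -pi / 2.0;    // x == 0
    return 0.0;                       // x == y == 0: convention only
}
```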
Hyperbolic Functions
Taylor Series
The Taylor series for hyperbolic functions are identical to those of trigonometric functions but without alternating signs:
$$\sinh(x) = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots = x \cdot \sum_{k=0}^{\infty} \frac{u^k}{(2k+1)!}, \quad u = x^2$$
$$\cosh(x) = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots = \sum_{k=0}^{\infty} \frac{u^k}{(2k)!}, \quad u = x^2$$
Since signs do not alternate, all terms are positive, and catastrophic cancellation does not occur.
sinhCoshDoubling
Argument halving and doubling reconstruction are performed just as with trigonometric functions. The double-angle formulas:
$$\sinh(2t) = 2 \sinh(t) \cosh(t)$$
$$\cosh(2t) = 2\cosh^2(t) - 1$$
The $(\sinh, \cosh)$ pair is simultaneously reconstructed via $K$ doubling steps. The structure is identical to sinCosDoubling.
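In sketch form, with standard-library calls standing in for the Taylor evaluations:

```cpp
#include <cmath>
#include <utility>

// sinhCoshDoubling sketch: identical shape to sinCosDoubling, with the
// hyperbolic double-angle formulas.
std::pair<double, double> sinhCoshDoublingSketch(double x, int K) {
    double t = std::ldexp(x, -K);                // x / 2^K
    double s = std::sinh(t), c = std::cosh(t);   // Taylor evaluations in calx
    for (int i = 0; i < K; ++i) {
        const double s2 = 2.0 * s * c;           // sinh(2t) = 2 sinh(t) cosh(t)
        const double c2 = 2.0 * c * c - 1.0;     // cosh(2t) = 2 cosh^2(t) - 1
        s = s2;
        c = c2;
    }
    return {s, c};
}
```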
Precision-Based Dispatch
| Precision | Method | Rationale |
|---|---|---|
| $\leq 19$ digits | Q128 fixed-point (sinh_small) | Q128 exp is fast |
| $> 19$ digits | sinhCoshDoubling | Taylor + $K$ doublings (no division required) |
- 19 digits or fewer: Q128 fixed-point exp enables fast computation.
- 20 digits or more: sinhCoshDoubling (Taylor + doubling reconstruction) is used. The exp-based approach $\sinh = (e^x - e^{-x})/2$ requires computing $e^{-x}$ via Float division (Newton reciprocal iteration), making it slower than sinhCoshDoubling.
tanh is computed as $\tanh(x) = \sinh(x) / \cosh(x)$. Since sinhCoshDoubling simultaneously yields the $(\sinh, \cosh)$ pair, only one division is added.
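Reusing the pair from sinhCoshDoublingSketch above, a hypothetical tanh then adds only the division:

```cpp
// tanh sketch: the only cost beyond the (sinh, cosh) pair is one division.
double tanhSketch(double x, int K) {
    auto [s, c] = sinhCoshDoublingSketch(x, K);
    return s / c;
}
```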
Design Decisions
Taylor Series vs. AGM-Based Methods
AGM (Arithmetic-Geometric Mean) based methods are a well-known alternative for high-precision trigonometric computation, but calx consistently employs Taylor series with argument reduction and doubling reconstruction, for the following reasons:
- Performance in practical precision ranges: AGM is asymptotically $O(M(n) \log n)$ (where $M(n)$ is $n$-bit multiplication cost), but has large constant factors. In practical precision ranges (hundreds to tens of thousands of digits), Taylor + Paterson-Stockmeyer is faster
- Code sharing: sin, cos, atan, sinh, and cosh all share the same structure (argument reduction, Paterson-Stockmeyer Taylor, doubling reconstruction), enabling shared internal power-table construction and polynomial evaluation routines
- Consistency with fast paths: Q128/Q256 fast paths also use Taylor series, so result continuity across precision thresholds is naturally guaranteed
References
- Brent, R. P. and Zimmermann, P. (2010). Modern Computer Arithmetic. Cambridge University Press. Chapter 4: Elementary and Special Functions.
- Muller, J.-M. et al. (2018). Handbook of Floating-Point Arithmetic. 2nd ed. Birkhäuser. Chapter 11: Evaluating Floating-Point Elementary Functions.
- Paterson, M. S. and Stockmeyer, L. J. (1973). "On the Number of Nonscalar Multiplications Necessary to Evaluate Polynomials". SIAM J. Comput., 2(1), 60–66.
Related article: Float Square Root Algorithm (integer reduction for sqrt, correct rounding)