Float is the arbitrary-precision floating-point type in calx.
Internally, a value is represented as a triple of (sign, mantissa Int, exponent int64_t),
encoding a value $x$ as:
$$x = (-1)^{s} \times m \times 2^{e}$$
where $s$ is the sign bit, $m$ is the multiple-precision integer (Int) mantissa, and $e$ is a 64-bit integer exponent.
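To make the (sign, mantissa, exponent) encoding concrete, the following sketch decomposes an ordinary double into such a triple and rebuilds it. The `Triple` struct and both helper functions are hypothetical illustrations, with a 64-bit integer standing in for Int. They do not show how calx actually stores or normalizes values.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical illustration type; not calx's actual internal layout.
struct Triple {
    bool sign;        // s: true if the value is negative
    std::uint64_t m;  // integer mantissa (stand-in for the arbitrary-precision Int)
    std::int64_t e;   // binary exponent
};

// Decompose a finite double so that x == (-1)^s * m * 2^e holds exactly.
Triple decompose(double x) {
    Triple t{std::signbit(x), 0, 0};
    if (x == 0.0) return t;
    int exp2 = 0;
    double frac = std::frexp(std::fabs(x), &exp2);  // frac lies in [0.5, 1)
    // Scale the fraction to an integer of at most 53 bits; this is exact.
    t.m = static_cast<std::uint64_t>(std::ldexp(frac, 53));
    t.e = static_cast<std::int64_t>(exp2) - 53;
    // Strip trailing zero bits so the mantissa is odd (one possible normal form).
    while ((t.m & 1u) == 0) {
        t.m >>= 1;
        ++t.e;
    }
    return t;
}

// Rebuild the double from the triple (exact for triples produced by decompose).
double recompose(const Triple& t) {
    double mag = std::ldexp(static_cast<double>(t.m), static_cast<int>(t.e));
    return t.sign ? -mag : mag;
}
```

For example, 0.75 decomposes to m = 3, e = -2, and -10 decomposes to s = 1, m = 5, e = 1.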
Precision is specified in decimal digits and internally converted to bit precision.
IEEE 754-compatible rounding modes (RoundingMode) are supported,
and NaN and Infinity propagate safely through all operations.
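As a rough guide to the digit-to-bit conversion mentioned above, each decimal digit carries about 3.32 bits. The helper below is a hypothetical sketch of the textbook bound only. The exact formula calx uses, including any guard bits it adds, is not stated here and may differ.

```cpp
#include <cmath>

// Assumed conversion: the smallest bit count that can hold `digits` decimal digits.
// calx may add guard bits or round differently; this is only the textbook bound.
long digitsToBits(int digits) {
    const double bitsPerDigit = std::log2(10.0);  // about 3.3219
    return static_cast<long>(std::ceil(digits * bitsPerDigit));
}
```

Under this assumption, 50 decimal digits need 167 bits and 1000 digits need 3322 bits.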
Arbitrary precision — No limit on the number of digits. Calculations with hundreds of thousands of digits are possible
NaN/Infinity safe — Special values propagate with IEEE 754-like semantics
thread_local constant cache — Mathematical constants such as $\pi, e, \log 2$ are cached per thread, avoiding recomputation at the same precision
Precision tracking — Effective bits (effectiveBits) and requested bits (requestedBits) are automatically managed
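The per-thread constant cache listed above can be pictured with the following minimal sketch. Everything in it is hypothetical: `computePi` is a stand-in that returns a truncated string rather than performing a real computation, and the actual cache inside calx may be organized differently.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Number of times the (pretend) expensive computation ran on this thread.
thread_local int g_computeCount = 0;

// Stand-in for an expensive constant computation: returns pi truncated to
// `digits` significant digits (at most 30 here, for brevity).
std::string computePi(int digits) {
    ++g_computeCount;
    static const std::string ref = "3.14159265358979323846264338327";
    std::size_t len = static_cast<std::size_t>(digits) + 1;  // +1 for the '.'
    return ref.substr(0, len < ref.size() ? len : ref.size());
}

// Per-thread cache keyed by precision. Each thread owns its own map,
// so lookups need no locking.
const std::string& cachedPi(int digits) {
    thread_local std::map<int, std::string> cache;
    auto it = cache.find(digits);
    if (it == cache.end()) {
        it = cache.emplace(digits, computePi(digits)).first;
    }
    return it->second;
}
```

Calling `cachedPi(5)` twice runs `computePi` once, while a later `cachedPi(10)` triggers a second computation because the precision differs.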
Build
Required Headers
#include <math/core/mp/Float.hpp> // Float class
#include <math/core/mp/Float/FloatMath.hpp> // cbrt, nthRoot, fma, fmma, sqr, logUi, etc.
Link library: calx_float (depends on calx_int).
Constructors
| Signature | Description |
|---|---|
| Float() | Default construction. Value is 0 |
| Float(int value) | Construct from int |
| Float(int64_t value) | Construct from 64-bit integer |
| Float(double value) | Construct from double (53-bit precision) |
| Float(std::string_view str) | Construct from string (decimal) |
| Float(const Int& value) | Construct from arbitrary-precision integer (exact) |
The three-way comparison operator (operator<=>) lets the compiler synthesize <, >, <=, and >=; in C++20, != is rewritten from operator== rather than from operator<=>.
Any comparison involving NaN returns std::partial_ordering::unordered (IEEE 754 compliant).
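The practical consequence of an unordered result can be seen with built-in double, which follows the same IEEE 754 rule. The sketch below uses only standard C++ and does not call calx. The `Relations` struct and helper names are made up for the illustration.

```cpp
#include <limits>

// A quiet NaN to experiment with.
const double kNaN = std::numeric_limits<double>::quiet_NaN();

// Results of the six relational operators for one pair of values.
struct Relations {
    bool eq, ne, lt, gt, le, ge;
};

Relations relate(double a, double b) {
    return {a == b, a != b, a < b, a > b, a <= b, a >= b};
}

// True when the pair is unordered in the IEEE 754 sense (at least one NaN).
// In C++20, (a <=> b) == std::partial_ordering::unordered expresses the same test.
bool isUnordered(double a, double b) {
    Relations r = relate(a, b);
    return !r.lt && !r.gt && !r.eq;
}
```

With a NaN operand every relation except != is false, so code such as `if (!(x < y))` must not assume that `x >= y` holds.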
Basic Mathematical Functions
All are free functions in the calx namespace.
The precision argument specifies the working precision in decimal digits.
Move overloads are also provided but omitted from the tables.
Exponential / Logarithmic
| Signature | Description |
|---|---|
| Float exp(const Float& x, int precision) | $e^x$ |
| Float exp2(const Float& x, int precision) | $2^x$ |
| Float exp10(const Float& x, int precision) | $10^x$ |
| Float expm1(const Float& x, int precision) | $e^x - 1$ (high accuracy near $x \approx 0$) |
| Float log(const Float& x, int precision) | $\ln x$ (AGM method) |
| Float log2(const Float& x, int precision) | $\log_2 x$ |
| Float log10(const Float& x, int precision) | $\log_{10} x$ |
| Float log1p(const Float& x, int precision) | $\ln(1+x)$ (high accuracy near $x \approx 0$) |
| Float logUi(unsigned long long n, int precision) | Logarithm of an integer (high accuracy via prime factorization) |
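Dedicated expm1 and log1p functions exist because the naive formulas lose accuracy near zero. This is easiest to see in fixed precision: the sketch below uses the standard library's double versions, not calx, and measures how far the naive formulas drift from them. The helper names are illustrative.

```cpp
#include <cmath>

// Relative error of the naive formula exp(x) - 1, measured against std::expm1.
double naiveExpm1RelError(double x) {
    double naive = std::exp(x) - 1.0;  // cancellation: exp(x) is close to 1
    double good = std::expm1(x);       // computed without forming exp(x)
    return std::fabs(naive - good) / std::fabs(good);
}

// Relative error of the naive formula log(1 + x), measured against std::log1p.
double naiveLog1pRelError(double x) {
    double naive = std::log(1.0 + x);  // 1 + x rounds away the low bits of x
    double good = std::log1p(x);
    return std::fabs(naive - good) / std::fabs(good);
}
```

At x = 1e-20 both naive formulas return exactly 0 in double, a relative error of 1, while for x = 0.5 the two approaches agree to within rounding.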
Trigonometric Functions
| Signature | Description |
|---|---|
| Float sin(const Float& x, int precision) | $\sin x$ |
| Float cos(const Float& x, int precision) | $\cos x$ |
| Float tan(const Float& x, int precision) | $\tan x$ |
| Float sinPi(const Float& x, int precision) | $\sin(\pi x)$ (exactly 0 at integer points) |
| Float cosPi(const Float& x, int precision) | $\cos(\pi x)$ |
| Float tanPi(const Float& x, int precision) | $\tan(\pi x)$ |
| Float sec(const Float& x, int precision) | $\sec x = 1/\cos x$ |
| Float csc(const Float& x, int precision) | $\csc x = 1/\sin x$ |
| Float cot(const Float& x, int precision) | $\cot x = \cos x/\sin x$ |
Code Example
int prec = 50;
Float x("0.5", prec);
Float s = sin(x, prec); // sin(0.5)
Float c = cos(x, prec); // cos(0.5)
Float t = tan(x, prec); // tan(0.5)
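The note that sinPi is exactly 0 at integer points reflects a general technique: reduce the argument before multiplying by $\pi$, so that rounding in $\pi$ never reaches integer inputs. The following double-precision sketch illustrates the idea. The function name `sinPiDouble` is hypothetical, and the reduction calx actually performs is not shown here.

```cpp
#include <cmath>

// sin(pi * x) with the argument reduced before multiplying by pi.
// Integer inputs take the exact-zero branch instead of evaluating
// sin(n * pi_rounded), which is never exactly zero for n != 0.
double sinPiDouble(double x) {
    const double pi = 3.14159265358979323846;
    double r = std::fmod(x, 2.0);  // exact remainder, r lies in (-2, 2)
    if (r == 0.0 || std::fabs(r) == 1.0) return 0.0;  // x is an integer
    return std::sin(pi * r);
}
```

By contrast, `std::sin(pi * 1e6)` is a small nonzero number, because the rounded product is not an exact multiple of $\pi$.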
Inverse Trigonometric Functions
| Signature | Description |
|---|---|
| Float asin(const Float& x, int precision) | $\arcsin x$ |
| Float acos(const Float& x, int precision) | $\arccos x$ |
| Float atan(const Float& x, int precision) | $\arctan x$ (Taylor + argument halving) |
| Float atan2(const Float& y, const Float& x, int precision) | $\mathrm{atan2}(y, x)$ |
| Float asinPi(const Float& x, int precision) | $\arcsin(x) / \pi$ |
| Float acosPi(const Float& x, int precision) | $\arccos(x) / \pi$ |
| Float atanPi(const Float& x, int precision) | $\arctan(x) / \pi$ |
Code Example
int prec = 50;
Float x("0.5", prec);
Float a = asin(x, prec); // arcsin(0.5) = pi/6
Float b = acos(x, prec); // arccos(0.5) = pi/3
Float c = atan(x, prec); // arctan(0.5) = 0.4636...
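The "Taylor + argument halving" approach noted for atan can be sketched in double precision as follows. It relies on the identity $\arctan x = 2\arctan\bigl(x / (1 + \sqrt{1 + x^2})\bigr)$ to shrink the argument before summing the Taylor series. The threshold 0.125, the term limit, and the function name are arbitrary choices for this illustration and are not taken from calx.

```cpp
#include <cmath>

// arctan for finite x via repeated argument halving followed by a Taylor series.
double atanHalving(double x) {
    int halvings = 0;
    // Each step applies atan(x) = 2 * atan(x / (1 + sqrt(1 + x^2))).
    // std::hypot(1, x) evaluates sqrt(1 + x^2) without overflowing for large x.
    while (std::fabs(x) > 0.125) {
        x = x / (1.0 + std::hypot(1.0, x));
        ++halvings;
    }
    // Taylor series: x - x^3/3 + x^5/5 - ... (converges fast for small x).
    double term = x;
    double sum = x;
    const double x2 = x * x;
    for (int n = 1; n < 40; ++n) {
        term *= -x2;
        double add = term / (2 * n + 1);
        sum += add;
        if (std::fabs(add) < 1e-18 * std::fabs(sum)) break;
    }
    // Undo the halvings: multiply by 2^halvings (exact scaling).
    return std::ldexp(sum, halvings);
}
```

Halving keeps the series short: with the argument at most 0.125, each successive term shrinks by a factor of at least 64.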
Hyperbolic Functions
| Signature | Description |
|---|---|
| Float sinh(const Float& x, int precision) | $\sinh x$ |
| Float cosh(const Float& x, int precision) | $\cosh x$ |
| Float tanh(const Float& x, int precision) | $\tanh x$ |
| Float sech(const Float& x, int precision) | $\mathrm{sech}\,x = 1/\cosh x$ |
| Float csch(const Float& x, int precision) | $\mathrm{csch}\,x = 1/\sinh x$ |
| Float coth(const Float& x, int precision) | $\coth x = \cosh x/\sinh x$ |
Code Example
int prec = 50;
Float x("1.0", prec);
Float s = sinh(x, prec); // sinh(1) = 1.1752...
Float c = cosh(x, prec); // cosh(1) = 1.5430...
Float t = tanh(x, prec); // tanh(1) = 0.7615...
Inverse Hyperbolic Functions
| Signature | Description |
|---|---|
| Float asinh(const Float& x, int precision) | $\mathrm{arcsinh}\,x$ |
| Float acosh(const Float& x, int precision) | $\mathrm{arccosh}\,x$ |
| Float atanh(const Float& x, int precision) | $\mathrm{arctanh}\,x$ |
Powers / Roots
| Signature | Description |
|---|---|
| Float sqr(const Float& x, int precision) | $x^2$ (squaring with optimized algorithm) |
| Float pow(const Float& x, const Float& y, int precision) | $x^y$ |
| Float pow(const Float& x, int n, int precision) | $x^n$ (integer power via binary exponentiation) |
| Float sqrt(const Float& x, int precision) | $\sqrt{x}$ |
| Float cbrt(const Float& x, int precision) | $\sqrt[3]{x}$ |
| Float nthRoot(const Float& x, int n, int precision) | $\sqrt[n]{x}$ |
| Float recSqrt(const Float& x, int precision) | $1/\sqrt{x}$ (reciprocal square root) |
| Float hypot(const Float& x, const Float& y, int precision) | $\sqrt{x^2 + y^2}$ (overflow-safe) |
Code Example
int prec = 50;
Float x("2.0", prec);
Float s = sqrt(x, prec); // sqrt(2) = 1.41421356...
Float c = cbrt(x, prec); // cbrt(2) = 1.25992104...
Float p = pow(x, Float("0.5", prec), prec); // 2^0.5 = sqrt(2)
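Binary exponentiation, which the integer overload of pow is described as using, needs only about $\log_2 |n|$ squarings instead of $|n| - 1$ multiplications. Below is a generic double-precision sketch of the technique. The name `powInt` is hypothetical, and the calx implementation (which must also manage working precision) is not reproduced here.

```cpp
// x^n by binary exponentiation: scan the exponent bits from least significant,
// squaring the base at each step and multiplying it in when the bit is set.
double powInt(double x, int n) {
    long long wide = n;  // widen first so that negating INT_MIN cannot overflow
    unsigned long long e = static_cast<unsigned long long>(wide < 0 ? -wide : wide);
    double result = 1.0;
    double base = x;
    while (e != 0) {
        if (e & 1ULL) result *= base;  // this bit of the exponent is set
        e >>= 1;
        if (e != 0) base *= base;      // square for the next bit
    }
    // Negative exponents are handled by a final reciprocal.
    return n < 0 ? 1.0 / result : result;
}
```

Computing `powInt(2.0, 10)` takes three squarings and two multiplications into the result.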
Miscellaneous
| Signature | Description |
|---|---|
| Float abs(const Float& x) | $\lvert x \rvert$ |
| Float fma(const Float& a, const Float& b, const Float& c, int precision) | $ab + c$ (fused multiply-add) |
| Float fms(const Float& a, const Float& b, const Float& c, int precision) | $ab - c$ (fused multiply-subtract) |
| Float fmma(const Float& a, const Float& b, const Float& c, const Float& d, int precision) | $ab + cd$ (double fused multiply-add) |
| Float fmms(const Float& a, const Float& b, const Float& c, const Float& d, int precision) | $ab - cd$ (double fused multiply-subtract) |
| Float factorial(int n, int precision) | $n!$ |
| void sinCos(const Float& x, Float& s, Float& c, int precision) | Compute $\sin x$ and $\cos x$ simultaneously |
| void sinhCosh(const Float& x, Float& s, Float& c, int precision) | Compute $\sinh x$ and $\cosh x$ simultaneously |
| Float agm(const Float& a, const Float& b, int precision) | Arithmetic-geometric mean $\mathrm{AGM}(a, b)$ |
| Float sum(std::span<const Float> values, int precision) | High-precision summation |
| Float dot(std::span<const Float> a, std::span<const Float> b, int precision) | High-precision dot product |
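The arithmetic-geometric mean is defined by repeatedly replacing $(a, b)$ with $\bigl((a+b)/2, \sqrt{ab}\bigr)$ until the two values agree. The double-precision sketch below shows that iteration for positive inputs. The name `agmDouble`, the tolerance, and the iteration cap are assumptions of this example, not details of calx's agm.

```cpp
#include <cmath>

// Arithmetic-geometric mean of two positive doubles.
double agmDouble(double a, double b) {
    for (int i = 0; i < 64; ++i) {
        double nextA = 0.5 * (a + b);      // arithmetic mean
        double nextB = std::sqrt(a * b);   // geometric mean
        a = nextA;
        b = nextB;
        // Convergence is quadratic, so a handful of iterations suffices.
        if (std::fabs(a - b) <= 1e-15 * std::fabs(a)) break;
    }
    return a;
}
```

Because the number of correct digits roughly doubles per iteration, AGM-based methods are attractive at very high precision.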
Error Functions / Gamma Functions
| Signature | Description |
|---|---|
| Float erf(const Float& x, int precision) | Error function $\mathrm{erf}(x)$ |
| Float erfc(const Float& x, int precision) | Complementary error function $\mathrm{erfc}(x) = 1 - \mathrm{erf}(x)$ |
| Float erfcx(const Float& x, int precision) | Scaled complementary error function $e^{x^2}\,\mathrm{erfc}(x)$ |
| Float gamma(const Float& x, int precision) | $\Gamma(x)$ |
| Float lnGamma(const Float& x, int precision) | $\ln\Gamma(x)$ |
| Float beta(const Float& a, const Float& b, int precision) | $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ |
| Float digamma(const Float& x, int precision) | $\psi(x) = \Gamma'(x)/\Gamma(x)$ |
| Float trigamma(const Float& x, int precision) | $\psi'(x)$ |
| Float polygamma(int n, const Float& x, int precision) | $\psi^{(n)}(x)$ |
| Float gammaP(const Float& a, const Float& x, int precision) | Regularized lower incomplete gamma $P(a,x)$ |
| Float gammaQ(const Float& a, const Float& x, int precision) | Regularized upper incomplete gamma $Q(a,x)$ |
| Float gammaLower(const Float& a, const Float& x, int precision) | Lower incomplete gamma $\gamma(a,x)$ |
| Float gammaUpper(const Float& a, const Float& x, int precision) | Upper incomplete gamma $\Gamma(a,x)$ |
Code Example
int prec = 50;
Float x("1.0", prec);
Float e = erf(x, prec); // erf(1) = 0.84270079...
Float g = gamma(x, prec); // Gamma(1) = 1
Float lg = lnGamma(Float("10", prec), prec); // ln(9!) = 12.8018...
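The identity $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ and the relation $\ln\Gamma(10) = \ln 9!$ used in the comment above can be checked quickly in double precision with the standard library. This sketch does not call calx. The helper name is illustrative, and working in the log domain is one common way to evaluate the beta function, not necessarily the one calx uses.

```cpp
#include <cmath>

// Beta function for positive a and b, evaluated through log-gamma so that the
// intermediate Gamma values cannot overflow.
double betaViaLnGamma(double a, double b) {
    return std::exp(std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b));
}
```

For instance, $B(2, 3) = \Gamma(2)\Gamma(3)/\Gamma(5) = 2/24 = 1/12$, and `std::lgamma(10.0)` agrees with `std::log(362880.0)` to within rounding.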
Rounding / Integer Part
All are free functions in the calx namespace. Move overloads are also provided.
| Signature | Description |
|---|---|
| Float floor(const Float& x) | $\lfloor x \rfloor$ (floor function) |
| Float ceil(const Float& x) | $\lceil x \rceil$ (ceiling function) |
| Float round(const Float& x) | Round to nearest (half-away-from-zero: ties round away from zero) |
| Float roundEven(const Float& x) | Round to nearest even (banker's rounding: ties round to even) |
| Float trunc(const Float& x) | Truncation toward zero |
| Float frac(const Float& x) | Fractional part $x - \lfloor x \rfloor$ |
| Float nearbyint(const Float& x) | Equivalent to round |
| Float rint(const Float& x) | Equivalent to round |
| Float modf(const Float& x, Float& iptr) | Stores integer part in iptr, returns fractional part |
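The differences between these rounding rules show up at ties and at negative inputs. The sketch below uses double and the standard library to illustrate them. The helper names are made up for this example, and whether calx's modf follows the sign convention of std::modf for negative inputs is an assumption not confirmed by the table.

```cpp
#include <cmath>

// Fractional part in the floor sense, x - floor(x): always in [0, 1).
double fracFloor(double x) {
    return x - std::floor(x);
}

// Fractional part in the std::modf sense: carries the sign of x.
double fracTrunc(double x) {
    double integerPart = 0.0;
    return std::modf(x, &integerPart);
}

// Round to nearest with ties to even, independent of the current rounding mode.
double roundEvenDouble(double x) {
    if (std::fabs(x - std::trunc(x)) == 0.5) {
        // Exact tie: rounding x/2 and doubling lands on the even neighbour.
        return 2.0 * std::round(x / 2.0);
    }
    return std::round(x);  // not a tie, so the tie-breaking rule is irrelevant
}
```

For x = 2.5, `std::round` gives 3 while `roundEvenDouble` gives 2. For x = -2.75, `fracFloor` gives 0.25 while `fracTrunc` gives -0.75.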
// thread_local cache makes repeated calls at the same precision fast
Float pi1 = Float::pi(500); // First call: computed
Float pi2 = Float::pi(500); // Second call: returned from cache instantly
Float pi3 = Float::pi(1000); // Different precision: recomputed
// Changing the rounding mode
Float::setRoundingMode(RoundingMode::TowardZero);
Float x("1.999");
x.setPrecision(3); // Rounded with TowardZero
calx Float is an arbitrary-precision floating-point type and differs in design goals from IEEE 754 fixed-width formats (float/double).
The following table summarizes which IEEE 754 features Float supports and which are intentionally omitted.
| IEEE 754 Feature | calx Float | Notes |
|---|---|---|
| 5 rounding modes | Implemented | Selectable per thread via setRoundingMode(). Uses guard bit + sticky bit for correct rounding decisions |
| NaN / ±Infinity | Fully supported | Generation, propagation, and comparison semantics implemented |
| Signed zero (±0) | Representable | However, x - x always returns +0. IEEE 754 mandates -0 under roundTowardNegative, but Float does not make this distinction |
| Subnormals (denormals) | Not applicable | IEEE 754 subnormals arise from the implicit leading 1 (hidden bit). Float stores the full significand explicitly as an Int, so there is no hidden bit and the concept of subnormals does not apply. The exponent range is $\pm 2^{60}$, so underflow does not occur in normal use. subnormalize() is provided as an MPFR-compatible emulation feature |
| Exception flags | Infrastructure only | Flags such as FE_INEXACT are defined but not raised by arithmetic operators. Only subnormalize() raises them |
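The guard bit and sticky bit mentioned for rounding decisions work as follows: when low-order bits are dropped, the guard bit is the first discarded bit and the sticky bit records whether any lower discarded bit was set. Together they decide round-to-nearest-even without inspecting the discarded bits individually. The sketch below applies this to a 64-bit integer significand. It is a generic illustration with a hypothetical function name, not calx's rounding routine.

```cpp
#include <cstdint>

// Drop the low `shift` bits of an integer significand, rounding to nearest
// with ties to even. Requires 1 <= shift <= 63.
std::uint64_t roundShiftNearestEven(std::uint64_t m, unsigned shift) {
    std::uint64_t kept = m >> shift;
    // Guard bit: the most significant of the discarded bits.
    bool guard = ((m >> (shift - 1)) & 1u) != 0;
    // Sticky bit: logical OR of all discarded bits below the guard bit.
    bool sticky = (m & ((std::uint64_t{1} << (shift - 1)) - 1)) != 0;
    bool keptIsOdd = (kept & 1u) != 0;
    // Round up when above the halfway point, or exactly halfway with an odd result.
    if (guard && (sticky || keptIsOdd)) ++kept;
    return kept;
}
```

For example, 22 shifted by 2 bits is the tie 5.5 and rounds to the even value 6, while 18 shifted by 2 bits is the tie 4.5 and stays at 4. A caller would still need to renormalize if the increment carries into a new top bit.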
Rounding Strategy: Per-Operation Design
IEEE 754 (and MPFR) round every operation to the target precision.
calx Float uses a different rounding strategy depending on the type of operation.
| Operation | Rounding | Design Rationale |
|---|---|---|
| Multiplication / Division | Rounded to requested_bits_ | Multiplying two significands of the same bit length nearly doubles the result length, so without rounding the significand length, and with it the cost, would grow exponentially over a chain of multiplications |
| Addition / Subtraction | No rounding | Addition only extends the significand by the exponent difference, and the extra low-order bits serve as guard bits for subsequent operations. Rounding after every addition would discard these guard bits, causing rounding errors to accumulate in long summation chains. When the exponent difference exceeds max(requested_bits, 1000) + 64, the smaller operand is discarded, preventing unbounded significand growth |
| setPrecision() | Rounded to specified precision | Explicit user control. Use after additions when a specific precision is needed |
With this design, in an expression like (a + b) * c, the extra bits from addition act as guard bits for the multiplication,
yielding higher accuracy at the same requested_bits_ than the MPFR approach (round every operation).
To achieve equivalent accuracy with MPFR, one must allocate additional working precision.
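The discard rule stated for addition, where the smaller operand is dropped once the exponent difference exceeds max(requested_bits, 1000) + 64, can be written down directly. The sketch below is hypothetical: the function names are invented, and it assumes the comparison is made on the operands' binary exponents, which is not specified in detail above.

```cpp
#include <algorithm>
#include <cstdint>

// Threshold from the rule above: max(requested_bits, 1000) + 64.
std::int64_t discardThreshold(std::int64_t requestedBits) {
    return std::max<std::int64_t>(requestedBits, 1000) + 64;
}

// True when the smaller operand of an addition would be dropped entirely.
// Assumes the exponents are far from the int64_t limits, so the subtraction
// cannot overflow.
bool smallerOperandDiscarded(std::int64_t expA, std::int64_t expB,
                             std::int64_t requestedBits) {
    std::int64_t diff = expA > expB ? expA - expB : expB - expA;
    return diff > discardThreshold(requestedBits);
}
```

With 167 requested bits the threshold is 1064, so operands whose exponents differ by 1064 are still added exactly, while a difference of 1065 drops the smaller one.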
Global precision (mp.prec); every operation rounds to that precision
calx Float
Two-value tracking: requested precision (requested_bits_) and effective precision (effective_bits_).
Multiplication and division round to the requested precision; addition and subtraction preserve guard bits to improve accuracy in subsequent operations