What is the Float class in calx?

calx::Float is an arbitrary-precision floating-point class implemented in C++23. It internally represents values as (sign, mantissa Int, exponent int64_t) and provides a broad set of mathematical functions, safe NaN/Infinity propagation, and thread_local constant caching.

How are NaN and Infinity handled?

calx::Float supports IEEE 754-style special values (NaN, +Infinity, -Infinity). Division by zero returns Infinity, and undefined operations such as Infinity minus Infinity return NaN. NaN propagates safely through all operations.
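One way to get these propagation rules is to resolve special values before any mantissa arithmetic runs. The sketch below shows that dispatch for addition; `Kind`, `Value`, and `add` are hypothetical names rather than calx's internal layout, and finite values carry a `double` payload only to keep the sketch short.

```cpp
#include <cassert>

// Hypothetical tag for the value classes a Float-like type must distinguish.
enum class Kind { Finite, PosInf, NegInf, NaN };

struct Value {
    Kind kind;
    double finite = 0.0;  // payload used only when kind == Kind::Finite
};

// Special values are resolved first; only finite + finite reaches real arithmetic.
Value add(const Value& a, const Value& b) {
    if (a.kind == Kind::NaN || b.kind == Kind::NaN) return {Kind::NaN};  // NaN is sticky
    bool aInf = a.kind != Kind::Finite;
    bool bInf = b.kind != Kind::Finite;
    if (aInf && bInf) {
        // Same-signed infinities stay infinite; +Inf + -Inf is undefined -> NaN.
        return a.kind == b.kind ? a : Value{Kind::NaN};
    }
    if (aInf) return a;  // Inf + finite = Inf
    if (bInf) return b;
    return {Kind::Finite, a.finite + b.finite};
}
```

Checking NaN first means every later branch can assume both operands are ordinary or infinite, which keeps the rule set small.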

How does the constant cache work?

Mathematical constants such as pi, e, and log2 are cached per thread using thread_local storage. When the same precision is requested again, the cached value is returned immediately without recomputation.
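A minimal sketch of such a per-thread cache, assuming it is keyed by the exact requested precision (the source describes only same-precision reuse). `computePi` is a hypothetical stand-in for the expensive computation and simply returns a `double`; the counter exists only to make cache hits observable.

```cpp
#include <cassert>
#include <cmath>
#include <unordered_map>

// Counts real computations so cache hits can be observed (per thread, like the cache).
thread_local int g_computeCount = 0;

// Hypothetical stand-in for an expensive constant computation at `digits` precision.
// A real implementation would run e.g. Chudnovsky here; this just returns a double.
double computePi(int digits) {
    assert(digits > 0);
    ++g_computeCount;
    return std::acos(-1.0);
}

// Per-thread cache keyed by the requested precision.
double cachedPi(int digits) {
    thread_local std::unordered_map<int, double> cache;
    if (auto it = cache.find(digits); it != cache.end()) return it->second;  // hit: no recomputation
    double value = computePi(digits);
    cache.emplace(digits, value);
    return value;
}
```

Because the map is `thread_local`, each thread fills its own copy and no locking is needed; the trade-off is that two threads asking for the same constant each compute it once.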

Which algorithms are used for mathematical functions?

Pi uses Chudnovsky (binary splitting), e uses factorial summation (binary splitting), log2/log10 use atanh series, Euler-Mascheroni uses Brent-McMillan, and general logarithm uses a 4-tier dispatch (Halley iteration, fixed-point atanh, Halley+expDoubling, AGM) based on precision.
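The binary-splitting technique named for e can be illustrated at small scale with 64-bit integers. The sketch assumes $e \approx 1 + \sum_{k=1}^{n} 1/k!$ and splits the sum recursively into a numerator/denominator pair; an arbitrary-precision version would use big integers (such as calx's Int) and one final high-precision division, whereas `uint64_t` limits this sketch to $n \le 20$.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Binary splitting for S(a,b) = sum_{k=a+1}^{b} 1/((a+1)(a+2)...k).
// Returns (P, Q) with S(a,b) = P/Q and Q = (a+1)(a+2)...b.
std::pair<std::uint64_t, std::uint64_t> splitE(std::uint64_t a, std::uint64_t b) {
    assert(b > a);
    if (b - a == 1) return {1, b};  // S(a, a+1) = 1/(a+1)
    std::uint64_t m = (a + b) / 2;
    auto [p1, q1] = splitE(a, m);
    auto [p2, q2] = splitE(m, b);
    // S(a,b) = S(a,m) + S(m,b) / Q(a,m)  =>  P = P1*Q2 + P2, Q = Q1*Q2
    return {p1 * q2 + p2, q1 * q2};
}

// e ~= 1 + S(0, n). With uint64_t, n must stay <= 20 (20! < 2^64).
double eBinarySplitting(std::uint64_t n) {
    assert(n >= 1 && n <= 20);
    auto [p, q] = splitE(0, n);
    return 1.0 + static_cast<double>(p) / static_cast<double>(q);
}
```

Splitting keeps the operands of each multiplication balanced in size, which is what lets fast big-integer multiplication pay off at high precision.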

Float — Arbitrary-Precision Floating Point

Overview

Float is the arbitrary-precision floating-point type in calx. Internally, a value is represented as a triple of (sign, mantissa Int, exponent int64_t), encoding a value $x$ as:

$$x = (-1)^{s} \times m \times 2^{e}$$

where $s$ is the sign bit, $m$ is the multiple-precision integer (Int) mantissa, and $e$ is a 64-bit integer exponent.
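A toy model of the triple makes the encoding concrete. `ToyFloat` is hypothetical and uses a `uint64_t` mantissa where calx uses an arbitrary-precision `Int`; it also shows why scaling by a power of two only touches the exponent.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Toy model of the triple: x = (-1)^s * m * 2^e.
// calx stores m as an arbitrary-precision Int; a uint64_t stands in here.
struct ToyFloat {
    bool negative;           // s
    std::uint64_t mantissa;  // m
    std::int64_t exponent;   // e
};

double toDouble(const ToyFloat& x) {
    assert(x.mantissa < (std::uint64_t{1} << 53));  // keeps the conversion exact
    double magnitude = std::ldexp(static_cast<double>(x.mantissa), static_cast<int>(x.exponent));
    return x.negative ? -magnitude : magnitude;
}

// Scaling by 2^n changes only the exponent; the mantissa is untouched,
// so in this model such shifts are exact and O(1) regardless of precision.
ToyFloat shiftLeft(ToyFloat x, std::int64_t n) {
    x.exponent += n;
    return x;
}
```

The encoding is not unique ($3 \times 2^{-2} = 6 \times 2^{-3}$), so a real implementation needs a normalization rule; the source does not specify calx's.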

Precision is specified in decimal digits and internally converted to bit precision. IEEE 754-compatible rounding modes (RoundingMode) are supported, and NaN and Infinity propagate safely through all operations.
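The basic conversion factor is $\log_2 10 \approx 3.3219$ bits per decimal digit. The helpers below are hypothetical stand-ins for `precisionToBits()` and `bitsToPrecision()`; calx may add guard bits or round differently.

```cpp
#include <cassert>
#include <cmath>

// One decimal digit carries log2(10) ~= 3.3219 bits; round up so no digit is lost.
int digitsToBits(int digits) {
    assert(digits > 0);
    return static_cast<int>(std::ceil(digits * std::log2(10.0)));
}

// Inverse direction: how many full decimal digits `bits` bits can guarantee.
int bitsToDigits(int bits) {
    assert(bits > 0);
    return static_cast<int>(std::floor(bits * std::log10(2.0)));
}
```

For example, 100 digits need 333 bits, and a 53-bit `double` guarantees 15 full decimal digits.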

Arbitrary precision — The number of digits is limited only by available memory and computation time. Calculations with hundreds of thousands of digits are possible
NaN/Infinity safe — Special values propagate with IEEE 754-like semantics
thread_local constant cache — Mathematical constants such as $\pi, e, \log 2$ are cached per thread, avoiding recomputation at the same precision
Precision tracking — Effective bits (effectiveBits) and requested bits (requestedBits) are automatically managed

Build

Required Headers

#include <math/core/mp/Float.hpp>          // Float class
#include <math/core/mp/Float/FloatMath.hpp> // cbrt, nthRoot, fma, fmma, sqr, logUi, etc.

Link library: calx_float (depends on calx_int).

Constructors

Signature	Description
`Float()`	Default construction. Value is 0
`Float(int value)`	Construct from int
`Float(int64_t value)`	Construct from 64-bit integer
`Float(double value)`	Construct from double (53-bit precision)
`Float(std::string_view str)`	Construct from string (decimal)
`Float(const Int& value)`	Construct from arbitrary-precision integer (exact)
`Float(const Int& mantissa, int64_t exponent, bool is_negative = false)`	Construct from mantissa + exponent
`Float(Int&& mantissa, int64_t exponent, bool is_negative = false)`	Construct from mantissa (move) + exponent
`Float(int64_t mantissa, int64_t exponent, bool is_negative = false)`	Construct from integer mantissa + exponent

Copy/move constructors and assignment operators are all defaulted. Assignment operators from int64_t and double are also available.

Special Value Factory

All are static member functions.

Signature	Description
`Float positiveInfinity()`	Returns $+\infty$
`Float negativeInfinity()`	Returns $-\infty$
`Float nan()`	Returns NaN
`Float epsilon(int precision)`	Machine epsilon for the given precision
`Float zero(int precision)`	Zero with the given precision
`Float one(int precision)`	One with the given precision

If precision is omitted, defaultPrecision() (thread-local) is used.

Mathematical Constants

All are static member functions. A thread_local cache avoids recomputation at the same precision.

Signature	Description	Algorithm
`Float pi(int precision)`	Pi $\pi \approx 3.1416$	Chudnovsky (binary splitting)
`Float e(int precision)`	Euler's number $e \approx 2.7183$	$\sum 1/n!$ (binary splitting)
`Float log2(int precision)`	$\ln 2 \approx 0.6931$	atanh series
`Float log10(int precision)`	$\ln 10 \approx 2.3026$	atanh series
`Float euler(int precision)`	Euler-Mascheroni constant $\gamma \approx 0.5772$	Brent-McMillan
`Float catalan(int precision)`	Catalan's constant $G \approx 0.9160$	Euler series
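The atanh-series approach for $\ln 2$ and $\ln 10$ rests on $\operatorname{atanh}(x) = \tfrac12 \ln\frac{1+x}{1-x}$. The identities below ($\ln 2 = 2\operatorname{atanh}(1/3)$ and $\ln 10 = 3\ln 2 + 2\operatorname{atanh}(1/9)$) are standard choices assumed here for illustration; the source does not say which arguments calx uses. The sketch sums in `double`.

```cpp
#include <cassert>

// atanh(1/q) = sum_{k>=0} 1 / ((2k+1) q^(2k+1)); converges fast for large q.
double atanhInv(int q) {
    assert(q >= 2);
    double x = 1.0 / q, x2 = x * x;
    double power = x;  // x^(2k+1)
    double sum = 0.0;
    for (int k = 0; k < 60; ++k) {
        double term = power / (2 * k + 1);
        if (term < 1e-20 * sum) break;  // remaining terms are far below double resolution
        sum += term;
        power *= x2;
    }
    return sum;
}

// ln 2 = 2 atanh(1/3), because (1 + 1/3) / (1 - 1/3) = 2.
double ln2() { return 2.0 * atanhInv(3); }

// ln 10 = 3 ln 2 + ln(5/4) = 3 ln 2 + 2 atanh(1/9), because (1 + 1/9) / (1 - 1/9) = 5/4.
double ln10() { return 3.0 * ln2() + 2.0 * atanhInv(9); }
```

Larger arguments in the denominator (3, 9, ...) mean each extra term gains more digits, which is why such identities are preferred over the slowly converging series for $\ln(1+x)$ at $x = 1$.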

Trigonometric / Circle Constants

Signature	Description
`Float::half_pi(int precision)`	$\pi/2 \approx 1.5708$
`Float::quarter_pi(int precision)`	$\pi/4 \approx 0.7854$
`Float::two_pi(int precision)`	$2\pi \approx 6.2832$
`Float::inv_pi(int precision)`	$1/\pi \approx 0.3183$
`Float::two_inv_pi(int precision)`	$2/\pi \approx 0.6366$
`Float::pi_squared_over_6(int precision)`	$\pi^2/6 = \zeta(2) \approx 1.6449$
`Float::pi_squared_over_12(int precision)`	$\pi^2/12 \approx 0.8225$
`Float::inv_sqrt_pi(int precision)`	$1/\sqrt{\pi} \approx 0.5642$
`Float::two_inv_sqrt_pi(int precision)`	$2/\sqrt{\pi} \approx 1.1284$
`Float::degree(int precision)`	$\pi/180 \approx 0.01745$

Square Root Constants

Signature	Description
`Float::sqrt2(int precision)`	$\sqrt{2} \approx 1.4142$
`Float::sqrt3(int precision)`	$\sqrt{3} \approx 1.7321$
`Float::sqrt5(int precision)`	$\sqrt{5} \approx 2.2361$
`Float::inv_sqrt2(int precision)`	$1/\sqrt{2} \approx 0.7071$
`Float::cbrt2(int precision)`	$\sqrt[3]{2} \approx 1.2599$

Logarithmic Constants

Signature	Description
`Float::ln3(int precision)`	$\ln 3 \approx 1.0986$
`Float::ln5(int precision)`	$\ln 5 \approx 1.6094$
`Float::log2e(int precision)`	$\log_2 e \approx 1.4427$
`Float::log10e(int precision)`	$\log_{10} e \approx 0.4343$

Notable Constants

Signature	Description
`Float::phi(int precision)`	Golden ratio $(1+\sqrt{5})/2 \approx 1.6180$
`Float::lemniscate(int precision)`	Lemniscate constant $\varpi \approx 2.6221$
`Float::gamma14(int precision)`	$\Gamma(1/4) \approx 3.6256$
`Float::zeta3(int precision)`	Apéry's constant $\zeta(3) \approx 1.2021$
`Float::zeta5(int precision)`	$\zeta(5) \approx 1.0369$
`Float::zeta7(int precision)`	$\zeta(7) \approx 1.0083$
`Float::glaisher(int precision)`	Glaisher-Kinkelin constant $A \approx 1.2824$
`Float::khinchin(int precision)`	Khinchin's constant $K \approx 2.6855$
`Float::omega(int precision)`	$\Omega$ (Lambert $W(1)$) $\approx 0.5671$
`Float::sin1(int precision)`	$\sin 1 \approx 0.8415$
`Float::cos1(int precision)`	$\cos 1 \approx 0.5403$

Rare Mathematical Constants

Signature	Description
`Float::plastic(int precision)`	Plastic number $\approx 1.3247$
`Float::twin_prime(int precision)`	Twin prime constant $C_2 \approx 0.6602$
`Float::landau_ramanujan(int precision)`	Landau-Ramanujan constant $\approx 0.7642$
`Float::meissel_mertens(int precision)`	Meissel-Mertens constant $\approx 0.2615$
`Float::bernstein(int precision)`	Bernstein's constant $\approx 0.2802$
`Float::gauss_kuzmin(int precision)`	Gauss-Kuzmin-Wirsing constant $\approx 0.3037$
`Float::feigenbaum_delta(int precision)`	Feigenbaum $\delta \approx 4.6692$
`Float::feigenbaum_alpha(int precision)`	Feigenbaum $\alpha \approx 2.5029$
`Float::erdos_borwein(int precision)`	Erdős-Borwein constant $\approx 1.6067$
`Float::laplace_limit(int precision)`	Laplace limit constant $\approx 0.6627$
`Float::soldner(int precision)`	Ramanujan-Soldner constant $\mu \approx 1.4514$
`Float::backhouse(int precision)`	Backhouse's constant $\approx 1.4561$
`Float::porter(int precision)`	Porter's constant $\approx 1.4671$
`Float::lieb_square_ice(int precision)`	Lieb's square ice constant $\approx 1.5396$
`Float::niven(int precision)`	Niven's constant $\approx 1.7052$
`Float::reciprocal_fibonacci(int precision)`	Reciprocal Fibonacci constant $\approx 3.3599$
`Float::sierpinski(int precision)`	Sierpiński's constant $\approx 2.5850$
`Float::mills(int precision)`	Mills' constant $\approx 1.3064$
`Float::dottie(int precision)`	Dottie number $\approx 0.7391$
`Float::golomb_dickman(int precision)`	Golomb-Dickman constant $\approx 0.6243$
`Float::salem(int precision)`	Salem constant $\approx 1.1763$
`Float::cahen(int precision)`	Cahen's constant $\approx 0.6434$
`Float::levy(int precision)`	Lévy's constant $\approx 3.2758$
`Float::copeland_erdos(int precision)`	Copeland-Erdős constant $\approx 0.2357$
`Float::egamma_exp(int precision)`	$e^{\gamma} \approx 1.7811$

Code Example

Float::setDefaultPrecision(100);
Float pi = Float::pi();       // pi = 3.14159265...
Float e  = Float::e();        // e = 2.71828182...
Float g  = Float::euler();    // gamma = 0.57721566... (Euler-Mascheroni)
Float s2 = Float::sqrt2();    // sqrt(2) = 1.41421356...

Precision Control

Signature	Description
`Float& setPrecision(int precision)`	Set the number of significant digits (with rounding). Returns a reference to self
`int precision() const`	Get the current number of significant digits
`int effectiveBits() const`	Get the number of reliable bits
`int requestedBits() const`	Get the target number of bits
`void truncateToApprox(int precision)`	Fast word-level approximate truncation (for intermediate computations)
`static int precisionToBits(int precision)`	Decimal digits → bits
`static int bitsToPrecision(int bits)`	Bits → decimal digits
`static int defaultPrecision()`	Get the thread-local default precision
`static void setDefaultPrecision(int precision)`	Set the thread-local default precision

Rounding Modes

enum class RoundingMode {
    ToNearest,      // Round to nearest (default)
    TowardZero,     // Round toward zero (truncation)
    TowardPositive, // Round toward positive infinity (ceiling)
    TowardNegative, // Round toward negative infinity (floor)
    AwayFromZero    // Round away from zero
};

Signature	Description
`static RoundingMode roundingMode()`	Get the current rounding mode
`static void setRoundingMode(RoundingMode mode)`	Set the rounding mode
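A rounding decision of this kind is usually made from two properties of the discarded part: the guard bit (the first dropped bit) and the sticky bit (the OR of all lower dropped bits). The sketch below redeclares the enum so it is self-contained; `roundMagnitude` is hypothetical, and it breaks `ToNearest` ties to even as IEEE 754's default does, which the source does not confirm for calx.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the RoundingMode enum above (redeclared so the sketch is self-contained).
enum class RoundingMode { ToNearest, TowardZero, TowardPositive, TowardNegative, AwayFromZero };

// Drop the low `drop` bits of a magnitude (sign kept separately) and round per `mode`.
// guard = first discarded bit; sticky = OR of all bits below the guard bit.
std::uint64_t roundMagnitude(std::uint64_t m, bool negative, int drop, RoundingMode mode) {
    assert(drop > 0 && drop < 64);
    std::uint64_t kept = m >> drop;
    bool guard = ((m >> (drop - 1)) & 1u) != 0;
    bool sticky = (m & ((std::uint64_t{1} << (drop - 1)) - 1)) != 0;
    bool inexact = guard || sticky;
    bool roundUp = false;  // "up" = increase the magnitude by one unit
    switch (mode) {
        case RoundingMode::ToNearest:      roundUp = guard && (sticky || (kept & 1u)); break;  // ties to even
        case RoundingMode::TowardZero:     roundUp = false; break;
        case RoundingMode::TowardPositive: roundUp = inexact && !negative; break;
        case RoundingMode::TowardNegative: roundUp = inexact && negative; break;
        case RoundingMode::AwayFromZero:   roundUp = inexact; break;
    }
    // A real implementation must renormalize if the increment carries into a new bit.
    return kept + (roundUp ? 1 : 0);
}
```

With the sign held separately, `TowardPositive` and `TowardNegative` swap roles for negative values: rounding toward $+\infty$ truncates the magnitude of a negative number.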

State Inspection

Signature	Description
`bool isNaN() const`	`true` if NaN
`bool isInfinity() const`	`true` if $\pm\infty$
`bool isZero() const`	`true` if zero
`bool isNegative() const`	`true` if negative
`bool isPositive() const`	`true` if non-negative (`!isNegative()`)
`bool isExact() const`	`true` if the value is exact (derived from an integer)
`bool isInteger() const`	`true` if the value is an integer (NaN/Infinity returns `false`)
`bool fitsInt() const`	Whether the value fits in int range
`bool fitsInt64() const`	Whether the value fits in int64_t range
`bool fitsDouble() const`	Whether the value fits in double range

Arithmetic Operators

Signature	Description
`Float operator+(const Float&, const Float&)`	Addition
`Float operator-(const Float&, const Float&)`	Subtraction
`Float operator*(const Float&, const Float&)`	Multiplication
`Float operator/(const Float&, const Float&)`	Division
`Float operator/(const Float&, int64_t)`	Integer division (fast path)
`Float& operator+=(const Float&)`	Addition assignment
`Float& operator-=(const Float&)`	Subtraction assignment
`Float& operator*=(const Float&)`	Multiplication assignment
`Float& operator/=(const Float&)`	Division assignment
`Float operator-(const Float&)`	Unary minus (sign flip)
`Float operator<<(const Float&, int)`	Left shift ($\times 2^n$)
`Float operator>>(const Float&, int)`	Right shift ($\div 2^n$)

Under the precision propagation policy (default: MAX_PROPAGATION), a binary operation's result takes the larger of the two operands' requestedBits and the smaller of their effectiveBits.
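The MAX_PROPAGATION rule amounts to the bookkeeping below. `PrecisionInfo` and `propagate` are hypothetical names for illustration: the result aims for the larger target but can be trusted only as far as the less reliable operand.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-value bookkeeping mirroring requestedBits()/effectiveBits().
struct PrecisionInfo {
    int requestedBits;  // precision the caller asked for
    int effectiveBits;  // bits that are actually reliable
};

// MAX_PROPAGATION: aim for the larger target, trust only the weaker operand's bits.
PrecisionInfo propagate(const PrecisionInfo& lhs, const PrecisionInfo& rhs) {
    return {std::max(lhs.requestedBits, rhs.requestedBits),
            std::min(lhs.effectiveBits, rhs.effectiveBits)};
}
```

Mixing a 200-bit target with an operand that is reliable to only 80 bits therefore yields a result that still aims for 200 bits but reports 80 reliable ones.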

Comparison Operators

Signature	Description
`bool operator==(const Float&, const Float&)`	Equality comparison
`std::partial_ordering operator<=>(const Float&, const Float&)`	Three-way comparison (C++20)

The three-way comparison operator automatically generates !=, <, >, <=, >=. Any comparison involving NaN returns std::partial_ordering::unordered (IEEE 754 compliant).

Basic Mathematical Functions

All are free functions in the calx namespace. The precision argument specifies the working precision in decimal digits. Move overloads are also provided but omitted from the tables.

Exponential / Logarithmic

Signature	Description
`Float exp(const Float& x, int precision)`	$e^x$
`Float exp2(const Float& x, int precision)`	$2^x$
`Float exp10(const Float& x, int precision)`	$10^x$
`Float expm1(const Float& x, int precision)`	$e^x - 1$ (high accuracy near $x \approx 0$)
`Float log(const Float& x, int precision)`	$\ln x$ (4-tier dispatch by precision: Halley iteration, fixed-point atanh, Halley+expDoubling, AGM)
`Float log2(const Float& x, int precision)`	$\log_2 x$
`Float log10(const Float& x, int precision)`	$\log_{10} x$
`Float log1p(const Float& x, int precision)`	$\ln(1+x)$ (high accuracy near $x \approx 0$)
`Float logUi(unsigned long long n, int precision)`	Logarithm of an integer (high accuracy via prime factorization)
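`logUi` relies on $\ln n = \sum_p e_p \ln p$ for $n = \prod_p p^{e_p}$. The sketch below factors by trial division and uses `std::log` as a stand-in for high-precision logarithms of the primes, so it shows the decomposition, not the accuracy gain; how calx obtains and caches the prime logarithms is not described in the source.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>

// Trial-division factorization: returns {prime -> exponent}.
std::map<std::uint64_t, int> factorize(std::uint64_t n) {
    assert(n >= 1);
    std::map<std::uint64_t, int> factors;
    for (std::uint64_t p = 2; p <= n / p; ++p) {  // p <= n/p avoids overflow of p*p
        while (n % p == 0) {
            ++factors[p];
            n /= p;
        }
    }
    if (n > 1) ++factors[n];  // the leftover factor is prime
    return factors;
}

// ln n = sum over primes of e_p * ln p. std::log stands in for cached
// high-precision prime logarithms in this sketch.
double logViaFactorization(std::uint64_t n) {
    double sum = 0.0;
    for (const auto& [p, e] : factorize(n)) sum += e * std::log(static_cast<double>(p));
    return sum;
}
```

For example, $360 = 2^3 \cdot 3^2 \cdot 5$, so $\ln 360 = 3\ln 2 + 2\ln 3 + \ln 5$.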

Trigonometric Functions

Signature	Description
`Float sin(const Float& x, int precision)`	$\sin x$
`Float cos(const Float& x, int precision)`	$\cos x$
`Float tan(const Float& x, int precision)`	$\tan x$
`Float sinPi(const Float& x, int precision)`	$\sin(\pi x)$ (exactly 0 at integer points)
`Float cosPi(const Float& x, int precision)`	$\cos(\pi x)$
`Float tanPi(const Float& x, int precision)`	$\tan(\pi x)$
`Float sec(const Float& x, int precision)`	$\sec x = 1/\cos x$
`Float csc(const Float& x, int precision)`	$\csc x = 1/\sin x$
`Float cot(const Float& x, int precision)`	$\cot x = \cos x/\sin x$

Code Example

int prec = 50;
Float x("0.5", prec);
Float s = sin(x, prec);   // sin(0.5)
Float c = cos(x, prec);   // cos(0.5)
Float t = tan(x, prec);   // tan(0.5)

Inverse Trigonometric Functions

Signature	Description
`Float asin(const Float& x, int precision)`	$\arcsin x$
`Float acos(const Float& x, int precision)`	$\arccos x$
`Float atan(const Float& x, int precision)`	$\arctan x$ (Taylor + argument halving)
`Float atan2(const Float& y, const Float& x, int precision)`	$\mathrm{atan2}(y, x)$
`Float asinPi(const Float& x, int precision)`	$\arcsin(x) / \pi$
`Float acosPi(const Float& x, int precision)`	$\arccos(x) / \pi$
`Float atanPi(const Float& x, int precision)`	$\arctan(x) / \pi$

Code Example

int prec = 50;
Float x("0.5", prec);
Float a = asin(x, prec);  // arcsin(0.5) = pi/6
Float b = acos(x, prec);  // arccos(0.5) = pi/3
Float c = atan(x, prec);  // arctan(0.5) = 0.4636...
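The table notes that `atan` uses a Taylor series plus argument halving. The halving identity $\arctan x = 2\arctan\!\bigl(x/(1+\sqrt{1+x^2})\bigr)$ shrinks the argument so the series converges quickly. A `double` sketch; the number of halvings and terms is an arbitrary choice for illustration:

```cpp
#include <cassert>
#include <cmath>

// arctan via argument halving + Taylor series (sketch for moderate |x|;
// y * y would overflow for huge arguments).
double atanHalving(double x, int halvings = 4, int terms = 30) {
    assert(std::isfinite(x) && halvings >= 0 && terms > 0);
    double y = x;
    // Each step: atan(y) = 2 * atan(y / (1 + sqrt(1 + y^2))).
    for (int i = 0; i < halvings; ++i) y = y / (1.0 + std::sqrt(1.0 + y * y));
    // Taylor series: atan(y) = y - y^3/3 + y^5/5 - ...
    double y2 = y * y;
    double power = y;  // (-1)^k * y^(2k+1)
    double sum = 0.0;
    for (int k = 0; k < terms; ++k) {
        sum += power / (2 * k + 1);
        power *= -y2;
    }
    return std::ldexp(sum, halvings);  // undo the halvings: multiply by 2^halvings
}
```

Each halving roughly halves the reduced argument, so every halving saves several series terms at the cost of one square root.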

Hyperbolic Functions

Signature	Description
`Float sinh(const Float& x, int precision)`	$\sinh x$
`Float cosh(const Float& x, int precision)`	$\cosh x$
`Float tanh(const Float& x, int precision)`	$\tanh x$
`Float sech(const Float& x, int precision)`	$\mathrm{sech}\,x = 1/\cosh x$
`Float csch(const Float& x, int precision)`	$\mathrm{csch}\,x = 1/\sinh x$
`Float coth(const Float& x, int precision)`	$\coth x = \cosh x/\sinh x$

Code Example

int prec = 50;
Float x("1.0", prec);
Float s = sinh(x, prec);  // sinh(1) = 1.1752...
Float c = cosh(x, prec);  // cosh(1) = 1.5430...
Float t = tanh(x, prec);  // tanh(1) = 0.7615...
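Both hyperbolic functions can share a single exponential, which is one plausible reason a combined `sinhCosh` (listed under Miscellaneous) is cheaper than two separate calls. A `double` sketch; whether calx computes it exactly this way is not stated:

```cpp
#include <cassert>
#include <cmath>

// sinh x = (E - 1/E)/2 and cosh x = (E + 1/E)/2 from one exponential E = e^x.
// For |x| near 0, E - 1/E cancels badly; an expm1-based formula is preferable there.
void sinhCoshFromExp(double x, double& s, double& c) {
    assert(std::isfinite(x));
    double E = std::exp(x);
    double invE = 1.0 / E;
    s = 0.5 * (E - invE);
    c = 0.5 * (E + invE);
}
```

One exponential plus one division replaces two independent exponential evaluations.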

Inverse Hyperbolic Functions

Signature	Description
`Float asinh(const Float& x, int precision)`	$\mathrm{arcsinh}\,x$
`Float acosh(const Float& x, int precision)`	$\mathrm{arccosh}\,x$
`Float atanh(const Float& x, int precision)`	$\mathrm{arctanh}\,x$

Powers / Roots

Signature	Description
`Float sqr(const Float& x, int precision)`	$x^2$ (squaring with optimized algorithm)
`Float pow(const Float& x, const Float& y, int precision)`	$x^y$
`Float pow(const Float& x, int n, int precision)`	$x^n$ (integer power via binary exponentiation)
`Float sqrt(const Float& x, int precision)`	$\sqrt{x}$
`Float cbrt(const Float& x, int precision)`	$\sqrt[3]{x}$
`Float nthRoot(const Float& x, int n, int precision)`	$\sqrt[n]{x}$
`Float recSqrt(const Float& x, int precision)`	$1/\sqrt{x}$ (reciprocal square root)
`Float hypot(const Float& x, const Float& y, int precision)`	$\sqrt{x^2 + y^2}$ (overflow-safe)

Code Example

int prec = 50;
Float x("2.0", prec);
Float s = sqrt(x, prec);                  // sqrt(2) = 1.41421356...
Float c = cbrt(x, prec);                  // cbrt(2) = 1.25992104...
Float p = pow(x, Float("0.5", prec), prec); // 2^0.5 = sqrt(2)
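The table describes `pow(x, n, precision)` as binary exponentiation. The same loop on `double`, as a sketch:

```cpp
#include <cassert>

// x^n by binary exponentiation: O(log |n|) multiplications instead of |n| - 1.
double powInt(double x, long long n) {
    // |n| computed in unsigned arithmetic so that LLONG_MIN does not overflow.
    unsigned long long e = n < 0 ? 0ULL - static_cast<unsigned long long>(n)
                                 : static_cast<unsigned long long>(n);
    double result = 1.0;
    double base = x;
    while (e != 0) {
        if (e & 1ULL) result *= base;  // this bit of the exponent is set
        base *= base;                  // square for the next bit
        e >>= 1;
    }
    return n < 0 ? 1.0 / result : result;
}
```

For $n = 10$ (binary 1010) the loop performs 4 squarings and 2 multiplications into the result instead of 9 successive multiplications.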

Miscellaneous

Signature	Description
`Float abs(const Float& x)`	$\lvert x \rvert$
`Float fma(const Float& a, const Float& b, const Float& c, int precision)`	$ab + c$ (fused multiply-add)
`Float fms(const Float& a, const Float& b, const Float& c, int precision)`	$ab - c$ (fused multiply-subtract)
`Float fmma(const Float& a, const Float& b, const Float& c, const Float& d, int precision)`	$ab + cd$ (double fused multiply-add)
`Float fmms(const Float& a, const Float& b, const Float& c, const Float& d, int precision)`	$ab - cd$ (double fused multiply-subtract)
`Float factorial(int n, int precision)`	$n!$
`void sinCos(const Float& x, Float& s, Float& c, int precision)`	Compute $\sin x$ and $\cos x$ simultaneously
`void sinhCosh(const Float& x, Float& s, Float& c, int precision)`	Compute $\sinh x$ and $\cosh x$ simultaneously
`Float agm(const Float& a, const Float& b, int precision)`	Arithmetic-geometric mean $\mathrm{AGM}(a, b)$
`Float sum(std::span<const Float> values, int precision)`	High-precision summation
`Float dot(std::span<const Float> a, std::span<const Float> b, int precision)`	High-precision dot product
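What "fused" buys can be seen with `std::fma` on `double`: $ab + c$ is rounded once instead of twice. The example values are chosen so that the separately rounded product is exactly 1, which hides the $-2^{-54}$ that the fused version keeps. calx's `fma`/`fms`/`fmma`/`fmms` presumably apply the same idea at arbitrary precision; the sketch only illustrates the concept.

```cpp
#include <cassert>
#include <cmath>

// a*b + c with a single rounding (fused).
double fusedMulAdd(double a, double b, double c) {
    return std::fma(a, b, c);
}

// a*b + c with two roundings: product first, then the sum.
double separateMulAdd(double a, double b, double c) {
    volatile double product = a * b;  // volatile forces the product to be rounded to double
    return product + c;
}
```

With $a = 1 + 2^{-27}$, $b = 1 - 2^{-27}$, and $c = -1$, the exact result is $-2^{-54}$; the fused version returns it, while the separate version returns 0.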

Error Functions / Gamma Functions

Signature	Description
`Float erf(const Float& x, int precision)`	Error function $\mathrm{erf}(x)$
`Float erfc(const Float& x, int precision)`	Complementary error function $\mathrm{erfc}(x) = 1 - \mathrm{erf}(x)$
`Float erfcx(const Float& x, int precision)`	Scaled complementary error function $e^{x^2}\,\mathrm{erfc}(x)$
`Float gamma(const Float& x, int precision)`	$\Gamma(x)$
`Float lnGamma(const Float& x, int precision)`	$\ln\Gamma(x)$
`Float beta(const Float& a, const Float& b, int precision)`	$B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$
`Float digamma(const Float& x, int precision)`	$\psi(x) = \Gamma'(x)/\Gamma(x)$
`Float trigamma(const Float& x, int precision)`	$\psi'(x)$
`Float polygamma(int n, const Float& x, int precision)`	$\psi^{(n)}(x)$
`Float gammaP(const Float& a, const Float& x, int precision)`	Regularized lower incomplete gamma $P(a,x)$
`Float gammaQ(const Float& a, const Float& x, int precision)`	Regularized upper incomplete gamma $Q(a,x)$
`Float gammaLower(const Float& a, const Float& x, int precision)`	Lower incomplete gamma $\gamma(a,x)$
`Float gammaUpper(const Float& a, const Float& x, int precision)`	Upper incomplete gamma $\Gamma(a,x)$

Code Example

int prec = 50;
Float x("1.0", prec);
Float e = erf(x, prec);                     // erf(1) = 0.84270079...
Float g = gamma(x, prec);                   // Gamma(1) = 1
Float lg = lnGamma(Float("10", prec), prec); // ln(9!) = 12.8018...
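In `double`, $\Gamma(x)$ overflows just above $x = 171$, so $B(a,b)$ is usually evaluated through log-gamma. A sketch using `std::lgamma` as a stand-in for `lnGamma`:

```cpp
#include <cassert>
#include <cmath>

// B(a,b) = Gamma(a) Gamma(b) / Gamma(a+b), evaluated in log space so that
// large arguments do not overflow the intermediate Gamma values.
double betaViaLgamma(double a, double b) {
    assert(a > 0 && b > 0);
    return std::exp(std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b));
}
```

$B(200, 200)$ is an ordinary small number, even though $\Gamma(200)$ alone is not representable as a `double`.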

Rounding / Integer Part

All are free functions in the calx namespace. Move overloads are also provided.

Signature	Description
`Float floor(const Float& x)`	$\lfloor x \rfloor$ (floor function)
`Float ceil(const Float& x)`	$\lceil x \rceil$ (ceiling function)
`Float round(const Float& x)`	Round to nearest (half-away-from-zero: ties round away from zero)
`Float roundEven(const Float& x)`	Round to nearest even (banker's rounding: ties round to even)
`Float trunc(const Float& x)`	Truncation toward zero
`Float frac(const Float& x)`	Fractional part $x - \lfloor x \rfloor$
`Float nearbyint(const Float& x)`	Equivalent to `round`
`Float rint(const Float& x)`	Equivalent to `round`
`Float modf(const Float& x, Float& iptr)`	Stores integer part in `iptr`, returns fractional part
`Float fmod(const Float& x, const Float& y)`	Floating-point remainder $x - \mathrm{trunc}(x/y) \cdot y$
`Float remainder(const Float& x, const Float& y)`	IEEE 754 remainder $x - \mathrm{roundEven}(x/y) \cdot y$
`std::pair<Float, int> remquo(const Float& x, const Float& y)`	`remainder` + signed low 3 bits of the quotient
`Float ldexp(const Float& x, int exp)`	$x \times 2^{\mathrm{exp}}$
`Float frexp(const Float& x, int* exp)`	Decompose into mantissa $[0.5, 1)$ and exponent
`Float scalbn(const Float& x, int n)`	Equivalent to `ldexp`
`int64_t ilogb(const Float& x)`	$\lfloor \log_2 \lvert x \rvert \rfloor$
`Float logb(const Float& x)`	$\lfloor \log_2 \lvert x \rvert \rfloor$ (returns as Float)
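The tie-breaking rules in this table are easy to mix up. The hypothetical helpers below spell out half-away-from-zero (`round`) and ties-to-even (`roundEven`) on `double`, and build `remainder` from the latter. Note that C's `std::nearbyint` and `std::rint` follow the current floating-point rounding mode, which defaults to ties-to-even, so they differ from the calx definitions above on ties.

```cpp
#include <cassert>
#include <cmath>

// Half away from zero: the tie rule the table gives for round().
double roundHalfAway(double x) {
    assert(std::isfinite(x));
    double t = std::trunc(x);
    if (std::fabs(x - t) >= 0.5) t += std::copysign(1.0, x);  // x - t is exact
    return t;
}

// Ties to even (banker's rounding): the rule the table gives for roundEven().
double roundHalfEven(double x) {
    assert(std::isfinite(x));
    double t = std::trunc(x);
    double diff = std::fabs(x - t);
    if (diff > 0.5 || (diff == 0.5 && std::fmod(t, 2.0) != 0.0)) t += std::copysign(1.0, x);
    return t;
}

// remainder(x, y) = x - roundEven(x / y) * y. Here x / y is rounded first, so this
// sketch is exact only when the quotient is representable; std::remainder is always exact.
double remainderSketch(double x, double y) {
    return x - roundHalfEven(x / y) * y;
}
```

Comparing the fractional part with 0.5 (instead of computing `floor(x + 0.5)`) also avoids the classic error where 0.49999999999999994 rounds up to 1.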

Utilities

Signature	Description
`Float fmin(const Float& a, const Float& b)`	Minimum ignoring NaN
`Float fmax(const Float& a, const Float& b)`	Maximum ignoring NaN
`Float fdim(const Float& a, const Float& b)`	$\max(a - b, 0)$
`Float copySign(const Float& x, const Float& y)`	Copy the sign of $y$ to $x$
`bool signBit(const Float& x)`	`true` if negative
`Float nextAbove(const Float& x)`	Smallest representable value greater than $x$
`Float nextBelow(const Float& x)`	Largest representable value less than $x$
`Float lerp(const Float& a, const Float& b, const Float& t, int precision)`	Linear interpolation $a + t(b - a)$
`Float midpoint(const Float& a, const Float& b)`	Midpoint $(a + b) / 2$
`void swap(Float& a, Float& b) noexcept`	Swap

String Conversion

Signature	Description
`std::string toString(int precision = -1) const`	Decimal scientific notation (`3.14e+0`)
`std::string toString(int base, int fracDigits) const`	Base-N representation (base 2-36). For $x = \pi$: `toString(2, 8)` → `"11.00100100"`, `toString(16, 4)` → `"3.243f"`
`std::string toDecimalString(int precision = -1) const`	Decimal fixed-point notation
`std::string toScientificString(int precision = -1) const`	Scientific notation ($1.23 \times 10^4$ form)
`double toDouble() const`	Convert to double (possible precision loss)
`Int toInt() const`	Convert to Int (fractional part truncated)
`ostream& operator<<(ostream&, const Float&)`	Stream output
`istream& operator>>(istream&, Float&)`	Stream input

Internal Access

Signature	Description
`const Int& mantissa() const`	Reference to the mantissa (arbitrary-precision integer)
`int64_t exponent() const`	Exponent (binary exponent)
`bool isNegative() const`	Sign flag
`int bitLength() const`	Bit length of the mantissa

Usage Examples

Computing Pi

#include <iostream>
#include <math/core/mp/Float.hpp>
using namespace calx;

// Compute pi to 1000 digits (Chudnovsky algorithm)
Float::setDefaultPrecision(1000);
Float pi = Float::pi(1000);
std::cout << pi.toDecimalString(1000) << std::endl;
// 3.14159265358979323846264338327950288419716939937510...

Multi-Precision exp / log

#include <iostream>
#include <math/core/mp/Float.hpp>
using namespace calx;

int prec = 30;
Float::setDefaultPrecision(prec);
Float x("1.5");

Float ex = exp(x, prec);       // e^1.5
Float lx = log(ex, prec);      // ln(e^1.5) = 1.5
std::cout << "exp(1.5) = " << ex.toDecimalString(prec) << std::endl;
std::cout << "log(exp(1.5)) = " << lx.toDecimalString(prec) << std::endl;
// Agrees with 1.5 to about 30 significant digits (the last digit may be affected by rounding)

Safe NaN / Infinity Propagation

#include <iostream>
#include <math/core/mp/Float.hpp>
using namespace calx;

Float inf = Float::positiveInfinity();
Float nan = Float::nan();
Float z   = Float::zero();

Float r1 = inf + Float(1);    // +Inf
Float r2 = inf - inf;         // NaN
Float r3 = nan + Float(42);   // NaN
Float r4 = Float(1) / z;      // +Inf

std::cout << std::boolalpha
          << r1.isInfinity() << ' '    // true
          << r2.isNaN() << ' '         // true
          << r3.isNaN() << ' '         // true
          << r4.isInfinity() << '\n';  // true

Precision Control and Constant Cache

// thread_local cache makes repeated calls at the same precision fast
Float pi1 = Float::pi(500);   // First call: computed
Float pi2 = Float::pi(500);   // Second call: returned from cache instantly
Float pi3 = Float::pi(1000);  // Different precision: recomputed

// Changing the rounding mode
Float::setRoundingMode(RoundingMode::TowardZero);
Float x("1.999");
x.setPrecision(3);  // Rounded with TowardZero

Related Mathematical Background

The following articles explain the mathematical concepts underlying the Float class.

IEEE 754 Floating-Point Standard — Standard floating-point representation
Machine Epsilon — Fundamentals of floating-point precision
Arbitrary-Precision Floating-Point — Internal representation of multi-precision floats
Arbitrary-Precision Float Arithmetic — Rounding, normalization, algorithms

Differences from IEEE 754

calx Float is an arbitrary-precision floating-point type and differs in design goals from IEEE 754 fixed-width formats (float/double). The following table summarizes which IEEE 754 features Float supports and which are intentionally omitted.

IEEE 754 Feature	calx Float	Notes
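5 rounding modes	Implemented	Selectable per thread via `setRoundingMode()`. Uses guard bit + sticky bit for correct rounding decisions. The set is close to, but not identical with, IEEE 754's: `AwayFromZero` is a directed mode (like MPFR's `MPFR_RNDA`), whereas IEEE 754's fifth rounding attribute is roundTiesToAway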
NaN / ±Infinity	Fully supported	Generation, propagation, and comparison semantics implemented
Signed zero (±0)	Representable	However, `x - x` always returns `+0`. IEEE 754 mandates `-0` under `roundTowardNegative`, but Float does not make this distinction
Subnormals (denormals)	Not applicable	In IEEE 754 formats, subnormals provide gradual underflow at the bottom of the fixed exponent range by giving up the implicit leading 1 (hidden bit). Float stores the full significand explicitly as an `Int`, so there is no hidden bit, and its exponent range is ±2⁶⁰, so underflow does not occur in normal use; the concept of subnormals therefore does not apply. `subnormalize()` is provided as an MPFR-compatible emulation feature
Exception flags	Infrastructure only	Flags such as `FE_INEXACT` are defined but not raised by arithmetic operators. Only `subnormalize()` raises them

Rounding Strategy: Per-Operation Design

IEEE 754 (and MPFR) round every operation to the target precision. calx Float uses a different rounding strategy depending on the type of operation.

Operation	Rounding	Design Rationale
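Multiplication / Division	Rounded to `requested_bits_`	Multiplying two significands of the same bit length nearly doubles the result length, so without rounding the significand, and with it the cost of every later operation, would keep growing (doubling at each step of a repeated squaring)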
Addition / Subtraction	No rounding	Addition only extends the significand by the exponent difference, and the extra low-order bits serve as guard bits for subsequent operations. Rounding after every addition would discard these guard bits, causing rounding errors to accumulate in long summation chains. When the exponent difference exceeds `max(requested_bits, 1000) + 64`, the smaller operand is discarded, preventing unbounded significand growth
`setPrecision()`	Rounded to specified precision	Explicit user control. Use after additions when a specific precision is needed

With this design, in an expression like `(a + b) * c`, the extra bits from the addition act as guard bits for the multiplication, which can yield higher accuracy at the same `requested_bits_` than rounding every operation as MPFR does. To get comparable accuracy from MPFR, one must allocate additional working precision.
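A small numeric example of the effect, using `double` as a stand-in for exact arithmetic and a deliberately tiny precision of $p = 4$ bits so the rounding is visible. `roundToBits`, `roundEveryOp`, and `roundAfterMultiply` are hypothetical helpers, and this is one favorable case rather than a general proof:

```cpp
#include <cassert>
#include <cmath>

// Round x to p significant bits (ties to even under the default FP environment).
double roundToBits(double x, int p) {
    assert(p > 0 && p < 53);
    if (x == 0.0 || !std::isfinite(x)) return x;
    int e;
    double m = std::frexp(x, &e);  // x = m * 2^e with 0.5 <= |m| < 1
    return std::ldexp(std::nearbyint(std::ldexp(m, p)), e - p);
}

// (a + b) * c, rounding after every operation (MPFR-style).
double roundEveryOp(double a, double b, double c, int p) {
    double sum = roundToBits(a + b, p);
    return roundToBits(sum * c, p);
}

// (a + b) * c, keeping the sum unrounded so its low bits act as guard bits.
double roundAfterMultiply(double a, double b, double c, int p) {
    return roundToBits((a + b) * c, p);
}
```

With $a = 1$, $b = 0.0625$, $c = 1.5$, the sum $1.0625$ lies exactly halfway between the 4-bit neighbors $1.0$ and $1.125$, so rounding it first discards information the product needs: the exact result is $1.59375$, rounding every operation gives $1.5$ (error $0.09375$), and rounding only after the multiplication gives $1.625$ (error $0.03125$).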

Precision Management in Other Libraries

Library	Precision Management
MPFR	Fixed precision per result variable; every operation performs correct rounding to that precision
GMP (mpf)	Precision set per result variable; rounding mode guarantees are not provided
Arb (FLINT)	Ball arithmetic (midpoint ± error radius); every operation rigorously tracks the error radius
mpmath (Python)	Global precision (`mp.prec`); every operation rounds to that precision
calx Float	Two-value tracking: requested precision (`requested_bits_`) and effective precision (`effective_bits_`). Multiplication and division round to the requested precision; addition and subtraction preserve guard bits to improve accuracy in subsequent operations