Which pivoting strategies does sangi LU offer?

The PivotStrategy enumeration provides None, Partial, and Full. The default is Partial pivoting, which picks the column entry of maximum absolute value below the diagonal. Full pivoting searches the entire remaining submatrix and is more robust on ill-conditioned matrices. For Rational matrices, pivots are chosen by minimum height (max of numerator and denominator absolute values) rather than maximum magnitude to suppress denominator blow-up.

What is the Bareiss algorithm?

A division-free algorithm for the determinant of an integer matrix. Sylvester's identity guarantees that each apparent division during Gaussian elimination is exact, so intermediate values remain integers and the determinant is obtained in O(n^3) without introducing rationals. sangi's Matrix::determinant() dispatches to bareiss_determinant automatically when T is an integer type.

How does Cholesky differ from LDL?

Cholesky factorization A=LL^T requires symmetric positive definiteness and involves a square root. LDL factorization A=LDL^T fixes the diagonal of L to 1 and factors out a diagonal D, eliminating the square root and accommodating symmetric indefinite matrices. Both cost O(n^3/3), roughly half of LU. Use LDL for indefinite symmetric systems and Cholesky when positive definiteness is structurally guaranteed.

What is Wilkinson's growth factor?

The ratio of the largest intermediate matrix entry during Gaussian elimination to the largest input entry. With partial pivoting the worst-case bound is 2^(n-1), but in practice it stays within an order of magnitude. With full pivoting the bound is roughly O(n^(1/2) (n!)^(1/n)), and for symmetric positive definite matrices under Cholesky it equals 1. Small growth implies small backward error.

Where is incomplete Cholesky used?

As a preconditioner for Krylov iterative solvers such as conjugate gradient. Rather than the exact A=LL^T, a sparse approximation L_tilde is computed that preserves the sparsity pattern of A, and L_tilde L_tilde^T is used to approximate A^{-1}. This costs the same as a matrix-vector multiply per iteration and substantially improves CG convergence on ill-conditioned sparse systems.

LU, Cholesky, LDL Factorizations and Pivoting

Overview

The fundamental route to solving $A\mathbf{x} = \mathbf{b}$ is Gaussian elimination together with its factored form, LU decomposition. For symmetric matrices the cost is halved by Cholesky or LDL factorization, and choosing the right factorization for the structure of $A$ (positive definite, indefinite, sparse, integer-valued) simultaneously optimizes flops, memory, and numerical stability.

This page covers the mathematical structure of LU, Cholesky, and LDL factorizations; the meaning of pivoting strategies; the Bareiss algorithm for exact integer determinants; blocked LU as a BLAS-3 kernel; Bunch-Kaufman block-diagonal pivoting; and incomplete Cholesky preconditioners, then explains how sangi dispatches to each implementation.

For API usage, see LinAlg API — LU, Cholesky, LDL, Gaussian elimination, and pivot strategies.

Gaussian Elimination and LU

Gaussian elimination operates on the augmented matrix $[A \mid \mathbf{b}]$: forward elimination reduces it to upper triangular form, and back substitution recovers the solution. Writing the forward elimination as a product of matrices yields LU decomposition.

At stage $k$ the Frobenius transformation

$$M_k = I - \mathbf{m}_k \mathbf{e}_k^\top, \quad m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)} \;\; (i > k)$$

zeros the subdiagonal entries of column $k$. Composing $n-1$ such steps gives

$$M_{n-1} \cdots M_1 A = U \quad\Longleftrightarrow\quad A = (M_1^{-1} \cdots M_{n-1}^{-1}) U = LU,$$

where each $M_k^{-1} = I + \mathbf{m}_k \mathbf{e}_k^\top$ contributes one column of the unit lower-triangular factor $L$. Adding pivoting yields $PA = LU$ with $P$ a permutation.

Gaussian elimination and LU factorization are two notations for the same computation; both cost $O(n^3)$ flops. The advantage of returning $L$ and $U$ explicitly is reuse: when many right-hand sides share the same coefficient matrix, $A\mathbf{x}_i = \mathbf{b}_i$ can be solved in $O(n^2)$ per right-hand side via triangular substitution. sangi exposes both the factored form via lu_decomposition and the direct solve via gaussian_elimination.

Pivoting Strategies

When $a_{kk}^{(k)}$ vanishes or is small, the multipliers $m_{ik}$ explode and cancellation destroys the result. sangi offers three strategies via the PivotStrategy enumeration.

None

Uses the diagonal entry directly without swaps. Safe only for diagonally dominant or structurally non-singular matrices. Minimum overhead, but catastrophic on ill-conditioned inputs.

Partial Pivoting

At stage $k$, exchange row $k$ with the row containing the column-$k$ entry of largest magnitude on or below the diagonal:

$$p = \arg\max_{i \geq k} \, |a_{ik}^{(k)}|.$$

Per-stage cost is $O(n)$, total $O(n^2)$. In practice it suffices for almost all numerically reasonable matrices, and it is the default in sangi.

Full Pivoting

Search the entire remaining submatrix $A^{(k)}_{[k:n,k:n]}$ for the largest entry and swap both its row and column to position $(k,k)$:

$$(p, q) = \arg\max_{i,j \geq k} \, |a_{ij}^{(k)}|.$$

Column swaps require an additional permutation $Q$, with the solution recovered as $\mathbf{x} = Q (LU)^{-1} P \mathbf{b}$. Per-stage cost is $O((n-k)^2)$ and total $O(n^3)$ — roughly double partial pivoting. Useful as a safety net for nearly rank-deficient or pathologically ill-conditioned matrices.

Rook and Threshold Pivoting

Rook pivoting (named after the chess piece) picks an entry that is simultaneously the largest in its row and largest in its column. Its average cost approaches $O(n^2)$ while providing stability close to full pivoting. Threshold pivoting accepts the current candidate if its magnitude is at least $\tau$ times the maximum candidate ($0 < \tau \le 1$); this is the standard tool for controlling fill-in in sparse direct solvers. sangi uses Partial / Full for dense LU and threshold pivoting internally for sparse LU.

Rational-Type Specialization

When $T = $ Rational, the pivot is chosen by minimum height, where $h(p/q) = \max(|p|, |q|)$. Selecting low-height pivots prevents exponential growth of numerators and denominators during subsequent eliminations. Since rational arithmetic is exact, the goal here is computational cost rather than stability.

Wilkinson's Growth Factor and Stability

The ratio of the maximum intermediate entry to the maximum input entry,

$$\rho_n(A) = \frac{\max_{i,j,k} \, |a_{ij}^{(k)}|}{\max_{i,j} \, |a_{ij}|},$$

is called the growth factor. Backward error analysis (Wilkinson 1961) shows that the computed solution $\hat{\mathbf{x}}$ is the exact solution of a perturbed system $(A + \Delta A) \hat{\mathbf{x}} = \mathbf{b}$ with $\|\Delta A\|_\infty$ bounded by $\rho_n(A) \cdot u \cdot \|A\|_\infty$ ($u$ being unit roundoff). Smaller growth therefore means smaller backward error.

Partial pivoting: worst case $\rho_n \le 2^{n-1}$, but typically $\rho_n \le 10$. Exponential growth occurs only on artificially constructed Wilkinson matrices.
Full pivoting: $\rho_n \le n^{1/2}(2 \cdot 3^{1/2} \cdots n^{1/(n-1)})^{1/2}$, polynomial in $n$.
Symmetric positive definite (Cholesky): $\rho_n = 1$. No growth; pivoting is unnecessary.

Ill-conditioning is measured by $\kappa(A) = \|A\| \cdot \|A^{-1}\|$. When $\kappa(A)$ approaches $1/u$ even pivoted LU loses all significant digits, and SVD-based regularization or iterative refinement becomes necessary; see svd_solve with its rcond damping parameter.

Bareiss Algorithm (Integer Determinants)

Standard floating-point LU contaminates an integer determinant with roundoff, while rational LU keeps the result exact but causes exponential blow-up of numerator and denominator sizes. The Bareiss algorithm computes the determinant of an integer matrix with only integer arithmetic and no genuine division, based on Sylvester's determinant identity.

At stage $k$ the elimination update is

$$M_{ij}^{(k)} = \frac{M_{ij}^{(k-1)} M_{kk}^{(k-1)} - M_{ik}^{(k-1)} M_{kj}^{(k-1)}}{M_{k-1,k-1}^{(k-2)}}.$$

The numerator is a $2 \times 2$ minor; Sylvester's identity guarantees that it is exactly divisible by $M_{k-1,k-1}^{(k-2)}$, so this "division" is an integer division. The final pivot $M_{n-1,n-1}^{(n-2)}$ equals $\det(A)$.

Complexity is $O(n^3)$, but intermediate integers can grow large, so multi-precision integers (sangi's Int) are required. By Hadamard's inequality $|\det A| \le \prod_i \|\mathbf{a}_i\|_2$, intermediate bit lengths are bounded by $O(n \log(nM))$ where $M$ is the largest input magnitude.

sangi's Matrix::determinant() dispatches to bareiss_determinant automatically when numeric_traits<T>::is_integer is true. Floating-point types follow the LU path (lu_determinant, product of diagonals times the sign of the permutation).

Blocked LU and BLAS-3 Optimization

Naive LU is dominated by Level-2 BLAS (axpy-like) operations whose flop-to-byte ratio is bounded by memory bandwidth. Blocked LU partitions $A$ into $b \times b$ tiles and recasts each update step as a matrix multiply (Level-3 BLAS gemm), maximizing cache reuse.

Partition

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

Factor the $b \times b$ block $A_{11} = L_{11} U_{11}$ with the naive algorithm, then compute

$$L_{21} = A_{21} U_{11}^{-1}, \quad U_{12} = L_{11}^{-1} A_{12}, \quad A_{22}^{(1)} = A_{22} - L_{21} U_{12}.$$

Here $L_{21}$ and $U_{12}$ are obtained via triangular solves (trsm, Level-3), and the Schur complement update is a single gemm. Each step has flop-to-memory ratio $O(1/b)$, far better than the unblocked version. Recursion on the $(n-b) \times (n-b)$ Schur complement gives the full factorization.

Block size $b$ is chosen from L1/L2 cache sizes and arithmetic intensity; typical values are $b = 32 \sim 128$. sangi invokes dgetrf through MKL/OpenBLAS when available, and uses an internal panel-update implementation otherwise.

Cholesky Factorization

If $A$ is symmetric and positive definite ($\mathbf{x}^\top A \mathbf{x} > 0$ for all $\mathbf{x} \neq 0$), there exists a unique lower-triangular $L$ with $A = LL^\top$.

The recursive formulas are

$$L_{kk} = \sqrt{A_{kk} - \sum_{j=1}^{k-1} L_{kj}^2}, \quad L_{ik} = \frac{1}{L_{kk}} \left(A_{ik} - \sum_{j=1}^{k-1} L_{ij} L_{kj}\right) \quad (i > k).$$

Symmetry halves the flop count to $n^3/3$, positive definiteness guarantees $L_{kk}^2 > 0$ so no pivoting is needed, and the growth factor $\rho_n = 1$ makes the factorization exceptionally stable.

Cholesky underlies least-squares normal equations $A^\top A \mathbf{x} = A^\top \mathbf{b}$, Kalman filter covariance updates, Gaussian-process log-likelihoods, and the diagonal blocks of KKT systems in quadratic programming — anywhere positive definiteness is structurally guaranteed.

A blocked variant uses Cholesky on the diagonal block, trsm off-diagonal, and syrk (symmetric rank-$k$ update) for the Schur complement.

LDL Factorization and Bunch-Kaufman Pivoting

When symmetry holds but positive definiteness does not, the Cholesky square root fails. The remedy is $A = LDL^\top$ with unit lower-triangular $L$ and diagonal $D$:

$$D_{kk} = A_{kk} - \sum_{j=1}^{k-1} D_{jj} L_{kj}^2, \quad L_{ik} = \frac{1}{D_{kk}} \left(A_{ik} - \sum_{j=1}^{k-1} D_{jj} L_{ij} L_{kj}\right).$$

Eliminating the square root sidesteps the cost of $\sqrt{\,}$ and avoids failure on negative diagonal entries. Flop count is the same as Cholesky, $n^3/3$.

Bunch-Kaufman Pivoting

Plain LDL still fails when a pivot is small or zero. Bunch-Kaufman pivoting introduces a block-diagonal $D$ with $1 \times 1$ and $2 \times 2$ blocks:

$$D = \mathrm{blockdiag}(D_1, D_2, \ldots), \quad D_k \in \mathbb{R}^{1\times 1} \cup \mathbb{R}^{2\times 2}.$$

At each step the algorithm tests whether a symmetric row/column swap yields a safe $1 \times 1$ pivot, and otherwise groups two adjacent diagonal entries into a $2 \times 2$ block. The strategy preserves symmetry while handling indefinite or near-singular symmetric matrices.

Applications include saddle-point KKT systems in constrained optimization, indefinite Hessians in trust-region methods, and the block-diagonal saddle systems arising in symmetric interior-point methods.

Incomplete Cholesky as a Preconditioner

Iterative solvers (conjugate gradient, MINRES, etc.) on large sparse systems converge faster when accelerated by a preconditioner $M \approx A$. Incomplete Cholesky (IC) preserves the symmetric structure while limiting fill-in.

The level-zero variant $\mathrm{IC}(0)$ uses the sparsity pattern $\mathcal{S} = \{(i,j) : A_{ij} \neq 0\}$ and discards any fill-in produced by elimination:

$$\tilde{L}_{ik} = \begin{cases} (A_{ik} - \sum_{j<k} \tilde{L}_{ij} \tilde{L}_{kj}) / \tilde{L}_{kk} & (i,k) \in \mathcal{S} \\ 0 & \text{otherwise.} \end{cases}$$

The approximation $\tilde{M} = \tilde{L} \tilde{L}^\top$ is not $A$, but using $\tilde{M}^{-1} \mathbf{r}$ inside preconditioned CG (PCG) dramatically improves $\kappa(\tilde{M}^{-1} A)$. Variants $\mathrm{IC}(p)$ allow $p$ levels of fill-in, and modified IC adjusts the diagonal to match row sums.

Even for symmetric positive definite $A$, incomplete Cholesky can produce negative pivots; a small diagonal shift $\alpha I$ (Manteuffel shift) is a standard remedy.

Incomplete LU (ILU) is the asymmetric counterpart; sangi's sparse iterative solvers and their preconditioner choices are covered in the iterative-and-sparse page.

sangi's Automatic Dispatch

sangi LinAlg factorization functions select their internal implementation based on the type and structure of the input. Users typically only need to call the high-level lu_solve, cholesky_solve, ldl_solve wrappers.

Input	Factorization used	Notes
Dense general square	LU + partial pivoting	Default; `PivotStrategy::Full` switches to full pivoting
Dense symmetric positive definite	Cholesky	Explicit `cholesky_solve`; MKL path uses `dpotrf`/`dpotrs`
Dense symmetric indefinite	LDL + Bunch-Kaufman	Explicit `ldl_solve`
Integer `determinant()`	Bareiss	Division-free, all intermediates remain integer
`Matrix<Rational>`	LU + minimum-height pivoting	Mitigates denominator blow-up
Sparse (CSR/CSC)	Simplicial Cholesky / sparse LU	See the sparse page

By size, sangi uses naive LU for $n < 64$, blocked LU for $n \ge 64$ (MKL/OpenBLAS when available, internal fallback otherwise), and parallel panel factorization at $n \ge 1024$. Thresholds are empirically tuned and require no user action.

References

Golub, G. H. and Van Loan, C. F. (2013). Matrix Computations, 4th ed., Johns Hopkins University Press. Chapters 3–4.
Wilkinson, J. H. (1961). "Error analysis of direct methods of matrix inversion". J. ACM, 8(3), 281–330.
Bareiss, E. H. (1968). "Sylvester's identity and multistep integer-preserving Gaussian elimination". Math. Comp., 22(103), 565–578.
Bunch, J. R. and Kaufman, L. (1977). "Some stable methods for calculating inertia and solving symmetric linear systems". Math. Comp., 31(137), 163–179.
Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM.