Matrix Determinant and Inverse

Overview

For an $N \times N$ square matrix $A$, the determinant $\det A$ and the inverse $A^{-1}$ are the textbook objects whose mathematical definition and computational procedure diverge the most. The cofactor / adjugate formulas yield $O(N!)$ complexity if implemented literally. The practical solution routes through LU decomposition, achieving $O(N^3)$.

This article describes how sangi det and inverse select an algorithm based on size, element type, and intended usage. Decomposition internals appear in the Matrix API Reference and in dedicated articles on LU / QR / Cholesky / SVD.

Cofactor Expansion (Small Matrices)

Cofactor expansion is the textbook definition of the determinant:

$$\det A = \sum_{j=0}^{N-1} (-1)^{i+j} a_{ij} \det M_{ij}$$

($M_{ij}$ is the $(N-1) \times (N-1)$ submatrix obtained by deleting row $i$ and column $j$.)

A naive recursion has complexity $O(N!)$ and explodes. sangi restricts the closed-form / unrolled path to $N \leq 4$:

$N$ComplexityNotes
1$O(1)$$a_{00}$ directly
2$O(1)$$a_{00} a_{11} - a_{01} a_{10}$
3$O(1)$Sarrus rule (six terms)
4$O(1)$$2 \times 2$ block cofactor expansion
$\geq 5$Dispatch to LU$O(N^3)$

Because $N \in \{2, 3, 4\}$ appears constantly in SE(3) transforms ($4 \times 4$ homogeneous) and $3 \times 3$ rotations, these cases are fully unrolled with dedicated code. Compiler constant folding keeps the entire computation in registers.

Related articles: Cofactor Expansion / Cramer's Rule

Determinant via LU Decomposition

For $N \geq 5$, sangi routes through LU decomposition with partial pivoting. Given $PA = LU$:

$$\det A = \det(P^{-1}) \cdot \det L \cdot \det U = (-1)^s \cdot \prod_{i=0}^{N-1} u_{ii}$$

($s$ is the number of pivot row swaps; $\det L = 1$ because $L$ is unit lower triangular.)

The LU decomposition itself is $O(N^3)$; the product of the diagonal is $O(N)$. This is a dramatic improvement over the cofactor expansion's $O(N!)$.

Sign Tracking

Partial pivoting may swap rows at each step, and each swap flips the sign of $\det$. sangi keeps an integer counter for swaps and multiplies $(-1)^s$ into the diagonal product at the end.

Overflow Resistance

The product $\prod u_{ii}$ overflows or underflows easily for large $N$. The log-domain formulation is:

$$\log |\det A| = \sum_i \log |u_{ii}|, \quad \mathrm{sign}(\det A) = (-1)^s \cdot \prod_i \mathrm{sign}(u_{ii})$$

In statistics (likelihood functions) and machine learning (Gaussian log-likelihood) the log-domain accumulation is the standard form. A dedicated logdet public API is planned but not yet shipped.

Bareiss Algorithm (Integer Domain)

When the element type is Int (sangi arbitrary-precision integer) or Rational, the floating-point LU is inappropriate. Division by pivots introduces fractions, and intermediate entries can grow exponentially (intermediate expression swell).

The Bareiss algorithm (1968) resolves this by folding an exact division by the previous pivot into each update so that entries remain integers:

$$a^{(k+1)}_{ij} = \frac{a^{(k)}_{kk} a^{(k)}_{ij} - a^{(k)}_{ik} a^{(k)}_{kj}}{a^{(k-1)}_{k-1,k-1}}$$

(With the convention $a^{(-1)}_{-1,-1} := 1$.)

The numerator is clearly an integer, and Bareiss's theorem guarantees the denominator divides it exactly. Hadamard's inequality bounds the intermediate magnitude by $\leq N^{N/2} \cdot M^N$ ($M = \max |a_{ij}|$).

The final entry $a^{(N-1)}_{N-1,N-1}$ is $\det A$ directly. No floating-point arithmetic appears and no rounding error is introduced.

Complexity is $O(N^3)$ multiprecision divisions; with $L$-bit entries the total cost is $O(N^3 M(L))$ ($M(L)$ is the cost of $L$-bit multiplication). The constant factor exceeds floating-point LU, but exactness in integer / rational domains is essential for computer algebra.

sangi det inspects the element type and selects LU for floating-point or Bareiss for integer / rational automatically.

Closed-Form Inverse

For $N = 2, 3$, the adjugate-based closed form is fast and numerically adequate.

2×2

$$A^{-1} = \frac{1}{\det A} \begin{pmatrix} a_{11} & -a_{01} \\ -a_{10} & a_{00} \end{pmatrix}$$

3×3

Each entry is a $2 \times 2$ cofactor:

$$(A^{-1})_{ij} = \frac{(-1)^{i+j}}{\det A} \det M_{ji}$$

(Note the index transposition $M_{ji}$.)

sangi uses fully unrolled closed forms for $N = 2, 3$. A closed form exists for $4 \times 4$ as well, but the term count is high (24 or more) and the speed barely surpasses LU, so sangi switches to the LU path at $N \geq 4$.

Inverse via Gauss-Jordan

Gauss-Jordan elimination transforms the augmented matrix $[A \mid I]$ into $[I \mid A^{-1}]$ via row operations:

  1. Reduce the left half to row-echelon form using row operations
  2. Apply the same operations to the right half ($I$ initially)
  3. When the left half is $I$, the right half is $A^{-1}$

Complexity is $O(N^3)$, roughly twice that of LU (it performs both forward elimination and back substitution). The current sangi inverse dispatches to the LU path described below by default; a dedicated inverse_gauss_jordan entry point is planned but not yet shipped.

Inverse via LU

$A^{-1}$ is the solution of $AX = I$. Splitting $X$ column-wise into $\mathbf{x}_j$ reduces the problem to $A \mathbf{x}_j = \mathbf{e}_j$ for each column.

Once $PA = LU$ is computed, each column requires only two triangular solves:

$$L \mathbf{y}_j = P \mathbf{e}_j, \quad U \mathbf{x}_j = \mathbf{y}_j$$

Complexity Breakdown

  • LU decomposition: roughly $O(N^3 / 3)$
  • $N$ triangular solves: each $O(N^2)$, totaling $O(N^3)$
  • Overall: $O(N^3)$

Exploiting the unit-matrix structure of the right-hand side, forward substitution can skip leading rows of zero entries, improving the constant factor further. sangi inverse applies this optimization.

Singularity and Condition Number

If $A$ is singular, $A^{-1}$ does not exist. During LU decomposition, a pivot that hits zero signals $\det A = 0$ and singularity.

In floating-point arithmetic the probability of an exactly zero pivot is zero, but matrices that are "nearly singular" (very small pivots) appear regularly. The condition number quantifies this:

$$\kappa(A) = \|A\| \cdot \|A^{-1}\|$$

When $\kappa(A) \gg 1$, small input perturbations produce large output deviations and the computed $A^{-1}$ is unreliable. Under IEEE 754 double precision ($\epsilon \approx 10^{-16}$), $\kappa(A) > 10^{14}$ implies that the result carries no numerically meaningful precision.

Computing $\kappa(A)$ exactly requires the SVD, $O(N^3)$. A cheap LU-based estimator (analogous to LAPACK's dgecon) is available. A separate article in this guide covers condition numbers in detail.

Why solve Is Preferred over inverse

Textbooks write the solution of $A \mathbf{x} = \mathbf{b}$ as $\mathbf{x} = A^{-1} \mathbf{b}$. In numerical computation, materializing $A^{-1}$ is almost always the wrong move. Reasons:

Cost

  • Form $A^{-1}$, then multiply: $O(N^3) + O(N^2) \approx O(N^3)$
  • LU then solve directly: $O(N^3 / 3) + O(N^2) \approx O(N^3 / 3)$

Forming the inverse pays roughly a 3x cost, and the resulting $A^{-1}$ is usually discarded immediately.

Numerical Precision

Entries of computed $A^{-1}$ carry greater error than the values produced by triangular solves of LU. Multiplying $A^{-1} \mathbf{b}$ accumulates additional floating-point error. LU plus triangular solve has a classical error analysis with guaranteed backward stability.

Legitimate Exceptions

When the same $A$ is paired with thousands of right-hand sides, computing $A^{-1}$ once and applying it to each $\mathbf{b}_i$ might be cheaper than repeated solves. Even then, repeatedly calling LU.solve(b) generally wins on both speed and precision, and the only legitimate reason to materialize $A^{-1}$ is when an external library interface requires it explicitly.

sangi provides inverse, but the API documentation consistently recommends solve(A, b) or A.lu().solve(b).

Related article: Gaussian Elimination

Design Decisions

Closed Form for Small Sizes, LU for the Rest

For $N \leq 3$ the closed forms (Sarrus, adjugate) win on speed. For $N = 4$ the closed form and LU are close, but sangi prefers the closed form through $N = 4$ for implementation simplicity. Being able to handle SE(3) ($4 \times 4$ homogeneous transforms) entirely in closed form is a major design benefit.

Bareiss for Integer Domains

In computer algebra, floating-point LU loses correctness guarantees. The Bareiss algorithm is the classical solution that controls intermediate expression swell, and sangi selects it automatically based on the element type. For Rational entries Bareiss minimizes the cost of fraction reduction as well.

API Design Recommending solve

inverse is provided for compatibility with textbook notation, but the documentation repeatedly recommends solve. Many users translate mathematical notation directly into code, so the message "if you want $A^{-1} b$, use solve(A, b)" must be reinforced in both documentation and sample code.

References

  • Bareiss, E. H. (1968). "Sylvester's Identity and Multistep Integer-Preserving Gaussian Elimination". Mathematics of Computation, 22 (103), 565–578.
  • Golub, G. H. and Van Loan, C. F. Matrix Computations. 4th ed. Johns Hopkins University Press, 2013.
  • Higham, N. J. Accuracy and Stability of Numerical Algorithms. 2nd ed. SIAM, 2002. (Discussion of inverse vs. solve)
  • Trefethen, L. N. and Bau, D. Numerical Linear Algebra. SIAM, 1997. (Foundations of condition numbers and numerical linear algebra)