Iterative and Sparse Solvers — Jacobi, CG, GMRES, Preconditioners, Sparse Direct
Overview
Iterative methods construct a sequence of approximations $\mathbf{x}_k$ to the solution of $A \mathbf{x} = \mathbf{b}$ without explicitly factoring $A$. For large sparse $A$ at $n = 10^6$ to $10^9$, direct factorizations exceed available memory and iterative methods become the only practical option.
The major classes:
- Classical iterations: Jacobi, Gauss-Seidel, SOR — local row updates; slow alone, still useful as smoothers and preconditioners.
- Krylov subspace methods: CG, MINRES, GMRES, BiCGStab, IDR(s), QMR — use only matrix-vector products and find optimal iterates in $\mathcal{K}_k(A, \mathbf{r}_0)$.
- Multigrid: hierarchical coarsening and smoothing; asymptotically optimal $O(n)$ for discretized PDEs.
This page covers each iterative class, preconditioners, sparse storage, sparse direct methods, and stopping/restart strategies. See iterative API, sparse direct, and sparse iterative in the API reference.
Classical Iterations: Jacobi, Gauss-Seidel, SOR
Write $A = D + L + U$ (diagonal, strictly lower, strictly upper) and update $\mathbf{x}_{k+1}$ from $\mathbf{x}_k$.
Jacobi
$$\mathbf{x}_{k+1} = D^{-1} (\mathbf{b} - (L + U) \mathbf{x}_k).$$
Each component depends only on others, making the update embarrassingly parallel. Convergence requires $\rho(D^{-1}(L+U)) < 1$, satisfied by diagonally dominant matrices. Rate is set by $\rho$; for the discrete Poisson equation $\rho = 1 - O(h^2)$, painfully slow.
Gauss-Seidel
$$(D + L) \mathbf{x}_{k+1} = \mathbf{b} - U \mathbf{x}_k.$$
Uses updated components within a sweep — faster than Jacobi but inherently sequential. Converges for diagonally dominant and SPD matrices; its spectral radius is roughly the square of Jacobi's.
SOR (Successive Over-Relaxation)
With relaxation parameter $\omega$,
$$(D + \omega L) \mathbf{x}_{k+1} = \omega \mathbf{b} + ((1 - \omega) D - \omega U) \mathbf{x}_k.$$
$\omega = 1$ recovers Gauss-Seidel; $\omega \in (1, 2)$ is over-relaxation. For the discrete Laplacian the optimal $\omega$ is
$$\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \rho_J^2}},$$
improving convergence to the square root of the Jacobi rate. Outside structured PDEs the optimal $\omega$ is hard to estimate; Krylov methods supersede SOR as standalone solvers but it remains relevant as a smoother and preconditioner.
See also: Jacobi Method / Gauss-Seidel / SOR / Iterative Methods Overview
SPD Krylov: CG and PCG
The conjugate gradient method (CG) solves $A\mathbf{x} = \mathbf{b}$ for symmetric positive definite $A$ by minimizing $\varphi(\mathbf{x}) = \tfrac{1}{2} \mathbf{x}^\top A \mathbf{x} - \mathbf{b}^\top \mathbf{x}$.
The Method
At each step, $A$-conjugate search directions $\mathbf{p}_k$ are built from the residual $\mathbf{r}_k = \mathbf{b} - A \mathbf{x}_k$, and $\mathbf{x}_k$ moves by the step size $\alpha_k$ that minimizes $\varphi$. Crucially, $\mathbf{x}_k$ is the $A$-norm optimal approximation in $\mathcal{K}_k(A, \mathbf{r}_0)$, and in exact arithmetic CG terminates in at most $n$ steps. In practice rounding spoils finite termination but the goal is convergence in $k \ll n$ iterations.
Convergence
With condition number $\kappa = \kappa_2(A)$,
$$\frac{\|\mathbf{x}_k - \mathbf{x}^*\|_A}{\|\mathbf{x}_0 - \mathbf{x}^*\|_A} \leq 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k.$$
Ill-conditioning makes convergence slow, so a preconditioner $M \approx A$ is almost always used. Preconditioned CG (PCG) operates on $\mathbf{z}_k = M^{-1} \mathbf{r}_k$ and reduces the effective condition number to $\kappa(M^{-1} A)$.
Where CG Wins
Elliptic PDEs (potential, heat, structural statics) discretized by FEM/FD, graph Laplacians, normal equations of least squares, and quadratic subproblems in machine learning optimizers. Memory is $O(n)$ (three vectors) and per-iteration cost is one SpMV plus a few inner products — $O(\mathrm{nnz})$.
See also: Krylov Subspaces
Symmetric Indefinite: MINRES and SYMMLQ
For symmetric but indefinite $A$, CG's foundation (convex quadratic minimization) fails. Paige and Saunders' MINRES and SYMMLQ build on the symmetric Lanczos recurrence without requiring positive definiteness.
MINRES
Minimizes $\|\mathbf{r}_k\|_2$ in the Krylov subspace. It QR-factorizes the Lanczos tridiagonal as the iteration proceeds, keeping memory at $O(n)$, on par with CG. Standard for symmetric saddle-point KKT systems and the Stokes equations.
SYMMLQ
Minimizes $\|\mathbf{x}_k - \mathbf{x}^*\|_2$ rather than the residual, more stable on singular or nearly-singular $A$ but with a trickier stopping test.
Nonsymmetric: GMRES, BiCGStab, IDR(s), QMR
Nonsymmetric $A$ admits no symmetric three-term recurrence; the methods are more involved.
GMRES (Generalized Minimal Residual)
Arnoldi iteration builds an orthonormal basis $Q_k$ of $\mathcal{K}_k$ and an upper Hessenberg $H_k$; GMRES minimizes $\|\mathbf{b} - A \mathbf{x}_k\|_2$ over $\mathcal{K}_k$. Convergence is monotone, but memory grows linearly with $k$ ($O(nk)$) and so does the per-iteration MGS cost. Practical use is restarted GMRES, GMRES(m), with $m = 20$–$50$. Restarts erase convergence history and can stagnate, but with a decent preconditioner convergence is usually quick.
BiCGStab
Short-recurrence method (CG-like 3-term updates) for nonsymmetric $A$. Combines BiCG steps with a GMRES(1)-style smoothing to dampen Bi-Lanczos instability. Memory $O(n)$, two SpMVs per iteration. Less robust than GMRES; convergence can oscillate or stall.
IDR(s) (Induced Dimension Reduction)
Sonneveld and van Gijzen (2008). Uses $s$ shadow vectors to compress the Krylov subspace, achieving smoother convergence than BiCGStab with fewer SpMVs. Typical $s$ is 4-8. Memory-efficient compared to GMRES and more robust than BiCGStab. (Not yet implemented in sangi; planned for a future release.)
QMR (Quasi-Minimal Residual)
Bi-Lanczos based. Requires multiplications by $A^\top$ and quasi-minimizes the residual. Predominant before BiCGStab; TFQMR variants by Freund avoid the transpose. (Not yet implemented in sangi; planned for a future release.)
Preconditioners
A preconditioner $M$ approximates $A$ in the sense that $M^{-1} A \approx I$, transforming $A\mathbf{x} = \mathbf{b}$ into $M^{-1} A \mathbf{x} = M^{-1} \mathbf{b}$ and reducing the effective condition number. $M^{-1}$ must be cheap to apply each iteration ($O(\mathrm{nnz})$ ideally).
Jacobi (Diagonal)
$M = D = \mathrm{diag}(A)$. Simplest, helpful only for diagonally dominant problems, maximally parallel.
SSOR (Symmetric SOR)
Symmetric variant of SOR, usable for SPD systems: $M^{-1} = (D/\omega + U)^{-1} (D/\omega) (D/\omega + L)^{-1}$. Zero fill-in; very cheap setup.
Incomplete LU / Cholesky (ILU / IC)
Approximate factorizations $\tilde{L} \tilde{U} \approx A$ on the sparsity pattern of $A$. Fill level $p$ and drop threshold $\tau$ control the quality–sparsity trade-off. Use IC for SPD systems, ILU otherwise. Setup is $O(\mathrm{nnz})$, application is forward/backward substitution at $O(\mathrm{nnz})$. A Manteuffel shift $\alpha I$ often stabilizes the factorization. See the LU/Cholesky page for details.
AMG (Algebraic Multigrid)
Builds a hierarchy of coarse grids and smoothers from the algebraic structure of $A$ alone, with no geometric input. Gauss-Seidel or Jacobi smoothing combined with strong-connection coarsening gives asymptotically optimal $O(n)$ preconditioning for elliptic PDEs. Implementation complexity drives most users to specialized libraries (HYPRE, Trilinos). (Not yet implemented in sangi; planned for a future release.)
See also: Incomplete LU / Multigrid
Sparse Storage Formats
Storing only the nonzeros slashes memory and per-iteration cost when nonzero density $\mathrm{nnz}/n^2$ is small.
CSR (Compressed Sparse Row)
values[k]: nonzero values in row-major order, length $\mathrm{nnz}$col_idx[k]: column index of each nonzero, length $\mathrm{nnz}$row_ptr[i]: index intovaluesof the first nonzero of row $i$, length $n+1$
SpMV $\mathbf{y} = A \mathbf{x}$ is an inner product of values[row_ptr[i] : row_ptr[i+1]] with x[col_idx[...]] for each row.
Default format in sangi.
CSC (Compressed Sparse Column)
Column version of CSR. Optimal for $A^\top \mathbf{x}$ and column-wise operations such as LU pivoting. Some solvers store both CSR and CSC of the same matrix.
COO (Coordinate)
List of $(i, j, v)$ triples. Convenient for input and matrix assembly, inefficient for SpMV; convert to CSR/CSC after construction.
BSR (Block Sparse Row)
Nonzero $b \times b$ blocks stored CSR-style. PDE discretizations on structured meshes often produce small dense blocks, and BSR lets SpMV use BLAS-3.
SpMV is Memory-Bound
SpMV bandwidth is typically limited by random access to $\mathbf{x}$ via col_idx.
Reorderings such as Cuthill-McKee or RCM improve cache locality.
Roofline-model optimization is common.
See also: Sparse Matrices / Sparse Matrix Theory
Sparse Direct Methods
When iterative methods converge poorly (strongly indefinite, ill-conditioned, or with many right-hand sides), sparse direct solvers remain competitive.
Symbolic and Numeric Phases
- Symbolic factorization: determine the nonzero pattern of $L, U$, choose a row/column ordering to minimize fill-in (AMD, COLAMD, METIS), and build the elimination tree.
- Numeric factorization: fill values into the predetermined pattern.
When only the right-hand side changes (e.g., time-stepping ODEs, Newton iterations), the symbolic phase is reused and only numeric work is repeated.
Simplicial vs Supernodal/Multifrontal
Simplicial methods update one element at a time (CHOLMOD's simplicial mode). Supernodal/multifrontal methods aggregate rows of similar pattern into supernodes and update with BLAS-3. The latter is faster but more complex. CHOLMOD, SuperLU, MUMPS, and PARDISO are typical implementations.
Ordering
Fill-in depends strongly on the elimination order. AMD (Approximate Minimum Degree), Nested Dissection (METIS), and COLAMD (for unsymmetric matrices) are the standard heuristics. Optimal ordering is NP-hard.
sangi's Sparse Direct
sangi provides simplicial Cholesky and sparse LU through the sparse direct API. These are well suited to medium-sized sparse matrices ($n \lesssim 10^5$); beyond that, iterative methods with preconditioning are the practical route.
Stopping Criteria, Stagnation, Restart
Stopping Criteria
- Absolute residual: $\|\mathbf{r}_k\|_2 < \tau_a$
- Relative residual (standard): $\|\mathbf{r}_k\|_2 / \|\mathbf{b}\|_2 < \tau_r$
- Backward-error based: $\|\mathbf{r}_k\|_2 / (\|A\|_2 \|\mathbf{x}_k\|_2 + \|\mathbf{b}\|_2) < \tau$
- Iteration cap: $k < k_{\max}$
Higham (2002) recommends the backward-error form to avoid over-iteration on ill-conditioned problems. The forward relative error is bounded by $\kappa(A) \cdot \|\mathbf{r}_k\| / \|\mathbf{b}\|$, so the relative residual alone does not bound the forward error.
Stagnation Detection
When $\|\mathbf{r}_k\|_2$ stops decreasing over several iterations, options include:
- Strengthen the preconditioner (e.g., IC(0) $\to$ IC(1))
- Restart: GMRES restart, or CG warm-restart from current iterate
- Iterative refinement: fall back to a direct method, recompute residual, refine
- Switch solver: the iterative method may not fit the problem
Restart Strategy
GMRES(m) restart trades memory for convergence quality. Too small $m$ causes stagnation; too large erases the advantage. Start with $m = 20$–$50$ and double when convergence is slow.
Advanced variants — adaptive restart, deflated GMRES that retains spectral information — exist for difficult problems.
sangi's Choices
| Problem | Recommended solver | Preconditioner |
|---|---|---|
| SPD, dense | Cholesky direct | — |
| SPD, sparse, $n \lesssim 10^5$ | Simplicial Cholesky | — |
| SPD, sparse, $n \gg 10^5$ | CG / PCG | IC(0), Jacobi (AMG planned) |
| Symmetric indefinite, sparse | MINRES / SYMMLQ | SSOR, block-diagonal |
| Nonsymmetric, sparse, diagonally dominant | BiCGStab (IDR(s) planned) | ILU(0), Jacobi |
| Nonsymmetric, sparse, ill-conditioned | restarted GMRES(m) | ILU (AMG planned) |
| Nonsymmetric, $n \lesssim 10^5$ | Sparse LU | — |
| Classical iteration as a preconditioner | Jacobi / Gauss-Seidel / SOR via iterative API | — |
References
- Saad, Y. (2003). Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM.
- Hestenes, M. R. and Stiefel, E. (1952). "Methods of conjugate gradients for solving linear systems". J. Res. Nat. Bur. Standards, 49, 409–436.
- Saad, Y. and Schultz, M. H. (1986). "GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems". SIAM J. Sci. Stat. Comput., 7(3), 856–869.
- van der Vorst, H. A. (1992). "BiCGSTAB: A fast and smoothly converging variant of Bi-CG". SIAM J. Sci. Stat. Comput., 13(2), 631–644.
- Sonneveld, P. and van Gijzen, M. B. (2008). "IDR(s): A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations". SIAM J. Sci. Comput., 31(2), 1035–1062.
- Davis, T. A. (2006). Direct Methods for Sparse Linear Systems, SIAM.
- Briggs, W. L., Henson, V. E., and McCormick, S. F. (2000). A Multigrid Tutorial, 2nd ed., SIAM.