Eigenvalue Decomposition — Hessenberg, Francis QR, Tridiagonal, Krylov
Overview
Eigenvalue decomposition extracts the intrinsic invariants of a linear operator and appears throughout numerical linear algebra: differential equations, vibrational modes, graph Laplacians, quantum spectra, principal components (technically SVD). For a square $A \in \mathbb{R}^{n \times n}$, eigenpairs $(\lambda, \mathbf{v})$ satisfy
$$A \mathbf{v} = \lambda \mathbf{v}.$$
When $A$ is diagonalizable, $A = V \Lambda V^{-1}$; when symmetric, $A = V \Lambda V^\top$ with $V$ orthogonal. Although the problem is stated for square matrices, the algorithm splits sharply by structure (general, symmetric, Hermitian, Hessenberg, banded, sparse).
This page covers standard methods for general, symmetric, and generalized eigenvalue problems; power-iteration variants; Krylov methods; Gershgorin localization; and Bauer-Fike perturbation theory. See LinAlg API — Eigenvalues and Gershgorin disks for the API.
General Matrices: Hessenberg + Francis QR
The full eigenvalue computation for a general nonsymmetric $A$ proceeds in two stages.
Stage 1: Hessenberg Reduction
Two-sided Householder similarity transformations bring $A$ to upper Hessenberg form (only the subdiagonal and above can be nonzero):
$$Q^\top A Q = H, \quad H_{ij} = 0 \text{ for } i > j + 1.$$
$Q$ is the product of $n - 2$ Householders. Flop count is $\tfrac{10}{3} n^3$, reduced to $\tfrac{4}{3} n^3$ when symmetric (tridiagonalization). Hessenberg structure is preserved by subsequent QR sweeps, dropping each sweep to $O(n^2)$.
Stage 2: Francis Double-Shift QR
Implicit-shift QR sweeps run on the Hessenberg form. With shift $\sigma$,
$$H - \sigma I = Q R, \quad H' = R Q + \sigma I = Q^\top H Q,$$
preserving similarity and Hessenberg structure. When $|H_{n,n-1}|$ falls to machine precision the last eigenvalue is locked in, the matrix is deflated, and the procedure recurses on the leading principal submatrix.
Real nonsymmetric matrices have complex-conjugate eigenvalues that one complex shift cannot capture with real arithmetic. Francis double-shift applies a conjugate pair $(\sigma, \bar{\sigma})$ together in a single real sweep:
$$M = (H - \sigma I)(H - \bar{\sigma} I) = H^2 - 2 \mathrm{Re}(\sigma) H + |\sigma|^2 I.$$
Only the first column of $M$ is used to construct one Householder; the resulting bulge (the transient nonzero below the subdiagonal) is chased down by a chain of Givens rotations. The implicit Q theorem guarantees this is equivalent to the explicit QR step, eliminating the need to compute QR explicitly. Each sweep is $O(n^2)$ and yields two eigenvalues.
Shift Strategy
The standard shift is Wilkinson's — the eigenvalue of the trailing $2 \times 2$ block closer to $h_{nn}$. For nearly equal eigenvalues or cyclic behavior, ad hoc or random shifts are used to break the symmetry. On average $O(n)$ sweeps are needed and total cost is $O(n^3)$.
sangi's eigen calls LAPACK dgeev when MKL is available and uses an internal Hessenberg + Francis QR implementation otherwise.
Symmetric Matrices: Tridiagonalization and MR3
For $A = A^\top$, Hessenberg reduction automatically lands on symmetric tridiagonal form:
$$T = Q^\top A Q = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \beta_2 & \\ & \beta_2 & \ddots & \beta_{n-1} \\ & & \beta_{n-1} & \alpha_n \end{pmatrix}.$$
Several algorithms compute the eigensystem of $T$, with different strengths:
Symmetric QR Iteration
Implicit-shift QR on a tridiagonal matrix needs only one Givens to push out the $2 \times 2$ bulge, so each sweep costs $O(n)$. Full eigenvectors take $O(n^3)$.
Divide-and-Conquer
$T$ is split in the middle into two subtridiagonals, recursively decomposed, then merged via a secular equation.
Cost is typically $O(n^{2.3})$, beating QR iteration for $n \gtrsim 100$. LAPACK DSTEDC.
MR3 (Multiple Relatively Robust Representations)
Dhillon and Parlett (2004) compute relatively robust $LDL^\top$ representations for each eigenvalue cluster of $T$,
producing the eigenvector in $O(n)$ from each. The full eigensystem costs $O(n^2)$, the fastest known.
LAPACK DSTEMR and Eigen's internal solver follow this approach.
Complexity comparison:
| Method | Eigenvalues only | With eigenvectors | Accuracy |
|---|---|---|---|
QR iteration (DSTERF/DSTEQR) | $O(n^2)$ | $O(n^3)$ | High |
Divide-and-conquer (DSTEDC) | — | $O(n^{2.3})$ | High |
MR3 (DSTEMR) | — | $O(n^2)$ | High (careful impl.) |
| Jacobi | $O(n^3)$ | $O(n^3)$ | Highest (relative) |
sangi's eigen_symmetric currently dispatches to LAPACK DSYEV (tridiagonalization + QR iteration) through MKL/OpenBLAS, and falls back to Jacobi when no BLAS backend is available.
Switching to DSYEVR (MR3 / DSTEMR) or DSYEVD (divide-and-conquer) for large matrices is on the roadmap.
Hermitian Matrices
Eigenvalues of a complex Hermitian $A^H = A$ are real.
sangi routes through LAPACK ZHEEV when available, and otherwise embeds the $n \times n$ Hermitian into a $2n \times 2n$ real symmetric and solves via eigen_symmetric.
Generalized Eigenvalues and QZ
The generalized eigenproblem $A \mathbf{x} = \lambda B \mathbf{x}$ reduces to $B^{-1} A \mathbf{x} = \lambda \mathbf{x}$ when $B$ is well-conditioned, but breaks down numerically when $B$ is ill-conditioned or singular. The QZ algorithm instead reduces $(A, B)$ jointly to upper triangular form:
$$Q^\top A Z = S, \quad Q^\top B Z = T, \quad S, T \text{ upper triangular.}$$
Eigenpairs are $(\sigma_i, \tau_i) = (S_{ii}, T_{ii})$ with $\lambda_i = \sigma_i / \tau_i$. Entries with $\tau_i = 0$ are infinite eigenvalues, naturally representing the null space of $B$.
QZ reduces $(A, B)$ to a Hessenberg-triangular pair, then runs QZ sweeps (Francis QR generalized) to diagonalize.
LAPACK DGGEV.
When $A$ is symmetric and $B$ symmetric positive definite, Cholesky $B = L L^\top$ converts the problem to the symmetric eigenproblem for $L^{-1} A L^{-\top}$
(sangi's eigen_generalized_symmetric).
Power, Inverse, and Rayleigh Quotient Iteration
Classical iterative methods for the extreme or near-target eigenpairs when full decomposition is unnecessary.
Power Iteration
$\mathbf{x}_{k+1} = A \mathbf{x}_k / \|A \mathbf{x}_k\|$ converges to the eigenvector of the dominant eigenvalue $\lambda_1$ at rate $|\lambda_2 / \lambda_1|^k$. Each iteration is $O(n^2)$ (or $O(\mathrm{nnz})$ for sparse $A$). Useful when the spectral gap is large. PageRank is essentially this method.
Inverse Iteration
Given shift $\mu$,
$$\mathbf{x}_{k+1} = (A - \mu I)^{-1} \mathbf{x}_k / \|(A - \mu I)^{-1} \mathbf{x}_k\|$$
converges to the eigenvector of the eigenvalue nearest $\mu$ at rate $|(\lambda_i - \mu)/(\lambda_j - \mu)|^k$. Once an approximate eigenvalue is in hand, inverse iteration refines the eigenvector quickly, making it the standard companion to QR iteration.
Rayleigh Quotient Iteration
For symmetric $A$, set the shift each iteration to the Rayleigh quotient of the current approximation:
$$\mu_k = \frac{\mathbf{v}_k^\top A \mathbf{v}_k}{\mathbf{v}_k^\top \mathbf{v}_k}.$$
If $\mathbf{v}_k$ is $O(\varepsilon)$ away from the true eigenvector, the Rayleigh quotient is $O(\varepsilon^2)$ away from the true eigenvalue, and the inverse step shifted by it produces $\mathbf{v}_{k+1}$ accurate to $O(\varepsilon^3)$. Cubic convergence makes it extraordinarily fast for symmetric refinement; nonsymmetric problems retain only quadratic convergence.
See also: Power Method / Inverse Iteration
Krylov Methods: Lanczos and Arnoldi
When $n$ is huge and $A$ is sparse, even Hessenberg reduction at $O(n^3)$ is infeasible. Krylov methods project $A$ onto a much smaller subspace $\mathcal{K}_k(A, \mathbf{v}) = \mathrm{span}(\mathbf{v}, A\mathbf{v}, \ldots, A^{k-1}\mathbf{v})$ and extract $k \ll n$ extremal eigenvalues from the projection.
Lanczos (Symmetric)
For symmetric $A$, sequential orthogonalization of Krylov basis vectors reduces to a three-term recurrence:
$$A \mathbf{q}_j = \beta_{j-1} \mathbf{q}_{j-1} + \alpha_j \mathbf{q}_j + \beta_j \mathbf{q}_{j+1}.$$
The projection $T_k = Q_k^\top A Q_k$ is symmetric tridiagonal, and its eigenvalues (Ritz values) approximate the extremal eigenvalues of $A$. Each iteration is $O(n)$ ($O(\mathrm{nnz})$ for sparse) and $k$ iterations yield $k$ eigenvalues. In finite precision the orthogonality of the $\mathbf{q}_j$ deteriorates and selective reorthogonalization or Paige-style analysis is required.
Arnoldi (Nonsymmetric)
Nonsymmetric $A$ has no three-term recurrence; Arnoldi MGS-orthogonalizes against all prior basis vectors and produces an upper Hessenberg $H_k$. Iteration $k$ costs $O(nk)$ ($O(\mathrm{nnz} \cdot k)$ for sparse). Eigenvalues of $H_k$ are Ritz values approximating the extremal eigenvalues of $A$. Arnoldi also drives GMRES.
Shift-and-Invert
To target the smallest eigenvalues or eigenvalues near a specific point $\mu$, apply Krylov methods to $(A - \mu I)^{-1}$ instead of $A$. The eigenvalue map $\lambda \mapsto 1/(\lambda - \mu)$ amplifies eigenvalues near $\mu$, accelerating convergence. Each iteration requires one linear solve with $A - \mu I$; this LU factorization is often the dominant cost.
Restart Strategies
As the Krylov basis grows, orthogonalization and storage costs balloon. Implicitly Restarted Arnoldi (IRA) (Sorensen / ARPACK) retains converged Ritz vectors and compresses the basis to $m$ dimensions, then re-expands with $k - m$ new vectors. It is the standard for large-scale eigenvalue computation.
See also: Krylov Subspaces / Lanczos Algorithm / Arnoldi Iteration
Gershgorin Disk Theorem
The Gershgorin inclusion theorem localizes eigenvalues without iteration:
$$\text{every eigenvalue} \in \bigcup_{i=1}^{n} D_i, \quad D_i = \left\{z \in \mathbb{C} : |z - a_{ii}| \leq \sum_{j \neq i} |a_{ij}|\right\}.$$
$D_i$ is the closed disk centered at the diagonal entry $a_{ii}$ with radius equal to the absolute off-diagonal row sum. A stronger version says: if $k$ disks form a connected component disjoint from the rest, that component contains exactly $k$ eigenvalues.
Applications
- Positive-definiteness test: if every $D_i$ lies in the right half-plane, all eigenvalues have positive real part; for symmetric $A$ this implies positive definiteness.
- Convergence prediction: diagonal dominance ($|a_{ii}| > \sum_{j \neq i} |a_{ij}|$) is a Gershgorin-based sufficient condition for $\rho(B_J) < 1$ in Jacobi iteration.
- Shift selection: Gershgorin disks give cheap eigenvalue range estimates for shifted QR or shift-and-invert.
A column-version (Gershgorin applied to $A^\top$) also holds, and the intersection of row and column disks gives a tighter bound. sangi's Gershgorin disk function returns both row and column disks.
See also: Gershgorin Disk Theorem
Perturbation and Conditioning (Bauer-Fike)
How input perturbations propagate to eigenvalues governs the numerical stability of any eigenvalue solver. For diagonalizable $A = V \Lambda V^{-1}$ the Bauer-Fike theorem bounds the perturbed eigenvalues $\hat{\lambda}$ of $A + E$:
$$\min_i |\hat{\lambda} - \lambda_i| \leq \kappa_2(V) \cdot \|E\|_2,$$
where $\kappa_2(V) = \|V\|_2 \|V^{-1}\|_2$ is the condition number of the eigenvector matrix.
Consequences:
- Symmetric matrices: $V$ is orthogonal, $\kappa_2(V) = 1$, so $|\hat{\lambda} - \lambda_i| \leq \|E\|_2$ (Weyl's inequality, optimal).
- Normal matrices: $V$ is unitary; same bound.
- Nonnormal: $\kappa_2(V)$ can be enormous, leading to eigenvalues sensitive to small perturbations. Wilkinson's pathological matrices have $\kappa_2(V) \sim 10^{20}$.
Per-eigenvalue conditioning is captured by $s_i = 1 / |\mathbf{u}_i^\top \mathbf{v}_i|$ in terms of left and right eigenvectors $\mathbf{u}_i, \mathbf{v}_i$; $s_i \cdot \|E\|_2$ bounds the perturbation of $\lambda_i$. For nearly defective matrices (linearly dependent eigenvectors), $s_i$ blows up.
sangi's Choices
| API / Scenario | Method | Notes |
|---|---|---|
eigen, dense general | Hessenberg + Francis double-shift QR | MKL dgeev; internal fallback |
eigen_symmetric, real symmetric | Tridiagonalization + QR iteration / Jacobi | MKL dsyev; Jacobi fallback (MR3/D&C switch planned) |
eigen_hermitian | Tridiagonalization + QR iteration | MKL zheev; real symmetric embedding fallback |
eigen_generalized | QZ algorithm | MKL dggev |
eigen_generalized_symmetric | Cholesky $\to$ symmetric eigenproblem | Reduction to $L^{-1} A L^{-\top}$ |
power_method | Power iteration | Dominant eigenpair, $O(n^2)$/iter |
| Huge sparse | Lanczos / Arnoldi (Krylov) | Shift-and-invert for interior eigenvalues |
| Localization only | Gershgorin disks | $O(n^2)$, no iteration |
References
- Francis, J. G. F. (1961, 1962). "The QR Transformation, I and II". Computer J., 4, 265–271, 332–345.
- Parlett, B. N. (1998). The Symmetric Eigenvalue Problem, SIAM (classic ed.).
- Dhillon, I. S. and Parlett, B. N. (2004). "Orthogonal eigenvectors and relative gaps". SIAM J. Matrix Anal. Appl., 25(3), 858–899.
- Lanczos, C. (1950). "An iteration method for the solution of the eigenvalue problem of linear differential and integral operators". J. Res. Nat. Bur. Standards, 45, 255–282.
- Arnoldi, W. E. (1951). "The principle of minimized iterations in the solution of the matrix eigenvalue problem". Quart. Appl. Math., 9, 17–29.
- Bauer, F. L. and Fike, C. T. (1960). "Norms and exclusion theorems". Numer. Math., 2, 137–141.
- Stewart, G. W. (2001). Matrix Algorithms, Volume II: Eigensystems, SIAM.