Eigenvalue Decomposition — Hessenberg, Francis QR, Tridiagonal, Krylov

Overview

Eigenvalue decomposition extracts the intrinsic invariants of a linear operator and appears throughout numerical linear algebra: differential equations, vibrational modes, graph Laplacians, quantum spectra, principal components (technically SVD). For a square $A \in \mathbb{R}^{n \times n}$, eigenpairs $(\lambda, \mathbf{v})$ satisfy

$$A \mathbf{v} = \lambda \mathbf{v}.$$

When $A$ is diagonalizable, $A = V \Lambda V^{-1}$; when symmetric, $A = V \Lambda V^\top$ with $V$ orthogonal. Although the problem is stated for square matrices, the algorithm splits sharply by structure (general, symmetric, Hermitian, Hessenberg, banded, sparse).

This page covers standard methods for general, symmetric, and generalized eigenvalue problems; power-iteration variants; Krylov methods; Gershgorin localization; and Bauer-Fike perturbation theory. See LinAlg API — Eigenvalues and Gershgorin disks for the API.

General Matrices: Hessenberg + Francis QR

The full eigenvalue computation for a general nonsymmetric $A$ proceeds in two stages.

Stage 1: Hessenberg Reduction

Two-sided Householder similarity transformations bring $A$ to upper Hessenberg form (only the subdiagonal and above can be nonzero):

$$Q^\top A Q = H, \quad H_{ij} = 0 \text{ for } i > j + 1.$$

$Q$ is the product of $n - 2$ Householders. Flop count is $\tfrac{10}{3} n^3$, reduced to $\tfrac{4}{3} n^3$ when symmetric (tridiagonalization). Hessenberg structure is preserved by subsequent QR sweeps, dropping each sweep to $O(n^2)$.

Stage 2: Francis Double-Shift QR

Implicit-shift QR sweeps run on the Hessenberg form. With shift $\sigma$,

$$H - \sigma I = Q R, \quad H' = R Q + \sigma I = Q^\top H Q,$$

preserving similarity and Hessenberg structure. When $|H_{n,n-1}|$ falls to machine precision the last eigenvalue is locked in, the matrix is deflated, and the procedure recurses on the leading principal submatrix.

Real nonsymmetric matrices have complex-conjugate eigenvalues that one complex shift cannot capture with real arithmetic. Francis double-shift applies a conjugate pair $(\sigma, \bar{\sigma})$ together in a single real sweep:

$$M = (H - \sigma I)(H - \bar{\sigma} I) = H^2 - 2 \mathrm{Re}(\sigma) H + |\sigma|^2 I.$$

Only the first column of $M$ is used to construct one Householder; the resulting bulge (the transient nonzero below the subdiagonal) is chased down by a chain of Givens rotations. The implicit Q theorem guarantees this is equivalent to the explicit QR step, eliminating the need to compute QR explicitly. Each sweep is $O(n^2)$ and yields two eigenvalues.

Shift Strategy

The standard shift is Wilkinson's — the eigenvalue of the trailing $2 \times 2$ block closer to $h_{nn}$. For nearly equal eigenvalues or cyclic behavior, ad hoc or random shifts are used to break the symmetry. On average $O(n)$ sweeps are needed and total cost is $O(n^3)$.

sangi's eigen calls LAPACK dgeev when MKL is available and uses an internal Hessenberg + Francis QR implementation otherwise.

Symmetric Matrices: Tridiagonalization and MR3

For $A = A^\top$, Hessenberg reduction automatically lands on symmetric tridiagonal form:

$$T = Q^\top A Q = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \beta_2 & \\ & \beta_2 & \ddots & \beta_{n-1} \\ & & \beta_{n-1} & \alpha_n \end{pmatrix}.$$

Several algorithms compute the eigensystem of $T$, with different strengths:

Symmetric QR Iteration

Implicit-shift QR on a tridiagonal matrix needs only one Givens to push out the $2 \times 2$ bulge, so each sweep costs $O(n)$. Full eigenvectors take $O(n^3)$.

Divide-and-Conquer

$T$ is split in the middle into two subtridiagonals, recursively decomposed, then merged via a secular equation. Cost is typically $O(n^{2.3})$, beating QR iteration for $n \gtrsim 100$. LAPACK DSTEDC.

MR3 (Multiple Relatively Robust Representations)

Dhillon and Parlett (2004) compute relatively robust $LDL^\top$ representations for each eigenvalue cluster of $T$, producing the eigenvector in $O(n)$ from each. The full eigensystem costs $O(n^2)$, the fastest known. LAPACK DSTEMR and Eigen's internal solver follow this approach.

Complexity comparison:

MethodEigenvalues onlyWith eigenvectorsAccuracy
QR iteration (DSTERF/DSTEQR)$O(n^2)$$O(n^3)$High
Divide-and-conquer (DSTEDC)$O(n^{2.3})$High
MR3 (DSTEMR)$O(n^2)$High (careful impl.)
Jacobi$O(n^3)$$O(n^3)$Highest (relative)

sangi's eigen_symmetric currently dispatches to LAPACK DSYEV (tridiagonalization + QR iteration) through MKL/OpenBLAS, and falls back to Jacobi when no BLAS backend is available. Switching to DSYEVR (MR3 / DSTEMR) or DSYEVD (divide-and-conquer) for large matrices is on the roadmap.

Hermitian Matrices

Eigenvalues of a complex Hermitian $A^H = A$ are real. sangi routes through LAPACK ZHEEV when available, and otherwise embeds the $n \times n$ Hermitian into a $2n \times 2n$ real symmetric and solves via eigen_symmetric.

Generalized Eigenvalues and QZ

The generalized eigenproblem $A \mathbf{x} = \lambda B \mathbf{x}$ reduces to $B^{-1} A \mathbf{x} = \lambda \mathbf{x}$ when $B$ is well-conditioned, but breaks down numerically when $B$ is ill-conditioned or singular. The QZ algorithm instead reduces $(A, B)$ jointly to upper triangular form:

$$Q^\top A Z = S, \quad Q^\top B Z = T, \quad S, T \text{ upper triangular.}$$

Eigenpairs are $(\sigma_i, \tau_i) = (S_{ii}, T_{ii})$ with $\lambda_i = \sigma_i / \tau_i$. Entries with $\tau_i = 0$ are infinite eigenvalues, naturally representing the null space of $B$.

QZ reduces $(A, B)$ to a Hessenberg-triangular pair, then runs QZ sweeps (Francis QR generalized) to diagonalize. LAPACK DGGEV. When $A$ is symmetric and $B$ symmetric positive definite, Cholesky $B = L L^\top$ converts the problem to the symmetric eigenproblem for $L^{-1} A L^{-\top}$ (sangi's eigen_generalized_symmetric).

Power, Inverse, and Rayleigh Quotient Iteration

Classical iterative methods for the extreme or near-target eigenpairs when full decomposition is unnecessary.

Power Iteration

$\mathbf{x}_{k+1} = A \mathbf{x}_k / \|A \mathbf{x}_k\|$ converges to the eigenvector of the dominant eigenvalue $\lambda_1$ at rate $|\lambda_2 / \lambda_1|^k$. Each iteration is $O(n^2)$ (or $O(\mathrm{nnz})$ for sparse $A$). Useful when the spectral gap is large. PageRank is essentially this method.

Inverse Iteration

Given shift $\mu$,

$$\mathbf{x}_{k+1} = (A - \mu I)^{-1} \mathbf{x}_k / \|(A - \mu I)^{-1} \mathbf{x}_k\|$$

converges to the eigenvector of the eigenvalue nearest $\mu$ at rate $|(\lambda_i - \mu)/(\lambda_j - \mu)|^k$. Once an approximate eigenvalue is in hand, inverse iteration refines the eigenvector quickly, making it the standard companion to QR iteration.

Rayleigh Quotient Iteration

For symmetric $A$, set the shift each iteration to the Rayleigh quotient of the current approximation:

$$\mu_k = \frac{\mathbf{v}_k^\top A \mathbf{v}_k}{\mathbf{v}_k^\top \mathbf{v}_k}.$$

If $\mathbf{v}_k$ is $O(\varepsilon)$ away from the true eigenvector, the Rayleigh quotient is $O(\varepsilon^2)$ away from the true eigenvalue, and the inverse step shifted by it produces $\mathbf{v}_{k+1}$ accurate to $O(\varepsilon^3)$. Cubic convergence makes it extraordinarily fast for symmetric refinement; nonsymmetric problems retain only quadratic convergence.

Krylov Methods: Lanczos and Arnoldi

When $n$ is huge and $A$ is sparse, even Hessenberg reduction at $O(n^3)$ is infeasible. Krylov methods project $A$ onto a much smaller subspace $\mathcal{K}_k(A, \mathbf{v}) = \mathrm{span}(\mathbf{v}, A\mathbf{v}, \ldots, A^{k-1}\mathbf{v})$ and extract $k \ll n$ extremal eigenvalues from the projection.

Lanczos (Symmetric)

For symmetric $A$, sequential orthogonalization of Krylov basis vectors reduces to a three-term recurrence:

$$A \mathbf{q}_j = \beta_{j-1} \mathbf{q}_{j-1} + \alpha_j \mathbf{q}_j + \beta_j \mathbf{q}_{j+1}.$$

The projection $T_k = Q_k^\top A Q_k$ is symmetric tridiagonal, and its eigenvalues (Ritz values) approximate the extremal eigenvalues of $A$. Each iteration is $O(n)$ ($O(\mathrm{nnz})$ for sparse) and $k$ iterations yield $k$ eigenvalues. In finite precision the orthogonality of the $\mathbf{q}_j$ deteriorates and selective reorthogonalization or Paige-style analysis is required.

Arnoldi (Nonsymmetric)

Nonsymmetric $A$ has no three-term recurrence; Arnoldi MGS-orthogonalizes against all prior basis vectors and produces an upper Hessenberg $H_k$. Iteration $k$ costs $O(nk)$ ($O(\mathrm{nnz} \cdot k)$ for sparse). Eigenvalues of $H_k$ are Ritz values approximating the extremal eigenvalues of $A$. Arnoldi also drives GMRES.

Shift-and-Invert

To target the smallest eigenvalues or eigenvalues near a specific point $\mu$, apply Krylov methods to $(A - \mu I)^{-1}$ instead of $A$. The eigenvalue map $\lambda \mapsto 1/(\lambda - \mu)$ amplifies eigenvalues near $\mu$, accelerating convergence. Each iteration requires one linear solve with $A - \mu I$; this LU factorization is often the dominant cost.

Restart Strategies

As the Krylov basis grows, orthogonalization and storage costs balloon. Implicitly Restarted Arnoldi (IRA) (Sorensen / ARPACK) retains converged Ritz vectors and compresses the basis to $m$ dimensions, then re-expands with $k - m$ new vectors. It is the standard for large-scale eigenvalue computation.

Gershgorin Disk Theorem

The Gershgorin inclusion theorem localizes eigenvalues without iteration:

$$\text{every eigenvalue} \in \bigcup_{i=1}^{n} D_i, \quad D_i = \left\{z \in \mathbb{C} : |z - a_{ii}| \leq \sum_{j \neq i} |a_{ij}|\right\}.$$

$D_i$ is the closed disk centered at the diagonal entry $a_{ii}$ with radius equal to the absolute off-diagonal row sum. A stronger version says: if $k$ disks form a connected component disjoint from the rest, that component contains exactly $k$ eigenvalues.

Applications

  • Positive-definiteness test: if every $D_i$ lies in the right half-plane, all eigenvalues have positive real part; for symmetric $A$ this implies positive definiteness.
  • Convergence prediction: diagonal dominance ($|a_{ii}| > \sum_{j \neq i} |a_{ij}|$) is a Gershgorin-based sufficient condition for $\rho(B_J) < 1$ in Jacobi iteration.
  • Shift selection: Gershgorin disks give cheap eigenvalue range estimates for shifted QR or shift-and-invert.

A column-version (Gershgorin applied to $A^\top$) also holds, and the intersection of row and column disks gives a tighter bound. sangi's Gershgorin disk function returns both row and column disks.

Perturbation and Conditioning (Bauer-Fike)

How input perturbations propagate to eigenvalues governs the numerical stability of any eigenvalue solver. For diagonalizable $A = V \Lambda V^{-1}$ the Bauer-Fike theorem bounds the perturbed eigenvalues $\hat{\lambda}$ of $A + E$:

$$\min_i |\hat{\lambda} - \lambda_i| \leq \kappa_2(V) \cdot \|E\|_2,$$

where $\kappa_2(V) = \|V\|_2 \|V^{-1}\|_2$ is the condition number of the eigenvector matrix.

Consequences:

  • Symmetric matrices: $V$ is orthogonal, $\kappa_2(V) = 1$, so $|\hat{\lambda} - \lambda_i| \leq \|E\|_2$ (Weyl's inequality, optimal).
  • Normal matrices: $V$ is unitary; same bound.
  • Nonnormal: $\kappa_2(V)$ can be enormous, leading to eigenvalues sensitive to small perturbations. Wilkinson's pathological matrices have $\kappa_2(V) \sim 10^{20}$.

Per-eigenvalue conditioning is captured by $s_i = 1 / |\mathbf{u}_i^\top \mathbf{v}_i|$ in terms of left and right eigenvectors $\mathbf{u}_i, \mathbf{v}_i$; $s_i \cdot \|E\|_2$ bounds the perturbation of $\lambda_i$. For nearly defective matrices (linearly dependent eigenvectors), $s_i$ blows up.

sangi's Choices

API / ScenarioMethodNotes
eigen, dense generalHessenberg + Francis double-shift QRMKL dgeev; internal fallback
eigen_symmetric, real symmetricTridiagonalization + QR iteration / JacobiMKL dsyev; Jacobi fallback (MR3/D&C switch planned)
eigen_hermitianTridiagonalization + QR iterationMKL zheev; real symmetric embedding fallback
eigen_generalizedQZ algorithmMKL dggev
eigen_generalized_symmetricCholesky $\to$ symmetric eigenproblemReduction to $L^{-1} A L^{-\top}$
power_methodPower iterationDominant eigenpair, $O(n^2)$/iter
Huge sparseLanczos / Arnoldi (Krylov)Shift-and-invert for interior eigenvalues
Localization onlyGershgorin disks$O(n^2)$, no iteration

References

  • Francis, J. G. F. (1961, 1962). "The QR Transformation, I and II". Computer J., 4, 265–271, 332–345.
  • Parlett, B. N. (1998). The Symmetric Eigenvalue Problem, SIAM (classic ed.).
  • Dhillon, I. S. and Parlett, B. N. (2004). "Orthogonal eigenvectors and relative gaps". SIAM J. Matrix Anal. Appl., 25(3), 858–899.
  • Lanczos, C. (1950). "An iteration method for the solution of the eigenvalue problem of linear differential and integral operators". J. Res. Nat. Bur. Standards, 45, 255–282.
  • Arnoldi, W. E. (1951). "The principle of minimized iterations in the solution of the matrix eigenvalue problem". Quart. Appl. Math., 9, 17–29.
  • Bauer, F. L. and Fike, C. T. (1960). "Norms and exclusion theorems". Numer. Math., 2, 137–141.
  • Stewart, G. W. (2001). Matrix Algorithms, Volume II: Eigensystems, SIAM.