What is Francis double-shift QR?

A technique that applies a pair of complex-conjugate shifts (sigma, sigma_bar) simultaneously to a real nonsymmetric Hessenberg matrix, so all arithmetic stays in real numbers while still converging on complex-conjugate eigenvalues. A single Householder forms the bulge and a chain of Givens rotations chases it down the subdiagonal. Each sweep costs O(n^2) and produces two eigenvalues. LAPACK's DHSEQR implements it; sangi uses an equivalent internal implementation when MKL is unavailable.

What is the implicit Q theorem?

In a Hessenberg reduction H = Q^T A Q, if the first column of Q is fixed and H has nonzero subdiagonal entries, then the remaining columns of Q are essentially unique. This is the mathematical basis for implicit-shift QR: only the first Givens rotation after the shift needs to be specified explicitly, and the rest of the sweep unfolds automatically via bulge chasing. Without this theorem each step would require an explicit QR factorization.

Multiple Relatively Robust Representations, by Dhillon and Parlett (2004). For each eigenvalue cluster of a symmetric tridiagonal matrix, MR3 picks an LDL^T representation whose relative roundoff cannot destroy the relative accuracy of nearby eigenvalues, then computes the corresponding eigenvector in O(n). Overall cost is O(n^2) for the full eigensystem — faster than QR iteration's O(n^3) or Jacobi's O(n^3). LAPACK's DSTEMR uses MR3.

Why is Rayleigh quotient iteration cubically convergent?

For a symmetric matrix, if v_k is O(ε) away from the true eigenvector, then the Rayleigh quotient v_k^T A v_k / v_k^T v_k is only O(ε^2) away from the true eigenvalue. The next inverse-iteration step runs with that O(ε^2)-off shift, producing v_{k+1} accurate to O(ε^3). Each iteration triples the precision exponent. The analysis breaks down for nonsymmetric matrices, where convergence reduces to quadratic.

Lanczos vs Arnoldi — what's the difference?

Both build an orthonormal basis of the Krylov subspace K_k(A, v). Lanczos applies to symmetric A and reduces to a three-term recurrence in which only the last two basis vectors need re-orthogonalization (O(n) per iteration). Arnoldi works for nonsymmetric A and produces an upper Hessenberg matrix; full MGS against all previous vectors costs O(nk) at iteration k. Lanczos is used for symmetric extremal eigenvalues; Arnoldi underpins GMRES and Krylov-Schur eigenvalue solvers for nonsymmetric problems.

Eigenvalue Decomposition — Hessenberg, Francis QR, Tridiagonal, Krylov

Overview

Eigenvalue decomposition extracts the intrinsic invariants of a linear operator and appears throughout numerical linear algebra: differential equations, vibrational modes, graph Laplacians, quantum spectra, principal components (technically SVD). For a square $A \in \mathbb{R}^{n \times n}$, eigenpairs $(\lambda, \mathbf{v})$ satisfy

$$A \mathbf{v} = \lambda \mathbf{v}.$$

When $A$ is diagonalizable, $A = V \Lambda V^{-1}$; when symmetric, $A = V \Lambda V^\top$ with $V$ orthogonal. Although the problem is stated for square matrices, the algorithm splits sharply by structure (general, symmetric, Hermitian, Hessenberg, banded, sparse).

This page covers standard methods for general, symmetric, and generalized eigenvalue problems; power-iteration variants; Krylov methods; Gershgorin localization; and Bauer-Fike perturbation theory. See LinAlg API — Eigenvalues and Gershgorin disks for the API.

General Matrices: Hessenberg + Francis QR

The full eigenvalue computation for a general nonsymmetric $A$ proceeds in two stages.

Stage 1: Hessenberg Reduction

Two-sided Householder similarity transformations bring $A$ to upper Hessenberg form (only the subdiagonal and above can be nonzero):

$$Q^\top A Q = H, \quad H_{ij} = 0 \text{ for } i > j + 1.$$

$Q$ is the product of $n - 2$ Householders. Flop count is $\tfrac{10}{3} n^3$, reduced to $\tfrac{4}{3} n^3$ when symmetric (tridiagonalization). Hessenberg structure is preserved by subsequent QR sweeps, dropping each sweep to $O(n^2)$.

Stage 2: Francis Double-Shift QR

Implicit-shift QR sweeps run on the Hessenberg form. With shift $\sigma$,

$$H - \sigma I = Q R, \quad H' = R Q + \sigma I = Q^\top H Q,$$

preserving similarity and Hessenberg structure. When $|H_{n,n-1}|$ falls to machine precision the last eigenvalue is locked in, the matrix is deflated, and the procedure recurses on the leading principal submatrix.

Real nonsymmetric matrices have complex-conjugate eigenvalues that one complex shift cannot capture with real arithmetic. Francis double-shift applies a conjugate pair $(\sigma, \bar{\sigma})$ together in a single real sweep:

$$M = (H - \sigma I)(H - \bar{\sigma} I) = H^2 - 2 \mathrm{Re}(\sigma) H + |\sigma|^2 I.$$

Only the first column of $M$ is used to construct one Householder; the resulting bulge (the transient nonzero below the subdiagonal) is chased down by a chain of Givens rotations. The implicit Q theorem guarantees this is equivalent to the explicit QR step, eliminating the need to compute QR explicitly. Each sweep is $O(n^2)$ and yields two eigenvalues.

Shift Strategy

The standard shift is Wilkinson's — the eigenvalue of the trailing $2 \times 2$ block closer to $h_{nn}$. For nearly equal eigenvalues or cyclic behavior, ad hoc or random shifts are used to break the symmetry. On average $O(n)$ sweeps are needed and total cost is $O(n^3)$.

sangi's eigen calls LAPACK dgeev when MKL is available and uses an internal Hessenberg + Francis QR implementation otherwise.

Symmetric Matrices: Tridiagonalization and MR3

For $A = A^\top$, Hessenberg reduction automatically lands on symmetric tridiagonal form:

$$T = Q^\top A Q = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \beta_2 & \\ & \beta_2 & \ddots & \beta_{n-1} \\ & & \beta_{n-1} & \alpha_n \end{pmatrix}.$$

Several algorithms compute the eigensystem of $T$, with different strengths:

Symmetric QR Iteration

Implicit-shift QR on a tridiagonal matrix needs only one Givens to push out the $2 \times 2$ bulge, so each sweep costs $O(n)$. Full eigenvectors take $O(n^3)$.

Divide-and-Conquer

$T$ is split in the middle into two subtridiagonals, recursively decomposed, then merged via a secular equation. Cost is typically $O(n^{2.3})$, beating QR iteration for $n \gtrsim 100$. LAPACK DSTEDC.

MR3 (Multiple Relatively Robust Representations)

Dhillon and Parlett (2004) compute relatively robust $LDL^\top$ representations for each eigenvalue cluster of $T$, producing the eigenvector in $O(n)$ from each. The full eigensystem costs $O(n^2)$, the fastest known. LAPACK DSTEMR and Eigen's internal solver follow this approach.

Complexity comparison:

Method	Eigenvalues only	With eigenvectors	Accuracy
QR iteration (`DSTERF`/`DSTEQR`)	$O(n^2)$	$O(n^3)$	High
Divide-and-conquer (`DSTEDC`)	—	$O(n^{2.3})$	High
MR3 (`DSTEMR`)	—	$O(n^2)$	High (careful impl.)
Jacobi	$O(n^3)$	$O(n^3)$	Highest (relative)

sangi's eigen_symmetric currently dispatches to LAPACK DSYEV (tridiagonalization + QR iteration) through MKL/OpenBLAS, and falls back to Jacobi when no BLAS backend is available. Switching to DSYEVR (MR3 / DSTEMR) or DSYEVD (divide-and-conquer) for large matrices is on the roadmap.

Hermitian Matrices

Eigenvalues of a complex Hermitian $A^H = A$ are real. sangi routes through LAPACK ZHEEV when available, and otherwise embeds the $n \times n$ Hermitian into a $2n \times 2n$ real symmetric and solves via eigen_symmetric.

Generalized Eigenvalues and QZ

The generalized eigenproblem $A \mathbf{x} = \lambda B \mathbf{x}$ reduces to $B^{-1} A \mathbf{x} = \lambda \mathbf{x}$ when $B$ is well-conditioned, but breaks down numerically when $B$ is ill-conditioned or singular. The QZ algorithm instead reduces $(A, B)$ jointly to upper triangular form:

$$Q^\top A Z = S, \quad Q^\top B Z = T, \quad S, T \text{ upper triangular.}$$

Eigenpairs are $(\sigma_i, \tau_i) = (S_{ii}, T_{ii})$ with $\lambda_i = \sigma_i / \tau_i$. Entries with $\tau_i = 0$ are infinite eigenvalues, naturally representing the null space of $B$.

QZ reduces $(A, B)$ to a Hessenberg-triangular pair, then runs QZ sweeps (Francis QR generalized) to diagonalize. LAPACK DGGEV. When $A$ is symmetric and $B$ symmetric positive definite, Cholesky $B = L L^\top$ converts the problem to the symmetric eigenproblem for $L^{-1} A L^{-\top}$ (sangi's eigen_generalized_symmetric).

Power, Inverse, and Rayleigh Quotient Iteration

Classical iterative methods for the extreme or near-target eigenpairs when full decomposition is unnecessary.

Power Iteration

$\mathbf{x}_{k+1} = A \mathbf{x}_k / \|A \mathbf{x}_k\|$ converges to the eigenvector of the dominant eigenvalue $\lambda_1$ at rate $|\lambda_2 / \lambda_1|^k$. Each iteration is $O(n^2)$ (or $O(\mathrm{nnz})$ for sparse $A$). Useful when the spectral gap is large. PageRank is essentially this method.

Inverse Iteration

Given shift $\mu$,

$$\mathbf{x}_{k+1} = (A - \mu I)^{-1} \mathbf{x}_k / \|(A - \mu I)^{-1} \mathbf{x}_k\|$$

converges to the eigenvector of the eigenvalue nearest $\mu$ at rate $|(\lambda_i - \mu)/(\lambda_j - \mu)|^k$. Once an approximate eigenvalue is in hand, inverse iteration refines the eigenvector quickly, making it the standard companion to QR iteration.

Rayleigh Quotient Iteration

For symmetric $A$, set the shift each iteration to the Rayleigh quotient of the current approximation:

$$\mu_k = \frac{\mathbf{v}_k^\top A \mathbf{v}_k}{\mathbf{v}_k^\top \mathbf{v}_k}.$$

If $\mathbf{v}_k$ is $O(\varepsilon)$ away from the true eigenvector, the Rayleigh quotient is $O(\varepsilon^2)$ away from the true eigenvalue, and the inverse step shifted by it produces $\mathbf{v}_{k+1}$ accurate to $O(\varepsilon^3)$. Cubic convergence makes it extraordinarily fast for symmetric refinement; nonsymmetric problems retain only quadratic convergence.

See also: Power Method / Inverse Iteration

Krylov Methods: Lanczos and Arnoldi

When $n$ is huge and $A$ is sparse, even Hessenberg reduction at $O(n^3)$ is infeasible. Krylov methods project $A$ onto a much smaller subspace $\mathcal{K}_k(A, \mathbf{v}) = \mathrm{span}(\mathbf{v}, A\mathbf{v}, \ldots, A^{k-1}\mathbf{v})$ and extract $k \ll n$ extremal eigenvalues from the projection.

Lanczos (Symmetric)

For symmetric $A$, sequential orthogonalization of Krylov basis vectors reduces to a three-term recurrence:

$$A \mathbf{q}_j = \beta_{j-1} \mathbf{q}_{j-1} + \alpha_j \mathbf{q}_j + \beta_j \mathbf{q}_{j+1}.$$

The projection $T_k = Q_k^\top A Q_k$ is symmetric tridiagonal, and its eigenvalues (Ritz values) approximate the extremal eigenvalues of $A$. Each iteration is $O(n)$ ($O(\mathrm{nnz})$ for sparse) and $k$ iterations yield $k$ eigenvalues. In finite precision the orthogonality of the $\mathbf{q}_j$ deteriorates and selective reorthogonalization or Paige-style analysis is required.

Arnoldi (Nonsymmetric)

Nonsymmetric $A$ has no three-term recurrence; Arnoldi MGS-orthogonalizes against all prior basis vectors and produces an upper Hessenberg $H_k$. Iteration $k$ costs $O(nk)$ ($O(\mathrm{nnz} \cdot k)$ for sparse). Eigenvalues of $H_k$ are Ritz values approximating the extremal eigenvalues of $A$. Arnoldi also drives GMRES.

Shift-and-Invert

To target the smallest eigenvalues or eigenvalues near a specific point $\mu$, apply Krylov methods to $(A - \mu I)^{-1}$ instead of $A$. The eigenvalue map $\lambda \mapsto 1/(\lambda - \mu)$ amplifies eigenvalues near $\mu$, accelerating convergence. Each iteration requires one linear solve with $A - \mu I$; this LU factorization is often the dominant cost.

Restart Strategies

As the Krylov basis grows, orthogonalization and storage costs balloon. Implicitly Restarted Arnoldi (IRA) (Sorensen / ARPACK) retains converged Ritz vectors and compresses the basis to $m$ dimensions, then re-expands with $k - m$ new vectors. It is the standard for large-scale eigenvalue computation.

Gershgorin Disk Theorem

The Gershgorin inclusion theorem localizes eigenvalues without iteration:

$$\text{every eigenvalue} \in \bigcup_{i=1}^{n} D_i, \quad D_i = \left\{z \in \mathbb{C} : |z - a_{ii}| \leq \sum_{j \neq i} |a_{ij}|\right\}.$$

$D_i$ is the closed disk centered at the diagonal entry $a_{ii}$ with radius equal to the absolute off-diagonal row sum. A stronger version says: if $k$ disks form a connected component disjoint from the rest, that component contains exactly $k$ eigenvalues.

Applications

Positive-definiteness test: if every $D_i$ lies in the right half-plane, all eigenvalues have positive real part; for symmetric $A$ this implies positive definiteness.
Convergence prediction: diagonal dominance ($|a_{ii}| > \sum_{j \neq i} |a_{ij}|$) is a Gershgorin-based sufficient condition for $\rho(B_J) < 1$ in Jacobi iteration.
Shift selection: Gershgorin disks give cheap eigenvalue range estimates for shifted QR or shift-and-invert.

A column-version (Gershgorin applied to $A^\top$) also holds, and the intersection of row and column disks gives a tighter bound. sangi's Gershgorin disk function returns both row and column disks.

Perturbation and Conditioning (Bauer-Fike)

How input perturbations propagate to eigenvalues governs the numerical stability of any eigenvalue solver. For diagonalizable $A = V \Lambda V^{-1}$ the Bauer-Fike theorem bounds the perturbed eigenvalues $\hat{\lambda}$ of $A + E$:

$$\min_i |\hat{\lambda} - \lambda_i| \leq \kappa_2(V) \cdot \|E\|_2,$$

where $\kappa_2(V) = \|V\|_2 \|V^{-1}\|_2$ is the condition number of the eigenvector matrix.

Consequences:

Symmetric matrices: $V$ is orthogonal, $\kappa_2(V) = 1$, so $|\hat{\lambda} - \lambda_i| \leq \|E\|_2$ (Weyl's inequality, optimal).
Normal matrices: $V$ is unitary; same bound.
Nonnormal: $\kappa_2(V)$ can be enormous, leading to eigenvalues sensitive to small perturbations. Wilkinson's pathological matrices have $\kappa_2(V) \sim 10^{20}$.

Per-eigenvalue conditioning is captured by $s_i = 1 / |\mathbf{u}_i^\top \mathbf{v}_i|$ in terms of left and right eigenvectors $\mathbf{u}_i, \mathbf{v}_i$; $s_i \cdot \|E\|_2$ bounds the perturbation of $\lambda_i$. For nearly defective matrices (linearly dependent eigenvectors), $s_i$ blows up.

sangi's Choices

API / Scenario	Method	Notes
`eigen`, dense general	Hessenberg + Francis double-shift QR	MKL `dgeev`; internal fallback
`eigen_symmetric`, real symmetric	Tridiagonalization + QR iteration / Jacobi	MKL `dsyev`; Jacobi fallback (MR3/D&C switch planned)
`eigen_hermitian`	Tridiagonalization + QR iteration	MKL `zheev`; real symmetric embedding fallback
`eigen_generalized`	QZ algorithm	MKL `dggev`
`eigen_generalized_symmetric`	Cholesky $\to$ symmetric eigenproblem	Reduction to $L^{-1} A L^{-\top}$
`power_method`	Power iteration	Dominant eigenpair, $O(n^2)$/iter
Huge sparse	Lanczos / Arnoldi (Krylov)	Shift-and-invert for interior eigenvalues
Localization only	Gershgorin disks	$O(n^2)$, no iteration

References

Francis, J. G. F. (1961, 1962). "The QR Transformation, I and II". Computer J., 4, 265–271, 332–345.
Parlett, B. N. (1998). The Symmetric Eigenvalue Problem, SIAM (classic ed.).
Dhillon, I. S. and Parlett, B. N. (2004). "Orthogonal eigenvectors and relative gaps". SIAM J. Matrix Anal. Appl., 25(3), 858–899.
Lanczos, C. (1950). "An iteration method for the solution of the eigenvalue problem of linear differential and integral operators". J. Res. Nat. Bur. Standards, 45, 255–282.
Arnoldi, W. E. (1951). "The principle of minimized iterations in the solution of the matrix eigenvalue problem". Quart. Appl. Math., 9, 17–29.
Bauer, F. L. and Fike, C. T. (1960). "Norms and exclusion theorems". Numer. Math., 2, 137–141.
Stewart, G. W. (2001). Matrix Algorithms, Volume II: Eigensystems, SIAM.