Chapter 23: Power Method and Inverse Iteration
Iterative Eigenvalue Algorithms
1. Introduction and Motivation
There are two broad approaches to numerically solving the eigenvalue problem $A\mathbf{v} = \lambda \mathbf{v}$ for an $N \times N$ matrix $A$.
- Methods that compute all eigenvalues simultaneously: the QR algorithm (Chapter 5)
- Iterative methods that selectively compute specific eigenvalues: the power method and its variants
This chapter covers the second, iterative approach in detail. The power method is the most fundamental technique for finding the eigenvalue of largest absolute value, and inverse iteration is a variant that can efficiently find the eigenvalue closest to any target value.
Historical Note
The power method was systematized by Richard von Mises in 1929. It is also known as power iteration or von Mises iteration.
2. Power Method
2.1 Prerequisites
Prerequisites for the Power Method
- $A$ is diagonalizable (the eigenvectors form a basis for $\mathbb{R}^N$ or $\mathbb{C}^N$).
- When the eigenvalues are ordered by absolute value, the dominant eigenvalue is unique: $$|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_N|$$
- When the initial vector $\mathbf{x}^{(0)}$ is expanded in the eigenvector basis $$\mathbf{x}^{(0)} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_N \mathbf{v}_N$$ the coefficient satisfies $c_1 \neq 0$.
Meaning of $c_1 \neq 0$
If the initial vector has no component in the $\mathbf{v}_1$ direction, the power method will not converge to $\lambda_1$. For symmetric matrices, the eigenvectors are orthogonal, so $c_1 = \mathbf{x}^{(0)} \cdot \mathbf{v}_1 / \|\mathbf{v}_1\|^2$, which is equivalent to $\mathbf{x}^{(0)} \cdot \mathbf{v}_1 \neq 0$. For general matrices, however, the eigenvectors are not orthogonal, so $c_1 \neq 0$ is the precise condition. In practice, choosing a random initial vector virtually guarantees $c_1 \neq 0$.
2.2 Algorithm
Power Method Algorithm
Input: $N \times N$ matrix $A$, initial vector $\mathbf{x}^{(0)}$ ($\|\mathbf{x}^{(0)}\| = 1$), convergence threshold $\varepsilon$, maximum iterations $k_{\max}$
- For $k = 1, 2, \ldots, k_{\max}$:
- $\mathbf{y}^{(k)} = A \mathbf{x}^{(k-1)}$ (matrix-vector product: $O(N^2)$)
- $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$ (normalization)
- $\mu^{(k)} = (\mathbf{x}^{(k)})^T A \mathbf{x}^{(k)}$ (Rayleigh quotient for eigenvalue approximation)
- If $|\mu^{(k)} - \mu^{(k-1)}| < \varepsilon$, terminate
Output: Eigenvalue approximation $\mu^{(k)}$, eigenvector approximation $\mathbf{x}^{(k)}$
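Below is a minimal NumPy sketch of this loop. The function name, the random default initial vector, and the simple stopping test are illustrative choices, not part of the algorithm statement above.

```python
import numpy as np

def power_method(A, x0=None, eps=1e-10, k_max=1000):
    """Largest-|eigenvalue| eigenpair of A by power iteration (a sketch)."""
    N = A.shape[0]
    x = x0 if x0 is not None else np.random.default_rng(0).standard_normal(N)
    x = x / np.linalg.norm(x)            # enforce ||x|| = 1
    mu_old = np.inf
    for k in range(1, k_max + 1):
        y = A @ x                        # matrix-vector product: O(N^2)
        x = y / np.linalg.norm(y)        # normalization
        mu = x @ A @ x                   # Rayleigh quotient
        if abs(mu - mu_old) < eps:       # eigenvalue-change criterion
            break
        mu_old = mu
    return mu, x
```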
3. Convergence Proof
3.1 Basic Property: $A^k \mathbf{v}_n = \lambda_n^k \mathbf{v}_n$
From the eigenvalue equation $A\mathbf{v}_n = \lambda_n \mathbf{v}_n$, the following holds by induction.
\begin{equation} A^k \mathbf{v}_n = \lambda_n^k \mathbf{v}_n \quad (k = 1, 2, 3, \ldots) \label{eq:Akvn} \end{equation}

Proof: By induction on $k$. The case $k = 1$ holds by definition. Assuming the result holds for $k-1$,

$$A^k \mathbf{v}_n = A(A^{k-1} \mathbf{v}_n) = A(\lambda_n^{k-1} \mathbf{v}_n) = \lambda_n^{k-1} (A\mathbf{v}_n) = \lambda_n^{k-1} (\lambda_n \mathbf{v}_n) = \lambda_n^k \mathbf{v}_n \qquad \square$$

3.2 Convergence via Eigenvector Expansion
Convergence Theorem for the Power Method
Suppose $A$ is diagonalizable with $|\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_N|$. If $c_1 \neq 0$ in the eigenvector expansion of the initial vector, then the sequence $\{\mathbf{x}^{(k)}\}$ generated by the normalized power method converges to the direction of the eigenvector corresponding to $\lambda_1$.
Proof: Expand the initial vector in the eigenvector basis.
$$\mathbf{x}^{(0)} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_N \mathbf{v}_N, \quad c_1 \neq 0$$

Applying $A^k$ and using equation $\eqref{eq:Akvn}$,

\begin{align} A^k \mathbf{x}^{(0)} &= c_1 \lambda_1^k \mathbf{v}_1 + c_2 \lambda_2^k \mathbf{v}_2 + \cdots + c_N \lambda_N^k \mathbf{v}_N \\ &= c_1 \lambda_1^k \left[ \mathbf{v}_1 + \frac{c_2}{c_1}\left(\frac{\lambda_2}{\lambda_1}\right)^k \mathbf{v}_2 + \cdots + \frac{c_N}{c_1}\left(\frac{\lambda_N}{\lambda_1}\right)^k \mathbf{v}_N \right] \label{eq:expansion} \end{align}

Since $|\lambda_1| > |\lambda_i|$ for $i \geq 2$, we have $|\lambda_i / \lambda_1| < 1$, so

$$\lim_{k \to \infty} \left(\frac{\lambda_i}{\lambda_1}\right)^k = 0 \quad (i = 2, 3, \ldots, N)$$

Therefore, as $k \to \infty$ the bracketed expression converges to $\mathbf{v}_1$, and after normalization,

$$\lim_{k \to \infty} \frac{A^k \mathbf{x}^{(0)}}{\|A^k \mathbf{x}^{(0)}\|} = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|}$$

(Strictly, when $c_1 \lambda_1^k$ is negative the normalized iterate approaches $-\mathbf{v}_1/\|\mathbf{v}_1\|$; the direction converges in either case, which is the sense of the theorem.) $\square$
3.3 Eigenvalue Extraction
With the normalization $\|\mathbf{x}^{(k)}\| = 1$, the eigenvalue can be approximated via the Rayleigh quotient.
$$\mu^{(k)} = (\mathbf{x}^{(k)})^T A \mathbf{x}^{(k)}$$

Since $\mathbf{x}^{(k)} \to \mathbf{v}_1 / \|\mathbf{v}_1\|$, we have $A\mathbf{x}^{(k)} \to \lambda_1 \mathbf{x}^{(k)}$, so

$$\lim_{k \to \infty} \mu^{(k)} = \lambda_1 \lim_{k \to \infty} \|\mathbf{x}^{(k)}\|^2 = \lambda_1$$

3.4 Necessity of Normalization
If $|\lambda_1| \neq 1$, iterating without normalization causes overflow ($|\lambda_1| > 1$) or underflow ($|\lambda_1| < 1$) due to the factor $\lambda_1^k$. Normalization at each step prevents this.
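A tiny demonstration of this growth (a sketch; NumPy emits an overflow warning rather than aborting):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # lambda_1 = (5 + sqrt(5))/2 ~ 3.618
x = np.array([1.0, 1.0])
for k in range(600):
    x = A @ x                            # no normalization
print(x)                                 # [inf inf]: 3.618^600 ~ 1e335 overflows
```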
4. Convergence Rate Analysis
Convergence Rate of the Power Method
Eigenvector convergence: The error between $\mathbf{x}^{(k)}$ and the true eigenvector is
$$\|\mathbf{x}^{(k)} - (\pm \mathbf{v}_1 / \|\mathbf{v}_1\|)\| = O\!\left(\left|\frac{\lambda_2}{\lambda_1}\right|^k\right)$$

Eigenvalue convergence: For symmetric $A$, the error in the Rayleigh quotient $\mu^{(k)}$ is

$$|\mu^{(k)} - \lambda_1| = O\!\left(\left|\frac{\lambda_2}{\lambda_1}\right|^{2k}\right)$$

The Rayleigh quotient is second-order accurate in the eigenvector error (Section 7.1), so for symmetric matrices the eigenvalue converges twice as fast as the eigenvector; for nonsymmetric matrices the eigenvalue error is generally only $O(|\lambda_2/\lambda_1|^k)$.
The convergence ratio $r = |\lambda_2/\lambda_1|$ takes values between 0 and 1; the smaller $r$, the faster the convergence. The number of iterations required to reduce the relative eigenvalue error below $\varepsilon$ is
$$k \geq \frac{\log \varepsilon}{2 \log r}$$

Iteration Count Estimate
When $\lambda_1 = 10,\ \lambda_2 = 9$, we have $r = 0.9$. To achieve 6-digit accuracy ($\varepsilon = 10^{-6}$),
$$k \geq \frac{-6}{2 \log_{10} 0.9} = \frac{6}{2 \times 0.0458} \approx 66 \text{ iterations}$$

When $|\lambda_2 / \lambda_1|$ is close to 1, many iterations are required.
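The same estimate in code (a sketch; natural logarithms give the same ratio as base-10 logarithms):

```python
import numpy as np

r, eps = 0.9, 1e-6                              # ratio |lambda_2/lambda_1|, target accuracy
k = int(np.ceil(np.log(eps) / (2 * np.log(r))))
print(k)                                        # 66
```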
5. Inverse Power Method (Inverse Iteration)
5.1 Principle
If $A$ has eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_N$, then $A^{-1}$ has eigenvalues $1/\lambda_1, 1/\lambda_2, \ldots, 1/\lambda_N$ with the same eigenvectors. Applying the power method to $A^{-1}$ therefore converges to the eigenvector corresponding to the largest $|1/\lambda_i|$, i.e., the smallest $|\lambda_i|$.
Inverse Iteration
Apply the power method to $A^{-1}$, i.e., iterate $\mathbf{y}^{(k)} = A^{-1} \mathbf{x}^{(k-1)}$. In practice, instead of computing the inverse explicitly, solve the linear system $A\mathbf{y}^{(k)} = \mathbf{x}^{(k-1)}$.
5.2 Algorithm
Inverse Iteration Algorithm
Preprocessing: Compute $A = LU$ (LU decomposition, once: $O(N^3)$)
- For $k = 1, 2, \ldots$:
- Solve $L\mathbf{z} = \mathbf{x}^{(k-1)}$ by forward substitution ($O(N^2)$)
- Solve $U\mathbf{y}^{(k)} = \mathbf{z}$ by back substitution ($O(N^2)$)
- $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$
- $\mu^{(k)} = (\mathbf{x}^{(k)})^T A \mathbf{x}^{(k)}$ (Rayleigh quotient)
Output: Eigenvalue of smallest absolute value $\mu^{(k)}$ and corresponding eigenvector $\mathbf{x}^{(k)}$
Computational Cost
The LU decomposition is performed once at $O(N^3)$ cost, and each iteration requires only $O(N^2)$ for forward and back substitution. This achieves the same per-iteration cost as the power method while computing the eigenvalue of smallest absolute value.
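A sketch of this scheme using SciPy's LU routines: `lu_factor` performs the one-time $O(N^3)$ factorization, and each `lu_solve` call does the $O(N^2)$ forward and back substitutions.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, eps=1e-10, k_max=1000):
    """Smallest-|eigenvalue| eigenpair via the power method on A^{-1} (a sketch)."""
    N = A.shape[0]
    lu, piv = lu_factor(A)               # LU decomposition, once: O(N^3)
    x = np.ones(N) / np.sqrt(N)
    mu_old = np.inf
    for k in range(1, k_max + 1):
        y = lu_solve((lu, piv), x)       # solve A y = x: O(N^2)
        x = y / np.linalg.norm(y)
        mu = x @ A @ x                   # Rayleigh quotient (eigenvalue of A itself)
        if abs(mu - mu_old) < eps:
            break
        mu_old = mu
    return mu, x
```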
6. Shifted Inverse Power Method
6.1 Introducing the Shift
Introduce a target shift value $\sigma$ and apply the power method to $(A - \sigma I)^{-1}$. Since the eigenvalues of $(A - \sigma I)$ are $\lambda_i - \sigma$, the eigenvalues of $(A - \sigma I)^{-1}$ are
$$\frac{1}{\lambda_i - \sigma}$$

For the eigenvalue $\lambda_j$ closest to $\sigma$, $|1/(\lambda_j - \sigma)|$ is the largest, so the power method converges to the eigenvector of $\lambda_j$.
Shifted Inverse Iteration
Finds the eigenvalue closest to the shift $\sigma$ and its eigenvector. At each iteration, solve the linear system
$$(A - \sigma I)\mathbf{y}^{(k)} = \mathbf{x}^{(k-1)}$$

and normalize: $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$.
6.2 Algorithm
Shifted Inverse Iteration Algorithm
Input: $A$, shift $\sigma$, $\mathbf{x}^{(0)}$ ($\|\mathbf{x}^{(0)}\| = 1$), $\varepsilon$
Preprocessing: Compute $A - \sigma I = LU$ (LU decomposition, once)
- For $k = 1, 2, \ldots$:
- Solve $(A - \sigma I)\mathbf{y}^{(k)} = \mathbf{x}^{(k-1)}$ by forward/back substitution
- $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$
- $\mu^{(k)} = (\mathbf{x}^{(k)})^T A \mathbf{x}^{(k)}$
- If $|\mu^{(k)} - \mu^{(k-1)}| < \varepsilon$, terminate
Output: Eigenvalue closest to $\sigma$: $\mu^{(k)}$, and eigenvector $\mathbf{x}^{(k)}$
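A minimal sketch, reusing the LU-based pattern above; the factorization of $A - \sigma I$ is done once outside the loop.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_iteration(A, sigma, eps=1e-10, k_max=1000):
    """Eigenpair of A closest to the shift sigma (a sketch)."""
    N = A.shape[0]
    lu, piv = lu_factor(A - sigma * np.eye(N))   # factor A - sigma*I once
    x = np.ones(N) / np.sqrt(N)
    mu_old = np.inf
    for k in range(1, k_max + 1):
        y = lu_solve((lu, piv), x)               # solve (A - sigma I) y = x
        x = y / np.linalg.norm(y)
        mu = x @ A @ x                           # Rayleigh quotient
        if abs(mu - mu_old) < eps:
            break
        mu_old = mu
    return mu, x

# e.g., the 2x2 matrix used in this chapter's examples:
A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(shifted_inverse_iteration(A, sigma=1.3)[0])   # ~1.381966, the eigenvalue nearest 1.3
```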
6.3 Convergence Rate
Convergence Rate of Shifted Inverse Iteration
Let $\lambda_j$ be the eigenvalue closest to $\sigma$, and $\lambda_{\text{next}}$ the second closest. The eigenvector convergence ratio is
$$r = \left|\frac{\lambda_j - \sigma}{\lambda_{\text{next}} - \sigma}\right|$$

The closer $\sigma$ is to $\lambda_j$, the smaller $r$ becomes, accelerating convergence.
Choosing the Shift
- Use the Gershgorin circle theorem to estimate the approximate range of eigenvalues
- Use approximate eigenvalues obtained during the QR algorithm
- Use diagonal elements after tridiagonalization (Householder transformation)
In LAPACK, the standard strategy is to compute eigenvalues with the QR algorithm and then compute eigenvectors using shifted inverse iteration.
Shift Estimation via the Gershgorin Circle Theorem
For $A = \begin{pmatrix}2&1\\1&3\end{pmatrix}$, the Gershgorin discs are:
- Row 1: center $2$, radius $|1| = 1$ → disc $[1, 3]$
- Row 2: center $3$, radius $|1| = 1$ → disc $[2, 4]$
All eigenvalues lie in $[1, 4]$. To target the second eigenvalue, choose $\sigma = 1.3$ (near the left end); to target the first, choose $\sigma = 3.5$ (near the right end).
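The disc computation is a one-liner per row: the center is the diagonal entry and the radius is the absolute off-diagonal row sum. A small sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # off-diagonal row sums
for c, r in zip(centers, radii):
    print(f"disc: center {c}, radius {r}")            # [1, 3] and [2, 4]
```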
7. Rayleigh Quotient Iteration
7.1 Properties of the Rayleigh Quotient
Rayleigh Quotient
$$R(\mathbf{x}) = \frac{\mathbf{x}^T A \mathbf{x}}{\mathbf{x}^T \mathbf{x}}$$

When $\mathbf{x}$ coincides with an eigenvector $\mathbf{v}_j$, $R(\mathbf{v}_j) = \lambda_j$.
Second-Order Approximation Property of the Rayleigh Quotient
When $\mathbf{x} = \mathbf{v}_j + \boldsymbol{\varepsilon}$ with $A$ symmetric,

$$R(\mathbf{x}) = \lambda_j + O(\|\boldsymbol{\varepsilon}\|^2)$$

That is, the Rayleigh quotient has second-order accuracy with respect to the eigenvector error. (For nonsymmetric matrices the one-sided Rayleigh quotient is only first-order accurate, which is why the cubic convergence in Section 7.2 is specific to the symmetric case.)
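As a quick sketch of why this holds: take $\|\mathbf{v}_j\| = 1$ and assume $\boldsymbol{\varepsilon} \perp \mathbf{v}_j$ (no loss of generality, since $R$ is scale-invariant). By symmetry the cross terms vanish, $\mathbf{v}_j^T A \boldsymbol{\varepsilon} = (A\mathbf{v}_j)^T \boldsymbol{\varepsilon} = \lambda_j \mathbf{v}_j^T \boldsymbol{\varepsilon} = 0$, so

$$R(\mathbf{v}_j + \boldsymbol{\varepsilon}) = \frac{\lambda_j + \boldsymbol{\varepsilon}^T A \boldsymbol{\varepsilon}}{1 + \|\boldsymbol{\varepsilon}\|^2} = \lambda_j + O(\|\boldsymbol{\varepsilon}\|^2)$$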
7.2 Algorithm
Update the shift $\sigma$ in shifted inverse iteration at every step using the Rayleigh quotient.
Rayleigh Quotient Iteration
- For $k = 1, 2, \ldots$:
- $\sigma^{(k)} = R(\mathbf{x}^{(k-1)}) = (\mathbf{x}^{(k-1)})^T A \mathbf{x}^{(k-1)}$ (update the shift)
- Solve $(A - \sigma^{(k)} I)\mathbf{y}^{(k)} = \mathbf{x}^{(k-1)}$ (since $\sigma^{(k)}$ changes each iteration, the LU factorization must be recomputed)
- $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$
- If $|\sigma^{(k)} - \sigma^{(k-1)}| < \varepsilon$, terminate
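A minimal dense-matrix sketch. Note that near convergence $A - \sigma^{(k)} I$ is nearly singular, which is expected; the very large solution vector it produces is harmless after normalization.

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, eps=1e-14, k_max=50):
    """Rayleigh quotient iteration (a sketch; dense refactorization each step)."""
    N = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    sigma_old = np.inf
    for k in range(1, k_max + 1):
        sigma = x @ A @ x                               # shift = Rayleigh quotient
        y = np.linalg.solve(A - sigma * np.eye(N), x)   # new factorization: O(N^3)
        x = y / np.linalg.norm(y)                       # near-singular solve gives a
        if abs(sigma - sigma_old) < eps:                # huge y; normalization fixes it
            break
        sigma_old = sigma
    return sigma, x
```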
Convergence Rate of Rayleigh Quotient Iteration
- General matrices: quadratic convergence
- Symmetric matrices: cubic convergence
For symmetric matrices, machine precision is typically reached in 3–4 iterations.
Advantages and Disadvantages
- Advantage: Extremely fast convergence (cubic for symmetric matrices)
- Disadvantage: Since the shift changes at each step, the LU decomposition must be recomputed every iteration ($O(N^3)$/iteration)
- Disadvantage: Which eigenvalue it converges to depends on the initial vector and cannot be predicted in advance
In practice, it is effective to perform a few steps of shifted inverse iteration to obtain an approximate eigenvector, then switch to Rayleigh quotient iteration.
8. Numerical Examples
8.1 Power Method
Example 1: Power Method on a 2×2 Matrix
Consider the matrix
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$$

Its eigenvalues are $\lambda_1 = \frac{5+\sqrt{5}}{2} \approx 3.618$ and $\lambda_2 = \frac{5-\sqrt{5}}{2} \approx 1.382$, giving a convergence ratio of $|\lambda_2/\lambda_1| \approx 0.382$.
Starting with $\mathbf{x}^{(0)} = (1/\sqrt{2},\ 1/\sqrt{2})^T$, the iteration produces:
| $k$ | $\mathbf{x}^{(k)}$ | $\mu^{(k)}$ | $|\mu^{(k)} - \lambda_1|$ |
|---|---|---|---|
| 0 | $(0.7071,\ 0.7071)$ | $3.5000$ | $1.2 \times 10^{-1}$ |
| 1 | $(0.6000,\ 0.8000)$ | $3.6000$ | $1.8 \times 10^{-2}$ |
| 2 | $(0.5547,\ 0.8321)$ | $3.6154$ | $2.7 \times 10^{-3}$ |
| 3 | $(0.5369,\ 0.8436)$ | $3.6176$ | $3.8 \times 10^{-4}$ |
| 4 | $(0.5300,\ 0.8480)$ | $3.6180$ | $5.5 \times 10^{-5}$ |
The error in $\mu^{(k)}$ decreases by a factor of approximately $0.382^2 \approx 0.146$ per iteration, confirming the theoretical convergence rate.
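This table can be reproduced in a few lines (a sketch, printing the same quantities per iteration):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam1 = (5 + np.sqrt(5)) / 2                 # exact lambda_1
x = np.array([1.0, 1.0]) / np.sqrt(2.0)     # x^(0)
for k in range(5):
    mu = x @ A @ x                          # Rayleigh quotient mu^(k)
    print(k, np.round(x, 4), round(mu, 4), f"{abs(mu - lam1):.1e}")
    y = A @ x                               # one power-method step
    x = y / np.linalg.norm(y)
```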
8.2 Shifted Inverse Iteration
Example 2: Shifted Inverse Iteration
For the same matrix $A$, with shift $\sigma = 1.3$, we target the second eigenvalue $\lambda_2 \approx 1.382$.
Since $\sigma = 1.3$ is close to $\lambda_2$, the convergence ratio is

$$r = \left|\frac{\lambda_2 - \sigma}{\lambda_1 - \sigma}\right| = \left|\frac{1.382 - 1.3}{3.618 - 1.3}\right| = \frac{0.082}{2.318} \approx 0.035$$

which is very small, so high accuracy is reached within 2–3 iterations. Compared with the power method's ratio $r \approx 0.382$ on the same matrix, a well-chosen shift accelerates convergence dramatically.
8.3 Rayleigh Quotient Iteration
Example 3: Cubic Convergence of Rayleigh Quotient Iteration
Applying Rayleigh quotient iteration to the same matrix $A$ with initial vector $\mathbf{x}^{(0)} = (1/\sqrt{2},\ 1/\sqrt{2})^T$:
| $k$ | $\sigma^{(k)}$ | $|\sigma^{(k)} - \lambda_1|$ |
|---|---|---|
| 0 | $3.5000$ | $1.2 \times 10^{-1}$ |
| 1 | $3.6176$ | $3.9 \times 10^{-4}$ |
| 2 | $3.618033988738$ | $1.2 \times 10^{-11}$ |
| 3 | $3.618033988749895$ | $\approx 10^{-16}$ |
With cubic convergence, the number of correct digits roughly triples each iteration. Indeed, the correct digits progress as $1 \to 3 \to 11 \to 16$ (machine precision), reaching full accuracy in just 3 iterations.
9. Convergence Visualization
9.1 Power Method Convergence Graph
[Figure omitted: power method convergence graph]
9.2 Comparison of Three Methods
[Figure omitted: convergence comparison of the power method, shifted inverse iteration, and Rayleigh quotient iteration]
10. Implementation Notes
10.1 Choice of Initial Vector
Using a random initial vector virtually guarantees $c_1 \neq 0$ and is sufficient in practice. If prior information about the dominant eigenvector is available, it can be exploited to reduce the number of iterations.
10.2 Convergence Criteria
Either of the following can be used:
- Eigenvalue change: $|\mu^{(k)} - \mu^{(k-1)}| < \varepsilon$
- Residual: $\|A\mathbf{x}^{(k)} - \mu^{(k)}\mathbf{x}^{(k)}\| < \varepsilon$ (residual norm)
Residual-based criteria are more reliable. A fixed absolute tolerance is scale-dependent (too loose for small eigenvalues, too strict for large ones), so implementations often use the relative criterion, safeguarded by $\max(1, |\mu^{(k)}|)$ to avoid division by a near-zero eigenvalue:

$$\frac{|\mu^{(k)} - \mu^{(k-1)}|}{\max(1,\, |\mu^{(k)}|)} < \varepsilon$$

10.3 When the Power Method Fails
Non-convergent Cases
- $|\lambda_1| = |\lambda_2|$: For example, $\lambda_1 = -\lambda_2$ (opposite signs) or $\lambda_1 = \overline{\lambda_2}$ (complex conjugates) cause the power method to oscillate without converging.
- $A$ is not diagonalizable: When Jordan blocks are present, convergence is not guaranteed.
In these cases, use the QR algorithm (Chapter 5) or the Lanczos method.
10.4 Computational Cost Comparison
| Method | Preprocessing | Per Iteration | Convergence Rate |
|---|---|---|---|
| Power Method | None | $O(N^2)$ | Linear $|\lambda_2/\lambda_1|^k$ |
| Inverse Iteration | LU decomp. $O(N^3)$ | $O(N^2)$ | Linear $|\lambda_N/\lambda_{N-1}|^k$ |
| Shifted Inverse Iter. | LU decomp. $O(N^3)$ | $O(N^2)$ | Linear ($\sigma$-dependent) |
| Rayleigh Quotient Iter. | None | $O(N^3)$ | Quadratic / Cubic (symmetric) |
10.5 Relationship to Other Methods
- QR Algorithm: Computes all eigenvalues simultaneously. Can be viewed as an implicit form of simultaneous (subspace) iteration, i.e., the power method applied to a full set of vectors at once.
- Lanczos Method: Eigenvalue computation for large sparse matrices. An extension of the power method to Krylov subspaces.
- LAPACK Strategy: Compute eigenvalues with the QR algorithm, then compute eigenvectors using shifted inverse iteration.
11. Summary
- The power method is the most fundamental technique for finding the eigenvalue of largest absolute value. Its convergence rate is $O(|\lambda_2/\lambda_1|^k)$, and convergence is slow when $|\lambda_2/\lambda_1|$ is close to 1.
- Inverse iteration applies the power method to $A^{-1}$ to find the eigenvalue of smallest absolute value.
- Shifted inverse iteration applies the power method to $(A - \sigma I)^{-1}$ to find the eigenvalue closest to $\sigma$. This is the standard LAPACK strategy for computing eigenvectors.
- Rayleigh quotient iteration dynamically updates the shift using the Rayleigh quotient, achieving cubic convergence for symmetric matrices.
References
- G. H. Golub, C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, 2013.
- L. N. Trefethen, D. Bau III, Numerical Linear Algebra, SIAM, 1997.
- B. N. Parlett, The Symmetric Eigenvalue Problem, SIAM, 1998.
- Y. Saad, Numerical Methods for Large Eigenvalue Problems, Revised ed., SIAM, 2011.
- Wikipedia: Power iteration