Chapter 23: Power Method and Inverse Iteration

Iterative Eigenvalue Algorithms

1. Introduction and Motivation

There are two broad approaches to numerically solving the eigenvalue problem $A\mathbf{v} = \lambda \mathbf{v}$ for an $N \times N$ matrix $A$.

  1. Methods that compute all eigenvalues simultaneously: the QR algorithm (Chapter 5)
  2. Iterative methods that selectively compute specific eigenvalues: the power method and its variants

This chapter covers approach (2) in detail. The power method is the most fundamental technique for finding the eigenvalue of largest absolute value, and inverse iteration is a variant that can efficiently find the eigenvalue closest to any target value.

Historical Note

The power method was systematized by Richard von Mises in 1929. It is also known as power iteration or von Mises iteration.

2. Power Method

2.1 Prerequisites

Prerequisites for the Power Method

  1. $A$ is diagonalizable (the eigenvectors form a basis for $\mathbb{R}^N$ or $\mathbb{C}^N$).
  2. When the eigenvalues are ordered by absolute value, the dominant eigenvalue is unique: $$|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_N|$$
  3. When the initial vector $\mathbf{x}^{(0)}$ is expanded in the eigenvector basis $$\mathbf{x}^{(0)} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_N \mathbf{v}_N$$ the coefficient satisfies $c_1 \neq 0$.

Meaning of $c_1 \neq 0$

If the initial vector has no component in the $\mathbf{v}_1$ direction, the power method will not converge to $\lambda_1$. For symmetric matrices, the eigenvectors are orthogonal, so $c_1 = \mathbf{x}^{(0)} \cdot \mathbf{v}_1 / \|\mathbf{v}_1\|^2$, which is equivalent to $\mathbf{x}^{(0)} \cdot \mathbf{v}_1 \neq 0$. For general matrices, however, the eigenvectors are not orthogonal, so $c_1 \neq 0$ is the precise condition. In practice, choosing a random initial vector virtually guarantees $c_1 \neq 0$.

2.2 Algorithm

Power Method Algorithm

Input: $N \times N$ matrix $A$, initial vector $\mathbf{x}^{(0)}$ ($\|\mathbf{x}^{(0)}\| = 1$), convergence threshold $\varepsilon$, maximum iterations $k_{\max}$

  1. For $k = 1, 2, \ldots, k_{\max}$:
    1. $\mathbf{y}^{(k)} = A \mathbf{x}^{(k-1)}$ (matrix-vector product: $O(N^2)$)
    2. $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$ (normalization)
    3. $\mu^{(k)} = (\mathbf{x}^{(k)})^T A \mathbf{x}^{(k)}$ (Rayleigh quotient for eigenvalue approximation)
    4. If $|\mu^{(k)} - \mu^{(k-1)}| < \varepsilon$, terminate

Output: Eigenvalue approximation $\mu^{(k)}$, eigenvector approximation $\mathbf{x}^{(k)}$
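As a concrete reference, the algorithm above can be sketched in a few lines of NumPy. The function name `power_method` and the seeded random default initial vector are illustrative choices, not part of the text:

```python
import numpy as np

def power_method(A, x0=None, eps=1e-10, k_max=1000):
    """Power method: approximate the dominant eigenpair of A (sketch)."""
    N = A.shape[0]
    x = x0 if x0 is not None else np.random.default_rng(0).standard_normal(N)
    x = x / np.linalg.norm(x)
    mu_old = np.inf
    for _ in range(k_max):
        y = A @ x                      # matrix-vector product, O(N^2)
        x = y / np.linalg.norm(y)      # normalization
        mu = x @ A @ x                 # Rayleigh quotient (valid since ||x|| = 1)
        if abs(mu - mu_old) < eps:     # eigenvalue-change stopping criterion
            break
        mu_old = mu
    return mu, x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
mu, v = power_method(A)
# mu ≈ (5 + √5)/2 ≈ 3.6180
```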

3. Convergence Proof

3.1 Basic Property: $A^k \mathbf{v}_n = \lambda_n^k \mathbf{v}_n$

From the eigenvalue equation $A\mathbf{v}_n = \lambda_n \mathbf{v}_n$, the following holds by induction.

\begin{equation} A^k \mathbf{v}_n = \lambda_n^k \mathbf{v}_n \quad (k = 1, 2, 3, \ldots) \label{eq:Akvn} \end{equation}

Proof: By induction on $k$. The case $k = 1$ holds by definition. Assuming the result holds for $k-1$,

$$A^k \mathbf{v}_n = A(A^{k-1} \mathbf{v}_n) = A(\lambda_n^{k-1} \mathbf{v}_n) = \lambda_n^{k-1} (A\mathbf{v}_n) = \lambda_n^{k-1} (\lambda_n \mathbf{v}_n) = \lambda_n^k \mathbf{v}_n$$

3.2 Convergence via Eigenvector Expansion

Convergence Theorem for the Power Method

Suppose $A$ is diagonalizable with $|\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_N|$. If $c_1 \neq 0$ in the eigenvector expansion of the initial vector, then the sequence $\{\mathbf{x}^{(k)}\}$ generated by the normalized power method converges to the direction of the eigenvector corresponding to $\lambda_1$.

Proof: Expand the initial vector in the eigenvector basis.

$$\mathbf{x}^{(0)} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_N \mathbf{v}_N, \quad c_1 \neq 0$$

Applying $A^k$ and using equation $\eqref{eq:Akvn}$,

\begin{align} A^k \mathbf{x}^{(0)} &= c_1 \lambda_1^k \mathbf{v}_1 + c_2 \lambda_2^k \mathbf{v}_2 + \cdots + c_N \lambda_N^k \mathbf{v}_N \\ &= c_1 \lambda_1^k \left[ \mathbf{v}_1 + \frac{c_2}{c_1}\left(\frac{\lambda_2}{\lambda_1}\right)^k \mathbf{v}_2 + \cdots + \frac{c_N}{c_1}\left(\frac{\lambda_N}{\lambda_1}\right)^k \mathbf{v}_N \right] \label{eq:expansion} \end{align}

Since $|\lambda_1| > |\lambda_i|$ for $i \geq 2$, we have $|\lambda_i / \lambda_1| < 1$, so

$$\lim_{k \to \infty} \left(\frac{\lambda_i}{\lambda_1}\right)^k = 0 \quad (i = 2, 3, \ldots, N)$$

Therefore, as $k \to \infty$ the bracketed expression converges to $\mathbf{v}_1$, and after normalization,

$$\lim_{k \to \infty} \frac{A^k \mathbf{x}^{(0)}}{\|A^k \mathbf{x}^{(0)}\|} = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|}$$

$\square$

3.3 Eigenvalue Extraction

With the normalization $\|\mathbf{x}^{(k)}\| = 1$, the eigenvalue can be approximated via the Rayleigh quotient.

$$\mu^{(k)} = (\mathbf{x}^{(k)})^T A \mathbf{x}^{(k)}$$

Since $\mathbf{x}^{(k)} \to \mathbf{v}_1 / \|\mathbf{v}_1\|$, we have $A\mathbf{x}^{(k)} \to \lambda_1 \mathbf{x}^{(k)}$; because $\|\mathbf{x}^{(k)}\| = 1$ at every step,

$$\lim_{k \to \infty} \mu^{(k)} = \lambda_1 \lim_{k \to \infty} \|\mathbf{x}^{(k)}\|^2 = \lambda_1$$

3.4 Necessity of Normalization

If $|\lambda_1| \neq 1$, iterating without normalization causes overflow ($|\lambda_1| > 1$) or underflow ($|\lambda_1| < 1$) due to the factor $\lambda_1^k$. Normalization at each step prevents this.

4. Convergence Rate Analysis

Convergence Rate of the Power Method

Eigenvector convergence: The error between $\mathbf{x}^{(k)}$ and the true eigenvector is

$$\|\mathbf{x}^{(k)} - (\pm \mathbf{v}_1 / \|\mathbf{v}_1\|)\| = O\!\left(\left|\frac{\lambda_2}{\lambda_1}\right|^k\right)$$

Eigenvalue convergence: For symmetric $A$, the error in the Rayleigh quotient $\mu^{(k)}$ is

$$|\mu^{(k)} - \lambda_1| = O\!\left(\left|\frac{\lambda_2}{\lambda_1}\right|^{2k}\right)$$

The Rayleigh quotient has second-order accuracy with respect to the eigenvector error, so for symmetric matrices the eigenvalue converges at twice the rate (in correct digits per iteration) of the eigenvector. For non-symmetric matrices, the eigenvalue error is in general only $O(|\lambda_2/\lambda_1|^k)$.

The convergence ratio $r = |\lambda_2/\lambda_1|$ takes values between 0 and 1; the smaller $r$, the faster the convergence. The number of iterations required to reduce the relative eigenvalue error below $\varepsilon$ is

$$k \geq \frac{\log \varepsilon}{2 \log r}$$

Iteration Count Estimate

When $\lambda_1 = 10,\ \lambda_2 = 9$, we have $r = 0.9$. To achieve 6-digit accuracy ($\varepsilon = 10^{-6}$),

$$k \geq \frac{-6}{2 \log_{10} 0.9} = \frac{6}{2 \times 0.0458} \approx 66 \text{ iterations}$$

When $|\lambda_2 / \lambda_1|$ is close to 1, many iterations are required.
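The iteration-count estimate above is a one-line computation; a quick check of the arithmetic:

```python
import math

r, eps = 0.9, 1e-6                       # convergence ratio and target error
k = math.log(eps) / (2 * math.log(r))    # k >= log(eps) / (2 log r)
# k ≈ 65.6, i.e. about 66 iterations
```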

5. Inverse Power Method (Inverse Iteration)

5.1 Principle

If a nonsingular $A$ has eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_N$, then $A^{-1}$ has eigenvalues $1/\lambda_1, 1/\lambda_2, \ldots, 1/\lambda_N$ with the same eigenvectors. Applying the power method to $A^{-1}$ therefore converges to the eigenvector corresponding to the largest $|1/\lambda_i|$, i.e., the smallest $|\lambda_i|$.

Inverse Iteration

Apply the power method to $A^{-1}$, i.e., iterate $\mathbf{y}^{(k)} = A^{-1} \mathbf{x}^{(k-1)}$. In practice, instead of computing the inverse explicitly, solve the linear system $A\mathbf{y}^{(k)} = \mathbf{x}^{(k-1)}$.

5.2 Algorithm

Inverse Iteration Algorithm

Preprocessing: Compute $A = LU$ (LU decomposition, once: $O(N^3)$)

  1. For $k = 1, 2, \ldots$:
    1. Solve $L\mathbf{z} = \mathbf{x}^{(k-1)}$ by forward substitution ($O(N^2)$)
    2. Solve $U\mathbf{y}^{(k)} = \mathbf{z}$ by back substitution ($O(N^2)$)
    3. $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$
    4. $\mu^{(k)} = (\mathbf{x}^{(k)})^T A \mathbf{x}^{(k)}$ (Rayleigh quotient)

Output: Eigenvalue of smallest absolute value $\mu^{(k)}$ and corresponding eigenvector $\mathbf{x}^{(k)}$

Computational Cost

The LU decomposition is performed once at $O(N^3)$ cost, and each iteration requires only $O(N^2)$ for forward and back substitution. This achieves the same per-iteration cost as the power method while computing the eigenvalue of smallest absolute value.
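A sketch of the algorithm, assuming SciPy is available for the reusable LU factorization (`scipy.linalg.lu_factor` / `lu_solve`); the fixed initial vector is an illustrative choice:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, eps=1e-10, k_max=100):
    """Inverse iteration: approximate the eigenpair of smallest |lambda| (sketch)."""
    N = A.shape[0]
    lu, piv = lu_factor(A)             # one-time O(N^3) LU factorization
    x = np.ones(N) / np.sqrt(N)
    mu_old = np.inf
    for _ in range(k_max):
        y = lu_solve((lu, piv), x)     # forward/back substitution, O(N^2)
        x = y / np.linalg.norm(y)
        mu = x @ A @ x                 # Rayleigh quotient of A itself, not A^-1
        if abs(mu - mu_old) < eps:
            break
        mu_old = mu
    return mu, x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
mu, v = inverse_iteration(A)
# mu ≈ (5 − √5)/2 ≈ 1.3820
```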

6. Shifted Inverse Power Method

6.1 Introducing the Shift

Introduce a target shift value $\sigma$ and apply the power method to $(A - \sigma I)^{-1}$. Since the eigenvalues of $(A - \sigma I)$ are $\lambda_i - \sigma$, the eigenvalues of $(A - \sigma I)^{-1}$ are

$$\frac{1}{\lambda_i - \sigma}$$

For the eigenvalue $\lambda_j$ closest to $\sigma$, $|1/(\lambda_j - \sigma)|$ is the largest, so the power method converges to the eigenvector of $\lambda_j$.

Shifted Inverse Iteration

Finds the eigenvalue closest to the shift $\sigma$ and its eigenvector. At each iteration, solve the linear system

$$(A - \sigma I)\mathbf{y}^{(k)} = \mathbf{x}^{(k-1)}$$

and normalize $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$.

6.2 Algorithm

Shifted Inverse Iteration Algorithm

Input: $A$, shift $\sigma$, $\mathbf{x}^{(0)}$ ($\|\mathbf{x}^{(0)}\| = 1$), $\varepsilon$

Preprocessing: Compute $A - \sigma I = LU$ (LU decomposition, once)

  1. For $k = 1, 2, \ldots$:
    1. Solve $(A - \sigma I)\mathbf{y}^{(k)} = \mathbf{x}^{(k-1)}$ by forward/back substitution
    2. $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$
    3. $\mu^{(k)} = (\mathbf{x}^{(k)})^T A \mathbf{x}^{(k)}$
    4. If $|\mu^{(k)} - \mu^{(k-1)}| < \varepsilon$, terminate

Output: Eigenvalue closest to $\sigma$: $\mu^{(k)}$, and eigenvector $\mathbf{x}^{(k)}$
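A sketch of the shifted variant under the same assumptions (SciPy for the LU factorization); only the factored matrix changes from plain inverse iteration:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_iteration(A, sigma, eps=1e-12, k_max=100):
    """Approximate the eigenpair of A whose eigenvalue is closest to sigma (sketch)."""
    N = A.shape[0]
    lu, piv = lu_factor(A - sigma * np.eye(N))   # one-time LU of A - sigma*I
    x = np.ones(N) / np.sqrt(N)
    mu_old = np.inf
    for _ in range(k_max):
        y = lu_solve((lu, piv), x)               # O(N^2) per iteration
        x = y / np.linalg.norm(y)
        mu = x @ A @ x                           # Rayleigh quotient of A
        if abs(mu - mu_old) < eps:
            break
        mu_old = mu
    return mu, x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
mu, v = shifted_inverse_iteration(A, sigma=1.3)
# mu ≈ 1.3820, the eigenvalue closest to sigma = 1.3
```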

6.3 Convergence Rate

Convergence Rate of Shifted Inverse Iteration

Let $\lambda_j$ be the eigenvalue closest to $\sigma$, and $\lambda_{\text{next}}$ the second closest. The eigenvector convergence ratio is

$$r = \left|\frac{\lambda_j - \sigma}{\lambda_{\text{next}} - \sigma}\right|$$

The closer $\sigma$ is to $\lambda_j$, the smaller $r$ becomes, accelerating convergence.

Choosing the Shift

  • Use the Gershgorin circle theorem to estimate the approximate range of eigenvalues
  • Use approximate eigenvalues obtained during the QR algorithm
  • Use diagonal elements after tridiagonalization (Householder transformation)

In LAPACK, the standard strategy is to compute eigenvalues with the QR algorithm and then compute eigenvectors using shifted inverse iteration.

Shift Estimation via the Gershgorin Circle Theorem

For $A = \begin{pmatrix}2&1\\1&3\end{pmatrix}$, the Gershgorin discs are:

  • Row 1: center $2$, radius $|1| = 1$ → disc $[1, 3]$
  • Row 2: center $3$, radius $|1| = 1$ → disc $[2, 4]$

All eigenvalues lie in $[1, 4]$. To target the second eigenvalue, choose $\sigma = 1.3$ (near the left end); to target the first, choose $\sigma = 3.5$ (near the right end).
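The disc computation above generalizes directly to any matrix; a small helper (the name `gershgorin_discs` is illustrative):

```python
import numpy as np

def gershgorin_discs(A):
    """Return (center, radius) for each row's Gershgorin disc."""
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)   # off-diagonal row sums
    return [(float(c), float(r)) for c, r in zip(centers, radii)]

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(gershgorin_discs(A))   # → [(2.0, 1.0), (3.0, 1.0)]
```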

7. Rayleigh Quotient Iteration

7.1 Properties of the Rayleigh Quotient

Rayleigh Quotient

$$R(\mathbf{x}) = \frac{\mathbf{x}^T A \mathbf{x}}{\mathbf{x}^T \mathbf{x}}$$

When $\mathbf{x}$ coincides with an eigenvector $\mathbf{v}_j$, $R(\mathbf{v}_j) = \lambda_j$.

Second-Order Approximation Property of the Rayleigh Quotient

When $\mathbf{x} = \mathbf{v}_j + \boldsymbol{\varepsilon}$,

$$R(\mathbf{x}) = \lambda_j + O(\|\boldsymbol{\varepsilon}\|^2)$$

That is, the Rayleigh quotient has second-order accuracy with respect to the eigenvector error.

7.2 Algorithm

Update the shift $\sigma$ in shifted inverse iteration at every step using the Rayleigh quotient.

Rayleigh Quotient Iteration

  1. For $k = 1, 2, \ldots$:
    1. $\sigma^{(k)} = R(\mathbf{x}^{(k-1)}) = (\mathbf{x}^{(k-1)})^T A \mathbf{x}^{(k-1)}$ (update the shift)
    2. Solve $(A - \sigma^{(k)} I)\mathbf{y}^{(k)} = \mathbf{x}^{(k-1)}$ (since $\sigma^{(k)}$ changes each iteration, the LU factorization must be recomputed)
    3. $\mathbf{x}^{(k)} = \mathbf{y}^{(k)} / \|\mathbf{y}^{(k)}\|$
    4. If $|\sigma^{(k)} - \sigma^{(k-1)}| < \varepsilon$, terminate
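A NumPy sketch of the iteration above. Since the shift changes every step, the linear system is solved from scratch ($O(N^3)$) each time; treating a numerically singular $A - \sigma I$ as a termination signal is an implementation choice, not from the text:

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, eps=1e-12, k_max=50):
    """Rayleigh quotient iteration: shift updated from the current iterate (sketch)."""
    N = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    sigma_old = np.inf
    for _ in range(k_max):
        sigma = x @ A @ x                            # Rayleigh-quotient shift
        if abs(sigma - sigma_old) < eps:
            break
        try:
            # Fresh O(N^3) solve each step because the shift changed
            y = np.linalg.solve(A - sigma * np.eye(N), x)
        except np.linalg.LinAlgError:
            break   # A - sigma*I numerically singular: sigma has converged
        x = y / np.linalg.norm(y)
        sigma_old = sigma
    return sigma, x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
x0 = np.array([1.0, 1.0]) / np.sqrt(2)
sigma, v = rayleigh_quotient_iteration(A, x0)
# sigma ≈ 3.618033988749895 within a few iterations
```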

Convergence Rate of Rayleigh Quotient Iteration

  • General matrices: quadratic convergence
  • Symmetric matrices: cubic convergence

For symmetric matrices, machine precision is typically reached in 3–4 iterations.

Advantages and Disadvantages

  • Advantage: Extremely fast convergence (cubic for symmetric matrices)
  • Disadvantage: Since the shift changes at each step, the LU decomposition must be recomputed every iteration ($O(N^3)$/iteration)
  • Disadvantage: Which eigenvalue it converges to depends on the initial vector and cannot be predicted in advance

In practice, it is effective to perform a few steps of shifted inverse iteration to obtain an approximate eigenvector, then switch to Rayleigh quotient iteration.

8. Numerical Examples

8.1 Power Method

Example 1: Power Method on a 2×2 Matrix

Consider the matrix

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$$

Its eigenvalues are $\lambda_1 = \frac{5+\sqrt{5}}{2} \approx 3.618$ and $\lambda_2 = \frac{5-\sqrt{5}}{2} \approx 1.382$, giving a convergence ratio of $|\lambda_2/\lambda_1| \approx 0.382$.

Starting with $\mathbf{x}^{(0)} = (1/\sqrt{2},\ 1/\sqrt{2})^T$, the iteration produces:

| $k$ | $\mathbf{x}^{(k)}$ | $\mu^{(k)}$ | $\lvert\mu^{(k)} - \lambda_1\rvert$ |
|---|---|---|---|
| 0 | $(0.7071,\ 0.7071)$ | $3.5000$ | $1.2 \times 10^{-1}$ |
| 1 | $(0.6000,\ 0.8000)$ | $3.6000$ | $1.8 \times 10^{-2}$ |
| 2 | $(0.5547,\ 0.8321)$ | $3.6154$ | $2.7 \times 10^{-3}$ |
| 3 | $(0.5369,\ 0.8436)$ | $3.6176$ | $3.8 \times 10^{-4}$ |
| 4 | $(0.5300,\ 0.8480)$ | $3.6180$ | $5.5 \times 10^{-5}$ |

The error in $\mu^{(k)}$ decreases by a factor of approximately $0.382^2 \approx 0.146$ per iteration, confirming the theoretical convergence rate.

8.2 Shifted Inverse Iteration

Example 2: Shifted Inverse Iteration

For the same matrix $A$, with shift $\sigma = 1.3$, we target the second eigenvalue $\lambda_2 \approx 1.382$.

Since $\sigma = 1.3$ is close to $\lambda_2$, the convergence ratio is

$$r = \left|\frac{\lambda_2 - \sigma}{\lambda_1 - \sigma}\right| = \left|\frac{1.382 - 1.3}{3.618 - 1.3}\right| = \frac{0.082}{2.318} \approx 0.035$$

which is very small, so convergence occurs in 2–3 iterations to high accuracy. Compared to the power method's $r = 0.382$, the effect of shift selection on convergence acceleration is dramatic.

8.3 Rayleigh Quotient Iteration

Example 3: Cubic Convergence of Rayleigh Quotient Iteration

Applying Rayleigh quotient iteration to the same matrix $A$ with initial vector $\mathbf{x}^{(0)} = (1/\sqrt{2},\ 1/\sqrt{2})^T$:

| $k$ | $\sigma^{(k)}$ | $\lvert\sigma^{(k)} - \lambda_1\rvert$ |
|---|---|---|
| 0 | $3.5000$ | $1.2 \times 10^{-1}$ |
| 1 | $3.6176$ | $3.9 \times 10^{-4}$ |
| 2 | $3.618033988738$ | $1.2 \times 10^{-11}$ |
| 3 | $3.618033988749895$ | $\approx 10^{-16}$ |

With cubic convergence, the number of correct digits roughly triples each iteration. Indeed, the correct digits progress as $1 \to 3 \to 11 \to 16$ (machine precision), reaching full accuracy in just 3 iterations.

9. Convergence Visualization

9.1 Power Method Convergence Graph

[Figure: semilog plot of the Rayleigh quotient error $\log_{10}\lvert\mu^{(k)} - \lambda_1\rvert$ versus iteration $k$ for the power method, showing the computed error alongside the theoretical $\lvert\lambda_2/\lambda_1\rvert^{2k}$ line]
Figure 1: Convergence of the power method for $A = \begin{pmatrix}2&1\\1&3\end{pmatrix}$. The Rayleigh quotient error decreases by a factor of $|\lambda_2/\lambda_1|^2 \approx 0.146$ per iteration.

9.2 Comparison of Three Methods

[Figure: semilog plot of $\log_{10}\lvert\text{error}\rvert$ versus iteration $k$, comparing the power method, shifted inverse iteration, and Rayleigh quotient iteration]
Figure 2: Convergence rate comparison of three methods ($\mathbf{x}_0 = (1,1)^T/\sqrt{2}$, shift $\sigma = 3.5$). The power method (blue) shows linear convergence, shifted inverse iteration (green) shows fast linear convergence, and Rayleigh quotient iteration (red) shows cubic convergence. All three start from the same initial vector; at k=1, shifted inverse iteration and Rayleigh quotient iteration reach the same point (since the initial shift is identical).

10. Implementation Notes

10.1 Choice of Initial Vector

Using a random vector as the initial value virtually guarantees $c_1 \neq 0$. In practice, a random initial vector is sufficient, but if prior information is available, it can be exploited to reduce the number of iterations.

10.2 Convergence Criteria

Either of the following can be used:

  • Eigenvalue change: $|\mu^{(k)} - \mu^{(k-1)}| < \varepsilon$
  • Residual: $\|A\mathbf{x}^{(k)} - \mu^{(k)}\mathbf{x}^{(k)}\| < \varepsilon$ (residual norm)

Residual-based criteria are more reliable. An absolute tolerance on the eigenvalue change is too strict when $|\mu^{(k)}|$ is large, so implementations often use the scaled criterion

$$\frac{|\mu^{(k)} - \mu^{(k-1)}|}{\max(1,\, |\mu^{(k)}|)} < \varepsilon$$

10.3 When the Power Method Fails

Non-convergent Cases

  • $|\lambda_1| = |\lambda_2|$: For example, $\lambda_1 = -\lambda_2$ (opposite signs) or $\lambda_1 = \overline{\lambda_2}$ (complex conjugates) cause the power method to oscillate without converging.
  • $A$ is not diagonalizable: When Jordan blocks are present, convergence is not guaranteed.

In these cases, use the QR algorithm (Chapter 5) or the Lanczos method.

10.4 Computational Cost Comparison

| Method | Preprocessing | Per Iteration | Convergence Rate |
|---|---|---|---|
| Power method | None | $O(N^2)$ | Linear, $\lvert\lambda_2/\lambda_1\rvert^k$ |
| Inverse iteration | LU decomp. $O(N^3)$ | $O(N^2)$ | Linear, $\lvert\lambda_N/\lambda_{N-1}\rvert^k$ |
| Shifted inverse iteration | LU decomp. $O(N^3)$ | $O(N^2)$ | Linear ($\sigma$-dependent) |
| Rayleigh quotient iteration | None | $O(N^3)$ | Quadratic; cubic for symmetric $A$ |

10.5 Relationship to Other Methods

  • QR Algorithm: Computes all eigenvalues simultaneously. Can be viewed as an implicit parallel execution of the power method.
  • Lanczos Method: Eigenvalue computation for large sparse matrices. An extension of the power method to Krylov subspaces.
  • LAPACK Strategy: Compute eigenvalues with the QR algorithm, then compute eigenvectors using shifted inverse iteration.

11. Summary

  • The power method is the most fundamental technique for finding the eigenvalue of largest absolute value. Its convergence rate is $O(|\lambda_2/\lambda_1|^k)$, and convergence is slow when $|\lambda_2/\lambda_1|$ is close to 1.
  • Inverse iteration applies the power method to $A^{-1}$ to find the eigenvalue of smallest absolute value.
  • Shifted inverse iteration applies the power method to $(A - \sigma I)^{-1}$ to find the eigenvalue closest to $\sigma$. This is the standard LAPACK strategy for computing eigenvectors.
  • Rayleigh quotient iteration dynamically updates the shift using the Rayleigh quotient, achieving cubic convergence for symmetric matrices.
