Proofs Chapter 9: Eigenvalue Derivatives (Basic Formulas)


This chapter proves the derivatives of eigenvalues and eigenvectors. Eigenvalue sensitivity analysis is a recurring theme at the intersection of physics, engineering, and statistics: eigenfrequency design optimization in vibration engineering, fluctuations of the explained variance in principal component analysis (PCA), and the Hellmann–Feynman theorem in quantum mechanics all rest on it. We cover both symmetric and general matrices, from the derivatives of the sum and product of the eigenvalues to perturbation formulas for individual eigenvalues and eigenvectors.

Prerequisites: Chapter 7 (Derivatives of Determinants), Chapter 8 (Derivatives of the Inverse). Chapters that use results from this chapter: Chapter 10 (Derivatives of Quadratic Forms), Chapter 15 (Derivatives of Special Matrices).

9. Eigenvalue and Eigenvector Derivatives

Assumptions for this chapter
Unless stated otherwise, the formulas in this chapter hold under the following conditions:
  • All formulas are based on the denominator layout
  • Eigenvalues are assumed to be simple (multiplicity 1). Repeated eigenvalues require special treatment
  • Eigenvectors are assumed to be appropriately normalized

9.1 Derivative of the Sum of Eigenvalues

Formula: $\displaystyle\frac{\partial}{\partial \boldsymbol{X}} \sum_{i=0}^{n-1} \lambda_i(\boldsymbol{X}) = \boldsymbol{I}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{n \times n}$, $\lambda_0, \ldots, \lambda_{n-1}$ are the eigenvalues of $\boldsymbol{X}$ (counted with multiplicity)
Proof

We verify the relationship between eigenvalues and the trace. The characteristic polynomial of the $n \times n$ matrix $\boldsymbol{X}$ can be written as

\begin{equation}\det(\boldsymbol{X} - \lambda \boldsymbol{I}) = (-1)^n (\lambda - \lambda_0)(\lambda - \lambda_1) \cdots (\lambda - \lambda_{n-1}) \label{eq:9-1-1}\end{equation}

where $\lambda_0, \ldots, \lambda_{n-1}$ are the eigenvalues (with multiplicity).

Consider the coefficient of $\lambda^{n-1}$ when expanding the right-hand side of $\eqref{eq:9-1-1}$.

\begin{equation}(-1)^n (\lambda - \lambda_0) \cdots (\lambda - \lambda_{n-1}) = (-1)^n \left[ \lambda^n - (\lambda_0 + \cdots + \lambda_{n-1})\lambda^{n-1} + \cdots \right] \label{eq:9-1-2}\end{equation}

Thus the coefficient of $\lambda^{n-1}$ is $(-1)^{n+1}(\lambda_0 + \cdots + \lambda_{n-1})$.

On the other hand, expand the left-hand side of $\eqref{eq:9-1-1}$, $\det(\boldsymbol{X} - \lambda \boldsymbol{I})$, by cofactors. The only term of degree $n-1$ in $\lambda$ comes from the product of the diagonal entries $(X_{00} - \lambda)(X_{11} - \lambda) \cdots (X_{n-1,n-1} - \lambda)$; every other term in the expansion omits at least two diagonal factors and hence has degree at most $n-2$. The coefficient of $\lambda^{n-1}$ is therefore $(-1)^{n-1}(X_{00} + X_{11} + \cdots + X_{n-1,n-1}) = (-1)^{n+1}\text{tr}(\boldsymbol{X})$.
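
(As a quick numerical aside, not part of the proof, this coefficient is easy to check with NumPy. Note that np.poly returns the coefficients of $\det(\lambda \boldsymbol{I} - \boldsymbol{X})$, which differs from our $\det(\boldsymbol{X} - \lambda \boldsymbol{I})$ by the overall factor $(-1)^n$.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n))

# np.poly gives the coefficients of det(lambda*I - X) with leading coefficient 1,
# so its lambda^{n-1} coefficient is -tr(X); multiplying by (-1)^n recovers the
# coefficient (-1)^{n+1} tr(X) of det(X - lambda*I) stated above.
coeffs = np.poly(X)
assert np.isclose(coeffs[1], -np.trace(X))
```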

Comparing the two expressions for the coefficient of $\lambda^{n-1}$,

\begin{equation}(-1)^{n+1}(\lambda_0 + \cdots + \lambda_{n-1}) = (-1)^{n+1}\text{tr}(\boldsymbol{X}) \label{eq:9-1-3}\end{equation}

Dividing both sides by $(-1)^{n+1}$ shows that the sum of the eigenvalues equals the trace.

\begin{equation}\sum_{i=0}^{n-1} \lambda_i = \text{tr}(\boldsymbol{X}) \label{eq:9-1-4}\end{equation}

Expressing the trace in terms of components,

\begin{equation}\text{tr}(\boldsymbol{X}) = \sum_{i=0}^{n-1} X_{ii} \label{eq:9-1-5}\end{equation}

which is the sum of the diagonal entries of $\boldsymbol{X}$.

Now differentiate $\eqref{eq:9-1-5}$ with respect to $X_{jk}$. Since $X_{ii}$ depends on $X_{jk}$ only when $i = j = k$,

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{X})}{\partial X_{jk}} = \frac{\partial}{\partial X_{jk}} \sum_{i=0}^{n-1} X_{ii} = \delta_{jk} \label{eq:9-1-6}\end{equation}

where $\delta_{jk}$ is the Kronecker delta.

Expressing the result of $\eqref{eq:9-1-6}$ in matrix form: the matrix whose $(j, k)$ entry is $\delta_{jk}$ is the identity matrix $\boldsymbol{I}$, so

\begin{equation}\frac{\partial}{\partial \boldsymbol{X}} \text{tr}(\boldsymbol{X}) = \boldsymbol{I} \label{eq:9-1-7}\end{equation}

Combining $\eqref{eq:9-1-4}$ and $\eqref{eq:9-1-7}$: since $\sum_{i=0}^{n-1} \lambda_i = \text{tr}(\boldsymbol{X})$,

\begin{equation}\frac{\partial}{\partial \boldsymbol{X}} \sum_{i=0}^{n-1} \lambda_i(\boldsymbol{X}) = \frac{\partial}{\partial \boldsymbol{X}} \text{tr}(\boldsymbol{X}) = \boldsymbol{I} \label{eq:9-1-8}\end{equation}

Remark: This formula holds regardless of whether $\boldsymbol{X}$ is diagonalizable. Even when the eigenvalues are complex, their sum is always real (equal to the trace).
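
The formula is also easy to verify numerically. The following is a minimal sketch using NumPy: it differentiates the eigenvalue sum entry by entry with central finite differences and compares against the identity matrix. The matrix is deliberately nonsymmetric so that complex eigenvalues occur, illustrating the remark above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))   # generally nonsymmetric; complex eigenvalues occur
h = 1e-6

# Central finite differences of sum_i lambda_i(X) with respect to each entry X_jk.
G = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        Xp = X.copy(); Xp[j, k] += h
        Xm = X.copy(); Xm[j, k] -= h
        # The eigenvalue sum is real (it equals the trace); .real drops roundoff.
        G[j, k] = (np.linalg.eigvals(Xp).sum() - np.linalg.eigvals(Xm).sum()).real / (2 * h)

assert np.allclose(G, np.eye(n), atol=1e-5)   # matches the identity matrix
```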

9.2 Derivative of the Product of Eigenvalues

Formula: $\displaystyle\frac{\partial}{\partial \boldsymbol{X}} \prod_{i=0}^{n-1} \lambda_i(\boldsymbol{X}) = \det(\boldsymbol{X}) \boldsymbol{X}^{-\top}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{n \times n}$, $\boldsymbol{X}$ is nonsingular, $\lambda_0, \ldots, \lambda_{n-1}$ are the eigenvalues of $\boldsymbol{X}$ (counted with multiplicity)
Proof

We verify the relationship between eigenvalues and the determinant. Substituting $\lambda = 0$ into the characteristic polynomial of the $n \times n$ matrix $\boldsymbol{X}$,

\begin{equation}\det(\boldsymbol{X} - 0 \cdot \boldsymbol{I}) = \det(\boldsymbol{X}) \label{eq:9-2-1}\end{equation}

On the other hand, the characteristic polynomial can be factored using the eigenvalues as

\begin{equation}\det(\boldsymbol{X} - \lambda \boldsymbol{I}) = (-1)^n (\lambda - \lambda_0)(\lambda - \lambda_1) \cdots (\lambda - \lambda_{n-1}) \label{eq:9-2-2}\end{equation}

Substituting $\lambda = 0$ into $\eqref{eq:9-2-2}$,

\begin{equation}\det(\boldsymbol{X} - 0 \cdot \boldsymbol{I}) = (-1)^n (0 - \lambda_0)(0 - \lambda_1) \cdots (0 - \lambda_{n-1}) \label{eq:9-2-3}\end{equation}

Simplifying the right-hand side of $\eqref{eq:9-2-3}$,

\begin{equation}(-1)^n (-\lambda_0)(-\lambda_1) \cdots (-\lambda_{n-1}) = (-1)^n \cdot (-1)^n \lambda_0 \lambda_1 \cdots \lambda_{n-1} = \lambda_0 \lambda_1 \cdots \lambda_{n-1} \label{eq:9-2-4}\end{equation}

From $\eqref{eq:9-2-1}$, $\eqref{eq:9-2-3}$, and $\eqref{eq:9-2-4}$, the determinant equals the product of the eigenvalues.

\begin{equation}\det(\boldsymbol{X}) = \prod_{i=0}^{n-1} \lambda_i \label{eq:9-2-5}\end{equation}
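
Before differentiating, this identity is easy to check numerically (a sketch using NumPy; the product of the complex eigenvalues of a real matrix is real because they occur in conjugate pairs):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 5))

# det(X) equals the product of the (possibly complex) eigenvalues; the imaginary
# parts cancel in conjugate pairs, so the product is real up to roundoff.
assert np.isclose(np.linalg.det(X), np.prod(np.linalg.eigvals(X)).real)
```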

Now differentiate both sides of $\eqref{eq:9-2-5}$ with respect to $\boldsymbol{X}$. Since the two sides are equal as functions of $\boldsymbol{X}$, so are their derivatives:

\begin{equation}\frac{\partial}{\partial \boldsymbol{X}} \prod_{i=0}^{n-1} \lambda_i = \frac{\partial}{\partial \boldsymbol{X}} \det(\boldsymbol{X}) \label{eq:9-2-6}\end{equation}

Applying the determinant derivative formula from 7.1,

\begin{equation}\frac{\partial \det(\boldsymbol{X})}{\partial \boldsymbol{X}} = \det(\boldsymbol{X}) (\boldsymbol{X}^{-1})^\top = \det(\boldsymbol{X}) \boldsymbol{X}^{-\top} \label{eq:9-2-7}\end{equation}

Combining $\eqref{eq:9-2-6}$ and $\eqref{eq:9-2-7}$ gives the final result.

\begin{equation}\frac{\partial}{\partial \boldsymbol{X}} \prod_{i=0}^{n-1} \lambda_i(\boldsymbol{X}) = \det(\boldsymbol{X}) \boldsymbol{X}^{-\top} \label{eq:9-2-8}\end{equation}

Remark: Here $\boldsymbol{X}^{-\top} = (\boldsymbol{X}^{-1})^\top = (\boldsymbol{X}^\top)^{-1}$. When $\boldsymbol{X}$ is singular ($\det(\boldsymbol{X}) = 0$), the inverse does not exist and this formula cannot be applied.
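
As in 9.1, the result can be checked by entrywise central finite differences; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n))
h = 1e-6

# Central finite differences of det(X) with respect to each entry X_jk.
G = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        Xp = X.copy(); Xp[j, k] += h
        Xm = X.copy(); Xm[j, k] -= h
        G[j, k] = (np.linalg.det(Xp) - np.linalg.det(Xm)) / (2 * h)

# Compare against det(X) X^{-T}.
assert np.allclose(G, np.linalg.det(X) * np.linalg.inv(X).T, atol=1e-5)
```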

9.3 Derivative of an Eigenvalue

Formula: $\partial \lambda_i = \boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i$
Conditions: $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ is a real symmetric matrix, $\lambda_i$ is a simple eigenvalue (multiplicity 1), $\boldsymbol{v}_i \in \mathbb{R}^n$ is the corresponding normalized eigenvector ($\boldsymbol{v}_i^\top \boldsymbol{v}_i = 1$)
Proof

Write down the eigenvalue equation. The eigenvalue $\lambda_i$ and eigenvector $\boldsymbol{v}_i$ of $\boldsymbol{A}$ satisfy

\begin{equation}\boldsymbol{A} \boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i \label{eq:9-3-1}\end{equation}

Differentiate both sides of $\eqref{eq:9-3-1}$, where $\partial$ denotes the differential under an infinitesimal perturbation of $\boldsymbol{A}$ (both $\lambda_i$ and $\boldsymbol{v}_i$ vary with $\boldsymbol{A}$). Applying the product rule (1.25),

\begin{equation}(\partial \boldsymbol{A}) \boldsymbol{v}_i + \boldsymbol{A} (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i + \lambda_i (\partial \boldsymbol{v}_i) \label{eq:9-3-2}\end{equation}

Left-multiply both sides of $\eqref{eq:9-3-2}$ by $\boldsymbol{v}_i^\top$:

\begin{equation}\boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i + \boldsymbol{v}_i^\top \boldsymbol{A} (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i^\top \boldsymbol{v}_i + \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) \label{eq:9-3-3}\end{equation}

Applying the normalization condition $\boldsymbol{v}_i^\top \boldsymbol{v}_i = 1$ to $\eqref{eq:9-3-3}$,

\begin{equation}\boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i + \boldsymbol{v}_i^\top \boldsymbol{A} (\partial \boldsymbol{v}_i) = \partial \lambda_i + \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) \label{eq:9-3-4}\end{equation}

Use the symmetry of $\boldsymbol{A}$. Since $\boldsymbol{A} = \boldsymbol{A}^\top$,

\begin{equation}\boldsymbol{v}_i^\top \boldsymbol{A} = (\boldsymbol{A}^\top \boldsymbol{v}_i)^\top = (\boldsymbol{A} \boldsymbol{v}_i)^\top \label{eq:9-3-5}\end{equation}

Substituting the eigenvalue equation $\eqref{eq:9-3-1}$ into $\eqref{eq:9-3-5}$,

\begin{equation}\boldsymbol{v}_i^\top \boldsymbol{A} = (\lambda_i \boldsymbol{v}_i)^\top = \lambda_i \boldsymbol{v}_i^\top \label{eq:9-3-6}\end{equation}

Applying $\eqref{eq:9-3-6}$ to the second term of $\eqref{eq:9-3-4}$,

\begin{equation}\boldsymbol{v}_i^\top \boldsymbol{A} (\partial \boldsymbol{v}_i) = \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) \label{eq:9-3-7}\end{equation}

Substituting $\eqref{eq:9-3-7}$ into $\eqref{eq:9-3-4}$,

\begin{equation}\boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i + \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) = \partial \lambda_i + \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) \label{eq:9-3-8}\end{equation}

Subtracting $\lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i)$ from both sides of $\eqref{eq:9-3-8}$, the terms involving $\partial \boldsymbol{v}_i$ cancel:

\begin{equation}\boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i = \partial \lambda_i \label{eq:9-3-9}\end{equation}

Rewriting $\eqref{eq:9-3-9}$ gives the final result.

\begin{equation}\partial \lambda_i = \boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i \label{eq:9-3-10}\end{equation}

Remark: This formula is also known as the "eigenvalue perturbation formula." A key point is that computing $\partial \boldsymbol{v}_i$ is not required. Setting $\partial \boldsymbol{A} = \boldsymbol{E}_{jk}$ (the matrix with 1 in entry $(j,k)$ and 0 elsewhere) gives $\displaystyle\frac{\partial \lambda_i}{\partial A_{jk}} = v_{i,j} v_{i,k}$.
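
A numerical check of the perturbation formula (a sketch using NumPy): the perturbation direction $\boldsymbol{E}$ is taken symmetric so that $\boldsymbol{A} + t\boldsymbol{E}$ stays symmetric, and the eigenvalue index remains stable for small steps because the eigenvalues of a random symmetric matrix are distinct with probability 1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # random real symmetric matrix
E = rng.standard_normal((n, n))
E = (E + E.T) / 2                       # symmetric perturbation direction
h = 1e-6

lam, V = np.linalg.eigh(A)              # eigenvalues ascending, columns orthonormal
i = 2
v = V[:, i]

# Directional derivative of lambda_i along E by central differences.
fd = (np.linalg.eigvalsh(A + h * E)[i] - np.linalg.eigvalsh(A - h * E)[i]) / (2 * h)

assert np.isclose(fd, v @ E @ v, atol=1e-5)   # matches v_i^T (dA) v_i with dA = E
```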

9.4 Derivative of an Eigenvector

Formula: $\partial \boldsymbol{v}_i = (\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\partial \boldsymbol{A}) \boldsymbol{v}_i$
Conditions: $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ is a real symmetric matrix, $\lambda_i$ is a simple eigenvalue, $\boldsymbol{v}_i \in \mathbb{R}^n$ is the corresponding normalized eigenvector ($\boldsymbol{v}_i^\top \boldsymbol{v}_i = 1$), $(\cdot)^+$ denotes the Moore–Penrose pseudoinverse
Proof

Differentiating the eigenvalue equation $\boldsymbol{A} \boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i$,

\begin{equation}(\partial \boldsymbol{A}) \boldsymbol{v}_i + \boldsymbol{A} (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i + \lambda_i (\partial \boldsymbol{v}_i) \label{eq:9-4-1}\end{equation}

Collect the terms involving $\partial \boldsymbol{v}_i$ in $\eqref{eq:9-4-1}$ on the left-hand side and the remaining terms on the right:

\begin{equation}\boldsymbol{A} (\partial \boldsymbol{v}_i) - \lambda_i (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i - (\partial \boldsymbol{A}) \boldsymbol{v}_i \label{eq:9-4-2}\end{equation}

Factoring the left-hand side of $\eqref{eq:9-4-2}$,

\begin{equation}(\boldsymbol{A} - \lambda_i \boldsymbol{I}) (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i - (\partial \boldsymbol{A}) \boldsymbol{v}_i \label{eq:9-4-3}\end{equation}

Multiplying both sides of $\eqref{eq:9-4-3}$ by $-1$ to reverse the sign,

\begin{equation}(\lambda_i \boldsymbol{I} - \boldsymbol{A}) (\partial \boldsymbol{v}_i) = (\partial \boldsymbol{A}) \boldsymbol{v}_i - (\partial \lambda_i) \boldsymbol{v}_i \label{eq:9-4-4}\end{equation}

Differentiating the normalization condition $\boldsymbol{v}_i^\top \boldsymbol{v}_i = 1$,

\begin{equation}(\partial \boldsymbol{v}_i)^\top \boldsymbol{v}_i + \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) = 0 \label{eq:9-4-5}\end{equation}

Since $(\partial \boldsymbol{v}_i)^\top \boldsymbol{v}_i$ is a scalar, it equals its transpose $\boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i)$. Therefore $\eqref{eq:9-4-5}$ gives

\begin{equation}2 \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) = 0 \label{eq:9-4-6}\end{equation}

Hence $\boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) = 0$, i.e., $\partial \boldsymbol{v}_i$ is orthogonal to $\boldsymbol{v}_i$.

Examine the matrix $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$. Since $\boldsymbol{A}$ is symmetric, it is orthogonally diagonalizable with eigenvalues $\lambda_0, \ldots, \lambda_{n-1}$. The eigenvalues of $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$ are $\lambda_i - \lambda_0, \ldots, \lambda_i - \lambda_{n-1}$, and in particular $\lambda_i - \lambda_i = 0$.

Therefore $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$ is singular, and, because $\lambda_i$ is a simple eigenvalue, its null space is the one-dimensional subspace $\text{span}\{\boldsymbol{v}_i\}$.

\begin{equation}(\lambda_i \boldsymbol{I} - \boldsymbol{A}) \boldsymbol{v}_i = \boldsymbol{0} \label{eq:9-4-7}\end{equation}

Decompose the right-hand side of $\eqref{eq:9-4-4}$ into a component along $\boldsymbol{v}_i$ and a component orthogonal to $\boldsymbol{v}_i$. From 9.3, $\partial \lambda_i = \boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i$. The component along $\boldsymbol{v}_i$ is

\begin{equation}\boldsymbol{v}_i \boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i - (\partial \lambda_i) \boldsymbol{v}_i = (\partial \lambda_i) \boldsymbol{v}_i - (\partial \lambda_i) \boldsymbol{v}_i = \boldsymbol{0} \label{eq:9-4-8}\end{equation}

So the right-hand side has only a component orthogonal to $\boldsymbol{v}_i$.

Use the Moore–Penrose pseudoinverse $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+$. Since $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$ is symmetric, its pseudoinverse is also symmetric and acts as an inverse on the orthogonal complement of the null space $\text{span}\{\boldsymbol{v}_i\}$.

Because the right-hand side of $\eqref{eq:9-4-4}$ is orthogonal to $\boldsymbol{v}_i$ by $\eqref{eq:9-4-8}$, it lies in the range of $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$, so no information is lost when we left-multiply both sides by the pseudoinverse $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+$:

\begin{equation}(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\lambda_i \boldsymbol{I} - \boldsymbol{A}) (\partial \boldsymbol{v}_i) = (\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ [(\partial \boldsymbol{A}) \boldsymbol{v}_i - (\partial \lambda_i) \boldsymbol{v}_i] \label{eq:9-4-9}\end{equation}

From $\eqref{eq:9-4-6}$, $\partial \boldsymbol{v}_i \perp \boldsymbol{v}_i$, so $\partial \boldsymbol{v}_i$ is orthogonal to the null space of $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$. Since $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\lambda_i \boldsymbol{I} - \boldsymbol{A})$ is the orthogonal projector onto the orthogonal complement of that null space, the left-hand side of $\eqref{eq:9-4-9}$ reduces to

\begin{equation}(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\lambda_i \boldsymbol{I} - \boldsymbol{A}) (\partial \boldsymbol{v}_i) = \partial \boldsymbol{v}_i \label{eq:9-4-10}\end{equation}

On the right-hand side of $\eqref{eq:9-4-9}$, $(\partial \lambda_i) \boldsymbol{v}_i$ lies in the $\boldsymbol{v}_i$ direction and belongs to the null space, so $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\partial \lambda_i) \boldsymbol{v}_i = \boldsymbol{0}$.

Combining $\eqref{eq:9-4-10}$ and the above gives the final result.

\begin{equation}\partial \boldsymbol{v}_i = (\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\partial \boldsymbol{A}) \boldsymbol{v}_i \label{eq:9-4-11}\end{equation}

Remark: Since $\boldsymbol{A}$ is symmetric, the eigenvectors $\{\boldsymbol{v}_j\}$ can be chosen orthonormal, and the pseudoinverse then has the spectral form $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ = \sum_{j \neq i} \displaystyle\frac{1}{\lambda_i - \lambda_j} \boldsymbol{v}_j \boldsymbol{v}_j^\top$. This agrees with the result of Nelson's method in 9.14.
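
Both the formula and the spectral expression for the pseudoinverse can be verified numerically. A minimal NumPy sketch follows; the helper eigvec and its sign alignment are ad hoc devices, introduced here only to fix the inherent sign ambiguity of numerically computed eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # random real symmetric matrix
E = rng.standard_normal((n, n))
E = (E + E.T) / 2                       # symmetric perturbation direction
h = 1e-6

lam, V = np.linalg.eigh(A)
i = 2
v = V[:, i]

# Pseudoinverse of (lambda_i I - A): once via np.linalg.pinv, once via the spectral sum.
P = np.linalg.pinv(lam[i] * np.eye(n) - A)
S = sum(np.outer(V[:, j], V[:, j]) / (lam[i] - lam[j]) for j in range(n) if j != i)
assert np.allclose(P, S)

def eigvec(M, idx, ref):
    """idx-th eigenvector of symmetric M, sign-aligned with the reference vector."""
    w = np.linalg.eigh(M)[1][:, idx]
    return w if w @ ref >= 0 else -w

# Central finite differences of v_i along the direction E.
fd = (eigvec(A + h * E, i, v) - eigvec(A - h * E, i, v)) / (2 * h)

assert np.allclose(fd, P @ E @ v, atol=1e-4)   # matches (lambda_i I - A)^+ (dA) v_i
assert abs(v @ fd) < 1e-6                      # consistent with v_i^T (dv_i) = 0
```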
