Proofs Chapter 9: Eigenvalue Derivatives (Basic Formulas)


This chapter proves the derivatives of eigenvalues and eigenvectors. Eigenvalue sensitivity analysis is a recurring theme at the intersection of physics, engineering, and statistics: eigenfrequency design optimization in vibration engineering, fluctuations of the explained variance in principal component analysis (PCA), and the Hellmann–Feynman theorem in quantum mechanics all rest on it. We cover both symmetric and general matrices, from the derivatives of the sum and product of the eigenvalues to perturbation formulas for individual eigenvalues and eigenvectors.

Prerequisites: Chapter 7 (Derivatives of Determinants), Chapter 8 (Derivatives of the Inverse). Chapters that use results from this chapter: Chapter 10 (Derivatives of Quadratic Forms), Chapter 15 (Derivatives of Special Matrices).

9. Eigenvalue and Eigenvector Derivatives

Assumptions for this chapter
Unless stated otherwise, the formulas in this chapter hold under the following conditions:
  • All formulas are based on the denominator layout
  • Eigenvalues are assumed to be simple (multiplicity 1). Repeated eigenvalues require special treatment
  • Eigenvectors are assumed to be appropriately normalized

9.1 Derivative of the Sum of Eigenvalues

Formula: $\displaystyle\frac{\partial}{\partial \boldsymbol{X}} \sum_{i=0}^{n-1} \lambda_i(\boldsymbol{X}) = \boldsymbol{I}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{n \times n}$, $\lambda_0, \ldots, \lambda_{n-1}$ are the eigenvalues of $\boldsymbol{X}$ (counted with multiplicity)
Proof

We verify the relationship between eigenvalues and the trace. The characteristic polynomial of the $n \times n$ matrix $\boldsymbol{X}$ can be written as

\begin{equation}\det(\boldsymbol{X} - \lambda \boldsymbol{I}) = (-1)^n (\lambda - \lambda_0)(\lambda - \lambda_1) \cdots (\lambda - \lambda_{n-1}) \label{eq:9-1-1}\end{equation}

where $\lambda_0, \ldots, \lambda_{n-1}$ are the eigenvalues (with multiplicity).

Consider the coefficient of $\lambda^{n-1}$ when expanding the right-hand side of $\eqref{eq:9-1-1}$.

\begin{equation}(-1)^n (\lambda - \lambda_0) \cdots (\lambda - \lambda_{n-1}) = (-1)^n \left[ \lambda^n - (\lambda_0 + \cdots + \lambda_{n-1})\lambda^{n-1} + \cdots \right] \label{eq:9-1-2}\end{equation}

Thus the coefficient of $\lambda^{n-1}$ is $(-1)^{n+1}(\lambda_0 + \cdots + \lambda_{n-1})$.

On the other hand, expand the left-hand side of $\eqref{eq:9-1-1}$, $\det(\boldsymbol{X} - \lambda \boldsymbol{I})$, by cofactors. The only term of degree $n-1$ in $\lambda$ comes from the product of the diagonal entries $(X_{00} - \lambda)(X_{11} - \lambda) \cdots (X_{n-1,n-1} - \lambda)$; every other term in the expansion omits at least two diagonal factors and hence has degree at most $n-2$. The coefficient of $\lambda^{n-1}$ is therefore $(-1)^{n-1}(X_{00} + X_{11} + \cdots + X_{n-1,n-1}) = (-1)^{n+1}\text{tr}(\boldsymbol{X})$.
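
(As a quick numerical aside, not part of the proof, this coefficient is easy to check with NumPy. Note that np.poly returns the coefficients of $\det(\lambda \boldsymbol{I} - \boldsymbol{X})$, which differs from our $\det(\boldsymbol{X} - \lambda \boldsymbol{I})$ by the overall factor $(-1)^n$.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n))

# np.poly gives the coefficients of det(lambda*I - X) with leading coefficient 1,
# so its lambda^{n-1} coefficient is -tr(X); multiplying by (-1)^n recovers the
# coefficient (-1)^{n+1} tr(X) of det(X - lambda*I) stated above.
coeffs = np.poly(X)
assert np.isclose(coeffs[1], -np.trace(X))
```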

Comparing the two expressions for the coefficient of $\lambda^{n-1}$,

\begin{equation}(-1)^{n+1}(\lambda_0 + \cdots + \lambda_{n-1}) = (-1)^{n+1}\text{tr}(\boldsymbol{X}) \label{eq:9-1-3}\end{equation}

Dividing both sides by $(-1)^{n+1}$ shows that the sum of the eigenvalues equals the trace.

\begin{equation}\sum_{i=0}^{n-1} \lambda_i = \text{tr}(\boldsymbol{X}) \label{eq:9-1-4}\end{equation}

Expressing the trace in terms of components,

\begin{equation}\text{tr}(\boldsymbol{X}) = \sum_{i=0}^{n-1} X_{ii} \label{eq:9-1-5}\end{equation}

which is the sum of the diagonal entries of $\boldsymbol{X}$.

Now differentiate $\eqref{eq:9-1-5}$ with respect to $X_{jk}$. Since $X_{ii}$ depends on $X_{jk}$ only when $i = j = k$,

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{X})}{\partial X_{jk}} = \frac{\partial}{\partial X_{jk}} \sum_{i=0}^{n-1} X_{ii} = \delta_{jk} \label{eq:9-1-6}\end{equation}

where $\delta_{jk}$ is the Kronecker delta.

Expressing the result of $\eqref{eq:9-1-6}$ in matrix form: the matrix whose $(j, k)$ entry is $\delta_{jk}$ is the identity matrix $\boldsymbol{I}$, so

\begin{equation}\frac{\partial}{\partial \boldsymbol{X}} \text{tr}(\boldsymbol{X}) = \boldsymbol{I} \label{eq:9-1-7}\end{equation}

Combining $\eqref{eq:9-1-4}$ and $\eqref{eq:9-1-7}$: since $\sum_{i=0}^{n-1} \lambda_i = \text{tr}(\boldsymbol{X})$,

\begin{equation}\frac{\partial}{\partial \boldsymbol{X}} \sum_{i=0}^{n-1} \lambda_i(\boldsymbol{X}) = \frac{\partial}{\partial \boldsymbol{X}} \text{tr}(\boldsymbol{X}) = \boldsymbol{I} \label{eq:9-1-8}\end{equation}

Remark: This formula holds regardless of whether $\boldsymbol{X}$ is diagonalizable. Even when the eigenvalues are complex, their sum is always real (equal to the trace).
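
The formula is also easy to verify numerically. The following is a minimal sketch using NumPy: it differentiates the eigenvalue sum entry by entry with central finite differences and compares against the identity matrix. The matrix is deliberately nonsymmetric so that complex eigenvalues occur, illustrating the remark above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))   # generally nonsymmetric; complex eigenvalues occur
h = 1e-6

# Central finite differences of sum_i lambda_i(X) with respect to each entry X_jk.
G = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        Xp = X.copy(); Xp[j, k] += h
        Xm = X.copy(); Xm[j, k] -= h
        # The eigenvalue sum is real (it equals the trace); .real drops roundoff.
        G[j, k] = (np.linalg.eigvals(Xp).sum() - np.linalg.eigvals(Xm).sum()).real / (2 * h)

assert np.allclose(G, np.eye(n), atol=1e-5)   # matches the identity matrix
```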

9.2 Derivative of the Product of Eigenvalues

Formula: $\displaystyle\frac{\partial}{\partial \boldsymbol{X}} \prod_{i=0}^{n-1} \lambda_i(\boldsymbol{X}) = \det(\boldsymbol{X}) \boldsymbol{X}^{-\top}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{n \times n}$, $\boldsymbol{X}$ is nonsingular, $\lambda_0, \ldots, \lambda_{n-1}$ are the eigenvalues of $\boldsymbol{X}$ (counted with multiplicity)
Proof

We verify the relationship between eigenvalues and the determinant. Substituting $\lambda = 0$ into the characteristic polynomial of the $n \times n$ matrix $\boldsymbol{X}$,

\begin{equation}\det(\boldsymbol{X} - 0 \cdot \boldsymbol{I}) = \det(\boldsymbol{X}) \label{eq:9-2-1}\end{equation}

On the other hand, the characteristic polynomial can be factored using the eigenvalues as

\begin{equation}\det(\boldsymbol{X} - \lambda \boldsymbol{I}) = (-1)^n (\lambda - \lambda_0)(\lambda - \lambda_1) \cdots (\lambda - \lambda_{n-1}) \label{eq:9-2-2}\end{equation}

Substituting $\lambda = 0$ into $\eqref{eq:9-2-2}$,

\begin{equation}\det(\boldsymbol{X} - 0 \cdot \boldsymbol{I}) = (-1)^n (0 - \lambda_0)(0 - \lambda_1) \cdots (0 - \lambda_{n-1}) \label{eq:9-2-3}\end{equation}

Simplifying the right-hand side of $\eqref{eq:9-2-3}$,

\begin{equation}(-1)^n (-\lambda_0)(-\lambda_1) \cdots (-\lambda_{n-1}) = (-1)^n \cdot (-1)^n \lambda_0 \lambda_1 \cdots \lambda_{n-1} = \lambda_0 \lambda_1 \cdots \lambda_{n-1} \label{eq:9-2-4}\end{equation}

From $\eqref{eq:9-2-1}$, $\eqref{eq:9-2-3}$, and $\eqref{eq:9-2-4}$, the determinant equals the product of the eigenvalues.

\begin{equation}\det(\boldsymbol{X}) = \prod_{i=0}^{n-1} \lambda_i \label{eq:9-2-5}\end{equation}
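
Before differentiating, this identity is easy to check numerically (a sketch using NumPy; the product of the complex eigenvalues of a real matrix is real because they occur in conjugate pairs):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 5))

# det(X) equals the product of the (possibly complex) eigenvalues; the imaginary
# parts cancel in conjugate pairs, so the product is real up to roundoff.
assert np.isclose(np.linalg.det(X), np.prod(np.linalg.eigvals(X)).real)
```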

Now differentiate both sides of $\eqref{eq:9-2-5}$ with respect to $\boldsymbol{X}$. Since the two sides are equal as functions of $\boldsymbol{X}$, so are their derivatives:

\begin{equation}\frac{\partial}{\partial \boldsymbol{X}} \prod_{i=0}^{n-1} \lambda_i = \frac{\partial}{\partial \boldsymbol{X}} \det(\boldsymbol{X}) \label{eq:9-2-6}\end{equation}

Applying the determinant derivative formula from 7.1,

\begin{equation}\frac{\partial \det(\boldsymbol{X})}{\partial \boldsymbol{X}} = \det(\boldsymbol{X}) (\boldsymbol{X}^{-1})^\top = \det(\boldsymbol{X}) \boldsymbol{X}^{-\top} \label{eq:9-2-7}\end{equation}

Combining $\eqref{eq:9-2-6}$ and $\eqref{eq:9-2-7}$ gives the final result.

\begin{equation}\frac{\partial}{\partial \boldsymbol{X}} \prod_{i=0}^{n-1} \lambda_i(\boldsymbol{X}) = \det(\boldsymbol{X}) \boldsymbol{X}^{-\top} \label{eq:9-2-8}\end{equation}

Remark: Here $\boldsymbol{X}^{-\top} = (\boldsymbol{X}^{-1})^\top = (\boldsymbol{X}^\top)^{-1}$. When $\boldsymbol{X}$ is singular ($\det(\boldsymbol{X}) = 0$), the inverse does not exist and this formula cannot be applied.
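
As in 9.1, the result can be checked by entrywise central finite differences; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n))
h = 1e-6

# Central finite differences of det(X) with respect to each entry X_jk.
G = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        Xp = X.copy(); Xp[j, k] += h
        Xm = X.copy(); Xm[j, k] -= h
        G[j, k] = (np.linalg.det(Xp) - np.linalg.det(Xm)) / (2 * h)

# Compare against det(X) X^{-T}.
assert np.allclose(G, np.linalg.det(X) * np.linalg.inv(X).T, atol=1e-5)
```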

9.3 Derivative of an Eigenvalue

Formula: $\partial \lambda_i = \boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i$
Conditions: $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ is a real symmetric matrix, $\lambda_i$ is a simple eigenvalue (multiplicity 1), $\boldsymbol{v}_i \in \mathbb{R}^n$ is the corresponding normalized eigenvector ($\boldsymbol{v}_i^\top \boldsymbol{v}_i = 1$)
Proof

Write down the eigenvalue equation. The eigenvalue $\lambda_i$ and eigenvector $\boldsymbol{v}_i$ of $\boldsymbol{A}$ satisfy

\begin{equation}\boldsymbol{A} \boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i \label{eq:9-3-1}\end{equation}

Differentiate both sides of $\eqref{eq:9-3-1}$, where $\partial$ denotes the differential under an infinitesimal perturbation of $\boldsymbol{A}$ (both $\lambda_i$ and $\boldsymbol{v}_i$ vary with $\boldsymbol{A}$). Applying the product rule (1.25),

\begin{equation}(\partial \boldsymbol{A}) \boldsymbol{v}_i + \boldsymbol{A} (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i + \lambda_i (\partial \boldsymbol{v}_i) \label{eq:9-3-2}\end{equation}

Left-multiply both sides of $\eqref{eq:9-3-2}$ by $\boldsymbol{v}_i^\top$:

\begin{equation}\boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i + \boldsymbol{v}_i^\top \boldsymbol{A} (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i^\top \boldsymbol{v}_i + \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) \label{eq:9-3-3}\end{equation}

Applying the normalization condition $\boldsymbol{v}_i^\top \boldsymbol{v}_i = 1$ to $\eqref{eq:9-3-3}$,

\begin{equation}\boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i + \boldsymbol{v}_i^\top \boldsymbol{A} (\partial \boldsymbol{v}_i) = \partial \lambda_i + \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) \label{eq:9-3-4}\end{equation}

Use the symmetry of $\boldsymbol{A}$. Since $\boldsymbol{A} = \boldsymbol{A}^\top$,

\begin{equation}\boldsymbol{v}_i^\top \boldsymbol{A} = (\boldsymbol{A}^\top \boldsymbol{v}_i)^\top = (\boldsymbol{A} \boldsymbol{v}_i)^\top \label{eq:9-3-5}\end{equation}

Substituting the eigenvalue equation $\eqref{eq:9-3-1}$ into $\eqref{eq:9-3-5}$,

\begin{equation}\boldsymbol{v}_i^\top \boldsymbol{A} = (\lambda_i \boldsymbol{v}_i)^\top = \lambda_i \boldsymbol{v}_i^\top \label{eq:9-3-6}\end{equation}

Applying $\eqref{eq:9-3-6}$ to the second term of $\eqref{eq:9-3-4}$,

\begin{equation}\boldsymbol{v}_i^\top \boldsymbol{A} (\partial \boldsymbol{v}_i) = \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) \label{eq:9-3-7}\end{equation}

Substituting $\eqref{eq:9-3-7}$ into $\eqref{eq:9-3-4}$,

\begin{equation}\boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i + \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) = \partial \lambda_i + \lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) \label{eq:9-3-8}\end{equation}

Subtracting $\lambda_i \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i)$ from both sides of $\eqref{eq:9-3-8}$, the terms involving $\partial \boldsymbol{v}_i$ cancel:

\begin{equation}\boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i = \partial \lambda_i \label{eq:9-3-9}\end{equation}

Rewriting $\eqref{eq:9-3-9}$ gives the final result.

\begin{equation}\partial \lambda_i = \boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i \label{eq:9-3-10}\end{equation}

Remark: This formula is also known as the "eigenvalue perturbation formula." A key point is that computing $\partial \boldsymbol{v}_i$ is not required. Setting $\partial \boldsymbol{A} = \boldsymbol{E}_{jk}$ (the matrix with 1 in entry $(j,k)$ and 0 elsewhere) gives $\displaystyle\frac{\partial \lambda_i}{\partial A_{jk}} = v_{i,j} v_{i,k}$.
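
A numerical check of the perturbation formula (a sketch using NumPy): the perturbation direction $\boldsymbol{E}$ is taken symmetric so that $\boldsymbol{A} + t\boldsymbol{E}$ stays symmetric, and the eigenvalue index remains stable for small steps because the eigenvalues of a random symmetric matrix are distinct with probability 1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # random real symmetric matrix
E = rng.standard_normal((n, n))
E = (E + E.T) / 2                       # symmetric perturbation direction
h = 1e-6

lam, V = np.linalg.eigh(A)              # eigenvalues ascending, columns orthonormal
i = 2
v = V[:, i]

# Directional derivative of lambda_i along E by central differences.
fd = (np.linalg.eigvalsh(A + h * E)[i] - np.linalg.eigvalsh(A - h * E)[i]) / (2 * h)

assert np.isclose(fd, v @ E @ v, atol=1e-5)   # matches v_i^T (dA) v_i with dA = E
```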

9.4 Derivative of an Eigenvector

Formula: $\partial \boldsymbol{v}_i = (\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\partial \boldsymbol{A}) \boldsymbol{v}_i$
Conditions: $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ is a real symmetric matrix, $\lambda_i$ is a simple eigenvalue, $\boldsymbol{v}_i \in \mathbb{R}^n$ is the corresponding normalized eigenvector ($\boldsymbol{v}_i^\top \boldsymbol{v}_i = 1$), $(\cdot)^+$ denotes the Moore–Penrose pseudoinverse
Proof

Differentiating the eigenvalue equation $\boldsymbol{A} \boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i$,

\begin{equation}(\partial \boldsymbol{A}) \boldsymbol{v}_i + \boldsymbol{A} (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i + \lambda_i (\partial \boldsymbol{v}_i) \label{eq:9-4-1}\end{equation}

Collect the terms involving $\partial \boldsymbol{v}_i$ in $\eqref{eq:9-4-1}$ on the left-hand side and the remaining terms on the right:

\begin{equation}\boldsymbol{A} (\partial \boldsymbol{v}_i) - \lambda_i (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i - (\partial \boldsymbol{A}) \boldsymbol{v}_i \label{eq:9-4-2}\end{equation}

Factoring the left-hand side of $\eqref{eq:9-4-2}$,

\begin{equation}(\boldsymbol{A} - \lambda_i \boldsymbol{I}) (\partial \boldsymbol{v}_i) = (\partial \lambda_i) \boldsymbol{v}_i - (\partial \boldsymbol{A}) \boldsymbol{v}_i \label{eq:9-4-3}\end{equation}

Multiplying both sides of $\eqref{eq:9-4-3}$ by $-1$ to reverse the sign,

\begin{equation}(\lambda_i \boldsymbol{I} - \boldsymbol{A}) (\partial \boldsymbol{v}_i) = (\partial \boldsymbol{A}) \boldsymbol{v}_i - (\partial \lambda_i) \boldsymbol{v}_i \label{eq:9-4-4}\end{equation}

Differentiating the normalization condition $\boldsymbol{v}_i^\top \boldsymbol{v}_i = 1$,

\begin{equation}(\partial \boldsymbol{v}_i)^\top \boldsymbol{v}_i + \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) = 0 \label{eq:9-4-5}\end{equation}

Since $(\partial \boldsymbol{v}_i)^\top \boldsymbol{v}_i$ is a scalar, it equals its transpose $\boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i)$. Therefore $\eqref{eq:9-4-5}$ gives

\begin{equation}2 \boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) = 0 \label{eq:9-4-6}\end{equation}

Hence $\boldsymbol{v}_i^\top (\partial \boldsymbol{v}_i) = 0$, i.e., $\partial \boldsymbol{v}_i$ is orthogonal to $\boldsymbol{v}_i$.

Examine the matrix $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$. Since $\boldsymbol{A}$ is symmetric, it is orthogonally diagonalizable with eigenvalues $\lambda_0, \ldots, \lambda_{n-1}$. The eigenvalues of $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$ are $\lambda_i - \lambda_0, \ldots, \lambda_i - \lambda_{n-1}$, and in particular $\lambda_i - \lambda_i = 0$.

Therefore $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$ is singular, and, because $\lambda_i$ is a simple eigenvalue, its null space is the one-dimensional subspace $\text{span}\{\boldsymbol{v}_i\}$.

\begin{equation}(\lambda_i \boldsymbol{I} - \boldsymbol{A}) \boldsymbol{v}_i = \boldsymbol{0} \label{eq:9-4-7}\end{equation}

Decompose the right-hand side of $\eqref{eq:9-4-4}$ into a component along $\boldsymbol{v}_i$ and a component orthogonal to $\boldsymbol{v}_i$. From 9.3, $\partial \lambda_i = \boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i$. The component along $\boldsymbol{v}_i$ is

\begin{equation}\boldsymbol{v}_i \boldsymbol{v}_i^\top (\partial \boldsymbol{A}) \boldsymbol{v}_i - (\partial \lambda_i) \boldsymbol{v}_i = (\partial \lambda_i) \boldsymbol{v}_i - (\partial \lambda_i) \boldsymbol{v}_i = \boldsymbol{0} \label{eq:9-4-8}\end{equation}

So the right-hand side has only a component orthogonal to $\boldsymbol{v}_i$.

Use the Moore–Penrose pseudoinverse $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+$. Since $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$ is symmetric, its pseudoinverse is also symmetric and acts as an inverse on the orthogonal complement of the null space $\text{span}\{\boldsymbol{v}_i\}$.

Because the right-hand side of $\eqref{eq:9-4-4}$ is orthogonal to $\boldsymbol{v}_i$ by $\eqref{eq:9-4-8}$, it lies in the range of $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$, so no information is lost when we left-multiply both sides by the pseudoinverse $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+$:

\begin{equation}(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\lambda_i \boldsymbol{I} - \boldsymbol{A}) (\partial \boldsymbol{v}_i) = (\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ [(\partial \boldsymbol{A}) \boldsymbol{v}_i - (\partial \lambda_i) \boldsymbol{v}_i] \label{eq:9-4-9}\end{equation}

From $\eqref{eq:9-4-6}$, $\partial \boldsymbol{v}_i \perp \boldsymbol{v}_i$, so $\partial \boldsymbol{v}_i$ is orthogonal to the null space of $(\lambda_i \boldsymbol{I} - \boldsymbol{A})$. Since $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\lambda_i \boldsymbol{I} - \boldsymbol{A})$ is the orthogonal projector onto the orthogonal complement of that null space, the left-hand side of $\eqref{eq:9-4-9}$ reduces to

\begin{equation}(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\lambda_i \boldsymbol{I} - \boldsymbol{A}) (\partial \boldsymbol{v}_i) = \partial \boldsymbol{v}_i \label{eq:9-4-10}\end{equation}

On the right-hand side of $\eqref{eq:9-4-9}$, $(\partial \lambda_i) \boldsymbol{v}_i$ lies in the $\boldsymbol{v}_i$ direction and belongs to the null space, so $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\partial \lambda_i) \boldsymbol{v}_i = \boldsymbol{0}$.

Combining $\eqref{eq:9-4-10}$ and the above gives the final result.

\begin{equation}\partial \boldsymbol{v}_i = (\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ (\partial \boldsymbol{A}) \boldsymbol{v}_i \label{eq:9-4-11}\end{equation}

Remark: Since $\boldsymbol{A}$ is symmetric, the eigenvectors $\{\boldsymbol{v}_j\}$ can be chosen orthonormal, and the pseudoinverse then has the spectral form $(\lambda_i \boldsymbol{I} - \boldsymbol{A})^+ = \sum_{j \neq i} \displaystyle\frac{1}{\lambda_i - \lambda_j} \boldsymbol{v}_j \boldsymbol{v}_j^\top$. This agrees with the result of Nelson's method in 9.14.
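
Both the formula and the spectral expression for the pseudoinverse can be verified numerically. A minimal NumPy sketch follows; the helper eigvec and its sign alignment are ad hoc devices, introduced here only to fix the inherent sign ambiguity of numerically computed eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # random real symmetric matrix
E = rng.standard_normal((n, n))
E = (E + E.T) / 2                       # symmetric perturbation direction
h = 1e-6

lam, V = np.linalg.eigh(A)
i = 2
v = V[:, i]

# Pseudoinverse of (lambda_i I - A): once via np.linalg.pinv, once via the spectral sum.
P = np.linalg.pinv(lam[i] * np.eye(n) - A)
S = sum(np.outer(V[:, j], V[:, j]) / (lam[i] - lam[j]) for j in range(n) if j != i)
assert np.allclose(P, S)

def eigvec(M, idx, ref):
    """idx-th eigenvector of symmetric M, sign-aligned with the reference vector."""
    w = np.linalg.eigh(M)[1][:, idx]
    return w if w @ ref >= 0 else -w

# Central finite differences of v_i along the direction E.
fd = (eigvec(A + h * E, i, v) - eigvec(A - h * E, i, v)) / (2 * h)

assert np.allclose(fd, P @ E @ v, atol=1e-4)   # matches (lambda_i I - A)^+ (dA) v_i
assert abs(v @ fd) < 1e-6                      # consistent with v_i^T (dv_i) = 0
```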
