Proofs Chapter 15: Derivatives of Special Matrices

Differentiation under Symmetry, Positive Definiteness, and Structural Constraints

This chapter proves differentiation formulas for special matrices (symmetric, positive definite, orthogonal, etc.). Optimization under symmetric matrix constraints is essential in structural equation modeling (SEM), factor analysis, and semidefinite programming (SDP), while positive definiteness constraints are directly relevant to hyperparameter learning in Gaussian processes and kernel methods. We derive the derivatives of the trace, determinant, log-determinant, and eigenvalue-based functions by imposing symmetry and other structural constraints (diagonal, Toeplitz) on the general-matrix results.

Prerequisites: Chapter 7 (Derivatives of Determinants), Chapter 8 (Derivatives of Inverse Matrices), Chapter 9 (Derivatives of Eigenvalues). Related chapter: Chapter 13 (Derivatives of Structured Matrices).

15. Derivatives of Special Matrices

Assumptions for This Chapter
Unless otherwise stated, the formulas in this chapter hold under the following conditions:
  • All formulas are based on the denominator layout
  • For derivatives with respect to a symmetric matrix $\boldsymbol{X}$, we apply the transformation formula from Chapter 13: $\boldsymbol{G} + \boldsymbol{G}^\top - \boldsymbol{G} \circ \boldsymbol{I}$
  • When positive definiteness is required, it is noted individually

We apply the symmetric matrix differentiation formula from 13.4 to specific functions. If the general-matrix result is $\boldsymbol{G}$, then for a symmetric matrix the result is $\boldsymbol{G} + \boldsymbol{G}^\top - \boldsymbol{G} \circ \boldsymbol{I}$.
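Before applying the transformation to specific functions, its two immediate consequences can be checked numerically: the constrained gradient is always symmetric, and the diagonal of $\boldsymbol{G}$ is left unchanged. The following minimal NumPy sketch is our own illustration (the helper name `sym_grad` is not from the text):

```python
import numpy as np

def sym_grad(G):
    """Symmetric-constraint gradient: G + G^T - G o I (transformation from 13.4)."""
    return G + G.T - G * np.eye(G.shape[0])

G = np.arange(9.0).reshape(3, 3)             # an arbitrary general-matrix gradient
S = sym_grad(G)
assert np.allclose(S, S.T)                   # the constrained gradient is symmetric
assert np.allclose(np.diag(S), np.diag(G))   # diagonal entries are unchanged
```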

15.1 Trace Derivative for Symmetric Matrices

Formula: $\displaystyle\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{A} + \boldsymbol{A}^\top - \boldsymbol{A} \circ \boldsymbol{I}$
Conditions: $\boldsymbol{X}$ is a symmetric matrix
Proof

We compute the derivative of the trace function $f = \text{tr}(\boldsymbol{A}\boldsymbol{X})$ with respect to a symmetric matrix $\boldsymbol{X}$.

First, recall the trace derivative formula for a general matrix. From 7.1, when $\boldsymbol{X}$ is a general matrix,

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial \boldsymbol{X}} = \boldsymbol{A}^\top \label{eq:15-1-1}\end{equation}

Therefore, the gradient matrix for the general case is

\begin{equation}\boldsymbol{G} = \boldsymbol{A}^\top \label{eq:15-1-2}\end{equation}

When $\boldsymbol{X}$ is symmetric, we apply the transformation formula from 13.4. The derivative with respect to a symmetric matrix is

\begin{equation}\frac{\partial f}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{G} + \boldsymbol{G}^\top - \boldsymbol{G} \circ \boldsymbol{I} \label{eq:15-1-3}\end{equation}

Substituting $\eqref{eq:15-1-2}$ into $\eqref{eq:15-1-3}$:

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{A}^\top + (\boldsymbol{A}^\top)^\top - \boldsymbol{A}^\top \circ \boldsymbol{I} \label{eq:15-1-4}\end{equation}

Since $(\boldsymbol{A}^\top)^\top = \boldsymbol{A}$, equation $\eqref{eq:15-1-4}$ becomes

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{A}^\top + \boldsymbol{A} - \boldsymbol{A}^\top \circ \boldsymbol{I} \label{eq:15-1-5}\end{equation}

We now verify a property of the Hadamard product with the identity matrix. $\boldsymbol{A}^\top \circ \boldsymbol{I}$ is the diagonal matrix retaining only the diagonal entries of $\boldsymbol{A}^\top$.

\begin{equation}(\boldsymbol{A}^\top \circ \boldsymbol{I})_{ij} = (A^\top)_{ij} \cdot \delta_{ij} = A_{ji} \cdot \delta_{ij} \label{eq:15-1-6}\end{equation}

Since $\delta_{ij} = 1$ only when $i = j$,

\begin{equation}(\boldsymbol{A}^\top \circ \boldsymbol{I})_{ii} = A_{ii} \label{eq:15-1-7}\end{equation}

Similarly, the diagonal entries of $\boldsymbol{A} \circ \boldsymbol{I}$ are

\begin{equation}(\boldsymbol{A} \circ \boldsymbol{I})_{ii} = A_{ii} \label{eq:15-1-8}\end{equation}

From $\eqref{eq:15-1-7}$ and $\eqref{eq:15-1-8}$, the diagonal entries coincide; since all off-diagonal entries of both products vanish, the two matrices are equal.

\begin{equation}\boldsymbol{A}^\top \circ \boldsymbol{I} = \boldsymbol{A} \circ \boldsymbol{I} \label{eq:15-1-9}\end{equation}

Substituting $\eqref{eq:15-1-9}$ into $\eqref{eq:15-1-5}$ yields the final result.

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{A} + \boldsymbol{A}^\top - \boldsymbol{A} \circ \boldsymbol{I} \label{eq:15-1-10}\end{equation}

Remark: When $\boldsymbol{A}$ is itself symmetric, $\boldsymbol{A}^\top = \boldsymbol{A}$, so $\eqref{eq:15-1-10}$ simplifies to $2\boldsymbol{A} - \boldsymbol{A} \circ \boldsymbol{I}$.
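As a numerical sanity check (our own addition, not part of the derivation), formula $\eqref{eq:15-1-10}$ can be verified by perturbing the independent entries $X_{ij}$ ($i \le j$) of a symmetric $\boldsymbol{X}$, moving $X_{ij}$ and $X_{ji}$ together; variable names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
X = (X + X.T) / 2                          # make X symmetric

G = A.T                                    # general-matrix gradient of tr(AX)
G_sym = G + G.T - G * np.eye(n)            # symmetric-constraint formula (15.1)

# central-difference check over the independent entries x_ij (i <= j)
h = 1e-6
num = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n))
        E[i, j] = E[j, i] = 1.0            # symmetric perturbation direction
        num[i, j] = num[j, i] = (np.trace(A @ (X + h * E))
                                 - np.trace(A @ (X - h * E))) / (2 * h)

assert np.allclose(num, G_sym, atol=1e-6)
```

Because $\text{tr}(\boldsymbol{A}\boldsymbol{X})$ is linear in $\boldsymbol{X}$, the central difference agrees with the formula up to rounding error.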

15.2 Determinant Derivative for Symmetric Matrices

Formula: $\displaystyle\frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = |\boldsymbol{X}|\left(2\boldsymbol{X}^{-1} - \boldsymbol{X}^{-1} \circ \boldsymbol{I}\right)$
Conditions: $\boldsymbol{X}$ is a symmetric nonsingular matrix
Proof

We compute the derivative of the determinant $f = |\boldsymbol{X}|$ with respect to a symmetric nonsingular matrix $\boldsymbol{X}$.

First, recall the determinant derivative formula for a general matrix. From 8.1, when $\boldsymbol{X}$ is a general nonsingular matrix,

\begin{equation}\frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}} = |\boldsymbol{X}| (\boldsymbol{X}^{-1})^\top \label{eq:15-2-1}\end{equation}

Therefore, the gradient matrix for the general case is

\begin{equation}\boldsymbol{G} = |\boldsymbol{X}| (\boldsymbol{X}^{-1})^\top \label{eq:15-2-2}\end{equation}

When $\boldsymbol{X}$ is symmetric, we verify a property of the inverse. The inverse of a symmetric matrix is also symmetric.

\begin{equation}\boldsymbol{X}^\top = \boldsymbol{X} \implies (\boldsymbol{X}^{-1})^\top = \boldsymbol{X}^{-1} \label{eq:15-2-3}\end{equation}

Proof of $\eqref{eq:15-2-3}$: Taking the transpose of both sides of $\boldsymbol{X} \boldsymbol{X}^{-1} = \boldsymbol{I}$,

\begin{equation}(\boldsymbol{X}^{-1})^\top \boldsymbol{X}^\top = \boldsymbol{I} \label{eq:15-2-4}\end{equation}

Substituting $\boldsymbol{X}^\top = \boldsymbol{X}$,

\begin{equation}(\boldsymbol{X}^{-1})^\top \boldsymbol{X} = \boldsymbol{I} \label{eq:15-2-5}\end{equation}

Multiplying both sides of $\eqref{eq:15-2-5}$ on the right by $\boldsymbol{X}^{-1}$,

\begin{equation}(\boldsymbol{X}^{-1})^\top = \boldsymbol{X}^{-1} \label{eq:15-2-6}\end{equation}

Substituting $\eqref{eq:15-2-6}$ into $\eqref{eq:15-2-2}$,

\begin{equation}\boldsymbol{G} = |\boldsymbol{X}| \boldsymbol{X}^{-1} \label{eq:15-2-7}\end{equation}

When $\boldsymbol{X}$ is symmetric, we apply the transformation formula from 13.4.

\begin{equation}\frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{G} + \boldsymbol{G}^\top - \boldsymbol{G} \circ \boldsymbol{I} \label{eq:15-2-8}\end{equation}

We now substitute $\eqref{eq:15-2-7}$ into $\eqref{eq:15-2-8}$. Since $\boldsymbol{G}^\top = (|\boldsymbol{X}| \boldsymbol{X}^{-1})^\top = |\boldsymbol{X}| (\boldsymbol{X}^{-1})^\top$ and, by $\eqref{eq:15-2-6}$, $(\boldsymbol{X}^{-1})^\top = \boldsymbol{X}^{-1}$,

\begin{equation}\boldsymbol{G}^\top = |\boldsymbol{X}| \boldsymbol{X}^{-1} = \boldsymbol{G} \label{eq:15-2-9}\end{equation}

Substituting $\eqref{eq:15-2-9}$ into $\eqref{eq:15-2-8}$:

\begin{equation}\frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{G} + \boldsymbol{G} - \boldsymbol{G} \circ \boldsymbol{I} = 2\boldsymbol{G} - \boldsymbol{G} \circ \boldsymbol{I} \label{eq:15-2-10}\end{equation}

Substituting $\boldsymbol{G}$ from $\eqref{eq:15-2-7}$,

\begin{equation}\frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = 2|\boldsymbol{X}| \boldsymbol{X}^{-1} - |\boldsymbol{X}| \boldsymbol{X}^{-1} \circ \boldsymbol{I} \label{eq:15-2-11}\end{equation}

Factoring out $|\boldsymbol{X}|$ yields the final result.

\begin{equation}\frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = |\boldsymbol{X}|\left(2\boldsymbol{X}^{-1} - \boldsymbol{X}^{-1} \circ \boldsymbol{I}\right) \label{eq:15-2-12}\end{equation}
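Formula $\eqref{eq:15-2-12}$ can likewise be checked numerically (our own sketch, not from the text); we take $\boldsymbol{X}$ symmetric positive definite so that nonsingularity is guaranteed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)                   # symmetric and safely nonsingular

detX = np.linalg.det(X)
Xinv = np.linalg.inv(X)
G_sym = detX * (2 * Xinv - Xinv * np.eye(n))  # formula (15.2)

# central-difference check with symmetric perturbation directions
h = 1e-5
num = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n))
        E[i, j] = E[j, i] = 1.0
        num[i, j] = num[j, i] = (np.linalg.det(X + h * E)
                                 - np.linalg.det(X - h * E)) / (2 * h)

assert np.allclose(num, G_sym, rtol=1e-5, atol=1e-6)
```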

15.3 Log-Determinant Derivative for Symmetric Matrices

Formula: $\displaystyle\frac{\partial \log|\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = 2\boldsymbol{X}^{-1} - \boldsymbol{X}^{-1} \circ \boldsymbol{I}$
Conditions: $\boldsymbol{X}$ is a symmetric positive definite matrix
Proof

We compute the derivative of the log-determinant $f = \log|\boldsymbol{X}|$ with respect to a symmetric positive definite matrix $\boldsymbol{X}$.

First, recall the log-determinant derivative formula for a general matrix. From 8.2, when $\boldsymbol{X}$ is a general matrix with $|\boldsymbol{X}| > 0$,

\begin{equation}\frac{\partial \log|\boldsymbol{X}|}{\partial \boldsymbol{X}} = (\boldsymbol{X}^{-1})^\top \label{eq:15-3-1}\end{equation}

Therefore, the gradient matrix for the general case is

\begin{equation}\boldsymbol{G} = (\boldsymbol{X}^{-1})^\top \label{eq:15-3-2}\end{equation}

When $\boldsymbol{X}$ is symmetric, by $\eqref{eq:15-2-6}$ from 15.2,

\begin{equation}(\boldsymbol{X}^{-1})^\top = \boldsymbol{X}^{-1} \label{eq:15-3-3}\end{equation}

Substituting $\eqref{eq:15-3-3}$ into $\eqref{eq:15-3-2}$,

\begin{equation}\boldsymbol{G} = \boldsymbol{X}^{-1} \label{eq:15-3-4}\end{equation}

When $\boldsymbol{X}$ is symmetric, we apply the transformation formula from 13.4.

\begin{equation}\frac{\partial \log|\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{G} + \boldsymbol{G}^\top - \boldsymbol{G} \circ \boldsymbol{I} \label{eq:15-3-5}\end{equation}

Since $\boldsymbol{G} = \boldsymbol{X}^{-1}$ is symmetric, $\boldsymbol{G}^\top = \boldsymbol{G}$.

\begin{equation}\boldsymbol{G}^\top = \boldsymbol{X}^{-1} = \boldsymbol{G} \label{eq:15-3-6}\end{equation}

Substituting $\eqref{eq:15-3-4}$ and $\eqref{eq:15-3-6}$ into $\eqref{eq:15-3-5}$:

\begin{equation}\frac{\partial \log|\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = \boldsymbol{X}^{-1} + \boldsymbol{X}^{-1} - \boldsymbol{X}^{-1} \circ \boldsymbol{I} \label{eq:15-3-7}\end{equation}

Simplifying $\eqref{eq:15-3-7}$ yields the final result.

\begin{equation}\frac{\partial \log|\boldsymbol{X}|}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{sym}} = 2\boldsymbol{X}^{-1} - \boldsymbol{X}^{-1} \circ \boldsymbol{I} \label{eq:15-3-8}\end{equation}

Remark: Equation $\eqref{eq:15-3-8}$ is $\eqref{eq:15-2-12}$ from 15.2 divided by $|\boldsymbol{X}|$. This can also be verified via the chain rule $\displaystyle\frac{\partial \log|\boldsymbol{X}|}{\partial \boldsymbol{X}} = \displaystyle\frac{1}{|\boldsymbol{X}|}\displaystyle\frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}}$.
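The log-determinant case can be checked the same way (an illustrative sketch of ours, not from the text), using a symmetric positive definite $\boldsymbol{X}$ and `np.linalg.slogdet` for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)              # symmetric positive definite

Xinv = np.linalg.inv(X)
G_sym = 2 * Xinv - Xinv * np.eye(n)      # formula (15.3)

# central-difference check with symmetric perturbation directions
h = 1e-6
num = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n))
        E[i, j] = E[j, i] = 1.0
        num[i, j] = num[j, i] = (np.linalg.slogdet(X + h * E)[1]
                                 - np.linalg.slogdet(X - h * E)[1]) / (2 * h)

assert np.allclose(num, G_sym, atol=1e-5)
```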

15.4 Trace Derivative for Diagonal Matrices

Formula: $\displaystyle\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{diag}} = \boldsymbol{A} \circ \boldsymbol{I}$
Conditions: $\boldsymbol{X} = \text{diag}(x_0, \ldots, x_{n-1})$ is a diagonal matrix
Proof

We compute the derivative of the trace function $f = \text{tr}(\boldsymbol{A}\boldsymbol{X})$ with respect to a diagonal matrix $\boldsymbol{X} = \text{diag}(x_0, \ldots, x_{n-1})$.

First, expand $\text{tr}(\boldsymbol{A}\boldsymbol{X})$ in components. By the definition of the trace, we compute the sum of the diagonal entries of $\boldsymbol{AX}$.

\begin{equation}\text{tr}(\boldsymbol{A}\boldsymbol{X}) = \sum_{i=0}^{n-1} (\boldsymbol{A}\boldsymbol{X})_{ii} \label{eq:15-4-1a}\end{equation}

By the definition of matrix multiplication, expand $(\boldsymbol{AX})_{ii}$.

\begin{equation}(\boldsymbol{A}\boldsymbol{X})_{ii} = \sum_{k=0}^{n-1} A_{ik} X_{ki} \label{eq:15-4-1b}\end{equation}

Substituting $\eqref{eq:15-4-1b}$ into $\eqref{eq:15-4-1a}$:

\begin{equation}\text{tr}(\boldsymbol{A}\boldsymbol{X}) = \sum_{i=0}^{n-1} \sum_{k=0}^{n-1} A_{ik} X_{ki} \label{eq:15-4-1}\end{equation}

When $\boldsymbol{X}$ is diagonal, all off-diagonal entries are zero. That is,

\begin{equation}X_{ki} = \begin{cases} x_k & (k = i) \\ 0 & (k \neq i) \end{cases} \label{eq:15-4-2}\end{equation}

where $x_k = X_{kk}$ is the $k$-th diagonal entry of $\boldsymbol{X}$. Using the Kronecker delta $\delta_{ki}$, equation $\eqref{eq:15-4-2}$ can be written as $X_{ki} = x_k \delta_{ki}$.

Substituting $\eqref{eq:15-4-2}$ into $\eqref{eq:15-4-1}$. Since $X_{ki} = 0$ when $k \neq i$, only the $k = i$ terms survive.

\begin{equation}\text{tr}(\boldsymbol{A}\boldsymbol{X}) = \sum_{i=0}^{n-1} \sum_{k=0}^{n-1} A_{ik} x_k \delta_{ki} = \sum_{i=0}^{n-1} A_{ii} x_i \label{eq:15-4-3}\end{equation}

The last equality uses the fact that $\delta_{ki} = 1$ only when $k = i$.

Differentiating $\eqref{eq:15-4-3}$ with respect to the independent variable $x_j$ ($j = 0, \ldots, n-1$):

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial x_j} = \frac{\partial}{\partial x_j} \sum_{i=0}^{n-1} A_{ii} x_i \label{eq:15-4-4}\end{equation}

Since $A_{ii}$ is a constant and $\frac{\partial x_i}{\partial x_j} = \delta_{ij}$,

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial x_j} = \sum_{i=0}^{n-1} A_{ii} \delta_{ij} = A_{jj} \label{eq:15-4-5}\end{equation}

Expressing $\eqref{eq:15-4-5}$ in matrix form. The derivative with respect to a diagonal matrix $\boldsymbol{X}$ is represented as the diagonal matrix corresponding to the independent variables $x_0, \ldots, x_{n-1}$.

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{diag}} = \text{diag}(A_{00}, A_{11}, \ldots, A_{n-1,n-1}) \label{eq:15-4-6}\end{equation}

The right-hand side is the matrix consisting of only the diagonal entries of $\boldsymbol{A}$. This can be expressed using the Hadamard product.

\begin{equation}\text{diag}(A_{00}, A_{11}, \ldots, A_{n-1,n-1}) = \boldsymbol{A} \circ \boldsymbol{I} \label{eq:15-4-7}\end{equation}

Substituting $\eqref{eq:15-4-7}$ into $\eqref{eq:15-4-6}$ yields the final result.

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{X})}{\partial \boldsymbol{X}}\bigg|_{\boldsymbol{X}: \text{diag}} = \boldsymbol{A} \circ \boldsymbol{I} \label{eq:15-4-8}\end{equation}

Remark: $\boldsymbol{A} \circ \boldsymbol{I}$ is the Hadamard product (element-wise product), which extracts only the diagonal entries of $\boldsymbol{A}$. In component form, $(\boldsymbol{A} \circ \boldsymbol{I})_{ij} = A_{ij} \delta_{ij}$.
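As a brief numerical illustration (our own, not part of the proof), the gradient of $\text{tr}(\boldsymbol{A}\boldsymbol{X})$ with respect to the diagonal entries $x_j$ indeed reproduces $\text{diag}(A_{00}, \ldots, A_{n-1,n-1})$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# formula (15.4): the gradient entries are the diagonal entries of A
grad = np.diag(A)

# central-difference check over the independent variables x_j
h = 1e-6
num = np.array([(np.trace(A @ np.diag(x + h * e))
                 - np.trace(A @ np.diag(x - h * e))) / (2 * h)
                for e in np.eye(n)])

assert np.allclose(num, grad, atol=1e-6)
```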

15.5 Trace Derivative for Toeplitz Matrices

Formula: $\displaystyle\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{T})}{\partial \boldsymbol{T}}\bigg|_{\boldsymbol{T}: \text{Toeplitz}} = \boldsymbol{\alpha}(\boldsymbol{A})$
where $\alpha(\boldsymbol{A})_{ij} = \sum_{k-l=i-j} A_{lk}$ (diagonal element sums of $\boldsymbol{A}^\top$)
Conditions: $\boldsymbol{T}$ is a Toeplitz matrix ($T_{ij} = t_{i-j}$)
Proof

We compute the derivative of the trace function $f = \text{tr}(\boldsymbol{A}\boldsymbol{T})$ with respect to a Toeplitz matrix $\boldsymbol{T}$.

A Toeplitz matrix is a matrix whose entries along each diagonal are all equal. That is,

\begin{equation}T_{ij} = t_{i-j} \label{eq:15-5-1}\end{equation}

where $t_k$ ($k = -(n-1), \ldots, -1, 0, 1, \ldots, n-1$) are independent parameters. For example, when $n = 3$,

\begin{equation}\boldsymbol{T} = \begin{pmatrix} t_0 & t_{-1} & t_{-2} \\ t_1 & t_0 & t_{-1} \\ t_2 & t_1 & t_0 \end{pmatrix} \label{eq:15-5-2}\end{equation}

Expanding $\text{tr}(\boldsymbol{A}\boldsymbol{T})$ in components. By the definition of the trace,

\begin{equation}\text{tr}(\boldsymbol{A}\boldsymbol{T}) = \sum_{i=0}^{n-1} (\boldsymbol{A}\boldsymbol{T})_{ii} \label{eq:15-5-3a}\end{equation}

By the definition of matrix multiplication, expand $(\boldsymbol{AT})_{ii}$.

\begin{equation}(\boldsymbol{A}\boldsymbol{T})_{ii} = \sum_{j=0}^{n-1} A_{ij} T_{ji} \label{eq:15-5-3b}\end{equation}

Substituting $\eqref{eq:15-5-3b}$ into $\eqref{eq:15-5-3a}$:

\begin{equation}\text{tr}(\boldsymbol{A}\boldsymbol{T}) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} A_{ij} T_{ji} \label{eq:15-5-3}\end{equation}

By $\eqref{eq:15-5-1}$, the $(j, i)$ entry of the Toeplitz matrix is $T_{ji} = t_{j-i}$. Substituting into $\eqref{eq:15-5-3}$:

\begin{equation}\text{tr}(\boldsymbol{A}\boldsymbol{T}) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} A_{ij} t_{j-i} \label{eq:15-5-4}\end{equation}

Differentiating $\eqref{eq:15-5-4}$ with respect to the independent variable $t_k$. The terms contributing to $t_k$ are those satisfying $j - i = k$.

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{T})}{\partial t_k} = \sum_{\substack{i, j \\ j - i = k}} A_{ij} \label{eq:15-5-5}\end{equation}

Interpreting the right-hand side of $\eqref{eq:15-5-5}$: it sums the entries $A_{ij}$ satisfying $j - i = k$. This is the sum of elements along the $k$-th diagonal of $\boldsymbol{A}$.

Writing out the diagonals of $\boldsymbol{A}$ explicitly. For $k = 0$ (main diagonal),

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{T})}{\partial t_0} = \sum_{i=0}^{n-1} A_{ii} \label{eq:15-5-6}\end{equation}

For $k > 0$ (superdiagonals of $\boldsymbol{A}$),

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{T})}{\partial t_k} = \sum_{i=0}^{n-k-1} A_{i, i+k} \label{eq:15-5-7}\end{equation}

For $k < 0$ (subdiagonals of $\boldsymbol{A}$),

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{T})}{\partial t_k} = \sum_{j=0}^{n+k-1} A_{j-k, j} \label{eq:15-5-8}\end{equation}

We now express the result as the matrix $\boldsymbol{\alpha}(\boldsymbol{A})$, whose $(i, j)$ entry is

\begin{equation}\alpha(\boldsymbol{A})_{ij} = \sum_{k-l=i-j} A_{lk} = \frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{T})}{\partial t_{i-j}} \label{eq:15-5-9}\end{equation}

This is the sum of elements along the $(i-j)$-th diagonal of $\boldsymbol{A}^\top$.

Therefore, we obtain the final result.

\begin{equation}\frac{\partial \text{tr}(\boldsymbol{A}\boldsymbol{T})}{\partial \boldsymbol{T}}\bigg|_{\boldsymbol{T}: \text{Toeplitz}} = \boldsymbol{\alpha}(\boldsymbol{A}) \label{eq:15-5-10}\end{equation}

Remark: $\boldsymbol{\alpha}(\boldsymbol{A})$ itself has Toeplitz structure. For a symmetric Toeplitz matrix, the result becomes $\boldsymbol{\alpha}(\boldsymbol{A}) + \boldsymbol{\alpha}(\boldsymbol{A})^\top - \boldsymbol{\alpha}(\boldsymbol{A}) \circ \boldsymbol{I}$, analogous to 13.4.
Source: Toeplitz matrices originate from O. Toeplitz (1911) "Zur Theorie der quadratischen und bilinearen Formen von unendlichvielen Veränderlichen", Mathematische Annalen 70, 351–376.
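The map $\boldsymbol{\alpha}(\boldsymbol{A})$ can be built with `np.trace(A, offset=k)`, which sums the entries $A_{i,i+k}$, exactly the diagonal sums in $\eqref{eq:15-5-6}$–$\eqref{eq:15-5-8}$. The sketch below (our own illustration; helper names are not from the text) checks $\eqref{eq:15-5-5}$ by finite differences over the $2n-1$ parameters $t_k$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))

# alpha(A): entry (i, j) is the sum along the (i-j)-th diagonal, eq. (15-5-9)
alpha = np.array([[np.trace(A, offset=i - j) for j in range(n)]
                  for i in range(n)])

def toeplitz_from(t):
    """Build T with T_ij = t_{i-j}; t[k + n - 1] stores t_k, k = -(n-1)..n-1."""
    return np.array([[t[(i - j) + n - 1] for j in range(n)] for i in range(n)])

# central-difference check of d tr(AT) / d t_k against the diagonal sums
t = rng.standard_normal(2 * n - 1)
h = 1e-6
for k in range(-(n - 1), n):
    e = np.zeros_like(t)
    e[k + n - 1] = 1.0
    num = (np.trace(A @ toeplitz_from(t + h * e))
           - np.trace(A @ toeplitz_from(t - h * e))) / (2 * h)
    assert np.isclose(num, np.trace(A, offset=k), atol=1e-6)

assert np.allclose(alpha[1, 0], np.trace(A, offset=1))  # alpha has Toeplitz entries
```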

15.6 Derivative of the Condition Number

Formula: $\displaystyle\frac{\partial c(\boldsymbol{A})}{\partial \boldsymbol{A}} = \displaystyle\frac{1}{\lambda_{\min}}\boldsymbol{v}_{\max}\boldsymbol{v}_{\max}^\top - \displaystyle\frac{c(\boldsymbol{A})}{\lambda_{\min}}\boldsymbol{v}_{\min}\boldsymbol{v}_{\min}^\top$
Conditions: $\boldsymbol{A}$ is a symmetric positive definite matrix, $c(\boldsymbol{A}) = \lambda_{\max}/\lambda_{\min}$, $\boldsymbol{v}_{\max}$ and $\boldsymbol{v}_{\min}$ are the corresponding normalized eigenvectors
Proof

We compute the derivative of the condition number $c(\boldsymbol{A}) = \lambda_{\max} / \lambda_{\min}$ of a symmetric positive definite matrix $\boldsymbol{A}$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $\boldsymbol{A}$, respectively.

Decompose the condition number into numerator and denominator.

\begin{equation}c(\boldsymbol{A}) = \frac{\lambda_{\max}}{\lambda_{\min}} \label{eq:15-6-1}\end{equation}

Apply the quotient rule (1.28). Setting $u = \lambda_{\max}$ and $v = \lambda_{\min}$,

\begin{equation}\frac{\partial c}{\partial \boldsymbol{A}} = \frac{\partial}{\partial \boldsymbol{A}} \left( \frac{u}{v} \right) = \frac{1}{v} \frac{\partial u}{\partial \boldsymbol{A}} - \frac{u}{v^2} \frac{\partial v}{\partial \boldsymbol{A}} \label{eq:15-6-2}\end{equation}

Substituting $u = \lambda_{\max}$ and $v = \lambda_{\min}$ into $\eqref{eq:15-6-2}$:

\begin{equation}\frac{\partial c}{\partial \boldsymbol{A}} = \frac{1}{\lambda_{\min}} \frac{\partial \lambda_{\max}}{\partial \boldsymbol{A}} - \frac{\lambda_{\max}}{\lambda_{\min}^2} \frac{\partial \lambda_{\min}}{\partial \boldsymbol{A}} \label{eq:15-6-3}\end{equation}

Apply the eigenvalue derivative formula. From 9.3, for a simple eigenvalue $\lambda_i$ of a symmetric matrix $\boldsymbol{A}$,

\begin{equation}\frac{\partial \lambda_i}{\partial \boldsymbol{A}} = \boldsymbol{v}_i \boldsymbol{v}_i^\top \label{eq:15-6-4}\end{equation}

where $\boldsymbol{v}_i$ is the normalized eigenvector corresponding to $\lambda_i$ ($\|\boldsymbol{v}_i\| = 1$).

Applying $\eqref{eq:15-6-4}$ to $\lambda_{\max}$ and $\lambda_{\min}$:

\begin{equation}\frac{\partial \lambda_{\max}}{\partial \boldsymbol{A}} = \boldsymbol{v}_{\max} \boldsymbol{v}_{\max}^\top \label{eq:15-6-5}\end{equation}

\begin{equation}\frac{\partial \lambda_{\min}}{\partial \boldsymbol{A}} = \boldsymbol{v}_{\min} \boldsymbol{v}_{\min}^\top \label{eq:15-6-6}\end{equation}

Substituting $\eqref{eq:15-6-5}$ and $\eqref{eq:15-6-6}$ into $\eqref{eq:15-6-3}$:

\begin{equation}\frac{\partial c}{\partial \boldsymbol{A}} = \frac{1}{\lambda_{\min}} \boldsymbol{v}_{\max} \boldsymbol{v}_{\max}^\top - \frac{\lambda_{\max}}{\lambda_{\min}^2} \boldsymbol{v}_{\min} \boldsymbol{v}_{\min}^\top \label{eq:15-6-7}\end{equation}

Simplify the second term. From $\eqref{eq:15-6-1}$, $c = \lambda_{\max} / \lambda_{\min}$, so

\begin{equation}\frac{\lambda_{\max}}{\lambda_{\min}^2} = \frac{\lambda_{\max}}{\lambda_{\min}} \cdot \frac{1}{\lambda_{\min}} = \frac{c}{\lambda_{\min}} \label{eq:15-6-8}\end{equation}

Substituting $\eqref{eq:15-6-8}$ into $\eqref{eq:15-6-7}$ yields the final result.

\begin{equation}\frac{\partial c(\boldsymbol{A})}{\partial \boldsymbol{A}} = \frac{1}{\lambda_{\min}} \boldsymbol{v}_{\max} \boldsymbol{v}_{\max}^\top - \frac{c(\boldsymbol{A})}{\lambda_{\min}} \boldsymbol{v}_{\min} \boldsymbol{v}_{\min}^\top \label{eq:15-6-9}\end{equation}

Remark: This holds when $\lambda_{\max}$ and $\lambda_{\min}$ are simple eigenvalues. The first term in $\eqref{eq:15-6-9}$ is positive semidefinite ($\boldsymbol{v}_{\max} \boldsymbol{v}_{\max}^\top$ is positive semidefinite and $\lambda_{\min} > 0$), while the second term is negative semidefinite. This means that perturbing $\boldsymbol{A}$ in the direction $\boldsymbol{v}_{\max}\boldsymbol{v}_{\max}^\top$ increases the condition number, while perturbing in the direction $\boldsymbol{v}_{\min}\boldsymbol{v}_{\min}^\top$ decreases it.
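Since the gradient in $\eqref{eq:15-6-9}$ pairs with a symmetric perturbation $\boldsymbol{E}$ through the Frobenius inner product, the directional derivative $\langle \partial c / \partial \boldsymbol{A}, \boldsymbol{E} \rangle$ can be compared against a finite difference of $c(\boldsymbol{A} + h\boldsymbol{E})$. The following is our own numerical sketch (function name `cond_and_grad` is illustrative), relying on `np.linalg.eigh` returning eigenvalues in ascending order:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # symmetric positive definite

def cond_and_grad(A):
    w, V = np.linalg.eigh(A)       # ascending eigenvalues, orthonormal eigenvectors
    lmin, lmax = w[0], w[-1]
    vmin, vmax = V[:, 0], V[:, -1]
    c = lmax / lmin
    grad = np.outer(vmax, vmax) / lmin - (c / lmin) * np.outer(vmin, vmin)
    return c, grad

c, grad = cond_and_grad(A)

# directional central-difference check along a random symmetric direction E
E = rng.standard_normal((n, n))
E = (E + E.T) / 2
h = 1e-6
num = (cond_and_grad(A + h * E)[0] - cond_and_grad(A - h * E)[0]) / (2 * h)

assert np.isclose(num, np.sum(grad * E), rtol=1e-4)
```

For a generic random $\boldsymbol{A}$ the extreme eigenvalues are simple, so the simple-eigenvalue assumption of $\eqref{eq:15-6-4}$ holds.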

References

  • Petersen, K. B., & Pedersen, M. S. (2012). The Matrix Cookbook. Technical University of Denmark.
  • Magnus, J. R., & Neudecker, H. (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics (Revised ed.). Wiley.
  • Matrix calculus - Wikipedia