Proofs Chapter 7: Determinant Derivatives
This chapter proves the matrix derivatives of the determinant det(X) and the log-determinant log det(X). Determinant derivatives are required in core areas of statistics and machine learning, including optimization of the log-likelihood of multivariate normal distributions (covariance matrix estimation), hyperparameter learning in Gaussian process regression, and computation of marginal likelihoods in Bayesian inference. The proofs are based on cofactor expansion and Jacobi's formula for the derivative of a determinant, deriving matrix-form results from component-wise calculations.
Prerequisites: Chapter 4 (Basic Matrix Derivative Formulas), Chapter 5 (Trace Derivatives). Chapters that use results from this chapter: Chapter 8 (Inverse Matrix Derivatives), Chapter 9 (Eigenvalue Derivatives).
Derivatives of Determinants
Unless otherwise stated, the formulas in this chapter hold under the following conditions:
- The matrix $\boldsymbol{X}$ is invertible ($\det \boldsymbol{X} \neq 0$), and the derivatives are defined on the open set $\{\boldsymbol{X} \in \mathbb{R}^{N \times N} : \det \boldsymbol{X} \neq 0\}$ where invertibility is preserved
- For formulas involving non-square matrices, a full rank condition ($\mathrm{rank}(\boldsymbol{X}) = \min(M, N)$) is required
- All formulas are based on the denominator layout
7.1 Derivative of the Determinant $|\boldsymbol{X}|$
Proof
We recall the cofactor expansion of a determinant (A.13). The determinant can be expanded along any row $i$ as the sum of products of row elements and their corresponding cofactors.
\begin{equation}|\boldsymbol{X}| = \sum_{j=0}^{N-1} X_{ij} \tilde{X}_{ij}\label{eq:7-1-1}\end{equation}
Here $\tilde{X}_{ij}$ is the $(i, j)$ cofactor of $\boldsymbol{X}$.
We recall the definition of a cofactor. $\tilde{X}_{ij}$ is the determinant of the $(N-1) \times (N-1)$ submatrix obtained by removing the $i$-th row and $j$-th column from $\boldsymbol{X}$ (the minor), multiplied by the sign $(-1)^{i+j}$.
\begin{equation}\tilde{X}_{ij} = (-1)^{i+j} M_{ij}\label{eq:7-1-2}\end{equation}
Here $M_{ij}$ is the $(i, j)$ minor.
A key property is that the cofactor $\tilde{X}_{ij}$ does not involve $X_{ij}$ itself. This is because the minor $M_{ij}$ is computed by excluding the $i$-th row and $j$-th column.
We differentiate the determinant $|\boldsymbol{X}|$ with respect to the component $X_{ij}$. Using the cofactor expansion \eqref{eq:7-1-1}, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial X_{ij}} |\boldsymbol{X}| = \dfrac{\partial}{\partial X_{ij}} \sum_{k=0}^{N-1} X_{ik} \tilde{X}_{ik}\label{eq:7-1-3}\end{equation}
We decompose the derivative of the sum into the sum of derivatives of each term.
\begin{equation}\dfrac{\partial}{\partial X_{ij}} \sum_{k=0}^{N-1} X_{ik} \tilde{X}_{ik} = \sum_{k=0}^{N-1} \dfrac{\partial}{\partial X_{ij}} (X_{ik} \tilde{X}_{ik})\label{eq:7-1-4}\end{equation}
We compute the derivative of each term. Since the cofactor $\tilde{X}_{ik}$ does not contain $X_{ij}$, it can be treated as a constant. Therefore the following holds.
\begin{equation}\dfrac{\partial}{\partial X_{ij}} (X_{ik} \tilde{X}_{ik}) = \tilde{X}_{ik} \dfrac{\partial X_{ik}}{\partial X_{ij}}\label{eq:7-1-5}\end{equation}
We compute the partial derivative of the component. $X_{ik}$ equals $X_{ij}$ only when $k = j$.
\begin{equation}\dfrac{\partial X_{ik}}{\partial X_{ij}} = \delta_{kj}\label{eq:7-1-6}\end{equation}
Here $\delta_{kj}$ is the Kronecker delta, which equals 1 when $k = j$ and 0 otherwise.
Combining \eqref{eq:7-1-5} and \eqref{eq:7-1-6}, we obtain the following.
\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} \delta_{kj} = \tilde{X}_{ij}\label{eq:7-1-7}\end{equation}
Since $\delta_{kj} = 1$ only when $k = j$, only the $k = j$ term survives in the sum.
Thus, the component-wise derivative of the determinant equals the cofactor.
\begin{equation}\dfrac{\partial}{\partial X_{ij}} |\boldsymbol{X}| = \tilde{X}_{ij}\label{eq:7-1-8}\end{equation}
We consider the matrix derivative in the denominator layout. In the denominator layout, the $(i, j)$ component of the result corresponds to $\dfrac{\partial}{\partial X_{ij}}$.
\begin{equation}\left( \dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| \right)_{ij} = \dfrac{\partial}{\partial X_{ij}} |\boldsymbol{X}| = \tilde{X}_{ij}\label{eq:7-1-9}\end{equation}
We write this in matrix form. We define the adjugate matrix $\text{adj}(\boldsymbol{X})$. The $(i, j)$ component of $\text{adj}(\boldsymbol{X})$ is $\tilde{X}_{ji}$ (note the swapped indices).
Writing the result of \eqref{eq:7-1-9} as a matrix, since the $(i, j)$ component is $\tilde{X}_{ij}$, this is the transpose of $\text{adj}(\boldsymbol{X})$, namely $\text{adj}(\boldsymbol{X})^\top$.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = \text{adj}(\boldsymbol{X})^\top\label{eq:7-1-10}\end{equation}
We rewrite using the relationship between the inverse matrix and the adjugate matrix. For an invertible matrix $\boldsymbol{X}$, the following relation holds.
\begin{equation}\boldsymbol{X}^{-1} = \dfrac{1}{|\boldsymbol{X}|} \text{adj}(\boldsymbol{X})\label{eq:7-1-11}\end{equation}
Solving \eqref{eq:7-1-11} for $\text{adj}(\boldsymbol{X})$, we obtain the following.
\begin{equation}\text{adj}(\boldsymbol{X}) = |\boldsymbol{X}| \boldsymbol{X}^{-1}\label{eq:7-1-12}\end{equation}
Taking the transpose of both sides, we obtain the following.
\begin{equation}\text{adj}(\boldsymbol{X})^\top = (|\boldsymbol{X}| \boldsymbol{X}^{-1})^\top\label{eq:7-1-13}\end{equation}
The scalar $|\boldsymbol{X}|$ is unchanged by transposition. Also, the transpose of a product is $(AB)^\top = B^\top A^\top$, but since this is a product with a scalar, the order does not matter. Therefore the following holds.
\begin{equation}\text{adj}(\boldsymbol{X})^\top = |\boldsymbol{X}| (\boldsymbol{X}^{-1})^\top = |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-1-14}\end{equation}
Combining \eqref{eq:7-1-10} and \eqref{eq:7-1-14}, we obtain the final result.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-1-15}\end{equation}
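The result \eqref{eq:7-1-15} can be checked numerically against finite differences. The following sketch (assuming NumPy; the random seed, test matrix, and tolerance are arbitrary choices, not part of the proof) uses the fact that the determinant is linear in each individual entry, so a central difference in a single entry is exact up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
# Shift by N*I so the matrix is comfortably invertible
X = rng.standard_normal((N, N)) + N * np.eye(N)

# Closed-form gradient in the denominator layout: |X| X^{-T}
grad = np.linalg.det(X) * np.linalg.inv(X).T

# Central-difference approximation of each component d|X|/dX_ij
eps = 1e-6
num = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * eps)

max_err = np.abs(grad - num).max()
```

Each entry of `num` is the component-wise derivative \eqref{eq:7-1-8}, so `num` should match `grad` to within rounding error.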
7.2 Derivative of the Log-Determinant $\log|\boldsymbol{X}|$
Proof
The log-determinant is a composite function $\log(|\boldsymbol{X}|)$, where we assume $|\boldsymbol{X}| > 0$ so that the logarithm is defined. The outer function is $\log$ and the inner function is $|\boldsymbol{X}|$.
We apply the chain rule for composite functions. The matrix derivative of a scalar function $f(g(\boldsymbol{X}))$ is as follows.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} f(g(\boldsymbol{X})) = f'(g(\boldsymbol{X})) \dfrac{\partial g}{\partial \boldsymbol{X}}\label{eq:7-2-1}\end{equation}
In this problem, $f(u) = \log(u)$ and $g(\boldsymbol{X}) = |\boldsymbol{X}|$.
Computing the derivative of the outer function $f(u) = \log(u)$, we obtain the following.
\begin{equation}f'(u) = \dfrac{1}{u}\label{eq:7-2-2}\end{equation}
Substituting $u = g(\boldsymbol{X}) = |\boldsymbol{X}|$, we obtain the following.
\begin{equation}f'(g(\boldsymbol{X})) = f'(|\boldsymbol{X}|) = \dfrac{1}{|\boldsymbol{X}|}\label{eq:7-2-3}\end{equation}
The derivative of the inner function is given by 7.1 as follows.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-2-4}\end{equation}
Substituting \eqref{eq:7-2-3} and \eqref{eq:7-2-4} into the chain rule \eqref{eq:7-2-1}, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} \log|\boldsymbol{X}| = \dfrac{1}{|\boldsymbol{X}|} \cdot |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-2-5}\end{equation}
$\dfrac{1}{|\boldsymbol{X}|}$ and $|\boldsymbol{X}|$ cancel, yielding the final result.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} \log|\boldsymbol{X}| = \boldsymbol{X}^{-\top}\label{eq:7-2-6}\end{equation}
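As a numerical sanity check of \eqref{eq:7-2-6} (a sketch assuming NumPy; the seed and tolerance are arbitrary), we can compare $\boldsymbol{X}^{-\top}$ against central differences of the log-determinant, computed stably via `np.linalg.slogdet`.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)  # assumed invertible

logdet = lambda M: np.linalg.slogdet(M)[1]  # log|det M|, numerically stable
grad = np.linalg.inv(X).T                   # closed form: X^{-T}

eps = 1e-6
num = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (logdet(X + E) - logdet(X - E)) / (2 * eps)

max_err = np.abs(grad - num).max()
```

Using `slogdet` avoids overflow for large matrices, which is one reason the log-determinant (rather than the determinant itself) appears in log-likelihood computations.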
7.3 Derivative of a Power of the Determinant $|\boldsymbol{X}^n|$
Proof
We recall the multiplicative property of determinants (1.14). For any square matrices $\boldsymbol{A}$, $\boldsymbol{B}$, the following holds.
\begin{equation}|\boldsymbol{A}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{B}|\label{eq:7-3-1}\end{equation}
Applying this property repeatedly, the determinant of $\boldsymbol{X}^n = \underbrace{\boldsymbol{X} \cdot \boldsymbol{X} \cdots \boldsymbol{X}}_{n \text{ times}}$ becomes the following.
\begin{equation}|\boldsymbol{X}^n| = |\underbrace{\boldsymbol{X} \cdot \boldsymbol{X} \cdots \boldsymbol{X}}_{n \text{ times}}| = \underbrace{|\boldsymbol{X}| \cdot |\boldsymbol{X}| \cdots |\boldsymbol{X}|}_{n \text{ times}} = |\boldsymbol{X}|^n\label{eq:7-3-2}\end{equation}
Therefore, the problem of differentiating $|\boldsymbol{X}^n|$ with respect to $\boldsymbol{X}$ reduces to differentiating $|\boldsymbol{X}|^n$ with respect to $\boldsymbol{X}$.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^n| = \dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^n\label{eq:7-3-3}\end{equation}
$|\boldsymbol{X}|^n$ is the $n$-th power of $|\boldsymbol{X}|$, which is a composite function. Let the outer function be $f(u) = u^n$ and the inner function be $g(\boldsymbol{X}) = |\boldsymbol{X}|$.
Computing the derivative of the outer function $f(u) = u^n$, we obtain the following.
\begin{equation}f'(u) = n u^{n-1}\label{eq:7-3-4}\end{equation}
Substituting $u = |\boldsymbol{X}|$, we obtain the following.
\begin{equation}f'(|\boldsymbol{X}|) = n |\boldsymbol{X}|^{n-1}\label{eq:7-3-5}\end{equation}
The derivative of the inner function is given by 7.1 as follows.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-3-6}\end{equation}
Applying the chain rule, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^n = n |\boldsymbol{X}|^{n-1} \cdot |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-3-7}\end{equation}
Using $|\boldsymbol{X}|^{n-1} \cdot |\boldsymbol{X}| = |\boldsymbol{X}|^n$ and simplifying, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^n = n |\boldsymbol{X}|^n \boldsymbol{X}^{-\top}\label{eq:7-3-8}\end{equation}
Since $|\boldsymbol{X}|^n = |\boldsymbol{X}^n|$ by \eqref{eq:7-3-2}, we obtain the final result.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^n| = n |\boldsymbol{X}^n| \boldsymbol{X}^{-\top}\label{eq:7-3-9}\end{equation}
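The power formula \eqref{eq:7-3-9} can likewise be verified by finite differences. In this sketch (assuming NumPy; $N$, $n$, seed, and tolerance are arbitrary choices) we differentiate $|\boldsymbol{X}^n|$ directly via `np.linalg.matrix_power` and compare with the closed form.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 3, 3
X = rng.standard_normal((N, N)) + N * np.eye(N)

f = lambda M: np.linalg.det(np.linalg.matrix_power(M, n))  # |X^n|
grad = n * f(X) * np.linalg.inv(X).T                       # n |X^n| X^{-T}

eps = 1e-6
num = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

rel_err = np.abs(grad - num).max() / np.abs(grad).max()
```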
7.4 Scalar Derivative of a Determinant (Jacobi's Formula)
Proof
$\det(\boldsymbol{Y})$ is a function of all components $Y_{ij}$ of $\boldsymbol{Y}$. Since each component $Y_{ij}$ is a function of the scalar $x$, $\det(\boldsymbol{Y})$ is a composite function of $x$.
We apply the multivariable chain rule. Differentiating $\det(\boldsymbol{Y})$ with respect to $x$ yields the sum of contributions through all intermediate variables $Y_{ij}$.
\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \dfrac{\partial \det(\boldsymbol{Y})}{\partial Y_{ij}} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-1}\end{equation}
From the proof of 7.1, the component-wise derivative of the determinant equals the cofactor.
\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial Y_{ij}} = \tilde{Y}_{ij}\label{eq:7-4-2}\end{equation}
Here $\tilde{Y}_{ij}$ is the $(i, j)$ cofactor of $\boldsymbol{Y}$.
Substituting \eqref{eq:7-4-2} into \eqref{eq:7-4-1}, we obtain the following.
\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \sum_{i,j} \tilde{Y}_{ij} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-3}\end{equation}
We use the relationship between cofactors and the inverse matrix. As shown in the proof of 7.1, the following relation holds.
\begin{equation}\text{adj}(\boldsymbol{Y}) = \det(\boldsymbol{Y}) \boldsymbol{Y}^{-1}\label{eq:7-4-4}\end{equation}
Here the $(i, j)$ component of $\text{adj}(\boldsymbol{Y})$ is $\tilde{Y}_{ji}$.
Therefore $\tilde{Y}_{ij}$ is the $(j, i)$ component of $\text{adj}(\boldsymbol{Y})$, so it can be written as follows.
\begin{equation}\tilde{Y}_{ij} = (\text{adj}(\boldsymbol{Y}))_{ji} = \det(\boldsymbol{Y}) (Y^{-1})_{ji}\label{eq:7-4-5}\end{equation}
Substituting \eqref{eq:7-4-5} into \eqref{eq:7-4-3}, we obtain the following.
\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \sum_{i,j} \det(\boldsymbol{Y}) (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-6}\end{equation}
Since $\det(\boldsymbol{Y})$ does not depend on $i, j$, we factor it out of the sum.
\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \det(\boldsymbol{Y}) \sum_{i,j} (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-7}\end{equation}
Interchanging the order of summation and summing over $i$ first, we obtain the following.
\begin{equation}\sum_{i,j} (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x} = \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-8}\end{equation}
The inner sum $\displaystyle\sum_{i} (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x}$ corresponds to the definition of matrix multiplication. This is the $(j, j)$ component of $\boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x}$.
\begin{equation}\sum_{i=0}^{N-1} (Y^{-1})_{ji} \left(\dfrac{\partial \boldsymbol{Y}}{\partial x}\right)_{ij} = \left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)_{jj}\label{eq:7-4-9}\end{equation}
The outer sum $\sum_{j}$ is the sum of diagonal elements, i.e., the trace.
\begin{equation}\sum_{j=0}^{N-1} \left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)_{jj} = \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)\label{eq:7-4-10}\end{equation}
Combining \eqref{eq:7-4-7} and \eqref{eq:7-4-10}, we obtain the final result (Jacobi's formula).
\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \det(\boldsymbol{Y}) \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)\label{eq:7-4-11}\end{equation}
Source: C.G.J. Jacobi (1841) "De formatione et proprietatibus Determinantium", Journal für die reine und angewandte Mathematik 22, 285-318.
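Jacobi's formula \eqref{eq:7-4-11} can be checked on a concrete matrix-valued path. The sketch below (assuming NumPy; the linear path $\boldsymbol{Y}(x) = \boldsymbol{A} + x\boldsymbol{B}$, seed, and tolerance are arbitrary choices) compares the formula against a central difference in the scalar $x$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4
A = rng.standard_normal((N, N)) + N * np.eye(N)
B = rng.standard_normal((N, N))

Y = lambda x: A + x * B  # a matrix-valued path with Y'(x) = B
x0 = 0.3
Y0 = Y(x0)

# Jacobi's formula: d det(Y)/dx = det(Y) tr(Y^{-1} Y')
jacobi = np.linalg.det(Y0) * np.trace(np.linalg.inv(Y0) @ B)

# Central difference of det(Y(x)) at x0
eps = 1e-6
num = (np.linalg.det(Y(x0 + eps)) - np.linalg.det(Y(x0 - eps))) / (2 * eps)

err = abs(jacobi - num)
```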
7.5 Properties of Component-wise Derivatives of the Determinant
Proof
From the proof of 7.1, the component-wise derivative of the determinant equals the cofactor.
\begin{equation}\dfrac{\partial \det(\boldsymbol{X})}{\partial X_{ik}} = \tilde{X}_{ik}\label{eq:7-5-1}\end{equation}
Using this result, we rewrite the sum on the left-hand side in terms of cofactors.
\begin{equation}\sum_{k=0}^{N-1} \dfrac{\partial \det(\boldsymbol{X})}{\partial X_{ik}} X_{jk} = \sum_{k=0}^{N-1} \tilde{X}_{ik} X_{jk}\label{eq:7-5-2}\end{equation}
We consider the cases $i = j$ and $i \neq j$ separately.
Case $i = j$. The sum becomes the following.
\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{ik}\label{eq:7-5-3}\end{equation}
This is precisely the cofactor expansion of the determinant. The expansion of the determinant along the $i$-th row is as follows.
\begin{equation}\det(\boldsymbol{X}) = \sum_{k=0}^{N-1} X_{ik} \tilde{X}_{ik}\label{eq:7-5-4}\end{equation}
Therefore, when $i = j$, the following holds.
\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{ik} = \det(\boldsymbol{X})\label{eq:7-5-5}\end{equation}
Case $i \neq j$. The sum becomes the following.
\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{jk}\label{eq:7-5-6}\end{equation}
We consider the meaning of this sum. $\tilde{X}_{ik}$ is the determinant of the submatrix obtained by removing the $i$-th row and $k$-th column from $\boldsymbol{X}$, multiplied by a sign. Therefore $\tilde{X}_{ik}$ does not contain any elements from the $i$-th row of $\boldsymbol{X}$.
The sum in \eqref{eq:7-5-6} equals the cofactor expansion along the $i$-th row of the matrix $\boldsymbol{X}'$ obtained by replacing the $i$-th row of $\boldsymbol{X}$ with the $j$-th row.
\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{jk} = \det(\boldsymbol{X}')\label{eq:7-5-7}\end{equation}
Here $\boldsymbol{X}'$ is the matrix obtained by replacing the $i$-th row of $\boldsymbol{X}$ with the $j$-th row of $\boldsymbol{X}$ (note that the cofactors $\tilde{X}_{ik}$ do not depend on the $i$-th row).
In $\boldsymbol{X}'$, the $i$-th row and the $j$-th row are the same vector. That is, two rows are identical.
As a property of determinants, the determinant of a matrix with two identical rows (or columns) is 0.
\begin{equation}\det(\boldsymbol{X}') = 0\label{eq:7-5-8}\end{equation}
Therefore, when $i \neq j$, the following holds.
\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{jk} = 0\label{eq:7-5-9}\end{equation}
Expressing \eqref{eq:7-5-5} and \eqref{eq:7-5-9} uniformly using the Kronecker delta, we obtain the final result.
\begin{equation}\sum_{k=0}^{N-1} \dfrac{\partial \det(\boldsymbol{X})}{\partial X_{ik}} X_{jk} = \delta_{ij} \det(\boldsymbol{X})\label{eq:7-5-10}\end{equation}
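In matrix form, \eqref{eq:7-5-10} says that the gradient matrix $\boldsymbol{G}$ with $G_{ik} = \partial \det(\boldsymbol{X}) / \partial X_{ik}$ satisfies $\boldsymbol{G}\boldsymbol{X}^\top = \det(\boldsymbol{X})\,\boldsymbol{I}$. A quick numerical sketch (assuming NumPy; seed and tolerance arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
detX = np.linalg.det(X)

# Matrix of component derivatives d det/dX_ij, from 7.1: |X| X^{-T}
G = detX * np.linalg.inv(X).T

# Sum_k (d det/dX_ik) X_jk is the (i, j) entry of G X^T
lhs = G @ X.T
rhs = detX * np.eye(N)

max_err = np.abs(lhs - rhs).max()
```

This is just the cofactor identity $\operatorname{adj}(\boldsymbol{X})^\top \boldsymbol{X}^\top = \det(\boldsymbol{X})\,\boldsymbol{I}$ restated through the derivative.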
7.6 Second Derivative of the Determinant
Proof
We restate Jacobi's formula from 7.4, writing $\boldsymbol{Y}' = \dfrac{\partial \boldsymbol{Y}}{\partial x}$ for the derivative of $\boldsymbol{Y}$ with respect to $x$.
\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')\label{eq:7-6-1}\end{equation}
We differentiate both sides once more with respect to $x$. The left-hand side becomes $\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2}$. The right-hand side is a product of two factors, so we apply the product rule (1.25).
\begin{equation}\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \dfrac{\partial}{\partial x} \left[ \det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') \right]\label{eq:7-6-2}\end{equation}
Applying the product rule (1.25) $(fg)' = f'g + fg'$ with $f = \det(\boldsymbol{Y})$ and $g = \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')$, we obtain the following.
\begin{equation}\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \dfrac{\partial \det(\boldsymbol{Y})}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') + \det(\boldsymbol{Y}) \dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')\label{eq:7-6-3}\end{equation}
We compute the first term. Substituting Jacobi's formula \eqref{eq:7-6-1} for $\dfrac{\partial \det(\boldsymbol{Y})}{\partial x}$, we obtain the following.
\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') = \det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') \cdot \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')\label{eq:7-6-4}\end{equation}
Simplifying the first term, we obtain the following.
\begin{equation}\det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2\label{eq:7-6-5}\end{equation}
We compute $\dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')$ in the second term. Since the trace and differentiation commute, we first differentiate inside the trace.
\begin{equation}\dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') = \text{tr}\left( \dfrac{\partial}{\partial x} (\boldsymbol{Y}^{-1} \boldsymbol{Y}') \right)\label{eq:7-6-6}\end{equation}
Applying the product rule (1.25) to the matrix product $\boldsymbol{Y}^{-1} \boldsymbol{Y}'$, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial x} (\boldsymbol{Y}^{-1} \boldsymbol{Y}') = \dfrac{\partial \boldsymbol{Y}^{-1}}{\partial x} \boldsymbol{Y}' + \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}'}{\partial x}\label{eq:7-6-7}\end{equation}
Here $\dfrac{\partial \boldsymbol{Y}'}{\partial x} = \boldsymbol{Y}''$ is the second derivative of $\boldsymbol{Y}$ with respect to $x$.
We apply the scalar derivative formula for the inverse. Differentiating $\boldsymbol{Y}\boldsymbol{Y}^{-1} = \boldsymbol{I}$ with respect to $x$, the following holds.
\begin{equation}\boldsymbol{Y}' \boldsymbol{Y}^{-1} + \boldsymbol{Y} \dfrac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \boldsymbol{O}\label{eq:7-6-8}\end{equation}
Solving \eqref{eq:7-6-8} for $\dfrac{\partial \boldsymbol{Y}^{-1}}{\partial x}$, we obtain the following.
\begin{equation}\dfrac{\partial \boldsymbol{Y}^{-1}}{\partial x} = -\boldsymbol{Y}^{-1} \boldsymbol{Y}' \boldsymbol{Y}^{-1}\label{eq:7-6-9}\end{equation}
Substituting \eqref{eq:7-6-9} into \eqref{eq:7-6-7}, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial x} (\boldsymbol{Y}^{-1} \boldsymbol{Y}') = -\boldsymbol{Y}^{-1} \boldsymbol{Y}' \boldsymbol{Y}^{-1} \boldsymbol{Y}' + \boldsymbol{Y}^{-1} \boldsymbol{Y}''\label{eq:7-6-10}\end{equation}
We simplify the first term. Since $\boldsymbol{Y}^{-1} \boldsymbol{Y}'$ appears twice, it can be written as follows.
\begin{equation}-\boldsymbol{Y}^{-1} \boldsymbol{Y}' \boldsymbol{Y}^{-1} \boldsymbol{Y}' = -(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2\label{eq:7-6-11}\end{equation}
Taking the trace of \eqref{eq:7-6-6}, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') = \text{tr}(-(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2 + \boldsymbol{Y}^{-1} \boldsymbol{Y}'')\label{eq:7-6-12}\end{equation}
Using the linearity of trace to decompose, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') = -\text{tr}((\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2) + \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}'')\label{eq:7-6-13}\end{equation}
Substituting the results of \eqref{eq:7-6-5} and \eqref{eq:7-6-13} into \eqref{eq:7-6-3}, we obtain the following.
\begin{equation}\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2 + \det(\boldsymbol{Y}) \left[ -\text{tr}((\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2) + \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}'') \right]\label{eq:7-6-14}\end{equation}
Factoring out $\det(\boldsymbol{Y})$, we obtain the final result.
\begin{equation}\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \det(\boldsymbol{Y}) \left[ \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}'') + \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2 - \text{tr}((\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2) \right]\label{eq:7-6-15}\end{equation}
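The second-derivative formula \eqref{eq:7-6-15} can be sanity-checked on a quadratic matrix path, for which $\boldsymbol{Y}'$ and $\boldsymbol{Y}''$ are known exactly. This sketch (assuming NumPy; the path, step size, seed, and tolerance are arbitrary choices) compares the formula with a second-order central difference.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4
A = rng.standard_normal((N, N)) + N * np.eye(N)
B = rng.standard_normal((N, N))
C = rng.standard_normal((N, N))

Y = lambda x: A + x * B + 0.5 * x**2 * C  # Y'(x) = B + x C, Y''(x) = C
x0 = 0.2
Y0, Yp, Ypp = Y(x0), B + x0 * C, C
Yi = np.linalg.inv(Y0)

M = Yi @ Yp
formula = np.linalg.det(Y0) * (
    np.trace(Yi @ Ypp) + np.trace(M) ** 2 - np.trace(M @ M)
)

# Second-order central difference of det(Y(x)) at x0
eps = 1e-4
f = lambda x: np.linalg.det(Y(x))
num = (f(x0 + eps) - 2 * f(x0) + f(x0 - eps)) / eps**2

rel_err = abs(formula - num) / (1 + abs(formula))
```

Note the distinction between $\text{tr}(\boldsymbol{Y}^{-1}\boldsymbol{Y}')^2$ (the trace, squared) and $\text{tr}((\boldsymbol{Y}^{-1}\boldsymbol{Y}')^2)$ (the trace of the squared matrix), which the code keeps separate.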
7.7 Derivative of the Determinant of a Product $|\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}|$
Proof
By the multiplicative property of determinants (1.14), the determinant of a product of square matrices equals the product of their determinants.
\begin{equation}|\boldsymbol{A}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{B}|\label{eq:7-7-1}\end{equation}
We apply this property to $\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}$. Viewing it as $(\boldsymbol{A}\boldsymbol{X})\boldsymbol{B}$, we obtain the following.
\begin{equation}|(\boldsymbol{A}\boldsymbol{X})\boldsymbol{B}| = |\boldsymbol{A}\boldsymbol{X}||\boldsymbol{B}|\label{eq:7-7-2}\end{equation}
Applying the same property to $|\boldsymbol{A}\boldsymbol{X}|$, we obtain the following.
\begin{equation}|\boldsymbol{A}\boldsymbol{X}| = |\boldsymbol{A}||\boldsymbol{X}|\label{eq:7-7-3}\end{equation}
Combining \eqref{eq:7-7-2} and \eqref{eq:7-7-3}, we obtain the following.
\begin{equation}|\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{X}||\boldsymbol{B}|\label{eq:7-7-4}\end{equation}
We differentiate $|\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}|$ with respect to $\boldsymbol{X}$. Since $|\boldsymbol{A}|$ and $|\boldsymbol{B}|$ do not depend on $\boldsymbol{X}$, they can be factored out of the differentiation operator.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = \dfrac{\partial}{\partial \boldsymbol{X}} (|\boldsymbol{A}||\boldsymbol{X}||\boldsymbol{B}|) = |\boldsymbol{A}||\boldsymbol{B}| \dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|\label{eq:7-7-5}\end{equation}
Substituting $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}$ from 7.1, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{B}| \cdot |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-7-6}\end{equation}
Using $|\boldsymbol{A}||\boldsymbol{B}||\boldsymbol{X}| = |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}|$ (from \eqref{eq:7-7-4}) and simplifying, we obtain the final result.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| \boldsymbol{X}^{-\top}\label{eq:7-7-7}\end{equation}
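A numerical check of \eqref{eq:7-7-7} (a sketch assuming NumPy; the matrices, seed, and tolerance are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 3
A = rng.standard_normal((N, N)) + N * np.eye(N)
B = rng.standard_normal((N, N)) + N * np.eye(N)
X = rng.standard_normal((N, N)) + N * np.eye(N)

f = lambda M: np.linalg.det(A @ M @ B)
grad = f(X) * np.linalg.inv(X).T  # closed form: |AXB| X^{-T}

eps = 1e-6
num = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

rel_err = np.abs(grad - num).max() / np.abs(grad).max()
```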
7.8 Determinant of Quadratic Forms
7.8.1 Case where $\boldsymbol{X}$ is square and invertible
Proof
We decompose $|\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|$ using the multiplicative property of determinants (1.14).
\begin{equation}|\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{X}^\top||\boldsymbol{A}||\boldsymbol{X}|\label{eq:7-8-1-1}\end{equation}
By the determinant of the transpose (1.15), $|\boldsymbol{X}^\top| = |\boldsymbol{X}|$.
From \eqref{eq:7-8-1-1} and this property, we obtain the following.
\begin{equation}|\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{X}||\boldsymbol{A}||\boldsymbol{X}| = |\boldsymbol{A}||\boldsymbol{X}|^2\label{eq:7-8-1-2}\end{equation}
We differentiate with respect to $\boldsymbol{X}$. Since $|\boldsymbol{A}|$ is a constant, it can be factored out of the differentiation operator.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{A}| \dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^2\label{eq:7-8-1-3}\end{equation}
From 7.3 with $n = 2$, the following holds.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^2 = 2 |\boldsymbol{X}|^2 \boldsymbol{X}^{-\top}\label{eq:7-8-1-4}\end{equation}
Substituting \eqref{eq:7-8-1-4} into \eqref{eq:7-8-1-3}, we obtain the following.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{A}| \cdot 2 |\boldsymbol{X}|^2 \boldsymbol{X}^{-\top}\label{eq:7-8-1-5}\end{equation}
Using $|\boldsymbol{A}||\boldsymbol{X}|^2 = |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|$ (from \eqref{eq:7-8-1-2}) and simplifying, we obtain the final result.
\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = 2 |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-8-1-6}\end{equation}
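For square invertible $\boldsymbol{X}$, \eqref{eq:7-8-1-6} holds for any invertible $\boldsymbol{A}$ (symmetry is not needed in this case). A numerical sketch (assuming NumPy; seed and tolerance arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 3
A = rng.standard_normal((N, N)) + N * np.eye(N)  # any invertible square A
X = rng.standard_normal((N, N)) + N * np.eye(N)

f = lambda M: np.linalg.det(M.T @ A @ M)
grad = 2 * f(X) * np.linalg.inv(X).T  # closed form: 2 |X^T A X| X^{-T}

eps = 1e-6
num = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

rel_err = np.abs(grad - num).max() / np.abs(grad).max()
```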
7.8.2 Case where $\boldsymbol{X}$ is non-square and $\boldsymbol{A}$ is symmetric
Proof
Let $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$, where $\boldsymbol{X}$ is $M \times N$; then $\boldsymbol{Y}$ is an $N \times N$ matrix. When $\boldsymbol{X}$ has full column rank and $\boldsymbol{A}$ is positive definite, $\boldsymbol{Y}$ is invertible.
Using Jacobi's formula from 7.4, differentiating $|\boldsymbol{Y}|$ with respect to component $X_{ij}$ gives the following.
\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial X_{ij}} = |\boldsymbol{Y}| \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right)\label{eq:7-8-2-1}\end{equation}
We differentiate $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$ with respect to $X_{ij}$. Since $\boldsymbol{X}$ appears in two places, $\boldsymbol{X}^\top$ and $\boldsymbol{X}$, we apply the product rule (1.25).
\begin{equation}\dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} = \dfrac{\partial (\boldsymbol{X}^\top)}{\partial X_{ij}} \boldsymbol{A} \boldsymbol{X} + \boldsymbol{X}^\top \boldsymbol{A} \dfrac{\partial \boldsymbol{X}}{\partial X_{ij}}\label{eq:7-8-2-2}\end{equation}
We compute $\dfrac{\partial \boldsymbol{X}}{\partial X_{ij}}$. Differentiating the component $X_{kl}$ of $\boldsymbol{X}$ with respect to $X_{ij}$ gives 1 only when $k = i$ and $l = j$, and 0 otherwise.
\begin{equation}\left( \dfrac{\partial \boldsymbol{X}}{\partial X_{ij}} \right)_{kl} = \delta_{ki} \delta_{lj}\label{eq:7-8-2-3}\end{equation}
We express this result as a matrix. Using the standard basis vectors $\boldsymbol{e}_i \in \mathbb{R}^M$ (with only the $i$-th component equal to 1) and $\boldsymbol{e}_j \in \mathbb{R}^N$ (with only the $j$-th component equal to 1), it can be written as follows.
\begin{equation}\dfrac{\partial \boldsymbol{X}}{\partial X_{ij}} = \boldsymbol{e}_i \boldsymbol{e}_j^\top\label{eq:7-8-2-4}\end{equation}
This is an $M \times N$ matrix with only the $(i, j)$ component equal to 1 and all others 0.
We compute $\dfrac{\partial (\boldsymbol{X}^\top)}{\partial X_{ij}}$. Taking the transpose swaps rows and columns, so we obtain the following.
\begin{equation}\dfrac{\partial (\boldsymbol{X}^\top)}{\partial X_{ij}} = \left( \dfrac{\partial \boldsymbol{X}}{\partial X_{ij}} \right)^\top = (\boldsymbol{e}_i \boldsymbol{e}_j^\top)^\top = \boldsymbol{e}_j \boldsymbol{e}_i^\top\label{eq:7-8-2-5}\end{equation}
Substituting \eqref{eq:7-8-2-4} and \eqref{eq:7-8-2-5} into \eqref{eq:7-8-2-2}, we obtain the following.
\begin{equation}\dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} = \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} + \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top\label{eq:7-8-2-6}\end{equation}
We use the symmetry of $\boldsymbol{A}$. Since $\boldsymbol{A} = \boldsymbol{A}^\top$, we have $\boldsymbol{e}_i^\top \boldsymbol{A} = (\boldsymbol{A}^\top \boldsymbol{e}_i)^\top = (\boldsymbol{A} \boldsymbol{e}_i)^\top$, where $\boldsymbol{A} \boldsymbol{e}_i$ is the $i$-th column vector of $\boldsymbol{A}$. Note that both terms of \eqref{eq:7-8-2-6} are $N \times N$ matrices.
Substituting into \eqref{eq:7-8-2-1} and computing the trace, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = \text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \right) + \text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top \right)\label{eq:7-8-2-7}\end{equation}
Applying the cyclic property of trace $\text{tr}(\boldsymbol{P}\boldsymbol{Q}\boldsymbol{R}) = \text{tr}(\boldsymbol{R}\boldsymbol{P}\boldsymbol{Q})$ to the first term, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \right) = \text{tr}\left( \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j \right)\label{eq:7-8-2-8}\end{equation}
$\boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j$ is a $1 \times 1$ scalar, so the trace equals the value itself.
\begin{equation}\text{tr}\left( \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j \right) = \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij}\label{eq:7-8-2-9}\end{equation}
Similarly, applying the cyclic property of trace to the second term, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top \right) = \text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \right)\label{eq:7-8-2-10}\end{equation}
This is also a $1 \times 1$ scalar, so the trace equals the value itself.
\begin{equation}\text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \right) = (\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})_{ji} = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-\top})_{ij}\label{eq:7-8-2-11}\end{equation}
We verify that $\boldsymbol{Y}$ is symmetric. Since $\boldsymbol{A} = \boldsymbol{A}^\top$, the following holds.
\begin{equation}\boldsymbol{Y}^\top = (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^\top = \boldsymbol{X}^\top \boldsymbol{A}^\top (\boldsymbol{X}^\top)^\top = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X} = \boldsymbol{Y}\label{eq:7-8-2-12}\end{equation}
Since $\boldsymbol{Y}$ is symmetric, $\boldsymbol{Y}^{-1}$ is also symmetric. Therefore $\boldsymbol{Y}^{-\top} = \boldsymbol{Y}^{-1}$.
Using the results of \eqref{eq:7-8-2-9} and \eqref{eq:7-8-2-11} and simplifying \eqref{eq:7-8-2-7}, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij} + (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij} = 2(\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij}\label{eq:7-8-2-13}\end{equation}
Substituting into \eqref{eq:7-8-2-1}, we obtain the derivative with respect to component $X_{ij}$.
\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial X_{ij}} = |\boldsymbol{Y}| \cdot 2(\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij} = 2|\boldsymbol{Y}| (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij}\label{eq:7-8-2-14}\end{equation}
In the denominator layout, the $(i, j)$ component of the result corresponds to $\dfrac{\partial}{\partial X_{ij}}$. Therefore, in matrix form, we obtain the following.
\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial \boldsymbol{X}} = 2|\boldsymbol{Y}| \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1}\label{eq:7-8-2-15}\end{equation}
Substituting $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$, we obtain the final result.
\begin{equation}\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1}\label{eq:7-8-2-16}\end{equation}
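As a numerical sanity check (not part of the proof), the closed form can be compared against central finite differences with NumPy; the shapes $M = 5$, $N = 3$ and the random seed are arbitrary choices.

```python
import numpy as np

# Check d|X'AX|/dX = 2 |X'AX| A X (X'AX)^{-1} for symmetric A.
rng = np.random.default_rng(0)
M, N = 5, 3                        # arbitrary shapes with M > N
X = rng.standard_normal((M, N))
A = rng.standard_normal((M, M))
A = A + A.T                        # enforce the symmetry assumption A = A'

def f(X):
    return np.linalg.det(X.T @ A @ X)

Y = X.T @ A @ X
grad = 2 * np.linalg.det(Y) * A @ X @ np.linalg.inv(Y)  # closed form

# Central finite differences, one entry of X at a time
h = 1e-5
fd = np.zeros_like(X)
for i in range(M):
    for j in range(N):
        E = np.zeros_like(X)
        E[i, j] = h
        fd[i, j] = (f(X + E) - f(X - E)) / (2 * h)
```

The maximum entrywise deviation between `grad` and `fd` should be small relative to the gradient's scale.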
7.8.3 Case where $\boldsymbol{X}$ is non-square and $\boldsymbol{A}$ is general
Proof
Let $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$. $\boldsymbol{Y}$ is an $N \times N$ matrix.
Using Jacobi's formula from 7.4, we obtain the following.
\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial X_{ij}} = |\boldsymbol{Y}| \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right)\label{eq:7-8-3-1}\end{equation}
As in 7.8.2, differentiating $\boldsymbol{Y}$ with respect to $X_{ij}$ gives the following.
\begin{equation}\dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} = \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} + \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top\label{eq:7-8-3-2}\end{equation}
Splitting the trace into two terms and computing, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = \text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \right) + \text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top \right)\label{eq:7-8-3-3}\end{equation}
We compute the first term. Using the cyclic property of trace, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \right) = \text{tr}\left( \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j \right) = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij}\label{eq:7-8-3-4}\end{equation}
We compute the second term. Similarly using the cyclic property of trace, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top \right) = \text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \right) = (\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})_{ji}\label{eq:7-8-3-5}\end{equation}
We transform the result of \eqref{eq:7-8-3-5}. $(\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})_{ji}$ equals the $(i, j)$ component of the transpose of the matrix $\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A}$.
\begin{equation}(\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})_{ji} = ((\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})^\top)_{ij} = (\boldsymbol{A}^\top \boldsymbol{X} \boldsymbol{Y}^{-\top})_{ij}\label{eq:7-8-3-6}\end{equation}
When $\boldsymbol{A}$ is a general matrix, $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$ is not necessarily symmetric, so in general $\boldsymbol{Y}^{-\top} \neq \boldsymbol{Y}^{-1}$ and the two terms cannot be merged as in 7.8.2.
We compute $\boldsymbol{Y}^\top = (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^\top = \boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X}$. This has the form where $\boldsymbol{A}$ is replaced by $\boldsymbol{A}^\top$.
Therefore the following holds.
\begin{equation}\boldsymbol{Y}^{-\top} = (\boldsymbol{Y}^\top)^{-1} = (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1}\label{eq:7-8-3-7}\end{equation}
Combining \eqref{eq:7-8-3-4} and \eqref{eq:7-8-3-6}, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij} + (\boldsymbol{A}^\top \boldsymbol{X} \boldsymbol{Y}^{-\top})_{ij}\label{eq:7-8-3-8}\end{equation}
Substituting \eqref{eq:7-8-3-7}, we obtain the following.
\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = (\boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1})_{ij} + (\boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1})_{ij}\label{eq:7-8-3-9}\end{equation}
Substituting into \eqref{eq:7-8-3-1}, we obtain the derivative with respect to component $X_{ij}$.
\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial X_{ij}} = |\boldsymbol{Y}| \left[ (\boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1})_{ij} + (\boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1})_{ij} \right]\label{eq:7-8-3-10}\end{equation}
Writing in matrix form using the denominator layout, we obtain the following.
\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial \boldsymbol{X}} = |\boldsymbol{Y}| \left( \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1} + \boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1} \right)\label{eq:7-8-3-11}\end{equation}
Substituting $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$, we obtain the final result.
\begin{equation}\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|}{\partial \boldsymbol{X}} = |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \left( \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1} + \boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1} \right)\label{eq:7-8-3-12}\end{equation}
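As with the symmetric case, this two-term formula can be checked numerically against central finite differences; the shapes and seed below are arbitrary.

```python
import numpy as np

# Check d|X'AX|/dX = |X'AX| ( A X (X'AX)^{-1} + A' X (X'A'X)^{-1} ) for general A.
rng = np.random.default_rng(1)
M, N = 5, 3
X = rng.standard_normal((M, N))
A = rng.standard_normal((M, M))    # general, not symmetric

def f(X):
    return np.linalg.det(X.T @ A @ X)

Y = X.T @ A @ X
grad = np.linalg.det(Y) * (
    A @ X @ np.linalg.inv(Y) + A.T @ X @ np.linalg.inv(X.T @ A.T @ X)
)

h = 1e-5
fd = np.zeros_like(X)
for i in range(M):
    for j in range(N):
        E = np.zeros_like(X)
        E[i, j] = h
        fd[i, j] = (f(X + E) - f(X - E)) / (2 * h)
```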
7.9 Log-Determinant of the Gram Matrix
7.9.1 Derivative with respect to $\boldsymbol{X}$
Proof
In 7.8.2, set $\boldsymbol{A} = \boldsymbol{I}$ (the identity matrix). Since $\boldsymbol{I}$ is symmetric, the formula from 7.8.2 applies.
From 7.8.2, the following holds.
\begin{equation}\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{I} \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 |\boldsymbol{X}^\top \boldsymbol{I} \boldsymbol{X}| \boldsymbol{I} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{I} \boldsymbol{X})^{-1}\label{eq:7-9-1-1}\end{equation}
Substituting $\boldsymbol{I} \boldsymbol{X} = \boldsymbol{X}$ and $\boldsymbol{X}^\top \boldsymbol{I} \boldsymbol{X} = \boldsymbol{X}^\top \boldsymbol{X}$, we obtain the following.
\begin{equation}\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 |\boldsymbol{X}^\top \boldsymbol{X}| \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-1-2}\end{equation}
To differentiate $\log |\boldsymbol{X}^\top \boldsymbol{X}|$ with respect to $\boldsymbol{X}$, we apply the chain rule. The outer function is $\log$ and the inner function is $|\boldsymbol{X}^\top \boldsymbol{X}|$.
Writing out the chain rule with $f(u) = \log(u)$ and $g(\boldsymbol{X}) = |\boldsymbol{X}^\top \boldsymbol{X}|$, the following holds.
\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = \dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|} \dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}}\label{eq:7-9-1-3}\end{equation}
Substituting the result of \eqref{eq:7-9-1-2} into \eqref{eq:7-9-1-3}, we obtain the following.
\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = \dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|} \cdot 2 |\boldsymbol{X}^\top \boldsymbol{X}| \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-1-4}\end{equation}
$\dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|}$ and $|\boldsymbol{X}^\top \boldsymbol{X}|$ cancel.
\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-1-5}\end{equation}
We derive the relationship with the Moore-Penrose pseudoinverse. When $\boldsymbol{X} \in \mathbb{R}^{M \times N}$ has full column rank ($\mathrm{rank}(\boldsymbol{X}) = N$, which requires $M \geq N$), the pseudoinverse reduces to the left inverse, defined as follows.
\begin{equation}\boldsymbol{X}^{+} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top\label{eq:7-9-1-6}\end{equation}
We compute the transpose of $\boldsymbol{X}^{+}$. Since the transpose of a product is $(\boldsymbol{P}\boldsymbol{Q})^\top = \boldsymbol{Q}^\top \boldsymbol{P}^\top$, we obtain the following.
\begin{equation}(\boldsymbol{X}^{+})^\top = ((\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top)^\top = (\boldsymbol{X}^\top)^\top ((\boldsymbol{X}^\top \boldsymbol{X})^{-1})^\top\label{eq:7-9-1-7}\end{equation}
$(\boldsymbol{X}^\top)^\top = \boldsymbol{X}$. Also, $\boldsymbol{X}^\top \boldsymbol{X}$ is a symmetric matrix, so its inverse is also symmetric. Therefore $((\boldsymbol{X}^\top \boldsymbol{X})^{-1})^\top = (\boldsymbol{X}^\top \boldsymbol{X})^{-1}$.
Combining \eqref{eq:7-9-1-7} with these properties, we obtain the following.
\begin{equation}(\boldsymbol{X}^{+})^\top = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-1-8}\end{equation}
Comparing \eqref{eq:7-9-1-5} and \eqref{eq:7-9-1-8}, we obtain the final result.
\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1} = 2 (\boldsymbol{X}^{+})^\top\label{eq:7-9-1-9}\end{equation}
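This result, including the pseudoinverse form, can be verified numerically; `np.linalg.pinv` computes the Moore-Penrose pseudoinverse, which for a full-column-rank matrix equals $(\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top$. Shapes and seed are arbitrary.

```python
import numpy as np

# Check d log|X'X| / dX = 2 X (X'X)^{-1} = 2 (X^+)'.
rng = np.random.default_rng(2)
M, N = 6, 3
X = rng.standard_normal((M, N))    # full column rank almost surely

grad = 2 * X @ np.linalg.inv(X.T @ X)
pinv_form = 2 * np.linalg.pinv(X).T  # same matrix via the pseudoinverse

def f(X):
    return np.log(np.linalg.det(X.T @ X))  # X'X is positive definite here

h = 1e-5
fd = np.zeros_like(X)
for i in range(M):
    for j in range(N):
        E = np.zeros_like(X)
        E[i, j] = h
        fd[i, j] = (f(X + E) - f(X - E)) / (2 * h)
```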
7.9.2 Derivative with respect to the Pseudoinverse $\boldsymbol{X}^{+}$
Proof
We derive the relationship between $|\boldsymbol{X}^\top \boldsymbol{X}|$ and $|\boldsymbol{X}^{+}|$. We compute the determinant of $\boldsymbol{X}^{+} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top$.
$\boldsymbol{X}^{+}$ is an $N \times M$ matrix, so when $N \neq M$ it is not square and the usual determinant is not defined. We therefore consider the square products $(\boldsymbol{X}^{+})^\top \boldsymbol{X}^{+}$ and $\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$.
From \eqref{eq:7-9-1-8} of 7.9.1, $(\boldsymbol{X}^{+})^\top = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}$.
Computing $(\boldsymbol{X}^{+})^\top \boldsymbol{X}^{+}$, we obtain the following.
\begin{equation}(\boldsymbol{X}^{+})^\top \boldsymbol{X}^{+} = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \cdot (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top\label{eq:7-9-2-1}\end{equation}
Since $(\boldsymbol{X}^\top \boldsymbol{X})^{-1} \cdot (\boldsymbol{X}^\top \boldsymbol{X})^{-1} = ((\boldsymbol{X}^\top \boldsymbol{X})^{-1})^2 = (\boldsymbol{X}^\top \boldsymbol{X})^{-2}$, we obtain the following.
\begin{equation}(\boldsymbol{X}^{+})^\top \boldsymbol{X}^{+} = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-2} \boldsymbol{X}^\top\label{eq:7-9-2-2}\end{equation}
On the other hand, computing $\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$, we obtain the following.
\begin{equation}\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \cdot \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-2-3}\end{equation}
Simplifying the adjacent $\boldsymbol{X}^\top \boldsymbol{X}$ and $(\boldsymbol{X}^\top \boldsymbol{X})^{-1}$, we obtain the following.
\begin{equation}\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \cdot \boldsymbol{X}^\top \boldsymbol{X} \cdot (\boldsymbol{X}^\top \boldsymbol{X})^{-1} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-2-4}\end{equation}
We take the determinant of both sides of \eqref{eq:7-9-2-4}; since $\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$ is an $N \times N$ square matrix, its determinant is well defined.
\begin{equation}|\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top| = |(\boldsymbol{X}^\top \boldsymbol{X})^{-1}|\label{eq:7-9-2-5}\end{equation}
The determinant of an inverse matrix is the reciprocal of the original determinant. That is, $|(\boldsymbol{X}^\top \boldsymbol{X})^{-1}| = \dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|}$.
Taking the logarithm of both sides, we obtain the following.
\begin{equation}\log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top| = \log \dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|} = -\log |\boldsymbol{X}^\top \boldsymbol{X}|\label{eq:7-9-2-6}\end{equation}
Therefore the following relation holds.
\begin{equation}\log |\boldsymbol{X}^\top \boldsymbol{X}| = -\log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|\label{eq:7-9-2-7}\end{equation}
We consider differentiation with respect to $\boldsymbol{X}^{+}$. Viewing $\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$ as a function of $\boldsymbol{X}^{+}$, it has the same form $\boldsymbol{Y}^\top \boldsymbol{Y}$ (with $\boldsymbol{Y} = (\boldsymbol{X}^{+})^\top$) as in 7.9.1.
From 7.9.1, $\dfrac{\partial \log |\boldsymbol{Y}^\top \boldsymbol{Y}|}{\partial \boldsymbol{Y}} = 2 \boldsymbol{Y} (\boldsymbol{Y}^\top \boldsymbol{Y})^{-1}$.
Setting $\boldsymbol{Y} = (\boldsymbol{X}^{+})^\top$, we have $\boldsymbol{Y}^\top = \boldsymbol{X}^{+}$ so $\boldsymbol{Y}^\top \boldsymbol{Y} = \boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$.
From \eqref{eq:7-9-2-7}, applying the chain rule, we obtain the following.
\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}^{+}} = -\dfrac{\partial \log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|}{\partial \boldsymbol{X}^{+}}\label{eq:7-9-2-8}\end{equation}
We differentiate $\log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|$ with respect to $\boldsymbol{X}^{+}$. Applying the result of 7.9.1 with $\boldsymbol{Y} = (\boldsymbol{X}^{+})^\top$ gives the derivative with respect to $(\boldsymbol{X}^{+})^\top$, and from \eqref{eq:7-9-2-4} we have $(\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top)^{-1} = \boldsymbol{X}^\top \boldsymbol{X}$, so we obtain the following.
\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|}{\partial (\boldsymbol{X}^{+})^\top} = 2 (\boldsymbol{X}^{+})^\top (\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top)^{-1} = 2 (\boldsymbol{X}^{+})^\top (\boldsymbol{X}^\top \boldsymbol{X})\label{eq:7-9-2-9}\end{equation}
Substituting $(\boldsymbol{X}^{+})^\top = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}$ from \eqref{eq:7-9-1-8} of 7.9.1, we obtain the following.
\begin{equation}2 (\boldsymbol{X}^{+})^\top (\boldsymbol{X}^\top \boldsymbol{X}) = 2 \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1} (\boldsymbol{X}^\top \boldsymbol{X}) = 2 \boldsymbol{X}\label{eq:7-9-2-10}\end{equation}
In the denominator layout, the derivative with respect to $\boldsymbol{X}^{+}$ is the transpose of the derivative with respect to $(\boldsymbol{X}^{+})^\top$. The shapes are consistent: $\boldsymbol{X}^{+}$ is $N \times M$, and so is $\boldsymbol{X}^\top$.
\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|}{\partial \boldsymbol{X}^{+}} = 2 \boldsymbol{X}^\top\label{eq:7-9-2-11}\end{equation}
Substituting into \eqref{eq:7-9-2-8}, we obtain the final result.
\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}^{+}} = -2 \boldsymbol{X}^\top\label{eq:7-9-2-12}\end{equation}
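Both the identity $\log |\boldsymbol{X}^\top \boldsymbol{X}| = -\log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|$ and the gradient $-2\boldsymbol{X}^\top$ can be checked numerically by treating $g(\boldsymbol{P}) = -\log |\boldsymbol{P} \boldsymbol{P}^\top|$ as a free function of $\boldsymbol{P}$ and evaluating at $\boldsymbol{P} = \boldsymbol{X}^{+}$; shapes and seed are arbitrary.

```python
import numpy as np

# Check log|X'X| = -log|X^+ (X^+)'| and d log|X'X| / dX^+ = -2 X'.
rng = np.random.default_rng(3)
M, N = 6, 3
X = rng.standard_normal((M, N))
P = np.linalg.pinv(X)              # X^+, shape N x M

lhs = np.log(np.linalg.det(X.T @ X))
rhs = -np.log(np.linalg.det(P @ P.T))   # should match lhs

def g(P):                          # log|X'X| expressed through X^+
    return -np.log(np.linalg.det(P @ P.T))

grad = -2 * X.T                    # claimed closed form, N x M like P

h = 1e-5
fd = np.zeros_like(P)
for i in range(N):
    for j in range(M):
        E = np.zeros_like(P)
        E[i, j] = h
        fd[i, j] = (g(P + E) - g(P - E)) / (2 * h)
```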
7.10 Derivative of the Log Absolute Determinant $\log |\det(\boldsymbol{X})|$
Proof
We consider cases based on the sign of the determinant. For an invertible matrix $\boldsymbol{X}$, either $\det(\boldsymbol{X}) > 0$ or $\det(\boldsymbol{X}) < 0$ ($\det(\boldsymbol{X}) = 0$ contradicts invertibility).
Case 1: $\det(\boldsymbol{X}) > 0$
Since $|\det(\boldsymbol{X})| = \det(\boldsymbol{X})$, the following holds.
\begin{equation}\log |\det(\boldsymbol{X})| = \log \det(\boldsymbol{X})\label{eq:7-10-1}\end{equation}
From 7.2, the derivative of $\log \det(\boldsymbol{X})$ is as follows.
\begin{equation}\dfrac{\partial \log \det(\boldsymbol{X})}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-2}\end{equation}
Therefore, when $\det(\boldsymbol{X}) > 0$, the following holds.
\begin{equation}\dfrac{\partial \log |\det(\boldsymbol{X})|}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-3}\end{equation}
Case 2: $\det(\boldsymbol{X}) < 0$
Since $|\det(\boldsymbol{X})| = -\det(\boldsymbol{X})$, the following holds.
\begin{equation}\log |\det(\boldsymbol{X})| = \log(-\det(\boldsymbol{X}))\label{eq:7-10-4}\end{equation}
We differentiate $\log(-\det(\boldsymbol{X}))$ with respect to $\boldsymbol{X}$. This is a composite function, so we apply the chain rule. The outer function is $\log$ and the inner function is $-\det(\boldsymbol{X})$.
The derivative of the outer function $f(u) = \log(u)$ is $f'(u) = \dfrac{1}{u}$. Substituting $u = -\det(\boldsymbol{X})$, we obtain the following.
\begin{equation}f'(-\det(\boldsymbol{X})) = \dfrac{1}{-\det(\boldsymbol{X})}\label{eq:7-10-5}\end{equation}
We compute the derivative of the inner function $g(\boldsymbol{X}) = -\det(\boldsymbol{X})$. From 7.1, $\dfrac{\partial \det(\boldsymbol{X})}{\partial \boldsymbol{X}} = \det(\boldsymbol{X}) \boldsymbol{X}^{-\top}$, so the following holds.
\begin{equation}\dfrac{\partial (-\det(\boldsymbol{X}))}{\partial \boldsymbol{X}} = -\det(\boldsymbol{X}) \boldsymbol{X}^{-\top}\label{eq:7-10-6}\end{equation}
Applying the chain rule and multiplying \eqref{eq:7-10-5} and \eqref{eq:7-10-6}, we obtain the following.
\begin{equation}\dfrac{\partial \log(-\det(\boldsymbol{X}))}{\partial \boldsymbol{X}} = \dfrac{1}{-\det(\boldsymbol{X})} \cdot (-\det(\boldsymbol{X}) \boldsymbol{X}^{-\top})\label{eq:7-10-7}\end{equation}
$\dfrac{1}{-\det(\boldsymbol{X})}$ and $(-\det(\boldsymbol{X}))$ cancel, yielding the following.
\begin{equation}\dfrac{\partial \log(-\det(\boldsymbol{X}))}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-8}\end{equation}
Therefore, when $\det(\boldsymbol{X}) < 0$ as well, the following holds.
\begin{equation}\dfrac{\partial \log |\det(\boldsymbol{X})|}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-9}\end{equation}
Combining \eqref{eq:7-10-3} and \eqref{eq:7-10-9}, regardless of the sign of the determinant, we always obtain the same result as long as $\det(\boldsymbol{X}) \neq 0$.
\begin{equation}\dfrac{\partial \log |\det(\boldsymbol{X})|}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-10}\end{equation}
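The sign-independence can be confirmed numerically by forcing a negative determinant and comparing against finite differences; the size and seed are arbitrary.

```python
import numpy as np

# Check d log|det X| / dX = X^{-T}, specifically in the det(X) < 0 case.
rng = np.random.default_rng(4)
n = 4
X = rng.standard_normal((n, n))
if np.linalg.det(X) > 0:
    X[0] = -X[0]                   # flipping one row's sign flips det's sign

def f(X):
    return np.log(abs(np.linalg.det(X)))

grad = np.linalg.inv(X).T          # X^{-T}

h = 1e-5
fd = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = h
        fd[i, j] = (f(X + E) - f(X - E)) / (2 * h)
```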