Proofs Chapter 7: Determinant Derivatives

This chapter proves the matrix derivatives of the determinant det(X) and the log-determinant log det(X). Determinant derivatives are required in core areas of statistics and machine learning, including optimization of the log-likelihood of multivariate normal distributions (covariance matrix estimation), hyperparameter learning in Gaussian process regression, and computation of marginal likelihoods in Bayesian inference. The proofs are based on cofactor expansion and Jacobi's formula for the derivative of a determinant, deriving matrix-form results from component-wise calculations.

Prerequisites: Chapter 4 (Basic Matrix Derivative Formulas), Chapter 5 (Trace Derivatives). Chapters that use results from this chapter: Chapter 8 (Inverse Matrix Derivatives), Chapter 9 (Eigenvalue Derivatives).

Derivatives of Determinants

Prerequisites for this chapter
Unless otherwise stated, the formulas in this chapter hold under the following conditions:
  • The matrix $\boldsymbol{X}$ is invertible ($\det \boldsymbol{X} \neq 0$); the derivatives are defined on the set $\{\boldsymbol{X} \in \mathbb{R}^{N \times N} : \det \boldsymbol{X} \neq 0\}$, which is open, so invertibility is preserved under small perturbations
  • For formulas involving non-square matrices, a full rank condition ($\mathrm{rank}(\boldsymbol{X}) = \min(M, N)$) is required
  • All formulas are based on the denominator layout

7.1 Derivative of the Determinant $|\boldsymbol{X}|$

Formula: $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{N \times N}$ is an $N \times N$ invertible matrix ($|\boldsymbol{X}| \neq 0$), $\boldsymbol{X}^{-\top} = (\boldsymbol{X}^{-1})^\top = (\boldsymbol{X}^\top)^{-1}$ is the transpose of the inverse
Proof

We recall the cofactor expansion of a determinant (A.13). The determinant can be expanded along any row $i$ as the sum of products of row elements and their corresponding cofactors.

\begin{equation}|\boldsymbol{X}| = \sum_{j=0}^{N-1} X_{ij} \tilde{X}_{ij}\label{eq:7-1-1}\end{equation}

Here $\tilde{X}_{ij}$ is the $(i, j)$ cofactor of $\boldsymbol{X}$.

We recall the definition of a cofactor. $\tilde{X}_{ij}$ is the determinant of the $(N-1) \times (N-1)$ submatrix obtained by removing the $i$-th row and $j$-th column from $\boldsymbol{X}$ (the minor), multiplied by the sign $(-1)^{i+j}$.

\begin{equation}\tilde{X}_{ij} = (-1)^{i+j} M_{ij}\label{eq:7-1-2}\end{equation}

Here $M_{ij}$ is the $(i, j)$ minor.

A key property is that the cofactor $\tilde{X}_{ij}$ does not involve $X_{ij}$ itself. This is because the minor $M_{ij}$ is computed by excluding the $i$-th row and $j$-th column.

We differentiate the determinant $|\boldsymbol{X}|$ with respect to the component $X_{ij}$. Using the cofactor expansion \eqref{eq:7-1-1}, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial X_{ij}} |\boldsymbol{X}| = \dfrac{\partial}{\partial X_{ij}} \sum_{k=0}^{N-1} X_{ik} \tilde{X}_{ik}\label{eq:7-1-3}\end{equation}

We decompose the derivative of the sum into the sum of derivatives of each term.

\begin{equation}\dfrac{\partial}{\partial X_{ij}} \sum_{k=0}^{N-1} X_{ik} \tilde{X}_{ik} = \sum_{k=0}^{N-1} \dfrac{\partial}{\partial X_{ij}} (X_{ik} \tilde{X}_{ik})\label{eq:7-1-4}\end{equation}

We compute the derivative of each term. Since the cofactor $\tilde{X}_{ik}$ does not contain $X_{ij}$, it can be treated as a constant. Therefore the following holds.

\begin{equation}\dfrac{\partial}{\partial X_{ij}} (X_{ik} \tilde{X}_{ik}) = \tilde{X}_{ik} \dfrac{\partial X_{ik}}{\partial X_{ij}}\label{eq:7-1-5}\end{equation}

We compute the partial derivative of the component. Since the entries of $\boldsymbol{X}$ are independent variables, $X_{ik}$ coincides with the differentiation variable $X_{ij}$ exactly when $k = j$.

\begin{equation}\dfrac{\partial X_{ik}}{\partial X_{ij}} = \delta_{kj}\label{eq:7-1-6}\end{equation}

Here $\delta_{kj}$ is the Kronecker delta, which equals 1 when $k = j$ and 0 otherwise.

Substituting \eqref{eq:7-1-5} and \eqref{eq:7-1-6} into \eqref{eq:7-1-4}, the sum reduces as follows.

\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} \delta_{kj} = \tilde{X}_{ij}\label{eq:7-1-7}\end{equation}

Since $\delta_{kj} = 1$ only when $k = j$, only the $k = j$ term survives in the sum.

Thus, the component-wise derivative of the determinant equals the cofactor.

\begin{equation}\dfrac{\partial}{\partial X_{ij}} |\boldsymbol{X}| = \tilde{X}_{ij}\label{eq:7-1-8}\end{equation}

We consider the matrix derivative in the denominator layout. In the denominator layout, the $(i, j)$ component of the result corresponds to $\dfrac{\partial}{\partial X_{ij}}$.

\begin{equation}\left( \dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| \right)_{ij} = \dfrac{\partial}{\partial X_{ij}} |\boldsymbol{X}| = \tilde{X}_{ij}\label{eq:7-1-9}\end{equation}

We write this in matrix form. We define the adjugate matrix $\text{adj}(\boldsymbol{X})$. The $(i, j)$ component of $\text{adj}(\boldsymbol{X})$ is $\tilde{X}_{ji}$ (note the swapped indices).

Writing the result of \eqref{eq:7-1-9} as a matrix, since the $(i, j)$ component is $\tilde{X}_{ij}$, this is the transpose of $\text{adj}(\boldsymbol{X})$, namely $\text{adj}(\boldsymbol{X})^\top$.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = \text{adj}(\boldsymbol{X})^\top\label{eq:7-1-10}\end{equation}

We rewrite using the relationship between the inverse matrix and the adjugate matrix. For an invertible matrix $\boldsymbol{X}$, the following relation holds.

\begin{equation}\boldsymbol{X}^{-1} = \dfrac{1}{|\boldsymbol{X}|} \text{adj}(\boldsymbol{X})\label{eq:7-1-11}\end{equation}

Solving \eqref{eq:7-1-11} for $\text{adj}(\boldsymbol{X})$, we obtain the following.

\begin{equation}\text{adj}(\boldsymbol{X}) = |\boldsymbol{X}| \boldsymbol{X}^{-1}\label{eq:7-1-12}\end{equation}

Taking the transpose of both sides.

\begin{equation}\text{adj}(\boldsymbol{X})^\top = (|\boldsymbol{X}| \boldsymbol{X}^{-1})^\top\label{eq:7-1-13}\end{equation}

The scalar $|\boldsymbol{X}|$ is unchanged by transposition. The transpose of a product is $(\boldsymbol{A}\boldsymbol{B})^\top = \boldsymbol{B}^\top \boldsymbol{A}^\top$, but a scalar factor commutes with matrices, so the transpose acts only on $\boldsymbol{X}^{-1}$. Therefore the following holds.

\begin{equation}\text{adj}(\boldsymbol{X})^\top = |\boldsymbol{X}| (\boldsymbol{X}^{-1})^\top = |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-1-14}\end{equation}

Combining \eqref{eq:7-1-10} and \eqref{eq:7-1-14}, we obtain the final result.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-1-15}\end{equation}

Remark: $\boldsymbol{X}^{-\top}$ is also written as $\boldsymbol{X}^{-T}$ and denotes the transpose of the inverse. Since $(\boldsymbol{X}^{-1})^\top = (\boldsymbol{X}^\top)^{-1}$ holds, computing in either order yields the same result. This formula is frequently used in optimization problems involving determinants.
Geometric interpretation: The determinant represents the volume scaling factor of a linear map. In $N$-dimensional space, the transformation by matrix $\boldsymbol{X}$ maps the unit hypercube to a parallelepiped of volume $|\det \boldsymbol{X}|$. From this perspective, the determinant derivative formula can be expressed using differential forms as $$d(\det \boldsymbol{X}) = \det \boldsymbol{X} \cdot \operatorname{tr}(\boldsymbol{X}^{-1} d\boldsymbol{X})$$

The 1-form $\operatorname{tr}(\boldsymbol{X}^{-1} d\boldsymbol{X})$ is left-invariant on the general linear group $\mathrm{GL}(n)$, and the identity above is a standard result in Lie group theory. It corresponds to the fact that $\det$ is a group homomorphism from $\mathrm{GL}(n)$ to $\mathbb{R}^*$ whose differential (the induced Lie algebra homomorphism) is the trace.

This correspondence also appears through the exponential map in the relation $$\det(\exp \boldsymbol{A}) = \exp(\operatorname{tr} \boldsymbol{A})$$ If the eigenvalues of $\boldsymbol{A}$ are $\lambda_1, \ldots, \lambda_n$, then the eigenvalues of $\exp \boldsymbol{A}$ are $e^{\lambda_1}, \ldots, e^{\lambda_n}$, so $\det(\exp \boldsymbol{A}) = \prod_i e^{\lambda_i} = e^{\sum_i \lambda_i} = e^{\operatorname{tr} \boldsymbol{A}}$ follows. In other words, the exponential map $\exp: \mathfrak{gl}(n) \to \mathrm{GL}(n)$ from the Lie algebra $\mathfrak{gl}(n)$ to the Lie group $\mathrm{GL}(n)$ intertwines the homomorphisms $\operatorname{tr}$ and $\det$.
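The formula of 7.1 can be sanity-checked numerically. The sketch below (using NumPy; the matrix size, random seed, and tolerance are illustrative choices, not part of the text) compares the closed form $|\boldsymbol{X}|\boldsymbol{X}^{-\top}$ with central finite differences applied entry by entry, matching the denominator layout.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
X = rng.standard_normal((N, N))  # a random matrix is almost surely invertible

# Closed form in denominator layout: d|X|/dX = |X| X^{-T}
analytic = np.linalg.det(X) * np.linalg.inv(X).T

# Central finite differences on each entry X_ij
eps = 1e-6
numeric = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        E = np.zeros((N, N))
        E[i, j] = eps
        numeric[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-6)
```

Each entry of the finite-difference gradient agrees with the corresponding cofactor $\tilde{X}_{ij}$, which is exactly the content of \eqref{eq:7-1-8}.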

7.2 Derivative of the Log-Determinant $\log|\boldsymbol{X}|$

Formula: $\dfrac{\partial}{\partial \boldsymbol{X}} \log|\boldsymbol{X}| = \boldsymbol{X}^{-\top}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{N \times N}$ is an $N \times N$ invertible matrix ($|\boldsymbol{X}| > 0$ assumed), $\log$ is the natural logarithm
Proof

The log-determinant is a composite function $\log(|\boldsymbol{X}|)$. The outer function is $\log$ and the inner function is $|\boldsymbol{X}|$.

We apply the chain rule for composite functions. The matrix derivative of a scalar function $f(g(\boldsymbol{X}))$ is as follows.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} f(g(\boldsymbol{X})) = f'(g(\boldsymbol{X})) \dfrac{\partial g}{\partial \boldsymbol{X}}\label{eq:7-2-1}\end{equation}

In this problem, $f(u) = \log(u)$ and $g(\boldsymbol{X}) = |\boldsymbol{X}|$.

Computing the derivative of the outer function $f(u) = \log(u)$, we obtain the following.

\begin{equation}f'(u) = \dfrac{1}{u}\label{eq:7-2-2}\end{equation}

Substituting $u = g(\boldsymbol{X}) = |\boldsymbol{X}|$, we obtain the following.

\begin{equation}f'(g(\boldsymbol{X})) = f'(|\boldsymbol{X}|) = \dfrac{1}{|\boldsymbol{X}|}\label{eq:7-2-3}\end{equation}

The derivative of the inner function is given by 7.1 as follows.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-2-4}\end{equation}

Substituting \eqref{eq:7-2-3} and \eqref{eq:7-2-4} into the chain rule \eqref{eq:7-2-1}, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} \log|\boldsymbol{X}| = \dfrac{1}{|\boldsymbol{X}|} \cdot |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-2-5}\end{equation}

$\dfrac{1}{|\boldsymbol{X}|}$ and $|\boldsymbol{X}|$ cancel, yielding the final result.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} \log|\boldsymbol{X}| = \boldsymbol{X}^{-\top}\label{eq:7-2-6}\end{equation}

Remark: This formula is one of the most important results in determinant derivatives. It is frequently used in many statistical optimization problems, such as parameter estimation in Gaussian mixture models (GMM) and maximum likelihood estimation of covariance matrices. For the case $|\boldsymbol{X}| < 0$, see 7.10.
Connection to information geometry: If $\boldsymbol{\Sigma}$ is the covariance matrix of a multivariate normal distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\log|\boldsymbol{\Sigma}|$ is directly related to the entropy. Specifically, the entropy of an $N$-dimensional normal distribution is $$H = \dfrac{1}{2}\log|\boldsymbol{\Sigma}| + \dfrac{N}{2}(1 + \log 2\pi)$$ Furthermore, the Fisher information matrix of the normal distribution family depends on the parameterization of the covariance matrix, and differentiation with respect to $\boldsymbol{\Sigma}^{-1}$ is central to its computation. This formula also appears in the study of metric structures on statistical manifolds in information geometry.
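As with 7.1, the log-determinant formula admits a quick numerical check. The sketch below (NumPy; the symmetric positive definite construction, seed, and tolerance are illustrative) builds an $\boldsymbol{X}$ with $|\boldsymbol{X}| > 0$, as the conditions require, and uses `numpy.linalg.slogdet` for a stable log-determinant.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N))
X = A @ A.T + 4 * np.eye(N)  # symmetric positive definite, so det(X) > 0

# Closed form in denominator layout: d log|X| / dX = X^{-T}
analytic = np.linalg.inv(X).T

def logdet(M):
    # slogdet returns (sign, log|det|); here the sign is +1
    return np.linalg.slogdet(M)[1]

eps = 1e-6
numeric = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        E = np.zeros((N, N))
        E[i, j] = eps
        numeric[i, j] = (logdet(X + E) - logdet(X - E)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-6)
```

Note that the formula perturbs each entry $X_{ij}$ independently, so no symmetry constraint is imposed during differentiation; the symmetric $\boldsymbol{X}$ merely guarantees $|\boldsymbol{X}| > 0$.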

7.3 Derivative of a Power of the Determinant $|\boldsymbol{X}^n|$

Formula: $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^n| = n|\boldsymbol{X}^n| \boldsymbol{X}^{-\top}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{N \times N}$ is an $N \times N$ invertible matrix, $n$ is an integer, $|\boldsymbol{X}^n| = |\boldsymbol{X}|^n$
Proof

We recall the multiplicative property of determinants (1.14). For any square matrices $\boldsymbol{A}$, $\boldsymbol{B}$, the following holds.

\begin{equation}|\boldsymbol{A}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{B}|\label{eq:7-3-1}\end{equation}

Applying this property repeatedly, the determinant of $\boldsymbol{X}^n = \underbrace{\boldsymbol{X} \cdot \boldsymbol{X} \cdots \boldsymbol{X}}_{n \text{ times}}$ becomes the following.

\begin{equation}|\boldsymbol{X}^n| = |\underbrace{\boldsymbol{X} \cdot \boldsymbol{X} \cdots \boldsymbol{X}}_{n \text{ times}}| = \underbrace{|\boldsymbol{X}| \cdot |\boldsymbol{X}| \cdots |\boldsymbol{X}|}_{n \text{ times}} = |\boldsymbol{X}|^n\label{eq:7-3-2}\end{equation}

Therefore, the problem of differentiating $|\boldsymbol{X}^n|$ with respect to $\boldsymbol{X}$ reduces to differentiating $|\boldsymbol{X}|^n$ with respect to $\boldsymbol{X}$. (The argument above covers positive $n$; for $n \leq 0$, the identity $|\boldsymbol{X}^{-1}| = |\boldsymbol{X}|^{-1}$ extends $|\boldsymbol{X}^n| = |\boldsymbol{X}|^n$ to all integers, so the reduction still applies.)

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^n| = \dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^n\label{eq:7-3-3}\end{equation}

$|\boldsymbol{X}|^n$ is the $n$-th power of $|\boldsymbol{X}|$, which is a composite function. Let the outer function be $f(u) = u^n$ and the inner function be $g(\boldsymbol{X}) = |\boldsymbol{X}|$.

Computing the derivative of the outer function $f(u) = u^n$, we obtain the following.

\begin{equation}f'(u) = n u^{n-1}\label{eq:7-3-4}\end{equation}

Substituting $u = |\boldsymbol{X}|$, we obtain the following.

\begin{equation}f'(|\boldsymbol{X}|) = n |\boldsymbol{X}|^{n-1}\label{eq:7-3-5}\end{equation}

The derivative of the inner function is given by 7.1 as follows.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-3-6}\end{equation}

Applying the chain rule, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^n = n |\boldsymbol{X}|^{n-1} \cdot |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-3-7}\end{equation}

Using $|\boldsymbol{X}|^{n-1} \cdot |\boldsymbol{X}| = |\boldsymbol{X}|^n$ and simplifying, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^n = n |\boldsymbol{X}|^n \boldsymbol{X}^{-\top}\label{eq:7-3-8}\end{equation}

Since $|\boldsymbol{X}|^n = |\boldsymbol{X}^n|$ by \eqref{eq:7-3-2}, we obtain the final result.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^n| = n |\boldsymbol{X}^n| \boldsymbol{X}^{-\top}\label{eq:7-3-9}\end{equation}

Remark: For $n = -1$, we have $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^{-1}| = -|\boldsymbol{X}^{-1}| \boldsymbol{X}^{-\top}$. For $n = 2$, we have $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^2| = 2|\boldsymbol{X}^2| \boldsymbol{X}^{-\top}$.
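The power formula, including the $n = -1$ case from the remark, can be checked numerically. The sketch below (NumPy; the diagonal shift keeps $|\boldsymbol{X}|$ well away from zero, and the sizes and tolerances are illustrative) compares $n|\boldsymbol{X}|^n\boldsymbol{X}^{-\top}$ with finite differences of $|\boldsymbol{X}|^n$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3
X = rng.standard_normal((N, N)) + 3 * np.eye(N)  # well-conditioned, |X| far from 0

def grad_det_power(X, n):
    # Closed form: d|X^n|/dX = n |X|^n X^{-T}
    return n * np.linalg.det(X) ** n * np.linalg.inv(X).T

eps = 1e-6
for n in (3, -1):
    numeric = np.zeros_like(X)
    for i in range(N):
        for j in range(N):
            E = np.zeros((N, N))
            E[i, j] = eps
            numeric[i, j] = (np.linalg.det(X + E) ** n
                             - np.linalg.det(X - E) ** n) / (2 * eps)
    assert np.allclose(grad_det_power(X, n), numeric, atol=1e-4)
```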

7.4 Scalar Derivative of a Determinant (Jacobi's Formula)

Formula: $\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \det(\boldsymbol{Y}) \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)$
Conditions: $\boldsymbol{Y}(x) \in \mathbb{R}^{N \times N}$ is an $N \times N$ invertible matrix depending on a scalar $x$, each component $Y_{ij}(x)$ is a differentiable function of $x$
Proof

$\det(\boldsymbol{Y})$ is a function of all components $Y_{ij}$ of $\boldsymbol{Y}$. Since each component $Y_{ij}$ is a function of the scalar $x$, $\det(\boldsymbol{Y})$ is a composite function of $x$.

We apply the multivariable chain rule. Differentiating $\det(\boldsymbol{Y})$ with respect to $x$ yields the sum of contributions through all intermediate variables $Y_{ij}$.

\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \dfrac{\partial \det(\boldsymbol{Y})}{\partial Y_{ij}} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-1}\end{equation}

From the proof of 7.1, the component-wise derivative of the determinant equals the cofactor.

\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial Y_{ij}} = \tilde{Y}_{ij}\label{eq:7-4-2}\end{equation}

Here $\tilde{Y}_{ij}$ is the $(i, j)$ cofactor of $\boldsymbol{Y}$.

Substituting \eqref{eq:7-4-2} into \eqref{eq:7-4-1}, we obtain the following.

\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \sum_{i,j} \tilde{Y}_{ij} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-3}\end{equation}

We use the relationship between cofactors and the inverse matrix. As shown in the proof of 7.1, the following relation holds.

\begin{equation}\text{adj}(\boldsymbol{Y}) = \det(\boldsymbol{Y}) \boldsymbol{Y}^{-1}\label{eq:7-4-4}\end{equation}

Here the $(i, j)$ component of $\text{adj}(\boldsymbol{Y})$ is $\tilde{Y}_{ji}$.

Therefore $\tilde{Y}_{ij}$ is the $(j, i)$ component of $\text{adj}(\boldsymbol{Y})$, so it can be written as follows.

\begin{equation}\tilde{Y}_{ij} = (\text{adj}(\boldsymbol{Y}))_{ji} = \det(\boldsymbol{Y}) (Y^{-1})_{ji}\label{eq:7-4-5}\end{equation}

Substituting \eqref{eq:7-4-5} into \eqref{eq:7-4-3}, we obtain the following.

\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \sum_{i,j} \det(\boldsymbol{Y}) (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-6}\end{equation}

Since $\det(\boldsymbol{Y})$ does not depend on $i, j$, we factor it out of the sum.

\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \det(\boldsymbol{Y}) \sum_{i,j} (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-7}\end{equation}

Interchanging the order of summation and summing over $i$ first, we obtain the following.

\begin{equation}\sum_{i,j} (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x} = \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x}\label{eq:7-4-8}\end{equation}

The inner sum $\displaystyle\sum_{i} (Y^{-1})_{ji} \dfrac{\partial Y_{ij}}{\partial x}$ corresponds to the definition of matrix multiplication. This is the $(j, j)$ component of $\boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x}$.

\begin{equation}\sum_{i=0}^{N-1} (Y^{-1})_{ji} \left(\dfrac{\partial \boldsymbol{Y}}{\partial x}\right)_{ij} = \left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)_{jj}\label{eq:7-4-9}\end{equation}

The outer sum $\sum_{j}$ is the sum of diagonal elements, i.e., the trace.

\begin{equation}\sum_{j=0}^{N-1} \left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)_{jj} = \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)\label{eq:7-4-10}\end{equation}

Combining \eqref{eq:7-4-7} and \eqref{eq:7-4-10}, we obtain the final result (Jacobi's formula).

\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \det(\boldsymbol{Y}) \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)\label{eq:7-4-11}\end{equation}

Remark: This formula is known as Jacobi's formula. Written in logarithmic derivative form, it becomes $\dfrac{\partial}{\partial x} \log \det(\boldsymbol{Y}) = \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial x} \right)$. This logarithmic derivative form frequently appears in parameter estimation of statistical models involving covariance matrices.
Source: C.G.J. Jacobi (1841) "De formatione et proprietatibus Determinantium", Journal für die reine und angewandte Mathematik 22, 285-318.
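Jacobi's formula can be illustrated on a concrete matrix curve. The sketch below (NumPy; the particular curve $\boldsymbol{Y}(x) = 3\boldsymbol{I} + x\boldsymbol{A} + \sin(x)\boldsymbol{B}$, the point $x_0$, and the tolerances are illustrative assumptions) compares a finite difference of $\det \boldsymbol{Y}(x)$ with $\det(\boldsymbol{Y})\operatorname{tr}(\boldsymbol{Y}^{-1}\boldsymbol{Y}')$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))

def Y(x):
    # A smooth matrix curve, invertible near x0 (dominated by 3I)
    return 3 * np.eye(N) + x * A + np.sin(x) * B

def dY(x):
    # Elementwise derivative of Y(x)
    return A + np.cos(x) * B

x0, h = 0.2, 1e-6
lhs_fd = (np.linalg.det(Y(x0 + h)) - np.linalg.det(Y(x0 - h))) / (2 * h)

# Jacobi's formula: d det(Y)/dx = det(Y) tr(Y^{-1} Y')
rhs = np.linalg.det(Y(x0)) * np.trace(np.linalg.solve(Y(x0), dY(x0)))

assert np.isclose(lhs_fd, rhs, rtol=1e-5, atol=1e-8)
```

Using `np.linalg.solve(Y, dY)` instead of forming $\boldsymbol{Y}^{-1}$ explicitly is the usual numerically preferable way to evaluate $\boldsymbol{Y}^{-1}\boldsymbol{Y}'$.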

7.5 Properties of Component-wise Derivatives of the Determinant

Formula: $\displaystyle\sum_{k=0}^{N-1} \dfrac{\partial \det(\boldsymbol{X})}{\partial X_{ik}} X_{jk} = \delta_{ij} \det(\boldsymbol{X})$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{N \times N}$ is an $N \times N$ invertible matrix, $\delta_{ij}$ is the Kronecker delta (1 when $i = j$, 0 otherwise)
Proof

From the proof of 7.1, the component-wise derivative of the determinant equals the cofactor.

\begin{equation}\dfrac{\partial \det(\boldsymbol{X})}{\partial X_{ik}} = \tilde{X}_{ik}\label{eq:7-5-1}\end{equation}

Using this result, we rewrite the sum on the left-hand side in terms of cofactors.

\begin{equation}\sum_{k=0}^{N-1} \dfrac{\partial \det(\boldsymbol{X})}{\partial X_{ik}} X_{jk} = \sum_{k=0}^{N-1} \tilde{X}_{ik} X_{jk}\label{eq:7-5-2}\end{equation}

We consider the cases $i = j$ and $i \neq j$ separately.

Case $i = j$. The sum becomes the following.

\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{ik}\label{eq:7-5-3}\end{equation}

This is precisely the cofactor expansion of the determinant. The expansion of the determinant along the $i$-th row is as follows.

\begin{equation}\det(\boldsymbol{X}) = \sum_{k=0}^{N-1} X_{ik} \tilde{X}_{ik}\label{eq:7-5-4}\end{equation}

Therefore, when $i = j$, the following holds.

\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{ik} = \det(\boldsymbol{X})\label{eq:7-5-5}\end{equation}

Case $i \neq j$. The sum becomes the following.

\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{jk}\label{eq:7-5-6}\end{equation}

We consider the meaning of this sum. $\tilde{X}_{ik}$ is the determinant of the submatrix obtained by removing the $i$-th row and $k$-th column from $\boldsymbol{X}$, multiplied by a sign. Therefore $\tilde{X}_{ik}$ does not contain any elements from the $i$-th row of $\boldsymbol{X}$.

The sum in \eqref{eq:7-5-6} equals the cofactor expansion along the $i$-th row of the matrix $\boldsymbol{X}'$ obtained by replacing the $i$-th row of $\boldsymbol{X}$ with the $j$-th row.

\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{jk} = \det(\boldsymbol{X}')\label{eq:7-5-7}\end{equation}

Here $\boldsymbol{X}'$ is the matrix obtained by replacing the $i$-th row of $\boldsymbol{X}$ with the $j$-th row of $\boldsymbol{X}$ (note that the cofactors $\tilde{X}_{ik}$ do not depend on the $i$-th row).

In $\boldsymbol{X}'$, the $i$-th row and the $j$-th row are the same vector. That is, two rows are identical.

As a property of determinants, the determinant of a matrix with two identical rows (or columns) is 0.

\begin{equation}\det(\boldsymbol{X}') = 0\label{eq:7-5-8}\end{equation}

Therefore, when $i \neq j$, the following holds.

\begin{equation}\sum_{k=0}^{N-1} \tilde{X}_{ik} X_{jk} = 0\label{eq:7-5-9}\end{equation}

Expressing \eqref{eq:7-5-5} and \eqref{eq:7-5-9} uniformly using the Kronecker delta, we obtain the final result.

\begin{equation}\sum_{k=0}^{N-1} \dfrac{\partial \det(\boldsymbol{X})}{\partial X_{ik}} X_{jk} = \delta_{ij} \det(\boldsymbol{X})\label{eq:7-5-10}\end{equation}

Remark: Writing this formula in matrix form, if we denote the cofactor matrix as $\tilde{\boldsymbol{X}}$ (with $(i, j)$ component $\tilde{X}_{ij}$), then $\tilde{\boldsymbol{X}} \boldsymbol{X}^\top = \det(\boldsymbol{X}) \boldsymbol{I}$. This is equivalent to the inverse matrix formula via the adjugate: $\boldsymbol{X}^{-1} = \dfrac{1}{\det(\boldsymbol{X})} \text{adj}(\boldsymbol{X})$.
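The matrix form of the remark, $\tilde{\boldsymbol{X}}\boldsymbol{X}^\top = \det(\boldsymbol{X})\boldsymbol{I}$, can be verified by constructing the cofactor matrix directly from minors. The sketch below (NumPy; size and seed are illustrative) does exactly that and also checks the adjugate inverse formula.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 4
X = rng.standard_normal((N, N))

# Cofactor matrix: C[i, j] = (-1)^{i+j} * det of X with row i and column j removed
C = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
        C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

# The identity of 7.5 in matrix form: C X^T = det(X) I
assert np.allclose(C @ X.T, np.linalg.det(X) * np.eye(N))

# Equivalently, adj(X) = C^T and X^{-1} = adj(X) / det(X)
assert np.allclose(np.linalg.inv(X), C.T / np.linalg.det(X))
```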

7.6 Second Derivative of the Determinant

Formula: $\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \det(\boldsymbol{Y}) \left[ \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}'') + \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2 - \text{tr}((\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2) \right]$
Conditions: $\boldsymbol{Y}(x) \in \mathbb{R}^{N \times N}$ is an $N \times N$ invertible matrix depending on a scalar $x$, $\boldsymbol{Y}' = \dfrac{\partial \boldsymbol{Y}}{\partial x}$, $\boldsymbol{Y}'' = \dfrac{\partial^2 \boldsymbol{Y}}{\partial x^2}$, each component is twice differentiable with respect to $x$
Proof

We restate Jacobi's formula from 7.4.

\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} = \det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')\label{eq:7-6-1}\end{equation}

We differentiate both sides once more with respect to $x$. The left-hand side becomes $\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2}$. The right-hand side is a product of two factors, so we apply the product rule (1.25).

\begin{equation}\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \dfrac{\partial}{\partial x} \left[ \det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') \right]\label{eq:7-6-2}\end{equation}

Applying the product rule (1.25) $(fg)' = f'g + fg'$ with $f = \det(\boldsymbol{Y})$ and $g = \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')$, we obtain the following.

\begin{equation}\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \dfrac{\partial \det(\boldsymbol{Y})}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') + \det(\boldsymbol{Y}) \dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')\label{eq:7-6-3}\end{equation}

We compute the first term. Substituting Jacobi's formula \eqref{eq:7-6-1} for $\dfrac{\partial \det(\boldsymbol{Y})}{\partial x}$, we obtain the following.

\begin{equation}\dfrac{\partial \det(\boldsymbol{Y})}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') = \det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') \cdot \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')\label{eq:7-6-4}\end{equation}

Simplifying the first term, we obtain the following.

\begin{equation}\det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2\label{eq:7-6-5}\end{equation}

We compute $\dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')$ in the second term. Since the trace and differentiation commute, we first differentiate inside the trace.

\begin{equation}\dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') = \text{tr}\left( \dfrac{\partial}{\partial x} (\boldsymbol{Y}^{-1} \boldsymbol{Y}') \right)\label{eq:7-6-6}\end{equation}

Applying the product rule (1.25) to the matrix product $\boldsymbol{Y}^{-1} \boldsymbol{Y}'$, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial x} (\boldsymbol{Y}^{-1} \boldsymbol{Y}') = \dfrac{\partial \boldsymbol{Y}^{-1}}{\partial x} \boldsymbol{Y}' + \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}'}{\partial x}\label{eq:7-6-7}\end{equation}

Here $\dfrac{\partial \boldsymbol{Y}'}{\partial x} = \boldsymbol{Y}''$ by the definition of the second derivative.

We apply the scalar derivative formula for the inverse. Differentiating $\boldsymbol{Y}\boldsymbol{Y}^{-1} = \boldsymbol{I}$ with respect to $x$, the following holds.

\begin{equation}\boldsymbol{Y}' \boldsymbol{Y}^{-1} + \boldsymbol{Y} \dfrac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \boldsymbol{O}\label{eq:7-6-8}\end{equation}

Solving \eqref{eq:7-6-8} for $\dfrac{\partial \boldsymbol{Y}^{-1}}{\partial x}$, we obtain the following.

\begin{equation}\dfrac{\partial \boldsymbol{Y}^{-1}}{\partial x} = -\boldsymbol{Y}^{-1} \boldsymbol{Y}' \boldsymbol{Y}^{-1}\label{eq:7-6-9}\end{equation}

Substituting \eqref{eq:7-6-9} into \eqref{eq:7-6-7}, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial x} (\boldsymbol{Y}^{-1} \boldsymbol{Y}') = -\boldsymbol{Y}^{-1} \boldsymbol{Y}' \boldsymbol{Y}^{-1} \boldsymbol{Y}' + \boldsymbol{Y}^{-1} \boldsymbol{Y}''\label{eq:7-6-10}\end{equation}

We simplify the first term. Since $\boldsymbol{Y}^{-1} \boldsymbol{Y}'$ appears twice, it can be written as follows.

\begin{equation}-\boldsymbol{Y}^{-1} \boldsymbol{Y}' \boldsymbol{Y}^{-1} \boldsymbol{Y}' = -(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2\label{eq:7-6-11}\end{equation}

Taking the trace of \eqref{eq:7-6-6}, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') = \text{tr}(-(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2 + \boldsymbol{Y}^{-1} \boldsymbol{Y}'')\label{eq:7-6-12}\end{equation}

Using the linearity of trace to decompose, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial x} \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}') = -\text{tr}((\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2) + \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}'')\label{eq:7-6-13}\end{equation}

Substituting the results of \eqref{eq:7-6-5} and \eqref{eq:7-6-13} into \eqref{eq:7-6-3}, we obtain the following.

\begin{equation}\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \det(\boldsymbol{Y}) \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2 + \det(\boldsymbol{Y}) \left[ -\text{tr}((\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2) + \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}'') \right]\label{eq:7-6-14}\end{equation}

Factoring out $\det(\boldsymbol{Y})$, we obtain the final result.

\begin{equation}\dfrac{\partial^2 \det(\boldsymbol{Y})}{\partial x^2} = \det(\boldsymbol{Y}) \left[ \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}'') + \text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2 - \text{tr}((\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2) \right]\label{eq:7-6-15}\end{equation}

Remark: $\text{tr}(\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2$ is the "square of the trace," while $\text{tr}((\boldsymbol{Y}^{-1} \boldsymbol{Y}')^2)$ is the "trace of the matrix squared." In general, these take different values. For example, if $\boldsymbol{A}$ is a diagonal matrix $\text{diag}(a_1, a_2)$, then $\text{tr}(\boldsymbol{A})^2 = (a_1 + a_2)^2$ but $\text{tr}(\boldsymbol{A}^2) = a_1^2 + a_2^2$.
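The second-derivative formula, including the distinction in the remark between $\text{tr}(\cdot)^2$ and $\text{tr}((\cdot)^2)$, can be checked against a second central difference. The sketch below (NumPy; the polynomial curve, evaluation point, step size, and tolerances are illustrative assumptions) evaluates both sides of \eqref{eq:7-6-15}.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 3
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))

Y   = lambda x: 2 * np.eye(N) + x * A + x**2 * B  # smooth illustrative curve
Yp  = lambda x: A + 2 * x * B                     # Y'
Ypp = lambda x: 2 * B                             # Y''

x0, h = 0.1, 1e-4
det = lambda x: np.linalg.det(Y(x))

# Second central difference of det(Y(x))
lhs_fd = (det(x0 + h) - 2 * det(x0) + det(x0 - h)) / h**2

# Closed form: det(Y) [ tr(Y^{-1} Y'') + tr(Y^{-1} Y')^2 - tr((Y^{-1} Y')^2) ]
Yi = np.linalg.inv(Y(x0))
M = Yi @ Yp(x0)
rhs = det(x0) * (np.trace(Yi @ Ypp(x0)) + np.trace(M) ** 2 - np.trace(M @ M))

assert np.isclose(lhs_fd, rhs, rtol=1e-4, atol=1e-6)
```

Note that `np.trace(M) ** 2` (square of the trace) and `np.trace(M @ M)` (trace of the square) are distinct quantities, exactly as the remark emphasizes.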

7.7 Derivative of the Determinant of a Product $|\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}|$

Formula: $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| \boldsymbol{X}^{-\top}$
Conditions: $\boldsymbol{A} \in \mathbb{R}^{N \times N}$, $\boldsymbol{B} \in \mathbb{R}^{N \times N}$ are constant matrices, $\boldsymbol{X} \in \mathbb{R}^{N \times N}$ is an invertible matrix, $\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}$ is also invertible
Proof

By the multiplicative property of determinants (1.14), the determinant of a product of square matrices equals the product of their determinants.

\begin{equation}|\boldsymbol{A}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{B}|\label{eq:7-7-1}\end{equation}

We apply this property to $\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}$. Viewing it as $(\boldsymbol{A}\boldsymbol{X})\boldsymbol{B}$, we obtain the following.

\begin{equation}|(\boldsymbol{A}\boldsymbol{X})\boldsymbol{B}| = |\boldsymbol{A}\boldsymbol{X}||\boldsymbol{B}|\label{eq:7-7-2}\end{equation}

Applying the same property to $|\boldsymbol{A}\boldsymbol{X}|$, we obtain the following.

\begin{equation}|\boldsymbol{A}\boldsymbol{X}| = |\boldsymbol{A}||\boldsymbol{X}|\label{eq:7-7-3}\end{equation}

Combining \eqref{eq:7-7-2} and \eqref{eq:7-7-3}, we obtain the following.

\begin{equation}|\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{X}||\boldsymbol{B}|\label{eq:7-7-4}\end{equation}

We differentiate $|\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}|$ with respect to $\boldsymbol{X}$. Since $|\boldsymbol{A}|$ and $|\boldsymbol{B}|$ do not depend on $\boldsymbol{X}$, they can be factored out of the differentiation operator.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = \dfrac{\partial}{\partial \boldsymbol{X}} (|\boldsymbol{A}||\boldsymbol{X}||\boldsymbol{B}|) = |\boldsymbol{A}||\boldsymbol{B}| \dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|\label{eq:7-7-5}\end{equation}

Substituting $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}$ from 7.1, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{B}| \cdot |\boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-7-6}\end{equation}

Using $|\boldsymbol{A}||\boldsymbol{B}||\boldsymbol{X}| = |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}|$ (from \eqref{eq:7-7-4}) and simplifying, we obtain the final result.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| = |\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}| \boldsymbol{X}^{-\top}\label{eq:7-7-7}\end{equation}

Remark: When $\boldsymbol{A} = \boldsymbol{B} = \boldsymbol{I}$, we have $|\boldsymbol{I}\boldsymbol{X}\boldsymbol{I}| = |\boldsymbol{X}|$, which reduces to $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}| = |\boldsymbol{X}| \boldsymbol{X}^{-\top}$ from 7.1.
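The product formula of 7.7 can also be checked entrywise against finite differences. In the sketch below (NumPy; random $\boldsymbol{A}$, $\boldsymbol{B}$ are almost surely invertible, and the diagonal shift on $\boldsymbol{X}$, seed, and tolerances are illustrative), note that the gradient involves $\boldsymbol{X}^{-\top}$, not $(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B})^{-\top}$.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 3
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
X = rng.standard_normal((N, N)) + 2 * np.eye(N)

# Closed form: d|AXB|/dX = |AXB| X^{-T}
analytic = np.linalg.det(A @ X @ B) * np.linalg.inv(X).T

eps = 1e-6
numeric = np.zeros_like(X)
for i in range(N):
    for j in range(N):
        E = np.zeros((N, N))
        E[i, j] = eps
        numeric[i, j] = (np.linalg.det(A @ (X + E) @ B)
                         - np.linalg.det(A @ (X - E) @ B)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```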

7.8 Determinant of Quadratic Forms

7.8.1 Case where $\boldsymbol{X}$ is square and invertible
Formula: $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = 2 |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \boldsymbol{X}^{-\top}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{N \times N}$ is an invertible matrix, $\boldsymbol{A} \in \mathbb{R}^{N \times N}$ is a constant matrix
Proof

We decompose $|\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|$ using the multiplicative property of determinants (1.14).

\begin{equation}|\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{X}^\top||\boldsymbol{A}||\boldsymbol{X}|\label{eq:7-8-1-1}\end{equation}

By the determinant of the transpose (1.15), $|\boldsymbol{X}^\top| = |\boldsymbol{X}|$.

From \eqref{eq:7-8-1-1} and this property, we obtain the following.

\begin{equation}|\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{X}||\boldsymbol{A}||\boldsymbol{X}| = |\boldsymbol{A}||\boldsymbol{X}|^2\label{eq:7-8-1-2}\end{equation}

We differentiate with respect to $\boldsymbol{X}$. Since $|\boldsymbol{A}|$ is a constant, it can be factored out of the differentiation operator.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{A}| \dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^2\label{eq:7-8-1-3}\end{equation}

From 7.3 with $n = 2$, the following holds.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}|^2 = 2 |\boldsymbol{X}|^2 \boldsymbol{X}^{-\top}\label{eq:7-8-1-4}\end{equation}

Substituting \eqref{eq:7-8-1-4} into \eqref{eq:7-8-1-3}, we obtain the following.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{A}| \cdot 2 |\boldsymbol{X}|^2 \boldsymbol{X}^{-\top}\label{eq:7-8-1-5}\end{equation}

Using $|\boldsymbol{A}||\boldsymbol{X}|^2 = |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|$ (from \eqref{eq:7-8-1-2}) and simplifying, we obtain the final result.

\begin{equation}\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = 2 |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \boldsymbol{X}^{-\top}\label{eq:7-8-1-6}\end{equation}

Remark: When $\boldsymbol{A} = \boldsymbol{I}$, we have $|\boldsymbol{X}^\top \boldsymbol{X}| = |\boldsymbol{X}|^2$, and $\dfrac{\partial}{\partial \boldsymbol{X}} |\boldsymbol{X}^\top \boldsymbol{X}| = 2 |\boldsymbol{X}|^2 \boldsymbol{X}^{-\top}$.
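The square case \eqref{eq:7-8-1-6} can likewise be checked numerically; note that in this case $\boldsymbol{A}$ need not be symmetric, since the proof only uses $|\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| = |\boldsymbol{A}||\boldsymbol{X}|^2$. The sizes and seed below are arbitrary.

```python
# Finite-difference check of d|X^T A X|/dX = 2 |X^T A X| X^{-T}  (eq. 7-8-1-6),
# for square invertible X; A is a general (not necessarily symmetric) matrix.
import numpy as np

def num_grad(f, X, h=1e-5):
    # Central-difference gradient in denominator layout: same shape as X.
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N))
X = rng.standard_normal((N, N)) + N * np.eye(N)  # shift keeps X invertible

f = lambda M: np.linalg.det(M.T @ A @ M)
analytic = 2 * f(X) * np.linalg.inv(X).T
numeric = num_grad(f, X)
print(np.max(np.abs(numeric - analytic)))  # small relative to the gradient's scale
```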

7.8.2 Case where $\boldsymbol{X}$ is non-square and $\boldsymbol{A}$ is symmetric
Formula: $\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{M \times N}$ ($M \geq N$, $\text{rank}(\boldsymbol{X}) = N$), $\boldsymbol{A} \in \mathbb{R}^{M \times M}$ is a symmetric constant matrix ($\boldsymbol{A} = \boldsymbol{A}^\top$)
Proof

Let $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$; $\boldsymbol{Y}$ is an $N \times N$ matrix. Since $\boldsymbol{X}$ has full rank, $\boldsymbol{Y}$ is invertible when, for example, $\boldsymbol{A}$ is positive definite; in what follows we assume $|\boldsymbol{Y}| \neq 0$.

Using Jacobi's formula from 7.4, differentiating $|\boldsymbol{Y}|$ with respect to component $X_{ij}$ gives the following.

\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial X_{ij}} = |\boldsymbol{Y}| \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right)\label{eq:7-8-2-1}\end{equation}

We differentiate $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$ with respect to $X_{ij}$. Since $\boldsymbol{X}$ appears in two places, $\boldsymbol{X}^\top$ and $\boldsymbol{X}$, we apply the product rule (1.25).

\begin{equation}\dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} = \dfrac{\partial (\boldsymbol{X}^\top)}{\partial X_{ij}} \boldsymbol{A} \boldsymbol{X} + \boldsymbol{X}^\top \boldsymbol{A} \dfrac{\partial \boldsymbol{X}}{\partial X_{ij}}\label{eq:7-8-2-2}\end{equation}

We compute $\dfrac{\partial \boldsymbol{X}}{\partial X_{ij}}$. Differentiating the component $X_{kl}$ of $\boldsymbol{X}$ with respect to $X_{ij}$ gives 1 only when $k = i$ and $l = j$, and 0 otherwise.

\begin{equation}\left( \dfrac{\partial \boldsymbol{X}}{\partial X_{ij}} \right)_{kl} = \delta_{ki} \delta_{lj}\label{eq:7-8-2-3}\end{equation}

We express this result as a matrix. Using the standard basis vectors $\boldsymbol{e}_i \in \mathbb{R}^M$ (with only the $i$-th component equal to 1) and $\boldsymbol{e}_j \in \mathbb{R}^N$ (with only the $j$-th component equal to 1), it can be written as follows.

\begin{equation}\dfrac{\partial \boldsymbol{X}}{\partial X_{ij}} = \boldsymbol{e}_i \boldsymbol{e}_j^\top\label{eq:7-8-2-4}\end{equation}

This is an $M \times N$ matrix with only the $(i, j)$ component equal to 1 and all others 0.

We compute $\dfrac{\partial (\boldsymbol{X}^\top)}{\partial X_{ij}}$. Taking the transpose swaps rows and columns, so we obtain the following.

\begin{equation}\dfrac{\partial (\boldsymbol{X}^\top)}{\partial X_{ij}} = \left( \dfrac{\partial \boldsymbol{X}}{\partial X_{ij}} \right)^\top = (\boldsymbol{e}_i \boldsymbol{e}_j^\top)^\top = \boldsymbol{e}_j \boldsymbol{e}_i^\top\label{eq:7-8-2-5}\end{equation}

Substituting \eqref{eq:7-8-2-4} and \eqref{eq:7-8-2-5} into \eqref{eq:7-8-2-2}, we obtain the following.

\begin{equation}\dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} = \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} + \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top\label{eq:7-8-2-6}\end{equation}

We use the symmetry of $\boldsymbol{A}$. Since $\boldsymbol{A} = \boldsymbol{A}^\top$, we have $\boldsymbol{e}_i^\top \boldsymbol{A} = (\boldsymbol{A}^\top \boldsymbol{e}_i)^\top = (\boldsymbol{A} \boldsymbol{e}_i)^\top$, where $\boldsymbol{A} \boldsymbol{e}_i$ is the $i$-th column vector of $\boldsymbol{A}$.

We analyze the first term of \eqref{eq:7-8-2-6}. $\boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X}$ is an $N \times N$ matrix.

Substituting into \eqref{eq:7-8-2-1} and computing the trace, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = \text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \right) + \text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top \right)\label{eq:7-8-2-7}\end{equation}

Applying the cyclic property of trace $\text{tr}(\boldsymbol{P}\boldsymbol{Q}\boldsymbol{R}) = \text{tr}(\boldsymbol{R}\boldsymbol{P}\boldsymbol{Q})$ to the first term, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \right) = \text{tr}\left( \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j \right)\label{eq:7-8-2-8}\end{equation}

$\boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j$ is a $1 \times 1$ scalar, so the trace equals the value itself.

\begin{equation}\text{tr}\left( \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j \right) = \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij}\label{eq:7-8-2-9}\end{equation}

Similarly, applying the cyclic property of trace to the second term, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top \right) = \text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \right)\label{eq:7-8-2-10}\end{equation}

This is also a $1 \times 1$ scalar, so the trace equals the value itself.

\begin{equation}\text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \right) = (\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})_{ji} = ((\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})^\top)_{ij} = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-\top})_{ij}\label{eq:7-8-2-11}\end{equation}

where the last equality uses $\boldsymbol{A}^\top = \boldsymbol{A}$.

We verify that $\boldsymbol{Y}$ is symmetric. Since $\boldsymbol{A} = \boldsymbol{A}^\top$, the following holds.

\begin{equation}\boldsymbol{Y}^\top = (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^\top = \boldsymbol{X}^\top \boldsymbol{A}^\top (\boldsymbol{X}^\top)^\top = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X} = \boldsymbol{Y}\label{eq:7-8-2-12}\end{equation}

Since $\boldsymbol{Y}$ is symmetric, $\boldsymbol{Y}^{-1}$ is also symmetric. Therefore $\boldsymbol{Y}^{-\top} = \boldsymbol{Y}^{-1}$.

Using the results of \eqref{eq:7-8-2-9} and \eqref{eq:7-8-2-11} and simplifying \eqref{eq:7-8-2-7}, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij} + (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij} = 2(\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij}\label{eq:7-8-2-13}\end{equation}

Substituting into \eqref{eq:7-8-2-1}, we obtain the derivative with respect to component $X_{ij}$.

\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial X_{ij}} = |\boldsymbol{Y}| \cdot 2(\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij} = 2|\boldsymbol{Y}| (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij}\label{eq:7-8-2-14}\end{equation}

In the denominator layout, the $(i, j)$ component of the result corresponds to $\dfrac{\partial}{\partial X_{ij}}$. Therefore, in matrix form, we obtain the following.

\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial \boldsymbol{X}} = 2|\boldsymbol{Y}| \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1}\label{eq:7-8-2-15}\end{equation}

Substituting $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$, we obtain the final result.

\begin{equation}\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1}\label{eq:7-8-2-16}\end{equation}

Remark: The coefficient 2 arises because, when $\boldsymbol{A}$ is symmetric, the contributions from the $\boldsymbol{X}^\top$ side and the $\boldsymbol{X}$ side are equal. Setting $\boldsymbol{A} = \boldsymbol{I}$ gives the derivative of $|\boldsymbol{X}^\top \boldsymbol{X}|$ used in the proof of 7.9.1.
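The non-square formula \eqref{eq:7-8-2-16} can also be verified by finite differences. Below, $\boldsymbol{A}$ is taken symmetric positive definite so that $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$ is guaranteed invertible (an assumption of the formula); the sizes and seed are arbitrary.

```python
# Finite-difference check of d|X^T A X|/dX = 2 |X^T A X| A X (X^T A X)^{-1}
# (eq. 7-8-2-16) for non-square X and symmetric A.
import numpy as np

def num_grad(f, X, h=1e-6):
    # Central-difference gradient in denominator layout: same shape as X.
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(2)
M, N = 5, 3
S = rng.standard_normal((M, M))
A = S @ S.T + np.eye(M)          # symmetric positive definite, so Y is invertible
X = rng.standard_normal((M, N))  # full column rank with probability 1

f = lambda W: np.linalg.det(W.T @ A @ W)
Y = X.T @ A @ X
analytic = 2 * np.linalg.det(Y) * (A @ X @ np.linalg.inv(Y))
numeric = num_grad(f, X)
print(np.max(np.abs(numeric - analytic)))  # small relative to the gradient's scale
```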

7.8.3 Case where $\boldsymbol{X}$ is non-square and $\boldsymbol{A}$ is general
Formula: $\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|}{\partial \boldsymbol{X}} = |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \left( \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1} + \boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1} \right)$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{M \times N}$ ($M \geq N$, $\text{rank}(\boldsymbol{X}) = N$), $\boldsymbol{A} \in \mathbb{R}^{M \times M}$ is a general constant matrix (including the case $\boldsymbol{A} \neq \boldsymbol{A}^\top$)
Proof

Let $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$; $\boldsymbol{Y}$ is an $N \times N$ matrix, which we assume to be invertible along with $\boldsymbol{Y}^\top = \boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X}$.

Using Jacobi's formula from 7.4, we obtain the following.

\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial X_{ij}} = |\boldsymbol{Y}| \text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right)\label{eq:7-8-3-1}\end{equation}

As in 7.8.2, differentiating $\boldsymbol{Y}$ with respect to $X_{ij}$ gives the following.

\begin{equation}\dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} = \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} + \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top\label{eq:7-8-3-2}\end{equation}

Splitting the trace into two terms and computing, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = \text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \right) + \text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top \right)\label{eq:7-8-3-3}\end{equation}

We compute the first term. Using the cyclic property of trace, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{e}_j \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \right) = \text{tr}\left( \boldsymbol{e}_i^\top \boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1} \boldsymbol{e}_j \right) = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij}\label{eq:7-8-3-4}\end{equation}

We compute the second term. Similarly using the cyclic property of trace, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \boldsymbol{e}_j^\top \right) = \text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{e}_i \right) = (\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})_{ji}\label{eq:7-8-3-5}\end{equation}

We transform the result of \eqref{eq:7-8-3-5}. $(\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})_{ji}$ equals the $(i, j)$ component of the transpose of the matrix $\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A}$.

\begin{equation}(\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})_{ji} = ((\boldsymbol{Y}^{-1} \boldsymbol{X}^\top \boldsymbol{A})^\top)_{ij} = (\boldsymbol{A}^\top \boldsymbol{X} \boldsymbol{Y}^{-\top})_{ij}\label{eq:7-8-3-6}\end{equation}

When $\boldsymbol{A}$ is a general matrix, $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$ is not necessarily symmetric. Therefore, in general, $\boldsymbol{Y}^{-\top} \neq \boldsymbol{Y}^{-1}$.

We compute $\boldsymbol{Y}^\top = (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^\top = \boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X}$. This has the form where $\boldsymbol{A}$ is replaced by $\boldsymbol{A}^\top$.

Therefore the following holds.

\begin{equation}\boldsymbol{Y}^{-\top} = (\boldsymbol{Y}^\top)^{-1} = (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1}\label{eq:7-8-3-7}\end{equation}

Combining \eqref{eq:7-8-3-4} and \eqref{eq:7-8-3-6}, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = (\boldsymbol{A} \boldsymbol{X} \boldsymbol{Y}^{-1})_{ij} + (\boldsymbol{A}^\top \boldsymbol{X} \boldsymbol{Y}^{-\top})_{ij}\label{eq:7-8-3-8}\end{equation}

Substituting \eqref{eq:7-8-3-7}, we obtain the following.

\begin{equation}\text{tr}\left( \boldsymbol{Y}^{-1} \dfrac{\partial \boldsymbol{Y}}{\partial X_{ij}} \right) = (\boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1})_{ij} + (\boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1})_{ij}\label{eq:7-8-3-9}\end{equation}

Substituting into \eqref{eq:7-8-3-1}, we obtain the derivative with respect to component $X_{ij}$.

\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial X_{ij}} = |\boldsymbol{Y}| \left[ (\boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1})_{ij} + (\boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1})_{ij} \right]\label{eq:7-8-3-10}\end{equation}

Writing in matrix form using the denominator layout, we obtain the following.

\begin{equation}\dfrac{\partial |\boldsymbol{Y}|}{\partial \boldsymbol{X}} = |\boldsymbol{Y}| \left( \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1} + \boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1} \right)\label{eq:7-8-3-11}\end{equation}

Substituting $\boldsymbol{Y} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$, we obtain the final result.

\begin{equation}\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}|}{\partial \boldsymbol{X}} = |\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}| \left( \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1} + \boldsymbol{A}^\top \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X})^{-1} \right)\label{eq:7-8-3-12}\end{equation}

Remark: When $\boldsymbol{A} = \boldsymbol{A}^\top$, we have $\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X} = \boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$, so the two terms become equal, yielding $2 \boldsymbol{A} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X})^{-1}$, which agrees with 7.8.2.
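A finite-difference check of \eqref{eq:7-8-3-12} with a non-symmetric $\boldsymbol{A}$ is sketched below. Invertibility of $\boldsymbol{X}^\top \boldsymbol{A} \boldsymbol{X}$ and $\boldsymbol{X}^\top \boldsymbol{A}^\top \boldsymbol{X}$ holds generically for random matrices and is assumed here rather than enforced; sizes and seed are arbitrary.

```python
# Finite-difference check of eq. 7-8-3-12 for non-square X and general A.
import numpy as np

def num_grad(f, X, h=1e-6):
    # Central-difference gradient in denominator layout: same shape as X.
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
M, N = 5, 3
A = rng.standard_normal((M, M))  # general, not symmetric
X = rng.standard_normal((M, N))

f = lambda W: np.linalg.det(W.T @ A @ W)
Y = X.T @ A @ X
Yt = X.T @ A.T @ X
analytic = np.linalg.det(Y) * (A @ X @ np.linalg.inv(Y) + A.T @ X @ np.linalg.inv(Yt))
numeric = num_grad(f, X)
print(np.max(np.abs(numeric - analytic)))  # small relative to the gradient's scale
```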

7.9 Log-Determinant of the Gram Matrix

7.9.1 Derivative with respect to $\boldsymbol{X}$
Formula: $\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1} = 2 (\boldsymbol{X}^{+})^\top$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{M \times N}$ ($M \geq N$, $\text{rank}(\boldsymbol{X}) = N$), $\boldsymbol{X}^{+} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top$ is the Moore-Penrose pseudoinverse (left inverse)
Proof

In 7.8.2, set $\boldsymbol{A} = \boldsymbol{I}$ (the identity matrix). Since $\boldsymbol{I}$ is symmetric, the formula from 7.8.2 applies.

From 7.8.2, the following holds.

\begin{equation}\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{I} \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 |\boldsymbol{X}^\top \boldsymbol{I} \boldsymbol{X}| \boldsymbol{I} \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{I} \boldsymbol{X})^{-1}\label{eq:7-9-1-1}\end{equation}

Substituting $\boldsymbol{I} \boldsymbol{X} = \boldsymbol{X}$ and $\boldsymbol{X}^\top \boldsymbol{I} \boldsymbol{X} = \boldsymbol{X}^\top \boldsymbol{X}$, we obtain the following.

\begin{equation}\dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 |\boldsymbol{X}^\top \boldsymbol{X}| \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-1-2}\end{equation}

To differentiate $\log |\boldsymbol{X}^\top \boldsymbol{X}|$ with respect to $\boldsymbol{X}$, we apply the chain rule. The outer function is $\log$ and the inner function is $|\boldsymbol{X}^\top \boldsymbol{X}|$.

Writing out the chain rule with $f(u) = \log(u)$ and $g(\boldsymbol{X}) = |\boldsymbol{X}^\top \boldsymbol{X}|$, the following holds.

\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = \dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|} \dfrac{\partial |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}}\label{eq:7-9-1-3}\end{equation}

Substituting the result of \eqref{eq:7-9-1-2} into \eqref{eq:7-9-1-3}, we obtain the following.

\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = \dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|} \cdot 2 |\boldsymbol{X}^\top \boldsymbol{X}| \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-1-4}\end{equation}

$\dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|}$ and $|\boldsymbol{X}^\top \boldsymbol{X}|$ cancel.

\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-1-5}\end{equation}

We derive the relationship with the Moore-Penrose pseudoinverse. When $\boldsymbol{X}$ has full rank ($\text{rank}(\boldsymbol{X}) = N$), the left inverse is defined as follows.

\begin{equation}\boldsymbol{X}^{+} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top\label{eq:7-9-1-6}\end{equation}

We compute the transpose of $\boldsymbol{X}^{+}$. Since the transpose of a product is $(\boldsymbol{P}\boldsymbol{Q})^\top = \boldsymbol{Q}^\top \boldsymbol{P}^\top$, we obtain the following.

\begin{equation}(\boldsymbol{X}^{+})^\top = ((\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top)^\top = (\boldsymbol{X}^\top)^\top ((\boldsymbol{X}^\top \boldsymbol{X})^{-1})^\top\label{eq:7-9-1-7}\end{equation}

$(\boldsymbol{X}^\top)^\top = \boldsymbol{X}$. Also, $\boldsymbol{X}^\top \boldsymbol{X}$ is a symmetric matrix, so its inverse is also symmetric. Therefore $((\boldsymbol{X}^\top \boldsymbol{X})^{-1})^\top = (\boldsymbol{X}^\top \boldsymbol{X})^{-1}$.

Combining \eqref{eq:7-9-1-7} with these properties, we obtain the following.

\begin{equation}(\boldsymbol{X}^{+})^\top = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-1-8}\end{equation}

Comparing \eqref{eq:7-9-1-5} and \eqref{eq:7-9-1-8}, we obtain the final result.

\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}} = 2 \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1} = 2 (\boldsymbol{X}^{+})^\top\label{eq:7-9-1-9}\end{equation}

Remark: $\boldsymbol{X}^\top \boldsymbol{X}$ is called the Gram matrix, whose components are the inner products between column vectors of $\boldsymbol{X}$. The log-determinant $\log |\boldsymbol{X}^\top \boldsymbol{X}|$ appears in regularization terms and model selection criteria.
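As a sanity check on \eqref{eq:7-9-1-9}, the sketch below compares a finite-difference gradient of $\log |\boldsymbol{X}^\top \boldsymbol{X}|$ with both the explicit form $2 \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}$ and the pseudoinverse form via `numpy.linalg.pinv`; sizes and seed are arbitrary.

```python
# Finite-difference check of d log|X^T X|/dX = 2 X (X^T X)^{-1} = 2 (X^+)^T.
import numpy as np

def num_grad(f, X, h=1e-6):
    # Central-difference gradient in denominator layout: same shape as X.
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(4)
M, N = 6, 3
X = rng.standard_normal((M, N))  # full column rank with probability 1

f = lambda W: np.log(np.linalg.det(W.T @ W))
analytic = 2 * X @ np.linalg.inv(X.T @ X)
numeric = num_grad(f, X)
print(np.max(np.abs(numeric - analytic)))                     # finite differences agree
print(np.max(np.abs(analytic - 2 * np.linalg.pinv(X).T)))     # pseudoinverse form agrees
```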

7.9.2 Derivative with respect to the Pseudoinverse $\boldsymbol{X}^{+}$
Formula: $\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}^{+}} = -2 \boldsymbol{X}^\top$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{M \times N}$ ($M \geq N$, $\text{rank}(\boldsymbol{X}) = N$), $\boldsymbol{X}^{+} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \in \mathbb{R}^{N \times M}$ is the pseudoinverse
Proof

We derive the relationship between $|\boldsymbol{X}^\top \boldsymbol{X}|$ and $|\boldsymbol{X}^{+}|$. We compute the determinant of $\boldsymbol{X}^{+} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top$.

$\boldsymbol{X}^{+}$ is an $N \times M$ matrix, so when $N \neq M$ it is not a square matrix and the usual determinant is not defined. Therefore we consider $(\boldsymbol{X}^{+})^\top \boldsymbol{X}^{+}$.

From \eqref{eq:7-9-1-8} of 7.9.1, $(\boldsymbol{X}^{+})^\top = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}$.

Computing $(\boldsymbol{X}^{+})^\top \boldsymbol{X}^{+}$, we obtain the following.

\begin{equation}(\boldsymbol{X}^{+})^\top \boldsymbol{X}^{+} = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \cdot (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top\label{eq:7-9-2-1}\end{equation}

Since $(\boldsymbol{X}^\top \boldsymbol{X})^{-1} \cdot (\boldsymbol{X}^\top \boldsymbol{X})^{-1} = ((\boldsymbol{X}^\top \boldsymbol{X})^{-1})^2 = (\boldsymbol{X}^\top \boldsymbol{X})^{-2}$, we obtain the following.

\begin{equation}(\boldsymbol{X}^{+})^\top \boldsymbol{X}^{+} = \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-2} \boldsymbol{X}^\top\label{eq:7-9-2-2}\end{equation}

On the other hand, computing $\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$, we obtain the following.

\begin{equation}\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \cdot \boldsymbol{X} (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-2-3}\end{equation}

Simplifying the adjacent $\boldsymbol{X}^\top \boldsymbol{X}$ and $(\boldsymbol{X}^\top \boldsymbol{X})^{-1}$, we obtain the following.

\begin{equation}\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \cdot \boldsymbol{X}^\top \boldsymbol{X} \cdot (\boldsymbol{X}^\top \boldsymbol{X})^{-1} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\label{eq:7-9-2-4}\end{equation}

We take the determinant of both sides of \eqref{eq:7-9-2-4}; $\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$ is an $N \times N$ square matrix, so its determinant is defined.

\begin{equation}|\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top| = |(\boldsymbol{X}^\top \boldsymbol{X})^{-1}|\label{eq:7-9-2-5}\end{equation}

The determinant of an inverse matrix is the reciprocal of the original determinant. That is, $|(\boldsymbol{X}^\top \boldsymbol{X})^{-1}| = \dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|}$.

Taking the logarithm of both sides, we obtain the following.

\begin{equation}\log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top| = \log \dfrac{1}{|\boldsymbol{X}^\top \boldsymbol{X}|} = -\log |\boldsymbol{X}^\top \boldsymbol{X}|\label{eq:7-9-2-6}\end{equation}

Therefore the following relation holds.

\begin{equation}\log |\boldsymbol{X}^\top \boldsymbol{X}| = -\log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|\label{eq:7-9-2-7}\end{equation}

We consider differentiation with respect to $\boldsymbol{X}^{+}$. Viewing $\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$ as a function of $\boldsymbol{X}^{+}$, it has the same form $\boldsymbol{Y}^\top \boldsymbol{Y}$ (with $\boldsymbol{Y} = (\boldsymbol{X}^{+})^\top$) as in 7.9.1.

From 7.9.1, $\dfrac{\partial \log |\boldsymbol{Y}^\top \boldsymbol{Y}|}{\partial \boldsymbol{Y}} = 2 \boldsymbol{Y} (\boldsymbol{Y}^\top \boldsymbol{Y})^{-1}$.

Setting $\boldsymbol{Y} = (\boldsymbol{X}^{+})^\top$, we have $\boldsymbol{Y}^\top = \boldsymbol{X}^{+}$ so $\boldsymbol{Y}^\top \boldsymbol{Y} = \boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$.

From \eqref{eq:7-9-2-7}, applying the chain rule, we obtain the following.

\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}^{+}} = -\dfrac{\partial \log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|}{\partial \boldsymbol{X}^{+}}\label{eq:7-9-2-8}\end{equation}

We differentiate $\log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|$ with respect to $\boldsymbol{X}^{+}$. From 7.9.1 with $\boldsymbol{Y} = (\boldsymbol{X}^{+})^\top$, the derivative with respect to $\boldsymbol{Y}$ is $2 \boldsymbol{Y} (\boldsymbol{Y}^\top \boldsymbol{Y})^{-1}$. In the denominator layout, the derivative of a scalar with respect to $\boldsymbol{Y}^\top$ is the transpose of the derivative with respect to $\boldsymbol{Y}$, so differentiating with respect to $\boldsymbol{X}^{+} = \boldsymbol{Y}^\top$ and using the symmetry of $\boldsymbol{Y}^\top \boldsymbol{Y} = \boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top$, we obtain the following.

\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|}{\partial \boldsymbol{X}^{+}} = \left( 2 \boldsymbol{Y} (\boldsymbol{Y}^\top \boldsymbol{Y})^{-1} \right)^\top = 2 (\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top)^{-1} \boldsymbol{X}^{+}\label{eq:7-9-2-9}\end{equation}

This is an $N \times M$ matrix, matching the shape of $\boldsymbol{X}^{+}$ as the denominator layout requires. From \eqref{eq:7-9-2-4}, $(\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top)^{-1} = \boldsymbol{X}^\top \boldsymbol{X}$, and substituting $\boldsymbol{X}^{+} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top$, we obtain the following.

\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|}{\partial \boldsymbol{X}^{+}} = 2 (\boldsymbol{X}^\top \boldsymbol{X}) (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top = 2 \boldsymbol{X}^\top\label{eq:7-9-2-10}\end{equation}

Substituting into \eqref{eq:7-9-2-8}, we obtain the final result.

\begin{equation}\dfrac{\partial \log |\boldsymbol{X}^\top \boldsymbol{X}|}{\partial \boldsymbol{X}^{+}} = -2 \boldsymbol{X}^\top\label{eq:7-9-2-11}\end{equation}

Remark: This formula is useful in optimization problems where the pseudoinverse is the independent variable. The negative sign arises because $|\boldsymbol{X}^\top \boldsymbol{X}|$ and $|\boldsymbol{X}^{+} (\boldsymbol{X}^{+})^\top|$ are reciprocals of each other.
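The result of 7.9.2 can also be checked numerically with $\boldsymbol{X}^{+}$ as the free variable: starting from a full-rank $\boldsymbol{X}$, we form $\boldsymbol{W} = \boldsymbol{X}^{+}$, express $\log |\boldsymbol{X}^\top \boldsymbol{X}| = -\log |\boldsymbol{W} \boldsymbol{W}^\top|$ as in \eqref{eq:7-9-2-7}, and compare the finite-difference gradient with $-2 \boldsymbol{X}^\top$. A sketch under these assumptions:

```python
# Finite-difference check of d log|X^T X| / dX^+ = -2 X^T (result of 7.9.2),
# using log|X^T X| = -log|X^+ (X^+)^T| from eq. 7-9-2-7.
import numpy as np

def num_grad(f, X, h=1e-6):
    # Central-difference gradient in denominator layout: same shape as X.
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(5)
M, N = 6, 3
X = rng.standard_normal((M, N))  # full column rank with probability 1
W = np.linalg.pinv(X)            # W = X^+, an N x M matrix

f = lambda V: -np.log(np.linalg.det(V @ V.T))  # equals log|X^T X| at V = W
analytic = -2 * X.T
numeric = num_grad(f, W)
print(np.max(np.abs(numeric - analytic)))  # small relative to the gradient's scale
```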

7.10 Derivative of the Log Absolute Determinant $\log |\det(\boldsymbol{X})|$

Formula: $\dfrac{\partial \log |\det(\boldsymbol{X})|}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}$
Conditions: $\boldsymbol{X} \in \mathbb{R}^{N \times N}$ is an invertible matrix ($\det(\boldsymbol{X}) \neq 0$), $|\det(\boldsymbol{X})|$ is the absolute value of the determinant
Proof

We consider cases based on the sign of the determinant. For an invertible matrix $\boldsymbol{X}$, either $\det(\boldsymbol{X}) > 0$ or $\det(\boldsymbol{X}) < 0$ ($\det(\boldsymbol{X}) = 0$ contradicts invertibility).

Case 1: $\det(\boldsymbol{X}) > 0$

Since $|\det(\boldsymbol{X})| = \det(\boldsymbol{X})$, the following holds.

\begin{equation}\log |\det(\boldsymbol{X})| = \log \det(\boldsymbol{X})\label{eq:7-10-1}\end{equation}

From 7.2, the derivative of $\log \det(\boldsymbol{X})$ is as follows.

\begin{equation}\dfrac{\partial \log \det(\boldsymbol{X})}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-2}\end{equation}

Therefore, when $\det(\boldsymbol{X}) > 0$, the following holds.

\begin{equation}\dfrac{\partial \log |\det(\boldsymbol{X})|}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-3}\end{equation}

Case 2: $\det(\boldsymbol{X}) < 0$

Since $|\det(\boldsymbol{X})| = -\det(\boldsymbol{X})$, the following holds.

\begin{equation}\log |\det(\boldsymbol{X})| = \log(-\det(\boldsymbol{X}))\label{eq:7-10-4}\end{equation}

We differentiate $\log(-\det(\boldsymbol{X}))$ with respect to $\boldsymbol{X}$. This is a composite function, so we apply the chain rule. The outer function is $\log$ and the inner function is $-\det(\boldsymbol{X})$.

The derivative of the outer function $f(u) = \log(u)$ is $f'(u) = \dfrac{1}{u}$. Substituting $u = -\det(\boldsymbol{X})$, we obtain the following.

\begin{equation}f'(-\det(\boldsymbol{X})) = \dfrac{1}{-\det(\boldsymbol{X})}\label{eq:7-10-5}\end{equation}

We compute the derivative of the inner function $g(\boldsymbol{X}) = -\det(\boldsymbol{X})$. From 7.1, $\dfrac{\partial \det(\boldsymbol{X})}{\partial \boldsymbol{X}} = \det(\boldsymbol{X}) \boldsymbol{X}^{-\top}$, so the following holds.

\begin{equation}\dfrac{\partial (-\det(\boldsymbol{X}))}{\partial \boldsymbol{X}} = -\det(\boldsymbol{X}) \boldsymbol{X}^{-\top}\label{eq:7-10-6}\end{equation}

Applying the chain rule and multiplying \eqref{eq:7-10-5} and \eqref{eq:7-10-6}, we obtain the following.

\begin{equation}\dfrac{\partial \log(-\det(\boldsymbol{X}))}{\partial \boldsymbol{X}} = \dfrac{1}{-\det(\boldsymbol{X})} \cdot (-\det(\boldsymbol{X}) \boldsymbol{X}^{-\top})\label{eq:7-10-7}\end{equation}

$\dfrac{1}{-\det(\boldsymbol{X})}$ and $(-\det(\boldsymbol{X}))$ cancel, yielding the following.

\begin{equation}\dfrac{\partial \log(-\det(\boldsymbol{X}))}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-8}\end{equation}

Therefore, when $\det(\boldsymbol{X}) < 0$ as well, the following holds.

\begin{equation}\dfrac{\partial \log |\det(\boldsymbol{X})|}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-9}\end{equation}

Combining \eqref{eq:7-10-3} and \eqref{eq:7-10-9}, regardless of the sign of the determinant, we always obtain the same result as long as $\det(\boldsymbol{X}) \neq 0$.

\begin{equation}\dfrac{\partial \log |\det(\boldsymbol{X})|}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}\label{eq:7-10-10}\end{equation}

Remark: The formula $\dfrac{\partial \log |\boldsymbol{X}|}{\partial \boldsymbol{X}} = \boldsymbol{X}^{-\top}$ from 7.2 (where $|\boldsymbol{X}| = \det(\boldsymbol{X})$) implicitly assumes $\det(\boldsymbol{X}) > 0$. The present formula is more general, holding whenever $\det(\boldsymbol{X}) \neq 0$. The result is independent of the sign because the factor $-1$ from the inner derivative cancels against the factor $-1$ in $\dfrac{1}{-\det(\boldsymbol{X})}$. In probabilistic models such as normalizing flows, $\log |\det(\boldsymbol{J})|$ (where $\boldsymbol{J}$ is the Jacobian of the transformation) appears in the log-likelihood, making this formula important.
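The sign-independence above can be confirmed numerically. The sketch below forces $\det(\boldsymbol{X}) < 0$ to exercise Case 2 and uses `numpy.linalg.slogdet`, which returns the sign and $\log |\det(\boldsymbol{X})|$ separately; size and seed are arbitrary.

```python
# Finite-difference check of d log|det X| / dX = X^{-T} when det(X) < 0.
import numpy as np

def num_grad(f, X, h=1e-6):
    # Central-difference gradient in denominator layout: same shape as X.
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(6)
N = 4
X = rng.standard_normal((N, N))
if np.linalg.det(X) > 0:
    X[0] *= -1.0                 # flip one row so that det(X) < 0 (Case 2)

f = lambda M: np.linalg.slogdet(M)[1]  # log|det(M)|, sign discarded
analytic = np.linalg.inv(X).T
numeric = num_grad(f, X)
print(np.linalg.det(X) < 0, np.max(np.abs(numeric - analytic)))
```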

References

  • Petersen, K. B., & Pedersen, M. S. (2012). The Matrix Cookbook. Technical University of Denmark.
  • Magnus, J. R., & Neudecker, H. (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics (Revised ed.). Wiley.
  • Murray, I. (2016). Differentiation of the Cholesky decomposition. arXiv:1602.07527.
  • Absil, P.-A., Mahony, R., & Sepulchre, R. (2008). Optimization Algorithms on Matrix Manifolds. Princeton University Press.
  • Matrix calculus - Wikipedia