Proofs Chapter 8: Matrix Inverse Derivatives
This chapter proves the derivatives of matrix inverses. Inverse matrix derivatives are required in a wide range of applied mathematics and statistics, including the derivation of the Kalman filter gain matrix, variance computation in generalized least squares (GLS), and updating the posterior precision matrix in Bayesian inference. Starting from the basic approach of differentiating both sides of $\boldsymbol{A}\boldsymbol{A}^{-1} = \boldsymbol{I}$, we derive the scalar and component-wise derivative formulas for the inverse matrix.
Prerequisites: Chapter 4 (Basic Matrix Derivative Formulas), Chapter 7 (Determinant Derivatives). Chapters using results from this chapter: Chapter 9 (Eigenvalue Derivatives), Chapter 15 (Derivatives of Special Matrices).
8. Matrix Inverse Derivatives
Unless otherwise stated, the formulas in this chapter hold under the following conditions:
- All formulas are based on the denominator layout
- Matrix $\boldsymbol{Y}$ is invertible ($\det \boldsymbol{Y} \neq 0$) and derivatives are defined on an open set where invertibility is preserved
- Formulas involving the pseudoinverse require full rank conditions
- Indices are 0-based ($i, j, k = 0, 1, \dots, N-1$), for direct correspondence with C/C++ implementations
8.1.0 Scalar Derivative of the Inverse (Fundamental)
Proof
We begin by recalling the definition of the inverse. An invertible matrix $\boldsymbol{Y}$ and its inverse $\boldsymbol{Y}^{-1}$ satisfy the following relation.
\begin{equation} \boldsymbol{Y} \boldsymbol{Y}^{-1} = \boldsymbol{I} \label{eq:8-1-1} \end{equation}
where $\boldsymbol{I}$ is the $N \times N$ identity matrix.
We differentiate both sides of \eqref{eq:8-1-1} with respect to the scalar $x$. Since $\boldsymbol{I}$ does not depend on $x$, its derivative is the zero matrix $\boldsymbol{O}$.
\begin{equation} \frac{\partial}{\partial x} (\boldsymbol{Y} \boldsymbol{Y}^{-1}) = \frac{\partial \boldsymbol{I}}{\partial x} = \boldsymbol{O} \label{eq:8-1-2} \end{equation}
Applying the product rule (Leibniz rule, 1.25) to the left-hand side gives the following.
\begin{equation} \frac{\partial}{\partial x} (\boldsymbol{Y} \boldsymbol{Y}^{-1}) = \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} + \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} \label{eq:8-1-3} \end{equation}
Combining \eqref{eq:8-1-2} and \eqref{eq:8-1-3}, we obtain the following.
\begin{equation} \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} + \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \boldsymbol{O} \label{eq:8-1-4} \end{equation}
Moving the term $\displaystyle\frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1}$ to the right-hand side gives the following.
\begin{equation} \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = - \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \label{eq:8-1-5} \end{equation}
We left-multiply both sides of \eqref{eq:8-1-5} by $\boldsymbol{Y}^{-1}$, which exists since $\boldsymbol{Y}$ is invertible.
\begin{equation} \boldsymbol{Y}^{-1} \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \boldsymbol{Y}^{-1} \left( - \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \right) \label{eq:8-1-6} \end{equation}
Simplifying the left-hand side. Since $\boldsymbol{Y}^{-1} \boldsymbol{Y} = \boldsymbol{I}$ and multiplying by $\boldsymbol{I}$ leaves a matrix unchanged, we get the following.
\begin{equation} \boldsymbol{Y}^{-1} \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \boldsymbol{I} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} \label{eq:8-1-7} \end{equation}
Simplifying the right-hand side by factoring out the minus sign gives the following.
\begin{equation} \boldsymbol{Y}^{-1} \left( - \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \right) = - \boldsymbol{Y}^{-1} \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \label{eq:8-1-8} \end{equation}
Combining \eqref{eq:8-1-7} and \eqref{eq:8-1-8}, we obtain the final result.
\begin{equation} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = - \boldsymbol{Y}^{-1} \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \label{eq:8-1-9} \end{equation}
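The result \eqref{eq:8-1-9} can be checked numerically. The sketch below (not part of the proof) uses NumPy with an illustrative parametrization $\boldsymbol{Y}(x) = \boldsymbol{Y}_0 + x\boldsymbol{D}$, so that $\partial\boldsymbol{Y}/\partial x = \boldsymbol{D}$, and compares the closed form against a central finite difference.

```python
import numpy as np

# Sanity check of eq. (8-1-9) for the illustrative parametrization
# Y(x) = Y0 + x * D, whose derivative with respect to x is D.
rng = np.random.default_rng(0)
N = 4
Y0 = rng.standard_normal((N, N)) + N * np.eye(N)  # diagonal shift keeps Y well-conditioned
D = rng.standard_normal((N, N))

def Y(x):
    return Y0 + x * D

x, h = 0.3, 1e-6
# Central finite difference of the inverse with respect to x.
numeric = (np.linalg.inv(Y(x + h)) - np.linalg.inv(Y(x - h))) / (2 * h)
# Closed form from eq. (8-1-9): -Y^{-1} (dY/dx) Y^{-1}.
Yinv = np.linalg.inv(Y(x))
analytic = -Yinv @ D @ Yinv

err = np.max(np.abs(numeric - analytic))  # expected to be tiny
```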
8.1.1 Component-wise Derivative of the Inverse
Proof
We apply the formula from 8.1.0, replacing scalar $x$ with matrix entry $X_{ij}$. Setting $\boldsymbol{Y} = \boldsymbol{X}$ gives the following.
\begin{equation} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} = - \boldsymbol{X}^{-1} \frac{\partial \boldsymbol{X}}{\partial X_{ij}} \boldsymbol{X}^{-1} \label{eq:8-2-1} \end{equation}
We compute $\displaystyle\frac{\partial \boldsymbol{X}}{\partial X_{ij}}$. Differentiating entry $(p, q)$ of $\boldsymbol{X}$ with respect to $X_{ij}$ gives 1 only when $(p, q) = (i, j)$ and 0 otherwise.
\begin{equation} \left( \frac{\partial \boldsymbol{X}}{\partial X_{ij}} \right)_{pq} = \frac{\partial X_{pq}}{\partial X_{ij}} = \delta_{pi} \delta_{qj} \label{eq:8-2-2} \end{equation}
Expressing \eqref{eq:8-2-2} in matrix form. Using standard basis vectors $\boldsymbol{e}_i \in \mathbb{R}^N$ (with 1 in position $i$), the matrix with 1 only at position $(i, j)$ is the outer product $\boldsymbol{e}_i \boldsymbol{e}_j^\top$.
\begin{equation} \frac{\partial \boldsymbol{X}}{\partial X_{ij}} = \boldsymbol{e}_i \boldsymbol{e}_j^\top \label{eq:8-2-3} \end{equation}
Substituting \eqref{eq:8-2-3} into \eqref{eq:8-2-1} gives the following.
\begin{equation} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} = - \boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \label{eq:8-2-4} \end{equation}
Computing $\boldsymbol{X}^{-1} \boldsymbol{e}_i$. Since $\boldsymbol{e}_i$ has 1 only in position $i$, $\boldsymbol{X}^{-1} \boldsymbol{e}_i$ is the $i$-th column of $\boldsymbol{X}^{-1}$.
\begin{equation} (\boldsymbol{X}^{-1} \boldsymbol{e}_i)_k = \sum_{m=0}^{N-1} (\boldsymbol{X}^{-1})_{km} (\boldsymbol{e}_i)_m = (\boldsymbol{X}^{-1})_{ki} \label{eq:8-2-5} \end{equation}
Computing $\boldsymbol{e}_j^\top \boldsymbol{X}^{-1}$. Since $\boldsymbol{e}_j^\top$ has 1 only in position $j$, $\boldsymbol{e}_j^\top \boldsymbol{X}^{-1}$ is the $j$-th row of $\boldsymbol{X}^{-1}$.
\begin{equation} (\boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_l = \sum_{m=0}^{N-1} (\boldsymbol{e}_j)_m (\boldsymbol{X}^{-1})_{ml} = (\boldsymbol{X}^{-1})_{jl} \label{eq:8-2-6} \end{equation}
Computing the $(k, l)$ entry of the matrix product $\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1}$ in \eqref{eq:8-2-4}. By associativity, this product is the outer product of the column vector $\boldsymbol{X}^{-1} \boldsymbol{e}_i$ and the row vector $\boldsymbol{e}_j^\top \boldsymbol{X}^{-1}$, so its $(k, l)$ entry factors as follows.
\begin{equation} (\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_{kl} = (\boldsymbol{X}^{-1} \boldsymbol{e}_i)_k (\boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_l \label{eq:8-2-7} \end{equation}
Substituting \eqref{eq:8-2-5} and \eqref{eq:8-2-6} into \eqref{eq:8-2-7} gives the following.
\begin{equation} (\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_{kl} = (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} \label{eq:8-2-8} \end{equation}
Taking the $(k, l)$ entry of \eqref{eq:8-2-4} and substituting \eqref{eq:8-2-8}, we obtain the final result.
\begin{equation} \frac{\partial (\boldsymbol{X}^{-1})_{kl}}{\partial X_{ij}} = - (\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_{kl} = - (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} \label{eq:8-2-9} \end{equation}
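As a numerical sanity check (a sketch, with an illustrative random well-conditioned $\boldsymbol{X}$), eq. \eqref{eq:8-2-9} says that perturbing a single entry $X_{ij}$ changes $\boldsymbol{X}^{-1}$ by the negative outer product of the $i$-th column and the $j$-th row of $\boldsymbol{X}^{-1}$:

```python
import numpy as np

# Check eq. (8-2-9): d(X^-1)_{kl} / dX_{ij} = -(X^-1)_{ki} (X^-1)_{jl}.
rng = np.random.default_rng(1)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
Xinv = np.linalg.inv(X)

i, j, h = 1, 2, 1e-6
Xp, Xm = X.copy(), X.copy()
Xp[i, j] += h
Xm[i, j] -= h
# Central finite difference of the full inverse with respect to X_ij.
numeric = (np.linalg.inv(Xp) - np.linalg.inv(Xm)) / (2 * h)

# Negative outer product of the i-th column and the j-th row of X^-1.
analytic = -np.outer(Xinv[:, i], Xinv[j, :])
err = np.max(np.abs(numeric - analytic))
```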
8.1.2 Quadratic Form with Inverse $\boldsymbol{a}^\top \boldsymbol{X}^{-1} \boldsymbol{b}$
Proof
Write the scalar $f = \boldsymbol{a}^\top \boldsymbol{X}^{-1} \boldsymbol{b}$ in component form. Since $\boldsymbol{a}^\top$ is a row vector, $\boldsymbol{X}^{-1}$ is a matrix, and $\boldsymbol{b}$ is a column vector, the result is a scalar.
\begin{equation} f = \boldsymbol{a}^\top \boldsymbol{X}^{-1} \boldsymbol{b} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k (\boldsymbol{X}^{-1})_{kl} b_l \label{eq:8-3-1} \end{equation}
Differentiating $f$ with respect to $X_{ij}$. Since $a_k$ and $b_l$ are constants, the differentiation acts only on $(\boldsymbol{X}^{-1})_{kl}$.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k \frac{\partial (\boldsymbol{X}^{-1})_{kl}}{\partial X_{ij}} b_l \label{eq:8-3-2} \end{equation}
From 8.1.1, the component-wise derivative of the inverse is as follows.
\begin{equation} \frac{\partial (\boldsymbol{X}^{-1})_{kl}}{\partial X_{ij}} = - (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} \label{eq:8-3-3} \end{equation}
Substituting \eqref{eq:8-3-3} into \eqref{eq:8-3-2} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k \left( - (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} \right) b_l \label{eq:8-3-4} \end{equation}
Factoring out the minus sign gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} b_l \label{eq:8-3-5} \end{equation}
Separating the sums. The sums over $k$ and $l$ are independent.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - \left( \sum_{k=0}^{N-1} a_k (\boldsymbol{X}^{-1})_{ki} \right) \left( \sum_{l=0}^{N-1} (\boldsymbol{X}^{-1})_{jl} b_l \right) \label{eq:8-3-6} \end{equation}
Computing the first parenthesized expression. $\sum_k a_k (\boldsymbol{X}^{-1})_{ki}$ is the $i$-th component of $\boldsymbol{a}^\top \boldsymbol{X}^{-1}$.
\begin{equation} \sum_{k=0}^{N-1} a_k (\boldsymbol{X}^{-1})_{ki} = (\boldsymbol{a}^\top \boldsymbol{X}^{-1})_i \label{eq:8-3-7} \end{equation}
Rewriting $\boldsymbol{a}^\top \boldsymbol{X}^{-1} = (\boldsymbol{X}^{-\top} \boldsymbol{a})^\top$, \eqref{eq:8-3-7} becomes the following.
\begin{equation} (\boldsymbol{a}^\top \boldsymbol{X}^{-1})_i = ((\boldsymbol{X}^{-1})^\top \boldsymbol{a})_i = (\boldsymbol{X}^{-\top} \boldsymbol{a})_i \label{eq:8-3-8} \end{equation}
Computing the second parenthesized expression. $\sum_l (\boldsymbol{X}^{-1})_{jl} b_l$ is the $j$-th component of $\boldsymbol{X}^{-1} \boldsymbol{b}$.
\begin{equation} \sum_{l=0}^{N-1} (\boldsymbol{X}^{-1})_{jl} b_l = (\boldsymbol{X}^{-1} \boldsymbol{b})_j \label{eq:8-3-9} \end{equation}
Substituting \eqref{eq:8-3-8} and \eqref{eq:8-3-9} into \eqref{eq:8-3-6} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - (\boldsymbol{X}^{-\top} \boldsymbol{a})_i (\boldsymbol{X}^{-1} \boldsymbol{b})_j \label{eq:8-3-10} \end{equation}
In the denominator layout, the $(i, j)$ entry of $\displaystyle\frac{\partial f}{\partial \boldsymbol{X}}$ is $\displaystyle\frac{\partial f}{\partial X_{ij}}$.
\begin{equation} \left( \frac{\partial f}{\partial \boldsymbol{X}} \right)_{ij} = \frac{\partial f}{\partial X_{ij}} \label{eq:8-3-11} \end{equation}
The right-hand side of \eqref{eq:8-3-10} can be written as an outer product. For $\boldsymbol{u}, \boldsymbol{v} \in \mathbb{R}^N$, $(\boldsymbol{u} \boldsymbol{v}^\top)_{ij} = u_i v_j$, so we get the following.
\begin{equation} - (\boldsymbol{X}^{-\top} \boldsymbol{a})_i (\boldsymbol{X}^{-1} \boldsymbol{b})_j = - (\boldsymbol{X}^{-\top} \boldsymbol{a} (\boldsymbol{X}^{-1} \boldsymbol{b})^\top)_{ij} \label{eq:8-3-12} \end{equation}
Substituting $(\boldsymbol{X}^{-1} \boldsymbol{b})^\top = \boldsymbol{b}^\top (\boldsymbol{X}^{-1})^\top = \boldsymbol{b}^\top \boldsymbol{X}^{-\top}$ gives the following.
\begin{equation} - (\boldsymbol{X}^{-\top} \boldsymbol{a} (\boldsymbol{X}^{-1} \boldsymbol{b})^\top)_{ij} = - (\boldsymbol{X}^{-\top} \boldsymbol{a} \boldsymbol{b}^\top \boldsymbol{X}^{-\top})_{ij} \label{eq:8-3-13} \end{equation}
Expressing in matrix form, we obtain the final result.
\begin{equation} \frac{\partial}{\partial \boldsymbol{X}} \boldsymbol{a}^\top \boldsymbol{X}^{-1} \boldsymbol{b} = - \boldsymbol{X}^{-\top} \boldsymbol{a} \boldsymbol{b}^\top \boldsymbol{X}^{-\top} \label{eq:8-3-14} \end{equation}
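The gradient \eqref{eq:8-3-14} can be verified entry-by-entry against finite differences. The following sketch (random illustrative data, not part of the proof) builds the numerical gradient in the denominator layout, i.e. with the same shape as $\boldsymbol{X}$:

```python
import numpy as np

# Check eq. (8-3-14): d(a^T X^-1 b)/dX = -X^{-T} a b^T X^{-T} (denominator layout).
rng = np.random.default_rng(2)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
a, b = rng.standard_normal(N), rng.standard_normal(N)

f = lambda M: a @ np.linalg.inv(M) @ b

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        numeric[i, j] = (f(Xp) - f(Xm)) / (2 * h)

XinvT = np.linalg.inv(X).T
analytic = -XinvT @ np.outer(a, b) @ XinvT
err = np.max(np.abs(numeric - analytic))
```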
8.1.3 Determinant of the Inverse $|\boldsymbol{X}^{-1}|$
Proof
We establish the relationship between the determinant of the inverse and the original determinant. By the multiplicative property $|\boldsymbol{A}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{B}|$, we have the following.
\begin{equation} |\boldsymbol{X}||\boldsymbol{X}^{-1}| = |\boldsymbol{X}\boldsymbol{X}^{-1}| = |\boldsymbol{I}| = 1 \label{eq:8-4-1} \end{equation}
Solving \eqref{eq:8-4-1} for $|\boldsymbol{X}^{-1}|$ gives the following.
\begin{equation} |\boldsymbol{X}^{-1}| = \frac{1}{|\boldsymbol{X}|} = |\boldsymbol{X}|^{-1} \label{eq:8-4-2} \end{equation}
Differentiating $|\boldsymbol{X}^{-1}| = |\boldsymbol{X}|^{-1}$ with respect to $\boldsymbol{X}$. This is a composite function, so we apply the chain rule with outer function $f(u) = u^{-1}$ and inner function $g(\boldsymbol{X}) = |\boldsymbol{X}|$.
The derivative of the outer function $f(u) = u^{-1}$ is the following.
\begin{equation} f'(u) = -u^{-2} = -\frac{1}{u^2} \label{eq:8-4-3} \end{equation}
Substituting $u = |\boldsymbol{X}|$ gives the following.
\begin{equation} f'(|\boldsymbol{X}|) = -|\boldsymbol{X}|^{-2} \label{eq:8-4-4} \end{equation}
The derivative of the inner function is given by 7.1 as follows.
\begin{equation} \frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}} = |\boldsymbol{X}| \boldsymbol{X}^{-\top} \label{eq:8-4-5} \end{equation}
Applying the chain rule. Multiplying \eqref{eq:8-4-4} and \eqref{eq:8-4-5} gives the following.
\begin{equation} \frac{\partial |\boldsymbol{X}|^{-1}}{\partial \boldsymbol{X}} = f'(|\boldsymbol{X}|) \cdot \frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}} = -|\boldsymbol{X}|^{-2} \cdot |\boldsymbol{X}| \boldsymbol{X}^{-\top} \label{eq:8-4-6} \end{equation}
Using $|\boldsymbol{X}|^{-2} \cdot |\boldsymbol{X}| = |\boldsymbol{X}|^{-1}$ to simplify gives the following.
\begin{equation} \frac{\partial |\boldsymbol{X}|^{-1}}{\partial \boldsymbol{X}} = -|\boldsymbol{X}|^{-1} \boldsymbol{X}^{-\top} \label{eq:8-4-7} \end{equation}
Substituting $|\boldsymbol{X}|^{-1} = |\boldsymbol{X}^{-1}|$ from \eqref{eq:8-4-2} gives the following.
\begin{equation} \frac{\partial |\boldsymbol{X}^{-1}|}{\partial \boldsymbol{X}} = -|\boldsymbol{X}^{-1}| \boldsymbol{X}^{-\top} \label{eq:8-4-8} \end{equation}
Rewriting $\boldsymbol{X}^{-\top} = (\boldsymbol{X}^{-1})^\top$, we obtain the final result.
\begin{equation} \frac{\partial |\boldsymbol{X}^{-1}|}{\partial \boldsymbol{X}} = -|\boldsymbol{X}^{-1}| (\boldsymbol{X}^{-1})^\top \label{eq:8-4-9} \end{equation}
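A quick numerical check of \eqref{eq:8-4-9} (a sketch with illustrative random data; the scalar function here is $f(\boldsymbol{X}) = |\boldsymbol{X}^{-1}| = 1/|\boldsymbol{X}|$):

```python
import numpy as np

# Check eq. (8-4-9): d|X^-1|/dX = -|X^-1| (X^-1)^T.
rng = np.random.default_rng(3)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)

f = lambda M: np.linalg.det(np.linalg.inv(M))  # = 1 / det(M)

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        numeric[i, j] = (f(Xp) - f(Xm)) / (2 * h)

Xinv = np.linalg.inv(X)
analytic = -np.linalg.det(Xinv) * Xinv.T
err = np.max(np.abs(numeric - analytic))
```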
8.1.4 Trace with Inverse $\text{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B})$
Proof
We differentiate the scalar $f = \text{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B})$ with respect to $X_{ij}$. Since the trace is linear and $\boldsymbol{A}$, $\boldsymbol{B}$ are constant, the differentiation passes inside the trace.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \frac{\partial}{\partial X_{ij}} \text{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B}) = \text{tr}\left( \boldsymbol{A} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} \boldsymbol{B} \right) \label{eq:8-5-1} \end{equation}
From the proof of 8.1.1, the component-wise derivative of the inverse is as follows.
\begin{equation} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} = -\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \label{eq:8-5-2} \end{equation}
Substituting \eqref{eq:8-5-2} into \eqref{eq:8-5-1} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \text{tr}\left( \boldsymbol{A} \left( -\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \right) \boldsymbol{B} \right) \label{eq:8-5-3} \end{equation}
Factoring out the minus sign. Since the trace is linear with respect to scalar multiplication, we get the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - \text{tr}\left( \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \right) \label{eq:8-5-4} \end{equation}
Using the cyclic property $\text{tr}(\boldsymbol{P}\boldsymbol{Q}\boldsymbol{R}) = \text{tr}(\boldsymbol{R}\boldsymbol{P}\boldsymbol{Q})$ to rearrange the order. Moving $\boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B}$ to the front gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - \text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i \right) \label{eq:8-5-5} \end{equation}
$\boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i$ is a scalar ($1 \times 1$ matrix). The trace of a scalar equals the value itself.
\begin{equation} \text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i \right) = \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i \label{eq:8-5-6} \end{equation}
$\boldsymbol{e}_j^\top \boldsymbol{M} \boldsymbol{e}_i$ extracts the $(j, i)$ entry of matrix $\boldsymbol{M}$. Setting $\boldsymbol{M} = \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1}$ gives the following.
\begin{equation} \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i = (\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})_{ji} \label{eq:8-5-7} \end{equation}
Combining \eqref{eq:8-5-5}, \eqref{eq:8-5-6}, and \eqref{eq:8-5-7} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - (\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})_{ji} \label{eq:8-5-8} \end{equation}
In the denominator layout, the $(i, j)$ entry of $\displaystyle\frac{\partial f}{\partial \boldsymbol{X}}$ is $\displaystyle\frac{\partial f}{\partial X_{ij}}$, so we get the following.
\begin{equation} \left( \frac{\partial f}{\partial \boldsymbol{X}} \right)_{ij} = - (\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})_{ji} \label{eq:8-5-9} \end{equation}
Having the $(i, j)$ entry equal to $M_{ji}$ means the matrix is the transpose of the original.
\begin{equation} \left( \frac{\partial f}{\partial \boldsymbol{X}} \right)_{ij} = - ((\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})^\top)_{ij} \label{eq:8-5-10} \end{equation}
Expressing in matrix form, we obtain the final result.
\begin{equation} \frac{\partial}{\partial \boldsymbol{X}} \text{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B}) = - (\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})^\top \label{eq:8-5-11} \end{equation}
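The trace gradient \eqref{eq:8-5-11} can likewise be checked against finite differences. A sketch with illustrative random $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{X}$:

```python
import numpy as np

# Check eq. (8-5-11): d tr(A X^-1 B)/dX = -(X^-1 B A X^-1)^T.
rng = np.random.default_rng(4)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
A, B = rng.standard_normal((N, N)), rng.standard_normal((N, N))

f = lambda M: np.trace(A @ np.linalg.inv(M) @ B)

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        numeric[i, j] = (f(Xp) - f(Xm)) / (2 * h)

Xinv = np.linalg.inv(X)
analytic = -(Xinv @ B @ A @ Xinv).T
err = np.max(np.abs(numeric - analytic))
```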
8.1.5 Trace of Sum Inverse $\text{tr}((\boldsymbol{X}+\boldsymbol{A})^{-1})$
Proof
Let $\boldsymbol{Y} = \boldsymbol{X} + \boldsymbol{A}$. We differentiate $f = \text{tr}(\boldsymbol{Y}^{-1})$ with respect to $\boldsymbol{X}$.
Differentiating $\boldsymbol{Y}$ with respect to $X_{ij}$. Since $\boldsymbol{A}$ is constant, $\displaystyle\frac{\partial \boldsymbol{A}}{\partial X_{ij}} = \boldsymbol{O}$, giving the following.
\begin{equation} \frac{\partial \boldsymbol{Y}}{\partial X_{ij}} = \frac{\partial (\boldsymbol{X} + \boldsymbol{A})}{\partial X_{ij}} = \frac{\partial \boldsymbol{X}}{\partial X_{ij}} + \boldsymbol{O} = \boldsymbol{e}_i \boldsymbol{e}_j^\top \label{eq:8-6-1} \end{equation}
In the formula from 8.1.4, we substitute identity matrices for both coefficient matrices (written $\boldsymbol{A}$ and $\boldsymbol{B}$ there; not the constant $\boldsymbol{A}$ of this section) and replace the variable matrix $\boldsymbol{X} \to \boldsymbol{Y} = \boldsymbol{X} + \boldsymbol{A}$, giving the following.
\begin{equation} \text{tr}(\boldsymbol{I} \cdot \boldsymbol{Y}^{-1} \cdot \boldsymbol{I}) = \text{tr}(\boldsymbol{Y}^{-1}) \label{eq:8-6-2} \end{equation}
From 8.1.4, the derivative of $\text{tr}(\boldsymbol{Y}^{-1})$ with respect to $\boldsymbol{Y}$ is the following.
\begin{equation} \frac{\partial}{\partial \boldsymbol{Y}} \text{tr}(\boldsymbol{Y}^{-1}) = - (\boldsymbol{Y}^{-1} \cdot \boldsymbol{I} \cdot \boldsymbol{I} \cdot \boldsymbol{Y}^{-1})^\top = - (\boldsymbol{Y}^{-2})^\top \label{eq:8-6-3} \end{equation}
Applying the chain rule. The derivative of $f = \text{tr}(\boldsymbol{Y}^{-1})$ with respect to $\boldsymbol{X}$ is the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \sum_{k,l} \frac{\partial f}{\partial Y_{kl}} \frac{\partial Y_{kl}}{\partial X_{ij}} \label{eq:8-6-4} \end{equation}
From \eqref{eq:8-6-1}, $\displaystyle\frac{\partial Y_{kl}}{\partial X_{ij}} = \delta_{ki} \delta_{lj}$. Substituting into \eqref{eq:8-6-4} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \sum_{k,l} \frac{\partial f}{\partial Y_{kl}} \delta_{ki} \delta_{lj} = \frac{\partial f}{\partial Y_{ij}} \label{eq:8-6-5} \end{equation}
\eqref{eq:8-6-5} shows that when $\boldsymbol{Y} = \boldsymbol{X} + \boldsymbol{A}$, $\displaystyle\frac{\partial f}{\partial \boldsymbol{X}} = \displaystyle\frac{\partial f}{\partial \boldsymbol{Y}}$ (because $\boldsymbol{A}$ is constant).
Combining \eqref{eq:8-6-3} and \eqref{eq:8-6-5}, we obtain the final result.
\begin{equation} \frac{\partial}{\partial \boldsymbol{X}} \text{tr}((\boldsymbol{X}+\boldsymbol{A})^{-1}) = - ((\boldsymbol{X}+\boldsymbol{A})^{-2})^\top \label{eq:8-6-6} \end{equation}
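A numerical sanity check of \eqref{eq:8-6-6} (a sketch; the random $\boldsymbol{X}$ and $\boldsymbol{A}$ are illustrative, with $\boldsymbol{A}$ kept small so $\boldsymbol{X}+\boldsymbol{A}$ stays invertible):

```python
import numpy as np

# Check eq. (8-6-6): d tr((X+A)^-1)/dX = -((X+A)^-2)^T.
rng = np.random.default_rng(5)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
A = 0.5 * rng.standard_normal((N, N))  # small constant offset

f = lambda M: np.trace(np.linalg.inv(M + A))

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        numeric[i, j] = (f(Xp) - f(Xm)) / (2 * h)

Yinv = np.linalg.inv(X + A)
analytic = -(Yinv @ Yinv).T
err = np.max(np.abs(numeric - analytic))
```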
8.1.6 Chain Rule for the Inverse
Proof
Let $J$ be a scalar-valued function of $\boldsymbol{W} = \boldsymbol{A}^{-1}$, so that $J$ depends on $\boldsymbol{A}$ only through $\boldsymbol{W}$. Writing the chain rule in component form gives the following.
\begin{equation} \frac{\partial J}{\partial A_{ij}} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \frac{\partial J}{\partial W_{kl}} \frac{\partial W_{kl}}{\partial A_{ij}} \label{eq:8-7-1} \end{equation}
From 8.1.1, the component-wise derivative of the inverse is as follows. Since $\boldsymbol{W} = \boldsymbol{A}^{-1}$, we match the indices accordingly.
\begin{equation} \frac{\partial W_{kl}}{\partial A_{ij}} = \frac{\partial (\boldsymbol{A}^{-1})_{kl}}{\partial A_{ij}} = - (\boldsymbol{A}^{-1})_{ki} (\boldsymbol{A}^{-1})_{jl} = -W_{ki} W_{jl} \label{eq:8-7-2} \end{equation}
Substituting \eqref{eq:8-7-2} into \eqref{eq:8-7-1} gives the following.
\begin{equation} \frac{\partial J}{\partial A_{ij}} = \sum_{k,l} \frac{\partial J}{\partial W_{kl}} (-W_{ki} W_{jl}) \label{eq:8-7-3} \end{equation}
Factoring out the minus sign and separating the sums gives the following.
\begin{equation} \frac{\partial J}{\partial A_{ij}} = - \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} W_{ki} \frac{\partial J}{\partial W_{kl}} W_{jl} \label{eq:8-7-4} \end{equation}
Interpreting this sum as a matrix product. Considering the sum over $k$, $\sum_k W_{ki} \displaystyle\frac{\partial J}{\partial W_{kl}}$, we get the following.
\begin{equation} \sum_{k=0}^{N-1} W_{ki} \frac{\partial J}{\partial W_{kl}} = \sum_{k=0}^{N-1} (W^\top)_{ik} \frac{\partial J}{\partial W_{kl}} = \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \right)_{il} \label{eq:8-7-5} \end{equation}
Next, considering the sum over $l$ gives the following.
\begin{equation} \sum_{l=0}^{N-1} \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \right)_{il} W_{jl} = \sum_{l=0}^{N-1} \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \right)_{il} (W^\top)_{lj} = \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \boldsymbol{W}^\top \right)_{ij} \label{eq:8-7-6} \end{equation}
Combining \eqref{eq:8-7-4}, \eqref{eq:8-7-5}, and \eqref{eq:8-7-6} gives the following.
\begin{equation} \frac{\partial J}{\partial A_{ij}} = - \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \boldsymbol{W}^\top \right)_{ij} \label{eq:8-7-7} \end{equation}
Substituting $\boldsymbol{W}^\top = \boldsymbol{A}^{-\top}$ from $\boldsymbol{W} = \boldsymbol{A}^{-1}$, we obtain the final result in matrix form.
\begin{equation} \frac{\partial J}{\partial \boldsymbol{A}} = - \boldsymbol{A}^{-\top} \frac{\partial J}{\partial \boldsymbol{W}} \boldsymbol{A}^{-\top} \label{eq:8-7-8} \end{equation}
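A sketch of a numerical check of \eqref{eq:8-7-8}, under the illustrative assumption that the loss is $J = \text{tr}(\boldsymbol{C}\boldsymbol{W})$ with constant $\boldsymbol{C}$, whose gradient in the denominator layout is $\partial J/\partial\boldsymbol{W} = \boldsymbol{C}^\top$:

```python
import numpy as np

# Check eq. (8-7-8) with the illustrative loss J(W) = tr(C W), W = A^-1.
# In the denominator layout, dJ/dW = C^T.
rng = np.random.default_rng(6)
N = 4
A = rng.standard_normal((N, N)) + N * np.eye(N)
C = rng.standard_normal((N, N))

J = lambda M: np.trace(C @ np.linalg.inv(M))  # J as a function of A

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += h
        Am[i, j] -= h
        numeric[i, j] = (J(Ap) - J(Am)) / (2 * h)

AinvT = np.linalg.inv(A).T
grad_W = C.T                        # dJ/dW for J = tr(C W)
analytic = -AinvT @ grad_W @ AinvT  # chain rule through the inverse
err = np.max(np.abs(numeric - analytic))
```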
8.1.7 Leontief Inverse Derivative
Proof
Define the Leontief inverse as $\boldsymbol{L} = (\boldsymbol{I} - \boldsymbol{A})^{-1}$. We differentiate $\boldsymbol{L}$ with respect to $A_{ij}$.
Let $\boldsymbol{X} = \boldsymbol{I} - \boldsymbol{A}$. Then $\boldsymbol{L} = \boldsymbol{X}^{-1}$.
Applying the chain rule. The derivative of $\boldsymbol{L}$ with respect to $A_{ij}$ can be computed through $\boldsymbol{X}$.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial A_{ij}} = \sum_{k,l} \frac{\partial \boldsymbol{L}}{\partial X_{kl}} \frac{\partial X_{kl}}{\partial A_{ij}} \label{eq:8-8-1} \end{equation}
Since $X_{kl} = (\boldsymbol{I} - \boldsymbol{A})_{kl} = \delta_{kl} - A_{kl}$, differentiating with respect to $A_{ij}$ gives the following.
\begin{equation} \frac{\partial X_{kl}}{\partial A_{ij}} = \frac{\partial (\delta_{kl} - A_{kl})}{\partial A_{ij}} = 0 - \frac{\partial A_{kl}}{\partial A_{ij}} \label{eq:8-8-2} \end{equation}
Since $\displaystyle\frac{\partial A_{kl}}{\partial A_{ij}} = \delta_{ki} \delta_{lj}$ (equals 1 only when $(k,l) = (i,j)$), we get the following.
\begin{equation} \frac{\partial X_{kl}}{\partial A_{ij}} = -\delta_{ki} \delta_{lj} \label{eq:8-8-3} \end{equation}
Substituting \eqref{eq:8-8-3} into \eqref{eq:8-8-1} gives the following.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial A_{ij}} = \sum_{k,l} \frac{\partial \boldsymbol{L}}{\partial X_{kl}} (-\delta_{ki} \delta_{lj}) \label{eq:8-8-4} \end{equation}
Since $\delta_{ki} \delta_{lj} = 1$ only when $(k,l) = (i,j)$, only one term survives from the sum.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial A_{ij}} = -\frac{\partial \boldsymbol{L}}{\partial X_{ij}} \label{eq:8-8-5} \end{equation}
Computing $\displaystyle\frac{\partial \boldsymbol{L}}{\partial X_{ij}}$. Since $\boldsymbol{L} = \boldsymbol{X}^{-1}$, from the proof of 8.1.1 we have the following.
\begin{equation} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} = -\boldsymbol{X}^{-1} \boldsymbol{E}_{ij} \boldsymbol{X}^{-1} \label{eq:8-8-6} \end{equation}
where $\boldsymbol{E}_{ij} = \boldsymbol{e}_i \boldsymbol{e}_j^\top$ is the matrix with 1 only at position $(i,j)$.
Substituting $\boldsymbol{X}^{-1} = \boldsymbol{L}$ gives the following.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial X_{ij}} = -\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L} \label{eq:8-8-7} \end{equation}
Substituting \eqref{eq:8-8-7} into \eqref{eq:8-8-5} gives the following.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial A_{ij}} = -(-\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L}) \label{eq:8-8-8} \end{equation}
The minus signs cancel, giving the final result.
\begin{equation} \frac{\partial}{\partial A_{ij}} (\boldsymbol{I} - \boldsymbol{A})^{-1} = \boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L} \label{eq:8-8-9} \end{equation}
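Since $\boldsymbol{L}\boldsymbol{E}_{ij}\boldsymbol{L}$ is the outer product of the $i$-th column and the $j$-th row of $\boldsymbol{L}$, the result \eqref{eq:8-8-9} is easy to check numerically. A sketch (the small random $\boldsymbol{A}$ is illustrative and keeps $\boldsymbol{I}-\boldsymbol{A}$ invertible):

```python
import numpy as np

# Check eq. (8-8-9): dL/dA_ij = L E_ij L for L = (I - A)^-1.
rng = np.random.default_rng(7)
N = 4
A = 0.1 * rng.random((N, N))       # small entries keep I - A invertible
L = np.linalg.inv(np.eye(N) - A)

i, j, h = 0, 3, 1e-6
Ap, Am = A.copy(), A.copy()
Ap[i, j] += h
Am[i, j] -= h
numeric = (np.linalg.inv(np.eye(N) - Ap) - np.linalg.inv(np.eye(N) - Am)) / (2 * h)

# L E_ij L = outer product of the i-th column and the j-th row of L.
analytic = np.outer(L[:, i], L[j, :])
err = np.max(np.abs(numeric - analytic))
```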
8.1.8 Trace of Leontief Inverse Derivative
Proof
Let $\boldsymbol{L} = (\boldsymbol{I} - \boldsymbol{A})^{-1}$. We differentiate $f = \text{tr}(\boldsymbol{L})$ with respect to $A_{ij}$.
Since the trace is the sum of diagonal entries, it can be written as follows.
\begin{equation} f = \text{tr}(\boldsymbol{L}) = \sum_{k=0}^{N-1} L_{kk} \label{eq:8-9-1} \end{equation}
Differentiating $f$ with respect to $A_{ij}$ gives the following.
\begin{equation} \frac{\partial f}{\partial A_{ij}} = \frac{\partial}{\partial A_{ij}} \sum_{k=0}^{N-1} L_{kk} = \sum_{k=0}^{N-1} \frac{\partial L_{kk}}{\partial A_{ij}} \label{eq:8-9-2} \end{equation}
From 8.1.7, $\displaystyle\frac{\partial \boldsymbol{L}}{\partial A_{ij}} = \boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L}$. We compute the $(k, k)$ entry of this.
Writing $(\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L})_{kk}$ in component form gives the following.
\begin{equation} (\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L})_{kk} = \sum_{p=0}^{N-1} \sum_{q=0}^{N-1} L_{kp} (\boldsymbol{E}_{ij})_{pq} L_{qk} \label{eq:8-9-3} \end{equation}
Since $(\boldsymbol{E}_{ij})_{pq} = \delta_{pi} \delta_{qj}$ (equals 1 only when $(p,q) = (i,j)$), we get the following.
\begin{equation} (\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L})_{kk} = \sum_{p,q} L_{kp} \delta_{pi} \delta_{qj} L_{qk} = L_{ki} L_{jk} \label{eq:8-9-4} \end{equation}
Substituting \eqref{eq:8-9-4} into \eqref{eq:8-9-2} gives the following.
\begin{equation} \frac{\partial f}{\partial A_{ij}} = \sum_{k=0}^{N-1} L_{ki} L_{jk} \label{eq:8-9-5} \end{equation}
Interpreting this sum as a matrix product. $\sum_k L_{ki} L_{jk}$ is the inner product of the $i$-th column and the $j$-th row of $\boldsymbol{L}$.
\begin{equation} \sum_{k=0}^{N-1} L_{ki} L_{jk} = \sum_{k=0}^{N-1} (\boldsymbol{L}^\top)_{ik} (\boldsymbol{L}^\top)_{kj} = (\boldsymbol{L}^\top \boldsymbol{L}^\top)_{ij} = ((\boldsymbol{L}^2)^\top)_{ij} \label{eq:8-9-6} \end{equation}
Alternatively, $\sum_k L_{ki} L_{jk} = \sum_k L_{jk} L_{ki} = (\boldsymbol{L} \boldsymbol{L})_{ji} = (\boldsymbol{L}^2)_{ji}$.
In the denominator layout, $\left( \displaystyle\frac{\partial f}{\partial \boldsymbol{A}} \right)_{ij} = \displaystyle\frac{\partial f}{\partial A_{ij}}$, so having the $(i, j)$ entry equal $(\boldsymbol{L}^2)_{ji}$ implies transposition.
\begin{equation} \left( \frac{\partial f}{\partial \boldsymbol{A}} \right)_{ij} = (\boldsymbol{L}^2)_{ji} = ((\boldsymbol{L}^2)^\top)_{ij} \label{eq:8-9-7} \end{equation}
Expressing in matrix form, we obtain the final result.
\begin{equation} \frac{\partial}{\partial \boldsymbol{A}} \text{tr}((\boldsymbol{I} - \boldsymbol{A})^{-1}) = (\boldsymbol{L}^2)^\top = ((\boldsymbol{I} - \boldsymbol{A})^{-2})^\top \label{eq:8-9-8} \end{equation}
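A numerical sanity check of \eqref{eq:8-9-8} (a sketch with an illustrative small random $\boldsymbol{A}$):

```python
import numpy as np

# Check eq. (8-9-8): d tr((I - A)^-1)/dA = ((I - A)^-2)^T.
rng = np.random.default_rng(8)
N = 4
A = 0.1 * rng.random((N, N))       # small entries keep I - A invertible

f = lambda M: np.trace(np.linalg.inv(np.eye(N) - M))

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += h
        Am[i, j] -= h
        numeric[i, j] = (f(Ap) - f(Am)) / (2 * h)

L = np.linalg.inv(np.eye(N) - A)
analytic = (L @ L).T
err = np.max(np.abs(numeric - analytic))
```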
8.2 Moore-Penrose Pseudoinverse Derivatives
Derivative formulas for the pseudoinverse (generalized inverse). Used in least squares and inverse kinematics in robotics.
8.2.1 Moore-Penrose Pseudoinverse Derivative (General Form)
Proof
Differentiating the Moore-Penrose condition $\boldsymbol{X}\boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{X}$ gives the following.
\begin{equation} (d\boldsymbol{X})\boldsymbol{X}^+\boldsymbol{X} + \boldsymbol{X}(d\boldsymbol{X}^+)\boldsymbol{X} + \boldsymbol{X}\boldsymbol{X}^+(d\boldsymbol{X}) = d\boldsymbol{X} \label{eq:8-10-1} \end{equation}
Similarly, differentiating $\boldsymbol{X}^+\boldsymbol{X}\boldsymbol{X}^+ = \boldsymbol{X}^+$ gives the following.
\begin{equation} (d\boldsymbol{X}^+)\boldsymbol{X}\boldsymbol{X}^+ + \boldsymbol{X}^+(d\boldsymbol{X})\boldsymbol{X}^+ + \boldsymbol{X}^+\boldsymbol{X}(d\boldsymbol{X}^+) = d\boldsymbol{X}^+ \label{eq:8-10-2} \end{equation}
Combining these conditions with the symmetry conditions $(\boldsymbol{X}\boldsymbol{X}^+)^\top = \boldsymbol{X}\boldsymbol{X}^+$ and $(\boldsymbol{X}^+\boldsymbol{X})^\top = \boldsymbol{X}^+\boldsymbol{X}$ (Hermitian transposes in the complex case), we solve for $d\boldsymbol{X}^+$ to obtain the Golub-Pereyra (1973) formula.
\begin{equation} d\boldsymbol{X}^+ = -\boldsymbol{X}^+ (d\boldsymbol{X}) \boldsymbol{X}^+ + \boldsymbol{X}^+ \boldsymbol{X}^{+\top} (d\boldsymbol{X})^\top (\boldsymbol{I} - \boldsymbol{X}\boldsymbol{X}^+) + (\boldsymbol{I} - \boldsymbol{X}^+\boldsymbol{X}) (d\boldsymbol{X})^\top \boldsymbol{X}^{+\top} \boldsymbol{X}^+ \label{eq:8-10-3} \end{equation}
8.2.2 Pseudoinverse Derivative (Full Column Rank)
Proof
For full column rank, $\boldsymbol{X}^+ = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top$ (left inverse).
Since $\boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{I}_n$, the null space projection $\boldsymbol{I} - \boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{O}$.
The third term in 8.2.1 vanishes, giving the following.
\begin{equation} d\boldsymbol{X}^+ = -\boldsymbol{X}^+ (d\boldsymbol{X}) \boldsymbol{X}^+ + \boldsymbol{X}^+ \boldsymbol{X}^{+\top} (d\boldsymbol{X})^\top (\boldsymbol{I} - \boldsymbol{X}\boldsymbol{X}^+) \label{eq:8-11-1} \end{equation}
Simplifying $\boldsymbol{X}^+ \boldsymbol{X}^{+\top} = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top \cdot \boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1} = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}$ yields the formula.
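As a numerical sanity check (a sketch; the random tall matrix is illustrative and has full column rank with probability 1), the full-column-rank differential $d\boldsymbol{X}^+ = -\boldsymbol{X}^+ d\boldsymbol{X}\, \boldsymbol{X}^+ + (\boldsymbol{X}^\top\boldsymbol{X})^{-1} (d\boldsymbol{X})^\top (\boldsymbol{I} - \boldsymbol{X}\boldsymbol{X}^+)$ can be compared against a finite difference of `np.linalg.pinv` along a direction $\boldsymbol{D}$:

```python
import numpy as np

# Directional derivative of pinv(X) for a tall (full column rank) X.
rng = np.random.default_rng(9)
m, n = 6, 3
X = rng.standard_normal((m, n))
D = rng.standard_normal((m, n))   # perturbation direction dX

h = 1e-6
numeric = (np.linalg.pinv(X + h * D) - np.linalg.pinv(X - h * D)) / (2 * h)

Xp = np.linalg.pinv(X)
analytic = (-Xp @ D @ Xp
            + np.linalg.inv(X.T @ X) @ D.T @ (np.eye(m) - X @ Xp))
err = np.max(np.abs(numeric - analytic))
```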
8.2.3 Pseudoinverse Derivative (Full Row Rank)
Proof
For full row rank, $\boldsymbol{X}^+ = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}$ (right inverse).
Since $\boldsymbol{X}\boldsymbol{X}^+ = \boldsymbol{I}_m$, the null space projection $\boldsymbol{I} - \boldsymbol{X}\boldsymbol{X}^+ = \boldsymbol{O}$.
The second term in 8.2.1 vanishes, giving the following.
\begin{equation} d\boldsymbol{X}^+ = -\boldsymbol{X}^+ (d\boldsymbol{X}) \boldsymbol{X}^+ + (\boldsymbol{I} - \boldsymbol{X}^+\boldsymbol{X})(d\boldsymbol{X})^\top \boldsymbol{X}^{+\top}\boldsymbol{X}^+ \label{eq:8-12-1} \end{equation}
Substituting $\boldsymbol{X}^{+\top}\boldsymbol{X}^+ = (\boldsymbol{X}\boldsymbol{X}^\top)^{-1}$ yields the formula.
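The full-row-rank case can be checked the same way (a sketch; the random wide matrix is illustrative), comparing $d\boldsymbol{X}^+ = -\boldsymbol{X}^+ d\boldsymbol{X}\, \boldsymbol{X}^+ + (\boldsymbol{I} - \boldsymbol{X}^+\boldsymbol{X}) (d\boldsymbol{X})^\top (\boldsymbol{X}\boldsymbol{X}^\top)^{-1}$ against a finite difference of `np.linalg.pinv`:

```python
import numpy as np

# Directional derivative of pinv(X) for a wide (full row rank) X.
rng = np.random.default_rng(10)
m, n = 3, 6
X = rng.standard_normal((m, n))
D = rng.standard_normal((m, n))   # perturbation direction dX

h = 1e-6
numeric = (np.linalg.pinv(X + h * D) - np.linalg.pinv(X - h * D)) / (2 * h)

Xp = np.linalg.pinv(X)
analytic = (-Xp @ D @ Xp
            + (np.eye(n) - Xp @ X) @ D.T @ np.linalg.inv(X @ X.T))
err = np.max(np.abs(numeric - analytic))
```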
8.2.4 Time Derivative of the Pseudoinverse
Proof
Obtained by replacing $d\boldsymbol{X}$ with $\dot{\boldsymbol{X}} dt$ in 8.2.1 and dividing by $dt$.
In robotics, this is the time derivative of $\boldsymbol{J}^+$, used in acceleration-level inverse kinematics.
\begin{equation} \ddot{\boldsymbol{q}} = \boldsymbol{J}^+\ddot{\boldsymbol{x}} + \dot{\boldsymbol{J}}^+\dot{\boldsymbol{x}} + (\boldsymbol{I} - \boldsymbol{J}^+\boldsymbol{J})\ddot{\boldsymbol{q}}_0 \label{eq:8-13-1} \end{equation}
8.2.5 Derivation of the Right Inverse
Derivation
When $\boldsymbol{X}$ has full row rank, $\boldsymbol{X}\boldsymbol{X}^\top$ is an $m \times m$ positive definite matrix and hence invertible.
Setting $\boldsymbol{X}^+ = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}$, we verify the four Moore-Penrose conditions.
(1) $\boldsymbol{X}\boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{X}\boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}\boldsymbol{X} = \boldsymbol{X}$ ✓
(2) $\boldsymbol{X}^+\boldsymbol{X}\boldsymbol{X}^+ = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}\boldsymbol{X}\boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1} = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1} = \boldsymbol{X}^+$ ✓
(3) $(\boldsymbol{X}\boldsymbol{X}^+)^\top = (\boldsymbol{X}\boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1})^\top = \boldsymbol{I}^\top = \boldsymbol{I} = \boldsymbol{X}\boldsymbol{X}^+$ ✓
(4) $(\boldsymbol{X}^+\boldsymbol{X})^\top = (\boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}\boldsymbol{X})^\top = \boldsymbol{X}^\top((\boldsymbol{X}\boldsymbol{X}^\top)^{-1})^\top\boldsymbol{X} = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}\boldsymbol{X} = \boldsymbol{X}^+\boldsymbol{X}$ ✓
All four conditions are satisfied, confirming this is the Moore-Penrose pseudoinverse. $\square$
8.2.6 Derivation of the Left Inverse
Derivation
When $\boldsymbol{X}$ has full column rank, $\boldsymbol{X}^\top\boldsymbol{X}$ is an $n \times n$ positive definite matrix and hence invertible.
Setting $\boldsymbol{X}^+ = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top$, we verify the four Moore-Penrose conditions.
(1) $\boldsymbol{X}\boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{X} = \boldsymbol{X}$ ✓
(2) $\boldsymbol{X}^+\boldsymbol{X}\boldsymbol{X}^+ = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top = \boldsymbol{X}^+$ ✓
(3) $(\boldsymbol{X}\boldsymbol{X}^+)^\top = (\boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top)^\top = \boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top = \boldsymbol{X}\boldsymbol{X}^+$ ✓
(4) $(\boldsymbol{X}^+\boldsymbol{X})^\top = ((\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{X})^\top = \boldsymbol{I}^\top = \boldsymbol{I} = \boldsymbol{X}^+\boldsymbol{X}$ ✓
All four conditions are satisfied, confirming this is the Moore-Penrose pseudoinverse. $\square$
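Both closed forms (the right inverse of 8.2.5 and the left inverse of 8.2.6) can be compared against NumPy's SVD-based pseudoinverse; the random matrices below are illustrative and full-rank with probability 1:

```python
import numpy as np

# The closed forms of 8.2.5 and 8.2.6 should agree with np.linalg.pinv.
rng = np.random.default_rng(11)

W = rng.standard_normal((3, 6))            # wide, full row rank: right inverse
right = W.T @ np.linalg.inv(W @ W.T)       # X^+ = X^T (X X^T)^{-1}

T = rng.standard_normal((6, 3))            # tall, full column rank: left inverse
left = np.linalg.inv(T.T @ T) @ T.T        # X^+ = (X^T X)^{-1} X^T

err_right = np.max(np.abs(right - np.linalg.pinv(W)))
err_left = np.max(np.abs(left - np.linalg.pinv(T)))
```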
References
- Petersen, K. B., & Pedersen, M. S. (2012). The Matrix Cookbook. Technical University of Denmark.
- Golub, G. H., & Pereyra, V. (1973). The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate. SIAM Journal on Numerical Analysis, 10(2), 413-432.
- Magnus, J. R., & Neudecker, H. (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics (Revised ed.). Wiley.
- Matrix calculus - Wikipedia