Proofs Chapter 8: Matrix Inverse Derivatives
This chapter proves the derivatives of matrix inverses. Inverse matrix derivatives are required in a wide range of applied mathematics and statistics, including the derivation of the Kalman filter gain matrix, variance computation in generalized least squares (GLS), and updating the posterior precision matrix in Bayesian inference. Starting from the basic approach of differentiating both sides of $\boldsymbol{A}\boldsymbol{A}^{-1} = \boldsymbol{I}$, we derive the scalar and component-wise derivative formulas for the inverse matrix.
Prerequisites: Chapter 4 (Basic Matrix Derivative Formulas), Chapter 7 (Determinant Derivatives). Chapters using results from this chapter: Chapter 9 (Eigenvalue Derivatives), Chapter 15 (Derivatives of Special Matrices).
8. Matrix Inverse Derivatives
Unless otherwise stated, the formulas in this chapter hold under the following conditions:
- All formulas are based on the denominator layout
- Matrix $\boldsymbol{Y}$ is invertible ($\det \boldsymbol{Y} \neq 0$) and derivatives are defined on an open set where invertibility is preserved
- Formulas involving the pseudoinverse require full rank conditions
- Indices are 0-based ($i, j, k = 0, 1, \dots, N-1$), for direct correspondence with C/C++ implementations
8.1.0 Scalar Derivative of the Inverse (Fundamental)
Proof
We begin by recalling the definition of the inverse. An invertible matrix $\boldsymbol{Y}$ and its inverse $\boldsymbol{Y}^{-1}$ satisfy the following relation.
\begin{equation} \boldsymbol{Y} \boldsymbol{Y}^{-1} = \boldsymbol{I} \label{eq:8-1-1} \end{equation}
where $\boldsymbol{I}$ is the $N \times N$ identity matrix.
We differentiate both sides of \eqref{eq:8-1-1} with respect to the scalar $x$. Since $\boldsymbol{I}$ does not depend on $x$, its derivative is the zero matrix $\boldsymbol{O}$.
\begin{equation} \frac{\partial}{\partial x} (\boldsymbol{Y} \boldsymbol{Y}^{-1}) = \frac{\partial \boldsymbol{I}}{\partial x} = \boldsymbol{O} \label{eq:8-1-2} \end{equation}
Applying the product rule (Leibniz rule, 1.25) to the left-hand side gives the following.
\begin{equation} \frac{\partial}{\partial x} (\boldsymbol{Y} \boldsymbol{Y}^{-1}) = \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} + \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} \label{eq:8-1-3} \end{equation}
Combining \eqref{eq:8-1-2} and \eqref{eq:8-1-3}, we obtain the following.
\begin{equation} \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} + \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \boldsymbol{O} \label{eq:8-1-4} \end{equation}
Moving the term $\displaystyle\frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1}$ to the right-hand side gives the following.
\begin{equation} \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = - \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \label{eq:8-1-5} \end{equation}
We left-multiply both sides of \eqref{eq:8-1-5} by $\boldsymbol{Y}^{-1}$, which exists since $\boldsymbol{Y}$ is invertible.
\begin{equation} \boldsymbol{Y}^{-1} \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \boldsymbol{Y}^{-1} \left( - \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \right) \label{eq:8-1-6} \end{equation}
Simplifying the left-hand side. Since $\boldsymbol{Y}^{-1} \boldsymbol{Y} = \boldsymbol{I}$ and multiplying by $\boldsymbol{I}$ leaves a matrix unchanged, we get the following.
\begin{equation} \boldsymbol{Y}^{-1} \boldsymbol{Y} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \boldsymbol{I} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} \label{eq:8-1-7} \end{equation}
Simplifying the right-hand side by factoring out the minus sign gives the following.
\begin{equation} \boldsymbol{Y}^{-1} \left( - \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \right) = - \boldsymbol{Y}^{-1} \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \label{eq:8-1-8} \end{equation}
Combining \eqref{eq:8-1-7} and \eqref{eq:8-1-8}, we obtain the final result.
\begin{equation} \frac{\partial \boldsymbol{Y}^{-1}}{\partial x} = - \boldsymbol{Y}^{-1} \frac{\partial \boldsymbol{Y}}{\partial x} \boldsymbol{Y}^{-1} \label{eq:8-1-9} \end{equation}
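The result \eqref{eq:8-1-9} can be checked numerically. The sketch below (not part of the proof) uses NumPy with an illustrative parametrization $\boldsymbol{Y}(x) = \boldsymbol{Y}_0 + x\boldsymbol{D}$, so that $\partial\boldsymbol{Y}/\partial x = \boldsymbol{D}$, and compares the closed form against a central finite difference.

```python
import numpy as np

# Sanity check of eq. (8-1-9) for the illustrative parametrization
# Y(x) = Y0 + x * D, whose derivative with respect to x is D.
rng = np.random.default_rng(0)
N = 4
Y0 = rng.standard_normal((N, N)) + N * np.eye(N)  # diagonal shift keeps Y well-conditioned
D = rng.standard_normal((N, N))

def Y(x):
    return Y0 + x * D

x, h = 0.3, 1e-6
# Central finite difference of the inverse with respect to x.
numeric = (np.linalg.inv(Y(x + h)) - np.linalg.inv(Y(x - h))) / (2 * h)
# Closed form from eq. (8-1-9): -Y^{-1} (dY/dx) Y^{-1}.
Yinv = np.linalg.inv(Y(x))
analytic = -Yinv @ D @ Yinv

err = np.max(np.abs(numeric - analytic))  # expected to be tiny
```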
8.1.1 Component-wise Derivative of the Inverse
Proof
We apply the formula from 8.1.0, replacing scalar $x$ with matrix entry $X_{ij}$. Setting $\boldsymbol{Y} = \boldsymbol{X}$ gives the following.
\begin{equation} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} = - \boldsymbol{X}^{-1} \frac{\partial \boldsymbol{X}}{\partial X_{ij}} \boldsymbol{X}^{-1} \label{eq:8-2-1} \end{equation}
We compute $\displaystyle\frac{\partial \boldsymbol{X}}{\partial X_{ij}}$. Differentiating entry $(p, q)$ of $\boldsymbol{X}$ with respect to $X_{ij}$ gives 1 only when $(p, q) = (i, j)$ and 0 otherwise.
\begin{equation} \left( \frac{\partial \boldsymbol{X}}{\partial X_{ij}} \right)_{pq} = \frac{\partial X_{pq}}{\partial X_{ij}} = \delta_{pi} \delta_{qj} \label{eq:8-2-2} \end{equation}
Expressing \eqref{eq:8-2-2} in matrix form. Using standard basis vectors $\boldsymbol{e}_i \in \mathbb{R}^N$ (with 1 in position $i$), the matrix with 1 only at position $(i, j)$ is the outer product $\boldsymbol{e}_i \boldsymbol{e}_j^\top$.
\begin{equation} \frac{\partial \boldsymbol{X}}{\partial X_{ij}} = \boldsymbol{e}_i \boldsymbol{e}_j^\top \label{eq:8-2-3} \end{equation}
Substituting \eqref{eq:8-2-3} into \eqref{eq:8-2-1} gives the following.
\begin{equation} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} = - \boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \label{eq:8-2-4} \end{equation}
Computing $\boldsymbol{X}^{-1} \boldsymbol{e}_i$. Since $\boldsymbol{e}_i$ has 1 only in position $i$, $\boldsymbol{X}^{-1} \boldsymbol{e}_i$ is the $i$-th column of $\boldsymbol{X}^{-1}$.
\begin{equation} (\boldsymbol{X}^{-1} \boldsymbol{e}_i)_k = \sum_{m=0}^{N-1} (\boldsymbol{X}^{-1})_{km} (\boldsymbol{e}_i)_m = (\boldsymbol{X}^{-1})_{ki} \label{eq:8-2-5} \end{equation}
Computing $\boldsymbol{e}_j^\top \boldsymbol{X}^{-1}$. Since $\boldsymbol{e}_j^\top$ has 1 only in position $j$, $\boldsymbol{e}_j^\top \boldsymbol{X}^{-1}$ is the $j$-th row of $\boldsymbol{X}^{-1}$.
\begin{equation} (\boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_l = \sum_{m=0}^{N-1} (\boldsymbol{e}_j)_m (\boldsymbol{X}^{-1})_{ml} = (\boldsymbol{X}^{-1})_{jl} \label{eq:8-2-6} \end{equation}
Computing the $(k, l)$ entry of the matrix product $\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1}$ in \eqref{eq:8-2-4}. By associativity, this product is the outer product of the column vector $\boldsymbol{X}^{-1} \boldsymbol{e}_i$ and the row vector $\boldsymbol{e}_j^\top \boldsymbol{X}^{-1}$, so its $(k, l)$ entry factors as follows.
\begin{equation} (\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_{kl} = (\boldsymbol{X}^{-1} \boldsymbol{e}_i)_k (\boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_l \label{eq:8-2-7} \end{equation}
Substituting \eqref{eq:8-2-5} and \eqref{eq:8-2-6} into \eqref{eq:8-2-7} gives the following.
\begin{equation} (\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_{kl} = (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} \label{eq:8-2-8} \end{equation}
Taking the $(k, l)$ entry of \eqref{eq:8-2-4} and substituting \eqref{eq:8-2-8}, we obtain the final result.
\begin{equation} \frac{\partial (\boldsymbol{X}^{-1})_{kl}}{\partial X_{ij}} = - (\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1})_{kl} = - (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} \label{eq:8-2-9} \end{equation}
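As a numerical sanity check (a sketch, with an illustrative random well-conditioned $\boldsymbol{X}$), eq. \eqref{eq:8-2-9} says that perturbing a single entry $X_{ij}$ changes $\boldsymbol{X}^{-1}$ by the negative outer product of the $i$-th column and the $j$-th row of $\boldsymbol{X}^{-1}$:

```python
import numpy as np

# Check eq. (8-2-9): d(X^-1)_{kl} / dX_{ij} = -(X^-1)_{ki} (X^-1)_{jl}.
rng = np.random.default_rng(1)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
Xinv = np.linalg.inv(X)

i, j, h = 1, 2, 1e-6
Xp, Xm = X.copy(), X.copy()
Xp[i, j] += h
Xm[i, j] -= h
# Central finite difference of the full inverse with respect to X_ij.
numeric = (np.linalg.inv(Xp) - np.linalg.inv(Xm)) / (2 * h)

# Negative outer product of the i-th column and the j-th row of X^-1.
analytic = -np.outer(Xinv[:, i], Xinv[j, :])
err = np.max(np.abs(numeric - analytic))
```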
8.1.2 Quadratic Form with Inverse $\boldsymbol{a}^\top \boldsymbol{X}^{-1} \boldsymbol{b}$
Proof
Write the scalar $f = \boldsymbol{a}^\top \boldsymbol{X}^{-1} \boldsymbol{b}$ in component form. Since $\boldsymbol{a}^\top$ is a row vector, $\boldsymbol{X}^{-1}$ is a matrix, and $\boldsymbol{b}$ is a column vector, the result is a scalar.
\begin{equation} f = \boldsymbol{a}^\top \boldsymbol{X}^{-1} \boldsymbol{b} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k (\boldsymbol{X}^{-1})_{kl} b_l \label{eq:8-3-1} \end{equation}
Differentiating $f$ with respect to $X_{ij}$. Since $a_k$ and $b_l$ are constants, the differentiation acts only on $(\boldsymbol{X}^{-1})_{kl}$.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k \frac{\partial (\boldsymbol{X}^{-1})_{kl}}{\partial X_{ij}} b_l \label{eq:8-3-2} \end{equation}
From 8.1.1, the component-wise derivative of the inverse is as follows.
\begin{equation} \frac{\partial (\boldsymbol{X}^{-1})_{kl}}{\partial X_{ij}} = - (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} \label{eq:8-3-3} \end{equation}
Substituting \eqref{eq:8-3-3} into \eqref{eq:8-3-2} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k \left( - (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} \right) b_l \label{eq:8-3-4} \end{equation}
Factoring out the minus sign gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} a_k (\boldsymbol{X}^{-1})_{ki} (\boldsymbol{X}^{-1})_{jl} b_l \label{eq:8-3-5} \end{equation}
Separating the sums. The sums over $k$ and $l$ are independent.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - \left( \sum_{k=0}^{N-1} a_k (\boldsymbol{X}^{-1})_{ki} \right) \left( \sum_{l=0}^{N-1} (\boldsymbol{X}^{-1})_{jl} b_l \right) \label{eq:8-3-6} \end{equation}
Computing the first parenthesized expression. $\sum_k a_k (\boldsymbol{X}^{-1})_{ki}$ is the $i$-th component of $\boldsymbol{a}^\top \boldsymbol{X}^{-1}$.
\begin{equation} \sum_{k=0}^{N-1} a_k (\boldsymbol{X}^{-1})_{ki} = (\boldsymbol{a}^\top \boldsymbol{X}^{-1})_i \label{eq:8-3-7} \end{equation}
Rewriting $\boldsymbol{a}^\top \boldsymbol{X}^{-1} = (\boldsymbol{X}^{-\top} \boldsymbol{a})^\top$, \eqref{eq:8-3-7} becomes the following.
\begin{equation} (\boldsymbol{a}^\top \boldsymbol{X}^{-1})_i = ((\boldsymbol{X}^{-1})^\top \boldsymbol{a})_i = (\boldsymbol{X}^{-\top} \boldsymbol{a})_i \label{eq:8-3-8} \end{equation}
Computing the second parenthesized expression. $\sum_l (\boldsymbol{X}^{-1})_{jl} b_l$ is the $j$-th component of $\boldsymbol{X}^{-1} \boldsymbol{b}$.
\begin{equation} \sum_{l=0}^{N-1} (\boldsymbol{X}^{-1})_{jl} b_l = (\boldsymbol{X}^{-1} \boldsymbol{b})_j \label{eq:8-3-9} \end{equation}
Substituting \eqref{eq:8-3-8} and \eqref{eq:8-3-9} into \eqref{eq:8-3-6} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - (\boldsymbol{X}^{-\top} \boldsymbol{a})_i (\boldsymbol{X}^{-1} \boldsymbol{b})_j \label{eq:8-3-10} \end{equation}
In the denominator layout, the $(i, j)$ entry of $\displaystyle\frac{\partial f}{\partial \boldsymbol{X}}$ is $\displaystyle\frac{\partial f}{\partial X_{ij}}$.
\begin{equation} \left( \frac{\partial f}{\partial \boldsymbol{X}} \right)_{ij} = \frac{\partial f}{\partial X_{ij}} \label{eq:8-3-11} \end{equation}
The right-hand side of \eqref{eq:8-3-10} can be written as an outer product. For $\boldsymbol{u}, \boldsymbol{v} \in \mathbb{R}^N$, $(\boldsymbol{u} \boldsymbol{v}^\top)_{ij} = u_i v_j$, so we get the following.
\begin{equation} - (\boldsymbol{X}^{-\top} \boldsymbol{a})_i (\boldsymbol{X}^{-1} \boldsymbol{b})_j = - (\boldsymbol{X}^{-\top} \boldsymbol{a} (\boldsymbol{X}^{-1} \boldsymbol{b})^\top)_{ij} \label{eq:8-3-12} \end{equation}
Substituting $(\boldsymbol{X}^{-1} \boldsymbol{b})^\top = \boldsymbol{b}^\top (\boldsymbol{X}^{-1})^\top = \boldsymbol{b}^\top \boldsymbol{X}^{-\top}$ gives the following.
\begin{equation} - (\boldsymbol{X}^{-\top} \boldsymbol{a} (\boldsymbol{X}^{-1} \boldsymbol{b})^\top)_{ij} = - (\boldsymbol{X}^{-\top} \boldsymbol{a} \boldsymbol{b}^\top \boldsymbol{X}^{-\top})_{ij} \label{eq:8-3-13} \end{equation}
Expressing in matrix form, we obtain the final result.
\begin{equation} \frac{\partial}{\partial \boldsymbol{X}} \boldsymbol{a}^\top \boldsymbol{X}^{-1} \boldsymbol{b} = - \boldsymbol{X}^{-\top} \boldsymbol{a} \boldsymbol{b}^\top \boldsymbol{X}^{-\top} \label{eq:8-3-14} \end{equation}
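The gradient \eqref{eq:8-3-14} can be verified entry-by-entry against finite differences. The following sketch (random illustrative data, not part of the proof) builds the numerical gradient in the denominator layout, i.e. with the same shape as $\boldsymbol{X}$:

```python
import numpy as np

# Check eq. (8-3-14): d(a^T X^-1 b)/dX = -X^{-T} a b^T X^{-T} (denominator layout).
rng = np.random.default_rng(2)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
a, b = rng.standard_normal(N), rng.standard_normal(N)

f = lambda M: a @ np.linalg.inv(M) @ b

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        numeric[i, j] = (f(Xp) - f(Xm)) / (2 * h)

XinvT = np.linalg.inv(X).T
analytic = -XinvT @ np.outer(a, b) @ XinvT
err = np.max(np.abs(numeric - analytic))
```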
8.1.3 Determinant of the Inverse $|\boldsymbol{X}^{-1}|$
Proof
We establish the relationship between the determinant of the inverse and the original determinant. By the multiplicative property $|\boldsymbol{A}\boldsymbol{B}| = |\boldsymbol{A}||\boldsymbol{B}|$, we have the following.
\begin{equation} |\boldsymbol{X}||\boldsymbol{X}^{-1}| = |\boldsymbol{X}\boldsymbol{X}^{-1}| = |\boldsymbol{I}| = 1 \label{eq:8-4-1} \end{equation}
Solving \eqref{eq:8-4-1} for $|\boldsymbol{X}^{-1}|$ gives the following.
\begin{equation} |\boldsymbol{X}^{-1}| = \frac{1}{|\boldsymbol{X}|} = |\boldsymbol{X}|^{-1} \label{eq:8-4-2} \end{equation}
Differentiating $|\boldsymbol{X}^{-1}| = |\boldsymbol{X}|^{-1}$ with respect to $\boldsymbol{X}$. This is a composite function, so we apply the chain rule with outer function $f(u) = u^{-1}$ and inner function $g(\boldsymbol{X}) = |\boldsymbol{X}|$.
The derivative of the outer function $f(u) = u^{-1}$ is the following.
\begin{equation} f'(u) = -u^{-2} = -\frac{1}{u^2} \label{eq:8-4-3} \end{equation}
Substituting $u = |\boldsymbol{X}|$ gives the following.
\begin{equation} f'(|\boldsymbol{X}|) = -|\boldsymbol{X}|^{-2} \label{eq:8-4-4} \end{equation}
The derivative of the inner function is given by 7.1 as follows.
\begin{equation} \frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}} = |\boldsymbol{X}| \boldsymbol{X}^{-\top} \label{eq:8-4-5} \end{equation}
Applying the chain rule. Multiplying \eqref{eq:8-4-4} and \eqref{eq:8-4-5} gives the following.
\begin{equation} \frac{\partial |\boldsymbol{X}|^{-1}}{\partial \boldsymbol{X}} = f'(|\boldsymbol{X}|) \cdot \frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}} = -|\boldsymbol{X}|^{-2} \cdot |\boldsymbol{X}| \boldsymbol{X}^{-\top} \label{eq:8-4-6} \end{equation}
Using $|\boldsymbol{X}|^{-2} \cdot |\boldsymbol{X}| = |\boldsymbol{X}|^{-1}$ to simplify gives the following.
\begin{equation} \frac{\partial |\boldsymbol{X}|^{-1}}{\partial \boldsymbol{X}} = -|\boldsymbol{X}|^{-1} \boldsymbol{X}^{-\top} \label{eq:8-4-7} \end{equation}
Substituting $|\boldsymbol{X}|^{-1} = |\boldsymbol{X}^{-1}|$ from \eqref{eq:8-4-2} gives the following.
\begin{equation} \frac{\partial |\boldsymbol{X}^{-1}|}{\partial \boldsymbol{X}} = -|\boldsymbol{X}^{-1}| \boldsymbol{X}^{-\top} \label{eq:8-4-8} \end{equation}
Rewriting $\boldsymbol{X}^{-\top} = (\boldsymbol{X}^{-1})^\top$, we obtain the final result.
\begin{equation} \frac{\partial |\boldsymbol{X}^{-1}|}{\partial \boldsymbol{X}} = -|\boldsymbol{X}^{-1}| (\boldsymbol{X}^{-1})^\top \label{eq:8-4-9} \end{equation}
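A quick numerical check of \eqref{eq:8-4-9} (a sketch with illustrative random data; the scalar function here is $f(\boldsymbol{X}) = |\boldsymbol{X}^{-1}| = 1/|\boldsymbol{X}|$):

```python
import numpy as np

# Check eq. (8-4-9): d|X^-1|/dX = -|X^-1| (X^-1)^T.
rng = np.random.default_rng(3)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)

f = lambda M: np.linalg.det(np.linalg.inv(M))  # = 1 / det(M)

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        numeric[i, j] = (f(Xp) - f(Xm)) / (2 * h)

Xinv = np.linalg.inv(X)
analytic = -np.linalg.det(Xinv) * Xinv.T
err = np.max(np.abs(numeric - analytic))
```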
8.1.4 Trace with Inverse $\text{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B})$
Proof
We differentiate the scalar $f = \text{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B})$ with respect to $X_{ij}$. Since the trace is linear and $\boldsymbol{A}$, $\boldsymbol{B}$ are constant, the differentiation passes inside the trace.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \frac{\partial}{\partial X_{ij}} \text{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B}) = \text{tr}\left( \boldsymbol{A} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} \boldsymbol{B} \right) \label{eq:8-5-1} \end{equation}
From the proof of 8.1.1, the component-wise derivative of the inverse is as follows.
\begin{equation} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} = -\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \label{eq:8-5-2} \end{equation}
Substituting \eqref{eq:8-5-2} into \eqref{eq:8-5-1} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \text{tr}\left( \boldsymbol{A} \left( -\boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \right) \boldsymbol{B} \right) \label{eq:8-5-3} \end{equation}
Factoring out the minus sign. Since the trace is linear with respect to scalar multiplication, we get the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - \text{tr}\left( \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \right) \label{eq:8-5-4} \end{equation}
Using the cyclic property $\text{tr}(\boldsymbol{P}\boldsymbol{Q}\boldsymbol{R}) = \text{tr}(\boldsymbol{R}\boldsymbol{P}\boldsymbol{Q})$ to rearrange the order. Moving $\boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B}$ to the front gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - \text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i \right) \label{eq:8-5-5} \end{equation}
$\boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i$ is a scalar ($1 \times 1$ matrix). The trace of a scalar equals the value itself.
\begin{equation} \text{tr}\left( \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i \right) = \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i \label{eq:8-5-6} \end{equation}
$\boldsymbol{e}_j^\top \boldsymbol{M} \boldsymbol{e}_i$ extracts the $(j, i)$ entry of matrix $\boldsymbol{M}$. Setting $\boldsymbol{M} = \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1}$ gives the following.
\begin{equation} \boldsymbol{e}_j^\top \boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1} \boldsymbol{e}_i = (\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})_{ji} \label{eq:8-5-7} \end{equation}
Combining \eqref{eq:8-5-5}, \eqref{eq:8-5-6}, and \eqref{eq:8-5-7} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = - (\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})_{ji} \label{eq:8-5-8} \end{equation}
In the denominator layout, the $(i, j)$ entry of $\displaystyle\frac{\partial f}{\partial \boldsymbol{X}}$ is $\displaystyle\frac{\partial f}{\partial X_{ij}}$, so we get the following.
\begin{equation} \left( \frac{\partial f}{\partial \boldsymbol{X}} \right)_{ij} = - (\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})_{ji} \label{eq:8-5-9} \end{equation}
Having the $(i, j)$ entry equal to $M_{ji}$ means the matrix is the transpose of the original.
\begin{equation} \left( \frac{\partial f}{\partial \boldsymbol{X}} \right)_{ij} = - ((\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})^\top)_{ij} \label{eq:8-5-10} \end{equation}
Expressing in matrix form, we obtain the final result.
\begin{equation} \frac{\partial}{\partial \boldsymbol{X}} \text{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B}) = - (\boldsymbol{X}^{-1} \boldsymbol{B} \boldsymbol{A} \boldsymbol{X}^{-1})^\top \label{eq:8-5-11} \end{equation}
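The trace gradient \eqref{eq:8-5-11} can likewise be checked against finite differences. A sketch with illustrative random $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{X}$:

```python
import numpy as np

# Check eq. (8-5-11): d tr(A X^-1 B)/dX = -(X^-1 B A X^-1)^T.
rng = np.random.default_rng(4)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
A, B = rng.standard_normal((N, N)), rng.standard_normal((N, N))

f = lambda M: np.trace(A @ np.linalg.inv(M) @ B)

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        numeric[i, j] = (f(Xp) - f(Xm)) / (2 * h)

Xinv = np.linalg.inv(X)
analytic = -(Xinv @ B @ A @ Xinv).T
err = np.max(np.abs(numeric - analytic))
```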
8.1.5 Trace of Sum Inverse $\text{tr}((\boldsymbol{X}+\boldsymbol{A})^{-1})$
Proof
Let $\boldsymbol{Y} = \boldsymbol{X} + \boldsymbol{A}$. We differentiate $f = \text{tr}(\boldsymbol{Y}^{-1})$ with respect to $\boldsymbol{X}$.
Differentiating $\boldsymbol{Y}$ with respect to $X_{ij}$. Since $\boldsymbol{A}$ is constant, $\displaystyle\frac{\partial \boldsymbol{A}}{\partial X_{ij}} = \boldsymbol{O}$, giving the following.
\begin{equation} \frac{\partial \boldsymbol{Y}}{\partial X_{ij}} = \frac{\partial (\boldsymbol{X} + \boldsymbol{A})}{\partial X_{ij}} = \frac{\partial \boldsymbol{X}}{\partial X_{ij}} + \boldsymbol{O} = \boldsymbol{e}_i \boldsymbol{e}_j^\top \label{eq:8-6-1} \end{equation}
In the formula from 8.1.4, we substitute identity matrices for both coefficient matrices (written $\boldsymbol{A}$ and $\boldsymbol{B}$ there; not the constant $\boldsymbol{A}$ of this section) and replace the variable matrix $\boldsymbol{X} \to \boldsymbol{Y} = \boldsymbol{X} + \boldsymbol{A}$, giving the following.
\begin{equation} \text{tr}(\boldsymbol{I} \cdot \boldsymbol{Y}^{-1} \cdot \boldsymbol{I}) = \text{tr}(\boldsymbol{Y}^{-1}) \label{eq:8-6-2} \end{equation}
From 8.1.4, the derivative of $\text{tr}(\boldsymbol{Y}^{-1})$ with respect to $\boldsymbol{Y}$ is the following.
\begin{equation} \frac{\partial}{\partial \boldsymbol{Y}} \text{tr}(\boldsymbol{Y}^{-1}) = - (\boldsymbol{Y}^{-1} \cdot \boldsymbol{I} \cdot \boldsymbol{I} \cdot \boldsymbol{Y}^{-1})^\top = - (\boldsymbol{Y}^{-2})^\top \label{eq:8-6-3} \end{equation}
Applying the chain rule. The derivative of $f = \text{tr}(\boldsymbol{Y}^{-1})$ with respect to $\boldsymbol{X}$ is the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \sum_{k,l} \frac{\partial f}{\partial Y_{kl}} \frac{\partial Y_{kl}}{\partial X_{ij}} \label{eq:8-6-4} \end{equation}
From \eqref{eq:8-6-1}, $\displaystyle\frac{\partial Y_{kl}}{\partial X_{ij}} = \delta_{ki} \delta_{lj}$. Substituting into \eqref{eq:8-6-4} gives the following.
\begin{equation} \frac{\partial f}{\partial X_{ij}} = \sum_{k,l} \frac{\partial f}{\partial Y_{kl}} \delta_{ki} \delta_{lj} = \frac{\partial f}{\partial Y_{ij}} \label{eq:8-6-5} \end{equation}
\eqref{eq:8-6-5} shows that when $\boldsymbol{Y} = \boldsymbol{X} + \boldsymbol{A}$, $\displaystyle\frac{\partial f}{\partial \boldsymbol{X}} = \displaystyle\frac{\partial f}{\partial \boldsymbol{Y}}$ (because $\boldsymbol{A}$ is constant).
Combining \eqref{eq:8-6-3} and \eqref{eq:8-6-5}, we obtain the final result.
\begin{equation} \frac{\partial}{\partial \boldsymbol{X}} \text{tr}((\boldsymbol{X}+\boldsymbol{A})^{-1}) = - ((\boldsymbol{X}+\boldsymbol{A})^{-2})^\top \label{eq:8-6-6} \end{equation}
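A numerical sanity check of \eqref{eq:8-6-6} (a sketch; the random $\boldsymbol{X}$ and $\boldsymbol{A}$ are illustrative, with $\boldsymbol{A}$ kept small so $\boldsymbol{X}+\boldsymbol{A}$ stays invertible):

```python
import numpy as np

# Check eq. (8-6-6): d tr((X+A)^-1)/dX = -((X+A)^-2)^T.
rng = np.random.default_rng(5)
N = 4
X = rng.standard_normal((N, N)) + N * np.eye(N)
A = 0.5 * rng.standard_normal((N, N))  # small constant offset

f = lambda M: np.trace(np.linalg.inv(M + A))

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        numeric[i, j] = (f(Xp) - f(Xm)) / (2 * h)

Yinv = np.linalg.inv(X + A)
analytic = -(Yinv @ Yinv).T
err = np.max(np.abs(numeric - analytic))
```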
8.1.6 Chain Rule for the Inverse
Proof
Let $J$ be a scalar-valued function of $\boldsymbol{W} = \boldsymbol{A}^{-1}$, so that $J$ depends on $\boldsymbol{A}$ only through $\boldsymbol{W}$. Writing the chain rule in component form gives the following.
\begin{equation} \frac{\partial J}{\partial A_{ij}} = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \frac{\partial J}{\partial W_{kl}} \frac{\partial W_{kl}}{\partial A_{ij}} \label{eq:8-7-1} \end{equation}
From 8.1.1, the component-wise derivative of the inverse is as follows. Since $\boldsymbol{W} = \boldsymbol{A}^{-1}$, we match the indices accordingly.
\begin{equation} \frac{\partial W_{kl}}{\partial A_{ij}} = \frac{\partial (\boldsymbol{A}^{-1})_{kl}}{\partial A_{ij}} = - (\boldsymbol{A}^{-1})_{ki} (\boldsymbol{A}^{-1})_{jl} = -W_{ki} W_{jl} \label{eq:8-7-2} \end{equation}
Substituting \eqref{eq:8-7-2} into \eqref{eq:8-7-1} gives the following.
\begin{equation} \frac{\partial J}{\partial A_{ij}} = \sum_{k,l} \frac{\partial J}{\partial W_{kl}} (-W_{ki} W_{jl}) \label{eq:8-7-3} \end{equation}
Factoring out the minus sign and separating the sums gives the following.
\begin{equation} \frac{\partial J}{\partial A_{ij}} = - \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} W_{ki} \frac{\partial J}{\partial W_{kl}} W_{jl} \label{eq:8-7-4} \end{equation}
Interpreting this sum as a matrix product. Considering the sum over $k$, $\sum_k W_{ki} \displaystyle\frac{\partial J}{\partial W_{kl}}$, we get the following.
\begin{equation} \sum_{k=0}^{N-1} W_{ki} \frac{\partial J}{\partial W_{kl}} = \sum_{k=0}^{N-1} (W^\top)_{ik} \frac{\partial J}{\partial W_{kl}} = \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \right)_{il} \label{eq:8-7-5} \end{equation}
Next, considering the sum over $l$ gives the following.
\begin{equation} \sum_{l=0}^{N-1} \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \right)_{il} W_{jl} = \sum_{l=0}^{N-1} \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \right)_{il} (W^\top)_{lj} = \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \boldsymbol{W}^\top \right)_{ij} \label{eq:8-7-6} \end{equation}
Combining \eqref{eq:8-7-4}, \eqref{eq:8-7-5}, and \eqref{eq:8-7-6} gives the following.
\begin{equation} \frac{\partial J}{\partial A_{ij}} = - \left( \boldsymbol{W}^\top \frac{\partial J}{\partial \boldsymbol{W}} \boldsymbol{W}^\top \right)_{ij} \label{eq:8-7-7} \end{equation}
Substituting $\boldsymbol{W}^\top = \boldsymbol{A}^{-\top}$ from $\boldsymbol{W} = \boldsymbol{A}^{-1}$, we obtain the final result in matrix form.
\begin{equation} \frac{\partial J}{\partial \boldsymbol{A}} = - \boldsymbol{A}^{-\top} \frac{\partial J}{\partial \boldsymbol{W}} \boldsymbol{A}^{-\top} \label{eq:8-7-8} \end{equation}
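A sketch of a numerical check of \eqref{eq:8-7-8}, under the illustrative assumption that the loss is $J = \text{tr}(\boldsymbol{C}\boldsymbol{W})$ with constant $\boldsymbol{C}$, whose gradient in the denominator layout is $\partial J/\partial\boldsymbol{W} = \boldsymbol{C}^\top$:

```python
import numpy as np

# Check eq. (8-7-8) with the illustrative loss J(W) = tr(C W), W = A^-1.
# In the denominator layout, dJ/dW = C^T.
rng = np.random.default_rng(6)
N = 4
A = rng.standard_normal((N, N)) + N * np.eye(N)
C = rng.standard_normal((N, N))

J = lambda M: np.trace(C @ np.linalg.inv(M))  # J as a function of A

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += h
        Am[i, j] -= h
        numeric[i, j] = (J(Ap) - J(Am)) / (2 * h)

AinvT = np.linalg.inv(A).T
grad_W = C.T                        # dJ/dW for J = tr(C W)
analytic = -AinvT @ grad_W @ AinvT  # chain rule through the inverse
err = np.max(np.abs(numeric - analytic))
```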
8.1.7 Leontief Inverse Derivative
Proof
Define the Leontief inverse as $\boldsymbol{L} = (\boldsymbol{I} - \boldsymbol{A})^{-1}$. We differentiate $\boldsymbol{L}$ with respect to $A_{ij}$.
Let $\boldsymbol{X} = \boldsymbol{I} - \boldsymbol{A}$. Then $\boldsymbol{L} = \boldsymbol{X}^{-1}$.
Applying the chain rule. The derivative of $\boldsymbol{L}$ with respect to $A_{ij}$ can be computed through $\boldsymbol{X}$.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial A_{ij}} = \sum_{k,l} \frac{\partial \boldsymbol{L}}{\partial X_{kl}} \frac{\partial X_{kl}}{\partial A_{ij}} \label{eq:8-8-1} \end{equation}
Since $X_{kl} = (\boldsymbol{I} - \boldsymbol{A})_{kl} = \delta_{kl} - A_{kl}$, differentiating with respect to $A_{ij}$ gives the following.
\begin{equation} \frac{\partial X_{kl}}{\partial A_{ij}} = \frac{\partial (\delta_{kl} - A_{kl})}{\partial A_{ij}} = 0 - \frac{\partial A_{kl}}{\partial A_{ij}} \label{eq:8-8-2} \end{equation}
Since $\displaystyle\frac{\partial A_{kl}}{\partial A_{ij}} = \delta_{ki} \delta_{lj}$ (equals 1 only when $(k,l) = (i,j)$), we get the following.
\begin{equation} \frac{\partial X_{kl}}{\partial A_{ij}} = -\delta_{ki} \delta_{lj} \label{eq:8-8-3} \end{equation}
Substituting \eqref{eq:8-8-3} into \eqref{eq:8-8-1} gives the following.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial A_{ij}} = \sum_{k,l} \frac{\partial \boldsymbol{L}}{\partial X_{kl}} (-\delta_{ki} \delta_{lj}) \label{eq:8-8-4} \end{equation}
Since $\delta_{ki} \delta_{lj} = 1$ only when $(k,l) = (i,j)$, only one term survives from the sum.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial A_{ij}} = -\frac{\partial \boldsymbol{L}}{\partial X_{ij}} \label{eq:8-8-5} \end{equation}
Computing $\displaystyle\frac{\partial \boldsymbol{L}}{\partial X_{ij}}$. Since $\boldsymbol{L} = \boldsymbol{X}^{-1}$, from the proof of 8.1.1 we have the following.
\begin{equation} \frac{\partial \boldsymbol{X}^{-1}}{\partial X_{ij}} = -\boldsymbol{X}^{-1} \boldsymbol{E}_{ij} \boldsymbol{X}^{-1} \label{eq:8-8-6} \end{equation}
where $\boldsymbol{E}_{ij} = \boldsymbol{e}_i \boldsymbol{e}_j^\top$ is the matrix with 1 only at position $(i,j)$.
Substituting $\boldsymbol{X}^{-1} = \boldsymbol{L}$ gives the following.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial X_{ij}} = -\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L} \label{eq:8-8-7} \end{equation}
Substituting \eqref{eq:8-8-7} into \eqref{eq:8-8-5} gives the following.
\begin{equation} \frac{\partial \boldsymbol{L}}{\partial A_{ij}} = -(-\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L}) \label{eq:8-8-8} \end{equation}
The minus signs cancel, giving the final result.
\begin{equation} \frac{\partial}{\partial A_{ij}} (\boldsymbol{I} - \boldsymbol{A})^{-1} = \boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L} \label{eq:8-8-9} \end{equation}
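Since $\boldsymbol{L}\boldsymbol{E}_{ij}\boldsymbol{L}$ is the outer product of the $i$-th column and the $j$-th row of $\boldsymbol{L}$, the result \eqref{eq:8-8-9} is easy to check numerically. A sketch (the small random $\boldsymbol{A}$ is illustrative and keeps $\boldsymbol{I}-\boldsymbol{A}$ invertible):

```python
import numpy as np

# Check eq. (8-8-9): dL/dA_ij = L E_ij L for L = (I - A)^-1.
rng = np.random.default_rng(7)
N = 4
A = 0.1 * rng.random((N, N))       # small entries keep I - A invertible
L = np.linalg.inv(np.eye(N) - A)

i, j, h = 0, 3, 1e-6
Ap, Am = A.copy(), A.copy()
Ap[i, j] += h
Am[i, j] -= h
numeric = (np.linalg.inv(np.eye(N) - Ap) - np.linalg.inv(np.eye(N) - Am)) / (2 * h)

# L E_ij L = outer product of the i-th column and the j-th row of L.
analytic = np.outer(L[:, i], L[j, :])
err = np.max(np.abs(numeric - analytic))
```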
8.1.8 Trace of Leontief Inverse Derivative
Proof
Let $\boldsymbol{L} = (\boldsymbol{I} - \boldsymbol{A})^{-1}$. We differentiate $f = \text{tr}(\boldsymbol{L})$ with respect to $A_{ij}$.
Since the trace is the sum of diagonal entries, it can be written as follows.
\begin{equation} f = \text{tr}(\boldsymbol{L}) = \sum_{k=0}^{N-1} L_{kk} \label{eq:8-9-1} \end{equation}
Differentiating $f$ with respect to $A_{ij}$ gives the following.
\begin{equation} \frac{\partial f}{\partial A_{ij}} = \frac{\partial}{\partial A_{ij}} \sum_{k=0}^{N-1} L_{kk} = \sum_{k=0}^{N-1} \frac{\partial L_{kk}}{\partial A_{ij}} \label{eq:8-9-2} \end{equation}
From 8.1.7, $\displaystyle\frac{\partial \boldsymbol{L}}{\partial A_{ij}} = \boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L}$. We compute the $(k, k)$ entry of this.
Writing $(\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L})_{kk}$ in component form gives the following.
\begin{equation} (\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L})_{kk} = \sum_{p=0}^{N-1} \sum_{q=0}^{N-1} L_{kp} (\boldsymbol{E}_{ij})_{pq} L_{qk} \label{eq:8-9-3} \end{equation}
Since $(\boldsymbol{E}_{ij})_{pq} = \delta_{pi} \delta_{qj}$ (equals 1 only when $(p,q) = (i,j)$), we get the following.
\begin{equation} (\boldsymbol{L} \boldsymbol{E}_{ij} \boldsymbol{L})_{kk} = \sum_{p,q} L_{kp} \delta_{pi} \delta_{qj} L_{qk} = L_{ki} L_{jk} \label{eq:8-9-4} \end{equation}
Substituting \eqref{eq:8-9-4} into \eqref{eq:8-9-2} gives the following.
\begin{equation} \frac{\partial f}{\partial A_{ij}} = \sum_{k=0}^{N-1} L_{ki} L_{jk} \label{eq:8-9-5} \end{equation}
Interpreting this sum as a matrix product. $\sum_k L_{ki} L_{jk}$ is the inner product of the $i$-th column and the $j$-th row of $\boldsymbol{L}$.
\begin{equation} \sum_{k=0}^{N-1} L_{ki} L_{jk} = \sum_{k=0}^{N-1} (\boldsymbol{L}^\top)_{ik} (\boldsymbol{L}^\top)_{kj} = (\boldsymbol{L}^\top \boldsymbol{L}^\top)_{ij} = ((\boldsymbol{L}^2)^\top)_{ij} \label{eq:8-9-6} \end{equation}
Alternatively, $\sum_k L_{ki} L_{jk} = \sum_k L_{jk} L_{ki} = (\boldsymbol{L} \boldsymbol{L})_{ji} = (\boldsymbol{L}^2)_{ji}$.
In the denominator layout, $\left( \displaystyle\frac{\partial f}{\partial \boldsymbol{A}} \right)_{ij} = \displaystyle\frac{\partial f}{\partial A_{ij}}$, so having the $(i, j)$ entry equal $(\boldsymbol{L}^2)_{ji}$ implies transposition.
\begin{equation} \left( \frac{\partial f}{\partial \boldsymbol{A}} \right)_{ij} = (\boldsymbol{L}^2)_{ji} = ((\boldsymbol{L}^2)^\top)_{ij} \label{eq:8-9-7} \end{equation}
Expressing in matrix form, we obtain the final result.
\begin{equation} \frac{\partial}{\partial \boldsymbol{A}} \text{tr}((\boldsymbol{I} - \boldsymbol{A})^{-1}) = (\boldsymbol{L}^2)^\top = ((\boldsymbol{I} - \boldsymbol{A})^{-2})^\top \label{eq:8-9-8} \end{equation}
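A numerical sanity check of \eqref{eq:8-9-8} (a sketch with an illustrative small random $\boldsymbol{A}$):

```python
import numpy as np

# Check eq. (8-9-8): d tr((I - A)^-1)/dA = ((I - A)^-2)^T.
rng = np.random.default_rng(8)
N = 4
A = 0.1 * rng.random((N, N))       # small entries keep I - A invertible

f = lambda M: np.trace(np.linalg.inv(np.eye(N) - M))

h = 1e-6
numeric = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += h
        Am[i, j] -= h
        numeric[i, j] = (f(Ap) - f(Am)) / (2 * h)

L = np.linalg.inv(np.eye(N) - A)
analytic = (L @ L).T
err = np.max(np.abs(numeric - analytic))
```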
8.2 Moore-Penrose Pseudoinverse Derivatives
Derivative formulas for the pseudoinverse (generalized inverse). Used in least squares and inverse kinematics in robotics.
8.2.1 Moore-Penrose Pseudoinverse Derivative (General Form)
Proof
Differentiating the Moore-Penrose condition $\boldsymbol{X}\boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{X}$ gives the following.
\begin{equation} (d\boldsymbol{X})\boldsymbol{X}^+\boldsymbol{X} + \boldsymbol{X}(d\boldsymbol{X}^+)\boldsymbol{X} + \boldsymbol{X}\boldsymbol{X}^+(d\boldsymbol{X}) = d\boldsymbol{X} \label{eq:8-10-1} \end{equation}
Similarly, differentiating $\boldsymbol{X}^+\boldsymbol{X}\boldsymbol{X}^+ = \boldsymbol{X}^+$ gives the following.
\begin{equation} (d\boldsymbol{X}^+)\boldsymbol{X}\boldsymbol{X}^+ + \boldsymbol{X}^+(d\boldsymbol{X})\boldsymbol{X}^+ + \boldsymbol{X}^+\boldsymbol{X}(d\boldsymbol{X}^+) = d\boldsymbol{X}^+ \label{eq:8-10-2} \end{equation}
Combining these conditions with the symmetry conditions $(\boldsymbol{X}\boldsymbol{X}^+)^\top = \boldsymbol{X}\boldsymbol{X}^+$ and $(\boldsymbol{X}^+\boldsymbol{X})^\top = \boldsymbol{X}^+\boldsymbol{X}$ (Hermitian transposes in the complex case), we solve for $d\boldsymbol{X}^+$ to obtain the Golub-Pereyra (1973) formula.
\begin{equation} d\boldsymbol{X}^+ = -\boldsymbol{X}^+ (d\boldsymbol{X}) \boldsymbol{X}^+ + \boldsymbol{X}^+ \boldsymbol{X}^{+\top} (d\boldsymbol{X})^\top (\boldsymbol{I} - \boldsymbol{X}\boldsymbol{X}^+) + (\boldsymbol{I} - \boldsymbol{X}^+\boldsymbol{X}) (d\boldsymbol{X})^\top \boldsymbol{X}^{+\top} \boldsymbol{X}^+ \label{eq:8-10-3} \end{equation}
8.2.2 Pseudoinverse Derivative (Full Column Rank)
Proof
For full column rank, $\boldsymbol{X}^+ = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top$ (left inverse).
Since $\boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{I}_n$, the null space projection $\boldsymbol{I} - \boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{O}$.
The third term in 8.2.1 vanishes, giving the following.
\begin{equation} d\boldsymbol{X}^+ = -\boldsymbol{X}^+ (d\boldsymbol{X}) \boldsymbol{X}^+ + \boldsymbol{X}^+ \boldsymbol{X}^{+\top} (d\boldsymbol{X})^\top (\boldsymbol{I} - \boldsymbol{X}\boldsymbol{X}^+) \label{eq:8-11-1} \end{equation}
Simplifying $\boldsymbol{X}^+ \boldsymbol{X}^{+\top} = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top \cdot \boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1} = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}$ yields the formula.
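As a numerical sanity check (a sketch; the random tall matrix is illustrative and has full column rank with probability 1), the full-column-rank differential $d\boldsymbol{X}^+ = -\boldsymbol{X}^+ d\boldsymbol{X}\, \boldsymbol{X}^+ + (\boldsymbol{X}^\top\boldsymbol{X})^{-1} (d\boldsymbol{X})^\top (\boldsymbol{I} - \boldsymbol{X}\boldsymbol{X}^+)$ can be compared against a finite difference of `np.linalg.pinv` along a direction $\boldsymbol{D}$:

```python
import numpy as np

# Directional derivative of pinv(X) for a tall (full column rank) X.
rng = np.random.default_rng(9)
m, n = 6, 3
X = rng.standard_normal((m, n))
D = rng.standard_normal((m, n))   # perturbation direction dX

h = 1e-6
numeric = (np.linalg.pinv(X + h * D) - np.linalg.pinv(X - h * D)) / (2 * h)

Xp = np.linalg.pinv(X)
analytic = (-Xp @ D @ Xp
            + np.linalg.inv(X.T @ X) @ D.T @ (np.eye(m) - X @ Xp))
err = np.max(np.abs(numeric - analytic))
```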
8.2.3 Pseudoinverse Derivative (Full Row Rank)
Proof
For full row rank, $\boldsymbol{X}^+ = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}$ (right inverse).
Since $\boldsymbol{X}\boldsymbol{X}^+ = \boldsymbol{I}_m$, the null space projection $\boldsymbol{I} - \boldsymbol{X}\boldsymbol{X}^+ = \boldsymbol{O}$.
The second term in 8.2.1 vanishes, giving the following.
\begin{equation} d\boldsymbol{X}^+ = -\boldsymbol{X}^+ (d\boldsymbol{X}) \boldsymbol{X}^+ + (\boldsymbol{I} - \boldsymbol{X}^+\boldsymbol{X})(d\boldsymbol{X})^\top \boldsymbol{X}^{+\top}\boldsymbol{X}^+ \label{eq:8-12-1} \end{equation}
Substituting $\boldsymbol{X}^{+\top}\boldsymbol{X}^+ = (\boldsymbol{X}\boldsymbol{X}^\top)^{-1}$ yields the formula.
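The full-row-rank case can be checked the same way (a sketch; the random wide matrix is illustrative), comparing $d\boldsymbol{X}^+ = -\boldsymbol{X}^+ d\boldsymbol{X}\, \boldsymbol{X}^+ + (\boldsymbol{I} - \boldsymbol{X}^+\boldsymbol{X}) (d\boldsymbol{X})^\top (\boldsymbol{X}\boldsymbol{X}^\top)^{-1}$ against a finite difference of `np.linalg.pinv`:

```python
import numpy as np

# Directional derivative of pinv(X) for a wide (full row rank) X.
rng = np.random.default_rng(10)
m, n = 3, 6
X = rng.standard_normal((m, n))
D = rng.standard_normal((m, n))   # perturbation direction dX

h = 1e-6
numeric = (np.linalg.pinv(X + h * D) - np.linalg.pinv(X - h * D)) / (2 * h)

Xp = np.linalg.pinv(X)
analytic = (-Xp @ D @ Xp
            + (np.eye(n) - Xp @ X) @ D.T @ np.linalg.inv(X @ X.T))
err = np.max(np.abs(numeric - analytic))
```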
8.2.4 Time Derivative of the Pseudoinverse
Proof
Obtained by replacing $d\boldsymbol{X}$ with $\dot{\boldsymbol{X}} dt$ in 8.2.1 and dividing by $dt$.
In robotics, this is the time derivative of $\boldsymbol{J}^+$, used in acceleration-level inverse kinematics.
\begin{equation} \ddot{\boldsymbol{q}} = \boldsymbol{J}^+\ddot{\boldsymbol{x}} + \dot{\boldsymbol{J}}^+\dot{\boldsymbol{x}} + (\boldsymbol{I} - \boldsymbol{J}^+\boldsymbol{J})\ddot{\boldsymbol{q}}_0 \label{eq:8-13-1} \end{equation}
8.2.5 Derivation of the Right Inverse
Derivation
When $\boldsymbol{X}$ has full row rank, $\boldsymbol{X}\boldsymbol{X}^\top$ is an $m \times m$ positive definite matrix and hence invertible.
Setting $\boldsymbol{X}^+ = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}$, we verify the four Moore-Penrose conditions.
(1) $\boldsymbol{X}\boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{X}\boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}\boldsymbol{X} = \boldsymbol{X}$ ✓
(2) $\boldsymbol{X}^+\boldsymbol{X}\boldsymbol{X}^+ = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}\boldsymbol{X}\boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1} = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1} = \boldsymbol{X}^+$ ✓
(3) $(\boldsymbol{X}\boldsymbol{X}^+)^\top = (\boldsymbol{X}\boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1})^\top = \boldsymbol{I}^\top = \boldsymbol{I} = \boldsymbol{X}\boldsymbol{X}^+$ ✓
(4) $(\boldsymbol{X}^+\boldsymbol{X})^\top = (\boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}\boldsymbol{X})^\top = \boldsymbol{X}^\top((\boldsymbol{X}\boldsymbol{X}^\top)^{-1})^\top\boldsymbol{X} = \boldsymbol{X}^\top(\boldsymbol{X}\boldsymbol{X}^\top)^{-1}\boldsymbol{X} = \boldsymbol{X}^+\boldsymbol{X}$ ✓
All four conditions are satisfied, confirming this is the Moore-Penrose pseudoinverse. $\square$
8.2.6 Derivation of the Left Inverse
Derivation
When $\boldsymbol{X}$ has full column rank, $\boldsymbol{X}^\top\boldsymbol{X}$ is an $n \times n$ positive definite matrix and hence invertible.
Setting $\boldsymbol{X}^+ = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top$, we verify the four Moore-Penrose conditions.
(1) $\boldsymbol{X}\boldsymbol{X}^+\boldsymbol{X} = \boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{X} = \boldsymbol{X}$ ✓
(2) $\boldsymbol{X}^+\boldsymbol{X}\boldsymbol{X}^+ = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top = (\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top = \boldsymbol{X}^+$ ✓
(3) $(\boldsymbol{X}\boldsymbol{X}^+)^\top = (\boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top)^\top = \boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top = \boldsymbol{X}\boldsymbol{X}^+$ ✓
(4) $(\boldsymbol{X}^+\boldsymbol{X})^\top = ((\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{X})^\top = \boldsymbol{I}^\top = \boldsymbol{I} = \boldsymbol{X}^+\boldsymbol{X}$ ✓
All four conditions are satisfied, confirming this is the Moore-Penrose pseudoinverse. $\square$
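Both closed forms (the right inverse of 8.2.5 and the left inverse of 8.2.6) can be compared against NumPy's SVD-based pseudoinverse; the random matrices below are illustrative and full-rank with probability 1:

```python
import numpy as np

# The closed forms of 8.2.5 and 8.2.6 should agree with np.linalg.pinv.
rng = np.random.default_rng(11)

W = rng.standard_normal((3, 6))            # wide, full row rank: right inverse
right = W.T @ np.linalg.inv(W @ W.T)       # X^+ = X^T (X X^T)^{-1}

T = rng.standard_normal((6, 3))            # tall, full column rank: left inverse
left = np.linalg.inv(T.T @ T) @ T.T        # X^+ = (X^T X)^{-1} X^T

err_right = np.max(np.abs(right - np.linalg.pinv(W)))
err_left = np.max(np.abs(left - np.linalg.pinv(T)))
```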
References
- Petersen, K. B., & Pedersen, M. S. (2012). The Matrix Cookbook. Technical University of Denmark.
- Golub, G. H., & Pereyra, V. (1973). The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate. SIAM Journal on Numerical Analysis, 10(2), 413-432.
- Magnus, J. R., & Neudecker, H. (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics (Revised ed.). Wiley.
- Matrix calculus - Wikipedia