Proofs Chapter 13: Derivatives of Structured Matrices
In this chapter we prove differentiation formulas for structured matrices (symmetric, Toeplitz, triangular, etc.). In practice, matrices rarely have arbitrary form; they typically possess symmetry, banded structure, or other constraints. Differentiation that accounts for structure is important both for computational efficiency and numerical stability, arising in covariance matrix estimation (symmetric positive definite), efficient estimation exploiting Toeplitz structure in time series, and numerically stable optimization via Cholesky decomposition. We present systematic derivations using the vec operator and Kronecker products.
Prerequisites: Chapter 5 (Derivatives of the Trace), Chapter 4 (Basic Matrix Differentiation Formulas). Related chapters: Chapter 15 (Derivatives of Special Matrices).
13. Derivatives of Structured Matrices
Unless stated otherwise, all formulas in this chapter hold under the following conditions:
- All formulas use the denominator layout convention
- Structured matrices have constraints between entries, reducing the number of independent variables
- $\circ$ denotes the Hadamard (element-wise) product
- Row and column indices are zero-based: an $m \times n$ matrix has entries $A_{ij}$ with $0 \leq i \leq m-1$, $0 \leq j \leq n-1$
We derive differentiation formulas for matrices with special structure (symmetric, Toeplitz, triangular, etc.). In structured matrices, constraints between entries reduce the number of independent variables, so standard matrix differentiation formulas cannot be applied directly.
Fundamentals of Structured Matrices
We first develop the general framework for differentiating structured matrices and then apply it to the cases of general and symmetric matrices.
13.1 General Formula for Differentiation via Structure Matrices
Proof
Consider a scalar function $f(\boldsymbol{A})$. When $\boldsymbol{A}$ is a structured matrix, there are constraints between its entries, so changing an independent variable $A_{ij}$ causes other entries to change as well.
Applying the chain rule (1.26), we consider how $f$ changes through all entries $A_{kl}$ that vary when $A_{ij}$ is perturbed.
\begin{equation}\frac{df}{dA_{ij}} = \sum_{k,l} \frac{\partial f}{\partial A_{kl}} \frac{\partial A_{kl}}{\partial A_{ij}} \label{eq:13-1-1}\end{equation}
We define the structure matrix $\boldsymbol{S}^{ij}$. It represents "how the entire matrix $\boldsymbol{A}$ changes when $A_{ij}$ is perturbed by one unit."
\begin{equation}\boldsymbol{S}^{ij} = \frac{\partial \boldsymbol{A}}{\partial A_{ij}} \label{eq:13-1-2}\end{equation}
The $(k,l)$ entry of $\boldsymbol{S}^{ij}$ is given by
\begin{equation}(\boldsymbol{S}^{ij})_{kl} = \frac{\partial A_{kl}}{\partial A_{ij}} \label{eq:13-1-3}\end{equation}
We rewrite $\eqref{eq:13-1-1}$ in matrix form. Introducing the gradient matrix $\boldsymbol{G} = \displaystyle\frac{\partial f}{\partial \boldsymbol{A}}$, we have $G_{kl} = \displaystyle\frac{\partial f}{\partial A_{kl}}$.
\begin{equation}\frac{df}{dA_{ij}} = \sum_{k,l} G_{kl} (\boldsymbol{S}^{ij})_{kl} \label{eq:13-1-4}\end{equation}
The element-wise sum of products of two matrices can be expressed as a trace. In general, $\sum_{k,l} P_{kl} Q_{kl} = \text{tr}(\boldsymbol{P}^\top \boldsymbol{Q})$.
\begin{equation}\sum_{k,l} G_{kl} (\boldsymbol{S}^{ij})_{kl} = \text{tr}(\boldsymbol{G}^\top \boldsymbol{S}^{ij}) \label{eq:13-1-5}\end{equation}
Substituting $\eqref{eq:13-1-5}$ into $\eqref{eq:13-1-4}$, we obtain the final result.
\begin{equation}\frac{df}{dA_{ij}} = \text{tr}\left[\left(\frac{\partial f}{\partial \boldsymbol{A}}\right)^\top \boldsymbol{S}^{ij}\right] \label{eq:13-1-6}\end{equation}
13.2 Structure Matrix for a General Matrix
Proof
Consider a general matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$. In a general matrix, all entries are independent and there are no constraints between them.
We differentiate the entry $A_{kl}$ of $\boldsymbol{A}$ with respect to the independent variable $A_{ij}$. Since changing $A_{ij}$ does not affect any other entry, we have
\begin{equation}\frac{\partial A_{kl}}{\partial A_{ij}} = \begin{cases} 1 & (k = i \text{ and } l = j) \\ 0 & (\text{otherwise}) \end{cases} \label{eq:13-2-1}\end{equation}
We express $\eqref{eq:13-2-1}$ using the Kronecker delta.
\begin{equation}\frac{\partial A_{kl}}{\partial A_{ij}} = \delta_{ik} \delta_{jl} \label{eq:13-2-2}\end{equation}
where $\delta_{ik}$ equals 1 when $i = k$ and 0 otherwise.
We define the single-entry matrix $\boldsymbol{J}^{ij}$ as the matrix whose only nonzero entry is a 1 in position $(i,j)$.
\begin{equation}(\boldsymbol{J}^{ij})_{kl} = \delta_{ik} \delta_{jl} \label{eq:13-2-3}\end{equation}
Comparing $\eqref{eq:13-2-2}$ and $\eqref{eq:13-2-3}$, they coincide.
\begin{equation}\frac{\partial A_{kl}}{\partial A_{ij}} = (\boldsymbol{J}^{ij})_{kl} \label{eq:13-2-4}\end{equation}
Since $\eqref{eq:13-2-4}$ holds for all $(k,l)$, we obtain the final result in matrix form.
\begin{equation}\boldsymbol{S}^{ij} = \frac{\partial \boldsymbol{A}}{\partial A_{ij}} = \boldsymbol{J}^{ij} \label{eq:13-2-5}\end{equation}
13.3 Structure Matrix for a Symmetric Matrix
Proof
Consider a symmetric matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$. The symmetry condition requires that for all $i, j$:
\begin{equation}A_{ij} = A_{ji} \label{eq:13-3-1}\end{equation}
We analyze how the entry $A_{kl}$ of $\boldsymbol{A}$ changes when the independent variable $A_{ij}$ (with $i \geq j$, matching the lower triangular parameterization) is perturbed, by considering two cases.
[Case 1] When $i \neq j$ (off-diagonal entry):
Perturbing $A_{ij}$ also changes $A_{ji}$ by the constraint $\eqref{eq:13-3-1}$.
\begin{equation}\frac{\partial A_{kl}}{\partial A_{ij}} = \begin{cases} 1 & (k = i, l = j) \\ 1 & (k = j, l = i) \\ 0 & (\text{otherwise}) \end{cases} \label{eq:13-3-2}\end{equation}
[Case 2] When $i = j$ (diagonal entry):
Perturbing $A_{ii}$ trivially satisfies $A_{ii} = A_{ii}$, and no other entry changes.
\begin{equation}\frac{\partial A_{kl}}{\partial A_{ii}} = \begin{cases} 1 & (k = i, l = i) \\ 0 & (\text{otherwise}) \end{cases} \label{eq:13-3-3}\end{equation}
We write $\eqref{eq:13-3-2}$ and $\eqref{eq:13-3-3}$ in a unified form using the Kronecker delta.
\begin{equation}\frac{\partial A_{kl}}{\partial A_{ij}} = \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk} - \delta_{ij}\delta_{ik}\delta_{jl} \label{eq:13-3-4}\end{equation}
We verify each term in $\eqref{eq:13-3-4}$:
- First term $\delta_{ik}\delta_{jl}$: equals 1 when $(k,l) = (i,j)$
- Second term $\delta_{il}\delta_{jk}$: equals 1 when $(k,l) = (j,i)$
- Third term $-\delta_{ij}\delta_{ik}\delta_{jl}$: equals $-1$ when $i = j$ and $(k,l) = (i,i)$ (correcting the double count of diagonal entries)
We express $\eqref{eq:13-3-4}$ in matrix form using the definition $(\boldsymbol{J}^{pq})_{kl} = \delta_{pk}\delta_{ql}$.
\begin{equation}(\boldsymbol{S}^{ij})_{kl} = (\boldsymbol{J}^{ij})_{kl} + (\boldsymbol{J}^{ji})_{kl} - \delta_{ij}(\boldsymbol{J}^{ij})_{kl} \label{eq:13-3-5}\end{equation}
Since $\eqref{eq:13-3-5}$ holds for all $(k,l)$, we obtain the final result.
\begin{equation}\boldsymbol{S}^{ij} = \boldsymbol{J}^{ij} + \boldsymbol{J}^{ji} - \delta_{ij}\boldsymbol{J}^{ij} \label{eq:13-3-6}\end{equation}
13.4 Differentiation with Respect to a Symmetric Matrix
Proof
Consider a scalar function $f(\boldsymbol{A})$. When $\boldsymbol{A}$ is symmetric, the independent variables are only the lower triangular part (including the diagonal).
From 13.1, $\eqref{eq:13-1-6}$, the differentiation formula using the structure matrix is
\begin{equation}\frac{df}{dA_{ij}} = \text{tr}\left[\left(\frac{\partial f}{\partial \boldsymbol{A}}\right)^\top \boldsymbol{S}^{ij}\right] \label{eq:13-4-1}\end{equation}
Let $\boldsymbol{G} = \displaystyle\frac{\partial f}{\partial \boldsymbol{A}}\big|_{\text{gen}}$ denote the gradient computed as if $\boldsymbol{A}$ were a general matrix. Using the trace identity $\text{tr}(\boldsymbol{G}^\top \boldsymbol{J}^{pq}) = G_{pq}$:
\begin{equation}\text{tr}(\boldsymbol{G}^\top \boldsymbol{J}^{pq}) = \sum_{k,l} G_{kl} (\boldsymbol{J}^{pq})_{kl} = \sum_{k,l} G_{kl} \delta_{pk}\delta_{ql} = G_{pq} \label{eq:13-4-2}\end{equation}
From 13.3, $\eqref{eq:13-3-6}$, the structure matrix for a symmetric matrix is $\boldsymbol{S}^{ij} = \boldsymbol{J}^{ij} + \boldsymbol{J}^{ji} - \delta_{ij}\boldsymbol{J}^{ij}$. Substituting into $\eqref{eq:13-4-1}$:
\begin{equation}\frac{df}{dA_{ij}} = \text{tr}\left[\boldsymbol{G}^\top (\boldsymbol{J}^{ij} + \boldsymbol{J}^{ji} - \delta_{ij}\boldsymbol{J}^{ij})\right] \label{eq:13-4-3}\end{equation}
By linearity of the trace, we expand $\eqref{eq:13-4-3}$.
\begin{equation}\frac{df}{dA_{ij}} = \text{tr}(\boldsymbol{G}^\top \boldsymbol{J}^{ij}) + \text{tr}(\boldsymbol{G}^\top \boldsymbol{J}^{ji}) - \delta_{ij}\text{tr}(\boldsymbol{G}^\top \boldsymbol{J}^{ij}) \label{eq:13-4-4}\end{equation}
Using $\eqref{eq:13-4-2}$ to evaluate each term:
\begin{equation}\text{tr}(\boldsymbol{G}^\top \boldsymbol{J}^{ij}) = G_{ij} \label{eq:13-4-5}\end{equation}
\begin{equation}\text{tr}(\boldsymbol{G}^\top \boldsymbol{J}^{ji}) = G_{ji} \label{eq:13-4-6}\end{equation}
Substituting $\eqref{eq:13-4-5}$ and $\eqref{eq:13-4-6}$ into $\eqref{eq:13-4-4}$:
\begin{equation}\frac{df}{dA_{ij}} = G_{ij} + G_{ji} - \delta_{ij} G_{ij} \label{eq:13-4-7}\end{equation}
We write $\eqref{eq:13-4-7}$ in matrix form. The $(i,j)$ entry of the symmetric derivative $\displaystyle\frac{\partial f}{\partial \boldsymbol{A}}\big|_{\text{sym}}$ is given by $\eqref{eq:13-4-7}$.
\begin{equation}\left(\frac{\partial f}{\partial \boldsymbol{A}}\bigg|_{\text{sym}}\right)_{ij} = G_{ij} + G_{ji} - \delta_{ij} G_{ij} \label{eq:13-4-8}\end{equation}
We express each term in $\eqref{eq:13-4-8}$ as a matrix:
- $G_{ij}$ is the $(i,j)$ entry of $\boldsymbol{G}$
- $G_{ji}$ is the $(i,j)$ entry of $\boldsymbol{G}^\top$
- $\delta_{ij} G_{ij}$ is the $(i,j)$ entry of the diagonal matrix $\text{diag}(\boldsymbol{G})$
Therefore, we obtain the final result.
\begin{equation}\frac{\partial f}{\partial \boldsymbol{A}}\bigg|_{\text{sym}} = \boldsymbol{G} + \boldsymbol{G}^\top - \text{diag}(\boldsymbol{G}) \label{eq:13-4-9}\end{equation}
Substituting $\boldsymbol{G} = \displaystyle\frac{\partial f}{\partial \boldsymbol{A}}\big|_{\text{gen}}$:
\begin{equation}\frac{\partial f}{\partial \boldsymbol{A}}\bigg|_{\text{sym}} = \frac{\partial f}{\partial \boldsymbol{A}}\bigg|_{\text{gen}} + \left(\frac{\partial f}{\partial \boldsymbol{A}}\bigg|_{\text{gen}}\right)^\top - \text{diag}\left(\frac{\partial f}{\partial \boldsymbol{A}}\bigg|_{\text{gen}}\right) \label{eq:13-4-10}\end{equation}
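The formula $\eqref{eq:13-4-9}$ can be checked numerically against finite differences. The following NumPy sketch is illustrative only: the test function $f(\boldsymbol{A}) = \boldsymbol{a}^\top \boldsymbol{A} \boldsymbol{b}$ and its general-matrix gradient $\boldsymbol{G} = \boldsymbol{a}\boldsymbol{b}^\top$ are assumed choices, and each independent entry is perturbed through the symmetric structure matrix $\boldsymbol{S}^{ij}$ of 13.3.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T          # symmetric A
a, b = rng.standard_normal(n), rng.standard_normal(n)
f = lambda M: a @ M @ b                               # assumed test function f(A) = a^T A b

G = np.outer(a, b)                                    # gradient treating A as unstructured
G_sym = G + G.T - np.diag(np.diag(G))                 # eq. (13-4-9)

eps = 1e-6
for i in range(n):
    for j in range(i + 1):                            # independent entries: i >= j
        S = np.zeros((n, n)); S[i, j] = 1; S[j, i] = 1   # symmetric structure matrix S^{ij}
        fd = (f(A + eps * S) - f(A - eps * S)) / (2 * eps)
        assert abs(fd - G_sym[i, j]) < 1e-6
```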
13.1 The vec Operator and Related Matrices
We prove properties of the vec operator, which converts a matrix into a column vector, and related matrices.
13.5 Vectorization of a Matrix Product
Proof
Consider the matrix product $\boldsymbol{C} = \boldsymbol{A}\boldsymbol{X}\boldsymbol{B}$ with $\boldsymbol{A} \in \mathbb{R}^{m \times n}$, $\boldsymbol{X} \in \mathbb{R}^{n \times p}$, and $\boldsymbol{B} \in \mathbb{R}^{p \times q}$, so that $\boldsymbol{C} \in \mathbb{R}^{m \times q}$.
We compute the $(i,j)$ entry of $\boldsymbol{C}$. By the definition of matrix multiplication:
\begin{equation}C_{ij} = \sum_{k=0}^{n-1} \sum_{l=0}^{p-1} A_{ik} X_{kl} B_{lj} \label{eq:13-7-1}\end{equation}
We recall the definition of the vec operator. $\text{vec}(\boldsymbol{X})$ is the $np$-dimensional vector formed by stacking the columns of $\boldsymbol{X}$. The entry $X_{kl}$ occupies position $k + nl$ in $\text{vec}(\boldsymbol{X})$.
\begin{equation}[\text{vec}(\boldsymbol{X})]_{k + nl} = X_{kl} \label{eq:13-7-2}\end{equation}
Similarly, $C_{ij}$ occupies position $i + mj$ in $\text{vec}(\boldsymbol{C})$.
\begin{equation}[\text{vec}(\boldsymbol{C})]_{i + mj} = C_{ij} \label{eq:13-7-3}\end{equation}
We recall the definition of the Kronecker product $\boldsymbol{B}^\top \otimes \boldsymbol{A}$. Since $\boldsymbol{B}^\top \in \mathbb{R}^{q \times p}$ and $\boldsymbol{A} \in \mathbb{R}^{m \times n}$, we have $\boldsymbol{B}^\top \otimes \boldsymbol{A} \in \mathbb{R}^{mq \times np}$.
\begin{equation}\boldsymbol{B}^\top \otimes \boldsymbol{A} = \begin{pmatrix} B_{00} \boldsymbol{A} & B_{10} \boldsymbol{A} & \cdots & B_{p-1,0} \boldsymbol{A} \\ B_{01} \boldsymbol{A} & B_{11} \boldsymbol{A} & \cdots & B_{p-1,1} \boldsymbol{A} \\ \vdots & \vdots & \ddots & \vdots \\ B_{0,q-1} \boldsymbol{A} & B_{1,q-1} \boldsymbol{A} & \cdots & B_{p-1,q-1} \boldsymbol{A} \end{pmatrix} \label{eq:13-7-4}\end{equation}
We express the entries of $\boldsymbol{B}^\top \otimes \boldsymbol{A}$ using indices. The entry in row $i + mj$ ($0 \leq i \leq m-1$, $0 \leq j \leq q-1$) and column $k + nl$ ($0 \leq k \leq n-1$, $0 \leq l \leq p-1$) is
\begin{equation}(\boldsymbol{B}^\top \otimes \boldsymbol{A})_{i + mj,\, k + nl} = (\boldsymbol{B}^\top)_{jl} A_{ik} = B_{lj} A_{ik} \label{eq:13-7-5}\end{equation}
We compute the entries of $(\boldsymbol{B}^\top \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X})$. By the definition of matrix-vector multiplication, the entry at position $i + mj$ is
\begin{equation}[(\boldsymbol{B}^\top \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X})]_{i + mj} = \sum_{k=0}^{n-1} \sum_{l=0}^{p-1} (\boldsymbol{B}^\top \otimes \boldsymbol{A})_{i + mj,\, k + nl} [\text{vec}(\boldsymbol{X})]_{k + nl} \label{eq:13-7-6}\end{equation}
Substituting $\eqref{eq:13-7-5}$ and $\eqref{eq:13-7-2}$ into $\eqref{eq:13-7-6}$:
\begin{equation}[(\boldsymbol{B}^\top \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X})]_{i + mj} = \sum_{k=0}^{n-1} \sum_{l=0}^{p-1} B_{lj} A_{ik} X_{kl} \label{eq:13-7-7}\end{equation}
We rearrange the right-hand side of $\eqref{eq:13-7-7}$. Since the factors are scalars, their order can be permuted.
\begin{equation}\sum_{k=0}^{n-1} \sum_{l=0}^{p-1} B_{lj} A_{ik} X_{kl} = \sum_{k=0}^{n-1} \sum_{l=0}^{p-1} A_{ik} X_{kl} B_{lj} \label{eq:13-7-8}\end{equation}
Comparing $\eqref{eq:13-7-1}$ and $\eqref{eq:13-7-8}$, they are identical.
\begin{equation}[(\boldsymbol{B}^\top \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X})]_{i + mj} = C_{ij} = [\text{vec}(\boldsymbol{C})]_{i + mj} \label{eq:13-7-9}\end{equation}
Since $\eqref{eq:13-7-9}$ holds for all $(i,j)$, the vectors are equal. Therefore, we obtain the final result.
\begin{equation}\text{vec}(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}) = (\boldsymbol{B}^\top \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X}) \label{eq:13-7-10}\end{equation}
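The identity $\eqref{eq:13-7-10}$ can be verified directly in NumPy (an illustrative sketch; `flatten(order="F")` implements the column-stacking vec of the definition):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, q = 2, 3, 4, 5
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, q))

vec = lambda M: M.flatten(order="F")   # stack columns, matching eq. (13-7-2)

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
assert np.allclose(lhs, rhs)           # eq. (13-7-10)
```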
13.6 The Commutation Matrix
Proof
Consider a matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$. We examine the relationship between the positions of entries in $\text{vec}(\boldsymbol{A})$ and $\text{vec}(\boldsymbol{A}^\top)$.
The $(i,j)$ entry $A_{ij}$ of $\boldsymbol{A}$ occupies position $i + mj$ in $\text{vec}(\boldsymbol{A})$.
\begin{equation}[\text{vec}(\boldsymbol{A})]_{i + mj} = A_{ij} \label{eq:13-8-1}\end{equation}
The $(j,i)$ entry of $\boldsymbol{A}^\top \in \mathbb{R}^{n \times m}$ is $(\boldsymbol{A}^\top)_{ji} = A_{ij}$. This entry occupies position $j + ni$ in $\text{vec}(\boldsymbol{A}^\top)$.
\begin{equation}[\text{vec}(\boldsymbol{A}^\top)]_{j + ni} = A_{ij} \label{eq:13-8-2}\end{equation}
From $\eqref{eq:13-8-1}$ and $\eqref{eq:13-8-2}$, the entry at position $i + mj$ in $\text{vec}(\boldsymbol{A})$ must be moved to position $j + ni$ in $\text{vec}(\boldsymbol{A}^\top)$.
We define the commutation matrix $\boldsymbol{K}_{mn}$. It is an $mn \times mn$ permutation matrix with the property
\begin{equation}(\boldsymbol{K}_{mn})_{j + ni,\, i + mj} = 1 \label{eq:13-8-3}\end{equation}
All other entries are zero.
We compute the entries of $\boldsymbol{K}_{mn}\,\text{vec}(\boldsymbol{A})$. The entry at position $j + ni$ is
\begin{equation}[\boldsymbol{K}_{mn}\,\text{vec}(\boldsymbol{A})]_{j + ni} = \sum_{k=0}^{mn-1} (\boldsymbol{K}_{mn})_{j + ni,\, k} [\text{vec}(\boldsymbol{A})]_k \label{eq:13-8-4}\end{equation}
Since $\boldsymbol{K}_{mn}$ is a permutation matrix with exactly one 1 per row, by $\eqref{eq:13-8-3}$ the only nonzero entry in row $j + ni$ is in column $i + mj$.
\begin{equation}[\boldsymbol{K}_{mn}\,\text{vec}(\boldsymbol{A})]_{j + ni} = [\text{vec}(\boldsymbol{A})]_{i + mj} \label{eq:13-8-5}\end{equation}
Substituting $\eqref{eq:13-8-1}$ into $\eqref{eq:13-8-5}$:
\begin{equation}[\boldsymbol{K}_{mn}\,\text{vec}(\boldsymbol{A})]_{j + ni} = A_{ij} \label{eq:13-8-6}\end{equation}
Comparing $\eqref{eq:13-8-2}$ and $\eqref{eq:13-8-6}$, they coincide.
\begin{equation}[\boldsymbol{K}_{mn}\,\text{vec}(\boldsymbol{A})]_{j + ni} = [\text{vec}(\boldsymbol{A}^\top)]_{j + ni} \label{eq:13-8-7}\end{equation}
Since $\eqref{eq:13-8-7}$ holds for all $(i,j)$ ($0 \leq i \leq m-1$, $0 \leq j \leq n-1$), we obtain the final result.
\begin{equation}\boldsymbol{K}_{mn}\,\text{vec}(\boldsymbol{A}) = \text{vec}(\boldsymbol{A}^\top) \label{eq:13-8-8}\end{equation}
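The defining property $\eqref{eq:13-8-3}$ gives a direct construction of $\boldsymbol{K}_{mn}$; a minimal NumPy sketch verifying $\eqref{eq:13-8-8}$:

```python
import numpy as np

def commutation(m, n):
    # K_{mn}: permutation matrix with a 1 at (j + n*i, i + m*j), eq. (13-8-3)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + n * i, i + m * j] = 1
    return K

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.standard_normal((m, n))
vec = lambda M: M.flatten(order="F")   # column-stacking vec

K = commutation(m, n)
assert np.allclose(K @ vec(A), vec(A.T))   # eq. (13-8-8)
```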
13.7 Reordering Kronecker Products
Proof
Let $\boldsymbol{A} \in \mathbb{R}^{m \times p}$ and $\boldsymbol{B} \in \mathbb{R}^{n \times q}$. For an arbitrary matrix $\boldsymbol{X} \in \mathbb{R}^{p \times q}$, we show equality by applying both sides to $\text{vec}(\boldsymbol{X})$.
[Left-hand side computation]
In 13.5, $\eqref{eq:13-7-10}$, substituting $\boldsymbol{B} \to \boldsymbol{B}^\top$ gives
\begin{equation}\text{vec}(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}^\top) = (\boldsymbol{B} \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X}) \label{eq:13-9-1}\end{equation}
To compute $(\boldsymbol{A} \otimes \boldsymbol{B})\,\text{vec}(\boldsymbol{X})$, we use another form of $\eqref{eq:13-7-10}$. Swapping indices:
\begin{equation}\text{vec}(\boldsymbol{B}\boldsymbol{Y}\boldsymbol{A}^\top) = (\boldsymbol{A} \otimes \boldsymbol{B})\,\text{vec}(\boldsymbol{Y}) \label{eq:13-9-2}\end{equation}
where $\boldsymbol{Y} \in \mathbb{R}^{q \times p}$.
Setting $\boldsymbol{Y} = \boldsymbol{X}^\top$, we have $\text{vec}(\boldsymbol{Y}) = \text{vec}(\boldsymbol{X}^\top)$. By 13.6, $\eqref{eq:13-8-8}$:
\begin{equation}\text{vec}(\boldsymbol{X}^\top) = \boldsymbol{K}_{pq}\,\text{vec}(\boldsymbol{X}) \label{eq:13-9-3}\end{equation}
Substituting $\eqref{eq:13-9-3}$ into $\eqref{eq:13-9-2}$:
\begin{equation}\text{vec}(\boldsymbol{B}\boldsymbol{X}^\top\boldsymbol{A}^\top) = (\boldsymbol{A} \otimes \boldsymbol{B})\boldsymbol{K}_{pq}\,\text{vec}(\boldsymbol{X}) \label{eq:13-9-4}\end{equation}
We apply the commutation matrix to the left-hand side. Setting $\boldsymbol{C} = \boldsymbol{A}\boldsymbol{X}\boldsymbol{B}^\top \in \mathbb{R}^{m \times n}$, we have $\boldsymbol{C}^\top = \boldsymbol{B}\boldsymbol{X}^\top\boldsymbol{A}^\top \in \mathbb{R}^{n \times m}$.
\begin{equation}\boldsymbol{K}_{mn}\,\text{vec}(\boldsymbol{C}) = \text{vec}(\boldsymbol{C}^\top) = \text{vec}(\boldsymbol{B}\boldsymbol{X}^\top\boldsymbol{A}^\top) \label{eq:13-9-5}\end{equation}
Since $\text{vec}(\boldsymbol{C}) = (\boldsymbol{B} \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X})$ by $\eqref{eq:13-9-1}$, equation $\eqref{eq:13-9-5}$ becomes
\begin{equation}\boldsymbol{K}_{mn}(\boldsymbol{B} \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X}) = \text{vec}(\boldsymbol{B}\boldsymbol{X}^\top\boldsymbol{A}^\top) \label{eq:13-9-6}\end{equation}
[Right-hand side computation]
From $\eqref{eq:13-9-4}$, we have $(\boldsymbol{A} \otimes \boldsymbol{B})\boldsymbol{K}_{pq}\,\text{vec}(\boldsymbol{X}) = \text{vec}(\boldsymbol{B}\boldsymbol{X}^\top\boldsymbol{A}^\top)$.
[Comparison]
Since $\eqref{eq:13-9-6}$ holds for arbitrary matrices of conformable dimensions, applying it with the roles of $\boldsymbol{A}$ and $\boldsymbol{B}$ interchanged (and the dimensions relabeled accordingly) gives
\begin{equation}\boldsymbol{K}_{mn}(\boldsymbol{A} \otimes \boldsymbol{B})\,\text{vec}(\boldsymbol{X}) = \text{vec}(\boldsymbol{A}\boldsymbol{X}^\top\boldsymbol{B}^\top) \label{eq:13-9-7}\end{equation}
On the other hand, interchanging $\boldsymbol{A}$ and $\boldsymbol{B}$ in $\eqref{eq:13-9-2}$:
\begin{equation}(\boldsymbol{B} \otimes \boldsymbol{A})\,\text{vec}(\boldsymbol{X}^\top) = \text{vec}(\boldsymbol{A}\boldsymbol{X}^\top\boldsymbol{B}^\top) \label{eq:13-9-8}\end{equation}
Substituting $\eqref{eq:13-9-3}$ into $\eqref{eq:13-9-8}$:
\begin{equation}(\boldsymbol{B} \otimes \boldsymbol{A})\boldsymbol{K}_{pq}\,\text{vec}(\boldsymbol{X}) = \text{vec}(\boldsymbol{A}\boldsymbol{X}^\top\boldsymbol{B}^\top) \label{eq:13-9-9}\end{equation}
From $\eqref{eq:13-9-7}$ and $\eqref{eq:13-9-9}$, for all $\boldsymbol{X}$:
\begin{equation}\boldsymbol{K}_{mn}(\boldsymbol{A} \otimes \boldsymbol{B})\,\text{vec}(\boldsymbol{X}) = (\boldsymbol{B} \otimes \boldsymbol{A})\boldsymbol{K}_{pq}\,\text{vec}(\boldsymbol{X}) \label{eq:13-9-10}\end{equation}
Since $\text{vec}(\boldsymbol{X})$ is arbitrary, the matrices are equal.
\begin{equation}\boldsymbol{K}_{mn}(\boldsymbol{A} \otimes \boldsymbol{B}) = (\boldsymbol{B} \otimes \boldsymbol{A})\boldsymbol{K}_{pq} \label{eq:13-9-11}\end{equation}
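A numerical check of $\eqref{eq:13-9-11}$ (NumPy sketch). The dimension assignment $\boldsymbol{A} \in \mathbb{R}^{n \times q}$, $\boldsymbol{B} \in \mathbb{R}^{m \times p}$ is an assumption chosen to be consistent with the relabeled final statement:

```python
import numpy as np

def commutation(m, n):
    # K_{mn}: permutation matrix of eq. (13-8-3)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + n * i, i + m * j] = 1
    return K

rng = np.random.default_rng(3)
m, n, p, q = 2, 3, 4, 5
A = rng.standard_normal((n, q))   # assumed shapes: A is n x q,
B = rng.standard_normal((m, p))   # B is m x p

lhs = commutation(m, n) @ np.kron(A, B)
rhs = np.kron(B, A) @ commutation(p, q)
assert np.allclose(lhs, rhs)      # eq. (13-9-11)
```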
13.8 The Duplication Matrix
Proof (constructive)
Consider a symmetric matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$. By symmetry $A_{ij} = A_{ji}$, the independent entries are only those in the lower triangular part (including the diagonal).
We define the $\text{vech}$ operator. $\text{vech}(\boldsymbol{A})$ is the $n(n+1)/2$-dimensional vector formed by stacking the lower triangular entries of $\boldsymbol{A}$ column by column.
\begin{equation}\text{vech}(\boldsymbol{A}) = (A_{00}, A_{10}, \ldots, A_{n-1,0}, A_{11}, A_{21}, \ldots, A_{n-1,1}, \ldots, A_{n-1,n-1})^\top \label{eq:13-10-1}\end{equation}
We determine the position $\nu(i,j)$ of $A_{ij}$ ($i \geq j$) in $\text{vech}(\boldsymbol{A})$. The lower triangular part of column $j$ contains $n - j$ entries: $A_{jj}, A_{j+1,j}, \ldots, A_{n-1,j}$.
\begin{equation}\nu(i,j) = \sum_{k=0}^{j-1}(n - k) + (i - j) = \frac{j(2n - j - 1)}{2} + i \label{eq:13-10-2}\end{equation}
The position of $A_{ij}$ in $\text{vec}(\boldsymbol{A})$ is $i + nj$.
\begin{equation}\mu(i,j) = i + nj \label{eq:13-10-3}\end{equation}
We construct the duplication matrix $\boldsymbol{D}_n \in \mathbb{R}^{n^2 \times n(n+1)/2}$, which places each entry of $\text{vech}(\boldsymbol{A})$ into the appropriate positions of $\text{vec}(\boldsymbol{A})$.
For $A_{ij}$ ($i \geq j$), $\boldsymbol{D}_n$ places a 1 at the following positions:
- Row $\mu(i,j) = i + nj$, column $\nu(i,j)$: placing $A_{ij}$ at position $(i,j)$
- When $i \neq j$, row $\mu(j,i) = j + ni$, column $\nu(i,j)$: duplicating $A_{ij}$ at position $(j,i)$
We express the entries of $\boldsymbol{D}_n$ as formulas. For the $\nu(i,j)$-th entry ($i \geq j$) of $\text{vech}(\boldsymbol{A})$:
\begin{equation}(\boldsymbol{D}_n)_{\mu(i,j),\, \nu(i,j)} = 1 \label{eq:13-10-4}\end{equation}
\begin{equation}(\boldsymbol{D}_n)_{\mu(j,i),\, \nu(i,j)} = 1 \quad (\text{when } i \neq j) \label{eq:13-10-5}\end{equation}
We compute $\boldsymbol{D}_n\,\text{vech}(\boldsymbol{A})$. The entry at position $\mu(i,j) = i + nj$ is as follows.
[Case 1] When $i \geq j$:
\begin{equation}[\boldsymbol{D}_n\,\text{vech}(\boldsymbol{A})]_{i + nj} = [\text{vech}(\boldsymbol{A})]_{\nu(i,j)} = A_{ij} \label{eq:13-10-6}\end{equation}
[Case 2] When $i < j$ (upper triangular part):
By symmetry $A_{ij} = A_{ji}$, and $A_{ji}$ appears at position $\nu(j,i)$ in $\text{vech}(\boldsymbol{A})$.
\begin{equation}[\boldsymbol{D}_n\,\text{vech}(\boldsymbol{A})]_{i + nj} = [\text{vech}(\boldsymbol{A})]_{\nu(j,i)} = A_{ji} = A_{ij} \label{eq:13-10-7}\end{equation}
From $\eqref{eq:13-10-6}$ and $\eqref{eq:13-10-7}$, for all $(i,j)$:
\begin{equation}[\boldsymbol{D}_n\,\text{vech}(\boldsymbol{A})]_{i + nj} = A_{ij} = [\text{vec}(\boldsymbol{A})]_{i + nj} \label{eq:13-10-8}\end{equation}
Therefore, we obtain the final result.
\begin{equation}\boldsymbol{D}_n\,\text{vech}(\boldsymbol{A}) = \text{vec}(\boldsymbol{A}) \label{eq:13-10-9}\end{equation}
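The constructive proof translates directly into code. A NumPy sketch building $\boldsymbol{D}_n$ from the position function $\nu(i,j)$ of $\eqref{eq:13-10-2}$ and checking $\eqref{eq:13-10-9}$:

```python
import numpy as np

def nu(i, j, n):
    # position of A_ij (i >= j) in vech(A), eq. (13-10-2)
    return j * (2 * n - j - 1) // 2 + i

def duplication(n):
    D = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(j, n):
            D[i + n * j, nu(i, j, n)] = 1       # lower triangular position (i, j)
            if i != j:
                D[j + n * i, nu(i, j, n)] = 1   # duplicated upper triangular position (j, i)
    return D

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T
vech = lambda M: np.concatenate([M[j:, j] for j in range(n)])  # stack lower triangular columns
vec = lambda M: M.flatten(order="F")
assert np.allclose(duplication(n) @ vech(A), vec(A))           # eq. (13-10-9)
```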
13.9 The Elimination Matrix
Proof (constructive)
The elimination matrix $\boldsymbol{L}_n$ extracts only the lower triangular part from $\text{vec}(\boldsymbol{A})$.
We use the position functions $\nu(i,j)$ and $\mu(i,j)$ defined in 13.8.
\begin{equation}\nu(i,j) = \frac{j(2n - j - 1)}{2} + i \quad (i \geq j) \label{eq:13-11-1}\end{equation}
\begin{equation}\mu(i,j) = i + nj \label{eq:13-11-2}\end{equation}
We construct $\boldsymbol{L}_n \in \mathbb{R}^{n(n+1)/2 \times n^2}$. The $\nu(i,j)$-th entry ($i \geq j$) of $\text{vech}(\boldsymbol{A})$ is obtained from the $\mu(i,j)$-th entry of $\text{vec}(\boldsymbol{A})$.
\begin{equation}(\boldsymbol{L}_n)_{\nu(i,j),\, \mu(i,j)} = 1 \quad (i \geq j) \label{eq:13-11-3}\end{equation}
All other entries are zero.
We compute $\boldsymbol{L}_n\,\text{vec}(\boldsymbol{A})$. The entry at position $\nu(i,j)$ ($i \geq j$) is
\begin{equation}[\boldsymbol{L}_n\,\text{vec}(\boldsymbol{A})]_{\nu(i,j)} = \sum_{k=0}^{n^2-1} (\boldsymbol{L}_n)_{\nu(i,j),\, k} [\text{vec}(\boldsymbol{A})]_k \label{eq:13-11-4}\end{equation}
$\boldsymbol{L}_n$ has exactly one 1 per row. By $\eqref{eq:13-11-3}$, row $\nu(i,j)$ has a 1 only in column $\mu(i,j)$.
\begin{equation}[\boldsymbol{L}_n\,\text{vec}(\boldsymbol{A})]_{\nu(i,j)} = [\text{vec}(\boldsymbol{A})]_{\mu(i,j)} \label{eq:13-11-5}\end{equation}
By $\eqref{eq:13-11-2}$, $[\text{vec}(\boldsymbol{A})]_{\mu(i,j)} = [\text{vec}(\boldsymbol{A})]_{i + nj} = A_{ij}$.
\begin{equation}[\boldsymbol{L}_n\,\text{vec}(\boldsymbol{A})]_{\nu(i,j)} = A_{ij} \label{eq:13-11-6}\end{equation}
By the definition in $\eqref{eq:13-10-1}$, $[\text{vech}(\boldsymbol{A})]_{\nu(i,j)} = A_{ij}$. Therefore:
\begin{equation}[\boldsymbol{L}_n\,\text{vec}(\boldsymbol{A})]_{\nu(i,j)} = [\text{vech}(\boldsymbol{A})]_{\nu(i,j)} \label{eq:13-11-7}\end{equation}
Since $\eqref{eq:13-11-7}$ holds for all $(i,j)$ ($i \geq j$), we obtain the final result.
\begin{equation}\boldsymbol{L}_n\,\text{vec}(\boldsymbol{A}) = \text{vech}(\boldsymbol{A}) \label{eq:13-11-8}\end{equation}
13.10 Relationship Between Elimination and Duplication Matrices
Proof
Consider an arbitrary symmetric matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$.
From 13.8, $\eqref{eq:13-10-9}$, the property of the duplication matrix is
\begin{equation}\boldsymbol{D}_n\,\text{vech}(\boldsymbol{A}) = \text{vec}(\boldsymbol{A}) \label{eq:13-12-1}\end{equation}
From 13.9, $\eqref{eq:13-11-8}$, the property of the elimination matrix is
\begin{equation}\boldsymbol{L}_n\,\text{vec}(\boldsymbol{A}) = \text{vech}(\boldsymbol{A}) \label{eq:13-12-2}\end{equation}
Left-multiplying both sides of $\eqref{eq:13-12-1}$ by $\boldsymbol{L}_n$:
\begin{equation}\boldsymbol{L}_n \boldsymbol{D}_n\,\text{vech}(\boldsymbol{A}) = \boldsymbol{L}_n\,\text{vec}(\boldsymbol{A}) \label{eq:13-12-3}\end{equation}
Substituting $\eqref{eq:13-12-2}$ into the right-hand side of $\eqref{eq:13-12-3}$:
\begin{equation}\boldsymbol{L}_n \boldsymbol{D}_n\,\text{vech}(\boldsymbol{A}) = \text{vech}(\boldsymbol{A}) \label{eq:13-12-4}\end{equation}
Rearranging $\eqref{eq:13-12-4}$:
\begin{equation}(\boldsymbol{L}_n \boldsymbol{D}_n - \boldsymbol{I}_{n(n+1)/2})\,\text{vech}(\boldsymbol{A}) = \boldsymbol{0} \label{eq:13-12-5}\end{equation}
Since $\text{vech}(\boldsymbol{A})$ corresponds to an arbitrary symmetric matrix, it can be any vector in $\mathbb{R}^{n(n+1)/2}$.
For $\eqref{eq:13-12-5}$ to hold for all $\text{vech}(\boldsymbol{A})$, the coefficient matrix must be the zero matrix.
\begin{equation}\boldsymbol{L}_n \boldsymbol{D}_n - \boldsymbol{I}_{n(n+1)/2} = \boldsymbol{O} \label{eq:13-12-6}\end{equation}
Rearranging $\eqref{eq:13-12-6}$, we obtain the final result.
\begin{equation}\boldsymbol{L}_n \boldsymbol{D}_n = \boldsymbol{I}_{n(n+1)/2} \label{eq:13-12-7}\end{equation}
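Both constructions can be checked together in a few lines. A NumPy sketch building $\boldsymbol{L}_n$ and $\boldsymbol{D}_n$ from the position function $\nu(i,j)$ and verifying $\eqref{eq:13-12-7}$:

```python
import numpy as np

def nu(i, j, n):
    # position of A_ij (i >= j) in vech(A), eq. (13-10-2)
    return j * (2 * n - j - 1) // 2 + i

def duplication(n):
    D = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(j, n):
            D[i + n * j, nu(i, j, n)] = 1
            if i != j:
                D[j + n * i, nu(i, j, n)] = 1   # duplicate into the upper triangle
    return D

def elimination(n):
    # exactly one 1 per row: row nu(i,j) reads position i + n*j of vec(A), eq. (13-11-3)
    L = np.zeros((n * (n + 1) // 2, n * n))
    for j in range(n):
        for i in range(j, n):
            L[nu(i, j, n), i + n * j] = 1
    return L

n = 5
assert np.allclose(elimination(n) @ duplication(n), np.eye(n * (n + 1) // 2))  # eq. (13-12-7)
```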
13.11 Derivative of Vectorization
Proof
Consider a matrix $\boldsymbol{X} \in \mathbb{R}^{m \times n}$. $\text{vec}(\boldsymbol{X})$ is an $mn$-dimensional vector.
Let the entries of $\text{vec}(\boldsymbol{X})$ be $y_k$ ($k = 0, \ldots, mn-1$). By the definition of the vec operator, $y_k$ corresponds to an entry of $\boldsymbol{X}$ as follows:
\begin{equation}y_k = X_{ij} \quad \text{where } k = i + mj \label{eq:13-13-1}\end{equation}
Similarly, let the entries of $\text{vec}(\boldsymbol{X})$ as independent variables be $x_l$ ($l = 0, \ldots, mn-1$).
\begin{equation}x_l = X_{pq} \quad \text{where } l = p + mq \label{eq:13-13-2}\end{equation}
From $\eqref{eq:13-13-1}$ and $\eqref{eq:13-13-2}$, the entries being differentiated and the independent variables are the same quantities listed in the same order; the map $\text{vec}(\boldsymbol{X}) \mapsto \text{vec}(\boldsymbol{X})$ is the identity.
\begin{equation}y_k = x_k \quad (k = 0, \ldots, mn-1) \label{eq:13-13-3}\end{equation}
We differentiate $y_k$ with respect to $x_l$. By $\eqref{eq:13-13-3}$:
\begin{equation}\frac{\partial y_k}{\partial x_l} = \frac{\partial x_k}{\partial x_l} \label{eq:13-13-4}\end{equation}
Since $x_k$ and $x_l$ are independent variables, the partial derivative is the Kronecker delta.
\begin{equation}\frac{\partial x_k}{\partial x_l} = \delta_{kl} = \begin{cases} 1 & (k = l) \\ 0 & (k \neq l) \end{cases} \label{eq:13-13-5}\end{equation}
Substituting $\eqref{eq:13-13-5}$ into $\eqref{eq:13-13-4}$:
\begin{equation}\frac{\partial y_k}{\partial x_l} = \delta_{kl} \label{eq:13-13-6}\end{equation}
The $(k,l)$ entry of the derivative matrix $\displaystyle\frac{\partial\,\text{vec}(\boldsymbol{X})}{\partial\,\text{vec}(\boldsymbol{X})^\top}$ is $\displaystyle\frac{\partial y_k}{\partial x_l}$.
\begin{equation}\left(\frac{\partial\,\text{vec}(\boldsymbol{X})}{\partial\,\text{vec}(\boldsymbol{X})^\top}\right)_{kl} = \frac{\partial y_k}{\partial x_l} = \delta_{kl} \label{eq:13-13-7}\end{equation}
$\delta_{kl}$ is the $(k,l)$ entry of the identity matrix $\boldsymbol{I}_{mn}$.
\begin{equation}(\boldsymbol{I}_{mn})_{kl} = \delta_{kl} \label{eq:13-13-8}\end{equation}
From $\eqref{eq:13-13-7}$ and $\eqref{eq:13-13-8}$, the entries coincide for all $(k,l)$, so the matrices are equal.
\begin{equation}\frac{\partial\,\text{vec}(\boldsymbol{X})}{\partial\,\text{vec}(\boldsymbol{X})^\top} = \boldsymbol{I}_{mn} \label{eq:13-13-9}\end{equation}
13.2 Gradients via Cholesky Decomposition
13.12 Gradient of the Cholesky Decomposition
Proof
Let $\boldsymbol{A} = \boldsymbol{L}\boldsymbol{L}^\top$ be the Cholesky decomposition of a symmetric positive definite matrix, with $\boldsymbol{L}$ lower triangular, and suppose the gradient $\bar{\boldsymbol{L}} = \displaystyle\frac{\partial \mathcal{L}}{\partial \boldsymbol{L}}$ of a scalar loss $\mathcal{L}$ with respect to the factor is known. We seek the gradient $\bar{\boldsymbol{A}} = \displaystyle\frac{\partial \mathcal{L}}{\partial \boldsymbol{A}}$. By the chain rule:
\begin{equation}\frac{\partial \mathcal{L}}{\partial A_{ij}} = \sum_{k,l} \frac{\partial \mathcal{L}}{\partial L_{kl}} \frac{\partial L_{kl}}{\partial A_{ij}} \label{eq:13-14-1}\end{equation}
Rather than evaluating this sum entry by entry, we work with matrix differentials.
We differentiate $\boldsymbol{A} = \boldsymbol{L}\boldsymbol{L}^\top$. Taking differentials of both sides:
\begin{equation}d\boldsymbol{A} = (d\boldsymbol{L})\,\boldsymbol{L}^\top + \boldsymbol{L}\,(d\boldsymbol{L})^\top \label{eq:13-14-2}\end{equation}
Setting $\boldsymbol{S} = \boldsymbol{L}^{-1}\,(d\boldsymbol{A})\,\boldsymbol{L}^{-\top}$ and multiplying $\eqref{eq:13-14-2}$ on the left by $\boldsymbol{L}^{-1}$ and on the right by $\boldsymbol{L}^{-\top}$:
\begin{equation}\boldsymbol{S} = \boldsymbol{L}^{-1}(d\boldsymbol{L}) + (\boldsymbol{L}^{-1}(d\boldsymbol{L}))^\top \label{eq:13-14-3}\end{equation}
$\boldsymbol{\Omega} = \boldsymbol{L}^{-1}(d\boldsymbol{L})$ is lower triangular (since both $\boldsymbol{L}^{-1}$ and $d\boldsymbol{L}$ are lower triangular). From $\eqref{eq:13-14-3}$, $\boldsymbol{S} = \boldsymbol{\Omega} + \boldsymbol{\Omega}^\top$, so
\begin{equation}\boldsymbol{\Omega} = \text{tril}(\boldsymbol{S}) - \tfrac{1}{2}\text{diag}(\boldsymbol{S}) \label{eq:13-14-4}\end{equation}
Here $\text{tril}$ extracts the lower triangular part (including the diagonal) and $\text{diag}$ extracts only the diagonal entries. Since $\boldsymbol{S}$ is symmetric, the diagonal entries of $\text{tril}(\boldsymbol{S})$ are $S_{ii}$, while the diagonal entries of $\boldsymbol{\Omega} + \boldsymbol{\Omega}^\top$ are $2\Omega_{ii}$, giving $\Omega_{ii} = \frac{1}{2}S_{ii}$, which confirms $\eqref{eq:13-14-4}$.
Using $d\boldsymbol{L} = \boldsymbol{L}\boldsymbol{\Omega}$, the differential of the scalar loss $\mathcal{L}$ is
\begin{equation}d\mathcal{L} = \text{tr}\!\bigl(\bar{\boldsymbol{L}}^\top\, d\boldsymbol{L}\bigr) = \text{tr}\!\bigl(\bar{\boldsymbol{L}}^\top \boldsymbol{L}\boldsymbol{\Omega}\bigr) \label{eq:13-14-5}\end{equation}
Let $\boldsymbol{\Phi} = \boldsymbol{L}^\top \bar{\boldsymbol{L}}$, so that $d\mathcal{L} = \text{tr}(\boldsymbol{\Phi}^\top\boldsymbol{\Omega})$. Since $\boldsymbol{\Omega}$ is lower triangular, only the lower triangular part of $\boldsymbol{\Phi}$ contributes: $\text{tr}(\boldsymbol{\Phi}^\top\boldsymbol{\Omega}) = \text{tr}(\text{tril}(\boldsymbol{\Phi})^\top\boldsymbol{\Omega})$. Substituting $\eqref{eq:13-14-4}$:
\begin{equation}d\mathcal{L} = \text{tr}\!\bigl(\text{tril}(\boldsymbol{\Phi})^\top(\text{tril}(\boldsymbol{S}) - \tfrac{1}{2}\text{diag}(\boldsymbol{S}))\bigr) \label{eq:13-14-6}\end{equation}
For the first term, $\text{tr}(\text{tril}(\boldsymbol{\Phi})^\top\,\text{tril}(\boldsymbol{S})) = \text{tr}(\text{tril}(\boldsymbol{\Phi})^\top\boldsymbol{S})$, since $\text{tril}(\boldsymbol{\Phi})$ already selects the lower triangle. Substituting $\boldsymbol{S} = \boldsymbol{L}^{-1}\,(d\boldsymbol{A})\,\boldsymbol{L}^{-\top}$ and using the cyclic property of the trace to reach the form $d\mathcal{L} = \text{tr}\!\bigl(\bar{\boldsymbol{A}}^\top\,d\boldsymbol{A}\bigr)$, this term contributes
\begin{equation}\bar{\boldsymbol{A}} = \boldsymbol{L}^{-\top}\,\text{tril}\!\bigl(\boldsymbol{L}^\top \bar{\boldsymbol{L}}\bigr)\,\boldsymbol{L}^{-1} \label{eq:13-14-7}\end{equation}
The second term, $-\tfrac{1}{2}\text{tr}(\text{tril}(\boldsymbol{\Phi})^\top\,\text{diag}(\boldsymbol{S})) = -\tfrac{1}{2}\text{tr}(\text{diag}(\boldsymbol{\Phi})^\top\boldsymbol{S})$, halves the diagonal contribution, reflecting that the diagonal of $\boldsymbol{\Omega}$ is only half that of $\boldsymbol{S}$. Adding this correction gives the final formula, valid for symmetric $d\boldsymbol{A}$:
\begin{equation}\bar{\boldsymbol{A}} = \boldsymbol{L}^{-\top}\!\left(\text{tril}(\boldsymbol{\Phi}) - \tfrac{1}{2}\text{diag}(\boldsymbol{\Phi})\right)\boldsymbol{L}^{-1} \label{eq:13-14-7b}\end{equation}
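The complete rule can be verified against finite differences. The sketch below (assuming NumPy; `chol_backward`, the weight matrix `W`, and the loss are illustrative names, not from the text) implements $\eqref{eq:13-14-7b}$ for the loss $\mathcal{L} = \text{tr}(\boldsymbol{W}^\top\boldsymbol{L})$, for which $\bar{\boldsymbol{L}} = \boldsymbol{W}$, and compares the result with a central difference along a random symmetric direction:

```python
import numpy as np

def chol_backward(L, Lbar):
    """Reverse-mode rule for A = L L^T: map Lbar = dLoss/dL to Abar = dLoss/dA.
    (In production code, triangular solves would replace the explicit inverse.)"""
    Phi = L.T @ Lbar
    P = np.tril(Phi) - 0.5 * np.diag(np.diag(Phi))
    Linv = np.linalg.inv(L)
    return Linv.T @ P @ Linv

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive definite
W = rng.standard_normal((n, n))  # arbitrary weights

def loss(A):
    # loss = tr(W^T L) = sum of elementwise products, with L = chol(A)
    return np.sum(W * np.linalg.cholesky(A))

L = np.linalg.cholesky(A)
Abar = chol_backward(L, W)       # Lbar = W, since dLoss = tr(W^T dL)

# directional finite-difference check: dLoss = tr(Abar^T V) for symmetric V
V = rng.standard_normal((n, n))
V = V + V.T
eps = 1e-6
fd = (loss(A + eps * V) - loss(A - eps * V)) / (2 * eps)
assert np.isclose(fd, np.sum(Abar * V), rtol=1e-4)
```

The directional check sidesteps the subtlety that only the symmetric part of $\bar{\boldsymbol{A}}$ is determined when $d\boldsymbol{A}$ is constrained to be symmetric.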
References: Murray, I. (2016). Differentiation of the Cholesky decomposition. arXiv:1602.07527.
13.13 Derivative of the Log-Determinant via Cholesky Decomposition
Proof
For a symmetric positive definite matrix $\boldsymbol{A} = \boldsymbol{L}\boldsymbol{L}^\top$, we express the determinant in terms of the Cholesky factor.
\begin{equation}|\boldsymbol{A}| = |\boldsymbol{L}\boldsymbol{L}^\top| = |\boldsymbol{L}|^2 \label{eq:13-15-1}\end{equation}
Since $\boldsymbol{L}$ is lower triangular, $|\boldsymbol{L}| = \prod_{i=0}^{n-1} L_{ii}$. Taking the logarithm of $\eqref{eq:13-15-1}$:
\begin{equation}\log|\boldsymbol{A}| = 2\log|\boldsymbol{L}| = 2\sum_{i=0}^{n-1} \log L_{ii} \label{eq:13-15-2}\end{equation}
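Equation $\eqref{eq:13-15-2}$ is also the numerically preferred way to evaluate $\log|\boldsymbol{A}|$, since it avoids forming the (possibly overflowing) determinant. A minimal NumPy check, with a random positive definite $\boldsymbol{A}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)  # symmetric positive definite

L = np.linalg.cholesky(A)
# log|A| = 2 * sum_i log L_ii, eq (13-15-2)
logdet = 2 * np.sum(np.log(np.diag(L)))

assert np.isclose(logdet, np.log(np.linalg.det(A)))
```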
Differentiating $\eqref{eq:13-15-2}$ with respect to $L_{ii}$:
\begin{equation}\frac{\partial \log|\boldsymbol{A}|}{\partial L_{ii}} = \frac{2}{L_{ii}} \label{eq:13-15-3}\end{equation}
For off-diagonal entries:
\begin{equation}\frac{\partial \log|\boldsymbol{A}|}{\partial L_{ij}} = 0 \quad (i \neq j) \label{eq:13-15-4}\end{equation}
Therefore, the gradient with respect to $\boldsymbol{L}$ is
\begin{equation}\bar{\boldsymbol{L}} = \frac{\partial \log|\boldsymbol{A}|}{\partial \boldsymbol{L}} = 2\,\text{diag}(L_{00}^{-1}, \ldots, L_{n-1,n-1}^{-1}) \label{eq:13-15-5}\end{equation}
Using the result from 13.12, we compute the gradient with respect to $\boldsymbol{A}$. Form $\boldsymbol{\Phi} = \boldsymbol{L}^\top \bar{\boldsymbol{L}}$; since $\bar{\boldsymbol{L}}$ is diagonal:
\begin{equation}\boldsymbol{\Phi} = \boldsymbol{L}^\top \cdot 2\,\text{diag}(L_{ii}^{-1}) = 2\,\boldsymbol{L}^\top\,\text{diag}(L_{ii}^{-1}) \label{eq:13-15-6}\end{equation}
Since $\boldsymbol{L}^\top$ is upper triangular and $\text{diag}(L_{ii}^{-1})$ is diagonal, $\boldsymbol{\Phi}$ is upper triangular. Hence $\text{tril}(\boldsymbol{\Phi})$ retains only the diagonal entries.
\begin{equation}\text{tril}(\boldsymbol{\Phi})_{ii} = \Phi_{ii} = (L^\top)_{ii} \cdot 2L_{ii}^{-1} = L_{ii} \cdot 2L_{ii}^{-1} = 2 \label{eq:13-15-7}\end{equation}
Applying the diagonal correction from $\eqref{eq:13-14-7b}$: $\text{tril}(\boldsymbol{\Phi}) - \frac{1}{2}\text{diag}(\boldsymbol{\Phi})$. Since the diagonal of $\text{tril}(\boldsymbol{\Phi})$ is $2$, we get $2 - \frac{1}{2} \cdot 2 = 1$. That is:
\begin{equation}\text{tril}(\boldsymbol{\Phi}) - \tfrac{1}{2}\text{diag}(\boldsymbol{\Phi}) = \boldsymbol{I}_n \label{eq:13-15-8}\end{equation}
Substituting into the formula $\eqref{eq:13-14-7b}$ from 13.12:
\begin{equation}\frac{\partial \log|\boldsymbol{A}|}{\partial \boldsymbol{A}} = \boldsymbol{L}^{-\top}\,\boldsymbol{I}_n\,\boldsymbol{L}^{-1} = \boldsymbol{L}^{-\top}\boldsymbol{L}^{-1} = (\boldsymbol{L}\boldsymbol{L}^\top)^{-1} = \boldsymbol{A}^{-1} \label{eq:13-15-9}\end{equation}
Since $\boldsymbol{A}$ is symmetric, $\boldsymbol{A}^{-\top} = \boldsymbol{A}^{-1}$, so
\begin{equation}\frac{\partial \log|\boldsymbol{A}|}{\partial \boldsymbol{A}} = \boldsymbol{A}^{-\top} \label{eq:13-15-10}\end{equation}
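The whole chain can be confirmed numerically: pushing $\bar{\boldsymbol{L}}$ from $\eqref{eq:13-15-5}$ through the rule $\eqref{eq:13-14-7b}$ should reproduce $\boldsymbol{A}^{-1}$. A NumPy sketch with a random positive definite $\boldsymbol{A}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # symmetric positive definite

L = np.linalg.cholesky(A)
Lbar = 2 * np.diag(1.0 / np.diag(L))  # eq (13-15-5)

# push Lbar through the rule of 13.12, eq (13-14-7b)
Phi = L.T @ Lbar                       # upper triangular, diagonal entries = 2
P = np.tril(Phi) - 0.5 * np.diag(np.diag(Phi))  # reduces to the identity
Linv = np.linalg.inv(L)
Abar = Linv.T @ P @ Linv

assert np.allclose(P, np.eye(n))               # eq (13-15-8)
assert np.allclose(Abar, np.linalg.inv(A))     # eq (13-15-9)
```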
References
- Petersen, K. B., & Pedersen, M. S. (2012). The Matrix Cookbook. Technical University of Denmark.
- Magnus, J. R., & Neudecker, H. (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics (Revised ed.). Wiley.
- Matrix calculus - Wikipedia