Proofs Chapter 3: Vector-by-Vector Derivatives
Jacobian Matrices from Component-wise Differentiation
This chapter proves formulas for differentiating vector-valued functions with respect to vectors, yielding Jacobian matrices. The Jacobian matrix is fundamental in multivariate analysis and its applications: gradient propagation between the layers of a neural network, probability density transformation via the change-of-variables formula, and kinematics Jacobians in robotics. We derive the Jacobian matrices of the identity, linear, and affine transforms, among other results, by component-wise computation.
Prerequisites: Chapter 2 (Scalar by Vector Derivatives). Chapters using these results: Chapter 4 (Basic Matrix Derivative Formulas), Chapter 14 (Matrix Chain Rule).
Unless stated otherwise, the formulas in this chapter hold under the following conditions:
- All formulas use the denominator layout convention
- Differentiating a vector $\boldsymbol{y} \in \mathbb{R}^M$ with respect to a vector $\boldsymbol{x} \in \mathbb{R}^N$ yields the Jacobian matrix $\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \in \mathbb{R}^{N \times M}$
- Functions are differentiable on an open set
3.1 Identity Transform
Proof
In the denominator layout, differentiating a vector $\boldsymbol{y} \in \mathbb{R}^M$ with respect to a vector $\boldsymbol{x} \in \mathbb{R}^N$ yields a Jacobian matrix (an $N \times M$ matrix). Its $(i, j)$ entry is $\displaystyle\frac{\partial y_j}{\partial x_i}$.
Write $\boldsymbol{x}$ in components.
\begin{equation} \boldsymbol{x} = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix} \label{eq:3-1-1} \end{equation}
Compute the $(i, j)$ entry of the Jacobian matrix. For the identity transform, $y_j = x_j$.
\begin{equation} \frac{\partial y_j}{\partial x_i} = \frac{\partial x_j}{\partial x_i} \label{eq:3-1-2} \end{equation}
We consider two cases depending on whether $x_j$ is the same variable as $x_i$ ($i = j$) or independent ($i \neq j$).
\begin{equation} \frac{\partial x_j}{\partial x_i} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \label{eq:3-1-3} \end{equation}
When $i = j$, $\displaystyle\frac{\partial x_i}{\partial x_i} = 1$. When $i \neq j$, $x_j$ does not depend on $x_i$, so the partial derivative is $0$.
The result of \eqref{eq:3-1-3} is precisely the definition of the Kronecker delta $\delta_{ij}$.
\begin{equation} \frac{\partial x_j}{\partial x_i} = \delta_{ij} \label{eq:3-1-4} \end{equation}
Write out the Jacobian matrix explicitly. Each entry is defined by a partial derivative.
\begin{equation} \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}} = \begin{pmatrix} \displaystyle\frac{\partial x_0}{\partial x_0} & \displaystyle\frac{\partial x_1}{\partial x_0} & \cdots & \displaystyle\frac{\partial x_{N-1}}{\partial x_0} \\[0.5em] \displaystyle\frac{\partial x_0}{\partial x_1} & \displaystyle\frac{\partial x_1}{\partial x_1} & \cdots & \displaystyle\frac{\partial x_{N-1}}{\partial x_1} \\[0.5em] \vdots & \vdots & \ddots & \vdots \\[0.5em] \displaystyle\frac{\partial x_0}{\partial x_{N-1}} & \displaystyle\frac{\partial x_1}{\partial x_{N-1}} & \cdots & \displaystyle\frac{\partial x_{N-1}}{\partial x_{N-1}} \end{pmatrix} \label{eq:3-1-5} \end{equation}
Substituting the result of \eqref{eq:3-1-4} into \eqref{eq:3-1-5}, only the diagonal entries are 1 and all off-diagonal entries are 0.
\begin{equation} \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \label{eq:3-1-6} \end{equation}
This matrix is the $N \times N$ identity matrix $\boldsymbol{I}$. Therefore, the final result is:
\begin{equation} \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}} = \boldsymbol{I} \label{eq:3-1-7} \end{equation}
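Formula \eqref{eq:3-1-7} can be checked numerically. The sketch below assumes NumPy; the central-difference helper `num_jacobian` is our own illustration (not part of the chapter) and builds the denominator-layout Jacobian entry by entry, exactly as in the proof.

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

x = np.array([0.5, -1.2, 3.0])
J = num_jacobian(lambda v: v, x)   # identity transform y = x
print(np.allclose(J, np.eye(3)))   # True: the 3 x 3 identity matrix
```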
3.2 Linear Transform
Proof
Let $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}$, where $\boldsymbol{A} \in \mathbb{R}^{M \times N}$ is a constant matrix and $\boldsymbol{y} \in \mathbb{R}^M$.
Write the $j$-th component of $\boldsymbol{y}$ using the definition of matrix-vector multiplication.
\begin{equation} y_j = (\boldsymbol{A}\boldsymbol{x})_j \label{eq:3-2-1} \end{equation}
The $j$-th component of the matrix-vector product is the inner product of the $j$-th row of $\boldsymbol{A}$ and $\boldsymbol{x}$.
\begin{equation} y_j = \sum_{k=0}^{N-1} A_{jk} x_k \label{eq:3-2-2} \end{equation}
Compute the $(i, j)$ entry of the Jacobian matrix by differentiating $y_j$ with respect to $x_i$.
\begin{equation} \frac{\partial y_j}{\partial x_i} = \frac{\partial}{\partial x_i} \sum_{k=0}^{N-1} A_{jk} x_k \label{eq:3-2-3} \end{equation}
Since differentiation is a linear operator, we can interchange it with the summation. Since $A_{jk}$ are constants, they factor out of the derivative.
\begin{equation} \frac{\partial y_j}{\partial x_i} = \sum_{k=0}^{N-1} A_{jk} \frac{\partial x_k}{\partial x_i} \label{eq:3-2-4} \end{equation}
By Formula (3.1), $\displaystyle\frac{\partial x_k}{\partial x_i} = \delta_{ki}$.
\begin{equation} \frac{\partial y_j}{\partial x_i} = \sum_{k=0}^{N-1} A_{jk} \delta_{ki} \label{eq:3-2-5} \end{equation}
Using the sifting property of the Kronecker delta: $\delta_{ki} = 1$ only when $k = i$, so only the $k = i$ term survives in the sum.
\begin{equation} \frac{\partial y_j}{\partial x_i} = A_{ji} \label{eq:3-2-6} \end{equation}
Write out the Jacobian matrix explicitly. The $(i, j)$ entry is $A_{ji}$.
\begin{equation} \frac{\partial (\boldsymbol{A}\boldsymbol{x})}{\partial \boldsymbol{x}} = \begin{pmatrix} A_{00} & A_{10} & \cdots & A_{(M-1)0} \\ A_{01} & A_{11} & \cdots & A_{(M-1)1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{0(N-1)} & A_{1(N-1)} & \cdots & A_{(M-1)(N-1)} \end{pmatrix} \label{eq:3-2-7} \end{equation}
By the definition of the transpose, $(\boldsymbol{A}^\top)_{ij} = A_{ji}$, so the entry in \eqref{eq:3-2-6} is exactly $(\boldsymbol{A}^\top)_{ij}$, and the matrix in \eqref{eq:3-2-7} is the transpose $\boldsymbol{A}^\top$ of $\boldsymbol{A}$.
\begin{equation} \frac{\partial (\boldsymbol{A}\boldsymbol{x})}{\partial \boldsymbol{x}} = \boldsymbol{A}^\top \label{eq:3-2-8} \end{equation}
This is an $N \times M$ matrix.
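A numerical sanity check of \eqref{eq:3-2-8}, assuming NumPy (the helper `num_jacobian` is our own finite-difference sketch, not part of the chapter):

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))        # M = 4, N = 3
x = rng.standard_normal(3)
J = num_jacobian(lambda v: A @ v, x)
print(J.shape)                         # (3, 4): an N x M matrix
print(np.allclose(J, A.T))             # True
```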
3.3 Constant Vector
Proof
Write the constant vector $\boldsymbol{a}$ in components. Each component $a_j$ is a constant.
\begin{equation} \boldsymbol{a} = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{M-1} \end{pmatrix} \label{eq:3-3-1} \end{equation}
Compute the $(i, j)$ entry of the Jacobian matrix by differentiating $a_j$ with respect to $x_i$.
\begin{equation} \left(\frac{\partial \boldsymbol{a}}{\partial \boldsymbol{x}}\right)_{ij} = \frac{\partial a_j}{\partial x_i} \label{eq:3-3-2} \end{equation}
Since $a_j$ is a constant and does not depend on $x_i$, the partial derivative of a constant is $0$.
\begin{equation} \frac{\partial a_j}{\partial x_i} = 0 \label{eq:3-3-3} \end{equation}
Equation \eqref{eq:3-3-3} holds for all pairs $(i, j)$. Therefore, all entries of the Jacobian matrix are 0.
\begin{equation} \frac{\partial \boldsymbol{a}}{\partial \boldsymbol{x}} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \label{eq:3-3-4} \end{equation}
This matrix is the $N \times M$ zero matrix $\boldsymbol{O}$.
\begin{equation} \frac{\partial \boldsymbol{a}}{\partial \boldsymbol{x}} = \boldsymbol{O} \label{eq:3-3-5} \end{equation}
3.4 Affine Transform
Proof
Let $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}$, where $\boldsymbol{A} \in \mathbb{R}^{M \times N}$ and $\boldsymbol{b} \in \mathbb{R}^M$ are constant, so $\boldsymbol{y} \in \mathbb{R}^M$.
Write the $j$-th component of $\boldsymbol{y}$.
\begin{equation} y_j = (\boldsymbol{A}\boldsymbol{x})_j + b_j \label{eq:3-4-1} \end{equation}
Expand using the definition of matrix-vector multiplication.
\begin{equation} y_j = \sum_{k=0}^{N-1} A_{jk} x_k + b_j \label{eq:3-4-2} \end{equation}
Compute the $(i, j)$ entry of the Jacobian matrix by differentiating $y_j$ with respect to $x_i$.
\begin{equation} \frac{\partial y_j}{\partial x_i} = \frac{\partial}{\partial x_i} \left( \sum_{k=0}^{N-1} A_{jk} x_k + b_j \right) \label{eq:3-4-3} \end{equation}
The derivative of a sum is the sum of the derivatives.
\begin{equation} \frac{\partial y_j}{\partial x_i} = \frac{\partial}{\partial x_i} \left( \sum_{k=0}^{N-1} A_{jk} x_k \right) + \frac{\partial b_j}{\partial x_i} \label{eq:3-4-4} \end{equation}
Since $b_j$ is a constant, its partial derivative is $0$.
\begin{equation} \frac{\partial b_j}{\partial x_i} = 0 \label{eq:3-4-5} \end{equation}
The first term gives $A_{ji}$ by the same calculation as Formula (3.2).
\begin{equation} \frac{\partial}{\partial x_i} \left( \sum_{k=0}^{N-1} A_{jk} x_k \right) = A_{ji} \label{eq:3-4-6} \end{equation}
Substituting \eqref{eq:3-4-5} and \eqref{eq:3-4-6} into \eqref{eq:3-4-4}:
\begin{equation} \frac{\partial y_j}{\partial x_i} = A_{ji} + 0 = A_{ji} \label{eq:3-4-7} \end{equation}
As in Formula (3.2), since the $(i, j)$ entry is $A_{ji}$, the Jacobian matrix equals $\boldsymbol{A}^\top$.
\begin{equation} \frac{\partial (\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b})}{\partial \boldsymbol{x}} = \boldsymbol{A}^\top \label{eq:3-4-8} \end{equation}
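Numerically, the offset $\boldsymbol{b}$ indeed drops out: the Jacobian of the affine map matches that of the linear map. A sketch assuming NumPy (`num_jacobian` is our own finite-difference helper, not part of the chapter):

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

# b contributes the zero matrix of Formula (3.3); the Jacobian is still A^T.
J = num_jacobian(lambda v: A @ v + b, x)
print(np.allclose(J, A.T))   # True
```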
3.5 Linear Transform with Transpose
Proof
Let $\boldsymbol{y}^\top = \boldsymbol{x}^\top \boldsymbol{A}$. Here $\boldsymbol{y}^\top$ is a $1 \times M$ row vector.
Write the $j$-th component of $\boldsymbol{y}^\top$. By the definition of row vector times matrix multiplication:
\begin{equation} (\boldsymbol{x}^\top \boldsymbol{A})_j = \sum_{k=0}^{N-1} x_k A_{kj} \label{eq:3-5-1} \end{equation}
This is the inner product of $\boldsymbol{x}$ and the $j$-th column of $\boldsymbol{A}$.
Let $y_j = (\boldsymbol{x}^\top \boldsymbol{A})_j$. Differentiate $y_j$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial}{\partial x_i} \sum_{k=0}^{N-1} x_k A_{kj} \label{eq:3-5-2} \end{equation}
Interchange differentiation and summation. Since $A_{kj}$ are constants:
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \sum_{k=0}^{N-1} A_{kj} \displaystyle\frac{\partial x_k}{\partial x_i} \label{eq:3-5-3} \end{equation}
By Formula (3.1), $\displaystyle\frac{\partial x_k}{\partial x_i} = \delta_{ki}$. Substituting into \eqref{eq:3-5-3}:
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \sum_{k=0}^{N-1} A_{kj} \delta_{ki} \label{eq:3-5-4} \end{equation}
Using the sifting property of the Kronecker delta, only the $k = i$ term survives:
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = A_{ij} \label{eq:3-5-5} \end{equation}
Since the $(i, j)$ entry of the Jacobian matrix is $A_{ij}$, the Jacobian matrix is $\boldsymbol{A}$ itself. Therefore:
\begin{equation} \displaystyle\frac{\partial (\boldsymbol{x}^\top \boldsymbol{A})}{\partial \boldsymbol{x}} = \boldsymbol{A} \label{eq:3-5-6} \end{equation}
This is an $N \times M$ matrix.
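A quick check of \eqref{eq:3-5-6}, assuming NumPy, where a 1-D array plays the role of $\boldsymbol{x}^\top$ (the helper `num_jacobian` is our own sketch, not part of the chapter):

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))        # N = 3, M = 4
x = rng.standard_normal(3)
J = num_jacobian(lambda v: v @ A, x)   # y^T = x^T A as a 1-D array
print(np.allclose(J, A))               # True: the Jacobian is A itself
```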
3.6 Sum/Difference Rule (Vector Version)
Proof
Let $\boldsymbol{y}(\boldsymbol{x}) = \boldsymbol{f}(\boldsymbol{x}) \pm \boldsymbol{g}(\boldsymbol{x})$, where $\boldsymbol{f}, \boldsymbol{g}: \mathbb{R}^N \to \mathbb{R}^M$, so $\boldsymbol{y} \in \mathbb{R}^M$.
Write the $j$-th component of $\boldsymbol{y}$. Since vector addition/subtraction is performed component-wise:
\begin{equation} y_j(\boldsymbol{x}) = f_j(\boldsymbol{x}) \pm g_j(\boldsymbol{x}) \label{eq:3-6-1} \end{equation}
Differentiate $y_j$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial}{\partial x_i} (f_j(\boldsymbol{x}) \pm g_j(\boldsymbol{x})) \label{eq:3-6-2} \end{equation}
Apply the sum/difference rule for scalar functions. The derivative of a sum/difference is the sum/difference of the derivatives:
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial f_j(\boldsymbol{x})}{\partial x_i} \pm \displaystyle\frac{\partial g_j(\boldsymbol{x})}{\partial x_i} \label{eq:3-6-3} \end{equation}
Interpret each term on the right-hand side as a Jacobian matrix entry.
\begin{equation} \displaystyle\frac{\partial f_j(\boldsymbol{x})}{\partial x_i} = \left( \displaystyle\frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij}, \quad \displaystyle\frac{\partial g_j(\boldsymbol{x})}{\partial x_i} = \left( \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij} \label{eq:3-6-4} \end{equation}
Combining \eqref{eq:3-6-3} and \eqref{eq:3-6-4}:
\begin{equation} \left( \displaystyle\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \right)_{ij} = \left( \displaystyle\frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij} \pm \left( \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij} \label{eq:3-6-5} \end{equation}
This holds for all pairs $(i, j)$. Since matrix addition/subtraction is performed entry-wise:
\begin{equation} \displaystyle\frac{\partial (\boldsymbol{f}(\boldsymbol{x}) \pm \boldsymbol{g}(\boldsymbol{x}))}{\partial \boldsymbol{x}} = \displaystyle\frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial \boldsymbol{x}} \pm \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} \label{eq:3-6-6} \end{equation}
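The sum rule \eqref{eq:3-6-6} can be verified on sample functions. A sketch assuming NumPy; the maps `f` and `g` below are arbitrary choices for illustration, and `num_jacobian` is our own finite-difference helper:

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
f = lambda v: A @ v               # linear map R^3 -> R^4
g = lambda v: np.tanh(A @ v)      # nonlinear map R^3 -> R^4

J_sum = num_jacobian(lambda v: f(v) + g(v), x)
print(np.allclose(J_sum, num_jacobian(f, x) + num_jacobian(g, x)))   # True
```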
3.7 Product Rule (Scalar × Vector)
Proof
Let $\boldsymbol{y}(\boldsymbol{x}) = f(\boldsymbol{x}) \boldsymbol{g}(\boldsymbol{x})$, where $f: \mathbb{R}^N \to \mathbb{R}$ is a scalar field and $\boldsymbol{g}: \mathbb{R}^N \to \mathbb{R}^M$. This is a product of a scalar and a vector, so $\boldsymbol{y} \in \mathbb{R}^M$.
Write the $j$-th component of $\boldsymbol{y}$. The scalar-vector product acts on each component:
\begin{equation} y_j(\boldsymbol{x}) = f(\boldsymbol{x}) \cdot g_j(\boldsymbol{x}) \label{eq:3-7-1} \end{equation}
Differentiate $y_j$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial}{\partial x_i} (f(\boldsymbol{x}) \cdot g_j(\boldsymbol{x})) \label{eq:3-7-2} \end{equation}
Applying the scalar product rule (Formula 2.9):
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = f(\boldsymbol{x}) \displaystyle\frac{\partial g_j(\boldsymbol{x})}{\partial x_i} + g_j(\boldsymbol{x}) \displaystyle\frac{\partial f(\boldsymbol{x})}{\partial x_i} \label{eq:3-7-3} \end{equation}
Interpret the first term on the right-hand side.
\begin{equation} f(\boldsymbol{x}) \displaystyle\frac{\partial g_j(\boldsymbol{x})}{\partial x_i} = f(\boldsymbol{x}) \left( \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij} \label{eq:3-7-4} \end{equation}
This is the product of $f(\boldsymbol{x})$ and the $(i, j)$ entry of the Jacobian of $\boldsymbol{g}$.
For the second term: $\displaystyle\frac{\partial f(\boldsymbol{x})}{\partial x_i}$ is the $i$-th component of the gradient vector $\displaystyle\frac{\partial f}{\partial \boldsymbol{x}} \in \mathbb{R}^N$, and $g_j(\boldsymbol{x})$ is the $j$-th component of $\boldsymbol{g}(\boldsymbol{x}) \in \mathbb{R}^M$. The outer product of the $N$-dimensional column vector $\displaystyle\frac{\partial f}{\partial \boldsymbol{x}}$ and the $M$-dimensional row vector $\boldsymbol{g}^\top$ is an $N \times M$ matrix whose $(i, j)$ entry is:
\begin{equation} \left( \displaystyle\frac{\partial f}{\partial \boldsymbol{x}} \boldsymbol{g}^\top \right)_{ij} = \displaystyle\frac{\partial f}{\partial x_i} \cdot g_j \label{eq:3-7-5} \end{equation}
Combining \eqref{eq:3-7-3}, \eqref{eq:3-7-4}, and \eqref{eq:3-7-5}:
\begin{equation} \left( \displaystyle\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \right)_{ij} = f(\boldsymbol{x}) \left( \displaystyle\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}} \right)_{ij} + \left( \displaystyle\frac{\partial f}{\partial \boldsymbol{x}} \boldsymbol{g}^\top \right)_{ij} \label{eq:3-7-6} \end{equation}
Since this holds for all $(i, j)$, in matrix form:
\begin{equation} \displaystyle\frac{\partial (f(\boldsymbol{x}) \boldsymbol{g}(\boldsymbol{x}))}{\partial \boldsymbol{x}} = f(\boldsymbol{x}) \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} + \displaystyle\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} \boldsymbol{g}(\boldsymbol{x})^\top \label{eq:3-7-7} \end{equation}
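Both terms of \eqref{eq:3-7-7} can be built explicitly for concrete $f$ and $\boldsymbol{g}$ and compared against finite differences. A sketch assuming NumPy; the choices $f(\boldsymbol{x}) = \boldsymbol{x} \cdot \boldsymbol{x}$ and $\boldsymbol{g}(\boldsymbol{x}) = \sin(\boldsymbol{x})$ (element-wise) are ours, as is the helper `num_jacobian`:

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

x = np.array([0.3, -0.7, 1.1])
f = lambda v: v @ v            # scalar field, gradient df/dx = 2x
g = lambda v: np.sin(v)        # vector field, Jacobian diag(cos(x))

J_num = num_jacobian(lambda v: f(v) * g(v), x)
# Right-hand side: f * dg/dx + (df/dx) g^T, with the second term an outer product.
J_analytic = f(x) * np.diag(np.cos(x)) + np.outer(2 * x, np.sin(x))
print(np.allclose(J_num, J_analytic))   # True
```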
3.8 Element-wise Square of a Vector
Proof
Let $\boldsymbol{y} = \boldsymbol{x} \odot \boldsymbol{x}$. By the definition of the Hadamard product, the $j$-th component of $\boldsymbol{y}$ is:
\begin{equation} y_j = x_j \cdot x_j = x_j^2 \label{eq:3-8-1} \end{equation}
Writing $\boldsymbol{y}$ in components:
\begin{equation} \boldsymbol{y} = \boldsymbol{x} \odot \boldsymbol{x} = \begin{pmatrix} x_0^2 \\ x_1^2 \\ \vdots \\ x_{N-1}^2 \end{pmatrix} \label{eq:3-8-2} \end{equation}
Differentiate $y_j = x_j^2$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial (x_j^2)}{\partial x_i} \label{eq:3-8-3} \end{equation}
When $i = j$, differentiate $x_j^2$ with respect to $x_j$. By the power rule (1.18):
\begin{equation} \displaystyle\frac{\partial (x_j^2)}{\partial x_j} = 2x_j \label{eq:3-8-4} \end{equation}
When $i \neq j$, $x_j^2$ does not depend on $x_i$:
\begin{equation} \displaystyle\frac{\partial (x_j^2)}{\partial x_i} = 0 \label{eq:3-8-5} \end{equation}
Combining \eqref{eq:3-8-4} and \eqref{eq:3-8-5}:
\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \begin{cases} 2x_j, & i = j \\ 0, & i \neq j \end{cases} \label{eq:3-8-6} \end{equation}
Writing out the Jacobian matrix, only the diagonal entries are nonzero.
\begin{equation} \displaystyle\frac{\partial (\boldsymbol{x} \odot \boldsymbol{x})}{\partial \boldsymbol{x}} = \begin{pmatrix} 2x_0 & 0 & 0 & \cdots & 0 \\ 0 & 2x_1 & 0 & \cdots & 0 \\ 0 & 0 & 2x_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 2x_{N-1} \end{pmatrix} \label{eq:3-8-7} \end{equation}
Expressing this as a diagonal matrix, where $\text{diag}(\boldsymbol{x})$ has diagonal entries $x_0, x_1, \ldots, x_{N-1}$:
\begin{equation} \displaystyle\frac{\partial (\boldsymbol{x} \odot \boldsymbol{x})}{\partial \boldsymbol{x}} = 2\,\text{diag}(\boldsymbol{x}) \label{eq:3-8-8} \end{equation}
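A numerical check of \eqref{eq:3-8-8}, assuming NumPy, where `v * v` is the Hadamard square (the helper `num_jacobian` is our own sketch):

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

x = np.array([0.5, -1.2, 3.0])
J = num_jacobian(lambda v: v * v, x)   # element-wise square x ⊙ x
print(np.allclose(J, 2 * np.diag(x)))  # True: 2 diag(x)
```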
3.9 Derivative of the Cross Product
Proof
Let $\boldsymbol{h}(t) = \boldsymbol{f}(t) \times \boldsymbol{g}(t)$, where $\boldsymbol{f}(t), \boldsymbol{g}(t) \in \mathbb{R}^3$. Write the components using the definition of the cross product, indexing components by $1, 2, 3$.
\begin{equation} \boldsymbol{h} = \boldsymbol{f} \times \boldsymbol{g} = \begin{pmatrix} f_2 g_3 - f_3 g_2 \\ f_3 g_1 - f_1 g_3 \\ f_1 g_2 - f_2 g_1 \end{pmatrix} \label{eq:3-9-1} \end{equation}
Differentiate the first component $h_1 = f_2 g_3 - f_3 g_2$ with respect to $t$.
\begin{equation} \displaystyle\frac{dh_1}{dt} = \displaystyle\frac{d}{dt}(f_2 g_3 - f_3 g_2) \label{eq:3-9-2} \end{equation}
The derivative of a difference is the difference of the derivatives:
\begin{equation} \displaystyle\frac{dh_1}{dt} = \displaystyle\frac{d}{dt}(f_2 g_3) - \displaystyle\frac{d}{dt}(f_3 g_2) \label{eq:3-9-3} \end{equation}
Applying the product rule (1.25) to each term:
\begin{equation} \displaystyle\frac{d}{dt}(f_2 g_3) = \displaystyle\frac{df_2}{dt} g_3 + f_2 \displaystyle\frac{dg_3}{dt}, \quad \displaystyle\frac{d}{dt}(f_3 g_2) = \displaystyle\frac{df_3}{dt} g_2 + f_3 \displaystyle\frac{dg_2}{dt} \label{eq:3-9-4} \end{equation}
Substituting \eqref{eq:3-9-4} into \eqref{eq:3-9-3}:
\begin{equation} \displaystyle\frac{dh_1}{dt} = \displaystyle\frac{df_2}{dt} g_3 + f_2 \displaystyle\frac{dg_3}{dt} - \displaystyle\frac{df_3}{dt} g_2 - f_3 \displaystyle\frac{dg_2}{dt} \label{eq:3-9-5} \end{equation}
Rearrange by grouping terms containing derivatives of $\boldsymbol{f}$ and terms containing derivatives of $\boldsymbol{g}$.
\begin{equation} \displaystyle\frac{dh_1}{dt} = \left( \displaystyle\frac{df_2}{dt} g_3 - \displaystyle\frac{df_3}{dt} g_2 \right) + \left( f_2 \displaystyle\frac{dg_3}{dt} - f_3 \displaystyle\frac{dg_2}{dt} \right) \label{eq:3-9-6} \end{equation}
Compare with the definition of the cross product. The first component of $\displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g}$ is $\displaystyle\frac{df_2}{dt} g_3 - \displaystyle\frac{df_3}{dt} g_2$, and the first component of $\boldsymbol{f} \times \displaystyle\frac{d\boldsymbol{g}}{dt}$ is $f_2 \displaystyle\frac{dg_3}{dt} - f_3 \displaystyle\frac{dg_2}{dt}$. Therefore:
\begin{equation} \displaystyle\frac{dh_1}{dt} = \left( \displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g} \right)_1 + \left( \boldsymbol{f} \times \displaystyle\frac{d\boldsymbol{g}}{dt} \right)_1 \label{eq:3-9-7} \end{equation}
Performing the same calculation for the 2nd and 3rd components, we find that for all $k = 1, 2, 3$:
\begin{equation} \displaystyle\frac{dh_k}{dt} = \left( \displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g} \right)_k + \left( \boldsymbol{f} \times \displaystyle\frac{d\boldsymbol{g}}{dt} \right)_k \label{eq:3-9-8} \end{equation}
Combining as a vector equation:
\begin{equation} \displaystyle\frac{d}{dt}(\boldsymbol{f}(t) \times \boldsymbol{g}(t)) = \displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g}(t) + \boldsymbol{f}(t) \times \displaystyle\frac{d\boldsymbol{g}}{dt} \label{eq:3-9-9} \end{equation}
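Formula \eqref{eq:3-9-9} can be checked on concrete curves. A sketch assuming NumPy; the curves `f` and `g` below (and their hand-computed derivatives `df`, `dg`) are arbitrary illustrative choices:

```python
import numpy as np

t, eps = 0.7, 1e-6
f  = lambda s: np.array([np.cos(s), np.sin(s), s])
df = lambda s: np.array([-np.sin(s), np.cos(s), 1.0])
g  = lambda s: np.array([s**2, 1.0, np.exp(-s)])
dg = lambda s: np.array([2 * s, 0.0, -np.exp(-s)])

h = lambda s: np.cross(f(s), g(s))
numeric  = (h(t + eps) - h(t - eps)) / (2 * eps)       # central difference of f x g
analytic = np.cross(df(t), g(t)) + np.cross(f(t), dg(t))
print(np.allclose(numeric, analytic))                  # True
```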
3.10 Time Derivative of the 2-Norm
Proof
Let $g(t) = \|\boldsymbol{f}(t)\|$. Writing the definition of the 2-norm:
\begin{equation} g(t) = \|\boldsymbol{f}(t)\| = \sqrt{\boldsymbol{f}(t) \cdot \boldsymbol{f}(t)} \label{eq:3-10-1} \end{equation}
Expanding the inner product in components:
\begin{equation} \boldsymbol{f}(t) \cdot \boldsymbol{f}(t) = \sum_{k=0}^{N-1} f_k(t)^2 \label{eq:3-10-2} \end{equation}
For notational convenience, let $h(t) = \boldsymbol{f}(t) \cdot \boldsymbol{f}(t) = \|\boldsymbol{f}(t)\|^2$.
\begin{equation} g(t) = \sqrt{h(t)} = h(t)^{1/2} \label{eq:3-10-3} \end{equation}
Differentiate $g(t)$ with respect to $t$. Applying the chain rule (1.26):
\begin{equation} \displaystyle\frac{dg}{dt} = \displaystyle\frac{d}{dh}(h^{1/2}) \cdot \displaystyle\frac{dh}{dt} \label{eq:3-10-4} \end{equation}
Compute $\displaystyle\frac{d}{dh}(h^{1/2})$. By the power rule (1.19):
\begin{equation} \displaystyle\frac{d}{dh}(h^{1/2}) = \displaystyle\frac{1}{2} h^{-1/2} = \displaystyle\frac{1}{2\sqrt{h}} = \displaystyle\frac{1}{2\|\boldsymbol{f}(t)\|} \label{eq:3-10-5} \end{equation}
Compute $\displaystyle\frac{dh}{dt}$. Since $h = \sum_{k=0}^{N-1} f_k^2$:
\begin{equation} \displaystyle\frac{dh}{dt} = \sum_{k=0}^{N-1} \displaystyle\frac{d}{dt}(f_k^2) \label{eq:3-10-6} \end{equation}
Compute $\displaystyle\frac{d}{dt}(f_k^2)$. By the chain rule (1.26):
\begin{equation} \displaystyle\frac{d}{dt}(f_k^2) = 2 f_k \displaystyle\frac{df_k}{dt} \label{eq:3-10-7} \end{equation}
Substituting \eqref{eq:3-10-7} into \eqref{eq:3-10-6}:
\begin{equation} \displaystyle\frac{dh}{dt} = \sum_{k=0}^{N-1} 2 f_k \displaystyle\frac{df_k}{dt} = 2 \sum_{k=0}^{N-1} f_k \displaystyle\frac{df_k}{dt} \label{eq:3-10-8} \end{equation}
Interpreting this sum as an inner product:
\begin{equation} \displaystyle\frac{dh}{dt} = 2 \left( \boldsymbol{f}(t) \cdot \displaystyle\frac{d\boldsymbol{f}}{dt} \right) \label{eq:3-10-9} \end{equation}
Substituting \eqref{eq:3-10-5} and \eqref{eq:3-10-9} into \eqref{eq:3-10-4}:
\begin{equation} \displaystyle\frac{dg}{dt} = \displaystyle\frac{1}{2\|\boldsymbol{f}(t)\|} \cdot 2 \left( \boldsymbol{f}(t) \cdot \displaystyle\frac{d\boldsymbol{f}}{dt} \right) \label{eq:3-10-10} \end{equation}
The factors of 2 cancel:
\begin{equation} \displaystyle\frac{d}{dt} \|\boldsymbol{f}(t)\| = \displaystyle\frac{\boldsymbol{f}(t) \cdot \displaystyle\frac{d\boldsymbol{f}}{dt}}{\|\boldsymbol{f}(t)\|} \label{eq:3-10-11} \end{equation}
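A numerical check of \eqref{eq:3-10-11}, assuming NumPy; the curve `f` and its derivative `df` are arbitrary illustrative choices (note $\boldsymbol{f}(t) \neq \boldsymbol{0}$ at the test point, as the formula requires):

```python
import numpy as np

t, eps = 1.3, 1e-6
f  = lambda s: np.array([np.sin(s), s**2, np.exp(-s)])
df = lambda s: np.array([np.cos(s), 2 * s, -np.exp(-s)])

norm = lambda s: np.linalg.norm(f(s))
numeric  = (norm(t + eps) - norm(t - eps)) / (2 * eps)  # central difference of ||f||
analytic = f(t) @ df(t) / np.linalg.norm(f(t))          # (f . df/dt) / ||f||
print(np.allclose(numeric, analytic))                   # True
```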
3.11 Element-wise Function Application
Proof
$\boldsymbol{g}(\boldsymbol{x})$ is the result of applying the scalar function $f$ to each component of $\boldsymbol{x}$. The $j$-th component is:
\begin{equation} g_j(\boldsymbol{x}) = f(x_j) \label{eq:3-11-1} \end{equation}
Writing $\boldsymbol{g}$ in components:
\begin{equation} \boldsymbol{g}(\boldsymbol{x}) = \begin{pmatrix} f(x_0) \\ f(x_1) \\ \vdots \\ f(x_{N-1}) \end{pmatrix} \label{eq:3-11-2} \end{equation}
Differentiate $g_j = f(x_j)$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.
\begin{equation} \displaystyle\frac{\partial g_j}{\partial x_i} = \displaystyle\frac{\partial f(x_j)}{\partial x_i} \label{eq:3-11-3} \end{equation}
When $i = j$, differentiate $f(x_j)$ with respect to $x_j$. By the chain rule (1.26):
\begin{equation} \displaystyle\frac{\partial f(x_j)}{\partial x_j} = f'(x_j) \cdot \displaystyle\frac{\partial x_j}{\partial x_j} = f'(x_j) \cdot 1 = f'(x_j) \label{eq:3-11-4} \end{equation}
When $i \neq j$, $f(x_j)$ does not depend on $x_i$ (since $x_j$ and $x_i$ are independent variables):
\begin{equation} \displaystyle\frac{\partial f(x_j)}{\partial x_i} = 0 \label{eq:3-11-5} \end{equation}
Combining \eqref{eq:3-11-4} and \eqref{eq:3-11-5}:
\begin{equation} \displaystyle\frac{\partial g_j}{\partial x_i} = \begin{cases} f'(x_j), & i = j \\ 0, & i \neq j \end{cases} \label{eq:3-11-6} \end{equation}
Writing out the Jacobian matrix, only the diagonal entries are nonzero.
\begin{equation} \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} = \begin{pmatrix} f'(x_0) & 0 & \cdots & 0 \\ 0 & f'(x_1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f'(x_{N-1}) \end{pmatrix} \label{eq:3-11-7} \end{equation}
Expressing this as a diagonal matrix:
\begin{equation} \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} = \text{diag}(f'(x_0), f'(x_1), \ldots, f'(x_{N-1})) \label{eq:3-11-8} \end{equation}
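A check of \eqref{eq:3-11-8} with $f = \tanh$, for which $f'(x) = 1 - \tanh^2(x)$. A sketch assuming NumPy (the helper `num_jacobian` is our own illustration):

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

x = np.array([0.5, -1.2, 3.0])
J = num_jacobian(np.tanh, x)                 # element-wise application of tanh
J_analytic = np.diag(1.0 - np.tanh(x)**2)    # diag(f'(x_0), ..., f'(x_{N-1}))
print(np.allclose(J, J_analytic))            # True
```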
Applied Formulas
The following formulas from this chapter are application-specific, organized by field.
3.12 Hadamard Product (Element-wise Product)
Proof
The $i$-th component of the Hadamard product is defined as:
\begin{equation} (\boldsymbol{x} \odot \boldsymbol{y})_i = x_i \, y_i \label{eq:3-12-1} \end{equation}
Differentiate with respect to $x_j$. Since $y_i$ is a constant independent of $\boldsymbol{x}$:
\begin{equation} \frac{\partial (x_i \, y_i)}{\partial x_j} = y_i \frac{\partial x_i}{\partial x_j} = y_i \, \delta_{ij} \label{eq:3-12-2} \end{equation}
The entry $y_i \, \delta_{ij}$ is nonzero only when $i = j$, where it takes the value $y_i$, so the Jacobian is a diagonal matrix (being diagonal, it is the same in either layout convention).
\begin{equation} \frac{\partial (\boldsymbol{x} \odot \boldsymbol{y})}{\partial \boldsymbol{x}} = \mathrm{diag}(\boldsymbol{y}) \label{eq:3-12-3} \end{equation}
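A numerical check of \eqref{eq:3-12-3}, assuming NumPy, with $\boldsymbol{y}$ held fixed (the helper `num_jacobian` is our own sketch):

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

rng = np.random.default_rng(4)
x = rng.standard_normal(3)
y = rng.standard_normal(3)              # treated as a constant vector
J = num_jacobian(lambda v: v * y, x)    # Hadamard product x ⊙ y
print(np.allclose(J, np.diag(y)))       # True: diag(y)
```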
3.13 Jacobian of the Softmax Function
($\boldsymbol{p} = \mathrm{softmax}(\boldsymbol{x})$)
Proof
Differentiate the $i$-th softmax output with respect to $x_j$. Let $S = \sum_{k} e^{x_k}$.
\begin{equation} p_i = \frac{e^{x_i}}{S} \label{eq:3-13-1} \end{equation}
Case $i = j$: Apply the quotient rule.
\begin{equation} \frac{\partial p_i}{\partial x_i} = \frac{e^{x_i} S - e^{x_i} e^{x_i}}{S^2} = p_i - p_i^2 = p_i(1 - p_i) \label{eq:3-13-2} \end{equation}
Case $i \neq j$: The numerator $e^{x_i}$ does not depend on $x_j$.
\begin{equation} \frac{\partial p_i}{\partial x_j} = -\frac{e^{x_i} e^{x_j}}{S^2} = -p_i \, p_j \label{eq:3-13-3} \end{equation}
Combining \eqref{eq:3-13-2} and \eqref{eq:3-13-3} using the Kronecker delta:
\begin{equation} \frac{\partial p_i}{\partial x_j} = p_i(\delta_{ij} - p_j) \label{eq:3-13-4} \end{equation}
In matrix form (the Jacobian is symmetric, so numerator and denominator layouts agree):
\begin{equation} \frac{\partial \boldsymbol{p}}{\partial \boldsymbol{x}} = \mathrm{diag}(\boldsymbol{p}) - \boldsymbol{p}\boldsymbol{p}^\top \label{eq:3-13-5} \end{equation}
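Formula \eqref{eq:3-13-5} can be verified against finite differences. A sketch assuming NumPy; `softmax` below uses the standard max-shift for numerical stability, which leaves $\boldsymbol{p}$ (and hence the Jacobian) unchanged, and `num_jacobian` is our own helper:

```python
import numpy as np

def num_jacobian(func, x, eps=1e-6):
    # Denominator-layout Jacobian: entry (i, j) approximates dy_j / dx_i.
    y = func(x)
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i] = (func(x + d) - func(x - d)) / (2 * eps)
    return J

def softmax(v):
    e = np.exp(v - v.max())   # shift by the max; the ratio is unaffected
    return e / e.sum()

x = np.array([1.0, -0.5, 2.0])
p = softmax(x)
J_num = num_jacobian(softmax, x)
J_analytic = np.diag(p) - np.outer(p, p)   # diag(p) - p p^T
print(np.allclose(J_num, J_analytic))      # True
```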