Proofs Chapter 3: Vector-by-Vector Derivatives

Jacobian Matrices from Component-wise Differentiation

This chapter proves formulas for differentiating vector-valued functions with respect to vectors, yielding Jacobian matrices. The Jacobian matrix is fundamental in multivariate analysis: it governs gradient propagation between layers of neural networks, probability density transformation via the change-of-variables formula, and the kinematic Jacobians of robotics. We derive each Jacobian from component-wise computation, beginning with the identity, linear, and affine transforms.

Prerequisites: Chapter 2 (Scalar by Vector Derivatives). Chapters using these results: Chapter 4 (Basic Matrix Derivative Formulas), Chapter 14 (Matrix Chain Rule).

Assumptions for This Chapter
Unless stated otherwise, the formulas in this chapter hold under the following conditions:
  • All formulas use the denominator layout convention
  • Differentiating a vector $\boldsymbol{y} \in \mathbb{R}^M$ with respect to a vector $\boldsymbol{x} \in \mathbb{R}^N$ yields the Jacobian matrix $\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \in \mathbb{R}^{N \times M}$
  • Functions are differentiable on an open set

3.1 Identity Transform

Formula: $\displaystyle\frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}} = \boldsymbol{I}$
Conditions: $\boldsymbol{x} \in \mathbb{R}^N$
Proof

In the denominator layout, differentiating a vector $\boldsymbol{y} \in \mathbb{R}^M$ with respect to a vector $\boldsymbol{x} \in \mathbb{R}^N$ yields a Jacobian matrix (an $N \times M$ matrix). Its $(i, j)$ entry is $\displaystyle\frac{\partial y_j}{\partial x_i}$.

Write $\boldsymbol{x}$ in components.

\begin{equation} \boldsymbol{x} = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix} \label{eq:3-1-1} \end{equation}

Compute the $(i, j)$ entry of the Jacobian matrix. For the identity transform, $y_j = x_j$.

\begin{equation} \frac{\partial y_j}{\partial x_i} = \frac{\partial x_j}{\partial x_i} \label{eq:3-1-2} \end{equation}

We consider two cases depending on whether $x_j$ is the same variable as $x_i$ ($i = j$) or independent ($i \neq j$).

\begin{equation} \frac{\partial x_j}{\partial x_i} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \label{eq:3-1-3} \end{equation}

When $i = j$, $\displaystyle\frac{\partial x_i}{\partial x_i} = 1$. When $i \neq j$, $x_j$ does not depend on $x_i$, so the partial derivative is $0$.

The result of \eqref{eq:3-1-3} is precisely the definition of the Kronecker delta $\delta_{ij}$.

\begin{equation} \frac{\partial x_j}{\partial x_i} = \delta_{ij} \label{eq:3-1-4} \end{equation}

Write out the Jacobian matrix explicitly. Each entry is defined by a partial derivative.

\begin{equation} \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}} = \begin{pmatrix} \displaystyle\frac{\partial x_0}{\partial x_0} & \displaystyle\frac{\partial x_1}{\partial x_0} & \cdots & \displaystyle\frac{\partial x_{N-1}}{\partial x_0} \\[0.5em] \displaystyle\frac{\partial x_0}{\partial x_1} & \displaystyle\frac{\partial x_1}{\partial x_1} & \cdots & \displaystyle\frac{\partial x_{N-1}}{\partial x_1} \\[0.5em] \vdots & \vdots & \ddots & \vdots \\[0.5em] \displaystyle\frac{\partial x_0}{\partial x_{N-1}} & \displaystyle\frac{\partial x_1}{\partial x_{N-1}} & \cdots & \displaystyle\frac{\partial x_{N-1}}{\partial x_{N-1}} \end{pmatrix} \label{eq:3-1-5} \end{equation}

Substituting the result of \eqref{eq:3-1-4} into \eqref{eq:3-1-5}, only the diagonal entries are 1 and all off-diagonal entries are 0.

\begin{equation} \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \label{eq:3-1-6} \end{equation}

This matrix is the $N \times N$ identity matrix $\boldsymbol{I}$. Therefore, the final result is:

\begin{equation} \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{x}} = \boldsymbol{I} \label{eq:3-1-7} \end{equation}
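As a sanity check on \eqref{eq:3-1-7}, the Jacobian can be approximated by central differences. The sketch below assumes NumPy; `num_jacobian` is an illustrative helper (not part of the text) that fills the denominator-layout matrix row by row, so row $i$ holds the partials of every $y_j$ with respect to $x_i$:

```python
import numpy as np

def num_jacobian(f, x, eps=1e-6):
    """Denominator-layout numerical Jacobian of f at x:
    entry (i, j) approximates the partial of y_j with respect to x_i."""
    y = np.atleast_1d(f(x))
    J = np.zeros((x.size, y.size))
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        J[i, :] = (np.atleast_1d(f(x + d)) - np.atleast_1d(f(x - d))) / (2 * eps)
    return J

x = np.array([0.3, -1.2, 2.0, 0.7])
J = num_jacobian(lambda v: v, x)   # identity transform: y = x
print(np.allclose(J, np.eye(4), atol=1e-6))  # True
```

For the identity map the central difference is exact up to rounding, so the numerical Jacobian reproduces $\boldsymbol{I}$ directly.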

3.2 Linear Transform

Formula: $\displaystyle\frac{\partial (\boldsymbol{A}\boldsymbol{x})}{\partial \boldsymbol{x}} = \boldsymbol{A}^\top$
Conditions: $\boldsymbol{A} \in \mathbb{R}^{M \times N}$ is a constant matrix, $\boldsymbol{x} \in \mathbb{R}^N$
Proof

Let $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}$, where $\boldsymbol{y} \in \mathbb{R}^M$.

Write the $j$-th component of $\boldsymbol{y}$ using the definition of matrix-vector multiplication.

\begin{equation} y_j = (\boldsymbol{A}\boldsymbol{x})_j \label{eq:3-2-1} \end{equation}

The $j$-th component of the matrix-vector product is the inner product of the $j$-th row of $\boldsymbol{A}$ and $\boldsymbol{x}$.

\begin{equation} y_j = \sum_{k=0}^{N-1} A_{jk} x_k \label{eq:3-2-2} \end{equation}

Compute the $(i, j)$ entry of the Jacobian matrix by differentiating $y_j$ with respect to $x_i$.

\begin{equation} \frac{\partial y_j}{\partial x_i} = \frac{\partial}{\partial x_i} \sum_{k=0}^{N-1} A_{jk} x_k \label{eq:3-2-3} \end{equation}

Because differentiation is a linear operator, it commutes with the finite sum, and the constants $A_{jk}$ factor out of the derivative.

\begin{equation} \frac{\partial y_j}{\partial x_i} = \sum_{k=0}^{N-1} A_{jk} \frac{\partial x_k}{\partial x_i} \label{eq:3-2-4} \end{equation}

By Formula (3.1), $\displaystyle\frac{\partial x_k}{\partial x_i} = \delta_{ki}$.

\begin{equation} \frac{\partial y_j}{\partial x_i} = \sum_{k=0}^{N-1} A_{jk} \delta_{ki} \label{eq:3-2-5} \end{equation}

Using the sifting property of the Kronecker delta: $\delta_{ki} = 1$ only when $k = i$, so only the $k = i$ term survives in the sum.

\begin{equation} \frac{\partial y_j}{\partial x_i} = A_{ji} \label{eq:3-2-6} \end{equation}

Write out the Jacobian matrix explicitly. The $(i, j)$ entry is $A_{ji}$.

\begin{equation} \frac{\partial (\boldsymbol{A}\boldsymbol{x})}{\partial \boldsymbol{x}} = \begin{pmatrix} A_{00} & A_{10} & \cdots & A_{(M-1)0} \\ A_{01} & A_{11} & \cdots & A_{(M-1)1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{0(N-1)} & A_{1(N-1)} & \cdots & A_{(M-1)(N-1)} \end{pmatrix} \label{eq:3-2-7} \end{equation}

By the definition of the transpose, $(\boldsymbol{A}^\top)_{ij} = A_{ji}$, so \eqref{eq:3-2-6} says the $(i, j)$ entry of the Jacobian equals $(\boldsymbol{A}^\top)_{ij}$; the matrix in \eqref{eq:3-2-7} is therefore the transpose $\boldsymbol{A}^\top$ of $\boldsymbol{A}$.

\begin{equation} \frac{\partial (\boldsymbol{A}\boldsymbol{x})}{\partial \boldsymbol{x}} = \boldsymbol{A}^\top \label{eq:3-2-8} \end{equation}

This is an $N \times M$ matrix.
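Formula (3.2) can be verified numerically; the sketch below assumes NumPy, and the shapes and random values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 5
A = rng.standard_normal((M, N))   # constant matrix
x = rng.standard_normal(N)
eps = 1e-6

# Denominator-layout Jacobian of y = A x by central differences:
# row i holds the partials of every y_j with respect to x_i.
J = np.zeros((N, M))
for i in range(N):
    d = np.zeros(N)
    d[i] = eps
    J[i] = (A @ (x + d) - A @ (x - d)) / (2 * eps)

print(J.shape)                         # (5, 3): an N x M matrix
print(np.allclose(J, A.T, atol=1e-5))  # matches A transpose
```

Since $\boldsymbol{A}\boldsymbol{x}$ is linear in $\boldsymbol{x}$, the central difference is exact up to rounding error.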

3.3 Constant Vector

Formula: $\displaystyle\frac{\partial \boldsymbol{a}}{\partial \boldsymbol{x}} = \boldsymbol{O}$
Conditions: $\boldsymbol{a} \in \mathbb{R}^M$ is a constant vector, $\boldsymbol{x} \in \mathbb{R}^N$
Proof

Write the constant vector $\boldsymbol{a}$ in components. Each component $a_j$ is a constant.

\begin{equation} \boldsymbol{a} = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{M-1} \end{pmatrix} \label{eq:3-3-1} \end{equation}

Compute the $(i, j)$ entry of the Jacobian matrix by differentiating $a_j$ with respect to $x_i$.

\begin{equation} \left(\frac{\partial \boldsymbol{a}}{\partial \boldsymbol{x}}\right)_{ij} = \frac{\partial a_j}{\partial x_i} \label{eq:3-3-2} \end{equation}

Since $a_j$ is a constant that does not depend on $x_i$, its partial derivative is $0$.

\begin{equation} \frac{\partial a_j}{\partial x_i} = 0 \label{eq:3-3-3} \end{equation}

Equation \eqref{eq:3-3-3} holds for all pairs $(i, j)$. Therefore, all entries of the Jacobian matrix are 0.

\begin{equation} \frac{\partial \boldsymbol{a}}{\partial \boldsymbol{x}} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \label{eq:3-3-4} \end{equation}

This matrix is the $N \times M$ zero matrix $\boldsymbol{O}$.

\begin{equation} \frac{\partial \boldsymbol{a}}{\partial \boldsymbol{x}} = \boldsymbol{O} \label{eq:3-3-5} \end{equation}

3.4 Affine Transform

Formula: $\displaystyle\frac{\partial (\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b})}{\partial \boldsymbol{x}} = \boldsymbol{A}^\top$
Conditions: $\boldsymbol{A} \in \mathbb{R}^{M \times N}$ is a constant matrix, $\boldsymbol{b} \in \mathbb{R}^M$ is a constant vector, $\boldsymbol{x} \in \mathbb{R}^N$
Proof

Let $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}$, where $\boldsymbol{y} \in \mathbb{R}^M$.

Write the $j$-th component of $\boldsymbol{y}$.

\begin{equation} y_j = (\boldsymbol{A}\boldsymbol{x})_j + b_j \label{eq:3-4-1} \end{equation}

Expand using the definition of matrix-vector multiplication.

\begin{equation} y_j = \sum_{k=0}^{N-1} A_{jk} x_k + b_j \label{eq:3-4-2} \end{equation}

Compute the $(i, j)$ entry of the Jacobian matrix by differentiating $y_j$ with respect to $x_i$.

\begin{equation} \frac{\partial y_j}{\partial x_i} = \frac{\partial}{\partial x_i} \left( \sum_{k=0}^{N-1} A_{jk} x_k + b_j \right) \label{eq:3-4-3} \end{equation}

The derivative of a sum is the sum of the derivatives.

\begin{equation} \frac{\partial y_j}{\partial x_i} = \frac{\partial}{\partial x_i} \left( \sum_{k=0}^{N-1} A_{jk} x_k \right) + \frac{\partial b_j}{\partial x_i} \label{eq:3-4-4} \end{equation}

Since $b_j$ is a constant, its partial derivative is $0$.

\begin{equation} \frac{\partial b_j}{\partial x_i} = 0 \label{eq:3-4-5} \end{equation}

The first term gives $A_{ji}$ by the same calculation as Formula (3.2).

\begin{equation} \frac{\partial}{\partial x_i} \left( \sum_{k=0}^{N-1} A_{jk} x_k \right) = A_{ji} \label{eq:3-4-6} \end{equation}

Substituting \eqref{eq:3-4-5} and \eqref{eq:3-4-6} into \eqref{eq:3-4-4}:

\begin{equation} \frac{\partial y_j}{\partial x_i} = A_{ji} + 0 = A_{ji} \label{eq:3-4-7} \end{equation}

As in Formula (3.2), since the $(i, j)$ entry is $A_{ji}$, the Jacobian matrix equals $\boldsymbol{A}^\top$.

\begin{equation} \frac{\partial (\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b})}{\partial \boldsymbol{x}} = \boldsymbol{A}^\top \label{eq:3-4-8} \end{equation}

Remark: The constant term $\boldsymbol{b}$ vanishes under differentiation. This means the Jacobian of an affine transform does not depend on the translation component.
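The remark above can be confirmed numerically: shifting by a constant $\boldsymbol{b}$ leaves the Jacobian unchanged. A minimal sketch assuming NumPy, with illustrative shapes and values:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 4
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
x = rng.standard_normal(N)
eps = 1e-6

def jac(f):
    """Denominator-layout Jacobian of f at x by central differences."""
    J = np.zeros((N, M))
    for i in range(N):
        d = np.zeros(N)
        d[i] = eps
        J[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return J

J_linear = jac(lambda v: A @ v)
J_affine = jac(lambda v: A @ v + b)

# The translation b contributes nothing: both Jacobians equal A^T.
print(np.allclose(J_affine, J_linear, atol=1e-5))
print(np.allclose(J_affine, A.T, atol=1e-5))
```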

3.5 Linear Transform with Transpose

Formula: $\displaystyle\frac{\partial (\boldsymbol{x}^\top \boldsymbol{A})}{\partial \boldsymbol{x}} = \boldsymbol{A}$
Conditions: $\boldsymbol{A} \in \mathbb{R}^{N \times M}$ is a constant matrix, $\boldsymbol{x} \in \mathbb{R}^N$
Proof

Let $\boldsymbol{y}^\top = \boldsymbol{x}^\top \boldsymbol{A}$. Here $\boldsymbol{y}^\top$ is a $1 \times M$ row vector.

Write the $j$-th component of $\boldsymbol{y}^\top$ using the definition of the product of a row vector and a matrix:

\begin{equation} (\boldsymbol{x}^\top \boldsymbol{A})_j = \sum_{k=0}^{N-1} x_k A_{kj} \label{eq:3-5-1} \end{equation}

This is the inner product of $\boldsymbol{x}$ and the $j$-th column of $\boldsymbol{A}$.

Let $y_j = (\boldsymbol{x}^\top \boldsymbol{A})_j$. Differentiate $y_j$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial}{\partial x_i} \sum_{k=0}^{N-1} x_k A_{kj} \label{eq:3-5-2} \end{equation}

Interchange differentiation and summation. Since $A_{kj}$ are constants:

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \sum_{k=0}^{N-1} A_{kj} \displaystyle\frac{\partial x_k}{\partial x_i} \label{eq:3-5-3} \end{equation}

By Formula (3.1), $\displaystyle\frac{\partial x_k}{\partial x_i} = \delta_{ki}$. Substituting into \eqref{eq:3-5-3}:

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \sum_{k=0}^{N-1} A_{kj} \delta_{ki} \label{eq:3-5-4} \end{equation}

Using the sifting property of the Kronecker delta, only the $k = i$ term survives:

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = A_{ij} \label{eq:3-5-5} \end{equation}

Since the $(i, j)$ entry of the Jacobian matrix is $A_{ij}$, the Jacobian matrix is $\boldsymbol{A}$ itself. Therefore:

\begin{equation} \displaystyle\frac{\partial (\boldsymbol{x}^\top \boldsymbol{A})}{\partial \boldsymbol{x}} = \boldsymbol{A} \label{eq:3-5-6} \end{equation}

This is an $N \times M$ matrix.

Remark: Comparing with Formula (3.2): the derivative of $\boldsymbol{A}\boldsymbol{x}$ is $\boldsymbol{A}^\top$, while the derivative of $\boldsymbol{x}^\top \boldsymbol{A}$ is $\boldsymbol{A}$. Note the presence or absence of the transpose.
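The presence or absence of the transpose is easy to confirm numerically. The sketch below assumes NumPy; note that here $\boldsymbol{A}$ is $N \times M$, matching the conditions of Formula (3.5):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 4, 3
A = rng.standard_normal((N, M))   # note: A is N x M in this section
x = rng.standard_normal(N)
eps = 1e-6

# y^T = x^T A has components y_j = sum_k x_k A_kj; Jacobian entry (i, j) = A_ij.
J = np.zeros((N, M))
for i in range(N):
    d = np.zeros(N)
    d[i] = eps
    J[i] = ((x + d) @ A - (x - d) @ A) / (2 * eps)

print(np.allclose(J, A, atol=1e-5))  # A itself, with no transpose
```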

3.6 Sum/Difference Rule (Vector Version)

Formula: $\displaystyle\frac{\partial (\boldsymbol{f}(\boldsymbol{x}) \pm \boldsymbol{g}(\boldsymbol{x}))}{\partial \boldsymbol{x}} = \displaystyle\frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial \boldsymbol{x}} \pm \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}}$
Conditions: $\boldsymbol{f}: \mathbb{R}^N \to \mathbb{R}^M$ and $\boldsymbol{g}: \mathbb{R}^N \to \mathbb{R}^M$ are vector-valued functions, $\boldsymbol{x} \in \mathbb{R}^N$
Proof

Let $\boldsymbol{y}(\boldsymbol{x}) = \boldsymbol{f}(\boldsymbol{x}) \pm \boldsymbol{g}(\boldsymbol{x})$, where $\boldsymbol{y} \in \mathbb{R}^M$.

Write the $j$-th component of $\boldsymbol{y}$. Since vector addition/subtraction is performed component-wise:

\begin{equation} y_j(\boldsymbol{x}) = f_j(\boldsymbol{x}) \pm g_j(\boldsymbol{x}) \label{eq:3-6-1} \end{equation}

Differentiate $y_j$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial}{\partial x_i} (f_j(\boldsymbol{x}) \pm g_j(\boldsymbol{x})) \label{eq:3-6-2} \end{equation}

Apply the sum/difference rule for scalar functions. The derivative of a sum/difference is the sum/difference of the derivatives:

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial f_j(\boldsymbol{x})}{\partial x_i} \pm \displaystyle\frac{\partial g_j(\boldsymbol{x})}{\partial x_i} \label{eq:3-6-3} \end{equation}

Interpret each term on the right-hand side as a Jacobian matrix entry.

\begin{equation} \displaystyle\frac{\partial f_j(\boldsymbol{x})}{\partial x_i} = \left( \displaystyle\frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij}, \quad \displaystyle\frac{\partial g_j(\boldsymbol{x})}{\partial x_i} = \left( \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij} \label{eq:3-6-4} \end{equation}

Combining \eqref{eq:3-6-3} and \eqref{eq:3-6-4}:

\begin{equation} \left( \displaystyle\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \right)_{ij} = \left( \displaystyle\frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij} \pm \left( \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij} \label{eq:3-6-5} \end{equation}

This holds for all pairs $(i, j)$. Since matrix addition/subtraction is performed entry-wise:

\begin{equation} \displaystyle\frac{\partial (\boldsymbol{f}(\boldsymbol{x}) \pm \boldsymbol{g}(\boldsymbol{x}))}{\partial \boldsymbol{x}} = \displaystyle\frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial \boldsymbol{x}} \pm \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} \label{eq:3-6-6} \end{equation}

Remark: This linearity follows from the fact that scalar differentiation linearity holds at each entry of the Jacobian matrix.

3.7 Product Rule (Scalar × Vector)

Formula: $\displaystyle\frac{\partial (f(\boldsymbol{x}) \boldsymbol{g}(\boldsymbol{x}))}{\partial \boldsymbol{x}} = f(\boldsymbol{x}) \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} + \displaystyle\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} \boldsymbol{g}(\boldsymbol{x})^\top$
Conditions: $f: \mathbb{R}^N \to \mathbb{R}$ is a scalar function, $\boldsymbol{g}: \mathbb{R}^N \to \mathbb{R}^M$ is a vector-valued function, $\boldsymbol{x} \in \mathbb{R}^N$
Proof

Let $\boldsymbol{y}(\boldsymbol{x}) = f(\boldsymbol{x}) \boldsymbol{g}(\boldsymbol{x})$. This is a product of a scalar and a vector, so $\boldsymbol{y} \in \mathbb{R}^M$.

Write the $j$-th component of $\boldsymbol{y}$. Scalar multiplication acts on each component:

\begin{equation} y_j(\boldsymbol{x}) = f(\boldsymbol{x}) \cdot g_j(\boldsymbol{x}) \label{eq:3-7-1} \end{equation}

Differentiate $y_j$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial}{\partial x_i} (f(\boldsymbol{x}) \cdot g_j(\boldsymbol{x})) \label{eq:3-7-2} \end{equation}

Applying the scalar product rule (Formula 2.9):

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = f(\boldsymbol{x}) \displaystyle\frac{\partial g_j(\boldsymbol{x})}{\partial x_i} + g_j(\boldsymbol{x}) \displaystyle\frac{\partial f(\boldsymbol{x})}{\partial x_i} \label{eq:3-7-3} \end{equation}

Interpret the first term on the right-hand side.

\begin{equation} f(\boldsymbol{x}) \displaystyle\frac{\partial g_j(\boldsymbol{x})}{\partial x_i} = f(\boldsymbol{x}) \left( \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} \right)_{ij} \label{eq:3-7-4} \end{equation}

This is the product of $f(\boldsymbol{x})$ and the $(i, j)$ entry of the Jacobian of $\boldsymbol{g}$.

For the second term: $\displaystyle\frac{\partial f(\boldsymbol{x})}{\partial x_i}$ is the $i$-th component of the gradient vector $\displaystyle\frac{\partial f}{\partial \boldsymbol{x}} \in \mathbb{R}^N$, and $g_j(\boldsymbol{x})$ is the $j$-th component of $\boldsymbol{g}(\boldsymbol{x}) \in \mathbb{R}^M$. The outer product of the $N$-dimensional column vector $\displaystyle\frac{\partial f}{\partial \boldsymbol{x}}$ and the $M$-dimensional row vector $\boldsymbol{g}^\top$ is an $N \times M$ matrix whose $(i, j)$ entry is:

\begin{equation} \left( \displaystyle\frac{\partial f}{\partial \boldsymbol{x}} \boldsymbol{g}^\top \right)_{ij} = \displaystyle\frac{\partial f}{\partial x_i} \cdot g_j \label{eq:3-7-5} \end{equation}

Combining \eqref{eq:3-7-3}, \eqref{eq:3-7-4}, and \eqref{eq:3-7-5}:

\begin{equation} \left( \displaystyle\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \right)_{ij} = f(\boldsymbol{x}) \left( \displaystyle\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}} \right)_{ij} + \left( \displaystyle\frac{\partial f}{\partial \boldsymbol{x}} \boldsymbol{g}^\top \right)_{ij} \label{eq:3-7-6} \end{equation}

Since this holds for all $(i, j)$, in matrix form:

\begin{equation} \displaystyle\frac{\partial (f(\boldsymbol{x}) \boldsymbol{g}(\boldsymbol{x}))}{\partial \boldsymbol{x}} = f(\boldsymbol{x}) \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} + \displaystyle\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} \boldsymbol{g}(\boldsymbol{x})^\top \label{eq:3-7-7} \end{equation}

Remark: The first term $f \displaystyle\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}}$ is a scalar multiple giving an $N \times M$ matrix. The second term $\displaystyle\frac{\partial f}{\partial \boldsymbol{x}} \boldsymbol{g}^\top$ is an outer product ($N \times 1$ times $1 \times M$), also yielding an $N \times M$ matrix.
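Both terms of Formula (3.7) can be checked at once with a concrete choice of $f$ and $\boldsymbol{g}$. In the sketch below (assuming NumPy; $\boldsymbol{w}$ and $\boldsymbol{B}$ are illustrative constants), $f(\boldsymbol{x}) = \boldsymbol{w} \cdot \boldsymbol{x}$ so that $\partial f / \partial \boldsymbol{x} = \boldsymbol{w}$, and $\boldsymbol{g}(\boldsymbol{x}) = \boldsymbol{B}\boldsymbol{x}$ so that $\partial \boldsymbol{g} / \partial \boldsymbol{x} = \boldsymbol{B}^\top$ by Formula (3.2):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 4, 3
w = rng.standard_normal(N)        # f(x) = w . x  (scalar)
B = rng.standard_normal((M, N))   # g(x) = B x    (vector in R^M)
x = rng.standard_normal(N)
eps = 1e-6

y = lambda v: (w @ v) * (B @ v)   # y = f(x) g(x)

# Numerical denominator-layout Jacobian of y.
J = np.zeros((N, M))
for i in range(N):
    d = np.zeros(N)
    d[i] = eps
    J[i] = (y(x + d) - y(x - d)) / (2 * eps)

# Formula (3.7): f * dg/dx + (df/dx) g^T, with dg/dx = B^T and df/dx = w.
J_formula = (w @ x) * B.T + np.outer(w, B @ x)
print(np.allclose(J, J_formula, atol=1e-4))
```

Because $\boldsymbol{y}$ is quadratic in $\boldsymbol{x}$, the second-order central difference is exact up to rounding.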

3.8 Element-wise Square of a Vector

Formula: $\displaystyle\frac{\partial (\boldsymbol{x} \odot \boldsymbol{x})}{\partial \boldsymbol{x}} = 2\,\text{diag}(\boldsymbol{x})$
Conditions: $\boldsymbol{x} \in \mathbb{R}^N$, $\odot$ denotes the Hadamard product (element-wise product)
Proof

Let $\boldsymbol{y} = \boldsymbol{x} \odot \boldsymbol{x}$. By the definition of the Hadamard product, the $j$-th component of $\boldsymbol{y}$ is:

\begin{equation} y_j = x_j \cdot x_j = x_j^2 \label{eq:3-8-1} \end{equation}

Writing $\boldsymbol{y}$ in components:

\begin{equation} \boldsymbol{y} = \boldsymbol{x} \odot \boldsymbol{x} = \begin{pmatrix} x_0^2 \\ x_1^2 \\ \vdots \\ x_{N-1}^2 \end{pmatrix} \label{eq:3-8-2} \end{equation}

Differentiate $y_j = x_j^2$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \displaystyle\frac{\partial (x_j^2)}{\partial x_i} \label{eq:3-8-3} \end{equation}

When $i = j$, differentiate $x_j^2$ with respect to $x_j$. By the power rule (1.18):

\begin{equation} \displaystyle\frac{\partial (x_j^2)}{\partial x_j} = 2x_j \label{eq:3-8-4} \end{equation}

When $i \neq j$, $x_j^2$ does not depend on $x_i$:

\begin{equation} \displaystyle\frac{\partial (x_j^2)}{\partial x_i} = 0 \label{eq:3-8-5} \end{equation}

Combining \eqref{eq:3-8-4} and \eqref{eq:3-8-5}:

\begin{equation} \displaystyle\frac{\partial y_j}{\partial x_i} = \begin{cases} 2x_j, & i = j \\ 0, & i \neq j \end{cases} \label{eq:3-8-6} \end{equation}

Writing out the Jacobian matrix, only the diagonal entries are nonzero.

\begin{equation} \displaystyle\frac{\partial (\boldsymbol{x} \odot \boldsymbol{x})}{\partial \boldsymbol{x}} = \begin{pmatrix} 2x_0 & 0 & 0 & \cdots & 0 \\ 0 & 2x_1 & 0 & \cdots & 0 \\ 0 & 0 & 2x_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 2x_{N-1} \end{pmatrix} \label{eq:3-8-7} \end{equation}

Expressing this as a diagonal matrix, where $\text{diag}(\boldsymbol{x})$ has diagonal entries $x_0, x_1, \ldots, x_{N-1}$:

\begin{equation} \displaystyle\frac{\partial (\boldsymbol{x} \odot \boldsymbol{x})}{\partial \boldsymbol{x}} = 2\,\text{diag}(\boldsymbol{x}) \label{eq:3-8-8} \end{equation}
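A short numerical check of \eqref{eq:3-8-8}, assuming NumPy (the test vector is arbitrary):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])
eps = 1e-6

# Denominator-layout Jacobian of y = x * x (element-wise) by central differences.
J = np.zeros((3, 3))
for i in range(3):
    d = np.zeros(3)
    d[i] = eps
    J[i] = ((x + d) ** 2 - (x - d) ** 2) / (2 * eps)

print(np.allclose(J, 2 * np.diag(x), atol=1e-5))  # True
```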

3.9 Derivative of the Cross Product

Formula: $\displaystyle\frac{d}{dt}(\boldsymbol{f}(t) \times \boldsymbol{g}(t)) = \displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g}(t) + \boldsymbol{f}(t) \times \displaystyle\frac{d\boldsymbol{g}}{dt}$
Conditions: $\boldsymbol{f}: \mathbb{R} \to \mathbb{R}^3$ and $\boldsymbol{g}: \mathbb{R} \to \mathbb{R}^3$ are vector-valued functions of time $t \in \mathbb{R}$
Proof

Let $\boldsymbol{h}(t) = \boldsymbol{f}(t) \times \boldsymbol{g}(t)$. Write the components using the definition of the cross product, indexing components by $1, 2, 3$.

\begin{equation} \boldsymbol{h} = \boldsymbol{f} \times \boldsymbol{g} = \begin{pmatrix} f_2 g_3 - f_3 g_2 \\ f_3 g_1 - f_1 g_3 \\ f_1 g_2 - f_2 g_1 \end{pmatrix} \label{eq:3-9-1} \end{equation}

Differentiate the first component $h_1 = f_2 g_3 - f_3 g_2$ with respect to $t$.

\begin{equation} \displaystyle\frac{dh_1}{dt} = \displaystyle\frac{d}{dt}(f_2 g_3 - f_3 g_2) \label{eq:3-9-2} \end{equation}

The derivative of a difference is the difference of the derivatives:

\begin{equation} \displaystyle\frac{dh_1}{dt} = \displaystyle\frac{d}{dt}(f_2 g_3) - \displaystyle\frac{d}{dt}(f_3 g_2) \label{eq:3-9-3} \end{equation}

Applying the product rule (1.25) to each term:

\begin{equation} \displaystyle\frac{d}{dt}(f_2 g_3) = \displaystyle\frac{df_2}{dt} g_3 + f_2 \displaystyle\frac{dg_3}{dt}, \quad \displaystyle\frac{d}{dt}(f_3 g_2) = \displaystyle\frac{df_3}{dt} g_2 + f_3 \displaystyle\frac{dg_2}{dt} \label{eq:3-9-4} \end{equation}

Substituting \eqref{eq:3-9-4} into \eqref{eq:3-9-3}:

\begin{equation} \displaystyle\frac{dh_1}{dt} = \displaystyle\frac{df_2}{dt} g_3 + f_2 \displaystyle\frac{dg_3}{dt} - \displaystyle\frac{df_3}{dt} g_2 - f_3 \displaystyle\frac{dg_2}{dt} \label{eq:3-9-5} \end{equation}

Rearrange by grouping terms containing derivatives of $\boldsymbol{f}$ and terms containing derivatives of $\boldsymbol{g}$.

\begin{equation} \displaystyle\frac{dh_1}{dt} = \left( \displaystyle\frac{df_2}{dt} g_3 - \displaystyle\frac{df_3}{dt} g_2 \right) + \left( f_2 \displaystyle\frac{dg_3}{dt} - f_3 \displaystyle\frac{dg_2}{dt} \right) \label{eq:3-9-6} \end{equation}

Compare with the definition of the cross product. The first component of $\displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g}$ is $\displaystyle\frac{df_2}{dt} g_3 - \displaystyle\frac{df_3}{dt} g_2$, and the first component of $\boldsymbol{f} \times \displaystyle\frac{d\boldsymbol{g}}{dt}$ is $f_2 \displaystyle\frac{dg_3}{dt} - f_3 \displaystyle\frac{dg_2}{dt}$. Therefore:

\begin{equation} \displaystyle\frac{dh_1}{dt} = \left( \displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g} \right)_1 + \left( \boldsymbol{f} \times \displaystyle\frac{d\boldsymbol{g}}{dt} \right)_1 \label{eq:3-9-7} \end{equation}

Performing the same calculation for the 2nd and 3rd components, we find that for all $k = 1, 2, 3$:

\begin{equation} \displaystyle\frac{dh_k}{dt} = \left( \displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g} \right)_k + \left( \boldsymbol{f} \times \displaystyle\frac{d\boldsymbol{g}}{dt} \right)_k \label{eq:3-9-8} \end{equation}

Combining as a vector equation:

\begin{equation} \displaystyle\frac{d}{dt}(\boldsymbol{f}(t) \times \boldsymbol{g}(t)) = \displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g}(t) + \boldsymbol{f}(t) \times \displaystyle\frac{d\boldsymbol{g}}{dt} \label{eq:3-9-9} \end{equation}

Remark: The cross product is non-commutative ($\boldsymbol{a} \times \boldsymbol{b} \neq \boldsymbol{b} \times \boldsymbol{a}$), so the order of terms matters. $\displaystyle\frac{d\boldsymbol{f}}{dt} \times \boldsymbol{g}$ and $\boldsymbol{g} \times \displaystyle\frac{d\boldsymbol{f}}{dt}$ differ by a sign.
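Formula (3.9) can be checked against a finite difference for a concrete pair of curves. The sketch below assumes NumPy; the choices $\boldsymbol{f}(t) = (t, t^2, t^3)$ and $\boldsymbol{g}(t) = (\sin t, \cos t, 1)$, with their analytic derivatives, are illustrative:

```python
import numpy as np

f  = lambda t: np.array([t, t**2, t**3])
df = lambda t: np.array([1.0, 2*t, 3*t**2])
g  = lambda t: np.array([np.sin(t), np.cos(t), 1.0])
dg = lambda t: np.array([np.cos(t), -np.sin(t), 0.0])

t, eps = 0.8, 1e-6

# Left side: central difference of h(t) = f(t) x g(t).
lhs = (np.cross(f(t + eps), g(t + eps)) -
       np.cross(f(t - eps), g(t - eps))) / (2 * eps)

# Right side: Formula (3.9), keeping the order of the factors.
rhs = np.cross(df(t), g(t)) + np.cross(f(t), dg(t))

print(np.allclose(lhs, rhs, atol=1e-6))
```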

3.10 Time Derivative of the 2-Norm

Formula: $\displaystyle\frac{d}{dt} \|\boldsymbol{f}(t)\| = \displaystyle\frac{\boldsymbol{f}(t) \cdot \displaystyle\frac{d\boldsymbol{f}}{dt}}{\|\boldsymbol{f}(t)\|}$
Conditions: $\boldsymbol{f}: \mathbb{R} \to \mathbb{R}^N$ is a vector-valued function of time $t \in \mathbb{R}$, $\|\boldsymbol{f}(t)\| \neq 0$
Proof

Let $g(t) = \|\boldsymbol{f}(t)\|$. Writing the definition of the 2-norm:

\begin{equation} g(t) = \|\boldsymbol{f}(t)\| = \sqrt{\boldsymbol{f}(t) \cdot \boldsymbol{f}(t)} \label{eq:3-10-1} \end{equation}

Expanding the inner product in components:

\begin{equation} \boldsymbol{f}(t) \cdot \boldsymbol{f}(t) = \sum_{k=0}^{N-1} f_k(t)^2 \label{eq:3-10-2} \end{equation}

For notational convenience, let $h(t) = \boldsymbol{f}(t) \cdot \boldsymbol{f}(t) = \|\boldsymbol{f}(t)\|^2$.

\begin{equation} g(t) = \sqrt{h(t)} = h(t)^{1/2} \label{eq:3-10-3} \end{equation}

Differentiate $g(t)$ with respect to $t$. Applying the chain rule (1.26):

\begin{equation} \displaystyle\frac{dg}{dt} = \displaystyle\frac{d}{dh}(h^{1/2}) \cdot \displaystyle\frac{dh}{dt} \label{eq:3-10-4} \end{equation}

Compute $\displaystyle\frac{d}{dh}(h^{1/2})$. By the power rule (1.19):

\begin{equation} \displaystyle\frac{d}{dh}(h^{1/2}) = \displaystyle\frac{1}{2} h^{-1/2} = \displaystyle\frac{1}{2\sqrt{h}} = \displaystyle\frac{1}{2\|\boldsymbol{f}(t)\|} \label{eq:3-10-5} \end{equation}

Compute $\displaystyle\frac{dh}{dt}$. Since $h = \sum_{k=0}^{N-1} f_k^2$:

\begin{equation} \displaystyle\frac{dh}{dt} = \sum_{k=0}^{N-1} \displaystyle\frac{d}{dt}(f_k^2) \label{eq:3-10-6} \end{equation}

Compute $\displaystyle\frac{d}{dt}(f_k^2)$. By the chain rule (1.26):

\begin{equation} \displaystyle\frac{d}{dt}(f_k^2) = 2 f_k \displaystyle\frac{df_k}{dt} \label{eq:3-10-7} \end{equation}

Substituting \eqref{eq:3-10-7} into \eqref{eq:3-10-6}:

\begin{equation} \displaystyle\frac{dh}{dt} = \sum_{k=0}^{N-1} 2 f_k \displaystyle\frac{df_k}{dt} = 2 \sum_{k=0}^{N-1} f_k \displaystyle\frac{df_k}{dt} \label{eq:3-10-8} \end{equation}

Interpreting this sum as an inner product:

\begin{equation} \displaystyle\frac{dh}{dt} = 2 \left( \boldsymbol{f}(t) \cdot \displaystyle\frac{d\boldsymbol{f}}{dt} \right) \label{eq:3-10-9} \end{equation}

Substituting \eqref{eq:3-10-5} and \eqref{eq:3-10-9} into \eqref{eq:3-10-4}:

\begin{equation} \displaystyle\frac{dg}{dt} = \displaystyle\frac{1}{2\|\boldsymbol{f}(t)\|} \cdot 2 \left( \boldsymbol{f}(t) \cdot \displaystyle\frac{d\boldsymbol{f}}{dt} \right) \label{eq:3-10-10} \end{equation}

The factors of 2 cancel:

\begin{equation} \displaystyle\frac{d}{dt} \|\boldsymbol{f}(t)\| = \displaystyle\frac{\boldsymbol{f}(t) \cdot \displaystyle\frac{d\boldsymbol{f}}{dt}}{\|\boldsymbol{f}(t)\|} \label{eq:3-10-11} \end{equation}

Remark: The numerator is the inner product of $\boldsymbol{f}$ and $\displaystyle\frac{d\boldsymbol{f}}{dt}$, representing the projection of the velocity onto the direction of $\boldsymbol{f}$. When $\|\boldsymbol{f}\| = 0$ (zero vector), the denominator vanishes and the expression is undefined.
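Formula (3.10) admits the same kind of check. The sketch below assumes NumPy; the curve $\boldsymbol{f}(t) = (\cos t, \sin t, t^2)$ and its analytic derivative are illustrative, and $\|\boldsymbol{f}(t)\| \geq 1$ here, so the condition $\|\boldsymbol{f}(t)\| \neq 0$ holds:

```python
import numpy as np

f  = lambda t: np.array([np.cos(t), np.sin(t), t**2])
df = lambda t: np.array([-np.sin(t), np.cos(t), 2*t])

t, eps = 1.3, 1e-6

# Left side: central difference of the norm ||f(t)||.
lhs = (np.linalg.norm(f(t + eps)) - np.linalg.norm(f(t - eps))) / (2 * eps)

# Right side: Formula (3.10), the inner product over the norm.
rhs = f(t) @ df(t) / np.linalg.norm(f(t))

print(np.isclose(lhs, rhs, atol=1e-6))
```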

3.11 Element-wise Function Application

Formula: $\displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} = \text{diag}(f'(x_0), f'(x_1), \ldots, f'(x_{N-1}))$
Conditions: $f: \mathbb{R} \to \mathbb{R}$ is a differentiable scalar function, $\boldsymbol{x} \in \mathbb{R}^N$, $\boldsymbol{g}(\boldsymbol{x}) = (f(x_0), f(x_1), \ldots, f(x_{N-1}))^\top \in \mathbb{R}^N$
Proof

$\boldsymbol{g}(\boldsymbol{x})$ is the result of applying the scalar function $f$ to each component of $\boldsymbol{x}$. The $j$-th component is:

\begin{equation} g_j(\boldsymbol{x}) = f(x_j) \label{eq:3-11-1} \end{equation}

Writing $\boldsymbol{g}$ in components:

\begin{equation} \boldsymbol{g}(\boldsymbol{x}) = \begin{pmatrix} f(x_0) \\ f(x_1) \\ \vdots \\ f(x_{N-1}) \end{pmatrix} \label{eq:3-11-2} \end{equation}

Differentiate $g_j = f(x_j)$ with respect to $x_i$ to compute the $(i, j)$ entry of the Jacobian.

\begin{equation} \displaystyle\frac{\partial g_j}{\partial x_i} = \displaystyle\frac{\partial f(x_j)}{\partial x_i} \label{eq:3-11-3} \end{equation}

When $i = j$, differentiate $f(x_j)$ with respect to $x_j$. By the chain rule (1.26):

\begin{equation} \displaystyle\frac{\partial f(x_j)}{\partial x_j} = f'(x_j) \cdot \displaystyle\frac{\partial x_j}{\partial x_j} = f'(x_j) \cdot 1 = f'(x_j) \label{eq:3-11-4} \end{equation}

When $i \neq j$, $f(x_j)$ does not depend on $x_i$ (since $x_j$ and $x_i$ are independent variables):

\begin{equation} \displaystyle\frac{\partial f(x_j)}{\partial x_i} = 0 \label{eq:3-11-5} \end{equation}

Combining \eqref{eq:3-11-4} and \eqref{eq:3-11-5}:

\begin{equation} \displaystyle\frac{\partial g_j}{\partial x_i} = \begin{cases} f'(x_j), & i = j \\ 0, & i \neq j \end{cases} \label{eq:3-11-6} \end{equation}

Writing out the Jacobian matrix, only the diagonal entries are nonzero.

\begin{equation} \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} = \begin{pmatrix} f'(x_0) & 0 & \cdots & 0 \\ 0 & f'(x_1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f'(x_{N-1}) \end{pmatrix} \label{eq:3-11-7} \end{equation}

Expressing this as a diagonal matrix:

\begin{equation} \displaystyle\frac{\partial \boldsymbol{g}(\boldsymbol{x})}{\partial \boldsymbol{x}} = \text{diag}(f'(x_0), f'(x_1), \ldots, f'(x_{N-1})) \label{eq:3-11-8} \end{equation}

Remark: This formula is frequently used when differentiating neural network activation functions (ReLU, sigmoid, etc.). One may also write $f' \circ \boldsymbol{x}$ to denote element-wise application of the derivative.
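As a concrete instance of the remark, take $f$ to be the sigmoid, whose derivative satisfies $\sigma' = \sigma(1 - \sigma)$. A minimal sketch assuming NumPy, with an arbitrary test vector:

```python
import numpy as np

sigmoid  = lambda v: 1.0 / (1.0 + np.exp(-v))
dsigmoid = lambda v: sigmoid(v) * (1.0 - sigmoid(v))  # sigma' = sigma(1 - sigma)

x = np.array([-1.5, 0.0, 0.4, 2.0])
eps = 1e-6

# Numerical Jacobian of the element-wise sigmoid by central differences.
N = x.size
J = np.zeros((N, N))
for i in range(N):
    d = np.zeros(N)
    d[i] = eps
    J[i] = (sigmoid(x + d) - sigmoid(x - d)) / (2 * eps)

print(np.allclose(J, np.diag(dsigmoid(x)), atol=1e-6))
```

The off-diagonal entries come out (numerically) zero, as \eqref{eq:3-11-6} predicts.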

3.12 Hadamard Product (Element-wise Product)

Formula: $\displaystyle\frac{\partial (\boldsymbol{x} \odot \boldsymbol{y})}{\partial \boldsymbol{x}} = \mathrm{diag}(\boldsymbol{y})$
Conditions: $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^N$, $\boldsymbol{y}$ is a constant vector independent of $\boldsymbol{x}$
Proof

The $i$-th component of the Hadamard product is defined as:

\begin{equation} (\boldsymbol{x} \odot \boldsymbol{y})_i = x_i \, y_i \label{eq:3-12-1} \end{equation}

Differentiate with respect to $x_j$. Since $y_i$ is a constant independent of $\boldsymbol{x}$:

\begin{equation} \frac{\partial (x_i \, y_i)}{\partial x_j} = y_i \frac{\partial x_i}{\partial x_j} = y_i \, \delta_{ij} \label{eq:3-12-2} \end{equation}

Since the $(i, j)$ entry of the Jacobian is $y_i \, \delta_{ij}$, it is nonzero only when $i = j$, taking the value $y_i$. Therefore the Jacobian is a diagonal matrix.

\begin{equation} \frac{\partial (\boldsymbol{x} \odot \boldsymbol{y})}{\partial \boldsymbol{x}} = \mathrm{diag}(\boldsymbol{y}) \label{eq:3-12-3} \end{equation}

Remark: In the general case where $\boldsymbol{y}$ also depends on $\boldsymbol{x}$, the product rule gives $\mathrm{diag}(\boldsymbol{x}) \, \partial \boldsymbol{y}/\partial \boldsymbol{x} + \mathrm{diag}(\boldsymbol{y})$ (see 6.1).

3.13 Jacobian of the Softmax Function

Formula: $\displaystyle\frac{\partial\, \mathrm{softmax}(\boldsymbol{x})}{\partial \boldsymbol{x}} = \mathrm{diag}(\boldsymbol{p}) - \boldsymbol{p}\boldsymbol{p}^\top$
($\boldsymbol{p} = \mathrm{softmax}(\boldsymbol{x})$)
Conditions: $\boldsymbol{x} \in \mathbb{R}^N$, $p_i = e^{x_i} / \sum_{k} e^{x_k}$
Proof

Differentiate the $i$-th softmax output with respect to $x_j$. Let $S = \sum_{k} e^{x_k}$.

\begin{equation} p_i = \frac{e^{x_i}}{S} \label{eq:3-13-1} \end{equation}

Case $i = j$: Apply the quotient rule.

\begin{equation} \frac{\partial p_i}{\partial x_i} = \frac{e^{x_i} S - e^{x_i} e^{x_i}}{S^2} = p_i - p_i^2 = p_i(1 - p_i) \label{eq:3-13-2} \end{equation}

Case $i \neq j$: The numerator $e^{x_i}$ does not depend on $x_j$.

\begin{equation} \frac{\partial p_i}{\partial x_j} = -\frac{e^{x_i} e^{x_j}}{S^2} = -p_i \, p_j \label{eq:3-13-3} \end{equation}

Combining \eqref{eq:3-13-2} and \eqref{eq:3-13-3} using the Kronecker delta:

\begin{equation} \frac{\partial p_i}{\partial x_j} = p_i(\delta_{ij} - p_j) \label{eq:3-13-4} \end{equation}

In matrix form (the result is symmetric, so it reads the same whether the $(i, j)$ entry is taken as $\partial p_j / \partial x_i$ or $\partial p_i / \partial x_j$):

\begin{equation} \frac{\partial \boldsymbol{p}}{\partial \boldsymbol{x}} = \mathrm{diag}(\boldsymbol{p}) - \boldsymbol{p}\boldsymbol{p}^\top \label{eq:3-13-5} \end{equation}

Remark: This proof is identical in content to 6.2. In Chapter 3 it is presented as a concrete example of vector-by-vector differentiation yielding a Jacobian; in Chapter 6 it appears as differentiation of an activation function.
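A numerical check of \eqref{eq:3-13-5}, assuming NumPy; the max-shift inside `softmax` is the standard trick for numerical stability and does not change the function's value:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())   # shift for numerical stability
    return e / e.sum()

x = np.array([0.2, -1.0, 1.5, 0.3])
p = softmax(x)
eps = 1e-6

# Numerical Jacobian of softmax by central differences.
N = x.size
J = np.zeros((N, N))
for i in range(N):
    d = np.zeros(N)
    d[i] = eps
    J[i] = (softmax(x + d) - softmax(x - d)) / (2 * eps)

# diag(p) - p p^T is symmetric, so the layout convention is immaterial here.
print(np.allclose(J, np.diag(p) - np.outer(p, p), atol=1e-6))
```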
