Proofs Chapter 6: Derivative of the Hadamard Product

This chapter proves the derivative of the Hadamard product (element-wise product). In neural network backpropagation, the derivative of the activation function acts on the gradient in the form of a Hadamard product. The formulas in this chapter provide the foundation for that operation.
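
To make the connection concrete: for an activation $\boldsymbol{a} = f(\boldsymbol{z})$ applied element-wise, the Jacobian in the denominator layout (see the prerequisites below) is $\partial \boldsymbol{a}/\partial \boldsymbol{z} = \mathrm{diag}(f'(\boldsymbol{z}))$, so the matrix chain rule (Chapter 14) yields the familiar backpropagation step as a Hadamard product (a preview; the chain rule itself is the subject of Chapter 14):

\begin{equation*} \frac{\partial L}{\partial \boldsymbol{z}} = \frac{\partial \boldsymbol{a}}{\partial \boldsymbol{z}} \frac{\partial L}{\partial \boldsymbol{a}} = \mathrm{diag}\bigl( f'(\boldsymbol{z}) \bigr) \frac{\partial L}{\partial \boldsymbol{a}} = f'(\boldsymbol{z}) \odot \frac{\partial L}{\partial \boldsymbol{a}} \end{equation*}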

Prerequisites: Chapter 4 (Basic Formulas for Matrix Derivatives), Chapter 5 (Trace Derivatives). Related chapters: Chapter 14 (Matrix Chain Rule).

Prerequisites for This Chapter
Unless otherwise stated, the formulas in this chapter hold under the following conditions:
  • All formulas are based on the denominator layout (written out explicitly just after this list)
  • The Hadamard product $\odot$ denotes the element-wise product between vectors (or matrices) of the same dimensions
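
Concretely, for $\boldsymbol{y} \in \mathbb{R}^N$ regarded as a function of $\boldsymbol{z} \in \mathbb{R}^M$, the denominator layout places the differentiating variable along the rows:

\begin{equation*} \left( \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{z}} \right)_{ji} = \frac{\partial y_i}{\partial z_j}, \qquad \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{z}} \in \mathbb{R}^{M \times N} \end{equation*}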

6.1 Derivative of the Hadamard Product

Formula: $\displaystyle\frac{\partial (\boldsymbol{x} \odot \boldsymbol{y})}{\partial \boldsymbol{z}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{z}} \, \mathrm{diag}(\boldsymbol{x}) + \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{z}} \, \mathrm{diag}(\boldsymbol{y})$
Conditions: $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^N$ are both functions of $\boldsymbol{z} \in \mathbb{R}^M$
Proof

The $i$-th component of the Hadamard product is defined as follows.

\begin{equation} (\boldsymbol{x} \odot \boldsymbol{y})_i = x_i \, y_i \label{eq:6-1-1} \end{equation}
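
For example, with $N = 3$:

\begin{equation*} \boldsymbol{x} \odot \boldsymbol{y} = \begin{pmatrix} x_1 y_1 \\ x_2 y_2 \\ x_3 y_3 \end{pmatrix} \end{equation*}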

We differentiate this with respect to $z_j$. Applying the scalar product rule (Leibniz rule) yields the following.

\begin{equation} \frac{\partial (x_i \, y_i)}{\partial z_j} = x_i \frac{\partial y_i}{\partial z_j} + y_i \frac{\partial x_i}{\partial z_j} \label{eq:6-1-2} \end{equation}

In the denominator layout, the Jacobian $\boldsymbol{J} = \partial (\boldsymbol{x} \odot \boldsymbol{y}) / \partial \boldsymbol{z} \in \mathbb{R}^{M \times N}$ has $(j, i)$ entry $\partial (x_i y_i) / \partial z_j$. Collecting equation \eqref{eq:6-1-2} over all $i$ and $j$, the matrix form becomes the following.

\begin{equation} \frac{\partial (\boldsymbol{x} \odot \boldsymbol{y})}{\partial \boldsymbol{z}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{z}} \, \mathrm{diag}(\boldsymbol{x}) + \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{z}} \, \mathrm{diag}(\boldsymbol{y}) \label{eq:6-1-3} \end{equation}

Here, $\mathrm{diag}(\boldsymbol{x})$ is the matrix with the components of $\boldsymbol{x}$ placed along the diagonal, and column $i$ of $\boldsymbol{A} \, \mathrm{diag}(\boldsymbol{x})$ is column $i$ of $\boldsymbol{A}$ scaled by $x_i$. Applied to $\boldsymbol{A} = \partial \boldsymbol{y}/\partial \boldsymbol{z}$, whose column $i$ holds the derivatives $\partial y_i / \partial z_j$, this correctly reproduces the component-wise products $x_i \cdot (\partial y_i / \partial z_j)$ in equation \eqref{eq:6-1-2}.
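
Written out for $N = M = 2$, equation \eqref{eq:6-1-3} reads

\begin{equation*} \frac{\partial (\boldsymbol{x} \odot \boldsymbol{y})}{\partial \boldsymbol{z}} = \begin{pmatrix} x_1 \dfrac{\partial y_1}{\partial z_1} + y_1 \dfrac{\partial x_1}{\partial z_1} & x_2 \dfrac{\partial y_2}{\partial z_1} + y_2 \dfrac{\partial x_2}{\partial z_1} \\[2ex] x_1 \dfrac{\partial y_1}{\partial z_2} + y_1 \dfrac{\partial x_1}{\partial z_2} & x_2 \dfrac{\partial y_2}{\partial z_2} + y_2 \dfrac{\partial x_2}{\partial z_2} \end{pmatrix} \end{equation*}

with row $j$ indexed by $z_j$ and column $i$ by the $i$-th component of the product.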

Note: When $\boldsymbol{x}$ is a constant vector independent of $\boldsymbol{z}$, we have $\partial \boldsymbol{x}/\partial \boldsymbol{z} = \boldsymbol{O}$, and the formula simplifies to $(\partial \boldsymbol{y}/\partial \boldsymbol{z}) \, \mathrm{diag}(\boldsymbol{x})$.
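
As a sanity check, equation \eqref{eq:6-1-3} can be verified numerically against finite differences. The following is a minimal sketch assuming NumPy; the maps $\boldsymbol{x}(\boldsymbol{z}) = \sin(\boldsymbol{A}\boldsymbol{z})$ and $\boldsymbol{y}(\boldsymbol{z}) = \cos(\boldsymbol{B}\boldsymbol{z})$ and all names in the script are illustrative choices, not part of the chapter.

import numpy as np

# Minimal numerical check of eq. (6-1-3), assuming NumPy.
# x(z) = sin(Az) and y(z) = cos(Bz) are illustrative choices.
rng = np.random.default_rng(0)
N, M = 3, 4
A = rng.normal(size=(N, M))
B = rng.normal(size=(N, M))

def x(z):
    return np.sin(A @ z)

def y(z):
    return np.cos(B @ z)

# Denominator-layout Jacobians, shape (M, N): entry (j, i) = d x_i / d z_j.
def jac_x(z):
    return (np.cos(A @ z)[:, None] * A).T   # d sin((Az)_i)/dz_j = cos((Az)_i) A_ij

def jac_y(z):
    return (-np.sin(B @ z)[:, None] * B).T  # d cos((Bz)_i)/dz_j = -sin((Bz)_i) B_ij

z0 = rng.normal(size=M)

# Analytic formula: (dy/dz) diag(x) + (dx/dz) diag(y).
analytic = jac_y(z0) @ np.diag(x(z0)) + jac_x(z0) @ np.diag(y(z0))

# Central finite differences of f(z) = x(z) * y(z) (the Hadamard product).
def f(z):
    return x(z) * y(z)

eps = 1e-6
numeric = np.zeros((M, N))
for j in range(M):
    e = np.zeros(M)
    e[j] = eps
    numeric[j] = (f(z0 + e) - f(z0 - e)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be ~1e-10

Because the analytic Jacobians are built in denominator layout, the diagonal matrices must multiply from the right, matching equation \eqref{eq:6-1-3}.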
