Proofs Chapter 6: Hadamard Product Derivatives
This chapter proves the derivative of the Hadamard product (element-wise product). In neural-network backpropagation, the derivative of the activation function is applied to the incoming gradient as a Hadamard product, and the formulas in this chapter provide the foundation for that operation.
Prerequisites: Chapter 4 (Basic Formulas for Matrix Derivatives), Chapter 5 (Trace Derivatives). Related chapters: Chapter 14 (Matrix Chain Rule).
Unless otherwise stated, the formulas in this chapter hold under the following conditions:
- All formulas are based on the denominator layout
- The Hadamard product $\odot$ denotes the element-wise product between vectors (or matrices) of the same dimensions
6.1 Derivative of the Hadamard Product
Proof
The $i$-th component of the Hadamard product is defined as follows.
\begin{equation} (\boldsymbol{x} \odot \boldsymbol{y})_i = x_i \, y_i \label{eq:6-1-1} \end{equation}
We differentiate this with respect to $z_j$. Applying the scalar product rule (Leibniz rule) yields the following.
\begin{equation} \frac{\partial (x_i \, y_i)}{\partial z_j} = x_i \frac{\partial y_i}{\partial z_j} + y_i \frac{\partial x_i}{\partial z_j} \label{eq:6-1-2} \end{equation}
Collecting these entries gives the Jacobian matrix. In the denominator layout, entry $(j, i)$ of $\partial(\boldsymbol{x} \odot \boldsymbol{y})/\partial \boldsymbol{z}$ is $\partial(x_i \, y_i)/\partial z_j$, so for $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^N$ and $\boldsymbol{z} \in \mathbb{R}^M$ the Jacobian lies in $\mathbb{R}^{M \times N}$, and the matrix form becomes the following.
\begin{equation} \frac{\partial (\boldsymbol{x} \odot \boldsymbol{y})}{\partial \boldsymbol{z}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{z}} \, \mathrm{diag}(\boldsymbol{x}) + \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{z}} \, \mathrm{diag}(\boldsymbol{y}) \label{eq:6-1-3} \end{equation}
Here, $\mathrm{diag}(\boldsymbol{x})$ is the matrix with the components of $\boldsymbol{x}$ placed along the diagonal, and column $i$ of $\boldsymbol{A} \, \mathrm{diag}(\boldsymbol{x})$ is column $i$ of $\boldsymbol{A}$ scaled by $x_i$. Entry $(j, i)$ of the right-hand side is therefore $x_i \, (\partial y_i / \partial z_j) + y_i \, (\partial x_i / \partial z_j)$, which matches the component-wise formula above. In particular, setting $\boldsymbol{z} = \boldsymbol{x}$ gives $\partial \boldsymbol{x} / \partial \boldsymbol{x} = \boldsymbol{I}$, so $\partial(\boldsymbol{x} \odot \boldsymbol{y})/\partial \boldsymbol{x} = \mathrm{diag}(\boldsymbol{y})$, which is the form that appears in backpropagation.
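As a sanity check, the derivative of the Hadamard product can be verified numerically with finite differences. In the sketch below, the maps $\boldsymbol{x}(\boldsymbol{z}) = \boldsymbol{A}\boldsymbol{z}$ and $\boldsymbol{y}(\boldsymbol{z}) = \sin(\boldsymbol{B}\boldsymbol{z})$ are arbitrary smooth choices for illustration (not from the text), and the analytic denominator-layout Jacobian is compared against central differences.

```python
import numpy as np

# Illustrative sketch: x(z) = A z and y(z) = sin(B z) are arbitrary choices,
# used only to test the Hadamard-product derivative formula numerically.
rng = np.random.default_rng(0)
M, N = 3, 4
A = rng.standard_normal((N, M))
B = rng.standard_normal((N, M))

x = lambda z: A @ z
y = lambda z: np.sin(B @ z)
f = lambda z: x(z) * y(z)  # Hadamard product x(z) ⊙ y(z)

z0 = rng.standard_normal(M)

# Denominator-layout Jacobians, shape (M, N): entry (j, i) = ∂(·)_i / ∂z_j.
dx_dz = A.T                                # x = A z  ⇒  ∂x/∂z = Aᵀ
dy_dz = (np.cos(B @ z0)[:, None] * B).T    # y_i = sin((Bz)_i) ⇒ ∂y_i/∂z_j = cos((Bz)_i) B_ij
analytic = dy_dz @ np.diag(x(z0)) + dx_dz @ np.diag(y(z0))

# Central finite differences of f, also in denominator layout.
eps = 1e-6
numeric = np.zeros((M, N))
for j in range(M):
    e = np.zeros(M)
    e[j] = eps
    numeric[j] = (f(z0 + e) - f(z0 - e)) / (2 * eps)

# The maximum discrepancy should be tiny (finite-difference error only).
print(np.max(np.abs(analytic - numeric)))
```

Note that the `diag` matrices multiply from the right, matching the denominator-layout convention in which rows are indexed by the differentiation variable $z_j$ and columns by the output component $i$.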