Proofs Chapter 6: Hadamard Product Derivatives
This chapter proves the derivative of the Hadamard product (element-wise product). In neural-network backpropagation, the derivative of the activation function is applied to the incoming gradient as a Hadamard product, and the formulas in this chapter provide the foundation for that operation.
Prerequisites: Chapter 4 (Basic Formulas for Matrix Derivatives), Chapter 5 (Trace Derivatives). Related chapters: Chapter 14 (Matrix Chain Rule).
Unless otherwise stated, the formulas in this chapter hold under the following conditions:
- All formulas are based on the denominator layout
- The Hadamard product $\odot$ denotes the element-wise product between vectors (or matrices) of the same dimensions
6.1 Derivative of the Hadamard Product
Proof
The $i$-th component of the Hadamard product is defined as follows.
\begin{equation} (\boldsymbol{x} \odot \boldsymbol{y})_i = x_i \, y_i \label{eq:6-1-1} \end{equation}
We differentiate this with respect to $z_j$. Applying the scalar product rule (Leibniz rule) yields the following.
\begin{equation} \frac{\partial (x_i \, y_i)}{\partial z_j} = x_i \frac{\partial y_i}{\partial z_j} + y_i \frac{\partial x_i}{\partial z_j} \label{eq:6-1-2} \end{equation}
Collecting these entries gives the Jacobian matrix. In the denominator layout, entry $(j, i)$ of $\partial(\boldsymbol{x} \odot \boldsymbol{y})/\partial \boldsymbol{z}$ is $\partial(x_i \, y_i)/\partial z_j$, so for $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^N$ and $\boldsymbol{z} \in \mathbb{R}^M$ the Jacobian lies in $\mathbb{R}^{M \times N}$, and the matrix form becomes the following.
\begin{equation} \frac{\partial (\boldsymbol{x} \odot \boldsymbol{y})}{\partial \boldsymbol{z}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{z}} \, \mathrm{diag}(\boldsymbol{x}) + \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{z}} \, \mathrm{diag}(\boldsymbol{y}) \label{eq:6-1-3} \end{equation}
Here, $\mathrm{diag}(\boldsymbol{x})$ is the matrix with the components of $\boldsymbol{x}$ placed along the diagonal, and column $i$ of $\boldsymbol{A} \, \mathrm{diag}(\boldsymbol{x})$ is column $i$ of $\boldsymbol{A}$ scaled by $x_i$. Entry $(j, i)$ of the right-hand side is therefore $x_i \, (\partial y_i / \partial z_j) + y_i \, (\partial x_i / \partial z_j)$, which matches the component-wise formula above. In particular, setting $\boldsymbol{z} = \boldsymbol{x}$ gives $\partial \boldsymbol{x} / \partial \boldsymbol{x} = \boldsymbol{I}$, so $\partial(\boldsymbol{x} \odot \boldsymbol{y})/\partial \boldsymbol{x} = \mathrm{diag}(\boldsymbol{y})$, which is the form that appears in backpropagation.
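As a sanity check, the derivative of the Hadamard product can be verified numerically with finite differences. In the sketch below, the maps $\boldsymbol{x}(\boldsymbol{z}) = \boldsymbol{A}\boldsymbol{z}$ and $\boldsymbol{y}(\boldsymbol{z}) = \sin(\boldsymbol{B}\boldsymbol{z})$ are arbitrary smooth choices for illustration (not from the text), and the analytic denominator-layout Jacobian is compared against central differences.

```python
import numpy as np

# Illustrative sketch: x(z) = A z and y(z) = sin(B z) are arbitrary choices,
# used only to test the Hadamard-product derivative formula numerically.
rng = np.random.default_rng(0)
M, N = 3, 4
A = rng.standard_normal((N, M))
B = rng.standard_normal((N, M))

x = lambda z: A @ z
y = lambda z: np.sin(B @ z)
f = lambda z: x(z) * y(z)  # Hadamard product x(z) ⊙ y(z)

z0 = rng.standard_normal(M)

# Denominator-layout Jacobians, shape (M, N): entry (j, i) = ∂(·)_i / ∂z_j.
dx_dz = A.T                                # x = A z  ⇒  ∂x/∂z = Aᵀ
dy_dz = (np.cos(B @ z0)[:, None] * B).T    # y_i = sin((Bz)_i) ⇒ ∂y_i/∂z_j = cos((Bz)_i) B_ij
analytic = dy_dz @ np.diag(x(z0)) + dx_dz @ np.diag(y(z0))

# Central finite differences of f, also in denominator layout.
eps = 1e-6
numeric = np.zeros((M, N))
for j in range(M):
    e = np.zeros(M)
    e[j] = eps
    numeric[j] = (f(z0 + e) - f(z0 - e)) / (2 * eps)

# The maximum discrepancy should be tiny (finite-difference error only).
print(np.max(np.abs(analytic - numeric)))
```

Note that the `diag` matrices multiply from the right, matching the denominator-layout convention in which rows are indexed by the differentiation variable $z_j$ and columns by the output component $i$.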