Applications to Machine Learning

Notation Convention
All formulas on this page use the denominator layout convention. See Layout Conventions for details.

Table of Contents

  • Activation Functions (6.9-6.10)
  • Fully Connected Layers (17.1-17.3)
  • Normalization Layers (17.4-17.7)
  • Attention Mechanisms (17.8-17.12)
  • Convolution & Pooling (17.13-17.17)
  • Regularization (17.18-17.19)
  • VAE (17.20-17.22)
  • 18.1 Gradient of SVD Backpropagation
  • 18.2 Gradient of Singular Values
  • 18.3 Definition of the Fisher Information Matrix
  • 18.4 Hessian Representation of the Fisher Information Matrix
  • 18.5 Natural Gradient
  • 18.13 Policy Gradient Theorem
  • 18.14 Policy Gradient with Baseline
  • 18.18 Gradient of Skip-gram (Negative Sampling)
  • 18.19 Gradient of GloVe
  • 18.15 InfoNCE Loss Function
  • 18.18 Gradient of Cholesky Decomposition
  • 18.13 Sinkhorn Distance
  • 18.16 Gaussian Processes
  • 18.23 Belief Propagation
  • 18.26 Dictionary Learning & LASSO

Computer Vision

3.14 Differentiation of the Homography Matrix

Formula: $\displaystyle\frac{\partial \boldsymbol{p}'}{\partial \boldsymbol{H}} = \displaystyle\frac{1}{w'}\begin{pmatrix} \boldsymbol{p}^\top & \boldsymbol{0}^\top & -x'\boldsymbol{p}^\top \\ \boldsymbol{0}^\top & \boldsymbol{p}^\top & -y'\boldsymbol{p}^\top \end{pmatrix}$
Conditions: $\boldsymbol{p}' = \pi(\boldsymbol{H}\boldsymbol{p})$, $\pi$: perspective normalization (division by the third homogeneous coordinate), $\boldsymbol{p} = (x, y, 1)^\top$: homogeneous point, $w' = \boldsymbol{h}_3^\top \boldsymbol{p}$ with $\boldsymbol{h}_3^\top$ the third row of $\boldsymbol{H}$
Explanation

Jacobian of a transformed point with respect to the entries of a 2D projective transformation (homography) $\boldsymbol{H} \in \mathbb{R}^{3 \times 3}$, flattened row-wise as $(\boldsymbol{h}_1^\top, \boldsymbol{h}_2^\top, \boldsymbol{h}_3^\top)$. Used in image registration, panorama stitching, and augmented reality.

The homogeneous coordinates $\tilde{\boldsymbol{p}} = \boldsymbol{H}\boldsymbol{p}$ are normalized to obtain image coordinates $\boldsymbol{p}' = (x', y')^\top = (\tilde{p}_1/\tilde{p}_3, \tilde{p}_2/\tilde{p}_3)^\top$.
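As a sanity check, here is a minimal NumPy sketch (the function names are illustrative, not from any library) that builds this $2 \times 9$ Jacobian and verifies it against central differences:

```python
import numpy as np

def project(H, p):
    """Perspective projection pi(Hp) of a homogeneous point p = (x, y, 1)."""
    q = H @ p
    return q[:2] / q[2]

def homography_point_jacobian(H, p):
    """2x9 Jacobian of p' = pi(Hp) w.r.t. the row-wise flattened entries of H."""
    q = H @ p
    w = q[2]                          # w' = h3^T p
    x, y = q[0] / w, q[1] / w         # image coordinates (x', y')
    z = np.zeros(3)
    row_x = np.concatenate([p, z, -x * p])   # dx'/d(h1, h2, h3)
    row_y = np.concatenate([z, p, -y * p])   # dy'/d(h1, h2, h3)
    return np.stack([row_x, row_y]) / w

# Verify against central differences at a typical homography and point.
H = np.array([[1.0, 0.1, 5.0],
              [-0.2, 1.1, 3.0],
              [1e-3, 2e-3, 1.0]])
p = np.array([10.0, 20.0, 1.0])
J = homography_point_jacobian(H, p)

eps = 1e-6
J_num = np.zeros((2, 9))
for k in range(9):
    d = np.zeros(9)
    d[k] = eps
    dH = d.reshape(3, 3)              # row-major reshape matches J's row-wise layout
    J_num[:, k] = (project(H + dH, p) - project(H - dH, p)) / (2 * eps)

assert np.allclose(J, J_num, rtol=1e-6, atol=1e-8)
```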

Medical Image Reconstruction

12.13 Gradient of Tikhonov Regularization

Formula: $\displaystyle\frac{\partial J}{\partial \boldsymbol{x}} = 2\boldsymbol{A}^\top(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}) + 2\lambda\boldsymbol{L}^\top\boldsymbol{L}\boldsymbol{x}$
Conditions: $J(\boldsymbol{x}) = \|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\|^2 + \lambda\|\boldsymbol{L}\boldsymbol{x}\|^2$, $\boldsymbol{A}$: forward model matrix, $\boldsymbol{L}$: regularization matrix
Explanation

Tikhonov regularization stabilizes the ill-posed inverse problem in CT/MRI image reconstruction. Setting the gradient to zero yields the closed-form Tikhonov solution $\boldsymbol{x}^* = (\boldsymbol{A}^\top\boldsymbol{A} + \lambda\boldsymbol{L}^\top\boldsymbol{L})^{-1}\boldsymbol{A}^\top\boldsymbol{y}$.
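A minimal NumPy sketch, assuming a random stand-in for the forward model, that checks the gradient formula vanishes at the closed-form solution:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 20
A = rng.standard_normal((m, n))     # stand-in for the forward model
y = rng.standard_normal(m)          # measured data
L = np.eye(n)                       # identity regularizer (classical Tikhonov)
lam = 0.1

def grad_J(x):
    """Gradient of J(x) = ||Ax - y||^2 + lam * ||Lx||^2."""
    return 2 * A.T @ (A @ x - y) + 2 * lam * (L.T @ L @ x)

# Closed-form minimizer: the gradient vanishes at x*.
x_star = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ y)
assert np.allclose(grad_J(x_star), 0, atol=1e-9)
```

In an actual CT setting, $\boldsymbol{A}$ would be the discretized projection (Radon) operator and $\boldsymbol{L}$ typically a finite-difference matrix penalizing roughness.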

12.15 Subgradient of Total Variation (TV) Regularization

Formula: $\displaystyle\frac{\partial}{\partial \boldsymbol{x}}\text{TV}(\boldsymbol{x}) = -\text{div}\left(\displaystyle\frac{\nabla \boldsymbol{x}}{|\nabla \boldsymbol{x}|}\right)$
Conditions: $\text{TV}(\boldsymbol{x}) = \|\nabla \boldsymbol{x}\|_1$ (the gradient magnitude $|\nabla \boldsymbol{x}|$ summed over all pixels); non-differentiable where $|\nabla \boldsymbol{x}| = 0$
Explanation

Total variation regularization removes noise while preserving edges, making it well-suited for medical imaging. In practice, the non-differentiability at $|\nabla \boldsymbol{x}| = 0$ is handled by replacing the denominator with the smoothed magnitude $|\nabla \boldsymbol{x}| + \epsilon$, where $\epsilon > 0$ is a small constant.
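A minimal NumPy sketch of this smoothed subgradient on a 2-D image, assuming forward differences with replicate (Neumann) boundaries and the matching discrete divergence (names are illustrative):

```python
import numpy as np

def tv_gradient(x, eps=1e-3):
    """Smoothed TV subgradient -div(grad x / (|grad x| + eps)) of a 2-D image."""
    # Forward-difference gradient; replicating the last row/column makes the
    # difference zero at the far boundary.
    gx = np.diff(x, axis=0, append=x[-1:, :])
    gy = np.diff(x, axis=1, append=x[:, -1:])
    mag = np.sqrt(gx**2 + gy**2) + eps      # smoothed gradient magnitude
    px, py = gx / mag, gy / mag             # normalized gradient field
    # Backward-difference divergence (the negative adjoint of the gradient).
    return -(np.diff(px, axis=0, prepend=0) + np.diff(py, axis=1, prepend=0))

# One explicit TV-flow step on a random image (illustrative only; a real
# denoiser would also include a data-fidelity term).
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
img_smoother = img - 0.1 * tv_gradient(img)
```

Pairing the forward-difference gradient with the backward-difference divergence keeps the two operators adjoint to each other, so the discrete subgradient is consistent with the continuous formula above.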

Detail Pages

For detailed formulas and proofs in each area, please see the following pages: