Applications to Machine Learning
All formulas on this page use the denominator layout convention. See Layout Conventions for details.
Table of Contents
- Activation Functions (6.9-6.10)
- Fully Connected Layers (17.1-17.3)
- Normalization Layers (17.4-17.7)
- Attention Mechanisms (17.8-17.12)
- Convolution & Pooling (17.13-17.17)
- Regularization (17.18-17.19)
- VAE (17.20-17.22)
- 18.1 Gradient of SVD Backpropagation
- 18.2 Gradient of Singular Values
- 18.3 Definition of the Fisher Information Matrix
- 18.4 Hessian Representation of the Fisher Information Matrix
- 18.5 Natural Gradient
- 18.13 Policy Gradient Theorem
- 18.14 Policy Gradient with Baseline
- 18.18 Gradient of Skip-gram (Negative Sampling)
- 18.19 Gradient of GloVe
- 18.15 InfoNCE Loss Function
- 18.18 Gradient of Cholesky Decomposition
- 18.13 Sinkhorn Distance
- 18.16 Gaussian Processes
- 18.23 Belief Propagation
- 18.26 Dictionary Learning & LASSO
Computer Vision
3.14 Differentiation of the Homography Matrix
Explanation
Gradient of the transformed point coordinates under a 2D projective transformation (homography) $\boldsymbol{H} \in \mathbb{R}^{3 \times 3}$. Used in image registration, panoramic stitching, and augmented reality.
The homogeneous coordinates $\tilde{\boldsymbol{p}} = \boldsymbol{H}\boldsymbol{p}$ are normalized to obtain image coordinates $\boldsymbol{p}' = (x', y')^\top = (\tilde{p}_1/\tilde{p}_3, \tilde{p}_2/\tilde{p}_3)^\top$.
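The normalization step is where the nonlinearity enters, so a numerical check of the derivative can be useful. The following is a minimal sketch, assuming $\boldsymbol{p} = (x, y, 1)^\top$ and a row-major flattening of $\boldsymbol{H}$ into 9 parameters; the function names and the sample values are illustrative, not taken from this page. The array built here is the $2 \times 9$ Jacobian; in the denominator layout used on this page the corresponding object is its transpose.

```python
import numpy as np

def project(H, p):
    """Apply the homography and normalize by the third homogeneous coordinate."""
    q = H @ p
    return q[:2] / q[2]

def homography_jacobian(H, p):
    """2x9 Jacobian of p' with respect to the row-major entries of H."""
    q = H @ p
    x, y = q[:2] / q[2]
    # Derivative of the normalization (q1/q3, q2/q3) with respect to q.
    A = np.array([[1.0, 0.0, -x],
                  [0.0, 1.0, -y]]) / q[2]
    # Derivative of q = H p with respect to the row-major entries of H.
    B = np.kron(np.eye(3), p[None, :])  # shape (3, 9)
    return A @ B

def numerical_jacobian(H, p, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    J = np.zeros((2, 9))
    for k in range(9):
        dH = np.zeros(9)
        dH[k] = eps
        dH = dH.reshape(3, 3)
        J[:, k] = (project(H + dH, p) - project(H - dH, p)) / (2 * eps)
    return J

# Illustrative values (assumed, not from the source).
H = np.array([[1.1, 0.02, 5.0],
              [-0.03, 0.95, -2.0],
              [1e-4, 2e-4, 1.0]])
p = np.array([10.0, 20.0, 1.0])

J_analytic = homography_jacobian(H, p)
J_numeric = numerical_jacobian(H, p)
print(np.max(np.abs(J_analytic - J_numeric)))
```

The chain rule splits the computation into the derivative of the normalization and the derivative of the linear map, which keeps each factor easy to verify on its own.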
Medical Image Reconstruction
12.13 Gradient of Tikhonov Regularization
Explanation
Regularization of the inverse problem in CT/MRI image reconstruction. Setting the gradient to zero yields the Tikhonov solution $\boldsymbol{x}^* = (\boldsymbol{A}^\top\boldsymbol{A} + \lambda\boldsymbol{L}^\top\boldsymbol{L})^{-1}\boldsymbol{A}^\top\boldsymbol{y}$.
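The closed-form solution can be checked numerically. The sketch below assumes the objective $\tfrac{1}{2}\|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}\|^2 + \tfrac{\lambda}{2}\|\boldsymbol{L}\boldsymbol{x}\|^2$, whose gradient is $\boldsymbol{A}^\top(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{y}) + \lambda\boldsymbol{L}^\top\boldsymbol{L}\boldsymbol{x}$; this scaling is an assumption consistent with the stated solution. The random $\boldsymbol{A}$, the first-difference $\boldsymbol{L}$, and the function names are illustrative stand-ins, not a real CT/MRI forward model.

```python
import numpy as np

def tikhonov_gradient(A, L, y, x, lam):
    """Gradient of 0.5*||Ax - y||^2 + 0.5*lam*||Lx||^2 (assumed scaling)."""
    return A.T @ (A @ x - y) + lam * (L.T @ (L @ x))

def tikhonov_solve(A, L, y, lam):
    """Solve (A^T A + lam L^T L) x = A^T y without forming an explicit inverse."""
    return np.linalg.solve(A.T @ A + lam * (L.T @ L), A.T @ y)

# Illustrative problem (assumed, not from the source).
rng = np.random.default_rng(0)
m, n = 30, 20
A = rng.standard_normal((m, n))
L = np.diff(np.eye(n), axis=0)   # (n-1) x n first-difference operator
y = rng.standard_normal(m)
lam = 0.1

x_star = tikhonov_solve(A, L, y, lam)
grad_norm = np.linalg.norm(tikhonov_gradient(A, L, y, x_star, lam))
print(grad_norm)  # should be close to zero at the minimizer
```

Using `np.linalg.solve` rather than inverting the matrix is generally the more stable choice; for large reconstruction problems an iterative solver would typically replace the dense solve.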
12.15 Subgradient of Total Variation (TV) Regularization
Explanation
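Total variation regularization removes noise while preserving edges, making it well-suited for medical imaging. Because the subgradient involves division by $|\nabla \boldsymbol{x}|$, which vanishes in flat regions, in practice $|\nabla \boldsymbol{x}|$ is smoothed as $|\nabla \boldsymbol{x}| + \epsilon$ ($\epsilon > 0$ is a small constant).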
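To illustrate the smoothing, here is a minimal 1D sketch. It assumes one common reading of the $\epsilon$ smoothing, namely that $|\nabla \boldsymbol{x}| + \epsilon$ replaces $|\nabla \boldsymbol{x}|$ in the denominator of the subgradient; other variants such as $\sqrt{|\nabla \boldsymbol{x}|^2 + \epsilon}$ are also used. The function name and the test signals are illustrative.

```python
import numpy as np

def tv_smoothed_subgradient(x, eps=1e-3):
    """Smoothed subgradient of the 1D total variation sum_i |x[i+1] - x[i]|.

    Assumption: sign(d) is replaced by d / (|d| + eps), which stays
    finite where the forward difference d is zero.
    """
    d = np.diff(x)               # forward differences, length n-1
    w = d / (np.abs(d) + eps)    # smoothed sign of each difference
    g = np.zeros_like(x)
    g[:-1] -= w                  # each difference depends on x[i] with weight -1
    g[1:] += w                   # and on x[i+1] with weight +1
    return g

# A flat signal: the unsmoothed expression would divide by zero here.
x_flat = np.ones(5)
g_flat = tv_smoothed_subgradient(x_flat)

# A single step edge: the result approaches D^T sign(Dx) as eps -> 0.
x_step = np.array([0.0, 0.0, 1.0, 1.0])
g_step = tv_smoothed_subgradient(x_step)
print(g_flat, g_step)
```

A larger $\epsilon$ gives a better-conditioned but blurrier penalty, so the value is usually chosen relative to the intensity scale of the image.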
Detail Pages
For detailed formulas and proofs in each area, please see the following pages: