Vector & Matrix Calculus
Formula Cheat Sheet, Proofs & Applications — 16 Chapters + 22 Applied Fields
What Is Vector/Matrix Calculus?
Vector/matrix calculus (matrix calculus) is the field that systematically treats the differentiation of functions whose variables are scalars, vectors, or matrices. By organizing partial derivatives into vector and matrix form, it enables concise notation for multi-dimensional gradient computations.
Intuitive Understanding
The derivative of a single-variable function $f(x)$ is the scalar $\dfrac{df}{dx}$. What happens when we differentiate an $n$-variable function $f(\boldsymbol{x})$ with respect to $\boldsymbol{x} = (x_1, \ldots, x_n)^\top$? The answer is the gradient vector (a vector of all partial derivatives):
$$\frac{\partial f}{\partial \boldsymbol{x}} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\[6pt] \vdots \\[6pt] \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$

Similarly, differentiating a vector-valued function $\boldsymbol{f}(\boldsymbol{x})$ with respect to a vector yields the Jacobian matrix, and differentiating a scalar function with respect to a matrix $\boldsymbol{X}$ yields the matrix gradient. This is the fundamental idea behind vector and matrix differentiation.
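The gradient definition above can be checked numerically. Here is a minimal NumPy sketch, where `numerical_gradient` is an illustrative helper (not a library function) that builds the gradient vector one partial derivative at a time:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of df/dx, returned as a vector
    of all partial derivatives (denominator layout)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, 2.0, 3.0])
f = lambda v: v @ v          # f(x) = ||x||^2, whose gradient is 2x
print(numerical_gradient(f, x))   # ≈ [2. 4. 6.]
```

Each entry of the result is one partial derivative $\partial f / \partial x_i$, stacked exactly as in the column vector above.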
Types of Differentiation and Result Shapes
| Function | Variable | Result Shape | Name |
|---|---|---|---|
| Scalar $f$ | Vector $\boldsymbol{x} \in \mathbb{R}^n$ | $n$-dim vector | Gradient |
| Vector $\boldsymbol{f} \in \mathbb{R}^m$ | Vector $\boldsymbol{x} \in \mathbb{R}^n$ | $n \times m$ matrix | Jacobian matrix |
| Scalar $f$ | Vector $\boldsymbol{x}$ (2nd derivative) | $n \times n$ matrix | Hessian matrix |
| Scalar $f$ | Matrix $\boldsymbol{X} \in \mathbb{R}^{m \times n}$ | $m \times n$ matrix | Matrix gradient |
Why Is It Needed?
- Machine Learning: Backpropagation in neural networks is matrix calculus at work, computing the gradient $\frac{\partial L}{\partial \boldsymbol{W}}$ of the loss with respect to each weight matrix
- Statistics: Maximum likelihood estimation requires gradients of log-likelihoods with respect to parameter vectors. Also essential for Fisher information matrices and Gaussian processes
- Optimization: Newton's method and quasi-Newton methods use the gradient (1st order) and Hessian (2nd order) to accelerate convergence
- Control Engineering: Derivation of LQR (Linear Quadratic Regulator) and Kalman filters requires differentiation of quadratic forms
- Physics: Strain energy functions in continuum mechanics, Green's function derivatives in electromagnetism, etc.
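As an illustration of the machine-learning case above, the following minimal NumPy sketch (shapes and names are illustrative) checks the matrix-calculus gradient of a linear layer's squared-error loss against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # batch of 5 inputs, 3 features
W = rng.standard_normal((3, 2))   # weight matrix
Y = rng.standard_normal((5, 2))   # targets

# Loss L(W) = ||XW - Y||_F^2 ; matrix calculus gives dL/dW = 2 X^T (XW - Y)
analytic = 2 * X.T @ (X @ W - Y)

# Finite-difference check, entry by entry
loss = lambda M: np.sum((X @ M - Y) ** 2)
h = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        E = np.zeros_like(W)
        E[i, j] = h
        numeric[i, j] = (loss(W + E) - loss(W - E)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```

Note that the gradient has the same shape as $\boldsymbol{W}$ itself, which is exactly what a gradient-descent update $\boldsymbol{W} \leftarrow \boldsymbol{W} - \eta \frac{\partial L}{\partial \boldsymbol{W}}$ requires.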
Quick-Reference Formula Table (Denominator Layout)
Below are the most frequently used formulas. For the complete collection, see the Formula Sheet.
Vector Differentiation (Scalar by Vector)
| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{x}}$ | Notes |
|---|---|---|
| $\boldsymbol{a}^\top \boldsymbol{x}$ | $\boldsymbol{a}$ | Linear function |
| $\boldsymbol{x}^\top \boldsymbol{A} \boldsymbol{x}$ | $(\boldsymbol{A} + \boldsymbol{A}^\top)\boldsymbol{x}$ | Quadratic form ($2\boldsymbol{A}\boldsymbol{x}$ if $\boldsymbol{A}$ is symmetric) |
| $\|\boldsymbol{x}\|^2 = \boldsymbol{x}^\top\boldsymbol{x}$ | $2\boldsymbol{x}$ | Squared norm |
| $\boldsymbol{x}^\top \boldsymbol{A}^\top \boldsymbol{b}$ | $\boldsymbol{A}^\top \boldsymbol{b}$ | Affine transformation |
| $\|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\|^2$ | $2\boldsymbol{A}^\top(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b})$ | Normal equation (least squares) |
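The scalar-by-vector rules above are easy to verify numerically. A minimal NumPy sketch, with `num_grad` as an illustrative finite-difference helper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
a = rng.standard_normal(n)
x = rng.standard_normal(n)

def num_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# a^T x  ->  a   (linear function)
print(np.allclose(num_grad(lambda v: a @ v, x), a, atol=1e-4))                   # True
# x^T A x  ->  (A + A^T) x   (quadratic form, A not assumed symmetric)
print(np.allclose(num_grad(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-4))   # True
```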
Vector Differentiation (Vector by Vector)
| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{x}}$ | Notes |
|---|---|---|
| $\boldsymbol{A}\boldsymbol{x}$ | $\boldsymbol{A}^\top$ | Jacobian |
| $\boldsymbol{x}$ | $\boldsymbol{I}$ | Identity map |
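The $\boldsymbol{A}^\top$ entry reflects the denominator-layout indexing $\left(\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}}\right)_{ij} = \frac{\partial f_j}{\partial x_i}$, which can be confirmed with a finite-difference Jacobian (a minimal sketch, shapes illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))   # f(x) = A x maps R^4 -> R^3
x = rng.standard_normal(4)

# Denominator layout: J[i, j] = d f_j / d x_i, giving a 4x3 matrix
h = 1e-6
J = np.zeros((4, 3))
for i in range(4):
    e = np.zeros(4)
    e[i] = h
    J[i] = (A @ (x + e) - A @ (x - e)) / (2 * h)

print(np.allclose(J, A.T))  # True: the A^T of the table
```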
Matrix Differentiation (Scalar by Matrix)
| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{X}}$ | Notes |
|---|---|---|
| $\mathrm{tr}(\boldsymbol{A}\boldsymbol{X})$ | $\boldsymbol{A}^\top$ | Trace derivative (most fundamental) |
| $\mathrm{tr}(\boldsymbol{X}^\top\boldsymbol{A}\boldsymbol{X})$ | $(\boldsymbol{A}+\boldsymbol{A}^\top)\boldsymbol{X}$ | Trace of quadratic form |
| $\mathrm{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B})$ | $-\boldsymbol{X}^{-\top}\boldsymbol{A}^\top\boldsymbol{B}^\top\boldsymbol{X}^{-\top}$ | Trace involving inverse |
| $\log|\boldsymbol{X}|$ | $\boldsymbol{X}^{-\top}$ | Log-determinant (common in MLE) |
| $|\boldsymbol{X}|$ | $|\boldsymbol{X}|\,\boldsymbol{X}^{-\top}$ | Determinant derivative |
| $\boldsymbol{X}^{-1}$ (matrix-valued) | $d(\boldsymbol{X}^{-1}) = -\boldsymbol{X}^{-1}\, d\boldsymbol{X}\, \boldsymbol{X}^{-1}$ | Differential of the inverse (an identity, not a gradient) |
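The trace and log-determinant rules in the table can be checked numerically. A minimal NumPy sketch (`num_grad_mat` is an illustrative helper; $\boldsymbol{X}$ is made positive definite so that $|\boldsymbol{X}| > 0$):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)       # symmetric positive definite, so |X| > 0

def num_grad_mat(f, X, h=1e-6):
    """Central-difference gradient of a scalar f with respect to a matrix X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

# tr(AX) -> A^T
print(np.allclose(num_grad_mat(lambda M: np.trace(A @ M), X), A.T, atol=1e-4))
# log|X| -> X^{-T}  (slogdet gives log|det X| in a numerically stable way)
print(np.allclose(num_grad_mat(lambda M: np.linalg.slogdet(M)[1], X),
                  np.linalg.inv(X).T, atol=1e-4))
```

Both checks print `True`; the log-determinant rule is the workhorse behind maximum likelihood for Gaussian covariance matrices.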
All formulas above use the denominator layout; in the numerator layout, each result is transposed. See the Introduction for details on choosing a layout.
Denominator Layout vs. Numerator Layout
Two notational conventions coexist in matrix calculus, and different references use different ones. Their results differ by a transpose, so mixing them leads to errors.
| Aspect | Denominator Layout | Numerator Layout |
|---|---|---|
| $\dfrac{\partial f}{\partial \boldsymbol{x}}$ ($f$: scalar, $\boldsymbol{x} \in \mathbb{R}^n$) | $n \times 1$ column vector | $1 \times n$ row vector |
| Also known as | Hessian formulation, gradient layout | Jacobian formulation |
| Dominant fields | Statistics, ML, optimization | Some engineering & physics |
| Chain rule order | Inner to outer (outermost factor on the right) | Outer to inner (outermost factor on the left) |
This series adopts the denominator layout. For a detailed comparison, see the Introduction. For layout trends across ~60 disciplines, see Layout Conventions by Field.
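The transpose relationship between the two layouts can be made concrete with the composite $z = \boldsymbol{b}^\top(\boldsymbol{A}\boldsymbol{x})$. A minimal NumPy sketch (the names `denom` and `numer` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))    # inner map: y = A x, R^4 -> R^3
b = rng.standard_normal(3)         # outer map: z = b^T y (scalar)

# Denominator layout: dz/dx = (dy/dx)(dz/dy) = A^T b, a 4x1 column;
# the outermost derivative sits on the right.
denom = A.T @ b.reshape(3, 1)

# Numerator layout: dz/dx = (dz/dy)(dy/dx) = b^T A, a 1x4 row;
# the outermost derivative sits on the left, as in scalar calculus.
numer = b.reshape(1, 3) @ A

print(denom.shape, numer.shape)        # (4, 1) (1, 4)
print(np.allclose(denom, numer.T))     # True: the layouts differ by a transpose
```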
What Makes This Series Unique
- 16-chapter formula collection — Covers scalar, vector, and matrix differentiation exhaustively (trace, determinant, inverse, eigenvalue, norm, structured matrices, complex matrices)
- Full proofs for every formula — Rigorous derivations using component-wise computation and cofactor expansion in the Proofs collection (16 chapters + 3 Lie group chapters)
- 22 applied fields — Concrete applications from statistics and machine learning to molecular dynamics, robotics, and financial engineering
- Both layout conventions — A conversion table between denominator and numerator layouts in Appendix A of the Formula Sheet
- Lie group differentiation — Extended coverage of rotation matrices $SO(3)$ and rigid transformations $SE(3)$
Series Contents
- Chapter 1: Introduction to Vector/Matrix Calculus (why it is needed, what it can do, and choosing a layout convention)
- Chapter 2: Vector/Matrix Calculus Formula Sheet (complete 16-chapter collection of differentiation formulas for scalars, vectors, and matrices)
- Chapter 3: Proofs of Vector/Matrix Calculus Formulas (detailed derivations: 16 chapters + 3 Lie group chapters + 4 application chapters)
- Chapter 4: Introduction to Tensor Calculus (generalization to higher-order tensors)
- Chapter 5: Automatic Differentiation and Optimization (connection with deep learning frameworks)
- Appendix: Layout Conventions by Field (notation trends across approximately 60 disciplines)
Prerequisites
- Fundamentals of linear algebra (matrix operations, transpose, inverse, determinant)
- Fundamentals of calculus (partial derivatives, chain rule)
- Basics of vector analysis (gradient)