Vector & Matrix Calculus

Formula Cheat Sheet, Proofs & Applications — 16 Chapters + 22 Applied Fields

What Is Vector/Matrix Calculus?

Vector/matrix calculus (matrix calculus) is the branch of calculus that systematically handles differentiation of functions whose variables are scalars, vectors, or matrices. By organizing partial derivatives into vector and matrix form, it enables concise notation for multi-dimensional gradient computations.

Intuitive Understanding

The derivative of a single-variable function $f(x)$ is the scalar $\dfrac{df}{dx}$. What happens when we differentiate an $n$-variable function $f(\boldsymbol{x})$ with respect to $\boldsymbol{x} = (x_1, \ldots, x_n)^\top$? The answer is the gradient vector (a vector of all partial derivatives):

$$\frac{\partial f}{\partial \boldsymbol{x}} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\[6pt] \vdots \\[6pt] \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$

Similarly, differentiating a vector-valued function $\boldsymbol{f}(\boldsymbol{x})$ with respect to a vector yields the Jacobian matrix, and differentiating a scalar function with respect to a matrix $\boldsymbol{X}$ yields the matrix gradient. This is the fundamental idea behind vector and matrix differentiation.
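The gradient definition above is easy to check numerically. The following sketch (an illustration using NumPy, not part of the formula sheet) approximates each partial derivative of $f(\boldsymbol{x}) = \boldsymbol{x}^\top\boldsymbol{x}$ by central finite differences and compares the stacked result with the analytic gradient $2\boldsymbol{x}$:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of df/dx, one entry per component of x."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

x = np.array([1.0, -2.0, 3.0])
f = lambda v: v @ v                      # f(x) = x^T x
assert np.allclose(numerical_gradient(f, x), 2 * x, atol=1e-6)
```

Stacking the $n$ partial derivatives into a single array is exactly the gradient-vector construction shown above.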

Types of Differentiation and Result Shapes

| Function | Variable | Result Shape | Name |
|---|---|---|---|
| Scalar $f$ | Vector $\boldsymbol{x} \in \mathbb{R}^n$ | $n$-dim vector | Gradient |
| Vector $\boldsymbol{f} \in \mathbb{R}^m$ | Vector $\boldsymbol{x} \in \mathbb{R}^n$ | $n \times m$ matrix | Jacobian matrix |
| Scalar $f$ (2nd derivative) | Vector $\boldsymbol{x}$ | $n \times n$ matrix | Hessian matrix |
| Scalar $f$ | Matrix $\boldsymbol{X} \in \mathbb{R}^{m \times n}$ | $m \times n$ matrix | Matrix gradient |
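The Hessian row of the table can also be verified numerically. A small sketch (illustrative only; the random matrix $A$ and step size are arbitrary choices): for $f(\boldsymbol{x}) = \boldsymbol{x}^\top \boldsymbol{A} \boldsymbol{x}$ with $\boldsymbol{x} \in \mathbb{R}^3$, second differences should recover the $3 \times 3$ Hessian $\boldsymbol{A} + \boldsymbol{A}^\top$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
f = lambda v: v @ A @ v                  # quadratic form x^T A x

# Second-order finite differences; exact up to rounding for a quadratic.
eps = 1e-4
I = np.eye(3)
H = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        H[i, j] = (f(x + eps * (I[i] + I[j])) - f(x + eps * I[i])
                   - f(x + eps * I[j]) + f(x)) / eps**2

assert H.shape == (3, 3)                 # n x n, as the table states
assert np.allclose(H, A + A.T, atol=1e-3)
```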

Why Is It Needed?

  • Machine Learning: Backpropagation in neural networks is matrix calculus itself — computing the gradient of the loss with respect to weight matrices $\frac{\partial L}{\partial \boldsymbol{W}}$
  • Statistics: Maximum likelihood estimation requires gradients of log-likelihoods with respect to parameter vectors. Also essential for Fisher information matrices and Gaussian processes
  • Optimization: Newton's method and quasi-Newton methods use the gradient (1st order) and Hessian (2nd order) to accelerate convergence
  • Control Engineering: Derivation of LQR (Linear Quadratic Regulator) and Kalman filters requires differentiation of quadratic forms
  • Physics: Strain energy functions in continuum mechanics, Green's function derivatives in electromagnetism, etc.

Quick-Reference Formula Table (Denominator Layout)

Below are the most frequently used formulas. For the complete collection, see the Formula Sheet.

Vector Differentiation (Scalar by Vector)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{x}}$ | Notes |
|---|---|---|
| $\boldsymbol{a}^\top \boldsymbol{x}$ | $\boldsymbol{a}$ | Linear function |
| $\boldsymbol{x}^\top \boldsymbol{A} \boldsymbol{x}$ | $(\boldsymbol{A} + \boldsymbol{A}^\top)\boldsymbol{x}$ | Quadratic form ($2\boldsymbol{A}\boldsymbol{x}$ if $\boldsymbol{A}$ is symmetric) |
| $\Vert\boldsymbol{x}\Vert^2 = \boldsymbol{x}^\top\boldsymbol{x}$ | $2\boldsymbol{x}$ | Squared norm |
| $\boldsymbol{x}^\top \boldsymbol{A}^\top \boldsymbol{b}$ | $\boldsymbol{A}^\top \boldsymbol{b}$ | Affine transformation |
| $\Vert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\Vert^2$ | $2\boldsymbol{A}^\top(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b})$ | Normal equation (least squares) |
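As a sanity check on the last row (an illustrative sketch; the random shapes are arbitrary), the least-squares gradient $2\boldsymbol{A}^\top(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b})$ can be compared against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

f = lambda v: float(np.sum((A @ v - b) ** 2))   # f(x) = ||Ax - b||^2
analytic = 2 * A.T @ (A @ x - b)                 # table entry

# Central differences along each coordinate direction.
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
assert np.allclose(numeric, analytic, atol=1e-4)
```

Setting this gradient to zero gives the normal equations $\boldsymbol{A}^\top\boldsymbol{A}\boldsymbol{x} = \boldsymbol{A}^\top\boldsymbol{b}$, which is why the table labels this row "least squares."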

Vector Differentiation (Vector by Vector)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{x}}$ | Notes |
|---|---|---|
| $\boldsymbol{A}\boldsymbol{x}$ | $\boldsymbol{A}^\top$ | Jacobian |
| $\boldsymbol{x}$ | $\boldsymbol{I}$ | Identity map |
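In the denominator layout, the derivative of $\boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{A}\boldsymbol{x}$ has entry $(i, j) = \partial f_j / \partial x_i$, giving an $n \times m$ matrix equal to $\boldsymbol{A}^\top$. A quick numerical illustration (assuming NumPy; the specific shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))          # f(x) = Ax maps R^3 -> R^4
x = rng.standard_normal(3)

# Denominator-layout derivative: row i holds d f / d x_i, so J is n x m.
eps = 1e-6
J = np.zeros((3, 4))
for i in range(3):
    e = np.zeros(3)
    e[i] = eps
    J[i, :] = (A @ (x + e) - A @ (x - e)) / (2 * eps)

assert np.allclose(J, A.T, atol=1e-6)    # matches the table entry A^T
```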

Matrix Differentiation (Scalar by Matrix)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{X}}$ | Notes |
|---|---|---|
| $\mathrm{tr}(\boldsymbol{A}\boldsymbol{X})$ | $\boldsymbol{A}^\top$ | Trace derivative (most fundamental) |
| $\mathrm{tr}(\boldsymbol{X}^\top\boldsymbol{A}\boldsymbol{X})$ | $(\boldsymbol{A}+\boldsymbol{A}^\top)\boldsymbol{X}$ | Trace of quadratic form |
| $\mathrm{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B})$ | $-\boldsymbol{X}^{-\top}\boldsymbol{A}^\top\boldsymbol{B}^\top\boldsymbol{X}^{-\top}$ | Trace involving inverse |
| $\log\lvert\boldsymbol{X}\rvert$ | $\boldsymbol{X}^{-\top}$ | Log-determinant (common in MLE) |
| $\lvert\boldsymbol{X}\rvert$ | $\lvert\boldsymbol{X}\rvert\,\boldsymbol{X}^{-\top}$ | Determinant derivative |
| $\boldsymbol{X}^{-1}$ (matrix-valued) | $d(\boldsymbol{X}^{-1}) = -\boldsymbol{X}^{-1}\, d\boldsymbol{X}\, \boldsymbol{X}^{-1}$ | Differential of the inverse (the full derivative is a 4th-order tensor, so the differential form is stated) |
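The log-determinant row is a good one to verify, since it appears constantly in maximum likelihood work. A finite-difference sketch (illustrative; the matrix is shifted by $3\boldsymbol{I}$ only to keep it well-conditioned) comparing element-wise differences against $\boldsymbol{X}^{-\top}$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # well-conditioned test matrix

f = lambda M: float(np.log(abs(np.linalg.det(M))))
analytic = np.linalg.inv(X).T                      # table entry X^{-T}

# Perturb one entry of X at a time (central differences).
eps = 1e-6
numeric = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X)
        E[i, j] = eps
        numeric[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

assert np.allclose(numeric, analytic, atol=1e-5)
```

Note that the matrix gradient has the same shape as $\boldsymbol{X}$ itself, as the shape table earlier states.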

All formulas above use the denominator layout (also known as the Hessian formulation). In the numerator layout, results are transposed. See the Introduction for details on choosing a layout.

Denominator Layout vs. Numerator Layout

Two notational conventions coexist in matrix calculus, and different references use different ones. Results differ by a transpose, so mixing them leads to errors.

| Aspect | Denominator Layout | Numerator Layout |
|---|---|---|
| $\dfrac{\partial f}{\partial \boldsymbol{x}}$ ($f$: scalar, $\boldsymbol{x} \in \mathbb{R}^n$) | $n \times 1$ column vector | $1 \times n$ row vector |
| Also known as | Hessian formulation (gradient layout) | Jacobian formulation |
| Dominant fields | Statistics, ML, optimization | Some engineering & physics |
| Chain rule order | Right to left | Left to right |

This series adopts the denominator layout. For a detailed comparison, see the Introduction. For layout trends across ~60 disciplines, see Layout Conventions by Field.
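The "chain rule order" row can be made concrete with a small sketch (illustrative; the function and shapes are arbitrary choices). For $f(\boldsymbol{x}) = \Vert\boldsymbol{A}\boldsymbol{x}\Vert^2$ written as $f(\boldsymbol{g})$ with $\boldsymbol{g} = \boldsymbol{A}\boldsymbol{x}$, the denominator-layout chain rule composes the factors in reverse order, $\partial f/\partial \boldsymbol{x} = (\partial \boldsymbol{g}/\partial \boldsymbol{x})(\partial f/\partial \boldsymbol{g})$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

# Denominator-layout chain rule: inner derivative on the left.
g = A @ x
dg_dx = A.T             # n x m Jacobian of g = Ax (denominator layout)
df_dg = 2 * g           # gradient of ||g||^2 with respect to g
grad = dg_dx @ df_dg    # n-vector: 2 A^T A x

assert np.allclose(grad, 2 * A.T @ A @ x)
```

In the numerator layout the same computation would read left to right, $(\partial f/\partial \boldsymbol{g})(\partial \boldsymbol{g}/\partial \boldsymbol{x}) = 2\boldsymbol{g}^\top \boldsymbol{A}$, which is the transpose of the result above.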

What Makes This Series Unique

  • 16-chapter formula collection — Covers scalar, vector, and matrix differentiation exhaustively (trace, determinant, inverse, eigenvalue, norm, structured matrices, complex matrices)
  • Full proofs for every formula — Rigorous derivations using component-wise computation and cofactor expansion in the Proofs collection (16 chapters + 3 Lie group chapters)
  • 22 applied fields — Concrete applications from statistics and machine learning to molecular dynamics, robotics, and financial engineering
  • Both layout conventions — A conversion table between denominator and numerator layouts in Appendix A of the Formula Sheet
  • Lie group differentiation — Extended coverage of rotation matrices $SO(3)$ and rigid transformations $SE(3)$

Series Contents

  1. Chapter 1 Introduction to Vector/Matrix Calculus

    Why it is needed, what it can do, and choosing a layout convention

  2. Chapter 2 Vector/Matrix Calculus Formula Sheet

    Complete 16-chapter collection of differentiation formulas for scalars, vectors, and matrices

  3. Chapter 3 Proofs of Vector/Matrix Calculus Formulas

    Detailed derivations (16 chapters + 3 Lie group chapters + 4 application chapters)

  4. Chapter 4 Introduction to Tensor Calculus

    Generalization to higher-order tensors

  5. Chapter 5 Automatic Differentiation and Optimization

    Connection with deep learning frameworks

  6. Appendix Layout Conventions by Field

    Notation trends across approximately 60 disciplines

Prerequisites

  • Fundamentals of linear algebra (matrix operations, transpose, inverse, determinant)
  • Fundamentals of calculus (partial derivatives, chain rule)
  • Basics of vector analysis (gradient)