Vector & Matrix Calculus

Formula Cheat Sheet, Proofs & Applications — 16 Chapters + 22 Applied Fields

What Is Vector/Matrix Calculus?

Vector/matrix calculus (matrix calculus) is the branch of calculus that systematically treats differentiation of functions whose arguments are scalars, vectors, or matrices. By organizing the partial derivatives into vector and matrix form, it enables concise notation for multi-dimensional gradient computations.

Intuitive Understanding

The derivative of a single-variable function $f(x)$ is the scalar $\dfrac{df}{dx}$. What happens when we differentiate an $n$-variable function $f(\boldsymbol{x})$ with respect to $\boldsymbol{x} = (x_1, \ldots, x_n)^\top$? The answer is the gradient vector (a vector of all partial derivatives):

$$\frac{\partial f}{\partial \boldsymbol{x}} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\[6pt] \vdots \\[6pt] \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$

Similarly, differentiating a vector-valued function $\boldsymbol{f}(\boldsymbol{x})$ with respect to a vector yields the Jacobian matrix, and differentiating a scalar function with respect to a matrix $\boldsymbol{X}$ yields the matrix gradient. This is the fundamental idea behind vector and matrix differentiation.
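
To make this concrete, here is a minimal NumPy sketch (the helper `numerical_gradient`, the test function, and the step size are illustrative choices, not part of this series) that builds the gradient from central differences, one partial derivative per component:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient; in denominator
    layout the result has the same shape as x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Illustrative function: f(x) = x1^2 + x2*x3, whose gradient is (2*x1, x3, x2)^T.
f = lambda x: x[0]**2 + x[1] * x[2]
x = np.array([1.0, 2.0, 3.0])
print(numerical_gradient(f, x))   # ~ [2. 3. 2.]
```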

Types of Differentiation and Result Shapes

| Function | Variable | Result shape | Name |
| --- | --- | --- | --- |
| Scalar $f$ | Vector $\boldsymbol{x} \in \mathbb{R}^n$ | $n$-dimensional vector | Gradient |
| Vector $\boldsymbol{f} \in \mathbb{R}^m$ | Vector $\boldsymbol{x} \in \mathbb{R}^n$ | $n \times m$ matrix | Jacobian matrix |
| Scalar $f$ | Vector $\boldsymbol{x}$ (second derivative) | $n \times n$ matrix | Hessian matrix |
| Scalar $f$ | Matrix $\boldsymbol{X} \in \mathbb{R}^{m \times n}$ | $m \times n$ matrix | Matrix gradient |
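
The second-derivative row can be checked numerically as well. A small sketch, assuming a finite-difference helper (`numerical_hessian` is a name chosen here for illustration): the Hessian of the quadratic form $\boldsymbol{x}^\top\boldsymbol{A}\boldsymbol{x}$ should come out as the $n \times n$ matrix $\boldsymbol{A} + \boldsymbol{A}^\top$.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Finite-difference Hessian: the n x n matrix of second partials."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

# Hessian of x^T A x is A + A^T, an n x n matrix.
A = np.array([[1.0, 2.0], [0.0, 3.0]])
x = np.array([0.5, -1.0])
H = numerical_hessian(lambda v: v @ A @ v, x)
print(np.allclose(H, A + A.T, atol=1e-4))   # True
```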

Why Is It Needed?

  • Machine Learning: Backpropagation in neural networks is matrix calculus itself — computing the gradient of the loss with respect to weight matrices $\frac{\partial L}{\partial \boldsymbol{W}}$
  • Statistics: Maximum likelihood estimation requires gradients of log-likelihoods with respect to parameter vectors. Also essential for Fisher information matrices and Gaussian processes
  • Optimization: Newton's method and quasi-Newton methods use the gradient (1st order) and Hessian (2nd order) to accelerate convergence
  • Control Engineering: Derivation of LQR (Linear Quadratic Regulator) and Kalman filters requires differentiation of quadratic forms
  • Physics: Strain energy functions in continuum mechanics, Green's function derivatives in electromagnetism, etc.

Quick-Reference Formula Table (Denominator Layout)

Below are the most frequently used formulas. For the complete collection, see the Formula Sheet.

Vector Differentiation (Scalar by Vector)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{x}}$ | Notes |
| --- | --- | --- |
| $\boldsymbol{a}^\top \boldsymbol{x}$ | $\boldsymbol{a}$ | Linear function |
| $\boldsymbol{x}^\top \boldsymbol{A} \boldsymbol{x}$ | $(\boldsymbol{A} + \boldsymbol{A}^\top)\boldsymbol{x}$ | Quadratic form ($2\boldsymbol{A}\boldsymbol{x}$ if $\boldsymbol{A}$ is symmetric) |
| $\lVert\boldsymbol{x}\rVert^2 = \boldsymbol{x}^\top\boldsymbol{x}$ | $2\boldsymbol{x}$ | Squared norm |
| $\boldsymbol{x}^\top \boldsymbol{A}^\top \boldsymbol{b}$ | $\boldsymbol{A}^\top \boldsymbol{b}$ | Linear in $\boldsymbol{x}$; equals $(\boldsymbol{A}\boldsymbol{x})^\top \boldsymbol{b}$ |
| $\lVert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\rVert^2$ | $2\boldsymbol{A}^\top(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b})$ | Least squares; setting this to zero gives the normal equations |
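
Each row is easy to spot-check numerically. A sketch with randomly generated test data (the helper `grad` and the tolerances are illustrative assumptions, not part of the formula sheet) verifies the quadratic-form and least-squares rows:

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Central-difference gradient (denominator layout: same shape as x)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)
S = rng.standard_normal((3, 3))   # generic (non-symmetric) matrix

# Quadratic form: d/dx (x^T S x) = (S + S^T) x
print(np.allclose(grad(lambda v: v @ S @ v, x), (S + S.T) @ x, atol=1e-5))
# Least squares: d/dx ||Ax - b||^2 = 2 A^T (Ax - b)
print(np.allclose(grad(lambda v: np.sum((A @ v - b)**2), x),
                  2 * A.T @ (A @ x - b), atol=1e-5))
```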

Vector Differentiation (Vector by Vector)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{x}}$ | Notes |
| --- | --- | --- |
| $\boldsymbol{A}\boldsymbol{x}$ | $\boldsymbol{A}^\top$ | Jacobian (denominator layout; numerator layout gives $\boldsymbol{A}$) |
| $\boldsymbol{x}$ | $\boldsymbol{I}$ | Identity map |
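
The layout dependence in the first row can be made concrete: assembling a finite-difference Jacobian in denominator layout, where row $i$ holds $\partial \boldsymbol{f}/\partial x_i$, recovers $\boldsymbol{A}^\top$ rather than $\boldsymbol{A}$ (the helper `jac` is an illustrative sketch):

```python
import numpy as np

def jac(f, x, h=1e-6):
    """Denominator-layout Jacobian: row i holds df/dx_i, so the
    result is (n, m) for f: R^n -> R^m."""
    fx = f(x)
    J = np.zeros((x.size, fx.size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        J[i] = (f(x + e) - f(x - e)) / (2 * h)
    return J

A = np.arange(6, dtype=float).reshape(2, 3)   # f(x) = Ax maps R^3 -> R^2
x = np.array([1.0, -2.0, 0.5])
J = jac(lambda v: A @ v, x)
print(J.shape, np.allclose(J, A.T))           # (3, 2) True
```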

Matrix Differentiation (Scalar by Matrix)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{X}}$ | Notes |
| --- | --- | --- |
| $\mathrm{tr}(\boldsymbol{A}\boldsymbol{X})$ | $\boldsymbol{A}^\top$ | Trace derivative (most fundamental) |
| $\mathrm{tr}(\boldsymbol{X}^\top\boldsymbol{A}\boldsymbol{X})$ | $(\boldsymbol{A}+\boldsymbol{A}^\top)\boldsymbol{X}$ | Trace of a quadratic form |
| $\mathrm{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B})$ | $-\boldsymbol{X}^{-\top}\boldsymbol{A}^\top\boldsymbol{B}^\top\boldsymbol{X}^{-\top}$ | Trace involving an inverse |
| $\log\lvert\boldsymbol{X}\rvert$ | $\boldsymbol{X}^{-\top}$ | Log-determinant (common in MLE) |
| $\lvert\boldsymbol{X}\rvert$ | $\lvert\boldsymbol{X}\rvert\,\boldsymbol{X}^{-\top}$ | Determinant derivative |
| $\boldsymbol{X}^{-1}$ (matrix-valued) | $d(\boldsymbol{X}^{-1}) = -\boldsymbol{X}^{-1}\, d\boldsymbol{X}\, \boldsymbol{X}^{-1}$ | Differential form; the elementwise $\partial/\partial X_{kl}$ is a fourth-order tensor, so this form is preferred in practice |
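
These rows can be spot-checked numerically too; the sketch below (the helper `mgrad`, the random matrices, and the tolerances are illustrative assumptions) verifies the trace and log-determinant rows and confirms the differential form in the last row to first order:

```python
import numpy as np

def mgrad(f, X, h=1e-6):
    """Elementwise central differences: entry (k, l) is df/dX_kl."""
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, l] = h
            G[k, l] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # keeps X well-conditioned

# tr(AX) -> A^T
print(np.allclose(mgrad(lambda M: np.trace(A @ M), X), A.T, atol=1e-6))
# log|X| -> X^{-T}  (slogdet gives log|det| and avoids sign issues)
print(np.allclose(mgrad(lambda M: np.linalg.slogdet(M)[1], X),
                  np.linalg.inv(X).T, atol=1e-6))
# Differential of the inverse: (X + dX)^{-1} - X^{-1} ~ -X^{-1} dX X^{-1}
dX = 1e-6 * rng.standard_normal((3, 3))
Xinv = np.linalg.inv(X)
print(np.allclose(np.linalg.inv(X + dX) - Xinv,
                  -Xinv @ dX @ Xinv, atol=1e-10))
```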

All formulas above use the denominator layout; in the numerator layout, the results are transposed. See the Introduction for details on choosing a layout.

Denominator Layout vs. Numerator Layout

Two notational conventions coexist in matrix calculus, and different references use different ones. Results differ by a transpose, so mixing them leads to errors.

| Aspect | Denominator layout | Numerator layout |
| --- | --- | --- |
| $\dfrac{\partial f}{\partial \boldsymbol{x}}$ ($f$: scalar, $\boldsymbol{x} \in \mathbb{R}^n$) | $n \times 1$ column vector | $1 \times n$ row vector |
| Also known as | Hessian formulation (gradient convention) | Jacobian formulation |
| Dominant fields | Statistics, ML, optimization | Some engineering & physics |
| Chain rule order | Right to left: $\dfrac{\partial z}{\partial \boldsymbol{x}} = \dfrac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}\, \dfrac{\partial z}{\partial \boldsymbol{y}}$ | Left to right: $\dfrac{\partial z}{\partial \boldsymbol{x}} = \dfrac{\partial z}{\partial \boldsymbol{y}}\, \dfrac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$ |
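
The shape bookkeeping behind the two chain-rule orders is easiest to see on a concrete composition; here is a small denominator-layout example with $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}$ and $z = \boldsymbol{y}^\top\boldsymbol{y}$ (an illustrative choice, not from the series):

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)   # y = Ax maps R^3 -> R^2
x = np.array([1.0, 2.0, -1.0])
y = A @ x

# Denominator layout, right to left:
#   dz/dx = (dy/dx)(dz/dy), with shapes (3, 2) @ (2,) -> (3,)
dy_dx = A.T        # Jacobian of y = Ax in denominator layout
dz_dy = 2 * y      # gradient of z = y^T y with respect to y
dz_dx = dy_dx @ dz_dy
print(np.allclose(dz_dx, 2 * A.T @ (A @ x)))  # True: matches d/dx ||Ax||^2
```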

This series adopts the denominator layout. For a detailed comparison, see the Introduction. For layout trends across ~60 disciplines, see Layout Conventions by Field.

What Makes This Series Unique

  • 16-chapter formula collection — Covers scalar, vector, and matrix differentiation exhaustively (trace, determinant, inverse, eigenvalue, norm, structured matrices, complex matrices)
  • Full proofs for every formula — Rigorous derivations using component-wise computation and cofactor expansion in the Proofs collection (16 chapters + 3 Lie group chapters)
  • 22 applied fields — Concrete applications from statistics and machine learning to molecular dynamics, robotics, and financial engineering
  • Both layout conventions — A conversion table between denominator and numerator layouts in Appendix A of the Formula Sheet
  • Lie group differentiation — Extended coverage of rotation matrices $SO(3)$ and rigid transformations $SE(3)$

Series Contents

  1. Introduction to Vector/Matrix Calculus

    Why it is needed, what it can do, and choosing a layout convention

  2. Vector/Matrix Calculus Formula Sheet

    Complete 16-chapter collection of differentiation formulas for scalars, vectors, and matrices

  3. Proofs of Vector/Matrix Calculus Formulas

    Detailed derivations (16 chapters + 3 Lie group chapters + 4 application chapters)

  4. Applications to Machine Learning

    Gradient formulas for neural networks, SVD, Fisher information, reinforcement learning, and NLP

  5. Applications to Statistics (hub)

    Foundational distributions, latent variables, variance components, spatial, neural, and information geometry — 60+ formulas across 6 themes

  6. Introduction to Tensor Calculus

    Generalization to higher-order tensors

  7. Automatic Differentiation and Optimization

    Connection with deep learning frameworks

  8. Layout Conventions by Field

    Notation trends across approximately 60 disciplines

Prerequisites

  • Fundamentals of linear algebra (matrix operations, transpose, inverse, determinant)
  • Fundamentals of calculus (partial derivatives, chain rule)
  • Basics of vector analysis (gradient)