Vector & Matrix Calculus

Formula Cheat Sheet, Proofs & Applications — 16 Chapters + 22 Applied Fields

What Is Vector/Matrix Calculus?

Vector/matrix calculus (matrix calculus) is the branch of calculus that systematically treats differentiation of functions whose arguments are scalars, vectors, or matrices. By organizing the partial derivatives into vector and matrix form, it enables concise notation for multi-dimensional gradient computations.

Intuitive Understanding

The derivative of a single-variable function $f(x)$ is the scalar $\dfrac{df}{dx}$. What happens when we differentiate an $n$-variable function $f(\boldsymbol{x})$ with respect to $\boldsymbol{x} = (x_1, \ldots, x_n)^\top$? The answer is the gradient vector (a vector of all partial derivatives):

$$\frac{\partial f}{\partial \boldsymbol{x}} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\[6pt] \vdots \\[6pt] \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$

Similarly, differentiating a vector-valued function $\boldsymbol{f}(\boldsymbol{x})$ with respect to a vector yields the Jacobian matrix, and differentiating a scalar function with respect to a matrix $\boldsymbol{X}$ yields the matrix gradient. This is the fundamental idea behind vector and matrix differentiation.
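
To make this concrete, here is a minimal NumPy sketch (the helper `numerical_gradient`, the test function, and the step size are illustrative choices, not part of this series) that builds the gradient from central differences, one partial derivative per component:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient; in denominator
    layout the result has the same shape as x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Illustrative function: f(x) = x1^2 + x2*x3, whose gradient is (2*x1, x3, x2)^T.
f = lambda x: x[0]**2 + x[1] * x[2]
x = np.array([1.0, 2.0, 3.0])
print(numerical_gradient(f, x))   # ~ [2. 3. 2.]
```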

Types of Differentiation and Result Shapes

| Function | Variable | Result shape | Name |
| --- | --- | --- | --- |
| Scalar $f$ | Vector $\boldsymbol{x} \in \mathbb{R}^n$ | $n$-dimensional vector | Gradient |
| Vector $\boldsymbol{f} \in \mathbb{R}^m$ | Vector $\boldsymbol{x} \in \mathbb{R}^n$ | $n \times m$ matrix | Jacobian matrix |
| Scalar $f$ | Vector $\boldsymbol{x}$ (second derivative) | $n \times n$ matrix | Hessian matrix |
| Scalar $f$ | Matrix $\boldsymbol{X} \in \mathbb{R}^{m \times n}$ | $m \times n$ matrix | Matrix gradient |
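
The second-derivative row can be checked numerically as well. A small sketch, assuming a finite-difference helper (`numerical_hessian` is a name chosen here for illustration): the Hessian of the quadratic form $\boldsymbol{x}^\top\boldsymbol{A}\boldsymbol{x}$ should come out as the $n \times n$ matrix $\boldsymbol{A} + \boldsymbol{A}^\top$.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Finite-difference Hessian: the n x n matrix of second partials."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

# Hessian of x^T A x is A + A^T, an n x n matrix.
A = np.array([[1.0, 2.0], [0.0, 3.0]])
x = np.array([0.5, -1.0])
H = numerical_hessian(lambda v: v @ A @ v, x)
print(np.allclose(H, A + A.T, atol=1e-4))   # True
```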

Why Is It Needed?

  • Machine Learning: Backpropagation in neural networks is matrix calculus itself — computing the gradient of the loss with respect to weight matrices $\frac{\partial L}{\partial \boldsymbol{W}}$
  • Statistics: Maximum likelihood estimation requires gradients of log-likelihoods with respect to parameter vectors. Also essential for Fisher information matrices and Gaussian processes
  • Optimization: Newton's method and quasi-Newton methods use the gradient (1st order) and Hessian (2nd order) to accelerate convergence
  • Control Engineering: Derivation of LQR (Linear Quadratic Regulator) and Kalman filters requires differentiation of quadratic forms
  • Physics: Strain energy functions in continuum mechanics, Green's function derivatives in electromagnetism, etc.

Quick-Reference Formula Table (Denominator Layout)

Below are the most frequently used formulas. For the complete collection, see the Formula Sheet.

Vector Differentiation (Scalar by Vector)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{x}}$ | Notes |
| --- | --- | --- |
| $\boldsymbol{a}^\top \boldsymbol{x}$ | $\boldsymbol{a}$ | Linear function |
| $\boldsymbol{x}^\top \boldsymbol{A} \boldsymbol{x}$ | $(\boldsymbol{A} + \boldsymbol{A}^\top)\boldsymbol{x}$ | Quadratic form ($2\boldsymbol{A}\boldsymbol{x}$ if $\boldsymbol{A}$ is symmetric) |
| $\lVert\boldsymbol{x}\rVert^2 = \boldsymbol{x}^\top\boldsymbol{x}$ | $2\boldsymbol{x}$ | Squared norm |
| $\boldsymbol{x}^\top \boldsymbol{A}^\top \boldsymbol{b}$ | $\boldsymbol{A}^\top \boldsymbol{b}$ | Linear in $\boldsymbol{x}$; equals $(\boldsymbol{A}\boldsymbol{x})^\top \boldsymbol{b}$ |
| $\lVert\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\rVert^2$ | $2\boldsymbol{A}^\top(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b})$ | Least squares; setting this to zero gives the normal equations |
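
Each row is easy to spot-check numerically. A sketch with randomly generated test data (the helper `grad` and the tolerances are illustrative assumptions, not part of the formula sheet) verifies the quadratic-form and least-squares rows:

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Central-difference gradient (denominator layout: same shape as x)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)
S = rng.standard_normal((3, 3))   # generic (non-symmetric) matrix

# Quadratic form: d/dx (x^T S x) = (S + S^T) x
print(np.allclose(grad(lambda v: v @ S @ v, x), (S + S.T) @ x, atol=1e-5))
# Least squares: d/dx ||Ax - b||^2 = 2 A^T (Ax - b)
print(np.allclose(grad(lambda v: np.sum((A @ v - b)**2), x),
                  2 * A.T @ (A @ x - b), atol=1e-5))
```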

Vector Differentiation (Vector by Vector)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{x}}$ | Notes |
| --- | --- | --- |
| $\boldsymbol{A}\boldsymbol{x}$ | $\boldsymbol{A}^\top$ | Jacobian (denominator layout; numerator layout gives $\boldsymbol{A}$) |
| $\boldsymbol{x}$ | $\boldsymbol{I}$ | Identity map |
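
The layout dependence in the first row can be made concrete: assembling a finite-difference Jacobian in denominator layout, where row $i$ holds $\partial \boldsymbol{f}/\partial x_i$, recovers $\boldsymbol{A}^\top$ rather than $\boldsymbol{A}$ (the helper `jac` is an illustrative sketch):

```python
import numpy as np

def jac(f, x, h=1e-6):
    """Denominator-layout Jacobian: row i holds df/dx_i, so the
    result is (n, m) for f: R^n -> R^m."""
    fx = f(x)
    J = np.zeros((x.size, fx.size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        J[i] = (f(x + e) - f(x - e)) / (2 * h)
    return J

A = np.arange(6, dtype=float).reshape(2, 3)   # f(x) = Ax maps R^3 -> R^2
x = np.array([1.0, -2.0, 0.5])
J = jac(lambda v: A @ v, x)
print(J.shape, np.allclose(J, A.T))           # (3, 2) True
```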

Matrix Differentiation (Scalar by Matrix)

| Function | Derivative $\dfrac{\partial}{\partial \boldsymbol{X}}$ | Notes |
| --- | --- | --- |
| $\mathrm{tr}(\boldsymbol{A}\boldsymbol{X})$ | $\boldsymbol{A}^\top$ | Trace derivative (most fundamental) |
| $\mathrm{tr}(\boldsymbol{X}^\top\boldsymbol{A}\boldsymbol{X})$ | $(\boldsymbol{A}+\boldsymbol{A}^\top)\boldsymbol{X}$ | Trace of a quadratic form |
| $\mathrm{tr}(\boldsymbol{A}\boldsymbol{X}^{-1}\boldsymbol{B})$ | $-\boldsymbol{X}^{-\top}\boldsymbol{A}^\top\boldsymbol{B}^\top\boldsymbol{X}^{-\top}$ | Trace involving an inverse |
| $\log\lvert\boldsymbol{X}\rvert$ | $\boldsymbol{X}^{-\top}$ | Log-determinant (common in MLE) |
| $\lvert\boldsymbol{X}\rvert$ | $\lvert\boldsymbol{X}\rvert\,\boldsymbol{X}^{-\top}$ | Determinant derivative |
| $\boldsymbol{X}^{-1}$ (matrix-valued) | $d(\boldsymbol{X}^{-1}) = -\boldsymbol{X}^{-1}\, d\boldsymbol{X}\, \boldsymbol{X}^{-1}$ | Differential form; the elementwise $\partial/\partial X_{kl}$ is a fourth-order tensor, so this form is preferred in practice |
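
These rows can be spot-checked numerically too; the sketch below (the helper `mgrad`, the random matrices, and the tolerances are illustrative assumptions) verifies the trace and log-determinant rows and confirms the differential form in the last row to first order:

```python
import numpy as np

def mgrad(f, X, h=1e-6):
    """Elementwise central differences: entry (k, l) is df/dX_kl."""
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, l] = h
            G[k, l] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # keeps X well-conditioned

# tr(AX) -> A^T
print(np.allclose(mgrad(lambda M: np.trace(A @ M), X), A.T, atol=1e-6))
# log|X| -> X^{-T}  (slogdet gives log|det| and avoids sign issues)
print(np.allclose(mgrad(lambda M: np.linalg.slogdet(M)[1], X),
                  np.linalg.inv(X).T, atol=1e-6))
# Differential of the inverse: (X + dX)^{-1} - X^{-1} ~ -X^{-1} dX X^{-1}
dX = 1e-6 * rng.standard_normal((3, 3))
Xinv = np.linalg.inv(X)
print(np.allclose(np.linalg.inv(X + dX) - Xinv,
                  -Xinv @ dX @ Xinv, atol=1e-10))
```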

All formulas above use the denominator layout; in the numerator layout, the results are transposed. See the Introduction for details on choosing a layout.

Denominator Layout vs. Numerator Layout

Two notational conventions coexist in matrix calculus, and different references use different ones. Results differ by a transpose, so mixing them leads to errors.

| Aspect | Denominator layout | Numerator layout |
| --- | --- | --- |
| $\dfrac{\partial f}{\partial \boldsymbol{x}}$ ($f$: scalar, $\boldsymbol{x} \in \mathbb{R}^n$) | $n \times 1$ column vector | $1 \times n$ row vector |
| Also known as | Hessian formulation (gradient convention) | Jacobian formulation |
| Dominant fields | Statistics, ML, optimization | Some engineering & physics |
| Chain rule order | Right to left: $\dfrac{\partial z}{\partial \boldsymbol{x}} = \dfrac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}\, \dfrac{\partial z}{\partial \boldsymbol{y}}$ | Left to right: $\dfrac{\partial z}{\partial \boldsymbol{x}} = \dfrac{\partial z}{\partial \boldsymbol{y}}\, \dfrac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$ |
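
The shape bookkeeping behind the two chain-rule orders is easiest to see on a concrete composition; here is a small denominator-layout example with $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}$ and $z = \boldsymbol{y}^\top\boldsymbol{y}$ (an illustrative choice, not from the series):

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)   # y = Ax maps R^3 -> R^2
x = np.array([1.0, 2.0, -1.0])
y = A @ x

# Denominator layout, right to left:
#   dz/dx = (dy/dx)(dz/dy), with shapes (3, 2) @ (2,) -> (3,)
dy_dx = A.T        # Jacobian of y = Ax in denominator layout
dz_dy = 2 * y      # gradient of z = y^T y with respect to y
dz_dx = dy_dx @ dz_dy
print(np.allclose(dz_dx, 2 * A.T @ (A @ x)))  # True: matches d/dx ||Ax||^2
```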

This series adopts the denominator layout. For a detailed comparison, see the Introduction. For layout trends across ~60 disciplines, see Layout Conventions by Field.

What Makes This Series Unique

  • 16-chapter formula collection — Covers scalar, vector, and matrix differentiation exhaustively (trace, determinant, inverse, eigenvalue, norm, structured matrices, complex matrices)
  • Full proofs for every formula — Rigorous derivations using component-wise computation and cofactor expansion in the Proofs collection (16 chapters + 3 Lie group chapters)
  • 22 applied fields — Concrete applications from statistics and machine learning to molecular dynamics, robotics, and financial engineering
  • Both layout conventions — A conversion table between denominator and numerator layouts in Appendix A of the Formula Sheet
  • Lie group differentiation — Extended coverage of rotation matrices $SO(3)$ and rigid transformations $SE(3)$

Series Contents

  1. Introduction to Vector/Matrix Calculus

    Why it is needed, what it can do, and choosing a layout convention

  2. Vector/Matrix Calculus Formula Sheet

    Complete 16-chapter collection of differentiation formulas for scalars, vectors, and matrices

  3. Proofs of Vector/Matrix Calculus Formulas

    Detailed derivations (16 chapters + 3 Lie group chapters + 4 application chapters)

  4. Applications to Machine Learning

    Gradient formulas for neural networks, SVD, Fisher information, reinforcement learning, and NLP

  5. Applications to Statistics (hub)

    Foundational distributions, latent variables, variance components, spatial, neural, and information geometry — 60+ formulas across 6 themes

  6. Introduction to Tensor Calculus

    Generalization to higher-order tensors

  7. Automatic Differentiation and Optimization

    Connection with deep learning frameworks

  8. Layout Conventions by Field

    Notation trends across approximately 60 disciplines

Prerequisites

  • Fundamentals of linear algebra (matrix operations, transpose, inverse, determinant)
  • Fundamentals of calculus (partial derivatives, chain rule)
  • Basics of vector analysis (gradient)