Proofs Chapter 1: Scalar Derivatives of a Single Variable
1. Scalar Derivatives of a Single Variable
This chapter rigorously proves the differentiation formulas for scalar functions of a single variable, from the definition of the derivative through the basic rules. Many formulas in matrix calculus are derived by applying these single-variable results component-wise, which makes this chapter an essential prerequisite for the rest of the series.
This proof series adopts the denominator layout for matrix differentiation in Chapter 2 and beyond. In the denominator layout, differentiating a scalar by a vector yields a column vector, and differentiating a vector by a scalar yields a row vector. See Layout Conventions for details.
Roadmap for This Chapter
This chapter develops the theory of differentiation in the following structure. Theorems and formulas are arranged in a broadly logical order, with each result proved before it is used; the few forward references (for example, 1.19 cites 1.20, 1.21, and 1.26, and 1.21 cites 1.27) involve no circular reasoning.
- 1.1 Definition and Basic Concepts of Differentiation (1.1--1.3): Definition of the derivative, derivative function, differentiability and continuity
- 1.2 Fundamental Theorems and Identities (1.4--1.10): Pascal's identity, binomial theorem, trigonometric identities and fundamental limits
- 1.3 Fundamental Theorems of Linear Algebra (1.11--1.15): Properties of trace and determinant
- 1.4 Derivatives of Basic Functions (1.16--1.23): Constant, power, exponential, and logarithmic functions
- 1.5 Rules of Differentiation (1.24--1.29): Linearity, product, quotient, chain, and inverse function rules
- 1.6 Derivatives of Trigonometric Functions (1.30--1.33): sin, cos, tan, etc.
- 1.7 Derivatives of Inverse Trigonometric Functions (1.34--1.36): arcsin, arccos, arctan
- 1.8 Derivatives of Hyperbolic Functions (1.37--1.39): sinh, cosh, tanh
- 1.9 Other Important Differentiation Formulas (1.40--1.43): Absolute value, sigmoid, Softplus, Leibniz formula
1.1 Definition and Basic Concepts of Differentiation
The derivative of a function $f(x)$ at a point $x = a$ represents the instantaneous rate of change at that point. This is a fundamental concept in various fields, including velocity in physics (rate of change of position with respect to time) and marginal utility in economics (rate of change of utility with respect to consumption).
1.1 Definition of the Derivative at a Point
Explanation
We explain this definition geometrically.
Consider the slope of the line (secant line) connecting the points $(a, f(a))$ and $(a+h, f(a+h))$.
\begin{equation}\text{Slope of secant} = \frac{f(a+h) - f(a)}{(a+h) - a} = \frac{f(a+h) - f(a)}{h} \label{eq:1-1-1}\end{equation}
As $h \to 0$, the point $(a+h, f(a+h))$ approaches the point $(a, f(a))$ along the curve. The secant line then approaches the tangent line.
\begin{equation}f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \label{eq:1-1-2}\end{equation}
Besides $f'(a)$, the derivative is also written as $\displaystyle \frac{df}{dx}\bigg|_{x=a}$.
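As a quick numerical illustration of the limit definition (a sketch, not part of the proof; `diff_quotient` is a name introduced here), the difference quotient in $\eqref{eq:1-1-2}$ can be tabulated for shrinking $h$:

```python
# Forward difference quotient of f(x) = x**2 at a = 3; as h shrinks,
# the values approach f'(3) = 6 (illustrative only).
def diff_quotient(f, a, h):
    return (f(a + h) - f(a)) / h

f = lambda x: x**2
for h in [1e-1, 1e-3, 1e-5]:
    print(h, diff_quotient(f, 3.0, h))  # tends to 6 as h -> 0
```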
1.2 Definition of the Derivative Function
Explanation
The derivative $f'(a)$ was the value at a specific point $a$. By replacing $a$ with a variable $x$, we define the derivative as a function of $x$.
\begin{equation}f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \label{eq:1-2-1}\end{equation}
There are several notational conventions for the derivative function.
Leibniz notation: $\displaystyle \frac{df}{dx}$, $\displaystyle \frac{d}{dx}f(x)$
Lagrange notation: $f'(x)$
Newton notation: $\dot{f}$ (commonly used for time derivatives)
Higher-order derivatives are defined as follows.
\begin{equation}f''(x) = \frac{d^2f}{dx^2} = \frac{d}{dx}\left(\frac{df}{dx}\right) \label{eq:1-2-2}\end{equation}
\begin{equation}f^{(n)}(x) = \frac{d^n f}{dx^n} = \frac{d}{dx}\left(\frac{d^{n-1}f}{dx^{n-1}}\right) \label{eq:1-2-3}\end{equation}
1.3 Differentiability and Continuity
Proof
Premise: This proof uses basic properties of limits (limit laws for sums, products, and scalar multiples) as known results.
Assume that $f$ is differentiable at $a$. By the definition of differentiability, the limit
\begin{equation}f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \label{eq:1-3-1}\end{equation}
exists.
To show continuity, we need to prove that $\lim_{h \to 0} f(a+h) = f(a)$.
For $h \neq 0$, we rewrite $f(a+h) - f(a)$ as follows.
\begin{equation}f(a+h) - f(a) = \frac{f(a+h) - f(a)}{h} \cdot h \label{eq:1-3-2}\end{equation}
Taking the limit as $h \to 0$ on both sides of $\eqref{eq:1-3-2}$, by the product rule for limits $\lim (AB) = (\lim A)(\lim B)$ (when both limits exist),
\begin{equation}\lim_{h \to 0} [f(a+h) - f(a)] = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \cdot \lim_{h \to 0} h \label{eq:1-3-3}\end{equation}
By $\eqref{eq:1-3-1}$, the first factor converges to $f'(a)$ (a finite value), and the second factor converges to 0.
\begin{equation}\lim_{h \to 0} [f(a+h) - f(a)] = f'(a) \cdot 0 = 0 \label{eq:1-3-4}\end{equation}
From $\eqref{eq:1-3-4}$,
\begin{equation}\lim_{h \to 0} f(a+h) = f(a) \label{eq:1-3-5}\end{equation}
$\eqref{eq:1-3-5}$ means that $f$ is continuous at $a$.
1.2 Fundamental Theorems and Identities
This section proves fundamental theorems and identities needed for deriving differentiation formulas. These serve as the foundation referenced in subsequent proofs.
1.4 Pascal's Identity
Proof
We compute directly from the definition of binomial coefficients.
By the definition of binomial coefficients,
\begin{equation}\binom{n}{k-1} = \frac{n!}{(k-1)!(n-k+1)!}, \quad \binom{n}{k} = \frac{n!}{k!(n-k)!} \label{eq:1-4-1}\end{equation}
We compute the left-hand side, bringing both fractions over the common denominator $k!(n-k+1)!$ by multiplying the first by $k/k$ and the second by $(n-k+1)/(n-k+1)$.
\begin{equation}\binom{n}{k-1} + \binom{n}{k} = \frac{n! \cdot k}{k!(n-k+1)!} + \frac{n! \cdot (n-k+1)}{k!(n-k+1)!} \label{eq:1-4-2}\end{equation}
Simplifying the numerator,
\begin{equation}\binom{n}{k-1} + \binom{n}{k} = \frac{n! (k + n - k + 1)}{k!(n-k+1)!} = \frac{n! (n + 1)}{k!(n-k+1)!} \label{eq:1-4-3}\end{equation}
Since $n! (n + 1) = (n + 1)!$,
\begin{equation}\binom{n}{k-1} + \binom{n}{k} = \frac{(n+1)!}{k!(n+1-k)!} = \binom{n+1}{k} \label{eq:1-4-4}\end{equation}
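Pascal's identity is easy to spot-check with Python's standard library (illustrative only; the computation above is the proof):

```python
# Spot check of C(n, k-1) + C(n, k) == C(n+1, k) over a small range.
from math import comb

assert all(
    comb(n, k - 1) + comb(n, k) == comb(n + 1, k)
    for n in range(1, 20)
    for k in range(1, n + 1)
)
print("Pascal's identity holds for all n < 20")
```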
1.5 Binomial Theorem
Proof
We prove this by mathematical induction.
Base case: For $n = 0$, the left-hand side is $(x + y)^0 = 1$, and the right-hand side is $\sum_{k=0}^{0} \binom{0}{0} x^0 y^0 = 1$, which agree.
Inductive step: Assume the formula holds for $n = m$.
\begin{equation}(x + y)^m = \sum_{k=0}^{m} \binom{m}{k} x^{m-k} y^k \label{eq:1-5-1}\end{equation}
Consider the case $n = m + 1$.
\begin{equation}(x + y)^{m+1} = (x + y)(x + y)^m = (x + y) \sum_{k=0}^{m} \binom{m}{k} x^{m-k} y^k \label{eq:1-5-2}\end{equation}
Expanding $\eqref{eq:1-5-2}$,
\begin{equation}(x + y)^{m+1} = \sum_{k=0}^{m} \binom{m}{k} x^{m+1-k} y^k + \sum_{k=0}^{m} \binom{m}{k} x^{m-k} y^{k+1} \label{eq:1-5-3}\end{equation}
Substituting $j = k + 1$ in the second sum (so $k = j - 1$),
\begin{equation}\sum_{k=0}^{m} \binom{m}{k} x^{m-k} y^{k+1} = \sum_{j=1}^{m+1} \binom{m}{j-1} x^{m+1-j} y^j \label{eq:1-5-4}\end{equation}
Combining $\eqref{eq:1-5-3}$ and $\eqref{eq:1-5-4}$,
\begin{equation}(x + y)^{m+1} = \binom{m}{0} x^{m+1} + \sum_{k=1}^{m} \left[ \binom{m}{k} + \binom{m}{k-1} \right] x^{m+1-k} y^k + \binom{m}{m} y^{m+1} \label{eq:1-5-5}\end{equation}
Using Pascal's identity (1.4) $\binom{m}{k} + \binom{m}{k-1} = \binom{m+1}{k}$, along with $\binom{m}{0} = \binom{m+1}{0} = 1$ and $\binom{m}{m} = \binom{m+1}{m+1} = 1$,
\begin{equation}(x + y)^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} x^{m+1-k} y^k \label{eq:1-5-6}\end{equation}
By mathematical induction, the binomial theorem holds for all non-negative integers $n$.
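A small numerical spot check of the expansion (an illustration for a few values of $n$ and one choice of $x, y$, not a substitute for the induction):

```python
# Compare (x + y)**n with the binomial sum for n = 0..5.
from math import comb

x, y = 1.5, -0.7
for n in range(6):
    rhs = sum(comb(n, k) * x**(n - k) * y**k for k in range(n + 1))
    assert abs((x + y)**n - rhs) < 1e-12
print("binomial theorem verified for n = 0..5")
```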
1.6 Pythagorean Identity
Proof
We prove this from the definition of the unit circle.
The unit circle is a circle of radius 1 centered at the origin, with the equation
\begin{equation}x^2 + y^2 = 1 \label{eq:1-6-1}\end{equation}
The coordinates of the point on the unit circle corresponding to angle $\theta$ are defined as $(\cos\theta, \sin\theta)$.
\begin{equation}(x, y) = (\cos\theta, \sin\theta) \label{eq:1-6-2}\end{equation}
Substituting $\eqref{eq:1-6-2}$ into $\eqref{eq:1-6-1}$,
\begin{equation}\cos^2\theta + \sin^2\theta = 1 \label{eq:1-6-3}\end{equation}
Renaming the variable, for all $x \in \mathbb{R}$,
\begin{equation}\sin^2 x + \cos^2 x = 1 \label{eq:1-6-4}\end{equation}
1.7 Addition Formulas for Trigonometric Functions
Proof
We prove this using the unit circle and rotation matrices.
The point on the unit circle corresponding to angle $\theta$ is $(\cos\theta, \sin\theta)$. This equals the point $(1, 0)$ rotated about the origin by angle $\theta$.
The rotation matrix for angle $\theta$ is
\begin{equation}R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \label{eq:1-7-1}\end{equation}
By the composition of rotations, a rotation by angle $x$ followed by a rotation by angle $y$ equals a rotation by angle $x + y$.
\begin{equation}R(x + y) = R(y) R(x) \label{eq:1-7-2}\end{equation}
Computing the right-hand side of $\eqref{eq:1-7-2}$,
\begin{equation}R(y) R(x) = \begin{pmatrix} \cos y & -\sin y \\ \sin y & \cos y \end{pmatrix} \begin{pmatrix} \cos x & -\sin x \\ \sin x & \cos x \end{pmatrix} \label{eq:1-7-3}\end{equation}
Expanding the matrix product,
\begin{equation}R(y) R(x) = \begin{pmatrix} \cos y \cos x - \sin y \sin x & -\cos y \sin x - \sin y \cos x \\ \sin y \cos x + \cos y \sin x & -\sin y \sin x + \cos y \cos x \end{pmatrix} \label{eq:1-7-4}\end{equation}
The left-hand side of $\eqref{eq:1-7-2}$ is
\begin{equation}R(x + y) = \begin{pmatrix} \cos(x + y) & -\sin(x + y) \\ \sin(x + y) & \cos(x + y) \end{pmatrix} \label{eq:1-7-5}\end{equation}
Comparing the components of $\eqref{eq:1-7-4}$ and $\eqref{eq:1-7-5}$,
\begin{equation}\cos(x + y) = \cos x \cos y - \sin x \sin y \label{eq:1-7-6}\end{equation}
\begin{equation}\sin(x + y) = \sin x \cos y + \cos x \sin y \label{eq:1-7-7}\end{equation}
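The composition argument can be mirrored numerically; the sketch below, with `R` defined as in $\eqref{eq:1-7-1}$ and numpy assumed available, checks $\eqref{eq:1-7-2}$ and both addition formulas at sample angles:

```python
# Verify R(x + y) == R(y) R(x) and the resulting addition formulas.
import numpy as np

def R(t):
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

x, y = 0.4, 1.1
assert np.allclose(R(x + y), R(y) @ R(x))  # composition of rotations
assert np.isclose(np.cos(x + y), np.cos(x)*np.cos(y) - np.sin(x)*np.sin(y))
assert np.isclose(np.sin(x + y), np.sin(x)*np.cos(y) + np.cos(x)*np.sin(y))
print("addition formulas check out")
```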
1.8 Fundamental Limit of the Sine Function
Proof
We give a geometric proof using the unit circle. Consider the case $0 < x < \frac{\pi}{2}$.
On the unit circle, consider the arc, chord, and tangent corresponding to central angle $x$ (in radians). Let $O$ be the origin, $A = (1, 0)$ be a point on the unit circle, $B = (\cos x, \sin x)$ be the point corresponding to angle $x$, and $C = (1, \tan x)$ be the intersection of the tangent to the unit circle at $A$ with the extension of line $OB$.
We compare the areas of these figures.
\begin{equation}\text{Area of } \triangle OAB < \text{Area of sector } OAB < \text{Area of } \triangle OAC \label{eq:1-8-1}\end{equation}
Computing each area,
\begin{equation}\triangle OAB = \frac{1}{2} \cdot 1 \cdot \sin x = \frac{\sin x}{2} \label{eq:1-8-2}\end{equation}
\begin{equation}\text{Sector } OAB = \frac{1}{2} \cdot 1^2 \cdot x = \frac{x}{2} \label{eq:1-8-3}\end{equation}
\begin{equation}\triangle OAC = \frac{1}{2} \cdot 1 \cdot \tan x = \frac{\tan x}{2} \label{eq:1-8-4}\end{equation}
Substituting $\eqref{eq:1-8-2}$, $\eqref{eq:1-8-3}$, $\eqref{eq:1-8-4}$ into $\eqref{eq:1-8-1}$,
\begin{equation}\frac{\sin x}{2} < \frac{x}{2} < \frac{\tan x}{2} = \frac{\sin x}{2 \cos x} \label{eq:1-8-5}\end{equation}
Multiplying throughout by $\frac{2}{\sin x} > 0$ (valid since $0 < x < \frac{\pi}{2}$) and taking reciprocals,
\begin{equation}1 > \frac{\sin x}{x} > \cos x \label{eq:1-8-6}\end{equation}
That is,
\begin{equation}\cos x < \frac{\sin x}{x} < 1 \label{eq:1-8-7}\end{equation}
As $x \to 0^+$, $\cos x \to 1$, so by the squeeze theorem,
\begin{equation}\lim_{x \to 0^+} \frac{\sin x}{x} = 1 \label{eq:1-8-8}\end{equation}
Since $\frac{\sin x}{x}$ is an even function (because $\sin(-x) = -\sin x$ implies $\frac{\sin(-x)}{-x} = \frac{\sin x}{x}$), the limit is the same as $x \to 0^-$.
\begin{equation}\lim_{x \to 0} \frac{\sin x}{x} = 1 \label{eq:1-8-9}\end{equation}
1.9 Fundamental Limit of the Cosine Function
Proof
We prove this using the half-angle formula and 1.8.
By the half-angle formula,
\begin{equation}1 - \cos x = 2\sin^2\frac{x}{2} \label{eq:1-9-1}\end{equation}
Using $\eqref{eq:1-9-1}$,
\begin{equation}\frac{1 - \cos x}{x} = \frac{2\sin^2\frac{x}{2}}{x} = \frac{2\sin^2\frac{x}{2}}{2 \cdot \frac{x}{2}} = \sin\frac{x}{2} \cdot \frac{\sin\frac{x}{2}}{\frac{x}{2}} \label{eq:1-9-2}\end{equation}
As $x \to 0$, we have $\frac{x}{2} \to 0$, so by 1.8, $\frac{\sin\frac{x}{2}}{\frac{x}{2}} \to 1$. Also $\sin\frac{x}{2} \to 0$, hence
\begin{equation}\lim_{x \to 0} \frac{1 - \cos x}{x} = 0 \cdot 1 = 0 \label{eq:1-9-3}\end{equation}
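Tabulating both quotients for shrinking $x$ makes the limits 1 and 0 visible (a numerical illustration only):

```python
# sin(x)/x tends to 1 and (1 - cos x)/x tends to 0 as x -> 0.
import math

for x in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(f"x={x:.0e}  sin(x)/x={math.sin(x)/x:.8f}  "
          f"(1-cos x)/x={(1 - math.cos(x))/x:.8f}")
```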
1.10 Hyperbolic Identity
Proof
We compute directly from the definitions of the hyperbolic functions.
Computing $\cosh^2 x$,
\begin{equation}\cosh^2 x = \left( \frac{e^x + e^{-x}}{2} \right)^2 = \frac{e^{2x} + 2 + e^{-2x}}{4} \label{eq:1-10-1}\end{equation}
Computing $\sinh^2 x$,
\begin{equation}\sinh^2 x = \left( \frac{e^x - e^{-x}}{2} \right)^2 = \frac{e^{2x} - 2 + e^{-2x}}{4} \label{eq:1-10-2}\end{equation}
Subtracting $\eqref{eq:1-10-2}$ from $\eqref{eq:1-10-1}$,
\begin{equation}\cosh^2 x - \sinh^2 x = \frac{e^{2x} + 2 + e^{-2x}}{4} - \frac{e^{2x} - 2 + e^{-2x}}{4} = \frac{4}{4} = 1 \label{eq:1-10-3}\end{equation}
1.3 Fundamental Theorems of Linear Algebra
In matrix calculus, properties of the trace and determinant are frequently used. This section proves these basic properties.
1.11 Linearity of the Trace
Proof
We prove this directly from the definition of the trace.
The trace is defined as the sum of the diagonal elements.
\begin{equation}\text{tr}(\boldsymbol{A}) = \sum_{i=0}^{n-1} A_{ii} \label{eq:1-11-1}\end{equation}
The $(i, i)$ entry of $\alpha \boldsymbol{A} + \beta \boldsymbol{B}$ is $\alpha A_{ii} + \beta B_{ii}$.
\begin{equation}\text{tr}(\alpha \boldsymbol{A} + \beta \boldsymbol{B}) = \sum_{i=0}^{n-1} (\alpha A_{ii} + \beta B_{ii}) \label{eq:1-11-2}\end{equation}
By the linearity of summation,
\begin{equation}\text{tr}(\alpha \boldsymbol{A} + \beta \boldsymbol{B}) = \alpha \sum_{i=0}^{n-1} A_{ii} + \beta \sum_{i=0}^{n-1} B_{ii} = \alpha \text{tr}(\boldsymbol{A}) + \beta \text{tr}(\boldsymbol{B}) \label{eq:1-11-3}\end{equation}
1.12 Cyclic Property of the Trace
Proof
We first prove the two-matrix case $\text{tr}(\boldsymbol{AB}) = \text{tr}(\boldsymbol{BA})$, then extend to three matrices.
Let $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ and $\boldsymbol{B} \in \mathbb{R}^{n \times m}$. By the definition of matrix multiplication,
\begin{equation}(\boldsymbol{AB})_{ij} = \sum_{k=0}^{n-1} A_{ik} B_{kj} \label{eq:1-12-1}\end{equation}
Computing the trace of $\boldsymbol{AB} \in \mathbb{R}^{m \times m}$,
\begin{equation}\text{tr}(\boldsymbol{AB}) = \sum_{i=0}^{m-1} (\boldsymbol{AB})_{ii} = \sum_{i=0}^{m-1} \sum_{k=0}^{n-1} A_{ik} B_{ki} \label{eq:1-12-2}\end{equation}
Similarly, computing the trace of $\boldsymbol{BA} \in \mathbb{R}^{n \times n}$,
\begin{equation}\text{tr}(\boldsymbol{BA}) = \sum_{k=0}^{n-1} (\boldsymbol{BA})_{kk} = \sum_{k=0}^{n-1} \sum_{i=0}^{m-1} B_{ki} A_{ik} \label{eq:1-12-3}\end{equation}
Comparing $\eqref{eq:1-12-2}$ and $\eqref{eq:1-12-3}$, by interchanging the order of summation,
\begin{equation}\text{tr}(\boldsymbol{AB}) = \sum_{i=0}^{m-1} \sum_{k=0}^{n-1} A_{ik} B_{ki} = \sum_{k=0}^{n-1} \sum_{i=0}^{m-1} A_{ik} B_{ki} = \sum_{k=0}^{n-1} \sum_{i=0}^{m-1} B_{ki} A_{ik} = \text{tr}(\boldsymbol{BA}) \label{eq:1-12-4}\end{equation}
We extend to three matrices. Setting $\boldsymbol{D} = \boldsymbol{AB}$,
\begin{equation}\text{tr}(\boldsymbol{ABC}) = \text{tr}(\boldsymbol{DC}) = \text{tr}(\boldsymbol{CD}) = \text{tr}(\boldsymbol{CAB}) \label{eq:1-12-5}\end{equation}
Similarly, setting $\boldsymbol{E} = \boldsymbol{BC}$,
\begin{equation}\text{tr}(\boldsymbol{ABC}) = \text{tr}(\boldsymbol{AE}) = \text{tr}(\boldsymbol{EA}) = \text{tr}(\boldsymbol{BCA}) \label{eq:1-12-6}\end{equation}
1.13 Trace and Transpose
Proof
The diagonal elements of the transpose are the same as those of the original matrix.
By the definition of the transpose, $(\boldsymbol{A}^\top)_{ij} = A_{ji}$. In particular, for diagonal elements,
\begin{equation}(\boldsymbol{A}^\top)_{ii} = A_{ii} \label{eq:1-13-1}\end{equation}
By the definition of the trace,
\begin{equation}\text{tr}(\boldsymbol{A}^\top) = \sum_{i=0}^{n-1} (\boldsymbol{A}^\top)_{ii} = \sum_{i=0}^{n-1} A_{ii} = \text{tr}(\boldsymbol{A}) \label{eq:1-13-2}\end{equation}
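The three trace properties (1.11, 1.12, 1.13) can be spot-checked with random matrices; a minimal sketch assuming numpy is available:

```python
# Verify linearity, tr(A^T) = tr(A), and the cyclic property numerically.
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
assert np.isclose(np.trace(2*A + 3*B), 2*np.trace(A) + 3*np.trace(B))  # 1.11
assert np.isclose(np.trace(A.T), np.trace(A))                          # 1.13

# The cyclic property (1.12) also holds for rectangular factors:
P, Q = rng.standard_normal((2, 5)), rng.standard_normal((5, 2))
assert np.isclose(np.trace(P @ Q), np.trace(Q @ P))
print("trace identities verified")
```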
1.14 Determinant of a Product
Proof
We prove this using the determinant of a block matrix.
Consider the following block matrix, where $\boldsymbol{A}, \boldsymbol{B} \in \mathbb{R}^{n \times n}$.
\begin{equation}\boldsymbol{M} = \begin{pmatrix} \boldsymbol{A} & \boldsymbol{O} \\ -\boldsymbol{I} & \boldsymbol{B} \end{pmatrix} \label{eq:1-14-1}\end{equation}
Right-multiply $\boldsymbol{M}$ by the block matrix $\begin{pmatrix} \boldsymbol{I} & \boldsymbol{B} \\ \boldsymbol{O} & \boldsymbol{I} \end{pmatrix}$; this adds the first block column, multiplied on the right by $\boldsymbol{B}$, to the second block column. Since this factor is unit upper triangular with determinant 1, the operation does not change the determinant.
\begin{equation}\begin{pmatrix} \boldsymbol{A} & \boldsymbol{O} \\ -\boldsymbol{I} & \boldsymbol{B} \end{pmatrix} \begin{pmatrix} \boldsymbol{I} & \boldsymbol{B} \\ \boldsymbol{O} & \boldsymbol{I} \end{pmatrix} = \begin{pmatrix} \boldsymbol{A} & \boldsymbol{AB} \\ -\boldsymbol{I} & \boldsymbol{O} \end{pmatrix} \label{eq:1-14-2}\end{equation}
Further, left-multiply by $\begin{pmatrix} \boldsymbol{I} & \boldsymbol{A} \\ \boldsymbol{O} & \boldsymbol{I} \end{pmatrix}$, which adds the second block row, multiplied on the left by $\boldsymbol{A}$, to the first block row; this factor likewise has determinant 1.
\begin{equation}\begin{pmatrix} \boldsymbol{I} & \boldsymbol{A} \\ \boldsymbol{O} & \boldsymbol{I} \end{pmatrix} \begin{pmatrix} \boldsymbol{A} & \boldsymbol{AB} \\ -\boldsymbol{I} & \boldsymbol{O} \end{pmatrix} = \begin{pmatrix} \boldsymbol{O} & \boldsymbol{AB} \\ -\boldsymbol{I} & \boldsymbol{O} \end{pmatrix} \label{eq:1-14-3}\end{equation}
The determinant of a block triangular matrix equals the product of the determinants of the diagonal blocks.
\begin{equation}\det(\boldsymbol{M}) = \det\begin{pmatrix} \boldsymbol{A} & \boldsymbol{O} \\ -\boldsymbol{I} & \boldsymbol{B} \end{pmatrix} = \det(\boldsymbol{A}) \det(\boldsymbol{B}) \label{eq:1-14-4}\end{equation}
On the other hand, we compute the determinant of the matrix in $\eqref{eq:1-14-3}$. Interchanging the two block rows requires $n$ row transpositions (a factor of $(-1)^n$), and $\det(-\boldsymbol{I}) = (-1)^n$, so
\begin{equation}\det\begin{pmatrix} \boldsymbol{O} & \boldsymbol{AB} \\ -\boldsymbol{I} & \boldsymbol{O} \end{pmatrix} = (-1)^n \det\begin{pmatrix} -\boldsymbol{I} & \boldsymbol{O} \\ \boldsymbol{O} & \boldsymbol{AB} \end{pmatrix} = (-1)^n \cdot (-1)^n \det(\boldsymbol{AB}) = \det(\boldsymbol{AB}) \label{eq:1-14-5}\end{equation}
Since none of the operations above changes the determinant of $\boldsymbol{M}$, from $\eqref{eq:1-14-4}$ and $\eqref{eq:1-14-5}$,
\begin{equation}\det(\boldsymbol{A}) \det(\boldsymbol{B}) = \det(\boldsymbol{AB}) \label{eq:1-14-6}\end{equation}
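A numerical spot check of $\eqref{eq:1-14-6}$ with random square matrices (illustrative; numpy assumed available):

```python
# det(AB) should equal det(A) det(B) up to floating-point error.
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
print("det(AB) == det(A)det(B) verified")
```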
1.15 Determinant of the Transpose
Proof
We prove this using the Leibniz formula for determinants (A.5).
The Leibniz formula for the determinant is
\begin{equation}\det(\boldsymbol{A}) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=0}^{n-1} A_{i, \sigma(i)} \label{eq:1-15-1}\end{equation}
where $S_n$ is the set of all permutations of $\{0, 1, \ldots, n-1\}$ and $\text{sgn}(\sigma)$ is the sign of the permutation $\sigma$.
Computing the determinant of the transpose $\boldsymbol{A}^\top$, since $(\boldsymbol{A}^\top)_{ij} = A_{ji}$,
\begin{equation}\det(\boldsymbol{A}^\top) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=0}^{n-1} (\boldsymbol{A}^\top)_{i, \sigma(i)} = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=0}^{n-1} A_{\sigma(i), i} \label{eq:1-15-2}\end{equation}
We introduce the substitution $j = \sigma(i)$. Since $\sigma$ is a bijection, as $i$ ranges over $0$ to $n-1$, $j = \sigma(i)$ also takes each value from $0$ to $n-1$ exactly once. Using the inverse permutation $\sigma^{-1}$, we have $i = \sigma^{-1}(j)$.
\begin{equation}\prod_{i=0}^{n-1} A_{\sigma(i), i} = \prod_{j=0}^{n-1} A_{j, \sigma^{-1}(j)} \label{eq:1-15-3}\end{equation}
As $\sigma$ ranges over $S_n$, so does $\sigma^{-1}$. Also, $\text{sgn}(\sigma^{-1}) = \text{sgn}(\sigma)$. Substituting $\tau = \sigma^{-1}$,
\begin{equation}\det(\boldsymbol{A}^\top) = \sum_{\tau \in S_n} \text{sgn}(\tau) \prod_{j=0}^{n-1} A_{j, \tau(j)} = \det(\boldsymbol{A}) \label{eq:1-15-4}\end{equation}
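The Leibniz formula can be implemented directly and compared against numpy's determinant; `sign` and `leibniz_det` below are helper names introduced for illustration, and the $O(n! \cdot n)$ cost restricts this to tiny $n$:

```python
# Direct implementation of (1-15-1), checked against np.linalg.det,
# and used to confirm det(A^T) = det(A).
import numpy as np
from itertools import permutations

def sign(perm):
    # Sign of a permutation via its inversion count.
    inv = sum(perm[i] > perm[j]
              for i in range(len(perm)) for j in range(i + 1, len(perm)))
    return -1 if inv % 2 else 1

def leibniz_det(A):
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
assert np.isclose(leibniz_det(A), np.linalg.det(A))
assert np.isclose(leibniz_det(A.T), leibniz_det(A))  # 1.15
print("Leibniz formula and det(A^T) = det(A) verified")
```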
1.4 Derivatives of Basic Functions
Below, we derive the derivatives of basic functions directly from the definition. These results, combined with the rules of differentiation for composite functions, form the foundation for computing derivatives of more complex functions.
1.16 Derivative of a Constant Function
Proof
Let $f(x) = c$ (a constant function). We compute using the definition of the derivative.
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \label{eq:1-16-1}\end{equation}
Substituting $f(x+h) = c$ and $f(x) = c$ into $\eqref{eq:1-16-1}$,
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} \frac{0}{h} = \lim_{h \to 0} 0 = 0 \label{eq:1-16-2}\end{equation}
1.17 Derivative of the Identity Function
Proof
Let $f(x) = x$. We compute using the definition of the derivative.
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \label{eq:1-17-1}\end{equation}
Substituting $f(x+h) = x + h$ and $f(x) = x$ into $\eqref{eq:1-17-1}$,
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \frac{(x+h) - x}{h} = \lim_{h \to 0} \frac{h}{h} = \lim_{h \to 0} 1 = 1 \label{eq:1-17-2}\end{equation}
1.18 Derivative of the Power Function (Positive Integer)
Proof
Let $f(x) = x^n$ ($n$ is a positive integer). We compute using the definition of the derivative.
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \frac{(x+h)^n - x^n}{h} \label{eq:1-18-1}\end{equation}
Using the binomial theorem (1.5) to expand $(x+h)^n$,
\begin{equation}(x+h)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} h^k = x^n + nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \cdots + h^n \label{eq:1-18-2}\end{equation}
Substituting $\eqref{eq:1-18-2}$ into $\eqref{eq:1-18-1}$,
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \frac{x^n + nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \cdots + h^n - x^n}{h} \label{eq:1-18-3}\end{equation}
The $x^n$ terms cancel, and dividing each term in the numerator by $h$,
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \left[ nx^{n-1} + \binom{n}{2}x^{n-2}h + \cdots + h^{n-1} \right] \label{eq:1-18-4}\end{equation}
Taking the limit as $h \to 0$, all terms from the second onward contain positive powers of $h$ and converge to 0.
\begin{equation}\frac{d}{dx} x^n = nx^{n-1} \label{eq:1-18-5}\end{equation}
1.19 Derivative of the Power Function (General Real Exponent)
Proof
For $x > 0$, we can write $x^a = e^{a \ln x}$. We differentiate using this representation.
Remark: In analysis, the natural logarithm $\ln$ is defined as the inverse function of $e^y$, and $x^a = e^{a \ln x}$ is adopted as the definition of general real powers. This reduces the derivative of the power function to derivatives of exponential and logarithmic functions.
Let $f(x) = x^a = e^{a \ln x}$. Applying the chain rule (1.26),
\begin{equation}\frac{df}{dx} = \frac{d}{dx} e^{a \ln x} \label{eq:1-19-1}\end{equation}
Setting $u = a \ln x$, we have $f = e^u$, so
\begin{equation}\frac{df}{dx} = \frac{de^u}{du} \cdot \frac{du}{dx} \label{eq:1-19-2}\end{equation}
Using $\displaystyle \frac{d}{du} e^u = e^u$ (1.20) and $\displaystyle \frac{d}{dx}(a \ln x) = \frac{a}{x}$ (1.21),
\begin{equation}\frac{df}{dx} = e^{a \ln x} \cdot \frac{a}{x} = x^a \cdot \frac{a}{x} = a x^{a-1} \label{eq:1-19-3}\end{equation}
1.20 Derivative of the Exponential Function
Proof
Let $f(x) = e^x$. We compute using the definition of the derivative.
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} \label{eq:1-20-1}\end{equation}
Using the exponential law $e^{x+h} = e^x \cdot e^h$, we factor out $e^x$.
\begin{equation}\frac{df}{dx} = \lim_{h \to 0} \frac{e^x \cdot e^h - e^x}{h} = \lim_{h \to 0} e^x \cdot \frac{e^h - 1}{h} = e^x \cdot \lim_{h \to 0} \frac{e^h - 1}{h} \label{eq:1-20-2}\end{equation}
We compute the limit $\lim_{h \to 0} \frac{e^h - 1}{h}$. Here we use the Taylor expansion of $e^h$ as a known result (the convergence of the Taylor series and the validity of term-by-term operations are justified separately in analysis).
\begin{equation}e^h = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + \cdots \label{eq:1-20-3}\end{equation}
From $\eqref{eq:1-20-3}$,
\begin{equation}e^h - 1 = h + \frac{h^2}{2!} + \frac{h^3}{3!} + \cdots \label{eq:1-20-4}\end{equation}
Dividing both sides of $\eqref{eq:1-20-4}$ by $h$,
\begin{equation}\frac{e^h - 1}{h} = 1 + \frac{h}{2!} + \frac{h^2}{3!} + \cdots \label{eq:1-20-5}\end{equation}
Taking the limit as $h \to 0$,
\begin{equation}\lim_{h \to 0} \frac{e^h - 1}{h} = 1 \label{eq:1-20-6}\end{equation}
Substituting $\eqref{eq:1-20-6}$ into $\eqref{eq:1-20-2}$,
\begin{equation}\frac{d}{dx} e^x = e^x \cdot 1 = e^x \label{eq:1-20-7}\end{equation}
1.21 Derivative of the Natural Logarithm
Proof
Let $f(x) = \ln x$. We use the inverse function differentiation formula.
Setting $y = \ln x$, we have $x = e^y$.
By the inverse function rule (1.27),
\begin{equation}\frac{dy}{dx} = \frac{1}{\frac{dx}{dy}} \label{eq:1-21-1}\end{equation}
Differentiating $x = e^y$ with respect to $y$, by 1.20, $\displaystyle \frac{dx}{dy} = e^y$.
\begin{equation}\frac{dy}{dx} = \frac{1}{e^y} \label{eq:1-21-2}\end{equation}
Substituting $e^y = x$ (by the definition $y = \ln x$) into $\eqref{eq:1-21-2}$,
\begin{equation}\frac{d}{dx} \ln x = \frac{1}{x} \label{eq:1-21-3}\end{equation}
1.22 Derivative of the General Exponential Function
Proof
We rewrite $a^x = e^{x \ln a}$.
\begin{equation}a^x = (e^{\ln a})^x = e^{x \ln a} \label{eq:1-22-1}\end{equation}
Applying the chain rule (1.26), setting $u = x \ln a$,
\begin{equation}\frac{d}{dx} a^x = \frac{d}{dx} e^u = \frac{de^u}{du} \cdot \frac{du}{dx} \label{eq:1-22-2}\end{equation}
Using $\displaystyle \frac{d}{du} e^u = e^u$ (1.20) and $\displaystyle \frac{d}{dx}(x \ln a) = \ln a$ ($\ln a$ is a constant),
\begin{equation}\frac{d}{dx} a^x = e^{x \ln a} \cdot \ln a = a^x \ln a \label{eq:1-22-3}\end{equation}
1.23 Derivative of the General Logarithmic Function
Proof
We express $\log_a x$ in terms of the natural logarithm using the change of base formula.
\begin{equation}\log_a x = \frac{\ln x}{\ln a} \label{eq:1-23-1}\end{equation}
Since $\ln a$ is a constant,
\begin{equation}\frac{d}{dx} \log_a x = \frac{1}{\ln a} \cdot \frac{d}{dx} \ln x \label{eq:1-23-2}\end{equation}
Substituting $\displaystyle \frac{d}{dx} \ln x = \frac{1}{x}$ from 1.21,
\begin{equation}\frac{d}{dx} \log_a x = \frac{1}{\ln a} \cdot \frac{1}{x} = \frac{1}{x \ln a} \label{eq:1-23-3}\end{equation}
1.5 Rules of Differentiation
The rules of differentiation allow us to differentiate complex functions as combinations of basic functions. These rules also hold (in suitably generalized forms) in matrix calculus.
1.24 Linearity (Sum and Scalar Multiple)
Proof
Let $h(x) = af(x) + bg(x)$. We compute using the definition of the derivative.
\begin{equation}\frac{dh}{dx} = \lim_{\Delta x \to 0} \frac{h(x + \Delta x) - h(x)}{\Delta x} \label{eq:1-24-1}\end{equation}
Substituting $h(x + \Delta x) = af(x + \Delta x) + bg(x + \Delta x)$,
\begin{equation}\frac{dh}{dx} = \lim_{\Delta x \to 0} \frac{af(x + \Delta x) + bg(x + \Delta x) - af(x) - bg(x)}{\Delta x} \label{eq:1-24-2}\end{equation}
Rearranging the terms,
\begin{equation}\frac{dh}{dx} = \lim_{\Delta x \to 0} \left[ a \cdot \frac{f(x + \Delta x) - f(x)}{\Delta x} + b \cdot \frac{g(x + \Delta x) - g(x)}{\Delta x} \right] \label{eq:1-24-3}\end{equation}
By the linearity of limits,
\begin{equation}\frac{dh}{dx} = a \cdot \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} + b \cdot \lim_{\Delta x \to 0} \frac{g(x + \Delta x) - g(x)}{\Delta x} \label{eq:1-24-4}\end{equation}
By the definition of the derivative,
\begin{equation}\frac{d}{dx}[af(x) + bg(x)] = a\frac{df}{dx} + b\frac{dg}{dx} \label{eq:1-24-5}\end{equation}
1.25 Product Rule (Leibniz Rule)
Proof
Let $h(x) = f(x)g(x)$. We compute using the definition of the derivative.
\begin{equation}\frac{dh}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x)g(x + \Delta x) - f(x)g(x)}{\Delta x} \label{eq:1-25-1}\end{equation}
We add and subtract $f(x + \Delta x)g(x)$ in the numerator (a net change of zero).
\begin{equation}\frac{dh}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x)g(x + \Delta x) - f(x + \Delta x)g(x) + f(x + \Delta x)g(x) - f(x)g(x)}{\Delta x} \label{eq:1-25-2}\end{equation}
Grouping the terms,
\begin{equation}\frac{dh}{dx} = \lim_{\Delta x \to 0} \left[ f(x + \Delta x) \cdot \frac{g(x + \Delta x) - g(x)}{\Delta x} + g(x) \cdot \frac{f(x + \Delta x) - f(x)}{\Delta x} \right] \label{eq:1-25-3}\end{equation}
We now apply the limit to each term. Since $f$ is differentiable, it is continuous (1.3), so $\lim_{\Delta x \to 0} f(x + \Delta x) = f(x)$.
\begin{equation}\frac{dh}{dx} = f(x) \cdot \lim_{\Delta x \to 0} \frac{g(x + \Delta x) - g(x)}{\Delta x} + g(x) \cdot \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \label{eq:1-25-4}\end{equation}
By the definition of the derivative,
\begin{equation}\frac{d}{dx}[f(x)g(x)] = f(x)g'(x) + g(x)f'(x) = f'(x)g(x) + f(x)g'(x) \label{eq:1-25-5}\end{equation}
Source: G.W. Leibniz (1684) "Nova methodus pro maximis et minimis", Acta Eruditorum. Known as the "Leibniz rule."
1.26 Chain Rule (Derivative of Composite Functions)
Proof
Let $h(x) = f(g(x))$. Substituting $u = g(x)$, we have $h = f(u)$.
We compute using the definition of the derivative.
\begin{equation}\frac{dh}{dx} = \lim_{\Delta x \to 0} \frac{f(g(x + \Delta x)) - f(g(x))}{\Delta x} \label{eq:1-26-1}\end{equation}
Let $\Delta u = g(x + \Delta x) - g(x)$. Since $g$ is differentiable, it is continuous (1.3), so $\Delta u \to 0$ as $\Delta x \to 0$.
When $\Delta u \neq 0$, we multiply and divide by $\Delta u$. (If $\Delta u = 0$ for values of $\Delta x$ arbitrarily close to $0$, the quotient is undefined at those points; the standard remedy is to replace the difference quotient of $f$ by the auxiliary function $\varphi(\Delta u) = \frac{f(u + \Delta u) - f(u)}{\Delta u}$ for $\Delta u \neq 0$, with $\varphi(0) = f'(u)$, which is continuous at $0$ and yields the same conclusion.)
\begin{equation}\frac{dh}{dx} = \lim_{\Delta x \to 0} \frac{f(g(x) + \Delta u) - f(g(x))}{\Delta u} \cdot \frac{\Delta u}{\Delta x} \label{eq:1-26-2}\end{equation}
Setting $u = g(x)$, the first factor is
\begin{equation}\lim_{\Delta u \to 0} \frac{f(u + \Delta u) - f(u)}{\Delta u} = f'(u) = f'(g(x)) \label{eq:1-26-3}\end{equation}
The second factor is
\begin{equation}\lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x} = \lim_{\Delta x \to 0} \frac{g(x + \Delta x) - g(x)}{\Delta x} = g'(x) \label{eq:1-26-4}\end{equation}
Substituting $\eqref{eq:1-26-3}$ and $\eqref{eq:1-26-4}$ into $\eqref{eq:1-26-2}$,
\begin{equation}\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x) \label{eq:1-26-5}\end{equation}
In Leibniz notation,
\begin{equation}\frac{dh}{dx} = \frac{df}{du} \cdot \frac{du}{dx} \label{eq:1-26-6}\end{equation}
Source: Introduced by G.W. Leibniz (1684) "Nova methodus pro maximis et minimis" along with differential notation. A rigorous proof was given by A.L. Cauchy (1821) "Cours d'analyse."
1.27 Inverse Function Differentiation
Proof
Let $y = f(x)$, and denote the inverse function by $x = f^{-1}(y)$.
By definition, $f(f^{-1}(y)) = y$. Differentiating both sides with respect to $y$,
\begin{equation}\frac{d}{dy} f(f^{-1}(y)) = \frac{d}{dy} y = 1 \label{eq:1-27-1}\end{equation}
Applying the chain rule (1.26) to the left-hand side, setting $u = f^{-1}(y)$,
\begin{equation}\frac{df}{du} \cdot \frac{du}{dy} = 1 \label{eq:1-27-2}\end{equation}
Since $u = f^{-1}(y) = x$, we have $\displaystyle \frac{df}{du} = \frac{dy}{dx} = f'(x)$.
\begin{equation}f'(x) \cdot \frac{dx}{dy} = 1 \label{eq:1-27-3}\end{equation}
When $f'(x) \neq 0$, solving $\eqref{eq:1-27-3}$,
\begin{equation}\frac{dx}{dy} = \frac{1}{f'(x)} = \frac{1}{\frac{dy}{dx}} \label{eq:1-27-4}\end{equation}
1.28 Quotient Rule
Proof
Write $h(x) = \frac{f(x)}{g(x)} = f(x) \cdot [g(x)]^{-1}$ and apply the product rule.
By the product rule (1.25),
\begin{equation}\frac{dh}{dx} = f'(x) \cdot [g(x)]^{-1} + f(x) \cdot \frac{d}{dx}[g(x)]^{-1} \label{eq:1-28-1}\end{equation}
We find the derivative of $[g(x)]^{-1}$. Setting $u = g(x)$, by the chain rule (1.26),
\begin{equation}\frac{d}{dx}[g(x)]^{-1} = \frac{d}{dx} u^{-1} = \frac{d(u^{-1})}{du} \cdot \frac{du}{dx} = (-u^{-2}) \cdot g'(x) = -\frac{g'(x)}{[g(x)]^2} \label{eq:1-28-2}\end{equation}
Substituting $\eqref{eq:1-28-2}$ into $\eqref{eq:1-28-1}$,
\begin{equation}\frac{dh}{dx} = \frac{f'(x)}{g(x)} + f(x) \cdot \left(-\frac{g'(x)}{[g(x)]^2}\right) \label{eq:1-28-3}\end{equation}
Combining over a common denominator,
\begin{equation}\frac{dh}{dx} = \frac{f'(x) \cdot g(x)}{[g(x)]^2} - \frac{f(x) \cdot g'(x)}{[g(x)]^2} = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2} \label{eq:1-28-4}\end{equation}
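The product, chain, and quotient rules can be sanity-checked numerically with a central difference (a sketch; `d` is a helper introduced here):

```python
# Verify 1.25, 1.26, and 1.28 on the sample pair f = sin, g = exp.
import math

def d(fun, x, h=1e-6):
    return (fun(x + h) - fun(x - h)) / (2 * h)

f, fp = math.sin, math.cos   # f and its derivative
g, gp = math.exp, math.exp   # g and its derivative
x = 0.7
assert abs(d(lambda t: f(t)*g(t), x) - (fp(x)*g(x) + f(x)*gp(x))) < 1e-6   # 1.25
assert abs(d(lambda t: f(g(t)), x) - fp(g(x))*gp(x)) < 1e-6                # 1.26
assert abs(d(lambda t: f(t)/g(t), x)
           - (fp(x)*g(x) - f(x)*gp(x))/g(x)**2) < 1e-6                     # 1.28
print("product, chain, and quotient rules verified numerically")
```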
1.29 Logarithmic Differentiation
Proof
Let $h(x) = [f(x)]^{g(x)}$. Since $f(x) > 0$, take the natural logarithm of both sides.
\begin{equation}\ln h(x) = g(x) \ln f(x) \label{eq:1-29-1}\end{equation}
We differentiate both sides of $\eqref{eq:1-29-1}$ with respect to $x$. By the chain rule, the left-hand side is
\begin{equation}\frac{d}{dx} \ln h(x) = \frac{1}{h(x)} \cdot h'(x) = \frac{h'(x)}{h(x)} \label{eq:1-29-2}\end{equation}
The right-hand side, by the product rule, is
\begin{equation}\frac{d}{dx}[g(x) \ln f(x)] = g'(x) \ln f(x) + g(x) \cdot \frac{f'(x)}{f(x)} \label{eq:1-29-3}\end{equation}
From $\eqref{eq:1-29-2}$ and $\eqref{eq:1-29-3}$,
\begin{equation}\frac{h'(x)}{h(x)} = g'(x) \ln f(x) + g(x) \frac{f'(x)}{f(x)} \label{eq:1-29-4}\end{equation}
Multiplying both sides by $h(x) = [f(x)]^{g(x)}$,
\begin{equation}h'(x) = [f(x)]^{g(x)} \left[ g'(x) \ln f(x) + g(x) \frac{f'(x)}{f(x)} \right] \label{eq:1-29-5}\end{equation}
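A worked example: taking $f(x) = g(x) = x$ in $\eqref{eq:1-29-5}$ gives $\frac{d}{dx} x^x = x^x(\ln x + 1)$ for $x > 0$, which the sketch below confirms numerically:

```python
# Central-difference check of (x^x)' = x^x (ln x + 1) at x = 1.3.
import math

x, h = 1.3, 1e-6
numeric = ((x + h)**(x + h) - (x - h)**(x - h)) / (2 * h)
formula = x**x * (math.log(x) + 1)
assert abs(numeric - formula) < 1e-5
print(f"(x^x)' at x=1.3: numeric={numeric:.6f}, formula={formula:.6f}")
```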
1.6 Derivatives of Trigonometric Functions
We derive the differentiation formulas for trigonometric functions and their inverses. These are extensively used in Fourier analysis and signal processing, and are also needed in matrix calculus when differentiating functions involving trigonometric functions.
1.30 Derivative of the Sine Function
Proof
We compute using the definition of the derivative.
\begin{equation}\frac{d}{dx} \sin x = \lim_{h \to 0} \frac{\sin(x+h) - \sin x}{h} \label{eq:1-30-1}\end{equation}
Using the addition formula (1.7) $\sin(x+h) = \sin x \cos h + \cos x \sin h$,
\begin{equation}\frac{d}{dx} \sin x = \lim_{h \to 0} \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} \label{eq:1-30-2}\end{equation}
Rearranging the terms,
\begin{equation}\frac{d}{dx} \sin x = \lim_{h \to 0} \left[ \sin x \cdot \frac{\cos h - 1}{h} + \cos x \cdot \frac{\sin h}{h} \right] \label{eq:1-30-3}\end{equation}
Using the fundamental limits (1.8, 1.9),
\begin{equation}\lim_{h \to 0} \frac{\sin h}{h} = 1 \label{eq:1-30-4}\end{equation}
\begin{equation}\lim_{h \to 0} \frac{\cos h - 1}{h} = 0 \label{eq:1-30-5}\end{equation}
Substituting $\eqref{eq:1-30-4}$ and $\eqref{eq:1-30-5}$ into $\eqref{eq:1-30-3}$,
\begin{equation}\frac{d}{dx} \sin x = \sin x \cdot 0 + \cos x \cdot 1 = \cos x \label{eq:1-30-6}\end{equation}
1.31 Derivative of the Cosine Function
Proof
We use the identity $\cos x = \sin\left(\frac{\pi}{2} - x\right)$.
Applying the chain rule (1.26), setting $u = \displaystyle \frac{\pi}{2} - x$,
\begin{equation}\frac{d}{dx} \cos x = \frac{d}{dx} \sin u = \frac{d(\sin u)}{du} \cdot \frac{du}{dx} \label{eq:1-31-1}\end{equation}
By 1.30, $\displaystyle \frac{d(\sin u)}{du} = \cos u$. Also, $\displaystyle \frac{du}{dx} = -1$.
\begin{equation}\frac{d}{dx} \cos x = \cos u \cdot (-1) = -\cos\left(\frac{\pi}{2} - x\right) = -\sin x \label{eq:1-31-2}\end{equation}
The last equality uses $\cos\left(\frac{\pi}{2} - x\right) = \sin x$.
1.32 Derivative of the Tangent Function
Proof
We apply the quotient rule (1.28) to $\tan x = \frac{\sin x}{\cos x}$.
\begin{equation}\frac{d}{dx} \tan x = \frac{(\sin x)' \cos x - \sin x (\cos x)'}{\cos^2 x} \label{eq:1-32-1}\end{equation}
Substituting $(\sin x)' = \cos x$ and $(\cos x)' = -\sin x$ from 1.30 and 1.31,
\begin{equation}\frac{d}{dx} \tan x = \frac{\cos x \cdot \cos x - \sin x \cdot (-\sin x)}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} \label{eq:1-32-2}\end{equation}
By the Pythagorean identity (1.6) $\cos^2 x + \sin^2 x = 1$,
\begin{equation}\frac{d}{dx} \tan x = \frac{1}{\cos^2 x} = \sec^2 x \label{eq:1-32-3}\end{equation}
1.33 Derivatives of Other Trigonometric Functions
$\displaystyle \frac{d}{dx} \cot x = -\csc^2 x$
$\displaystyle \frac{d}{dx} \sec x = \sec x \tan x$
$\displaystyle \frac{d}{dx} \csc x = -\csc x \cot x$
Proof
Derivative of $\cot x$:
Applying the quotient rule to $\cot x = \frac{\cos x}{\sin x}$,
\begin{equation}\frac{d}{dx} \cot x = \frac{-\sin x \cdot \sin x - \cos x \cdot \cos x}{\sin^2 x} = \frac{-(\sin^2 x + \cos^2 x)}{\sin^2 x} = -\frac{1}{\sin^2 x} = -\csc^2 x \label{eq:1-33-1}\end{equation}
Derivative of $\sec x$:
Applying the chain rule to $\sec x = \frac{1}{\cos x} = (\cos x)^{-1}$,
\begin{equation}\frac{d}{dx} \sec x = -(\cos x)^{-2} \cdot (-\sin x) = \frac{\sin x}{\cos^2 x} = \frac{1}{\cos x} \cdot \frac{\sin x}{\cos x} = \sec x \tan x \label{eq:1-33-2}\end{equation}
Derivative of $\csc x$:
Applying the chain rule to $\csc x = \frac{1}{\sin x} = (\sin x)^{-1}$,
\begin{equation}\frac{d}{dx} \csc x = -(\sin x)^{-2} \cdot \cos x = -\frac{\cos x}{\sin^2 x} = -\frac{1}{\sin x} \cdot \frac{\cos x}{\sin x} = -\csc x \cot x \label{eq:1-33-3}\end{equation}
1.7 Derivatives of Inverse Trigonometric Functions
1.34 Derivative of the Arcsine Function
Proof
Let $y = \arcsin x$, so $x = \sin y$ with $-\frac{\pi}{2} \leq y \leq \frac{\pi}{2}$.
Applying the inverse function rule (1.27),
\begin{equation}\frac{dy}{dx} = \frac{1}{\frac{dx}{dy}} = \frac{1}{\cos y} \label{eq:1-34-1}\end{equation}
Expressing $\cos y$ in terms of $x$. From $\sin^2 y + \cos^2 y = 1$,
\begin{equation}\cos y = \pm\sqrt{1 - \sin^2 y} = \pm\sqrt{1 - x^2} \label{eq:1-34-2}\end{equation}
Since $\cos y \geq 0$ for $-\frac{\pi}{2} \leq y \leq \frac{\pi}{2}$, we take the positive square root.
\begin{equation}\cos y = \sqrt{1 - x^2} \label{eq:1-34-3}\end{equation}
Substituting $\eqref{eq:1-34-3}$ into $\eqref{eq:1-34-1}$,
\begin{equation}\frac{d}{dx} \arcsin x = \frac{1}{\sqrt{1 - x^2}} \label{eq:1-34-4}\end{equation}
1.35 Derivative of the Arccosine Function
Proof
Let $y = \arccos x$, so $x = \cos y$ with $0 \leq y \leq \pi$.
Applying the inverse function rule,
\begin{equation}\frac{dy}{dx} = \frac{1}{\frac{dx}{dy}} = \frac{1}{-\sin y} \label{eq:1-35-1}\end{equation}
Expressing $\sin y$ in terms of $x$. From $\sin^2 y + \cos^2 y = 1$,
\begin{equation}\sin y = \pm\sqrt{1 - \cos^2 y} = \pm\sqrt{1 - x^2} \label{eq:1-35-2}\end{equation}
Since $\sin y \geq 0$ for $0 \leq y \leq \pi$, we take the positive square root.
\begin{equation}\sin y = \sqrt{1 - x^2} \label{eq:1-35-3}\end{equation}
Substituting $\eqref{eq:1-35-3}$ into $\eqref{eq:1-35-1}$,
\begin{equation}\frac{d}{dx} \arccos x = -\frac{1}{\sqrt{1 - x^2}} \label{eq:1-35-4}\end{equation}
1.36 Derivative of the Arctangent Function
Proof
Let $y = \arctan x$, so $x = \tan y$ with $-\frac{\pi}{2} < y < \frac{\pi}{2}$.
Applying the inverse function rule,
\begin{equation}\frac{dy}{dx} = \frac{1}{\frac{dx}{dy}} = \frac{1}{\sec^2 y} = \cos^2 y \label{eq:1-36-1}\end{equation}
Expressing $\cos^2 y$ in terms of $x$. From $\sec^2 y = 1 + \tan^2 y$,
\begin{equation}\cos^2 y = \frac{1}{\sec^2 y} = \frac{1}{1 + \tan^2 y} = \frac{1}{1 + x^2} \label{eq:1-36-2}\end{equation}
Substituting $\eqref{eq:1-36-2}$ into $\eqref{eq:1-36-1}$,
\begin{equation}\frac{d}{dx} \arctan x = \frac{1}{1 + x^2} \label{eq:1-36-3}\end{equation}
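A central-difference check of the three inverse-trigonometric derivatives at a sample point (illustrative only):

```python
# Verify 1.34, 1.35, and 1.36 numerically at x = 0.3.
import math

def d(fun, x, h=1e-6):
    return (fun(x + h) - fun(x - h)) / (2 * h)

x = 0.3
assert abs(d(math.asin, x) - 1 / math.sqrt(1 - x**2)) < 1e-6   # 1.34
assert abs(d(math.acos, x) + 1 / math.sqrt(1 - x**2)) < 1e-6   # 1.35
assert abs(d(math.atan, x) - 1 / (1 + x**2)) < 1e-6            # 1.36
print("inverse trigonometric derivatives verified")
```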
1.8 Derivatives of Hyperbolic Functions
1.37 Derivative of the Hyperbolic Sine
Proof
We differentiate using the definition $\sinh x = \frac{e^x - e^{-x}}{2}$.
\begin{equation}\frac{d}{dx} \sinh x = \frac{d}{dx} \frac{e^x - e^{-x}}{2} = \frac{1}{2} \left( \frac{d}{dx} e^x - \frac{d}{dx} e^{-x} \right) \label{eq:1-37-1}\end{equation}
By 1.20 and the chain rule, $\displaystyle \frac{d}{dx} e^x = e^x$ and $\displaystyle \frac{d}{dx} e^{-x} = -e^{-x}$.
\begin{equation}\frac{d}{dx} \sinh x = \frac{1}{2} (e^x - (-e^{-x})) = \frac{e^x + e^{-x}}{2} = \cosh x \label{eq:1-37-2}\end{equation}
1.38 Derivative of the Hyperbolic Cosine
Proof
We differentiate using the definition $\cosh x = \frac{e^x + e^{-x}}{2}$.
\begin{equation}\frac{d}{dx} \cosh x = \frac{d}{dx} \frac{e^x + e^{-x}}{2} = \frac{1}{2} \left( \frac{d}{dx} e^x + \frac{d}{dx} e^{-x} \right) \label{eq:1-38-1}\end{equation}
\begin{equation}\frac{d}{dx} \cosh x = \frac{1}{2} (e^x + (-e^{-x})) = \frac{e^x - e^{-x}}{2} = \sinh x \label{eq:1-38-2}\end{equation}
1.39 Derivative of the Hyperbolic Tangent
Proof
We apply the quotient rule to $\tanh x = \frac{\sinh x}{\cosh x}$.
\begin{equation}\frac{d}{dx} \tanh x = \frac{(\sinh x)' \cosh x - \sinh x (\cosh x)'}{\cosh^2 x} \label{eq:1-39-1}\end{equation}
Substituting $(\sinh x)' = \cosh x$ and $(\cosh x)' = \sinh x$ from 1.37 and 1.38,
\begin{equation}\frac{d}{dx} \tanh x = \frac{\cosh x \cdot \cosh x - \sinh x \cdot \sinh x}{\cosh^2 x} = \frac{\cosh^2 x - \sinh^2 x}{\cosh^2 x} \label{eq:1-39-2}\end{equation}
By the hyperbolic identity (1.10) $\cosh^2 x - \sinh^2 x = 1$,
\begin{equation}\frac{d}{dx} \tanh x = \frac{1}{\cosh^2 x} = \text{sech}^2 x \label{eq:1-39-3}\end{equation}
Also, $\text{sech}^2 x = 1 - \tanh^2 x$ holds (since $\frac{1}{\cosh^2 x} = \frac{\cosh^2 x - \sinh^2 x}{\cosh^2 x}$).
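The form $1 - \tanh^2 x$ is the one typically used when backpropagating through a tanh activation; a quick numerical check (illustrative only):

```python
# Verify tanh'(x) = sech^2(x) = 1 - tanh^2(x) at a sample point.
import math

x, h = 0.9, 1e-6
numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
assert abs(numeric - (1 - math.tanh(x)**2)) < 1e-6
assert abs(numeric - 1 / math.cosh(x)**2) < 1e-6
print("tanh' = 1 - tanh^2 verified")
```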
1.9 Other Important Differentiation Formulas
1.40 Derivative of the Absolute Value Function
Proof
For $x \neq 0$, we can write $|x| = \sqrt{x^2}$. Applying the chain rule with $u = x^2$, we have $|x| = u^{1/2}$.
\begin{equation}\frac{d}{dx}|x| = \frac{d(u^{1/2})}{du} \cdot \frac{du}{dx} = \frac{1}{2}u^{-1/2} \cdot 2x = \frac{x}{\sqrt{x^2}} = \frac{x}{|x|} \label{eq:1-40-1}\end{equation}
For $x > 0$, $\frac{x}{|x|} = \frac{x}{x} = 1$; for $x < 0$, $\frac{x}{|x|} = \frac{x}{-x} = -1$. Hence
\begin{equation}\frac{d}{dx}|x| = \text{sgn}(x) = \begin{cases} 1 & (x > 0) \\ -1 & (x < 0) \end{cases} \label{eq:1-40-2}\end{equation}
At $x = 0$, the one-sided difference quotients equal $+1$ and $-1$, so $|x|$ is not differentiable there.
1.41 Derivative of the Sigmoid Function
Proof
We differentiate $\sigma(x) = \frac{1}{1 + e^{-x}} = (1 + e^{-x})^{-1}$ using the chain rule.
Setting $u = 1 + e^{-x}$, we have $\sigma = u^{-1}$.
\begin{equation}\frac{d\sigma}{dx} = \frac{d(u^{-1})}{du} \cdot \frac{du}{dx} = (-u^{-2}) \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^2} \label{eq:1-41-1}\end{equation}
We express this in terms of $\sigma(x)$. Since $\sigma = \frac{1}{1 + e^{-x}}$,
\begin{equation}1 - \sigma = 1 - \frac{1}{1 + e^{-x}} = \frac{e^{-x}}{1 + e^{-x}} \label{eq:1-41-2}\end{equation}
Therefore,
\begin{equation}\sigma(1 - \sigma) = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \frac{e^{-x}}{(1 + e^{-x})^2} \label{eq:1-41-3}\end{equation}
Comparing $\eqref{eq:1-41-1}$ and $\eqref{eq:1-41-3}$,
\begin{equation}\frac{d\sigma}{dx} = \sigma(1 - \sigma) \label{eq:1-41-4}\end{equation}
1.42 Derivative of the Softplus Function
Proof
We differentiate $f(x) = \ln(1 + e^x)$ (the Softplus function) using the chain rule.
Setting $u = 1 + e^x$, we have $f = \ln u$.
\begin{equation}\frac{df}{dx} = \frac{d(\ln u)}{du} \cdot \frac{du}{dx} = \frac{1}{u} \cdot e^x = \frac{e^x}{1 + e^x} \label{eq:1-42-1}\end{equation}
Multiplying numerator and denominator by $e^{-x}$,
\begin{equation}\frac{df}{dx} = \frac{e^x \cdot e^{-x}}{(1 + e^x) \cdot e^{-x}} = \frac{1}{e^{-x} + 1} = \frac{1}{1 + e^{-x}} = \sigma(x) \label{eq:1-42-2}\end{equation}
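The two identities just proved, $\sigma' = \sigma(1 - \sigma)$ (1.41) and $\frac{d}{dx}\,\text{Softplus}(x) = \sigma(x)$ (1.42), can be verified together; `sigmoid` and `softplus` are names introduced here for illustration:

```python
# Central-difference check of the sigmoid and Softplus derivatives.
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def softplus(x):
    return math.log(1 + math.exp(x))

x, h = 0.5, 1e-6
d_sig = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
d_sp = (softplus(x + h) - softplus(x - h)) / (2 * h)
assert abs(d_sig - sigmoid(x) * (1 - sigmoid(x))) < 1e-6
assert abs(d_sp - sigmoid(x)) < 1e-6
print("sigmoid and Softplus derivative identities verified")
```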
1.43 Leibniz Formula (Product of Higher-Order Derivatives)
Proof
We prove this by mathematical induction.
Base case ($n = 1$):
\begin{equation}(fg)' = f'g + fg' = \binom{1}{0}f^{(0)}g^{(1)} + \binom{1}{1}f^{(1)}g^{(0)} \label{eq:1-43-1}\end{equation}
This agrees with the product rule (1.25).
Inductive step:
Assume the formula holds for $n = m$.
\begin{equation}(fg)^{(m)} = \sum_{k=0}^{m} \binom{m}{k} f^{(k)} g^{(m-k)} \label{eq:1-43-2}\end{equation}
We show the case $n = m + 1$. Differentiating both sides of $\eqref{eq:1-43-2}$,
\begin{equation}(fg)^{(m+1)} = \sum_{k=0}^{m} \binom{m}{k} \left( f^{(k+1)} g^{(m-k)} + f^{(k)} g^{(m-k+1)} \right) \label{eq:1-43-3}\end{equation}
Shifting the index $k \to k - 1$ in the first sum of $\eqref{eq:1-43-3}$, separating the boundary terms $k = 0$ and $k = m + 1$, and applying Pascal's identity (1.4) $\binom{m}{k-1} + \binom{m}{k} = \binom{m+1}{k}$ to the combined coefficients,
\begin{equation}(fg)^{(m+1)} = \sum_{k=0}^{m+1} \binom{m+1}{k} f^{(k)} g^{(m+1-k)} \label{eq:1-43-4}\end{equation}
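A symbolic spot check of the formula for $n = 4$ with $f = \sin x$, $g = e^x$ (sympy assumed available; illustrative only):

```python
# Compare the n-th derivative of f*g with the Leibniz sum for n = 4.
import sympy as sp

x = sp.symbols('x')
f, g = sp.sin(x), sp.exp(x)
n = 4
lhs = sp.diff(f * g, x, n)
rhs = sum(sp.binomial(n, k) * sp.diff(f, x, k) * sp.diff(g, x, n - k)
          for k in range(n + 1))
assert sp.simplify(lhs - rhs) == 0
print("Leibniz formula verified for n = 4")
```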