Glossary of Mathematical Optimization
This glossary organizes key terms that frequently appear in optimization theory, grouped by topic. Each term includes a definition and links to related pages.
Fundamentals
- Objective function
- The function $f(\mathbf{x})$ to be minimized (or maximized). Also called a cost function or loss function.
- Feasible region
- The set of all points that satisfy every constraint. Also called the feasible set.
- Global optimum
- A point that minimizes the objective function over the entire feasible region.
- Local optimum
- A point that minimizes the objective function within some neighborhood. It is not necessarily a global optimum.
- Gradient
- The vector of partial derivatives of a multivariate function: $\nabla f(x) = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)^T$. It points in the direction of steepest ascent. → Introduction: Gradient
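As a quick sanity check on this definition, an analytic gradient can be compared against central finite differences. A minimal Python sketch, using the illustrative function $f(x, y) = x^2 + 3y^2$:

```python
def f(v):
    # Illustrative function: f(x, y) = x^2 + 3*y^2
    x, y = v
    return x**2 + 3 * y**2

def grad_f(v):
    # Analytic gradient: (df/dx, df/dy) = (2x, 6y)
    x, y = v
    return [2 * x, 6 * y]

def numerical_grad(func, v, h=1e-6):
    # Central differences approximate each partial derivative of func at v
    g = []
    for i in range(len(v)):
        vp = list(v); vp[i] += h
        vm = list(v); vm[i] -= h
        g.append((func(vp) - func(vm)) / (2 * h))
    return g
```

Agreement between the two to several digits is a standard check that a hand-derived gradient is correct.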
- Hessian matrix
- The symmetric matrix of second-order partial derivatives: $\nabla^2 f(x) = \left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]$. It captures curvature information and is used in second-order optimality conditions and Newton's method. → Intermediate: Newton's Method
- Jacobian matrix
- The $m \times n$ matrix of first-order partial derivatives of a vector-valued function $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^m$. The gradient is the transpose of the Jacobian of a scalar function.
- Stationary point
- A point satisfying $\nabla f(x^*) = 0$. It may be a local minimum, a local maximum, or a saddle point.
- Saddle point
- A stationary point that is a local minimum in some directions and a local maximum in others. It corresponds to the case where the Hessian is indefinite (having both positive and negative eigenvalues).
Convex Analysis
- Convex set
- A set in which the line segment between any two points lies within the set: $\theta x + (1-\theta)y \in C$ for all $x, y \in C$ and $0 \le \theta \le 1$. → Introduction: Convex Sets and Convex Functions
- Convex function
- A function satisfying $f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$ for all $x, y$ in its (convex) domain and $\theta \in [0, 1]$. Any local minimum is also a global minimum.
- Strongly convex function
- A function satisfying $f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{m}{2}\|y-x\|^2$ for some parameter $m > 0$. It guarantees the existence of a unique global minimizer and ensures convergence rate bounds for gradient descent.
- Convex cone
- A set that is convex and satisfies $\alpha x \in C$ for all $x \in C$ and $\alpha \ge 0$. The set of positive semidefinite matrices $\mathbb{S}^n_+$ is a representative example.
- Epigraph
- The region above the graph of a function $f$: $\text{epi}\,f = \{(x, t) \mid f(x) \le t\}$. $f$ is convex $\Leftrightarrow$ $\text{epi}\,f$ is a convex set. It is a fundamental tool in convex analysis.
- Subgradient
- For a non-smooth convex function $f$, a vector $g$ satisfying $f(y) \ge f(x) + g^T(y-x)$ ($\forall y$). At points where $f$ is differentiable, the only subgradient is the gradient. The subdifferential $\partial f(x)$ is the set of all subgradients.
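A concrete case: for $f(x) = |x|$ at $x = 0$, the subgradient inequality $f(y) \ge f(0) + g\,y$ holds exactly for $g \in [-1, 1]$, so $\partial f(0) = [-1, 1]$. A small numerical illustration:

```python
def f(x):
    # Non-smooth convex function: f(x) = |x|, not differentiable at 0
    return abs(x)

ys = [-2.0 + 0.01 * k for k in range(401)]   # test points in [-2, 2]

# Every g in [-1, 1] satisfies the subgradient inequality |y| >= g*y at x = 0 ...
valid = all(f(y) >= g * y for g in (-1.0, -0.3, 0.0, 0.7, 1.0) for y in ys)

# ... while a slope outside [-1, 1] violates it for some y (e.g. y = 1, g = 1.5)
invalid = all(f(y) >= 1.5 * y for y in ys)
```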
- Conjugate function (Fenchel conjugate)
- $f^*(y) = \sup_x \{y^T x - f(x)\}$. A dual representation of a convex function that forms the basis of duality theory.
- Proximal operator
- $\text{prox}_f(x) = \arg\min_u \left\{f(u) + \frac{1}{2}\|u - x\|^2\right\}$. It is used in optimization problems with non-smooth regularization terms (ISTA, ADMM, etc.).
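A standard worked instance: for $f(u) = \lambda\|u\|_1$ the proximal operator has a closed form known as soft-thresholding, applied componentwise. A minimal sketch:

```python
def prox_l1(x, lam):
    # Soft-thresholding: the closed-form proximal operator of
    # f(u) = lam * sum_i |u_i|, applied to each component of x
    out = []
    for xi in x:
        if xi > lam:
            out.append(xi - lam)
        elif xi < -lam:
            out.append(xi + lam)
        else:
            out.append(0.0)   # components with |x_i| <= lam shrink exactly to 0
    return out
```

Components with magnitude below $\lambda$ are set exactly to zero, which is why this operator appears throughout sparse-regression solvers such as ISTA.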
- Lipschitz continuity
- A function $f$ is said to be $L$-Lipschitz continuous if there exists a constant $L > 0$ such that $\|f(x) - f(y)\| \le L\|x - y\|$. Lipschitz continuity of the gradient is important in convergence rate analysis.
Optimality Conditions
- First-order necessary condition
- For an unconstrained problem with differentiable $f$: if $x^*$ is a local minimizer, then $\nabla f(x^*) = 0$.
- Second-order sufficient condition
- If $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succ 0$ (positive definite), then $x^*$ is a strict local minimizer.
- KKT conditions (Karush-Kuhn-Tucker conditions)
- First-order necessary conditions for constrained optimization. They consist of four conditions:
- Stationarity: $\nabla f + \sum_i \lambda_i \nabla g_i + \sum_j \mu_j \nabla h_j = 0$
- Primal feasibility: $g_i(x) \le 0,\; h_j(x) = 0$
- Dual feasibility: $\lambda_i \ge 0$
- Complementary slackness: $\lambda_i g_i(x) = 0$
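The four conditions can be checked by hand on a toy problem: minimize $x^2 + y^2$ subject to $x + y \ge 1$, written as $g(x, y) = 1 - x - y \le 0$, with candidate solution $(1/2, 1/2)$ and multiplier $\lambda = 1$ (an illustrative example, verified below):

```python
# Toy problem: min x^2 + y^2  s.t.  g(x, y) = 1 - x - y <= 0.
# Candidate solution and multiplier (assumed, then verified):
x, y, lam = 0.5, 0.5, 1.0

grad_f = (2 * x, 2 * y)      # gradient of the objective
grad_g = (-1.0, -1.0)        # gradient of the constraint
g = 1 - x - y                # constraint value (0 means active)

stationarity = all(abs(gf + lam * gg) < 1e-12 for gf, gg in zip(grad_f, grad_g))
primal_feasibility = g <= 1e-12
dual_feasibility = lam >= 0
complementary_slackness = abs(lam * g) < 1e-12
```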
- Constraint qualification
- A regularity condition on the constraints under which the KKT conditions are necessary. Examples include the Linear Independence Constraint Qualification (LICQ) and Slater's condition.
- Complementary slackness
- $\lambda_i g_i(x^*) = 0$: at the optimal solution, if an inequality constraint is not active ($g_i < 0$), then its multiplier is $\lambda_i = 0$. Conversely, if $\lambda_i > 0$, the constraint holds with equality (active constraint).
Duality Theory
- Lagrangian
- $L(x, \lambda, \mu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \mu_j h_j(x)$. A tool for incorporating constraints into the objective function. → Introduction: Lagrange Method
- Lagrange multiplier
- Dual variables $\lambda_i, \mu_j$ corresponding to the constraints. They represent the rate of change of the optimal objective value when the constraint is slightly relaxed (shadow price).
- Dual problem
- $\max_{\lambda \ge 0,\, \mu} \min_x L(x, \lambda, \mu)$. It provides a lower bound on the primal problem. → Basic: Duality Theory
- Weak duality
- The optimal value of the dual problem $d^*$ is at most the optimal value of the primal problem $p^*$: $d^* \le p^*$. This always holds.
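Weak duality can be observed directly on a one-dimensional example. For $\min x^2$ s.t. $x \ge 1$ (so $p^* = 1$), minimizing the Lagrangian $x^2 + \lambda(1 - x)$ over $x$ gives the dual function $q(\lambda) = \lambda - \lambda^2/4$, and every $q(\lambda)$ with $\lambda \ge 0$ lower-bounds $p^*$:

```python
# Toy primal problem: min x^2  s.t.  x >= 1, with optimal value p* = 1.
p_star = 1.0

def dual(lam):
    # Dual function: q(lam) = min_x { x^2 + lam*(1 - x) }, attained at x = lam/2
    return lam - lam**2 / 4.0

lams = [0.1 * k for k in range(51)]        # sample multipliers in [0, 5]
lower_bounds = [dual(l) for l in lams]
d_star = max(lower_bounds)                 # best lower bound over the samples
```

Here the maximum $q(2) = 1$ matches $p^*$: strong duality also holds, since the problem is convex and Slater's condition is satisfied.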
- Strong duality
- $d^* = p^*$ (the duality gap is zero). It holds for convex problems satisfying Slater's condition.
- Duality gap
- $\eta = p^* - d^* \ge 0$. The difference between the optimal values of the primal and dual problems. It is used for convergence checks and stopping criteria. Under strong duality, $\eta = 0$.
- Slater's condition
- For a convex problem, it holds when there exists a strictly feasible point, i.e., a feasible point at which every inequality constraint holds with strict inequality. It is the most widely used sufficient condition for strong duality.
- Fenchel duality
- A dual transformation using conjugate functions. The dual of $\min_x \{f(x) + g(Ax)\}$ is $\max_y \{-f^*(-A^Ty) - g^*(y)\}$. It provides the theoretical foundation for decomposition algorithms such as ADMM.
Algorithms
- Gradient descent
- $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$. The most fundamental iterative optimization method. → Intermediate: Gradient Descent
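A minimal fixed-step implementation; the quadratic objective and the step size below are illustrative choices:

```python
def gradient_descent(grad, x0, alpha=0.1, iters=200):
    # Iterate x_{k+1} = x_k - alpha * grad(x_k) with a fixed step size
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

# Minimize f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is (2(x-3), 2(y+1))
xmin = gradient_descent(lambda v: [2 * (v[0] - 3), 2 * (v[1] + 1)], [0.0, 0.0])
```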
- Stochastic gradient descent (SGD)
- Uses gradient estimates from mini-batches. Suitable for large-scale data. → Intermediate: SGD
- Newton's method
- $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1}\nabla f(x_k)$. Achieves quadratic convergence but requires computation and inversion of the Hessian matrix. → Intermediate: Newton's Method
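A one-dimensional sketch, minimizing the illustrative function $f(x) = x - \ln x$ (so $f'(x) = 1 - 1/x$, $f''(x) = 1/x^2$, minimizer $x^* = 1$):

```python
def newton_1d(grad, hess, x0, iters=20):
    # 1-D Newton step: x_{k+1} = x_k - f'(x_k) / f''(x_k)
    x = x0
    for _ in range(iters):
        x = x - grad(x) / hess(x)
    return x

# f(x) = x - ln(x): f'(x) = 1 - 1/x, f''(x) = 1/x^2, minimum at x* = 1
x_star = newton_1d(lambda x: 1 - 1 / x, lambda x: 1 / x**2, 0.5)
```

Convergence here depends on the starting point (for this function, $x_0$ must lie in $(0, 2)$); far from the minimizer Newton's method can diverge, which is why it is usually paired with a line search or trust region.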
- Quasi-Newton method
- A method that approximates the Hessian matrix from gradient information without computing it directly. BFGS and L-BFGS are representative examples. Achieves superlinear convergence.
- Conjugate gradient method
- A method that sequentially selects search directions conjugate to the previous ones. It solves an $n$-dimensional quadratic function in at most $n$ steps. → Intermediate: Conjugate Gradient
- Trust region method
- A method that minimizes a quadratic model within a neighborhood (trust region) of the current point. It has more robust convergence properties than line search methods. → Intermediate: Trust Region
- Nelder-Mead method
- A derivative-free simplex-based direct search method. Effective for low-dimensional non-smooth problems. → Intermediate: Nelder-Mead
- Direct search method
- A general term for optimization methods that use only function values without gradients. Examples include pattern search and MADS. → Intermediate: Direct Search
- Interior point method
- A method that follows a path through the interior of the feasible region. It uses a barrier function and can solve linear programs in polynomial time. → Intermediate: Interior Point Method
- Step size (learning rate)
- The magnitude of the update $\alpha_k$ in an iterative method. Too large causes divergence; too small causes slow convergence. Appropriate values are chosen via the Armijo condition or Wolfe conditions.
- Line search
- After the search direction $d_k$ is determined, a procedure to approximately find $\alpha_k = \arg\min_\alpha f(x_k + \alpha d_k)$.
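The two entries above combine in backtracking line search: start from a trial step and shrink it until the Armijo sufficient-decrease condition holds. A minimal sketch (the parameter values are common defaults, not prescriptive):

```python
def backtracking(f, grad, x, d, alpha0=1.0, rho=0.5, c=1e-4):
    # Shrink alpha geometrically until the Armijo condition holds:
    #   f(x + alpha*d) <= f(x) + c * alpha * grad(x)^T d
    fx = f(x)
    gTd = sum(gi * di for gi, di in zip(grad(x), d))   # directional derivative
    alpha = alpha0
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + c * alpha * gTd:
        alpha *= rho
    return alpha

# For f(x) = x^2 at x = 2 with steepest-descent direction d = -grad = [-4]
step = backtracking(lambda v: v[0]**2, lambda v: [2 * v[0]], [2.0], [-4.0])
```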
- Convergence rate
- The speed at which an iterative method approaches the optimal solution. Classified as linear ($\|x_{k+1} - x^*\| \le c\|x_k - x^*\|$ with $0 < c < 1$), superlinear, or quadratic ($\|x_{k+1} - x^*\| \le c\|x_k - x^*\|^2$).
Constrained Optimization
- Active constraint
- An inequality constraint that holds with equality at the optimal solution $x^*$: $g_i(x^*) = 0$. The collection of active constraints is called the active set.
- Barrier function
- A function that goes to infinity at the boundary of the feasible region. The logarithmic barrier $-\sum_i \log(-g_i(x))$ is the most common example. It is used in interior point methods.
- Penalty method
- A method that imposes a penalty $\rho\sum_i \max(0, g_i(x))^2$ for constraint violations. As $\rho \to \infty$, it converges to the solution of the constrained problem.
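A sketch on the toy problem $\min x^2$ s.t. $x \ge 1$ (true solution $x^* = 1$): the minimizer of the penalized objective is $\rho/(1+\rho)$, which approaches 1 from below as $\rho$ grows. The inner solver below is plain gradient descent with an illustrative step size:

```python
def penalty_min(rho, x0=0.0, iters=500):
    # Minimize P(x) = x^2 + rho * max(0, 1 - x)^2 by gradient descent;
    # this is the quadratic-penalty form of: min x^2  s.t.  x >= 1.
    x = x0
    alpha = 0.5 / (1.0 + rho)          # step size kept small as rho grows
    for _ in range(iters):
        g = 2 * x - 2 * rho * max(0.0, 1.0 - x)
        x -= alpha * g
    return x

# Increasing rho drives the penalized minimizer rho/(1+rho) toward x* = 1
sols = [penalty_min(rho) for rho in (1.0, 10.0, 100.0)]
```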
- Augmented Lagrangian method
- A method that adds a quadratic penalty term to the Lagrangian, e.g. for equality constraints: $L_\rho(x,\mu) = L(x,\mu) + \frac{\rho}{2}\sum_j h_j(x)^2$. Numerically more stable than the penalty method.
- ADMM (Alternating Direction Method of Multipliers)
- A variant of the augmented Lagrangian method that splits the problem into separable subproblems and solves them alternately. Widely used in large-scale distributed optimization.
- Simplex method
- An algorithm that solves linear programming problems by traversing the vertices of a polyhedron. Although its worst-case complexity is exponential, it is very fast in practice. → Basic: Simplex Method
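The vertex property can be illustrated by brute force on a tiny two-variable LP (an exhaustive enumeration, not the simplex method itself, which visits vertices selectively): maximize $3x + 2y$ subject to $x + y \le 4$, $x \le 2$, $x, y \ge 0$. The instance is an arbitrary illustrative choice:

```python
from itertools import combinations

# Constraints written as a*x + b*y <= c
cons = [(1.0, 1.0, 4.0),    # x + y <= 4
        (1.0, 0.0, 2.0),    # x <= 2
        (-1.0, 0.0, 0.0),   # x >= 0
        (0.0, -1.0, 0.0)]   # y >= 0

def feasible(x, y, eps=1e-9):
    return all(a * x + b * y <= c + eps for a, b, c in cons)

# A finite LP optimum is attained at a vertex: an intersection of two
# constraint boundaries. Enumerate all such intersections and keep the best.
best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue                       # parallel boundaries: no intersection
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        val = 3 * x + 2 * y
        if best is None or val > best[0]:
            best = (val, x, y)         # optimum here: value 10 at (2, 2)
```

Enumeration is exponential in general; the simplex method instead pivots from vertex to adjacent vertex, improving the objective at each step.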
- Slack variable
- A nonnegative variable that converts an inequality constraint $g_i(x) \le 0$ into an equality constraint $g_i(x) + s_i = 0,\; s_i \ge 0$. → Basic: Slack Variables
Special Problem Classes
- Linear programming (LP)
- An optimization problem where the objective function and all constraints are linear. → Basic: Linear Programming
- Quadratic programming (QP)
- A problem with a quadratic objective function and linear constraints. → Basic: Quadratic Programming
- Semidefinite programming (SDP)
- A convex optimization problem where the variable is a positive semidefinite matrix $X \succeq 0$. Used in combinatorial optimization relaxations, control theory, and quantum information.
- Second-order cone programming (SOCP)
- A convex problem with constraints of the form $\|Ax + b\| \le c^Tx + d$. It lies between LP and SDP. Used in robust optimization and portfolio optimization.
- DC programming
- A problem where the objective function is expressed as the difference of two convex functions: $f(x) = g(x) - h(x)$. The DCA (DC Algorithm) finds local solutions. It covers a broad class of nonconvex problems.
- Integer programming (IP)
- A problem where some or all variables are constrained to be integers. Generally NP-hard. → Advanced: Integer Programming
- Combinatorial optimization
- A problem of finding an optimal solution from a discrete set of candidates. The traveling salesman problem (TSP) and the knapsack problem are representative examples. → Advanced: Combinatorial Optimization
- Multi-objective optimization
- A problem of simultaneously optimizing multiple objective functions. The goal is to find the set of Pareto optimal solutions. → Advanced: Multi-Objective Optimization
- Robust optimization
- Optimization that guarantees worst-case performance against parameter uncertainty. → Advanced: Robust Optimization
- Stochastic optimization
- An optimization problem where the objective function or constraints involve random variables. Includes expected value optimization and chance constraints. → Advanced: Optimization Under Uncertainty
Metaheuristics
- Genetic algorithm (GA)
- An optimization method inspired by biological evolution. It improves solutions through repeated selection, crossover, and mutation. → Advanced: Genetic Algorithm
- Simulated annealing (SA)
- A method inspired by the annealing process of metals. It stochastically explores solutions while lowering the temperature, enabling escape from local optima. → Advanced: Simulated Annealing
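A minimal sketch with geometric cooling and Gaussian moves, applied to the illustrative two-well function $f(x) = (x^2 - 4)^2$ (global minima at $x = \pm 2$); the temperature, cooling rate, and iteration count are arbitrary choices:

```python
import math
import random

def simulated_annealing(f, x0, t0=1.0, cooling=0.99, iters=5000, seed=0):
    # Propose Gaussian moves; always accept improvements, and accept
    # uphill moves with probability exp(-delta / T) while T decreases.
    rng = random.Random(seed)
    x, fx, t = x0, f(x0), t0
    best, fbest = x, fx
    for _ in range(iters):
        cand = x + rng.gauss(0.0, 1.0)
        fc = f(cand)
        delta = fc - fx
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx       # track the best point ever visited
        t *= cooling                      # geometric cooling schedule
    return best, fbest

# Start at the local maximum x = 0 of the two-well function
best_x, best_f = simulated_annealing(lambda x: (x**2 - 4)**2, x0=0.0)
```

The uphill acceptances early on (high $T$) are what let the search cross the barrier at $x = 0$; as $T \to 0$ the method degenerates to greedy local search.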
- Particle swarm optimization (PSO)
- A method inspired by the flocking behavior of birds. Each particle searches while being attracted to both its personal best and the global best of the swarm. → Advanced: Swarm Intelligence
- Differential evolution (DE)
- An evolutionary method that generates new candidate solutions using difference vectors between individuals. It is easy to tune and highly practical. → Advanced: Evolutionary Strategies
- CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
- An evolution strategy that adaptively updates the parameters (mean and covariance matrix) of a multivariate normal distribution. It is a standard method for black-box optimization. → Advanced: Evolutionary Strategies
- Bayesian optimization
- A method that approximates the objective function with a Gaussian process and selects the next evaluation point using an acquisition function. Effective for optimizing expensive-to-evaluate functions. → Advanced: Bayesian Optimization
- Tabu search
- A method that records recently visited solutions in a tabu list and forbids revisiting them while performing local search. Widely used in combinatorial optimization.