Constrained Optimization, LP, and Convex Programming
Overview
Constrained optimisation solves
$$\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{subject to} \quad \mathbf{g}(\mathbf{x}) \leq 0, \;\; \mathbf{h}(\mathbf{x}) = 0,$$
with $\mathbf{x} \in \mathbb{R}^n$. Algorithms differ along linearity, convexity, smoothness, and the availability
of derivatives. sangi ships linear programming, nonlinear least squares, bound-constrained L-BFGS-B,
penalty, augmented Lagrangian, and proximal convex optimisation (FISTA, ADMM) in
optimization_nd.hpp / linear_programming.hpp / ConvexOptimization.hpp.
See Optimization-unconstrained for unconstrained methods and Optimization-derivative-free for derivative-free approaches.
Linear programming (LP)
$$\min \mathbf{c}^T \mathbf{x} \quad \text{subject to} \quad A \mathbf{x} \;\{\leq, =, \geq\}\; \mathbf{b}, \quad \mathbf{x} \geq 0.$$
Transportation, production planning, portfolio optimisation, and network flow are classical applications.
Simplex method
Dantzig (1947) walks along vertices of the feasible polytope. Each iteration:
- Compute reduced costs from the current basis via dual variables.
- Pick an incoming non-basic variable with negative reduced cost; pick an outgoing basic variable using a pivot rule (Bland, largest improvement, etc.).
- Update the basis and move to the next vertex.
Worst-case complexity is exponential (Klee-Minty), but in practice the method is polynomial. The revised simplex stores an LU factorisation of the basis instead of its inverse, scaling gracefully on sparse large LPs.
Two-phase method
When no obvious basic feasible solution exists, Phase I solves an auxiliary LP with artificial variables to
obtain one, and Phase II minimises the original objective. sangi
simplex implements two-phase revised simplex with
$\leq, =, \geq$ constraints.
Duality
The primal $\min \mathbf{c}^T \mathbf{x}$ s.t. $A \mathbf{x} \leq \mathbf{b}, \mathbf{x} \geq 0$ has dual $\max \mathbf{b}^T \mathbf{y}$ s.t. $A^T \mathbf{y} \leq \mathbf{c}, \mathbf{y} \geq 0$, and strong duality makes their optima coincide. Duals provide sensitivity analysis and the economic interpretation of shadow prices.
See also: Linear programming / Simplex method / Duality
Interior-point and convex programming
Where simplex walks the boundary, interior-point methods cut through the interior toward the optimum.
Primal-dual path follower
Relax $\mathbf{x} \geq 0$ with a logarithmic barrier $-\mu \sum_i \log x_i$:
$$\min \mathbf{c}^T \mathbf{x} - \mu \sum_i \log x_i \quad \text{subject to} \quad A \mathbf{x} = \mathbf{b}.$$
The minimiser converges to the LP optimum as $\mu \to 0$. Newton iterations chase the central path. Karmarkar (1984) established polynomial time, and Mehrotra (1992) predictor-corrector dramatically improved practical performance. On large dense LPs interior-point dominates.
Convex programming, SOCP, SDP
Interior-point extends beyond LP:
- QP: $\min \tfrac{1}{2} \mathbf{x}^T Q \mathbf{x} + \mathbf{c}^T \mathbf{x}$, $A \mathbf{x} \leq \mathbf{b}$ (SVMs, MPC, etc.).
- SOCP: $\|A_i \mathbf{x} + \mathbf{b}_i\| \leq \mathbf{c}_i^T \mathbf{x} + d_i$ (robust LP, portfolio optimisation).
- SDP: matrix variables with positive semidefinite constraints (LMI control, combinatorial relaxations).
The self-concordant barrier theory (Nesterov-Nemirovski 1994) unifies all of these. The current sangi ships only LP simplex; QP, SOCP, and SDP are future work.
See also: Interior-point methods / Convex optimisation / Quadratic programming
KKT conditions and Lagrange multipliers
For the constrained problem with $\mathbf{h}(\mathbf{x}) = 0$ and $\mathbf{g}(\mathbf{x}) \leq 0$, define the Lagrangian
$$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = f(\mathbf{x}) + \boldsymbol{\lambda}^T \mathbf{h}(\mathbf{x}) + \boldsymbol{\mu}^T \mathbf{g}(\mathbf{x}).$$
If $\mathbf{x}^*$ is a local minimiser and constraint qualifications (e.g. LICQ) hold, there exist $\boldsymbol{\lambda}^*, \boldsymbol{\mu}^*$ such that the KKT conditions hold:
| Condition | Statement |
|---|---|
| Stationarity | $\nabla_x \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*) = 0$ |
| Primal feasibility | $\mathbf{h}(\mathbf{x}^*) = 0, \;\; \mathbf{g}(\mathbf{x}^*) \leq 0$ |
| Dual feasibility | $\boldsymbol{\mu}^* \geq 0$ |
| Complementary slackness | $\mu_i^* \cdot g_i(\mathbf{x}^*) = 0 \;\; \forall i$ |
Under convexity ($f, g_i$ convex, $h_j$ affine) the conditions are also sufficient. Most constrained algorithms (SQP, interior-point, active-set) iteratively satisfy the KKT system.
See also: KKT conditions / Lagrange multipliers / Constrained optimisation
SQP and active-set methods
SQP (Sequential Quadratic Programming)
SQP replaces the nonlinear problem with a sequence of QP sub-problems. Each iteration solves
$$\min_{\mathbf{p}} \; \nabla f(\mathbf{x}_k)^T \mathbf{p} + \tfrac{1}{2} \mathbf{p}^T H_k \mathbf{p}$$ $$\text{s.t.} \;\; \nabla h_j(\mathbf{x}_k)^T \mathbf{p} + h_j(\mathbf{x}_k) = 0, \;\; \nabla g_i(\mathbf{x}_k)^T \mathbf{p} + g_i(\mathbf{x}_k) \leq 0,$$
where $H_k$ is the Lagrangian Hessian or its BFGS approximation. The method achieves quadratic convergence and underlies SNOPT, NLPQL, and similar solvers. SQP is not yet in sangi; it is a candidate addition.
Active-set methods
Assume the inequalities currently active at the iterate, solve the resulting equality-constrained sub-problem, and update the active set based on Lagrange multiplier signs or KKT residuals. Active-set methods are classical for QP and excel on small-to-medium problems with sparse constraint structure.
Penalty and augmented Lagrangian
Penalty methods
Fold constraint violations into the objective and minimise it unconstrained:
$$\Phi_\mu(\mathbf{x}) = f(\mathbf{x}) + \mu \sum_i \max(0, g_i(\mathbf{x}))^2 + \mu \sum_j h_j(\mathbf{x})^2.$$
Constraints are satisfied in the limit $\mu \to \infty$. Simple to implement but the conditioning of the inner problem deteriorates with $\mu$.
sangi penaltyMinimize handles inequality constraints by
increasing $\mu$ in stages.
Augmented Lagrangian
Add Lagrange multiplier estimates to the penalty:
$$\mathcal{L}_A(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}; \rho) = f(\mathbf{x}) + \boldsymbol{\lambda}^T \mathbf{h}(\mathbf{x}) + \tfrac{\rho}{2} \|\mathbf{h}(\mathbf{x})\|^2 + \cdots$$
The outer loop updates $\boldsymbol{\lambda} \leftarrow \boldsymbol{\lambda} + \rho \, \mathbf{h}(\mathbf{x}_k)$. Constraints become tight without forcing $\rho \to \infty$, sidestepping the ill-conditioning of pure penalty methods. Also known as Powell-Hestenes-Rockafellar.
sangi augmentedLagrangianMinimize handles both equality
and inequality constraints.
Box constraints (L-BFGS-B)
When each variable has an independent lower and upper bound $l_i \leq x_i \leq u_i$, gradient projection combined with L-BFGS gives an efficient solver. L-BFGS-B (Byrd, Lu, Nocedal, Zhu 1995) is the standard implementation.
Each iteration:
- Compute the Cauchy point by projecting the gradient onto the feasible set; determine the active set.
- Take an L-BFGS step on the free variables ($l_i < x_i < u_i$).
- Project the resulting step into the box via line search.
Use $l_i = -\infty$ or $u_i = +\infty$ to leave a variable unbounded. sangi
lbfgsbMinimize implements this scheme.
Nonlinear least squares (Gauss-Newton, LM)
For residuals $\mathbf{r}: \mathbb{R}^n \to \mathbb{R}^m$ ($m \geq n$), minimise $\tfrac{1}{2} \|\mathbf{r}(\mathbf{x})\|^2$. Curve fitting, robotics, and bundle adjustment all reduce to NLS.
Gauss-Newton
Linearise the residuals as $\mathbf{r}(\mathbf{x} + \mathbf{p}) \approx \mathbf{r}(\mathbf{x}) + J \mathbf{p}$ and solve the normal equations:
$$J^T J \, \mathbf{p} = -J^T \mathbf{r}.$$
Quadratic convergence on small residuals. The omitted second-order Hessian term hurts when $J^T J$ is ill-conditioned or rank-deficient.
Levenberg-Marquardt
Add damping:
$$(J^T J + \lambda I) \mathbf{p} = -J^T \mathbf{r}.$$
Large $\lambda$ behaves like steepest descent; small $\lambda$ matches Gauss-Newton. Update $\lambda$ from the gain ratio $\rho = \frac{\Delta f_{\text{actual}}}{\Delta f_{\text{predicted}}}$ in trust-region style (Marquardt 1963).
sangi gaussNewtonMinimize uses fixed $\lambda$ with Armijo
line search; leastSquaresFit implements the standard
adaptive LM. curveFit provides a high-level model-fit API.
Covariance and statistics
After convergence the parameter covariance is $\mathrm{Cov}(\hat{\mathbf{p}}) = s^2 (J^T J)^{-1}$ with
$s^2 = \|\mathbf{r}\|^2 / (m - n)$. Standard errors, $\chi^2$, and reduced $\chi^2$ are returned by
LeastSquaresFitResult.
See also: Nonlinear least squares
FISTA and ADMM
Modern machine learning and signal processing rely on non-smooth convex optimisation (L1 regularisation, total variation, etc.). Two algorithms dominate this regime.
FISTA
Beck and Teboulle (2009) handle the composite problem
$$\min_{\mathbf{x}} \; f(\mathbf{x}) + g(\mathbf{x}),$$
where $f$ is smooth convex with $L$-Lipschitz gradient and $g$ is a convex function with cheap proximal operator $\mathrm{prox}_{\alpha g}(v) = \arg\min_u \{g(u) + \frac{1}{2\alpha} \|u - v\|^2\}$.
Iteration:
$$\mathbf{y}_k = \mathbf{x}_k + \frac{t_{k-1} - 1}{t_k} (\mathbf{x}_k - \mathbf{x}_{k-1})$$ $$\mathbf{x}_{k+1} = \mathrm{prox}_{(1/L) g}\bigl(\mathbf{y}_k - \tfrac{1}{L} \nabla f(\mathbf{y}_k)\bigr)$$ $$t_{k+1} = \tfrac{1 + \sqrt{1 + 4 t_k^2}}{2}.$$
Nesterov-style momentum gives $O(1/k^2)$ convergence. Lasso ($g = \lambda \|\cdot\|_1$, prox = soft thresholding), TV restoration, and low-rank matrix recovery all run through FISTA.
sangi fista is the general routine, and
lassoFISTA is the Lasso specialisation (with $L$
estimated by power iteration).
ADMM
Going back to Glowinski-Marrocco (1975) and Gabay-Mercier (1976), ADMM solves
$$\min \; f(\mathbf{x}) + g(\mathbf{z}) \quad \text{subject to} \quad A \mathbf{x} + B \mathbf{z} = \mathbf{c}$$
by alternating proximal minimisations of an augmented Lagrangian and updating a dual variable:
$$\mathbf{x}_{k+1} = \arg\min_{\mathbf{x}} \; f(\mathbf{x}) + \tfrac{\rho}{2} \|A \mathbf{x} + B \mathbf{z}_k - \mathbf{c} + \mathbf{u}_k\|^2$$ $$\mathbf{z}_{k+1} = \arg\min_{\mathbf{z}} \; g(\mathbf{z}) + \tfrac{\rho}{2} \|A \mathbf{x}_{k+1} + B \mathbf{z} - \mathbf{c} + \mathbf{u}_k\|^2$$ $$\mathbf{u}_{k+1} = \mathbf{u}_k + A \mathbf{x}_{k+1} + B \mathbf{z}_{k+1} - \mathbf{c}.$$
Variable splitting decomposes complex problems into two proximal subproblems, exploiting any structure in
$f$ and $g$. ADMM is the workhorse of distributed and consensus optimisation. sangi
admmLasso is the Lasso specialisation.
For a thorough survey see Boyd et al. (2011), Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.
Dispatch in sangi
| Problem shape | Recommended function (sangi) |
|---|---|
| Linear objective and constraints (LP) | simplex / simplexMaximize |
| Box constraints $l \leq x \leq u$ only | lbfgsbMinimize |
| Inequalities only, gradient available | penaltyMinimize |
| Mixed equality + inequality, gradient | augmentedLagrangianMinimize |
| Nonlinear least squares $\sum r_i(x)^2$ | gaussNewtonMinimize / leastSquaresFit |
| Curve fitting $y = f(x, p)$ | curveFit |
| Lasso $\frac{1}{2}\|Ax - b\|^2 + \lambda \|x\|_1$ | lassoFISTA / admmLasso |
| General composite convex $f + g$ (prox-able $g$) | fista |
| Constrained, no gradient | cmaesMinimize with manual penalty (derivative-free) |
Scaling matters
Constrained optimisation is sensitive to variable and constraint scaling. Normalising variables to roughly $[-1, 1]$ and matching constraint magnitudes drastically improves conditioning. In LP, row/column equilibration is standard; in NLS, Jacobian column scaling pays off.
References
- Nocedal, J. and Wright, S. J. (2006). Numerical Optimization, 2nd ed. Springer.
- Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
- Bertsekas, D. P. (1999). Nonlinear Programming, 2nd ed. Athena Scientific.
- Marquardt, D. W. (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters". SIAM J. Appl. Math., 11, 431–441.
- Beck, A. and Teboulle, M. (2009). "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems". SIAM J. Imaging Sci., 2, 183–202.
- Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers". Foundations and Trends in Machine Learning, 3, 1–122.