Optimization
Overview
The sangi optimization module provides a wide range of algorithms, from 1D golden-section search through metaheuristics, linear programming, and convex optimization (FISTA/ADMM). It is header-only — including the relevant *_impl.hpp is sufficient.
- 1D minimization — golden section, Brent's method
- Multi-D minimization (gradient-based) — steepest descent, BFGS, Newton-CG, conjugate gradient, L-BFGS
- Multi-D minimization (derivative-free) — Nelder-Mead, Hooke-Jeeves
- Metaheuristics — genetic algorithm, simulated annealing, differential evolution, CMA-ES, jSO
- Nonlinear least squares — Gauss-Newton (with LM regularization), curve fitting
- Constrained optimization — L-BFGS-B, penalty method, augmented Lagrangian
- Linear programming — two-phase revised simplex method
- Convex optimization — FISTA, ADMM
Build
// 1D minimization
#include <math/optimization/optimization_1d.hpp>
// Multi-D minimization (gradient-based + derivative-free + metaheuristics + least squares + constrained)
#include <math/optimization/optimization_nd.hpp>
// Linear programming
#include <math/optimization/linear_programming.hpp>
// Convex optimization (FISTA, ADMM)
#include <math/optimization/ConvexOptimization.hpp>
Declaration/implementation split: each header contains only function declarations. The implementations live in the corresponding *_impl.hpp, which the user includes when template instantiation is needed.
CMake
target_include_directories(my_app PRIVATE <sangi>/include)
Direct compilation (MSVC)
Because of template instantiation, the user must include *_impl.hpp.
// To use 1D minimization
#include <math/optimization/optimization_1d_impl.hpp>
// To use multi-D minimization
#include <math/optimization/optimization_nd_impl.hpp>
Common Types
Header: optimization_base.hpp
OptimizationOptions<T>
| Member | Type | Default | Description |
|---|---|---|---|
tolerance | T | eps * 100 | Convergence tolerance |
max_iterations | size_t | 1000 | Maximum number of iterations |
verbose | bool | false | Whether to print verbose output |
OptimizationResult<P, V>
| Member | Type | Description |
|---|---|---|
point | P | The optimized point |
value | V | The optimized function value |
success | bool | Convergence flag |
iterations | size_t | Number of iterations |
For 1D problems, OptimizationResult1D<T> = OptimizationResult<T, T> is used.
LineSearchParams<T>
| Member | Type | Default | Description |
|---|---|---|---|
alpha_init | T | 0.1 | Initial step size |
armijo_const | T | 1e-4 | Armijo rule constant |
reduction_factor | T | 0.5 | Backtracking reduction factor |
max_iterations | size_t | 20 | Maximum number of backtracking iterations |
1D Minimization
Header: optimization_1d.hpp
golden_section_search
template<concepts::OrderedField T>
OptimizationResult1D<T> golden_section_search(
const std::function<T(T)>& f,
T a, T b,
const OptimizationOptions<T>& options = {});
Golden-section minimization on the interval $[a, b]$. Iteratively shrinks the interval by the golden ratio. Guaranteed convergence, but only linear.
| Argument | Description |
|---|---|
f | Function to minimize |
a, b | Lower and upper bounds of the search interval |
options | Convergence options |
Returns: OptimizationResult1D<T> containing the minimizer and the minimum value.
brent_minimize
template<concepts::OrderedField T>
OptimizationResult1D<T> brent_minimize(
const std::function<T(T)>& f,
T a, T b,
const OptimizationOptions<T>& options = {});
Brent's method on the interval $[a, b]$. Combines parabolic interpolation with golden-section search to achieve superlinear convergence. In practice the best general-purpose choice for 1D minimization.
Multi-D Minimization (Gradient-Based)
Header: optimization_nd.hpp
steepest_descent
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> steepest_descent(
const std::function<T(const V&)>& f,
const std::function<V(const V&)>& df,
const V& x0,
const OptimizationOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
Steepest descent. Steps along $-\nabla f$ with Armijo backtracking line search. Simple to implement but slow to converge — mainly useful as a baseline.
bfgs_minimize
template<concepts::OrderedField T, typename V, typename M>
requires concepts::VectorOf<V, T> && concepts::MatrixOf<M, T>
OptimizationResult<V, T> bfgs_minimize(
const std::function<T(const V&)>& f,
const std::function<V(const V&)>& df,
const V& x0,
const OptimizationOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
BFGS quasi-Newton method. Approximates the inverse Hessian $H$ via rank-2 updates and achieves superlinear convergence. The standard choice for medium-scale problems. Note that $O(n^2)$ memory is required.
newton_cg_minimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> newton_cg_minimize(
const std::function<T(const V&)>& f,
const std::function<V(const V&)>& df,
const std::function<V(const V&, const V&)>& d2f,
const V& x0,
const OptimizationOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
Newton-CG (truncated Newton). Solves the Hessian system $H\mathbf{p} = -\mathbf{g}$ approximately with an inner CG iteration, so the Hessian itself is never stored. d2f(x, v) returns the Hessian-vector product $H(x) v$.
conjugateGradientMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> conjugateGradientMinimize(
const std::function<T(const V&)>& f,
const std::function<V(const V&)>& df,
const V& x0,
const ConjugateGradientOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
Nonlinear conjugate gradient. Polak-Ribiere+ (default) or Fletcher-Reeves is selectable. Automatically restarts every $n$ steps. Uses $O(n)$ memory, making it well-suited to large-scale problems.
| ConjugateGradientOptions member | Default | Description |
|---|---|---|
restart_interval | 0 (= n) | Restart interval |
use_polak_ribiere | true | false selects Fletcher-Reeves |
lbfgsMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> lbfgsMinimize(
const std::function<T(const V&)>& f,
const std::function<V(const V&)>& df,
const V& x0,
const LBFGSOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
L-BFGS (Limited-memory BFGS). Stores the past $m$ gradient differences and computes $H^{-1}\mathbf{g}$ in $O(mn)$ via the two-loop recursion. The standard choice for large-scale problems (1000+ variables).
| LBFGSOptions member | Default | Description |
|---|---|---|
memory_size | 10 | Number of past vectors kept ($m$) |
Multi-D Minimization (Derivative-Free)
Header: optimization_nd.hpp
nelderMeadMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> nelderMeadMinimize(
const std::function<T(const V&)>& f,
const V& x0,
const NelderMeadOptions<T>& options = {});
Nelder-Mead simplex method. Deforms a simplex of $n+1$ vertices via reflection, expansion, contraction, and shrinkage to improve the objective value. Derivative-free and robust, though it becomes less efficient in high dimensions. Convergence is checked using both the function-value spread and the simplex diameter.
| NelderMeadOptions member | Default | Description |
|---|---|---|
initial_step | 1 | Step size for the initial simplex |
reflection | 1 | Reflection coefficient $\alpha$ |
expansion | 2 | Expansion coefficient $\gamma$ |
contraction | 0.5 | Contraction coefficient $\rho$ |
shrink | 0.5 | Shrinkage coefficient $\sigma$ |
hookeJeevesMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> hookeJeevesMinimize(
const std::function<T(const V&)>& f,
const V& x0,
const HookeJeevesOptions<T>& options = {});
Hooke-Jeeves pattern search. Probes each coordinate direction by $\pm\Delta x$ (exploratory move), then makes a pattern move along the improving direction. The step is scaled by $\sqrt{2}$, and the algorithm terminates when $\Delta x$ has dropped below min_step for every variable.
| HookeJeevesOptions member | Default | Description |
|---|---|---|
initial_step | 1 | Initial step size |
min_step | 1e-10 | Convergence step threshold |
Metaheuristics
Header: optimization_nd.hpp
A family of stochastic methods for global optimization. All are derivative-free and require a search box [lower, upper].
geneticAlgorithmMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> geneticAlgorithmMinimize(
const std::function<T(const V&)>& f,
const V& lower, const V& upper,
const GeneticAlgorithmOptions<T>& options = {});
Real-coded genetic algorithm. Uses tournament selection, BLX-$\alpha$ crossover ($\alpha = 0.5$), Gaussian mutation, and elitism.
| GeneticAlgorithmOptions member | Default | Description |
|---|---|---|
population_size | 50 | Population size |
crossover_rate | 0.8 | Crossover rate |
mutation_rate | 0.1 | Mutation rate |
tournament_size | 3 | Tournament size |
elite_count | 2 | Number of elite individuals preserved |
seed | 42 | Random seed |
simulatedAnnealingMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> simulatedAnnealingMinimize(
const std::function<T(const V&)>& f,
const V& lower, const V& upper,
const SimulatedAnnealingOptions<T>& options = {});
Simulated annealing. The Metropolis criterion enables escape from local minima. Uses an exponential cooling schedule $T_{k+1} = r \cdot T_k$.
| SimulatedAnnealingOptions member | Default | Description |
|---|---|---|
initial_temperature | 100 | Initial temperature |
cooling_rate | 0.995 | Cooling rate $r$ |
neighbor_scale | 0.1 | Standard deviation of the neighborhood noise (as a fraction of the range) |
differentialEvolutionMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> differentialEvolutionMinimize(
const std::function<T(const V&)>& f,
const V& lower, const V& upper,
const DifferentialEvolutionOptions<T>& options = {});
Differential evolution (DE/rand/1/bin). A population-based stochastic optimizer, derivative-free, with a random base vector, one difference vector, and binomial crossover.
| DifferentialEvolutionOptions member | Default | Description |
|---|---|---|
mutation_factor | 0.8 | Differential weight $F$ |
crossover_rate | 0.9 | Crossover probability $CR$ |
cmaesMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> cmaesMinimize(
const std::function<T(const V&)>& f,
const V& lower, const V& upper,
const CMAESOptions<T>& options = {});
CMA-ES (Covariance Matrix Adaptation Evolution Strategy). Adaptively updates the covariance matrix $C$ to learn the shape of the search distribution. Robust on ill-conditioned and non-separable problems. Suited to medium-scale problems ($n <$ a few hundred).
| CMAESOptions member | Default | Description |
|---|---|---|
population_size | 0 (auto) | $\lambda$ (0 means $4 + \lfloor 3 \ln n \rfloor$) |
initial_sigma | 0.3 | Initial step size $\sigma$ |
condition_limit | 1e14 | Upper bound on the condition number of $C$ |
jsoMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> jsoMinimize(
const std::function<T(const V&)>& f,
const V& lower, const V& upper,
const JSOOptions<T>& options = {});
jSO (improved adaptive differential evolution). Combines an L-SHADE refinement with the jSO-specific $F/CR$ initialization. Key features include current-to-pbest/1 mutation, success-history-based adaptation, and linear population-size reduction.
| JSOOptions member | Default | Description |
|---|---|---|
max_evaluations | 0 (auto) | Maximum number of function evaluations (0 = $10000D$) |
population_size | 0 (auto) | Initial population size (0 = $25D$) |
history_size | 5 | Length of the success-parameter history $H$ |
p_best_rate | 0.11 | Top-fraction for pbest selection |
Nonlinear Least Squares
Header: optimization_nd.hpp
gaussNewtonMinimize
template<concepts::OrderedField T, typename V, typename M>
requires concepts::VectorOf<V, T> && concepts::MatrixOf<M, T>
OptimizationResult<V, T> gaussNewtonMinimize(
const std::function<V(const V&)>& residuals,
const std::function<M(const V&)>& jacobian,
const V& x0,
const GaussNewtonOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
Gauss-Newton with Levenberg-Marquardt regularization. Minimizes the sum of squares $\sum r_i^2$ of the residual function $\mathbf{r}(\mathbf{x})$. Solves the normal equation $(J^T J + \lambda I)\delta = -J^T\mathbf{r}$ and steps with Armijo line search. value returns the sum of squared residuals.
| GaussNewtonOptions member | Default | Description |
|---|---|---|
damping | 1e-6 | LM regularization parameter $\lambda$ |
use_line_search | true | Whether to use Armijo line search |
leastSquaresFit
template<concepts::OrderedField T>
LeastSquaresFitResult<T> leastSquaresFit(
const std::function<Vector<T>(const Vector<T>&)>& residuals,
const std::function<Matrix<T>(const Vector<T>&)>& jacobian,
const Vector<T>& p0,
const LeastSquaresOptions<T>& options = {});
Levenberg-Marquardt nonlinear least-squares fitting. Uses gain-ratio-based adaptive control of $\lambda$ and returns statistics such as the parameter covariance matrix and standard errors.
LeastSquaresFitResult<T>
| Member | Type | Description |
|---|---|---|
parameters | Vector<T> | Optimal parameters |
residuals | Vector<T> | Final residual vector |
covariance | Matrix<T> | Parameter covariance matrix $s^2 (J^T J)^{-1}$ |
standardErrors | Vector<T> | Standard error of each parameter |
chiSquared | T | Sum of squared residuals $\sum r_i^2$ |
reducedChiSquared | T | $\chi^2 / (m - n)$ |
curveFit
template<concepts::OrderedField T>
LeastSquaresFitResult<T> curveFit(
const std::function<T(T, const Vector<T>&)>& model,
const std::vector<T>& xdata,
const std::vector<T>& ydata,
const Vector<T>& p0,
const LeastSquaresOptions<T>& options = {});
High-level curve fitting (analogous to scipy.optimize.curve_fit). Fits a model $y = f(x, \mathbf{p})$ to observed data. The residual and Jacobian are constructed internally via central differences.
| Argument | Description |
|---|---|
model | Model function $f(x, \mathbf{p}) \to y$ |
xdata | Independent variable (length $m$) |
ydata | Dependent variable (length $m$) |
p0 | Initial parameters ($n$ dimensions) |
Constrained Optimization
Header: optimization_nd.hpp
lbfgsbMinimize (box constraints)
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> lbfgsbMinimize(
const std::function<T(const V&)>& f,
const std::function<V(const V&)>& df,
const V& x0,
const V& lower, const V& upper,
const LBFGSBOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
L-BFGS-B. Optimization with per-variable bound constraints $l_i \le x_i \le u_i$. Projects onto the feasible set via the gradient projection method and applies the L-BFGS step on the free variables. To leave a variable unconstrained, set lower[i] = -max() and upper[i] = max().
penaltyMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> penaltyMinimize(
const std::function<T(const V&)>& f,
const std::function<V(const V&)>& df,
const std::vector<std::function<T(const V&)>>& constraints,
const V& x0,
const ConstrainedOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
Penalty method. Converts inequality constraints $c_i(\mathbf{x}) \le 0$ into a penalty term, then performs unconstrained minimization of $\Phi(\mathbf{x}) = f(\mathbf{x}) + \mu \sum \max(0, c_i)^2$. $\mu$ is increased gradually to suppress constraint violation.
augmentedLagrangianMinimize
template<concepts::OrderedField T, typename V>
requires concepts::VectorOf<V, T>
OptimizationResult<V, T> augmentedLagrangianMinimize(
const std::function<T(const V&)>& f,
const std::function<V(const V&)>& df,
const std::vector<std::function<T(const V&)>>& eq_constraints,
const std::vector<std::function<T(const V&)>>& ineq_constraints,
const V& x0,
const ConstrainedOptions<T>& options = {},
const LineSearchParams<T>& line_search_params = {});
Augmented Lagrangian method. Handles both equality constraints $h_i(\mathbf{x}) = 0$ and inequality constraints $g_i(\mathbf{x}) \le 0$. Estimates and updates Lagrange multipliers to avoid the ill-conditioning of a pure penalty method.
ConstrainedOptions<T>
| Member | Default | Description |
|---|---|---|
max_iterations | 100 | Outer-loop iterations |
inner_max_iterations | 1000 | Inner solver iterations |
initial_penalty | 1 | Initial penalty parameter $\mu$ |
penalty_growth | 10 | Growth factor for $\mu$ |
Linear Programming
Header: linear_programming.hpp
simplex
template<typename T>
LPResult<T> simplex(
const LinearProgram<T>& lp,
T eps = T(1e-10),
size_t maxIterations = 10000);
Two-phase revised simplex method. Solves $\min \mathbf{c}^T \mathbf{x}$ subject to linear constraints. Equality (=) and inequality (<=, >=) constraints are all supported.
simplexMaximize
template<typename T>
LPResult<T> simplexMaximize(
const LinearProgram<T>& lp,
T eps = T(1e-10),
size_t maxIterations = 10000);
Maximization wrapper. Negates the objective and calls simplex.
LinearProgram<T>
| Member | Type | Description |
|---|---|---|
objective | std::vector<T> | Coefficients of the (minimization) objective |
objectiveConstant | T | Constant term of the objective |
constraints | std::vector<LinearConstraint<T>> | List of constraints |
LinearConstraint<T>
| Member | Type | Description |
|---|---|---|
coefficients | std::vector<T> | Left-hand-side coefficients |
type | ConstraintType | LessEqual, GreaterEqual, Equal |
rhs | T | Right-hand-side constant |
LPResult<T>
| Member | Type | Description |
|---|---|---|
status | LPStatus | Optimal, Infeasible, Unbounded, MaxIterations |
objectiveValue | T | Optimal objective value |
solution | std::vector<T> | Optimal solution (one value per variable) |
iterations | size_t | Total number of pivot iterations |
Convex Optimization
Header: ConvexOptimization.hpp
fista
template<typename T>
FISTAResult<T> fista(
std::function<Vector<T>(const Vector<T>&)> gradF,
std::function<Vector<T>(const Vector<T>&, T)> proxG,
const Vector<T>& x0,
T L,
size_t maxIter = 1000, T tol = T{1e-8});
FISTA (Fast Iterative Shrinkage-Thresholding Algorithm). Solves $\min f(\mathbf{x}) + g(\mathbf{x})$ where $f$ is a differentiable convex function (Lipschitz constant $L$) and $g$ is a convex function whose proximal operator can be evaluated. Nesterov acceleration yields $O(1/k^2)$ convergence.
| Argument | Description |
|---|---|
gradF | Gradient of $f$ |
proxG | Proximal operator of $g$, $\mathrm{prox}_{\alpha g}(v)$ |
L | Lipschitz constant of $\nabla f$ |
lassoFISTA
template<typename T>
FISTAResult<T> lassoFISTA(
const Matrix<T>& A, const Vector<T>& b,
T lambda, size_t maxIter = 1000, T tol = T{1e-8});
FISTA implementation for Lasso regression. Solves $\min \frac{1}{2}\|A\mathbf{x} - \mathbf{b}\|^2 + \lambda\|\mathbf{x}\|_1$. The Lipschitz constant is estimated automatically via the power method.
admmLasso
template<typename T>
ADMMResult<T> admmLasso(
const Matrix<T>& A, const Vector<T>& b,
T lambda, T rho = T{1},
size_t maxIter = 1000, T tol = T{1e-6});
ADMM (Alternating Direction Method of Multipliers) for Lasso regression. Decomposes the problem via variable splitting $\mathbf{x} = \mathbf{z}$ and penalty $\rho$.
Code Examples
BFGS minimization of the Rosenbrock function
#include <math/optimization/optimization_nd.hpp>
#include <iostream>
using namespace sangi;
int main() {
// Rosenbrock: f(x,y) = (1-x)^2 + 100*(y-x^2)^2
auto f = [](const Vector<double>& x) {
double a = 1.0 - x[0];
double b = x[1] - x[0] * x[0];
return a * a + 100.0 * b * b;
};
auto df = [](const Vector<double>& x) -> Vector<double> {
double dx = -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] * x[0]);
double dy = 200.0 * (x[1] - x[0] * x[0]);
return {dx, dy};
};
Vector<double> x0 = {-1.0, 1.0};
auto result = bfgs_minimize<double, Vector<double>, Matrix<double>>(f, df, x0);
std::cout << "Minimum point: (" << result.point[0] << ", " << result.point[1] << ")\n";
std::cout << "Minimum value: " << result.value << "\n";
std::cout << "Iterations: " << result.iterations << "\n";
// Minimum point: (1, 1), minimum value: 0
}
Curve fitting
#include <math/optimization/optimization_nd.hpp>
#include <iostream>
using namespace sangi;
int main() {
// Model: y = a * exp(-b * x)
auto model = [](double x, const Vector<double>& p) {
return p[0] * std::exp(-p[1] * x);
};
// Observed data
std::vector<double> xdata = {0.0, 0.5, 1.0, 1.5, 2.0, 2.5};
std::vector<double> ydata = {5.0, 3.1, 1.8, 1.2, 0.7, 0.4};
Vector<double> p0 = {4.0, 0.5}; // initial guess
auto result = curveFit<double>(model, xdata, ydata, p0);
std::cout << "a = " << result.parameters[0]
<< " +/- " << result.standardErrors[0] << "\n";
std::cout << "b = " << result.parameters[1]
<< " +/- " << result.standardErrors[1] << "\n";
std::cout << "chi^2/dof = " << result.reducedChiSquared << "\n";
}
Linear programming
#include <math/optimization/linear_programming.hpp>
#include <iostream>
using namespace sangi;
int main() {
// maximize x + 2y subject to x + y <= 4, x <= 3, y <= 3
LinearProgram<double> lp;
lp.objective = {1.0, 2.0}; // simplexMaximize takes the maximization form
lp.constraints = {
{{1.0, 1.0}, ConstraintType::LessEqual, 4.0},
{{1.0, 0.0}, ConstraintType::LessEqual, 3.0},
{{0.0, 1.0}, ConstraintType::LessEqual, 3.0}
};
auto result = simplexMaximize(lp);
if (result.status == LPStatus::Optimal) {
std::cout << "Optimal value: " << result.objectiveValue << "\n";
std::cout << "x = " << result.solution[0]
<< ", y = " << result.solution[1] << "\n";
// Optimal value: 7, x = 1, y = 3
}
}