Optimization

Overview

The sangi optimization module provides a wide range of algorithms, from 1D golden-section search through metaheuristics, linear programming, and convex optimization (FISTA/ADMM). It is header-only — including the relevant *_impl.hpp is sufficient.

  • 1D minimization — golden section, Brent's method
  • Multi-D minimization (gradient-based) — steepest descent, BFGS, Newton-CG, conjugate gradient, L-BFGS
  • Multi-D minimization (derivative-free) — Nelder-Mead, Hooke-Jeeves
  • Metaheuristics — genetic algorithm, simulated annealing, differential evolution, CMA-ES, jSO
  • Nonlinear least squares — Gauss-Newton (with LM regularization), curve fitting
  • Constrained optimization — L-BFGS-B, penalty method, augmented Lagrangian
  • Linear programming — two-phase revised simplex method
  • Convex optimization — FISTA, ADMM

Build

// 1D minimization
#include <math/optimization/optimization_1d.hpp>

// Multi-D minimization (gradient-based + derivative-free + metaheuristics + least squares + constrained)
#include <math/optimization/optimization_nd.hpp>

// Linear programming
#include <math/optimization/linear_programming.hpp>

// Convex optimization (FISTA, ADMM)
#include <math/optimization/ConvexOptimization.hpp>

Declaration/implementation split: each header contains only function declarations. The implementations live in the corresponding *_impl.hpp, which the user includes when template instantiation is needed.

CMake

target_include_directories(my_app PRIVATE <sangi>/include)

Direct compilation (MSVC)

Because of template instantiation, the user must include *_impl.hpp.

// To use 1D minimization
#include <math/optimization/optimization_1d_impl.hpp>
// To use multi-D minimization
#include <math/optimization/optimization_nd_impl.hpp>

Common Types

Header: optimization_base.hpp

OptimizationOptions<T>

MemberTypeDefaultDescription
toleranceTeps * 100Convergence tolerance
max_iterationssize_t1000Maximum number of iterations
verboseboolfalseWhether to print verbose output

OptimizationResult<P, V>

MemberTypeDescription
pointPThe optimized point
valueVThe optimized function value
successboolConvergence flag
iterationssize_tNumber of iterations

For 1D problems, OptimizationResult1D<T> = OptimizationResult<T, T> is used.

LineSearchParams<T>

MemberTypeDefaultDescription
alpha_initT0.1Initial step size
armijo_constT1e-4Armijo rule constant
reduction_factorT0.5Backtracking reduction factor
max_iterationssize_t20Maximum number of backtracking iterations

1D Minimization

Header: optimization_1d.hpp

golden_section_search

template<concepts::OrderedField T>
OptimizationResult1D<T> golden_section_search(
    const std::function<T(T)>& f,
    T a, T b,
    const OptimizationOptions<T>& options = {});

Golden-section minimization on the interval $[a, b]$. Iteratively shrinks the interval by the golden ratio. Guaranteed convergence, but only linear.

ArgumentDescription
fFunction to minimize
a, bLower and upper bounds of the search interval
optionsConvergence options

Returns: OptimizationResult1D<T> containing the minimizer and the minimum value.

brent_minimize

template<concepts::OrderedField T>
OptimizationResult1D<T> brent_minimize(
    const std::function<T(T)>& f,
    T a, T b,
    const OptimizationOptions<T>& options = {});

Brent's method on the interval $[a, b]$. Combines parabolic interpolation with golden-section search to achieve superlinear convergence. In practice the best general-purpose choice for 1D minimization.

Multi-D Minimization (Gradient-Based)

Header: optimization_nd.hpp

steepest_descent

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> steepest_descent(
    const std::function<T(const V&)>& f,
    const std::function<V(const V&)>& df,
    const V& x0,
    const OptimizationOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

Steepest descent. Steps along $-\nabla f$ with Armijo backtracking line search. Simple to implement but slow to converge — mainly useful as a baseline.

bfgs_minimize

template<concepts::OrderedField T, typename V, typename M>
    requires concepts::VectorOf<V, T> && concepts::MatrixOf<M, T>
OptimizationResult<V, T> bfgs_minimize(
    const std::function<T(const V&)>& f,
    const std::function<V(const V&)>& df,
    const V& x0,
    const OptimizationOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

BFGS quasi-Newton method. Approximates the inverse Hessian $H$ via rank-2 updates and achieves superlinear convergence. The standard choice for medium-scale problems. Note that $O(n^2)$ memory is required.

newton_cg_minimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> newton_cg_minimize(
    const std::function<T(const V&)>& f,
    const std::function<V(const V&)>& df,
    const std::function<V(const V&, const V&)>& d2f,
    const V& x0,
    const OptimizationOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

Newton-CG (truncated Newton). Solves the Hessian system $H\mathbf{p} = -\mathbf{g}$ approximately with an inner CG iteration, so the Hessian itself is never stored. d2f(x, v) returns the Hessian-vector product $H(x) v$.

conjugateGradientMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> conjugateGradientMinimize(
    const std::function<T(const V&)>& f,
    const std::function<V(const V&)>& df,
    const V& x0,
    const ConjugateGradientOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

Nonlinear conjugate gradient. Polak-Ribiere+ (default) or Fletcher-Reeves is selectable. Automatically restarts every $n$ steps. Uses $O(n)$ memory, making it well-suited to large-scale problems.

ConjugateGradientOptions memberDefaultDescription
restart_interval0 (= n)Restart interval
use_polak_ribieretruefalse selects Fletcher-Reeves

lbfgsMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> lbfgsMinimize(
    const std::function<T(const V&)>& f,
    const std::function<V(const V&)>& df,
    const V& x0,
    const LBFGSOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

L-BFGS (Limited-memory BFGS). Stores the past $m$ gradient differences and computes $H^{-1}\mathbf{g}$ in $O(mn)$ via the two-loop recursion. The standard choice for large-scale problems (1000+ variables).

LBFGSOptions memberDefaultDescription
memory_size10Number of past vectors kept ($m$)

Multi-D Minimization (Derivative-Free)

Header: optimization_nd.hpp

nelderMeadMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> nelderMeadMinimize(
    const std::function<T(const V&)>& f,
    const V& x0,
    const NelderMeadOptions<T>& options = {});

Nelder-Mead simplex method. Deforms a simplex of $n+1$ vertices via reflection, expansion, contraction, and shrinkage to improve the objective value. Derivative-free and robust, though it becomes less efficient in high dimensions. Convergence is checked using both the function-value spread and the simplex diameter.

NelderMeadOptions memberDefaultDescription
initial_step1Step size for the initial simplex
reflection1Reflection coefficient $\alpha$
expansion2Expansion coefficient $\gamma$
contraction0.5Contraction coefficient $\rho$
shrink0.5Shrinkage coefficient $\sigma$

hookeJeevesMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> hookeJeevesMinimize(
    const std::function<T(const V&)>& f,
    const V& x0,
    const HookeJeevesOptions<T>& options = {});

Hooke-Jeeves pattern search. Probes each coordinate direction by $\pm\Delta x$ (exploratory move), then makes a pattern move along the improving direction. The step is scaled by $\sqrt{2}$, and the algorithm terminates when $\Delta x$ has dropped below min_step for every variable.

HookeJeevesOptions memberDefaultDescription
initial_step1Initial step size
min_step1e-10Convergence step threshold

Metaheuristics

Header: optimization_nd.hpp

A family of stochastic methods for global optimization. All are derivative-free and require a search box [lower, upper].

geneticAlgorithmMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> geneticAlgorithmMinimize(
    const std::function<T(const V&)>& f,
    const V& lower, const V& upper,
    const GeneticAlgorithmOptions<T>& options = {});

Real-coded genetic algorithm. Uses tournament selection, BLX-$\alpha$ crossover ($\alpha = 0.5$), Gaussian mutation, and elitism.

GeneticAlgorithmOptions memberDefaultDescription
population_size50Population size
crossover_rate0.8Crossover rate
mutation_rate0.1Mutation rate
tournament_size3Tournament size
elite_count2Number of elite individuals preserved
seed42Random seed

simulatedAnnealingMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> simulatedAnnealingMinimize(
    const std::function<T(const V&)>& f,
    const V& lower, const V& upper,
    const SimulatedAnnealingOptions<T>& options = {});

Simulated annealing. The Metropolis criterion enables escape from local minima. Uses an exponential cooling schedule $T_{k+1} = r \cdot T_k$.

SimulatedAnnealingOptions memberDefaultDescription
initial_temperature100Initial temperature
cooling_rate0.995Cooling rate $r$
neighbor_scale0.1Standard deviation of the neighborhood noise (as a fraction of the range)

differentialEvolutionMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> differentialEvolutionMinimize(
    const std::function<T(const V&)>& f,
    const V& lower, const V& upper,
    const DifferentialEvolutionOptions<T>& options = {});

Differential evolution (DE/rand/1/bin). A population-based stochastic optimizer, derivative-free, with a random base vector, one difference vector, and binomial crossover.

DifferentialEvolutionOptions memberDefaultDescription
mutation_factor0.8Differential weight $F$
crossover_rate0.9Crossover probability $CR$

cmaesMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> cmaesMinimize(
    const std::function<T(const V&)>& f,
    const V& lower, const V& upper,
    const CMAESOptions<T>& options = {});

CMA-ES (Covariance Matrix Adaptation Evolution Strategy). Adaptively updates the covariance matrix $C$ to learn the shape of the search distribution. Robust on ill-conditioned and non-separable problems. Suited to medium-scale problems ($n <$ a few hundred).

CMAESOptions memberDefaultDescription
population_size0 (auto)$\lambda$ (0 means $4 + \lfloor 3 \ln n \rfloor$)
initial_sigma0.3Initial step size $\sigma$
condition_limit1e14Upper bound on the condition number of $C$

jsoMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> jsoMinimize(
    const std::function<T(const V&)>& f,
    const V& lower, const V& upper,
    const JSOOptions<T>& options = {});

jSO (improved adaptive differential evolution). Combines an L-SHADE refinement with the jSO-specific $F/CR$ initialization. Key features include current-to-pbest/1 mutation, success-history-based adaptation, and linear population-size reduction.

JSOOptions memberDefaultDescription
max_evaluations0 (auto)Maximum number of function evaluations (0 = $10000D$)
population_size0 (auto)Initial population size (0 = $25D$)
history_size5Length of the success-parameter history $H$
p_best_rate0.11Top-fraction for pbest selection

Nonlinear Least Squares

Header: optimization_nd.hpp

gaussNewtonMinimize

template<concepts::OrderedField T, typename V, typename M>
    requires concepts::VectorOf<V, T> && concepts::MatrixOf<M, T>
OptimizationResult<V, T> gaussNewtonMinimize(
    const std::function<V(const V&)>& residuals,
    const std::function<M(const V&)>& jacobian,
    const V& x0,
    const GaussNewtonOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

Gauss-Newton with Levenberg-Marquardt regularization. Minimizes the sum of squares $\sum r_i^2$ of the residual function $\mathbf{r}(\mathbf{x})$. Solves the normal equation $(J^T J + \lambda I)\delta = -J^T\mathbf{r}$ and steps with Armijo line search. value returns the sum of squared residuals.

GaussNewtonOptions memberDefaultDescription
damping1e-6LM regularization parameter $\lambda$
use_line_searchtrueWhether to use Armijo line search

leastSquaresFit

template<concepts::OrderedField T>
LeastSquaresFitResult<T> leastSquaresFit(
    const std::function<Vector<T>(const Vector<T>&)>& residuals,
    const std::function<Matrix<T>(const Vector<T>&)>& jacobian,
    const Vector<T>& p0,
    const LeastSquaresOptions<T>& options = {});

Levenberg-Marquardt nonlinear least-squares fitting. Uses gain-ratio-based adaptive control of $\lambda$ and returns statistics such as the parameter covariance matrix and standard errors.

LeastSquaresFitResult<T>

MemberTypeDescription
parametersVector<T>Optimal parameters
residualsVector<T>Final residual vector
covarianceMatrix<T>Parameter covariance matrix $s^2 (J^T J)^{-1}$
standardErrorsVector<T>Standard error of each parameter
chiSquaredTSum of squared residuals $\sum r_i^2$
reducedChiSquaredT$\chi^2 / (m - n)$

curveFit

template<concepts::OrderedField T>
LeastSquaresFitResult<T> curveFit(
    const std::function<T(T, const Vector<T>&)>& model,
    const std::vector<T>& xdata,
    const std::vector<T>& ydata,
    const Vector<T>& p0,
    const LeastSquaresOptions<T>& options = {});

High-level curve fitting (analogous to scipy.optimize.curve_fit). Fits a model $y = f(x, \mathbf{p})$ to observed data. The residual and Jacobian are constructed internally via central differences.

ArgumentDescription
modelModel function $f(x, \mathbf{p}) \to y$
xdataIndependent variable (length $m$)
ydataDependent variable (length $m$)
p0Initial parameters ($n$ dimensions)

Constrained Optimization

Header: optimization_nd.hpp

lbfgsbMinimize (box constraints)

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> lbfgsbMinimize(
    const std::function<T(const V&)>& f,
    const std::function<V(const V&)>& df,
    const V& x0,
    const V& lower, const V& upper,
    const LBFGSBOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

L-BFGS-B. Optimization with per-variable bound constraints $l_i \le x_i \le u_i$. Projects onto the feasible set via the gradient projection method and applies the L-BFGS step on the free variables. To leave a variable unconstrained, set lower[i] = -max() and upper[i] = max().

penaltyMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> penaltyMinimize(
    const std::function<T(const V&)>& f,
    const std::function<V(const V&)>& df,
    const std::vector<std::function<T(const V&)>>& constraints,
    const V& x0,
    const ConstrainedOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

Penalty method. Converts inequality constraints $c_i(\mathbf{x}) \le 0$ into a penalty term, then performs unconstrained minimization of $\Phi(\mathbf{x}) = f(\mathbf{x}) + \mu \sum \max(0, c_i)^2$. $\mu$ is increased gradually to suppress constraint violation.

augmentedLagrangianMinimize

template<concepts::OrderedField T, typename V>
    requires concepts::VectorOf<V, T>
OptimizationResult<V, T> augmentedLagrangianMinimize(
    const std::function<T(const V&)>& f,
    const std::function<V(const V&)>& df,
    const std::vector<std::function<T(const V&)>>& eq_constraints,
    const std::vector<std::function<T(const V&)>>& ineq_constraints,
    const V& x0,
    const ConstrainedOptions<T>& options = {},
    const LineSearchParams<T>& line_search_params = {});

Augmented Lagrangian method. Handles both equality constraints $h_i(\mathbf{x}) = 0$ and inequality constraints $g_i(\mathbf{x}) \le 0$. Estimates and updates Lagrange multipliers to avoid the ill-conditioning of a pure penalty method.

ConstrainedOptions<T>

MemberDefaultDescription
max_iterations100Outer-loop iterations
inner_max_iterations1000Inner solver iterations
initial_penalty1Initial penalty parameter $\mu$
penalty_growth10Growth factor for $\mu$

Linear Programming

Header: linear_programming.hpp

simplex

template<typename T>
LPResult<T> simplex(
    const LinearProgram<T>& lp,
    T eps = T(1e-10),
    size_t maxIterations = 10000);

Two-phase revised simplex method. Solves $\min \mathbf{c}^T \mathbf{x}$ subject to linear constraints. Equality (=) and inequality (<=, >=) constraints are all supported.

simplexMaximize

template<typename T>
LPResult<T> simplexMaximize(
    const LinearProgram<T>& lp,
    T eps = T(1e-10),
    size_t maxIterations = 10000);

Maximization wrapper. Negates the objective and calls simplex.

LinearProgram<T>

MemberTypeDescription
objectivestd::vector<T>Coefficients of the (minimization) objective
objectiveConstantTConstant term of the objective
constraintsstd::vector<LinearConstraint<T>>List of constraints

LinearConstraint<T>

MemberTypeDescription
coefficientsstd::vector<T>Left-hand-side coefficients
typeConstraintTypeLessEqual, GreaterEqual, Equal
rhsTRight-hand-side constant

LPResult<T>

MemberTypeDescription
statusLPStatusOptimal, Infeasible, Unbounded, MaxIterations
objectiveValueTOptimal objective value
solutionstd::vector<T>Optimal solution (one value per variable)
iterationssize_tTotal number of pivot iterations

Convex Optimization

Header: ConvexOptimization.hpp

fista

template<typename T>
FISTAResult<T> fista(
    std::function<Vector<T>(const Vector<T>&)> gradF,
    std::function<Vector<T>(const Vector<T>&, T)> proxG,
    const Vector<T>& x0,
    T L,
    size_t maxIter = 1000, T tol = T{1e-8});

FISTA (Fast Iterative Shrinkage-Thresholding Algorithm). Solves $\min f(\mathbf{x}) + g(\mathbf{x})$ where $f$ is a differentiable convex function (Lipschitz constant $L$) and $g$ is a convex function whose proximal operator can be evaluated. Nesterov acceleration yields $O(1/k^2)$ convergence.

ArgumentDescription
gradFGradient of $f$
proxGProximal operator of $g$, $\mathrm{prox}_{\alpha g}(v)$
LLipschitz constant of $\nabla f$

lassoFISTA

template<typename T>
FISTAResult<T> lassoFISTA(
    const Matrix<T>& A, const Vector<T>& b,
    T lambda, size_t maxIter = 1000, T tol = T{1e-8});

FISTA implementation for Lasso regression. Solves $\min \frac{1}{2}\|A\mathbf{x} - \mathbf{b}\|^2 + \lambda\|\mathbf{x}\|_1$. The Lipschitz constant is estimated automatically via the power method.

admmLasso

template<typename T>
ADMMResult<T> admmLasso(
    const Matrix<T>& A, const Vector<T>& b,
    T lambda, T rho = T{1},
    size_t maxIter = 1000, T tol = T{1e-6});

ADMM (Alternating Direction Method of Multipliers) for Lasso regression. Decomposes the problem via variable splitting $\mathbf{x} = \mathbf{z}$ and penalty $\rho$.

Code Examples

BFGS minimization of the Rosenbrock function

#include <math/optimization/optimization_nd.hpp>
#include <iostream>
using namespace sangi;

int main() {
    // Rosenbrock: f(x,y) = (1-x)^2 + 100*(y-x^2)^2
    auto f = [](const Vector<double>& x) {
        double a = 1.0 - x[0];
        double b = x[1] - x[0] * x[0];
        return a * a + 100.0 * b * b;
    };

    auto df = [](const Vector<double>& x) -> Vector<double> {
        double dx = -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] * x[0]);
        double dy = 200.0 * (x[1] - x[0] * x[0]);
        return {dx, dy};
    };

    Vector<double> x0 = {-1.0, 1.0};
    auto result = bfgs_minimize<double, Vector<double>, Matrix<double>>(f, df, x0);

    std::cout << "Minimum point: (" << result.point[0] << ", " << result.point[1] << ")\n";
    std::cout << "Minimum value: " << result.value << "\n";
    std::cout << "Iterations:    " << result.iterations << "\n";
    // Minimum point: (1, 1), minimum value: 0
}

Curve fitting

#include <math/optimization/optimization_nd.hpp>
#include <iostream>
using namespace sangi;

int main() {
    // Model: y = a * exp(-b * x)
    auto model = [](double x, const Vector<double>& p) {
        return p[0] * std::exp(-p[1] * x);
    };

    // Observed data
    std::vector<double> xdata = {0.0, 0.5, 1.0, 1.5, 2.0, 2.5};
    std::vector<double> ydata = {5.0, 3.1, 1.8, 1.2, 0.7, 0.4};

    Vector<double> p0 = {4.0, 0.5};  // initial guess
    auto result = curveFit<double>(model, xdata, ydata, p0);

    std::cout << "a = " << result.parameters[0]
              << " +/- " << result.standardErrors[0] << "\n";
    std::cout << "b = " << result.parameters[1]
              << " +/- " << result.standardErrors[1] << "\n";
    std::cout << "chi^2/dof = " << result.reducedChiSquared << "\n";
}

Linear programming

#include <math/optimization/linear_programming.hpp>
#include <iostream>
using namespace sangi;

int main() {
    // maximize x + 2y  subject to  x + y <= 4,  x <= 3,  y <= 3
    LinearProgram<double> lp;
    lp.objective = {1.0, 2.0};  // simplexMaximize takes the maximization form
    lp.constraints = {
        {{1.0, 1.0}, ConstraintType::LessEqual, 4.0},
        {{1.0, 0.0}, ConstraintType::LessEqual, 3.0},
        {{0.0, 1.0}, ConstraintType::LessEqual, 3.0}
    };
    auto result = simplexMaximize(lp);

    if (result.status == LPStatus::Optimal) {
        std::cout << "Optimal value: " << result.objectiveValue << "\n";
        std::cout << "x = " << result.solution[0]
                  << ", y = " << result.solution[1] << "\n";
        // Optimal value: 7, x = 1, y = 3
    }
}