Skip to content

Minimisation¤

In addition to the following, note that the Optax library offers an extensive collection of minimisers via first-order gradient methods -- as are in widespread use for neural networks. If you would like to use these through the Optimistix API then an optimistix.OptaxMinimiser wrapper is provided.

optimistix.minimise(fn: Callable[[~Y, Any], tuple[Shaped[Array, ''], ~Aux]] | Callable[[~Y, Any], Shaped[Array, '']], solver: optimistix.AbstractMinimiser, y0: ~Y, args: PyTree[Any] = None, options: dict[str, Any] | None = None, *, has_aux: bool = False, max_steps: int | None = 256, adjoint: optimistix.AbstractAdjoint = optimistix.ImplicitAdjoint(), throw: bool = True, tags: frozenset[object] = frozenset()) -> optimistix.Solution[~Y, ~Aux] ¤

Minimise a function.

This minimises a nonlinear function fn(y, args) which returns a scalar value.

Arguments:

  • fn: The objective function. This should take two arguments: fn(y, args) and return a scalar.
  • solver: The minimiser solver to use. This should be an optimistix.AbstractMinimiser.
  • y0: An initial guess for what y may be.
  • args: Passed as the args of fn(y, args).
  • options: Individual solvers may accept additional runtime arguments. See each individual solver's documentation for more details.
  • has_aux: If True, then fn may return a pair, where the first element is its function value, and the second is just auxiliary data. Keyword only argument.
  • max_steps: The maximum number of steps the solver can take. Keyword only argument.
  • adjoint: The adjoint method used to compute gradients through the fixed-point solve. Keyword only argument.
  • throw: How to report any failures. (E.g. an iterative solver running out of steps, or encountering divergent iterates.) If True then a failure will raise an error. If False then the returned solution object will have a result field indicating whether any failures occured. (See optimistix.Solution.) Keyword only argument.
  • tags: Lineax tags describing any structure of the Hessian of fn with respect to y. Used with optimistix.ImplicitAdjoint to implement the implicit function theorem as efficiently as possible. Keyword only argument.

Returns:

An optimistix.Solution object.


optimistix.minimise supports any of the following minimisers.

optimistix.AbstractMinimiser

optimistix.AbstractMinimiser ¤

Abstract base class for all minimisers.

init(fn: Callable[[~Y, Any], tuple[~Out, ~Aux]], y: ~Y, args: PyTree, options: dict[str, Any], f_struct: PyTree[jax.ShapeDtypeStruct], aux_struct: PyTree[jax.ShapeDtypeStruct], tags: frozenset[object]) -> ~SolverState ¤

Perform all initial computation needed to initialise the solver state.

For example, the optimistix.Chord method computes the Jacobian df/dy with respect to the initial guess y, and then uses it throughout the computation.

Arguments:

  • fn: The function to iterate over. This is expected to take two argumetns fn(y, args) and return a pytree of arrays in the first element, and any auxiliary data in the second argument.
  • y: The value of y at the current (first) iteration.
  • args: Passed as the args of fn(y, args).
  • options: Individual solvers may accept additional runtime arguments. See each individual solver's documentation for more details.
  • f_struct: A pytree of jax.ShapeDtypeStructs of the same shape as the output of fn. This is used to initialise any information in the state which may rely on the pytree structure, array shapes, or dtype of the output of fn.
  • aux_struct: A pytree of jax.ShapeDtypeStructs of the same shape as the auxiliary data returned by fn.
  • tags: exact meaning depends on whether this is a fixed point, root find, least squares, or minimisation problem; see their relevant entry points.

Returns:

A PyTree representing the initial state of the solver.

step(fn: Callable[[~Y, Any], tuple[~Out, ~Aux]], y: ~Y, args: PyTree, options: dict[str, Any], state: ~SolverState, tags: frozenset[object]) -> tuple[~Y, ~SolverState, ~Aux] ¤

Perform one step of the iterative solve.

Arguments:

  • fn: The function to iterate over. This is expected to take two argumetns fn(y, args) and return a pytree of arrays in the first element, and any auxiliary data in the second argument.
  • y: The value of y at the current (first) iteration.
  • args: Passed as the args of fn(y, args).
  • options: Individual solvers may accept additional runtime arguments. See each individual solver's documentation for more details.
  • state: A pytree representing the state of a solver. The shape of this pytree is solver-dependent.
  • tags: exact meaning depends on whether this is a fixed point, root find, least squares, or minimisation problem; see their relevant entry points.

Returns:

A 3-tuple containing the new y value in the first element, the next solver state in the second element, and the aux output of fn(y, args) in the third element.

terminate(fn: Callable[[~Y, Any], tuple[~Out, ~Aux]], y: ~Y, args: PyTree, options: dict[str, Any], state: ~SolverState, tags: frozenset[object]) -> tuple[Bool[Array, ''], optimistix.RESULTS] ¤

Determine whether or not to stop the iterative solve.

Arguments:

  • fn: The function to iterate over. This is expected to take two argumetns fn(y, args) and return a pytree of arrays in the first element, and any auxiliary data in the second argument.
  • y: The value of y at the current iteration.
  • args: Passed as the args of fn(y, args).
  • options: Individual solvers may accept additional runtime arguments. See each individual solver's documentation for more details.
  • state: A pytree representing the state of a solver. The shape of this pytree is solver-dependent.
  • tags: exact meaning depends on whether this is a fixed point, root find, least squares, or minimisation problem; see their relevant entry points.

Returns:

A 2-tuple containing a bool indicating whether or not to stop iterating in the first element, and an optimistix.RESULTS object in the second element.

postprocess(fn: Callable[[~Y, Any], tuple[~Out, ~Aux]], y: ~Y, aux: ~Aux, args: PyTree, options: dict[str, Any], state: ~SolverState, tags: frozenset[object], result: optimistix.RESULTS) -> tuple[~Y, ~Aux, dict[str, Any]] ¤

Any final postprocessing to perform on the result of the solve.

Arguments:

  • fn: The function to iterate over. This is expected to take two argumetns fn(y, args) and return a pytree of arrays in the first element, and any auxiliary data in the second argument.
  • y: The value of y at the last iteration.
  • aux: The auxiliary output at the last iteration.
  • args: Passed as the args of fn(y, args).
  • options: Individual solvers may accept additional runtime arguments. See each individual solver's documentation for more details.
  • state: A pytree representing the final state of a solver. The shape of this pytree is solver-dependent.
  • tags: exact meaning depends on whether this is a fixed point, root find, least squares, or minimisation problem; see their relevant entry points.
  • result: as returned by the final call to terminate.

Returns:

A 3-tuple of:

  • final_y: the final y to return as the solution of the solve.
  • final_aux: the final aux to return as the auxiliary output of the solve.
  • stats: any additional information to place in the sol.stats dictionary.

Info

Most solvers will not need to use this, so that this method may be defined as:

def postprocess(self, fn, y, aux, args, options, state, tags, result):
    return y, aux, {}

optimistix.AbstractGradientDescent

optimistix.AbstractGradientDescent(optimistix.AbstractMinimiser) ¤

The gradient descent method for unconstrained minimisation.

At every step, this algorithm performs a line search along the steepest descent direction. You should subclass this to provide it with a particular choice of line search. (E.g. optimistix.GradientDescent uses a simple learning rate step.)

Subclasses must provide the following abstract attributes, with the following types:

  • rtol: float
  • atol: float
  • norm: Callable[[PyTree], Scalar]
  • descent: AbstractDescent
  • search: AbstractSearch

Supports the following options:

  • autodiff_mode: whether to use forward- or reverse-mode autodifferentiation to compute the gradient. Can be either "fwd" or "bwd". Defaults to "bwd", which is usually more efficient. Changing this can be useful when the target function does not support reverse-mode automatic differentiation.

optimistix.GradientDescent(optimistix.AbstractGradientDescent) ¤

Classic gradient descent with a learning rate learning_rate.

Supports the following options:

  • autodiff_mode: whether to use forward- or reverse-mode autodifferentiation to compute the gradient. Can be either "fwd" or "bwd". Defaults to "bwd", which is usually more efficient. Changing this can be useful when the target function does not support reverse-mode automatic differentiation.
__init__(learning_rate: float, rtol: float, atol: float, norm: Callable[[PyTree], Shaped[Array, '']] = <function max_norm>) ¤

Arguments:

  • learning_rate: Specifies a constant learning rate to use at each step.
  • rtol: Relative tolerance for terminating the solve.
  • atol: Absolute tolerance for terminating the solve.
  • norm: The norm used to determine the difference between two iterates in the convergence criteria. Should be any function PyTree -> Scalar. Optimistix includes three built-in norms: optimistix.max_norm, optimistix.rms_norm, and optimistix.two_norm.

optimistix.AbstractQuasiNewton

optimistix.AbstractQuasiNewton(optimistix.AbstractMinimiser) ¤

Abstract quasi-Newton minimisation algorithm.

Base class for quasi-Newton solvers, which create approximations to the Hessian or the inverse Hessian by accumulating gradient information over multiple iterations. Optimistix currently includes the following three variants: optimistix.BFGS, optimistix.DFP and optimistix.LBFGS, each of which may be used to either approximate the Hessian or its inverse. The concrete classes may be subclassed to choose alternative descents and searches.

Alternative flavors of quasi-Newton approximations may be implemented by subclassing AbstractQuasiNewton and providing implementations for the abstract methods update_init and update_call. The former is called to initialize the Hessian structure and the Hessian update state, while the latter is called to compute an update to the approximation of the Hessian or the inverse Hessian.

Already supported schemes to form inverse Hessian and Hessian approximations are implemented in optimistix.AbstractBFGS, optimistix.AbstractDFP and optimistix.AbstractLBFGS.

Supports the following options:

  • autodiff_mode: whether to use forward- or reverse-mode autodifferentiation to compute the gradient. Can be either "fwd" or "bwd". Defaults to "bwd", which is usually more efficient. Changing this can be useful when the target function does not support reverse-mode automatic differentiation.
init_hessian(y: ~Y, f: Shaped[Array, ''], grad: ~Y) -> tuple[~_Hessian, ~HessianUpdateState] ¤

Initialize the Hessian structure and Hessian update state.

Set up a template structure of the Hessian to be used (with dummy values), as well as the state of the update method, which can be used to store past gradients for limited-memory Hessian approximations.

update_hessian(y: ~Y, y_eval: ~Y, f_info: ~_Hessian, f_eval_info: optimistix._search.FunctionInfo.EvalGrad, hessian_update_state: ~HessianUpdateState) -> tuple[~_Hessian, ~HessianUpdateState] ¤

Update the Hessian approximation.

This is called in the step method to update the Hessian approximation based on the current and previous iterates, their gradients, and the previous Hessian, whenever a step has been accepted and we query the descent for a new direction.

Implementations should provide an update for the Hessian approximation or its inverse, and toggle updates as appropriate to maintain positive-definiteness of the operator.

optimistix.AbstractBFGS

optimistix.AbstractBFGS(optimistix.AbstractQuasiNewton) ¤

Abstract version of the BFGS (Broyden–Fletcher–Goldfarb–Shanno) minimisation algorithm. This class may be subclassed to implement custom solvers with alternative searches and descent methods that use the BFGS update to approximate the Hessian or the inverse Hessian.

optimistix.BFGS(optimistix.AbstractBFGS) ¤

BFGS (Broyden–Fletcher–Goldfarb–Shanno) minimisation algorithm.

This is a quasi-Newton optimisation algorithm, whose defining feature is the way it progressively builds up a Hessian approximation using multiple steps of gradient information. Uses the Broyden-Fletcher-Goldfarb-Shanno formula to compute the updates to the Hessian and or to the Hessian inverse. See https://en.wikipedia.org/wiki/Broyden–Fletcher–Goldfarb–Shanno_algorithm.

Supports the following options:

  • autodiff_mode: whether to use forward- or reverse-mode autodifferentiation to compute the gradient. Can be either "fwd" or "bwd". Defaults to "bwd", which is usually more efficient. Changing this can be useful when the target function does not support reverse-mode automatic differentiation.
__init__(rtol: float, atol: float, norm: Callable[[PyTree], Shaped[Array, '']] = <function max_norm>, use_inverse: bool = True, verbose: frozenset[str] = frozenset()) ¤

Arguments:

  • rtol: Relative tolerance for terminating the solve.
  • atol: Absolute tolerance for terminating the solve.
  • norm: The norm used to determine the difference between two iterates in the convergence criteria. Should be any function PyTree -> Scalar. Optimistix includes three built-in norms: optimistix.max_norm, optimistix.rms_norm, and optimistix.two_norm.
  • use_inverse: The BFGS algorithm involves computing matrix-vector products of the form B^{-1} g, where B is an approximation to the Hessian of the function to be minimised. This means we can either (a) store the approximate Hessian B, and do a linear solve on every step, or (b) store the approximate Hessian inverse B^{-1}, and do a matrix-vector product on every step. Option (a) is generally cheaper for sparse Hessians (as the inverse may be dense). Option (b) is generally cheaper for dense Hessians (as matrix-vector products are cheaper than linear solves). The default is (b), denoted via use_inverse=True. Note that this is incompatible with searches like optimistix.ClassicalTrustRegion, which use the Hessian approximation B as part of their computations.
  • verbose: Whether to print out extra information about how the solve is proceeding. Should be a frozenset of strings, specifying what information to print. Valid entries are step_size, loss, y. For example verbose=frozenset({"step_size", "loss"}).
optimistix.AbstractDFP

optimistix.AbstractDFP(optimistix.AbstractQuasiNewton) ¤

Abstract version of the DFP (Davidon–Fletcher–Powell) minimisation algorithm. This class may be subclassed to implement custom solvers with alternative searches and descent methods that use the DFP update to approximate the Hessian or the inverse Hessian.

optimistix.DFP(optimistix.AbstractDFP) ¤

DFP (Davidon–Fletcher–Powell) minimisation algorithm.

This is a quasi-Newton optimisation algorithm, whose defining feature is the way it progressively builds up a Hessian approximation using multiple steps of gradient information. Uses the Davidon-Fletcher-Powell formula to compute the updates to the Hessian and or to the Hessian inverse. See https://en.wikipedia.org/wiki/Davidon–Fletcher–Powell_formula.

optimistix.BFGS is generally preferred, since it is more numerically stable on most problems.

Supports the following options:

  • autodiff_mode: whether to use forward- or reverse-mode autodifferentiation to compute the gradient. Can be either "fwd" or "bwd". Defaults to "bwd", which is usually more efficient. Changing this can be useful when the target function does not support reverse-mode automatic differentiation.
__init__(rtol: float, atol: float, norm: Callable[[PyTree], Shaped[Array, '']] = <function max_norm>, use_inverse: bool = True, verbose: frozenset[str] = frozenset()) ¤

Arguments:

  • rtol: Relative tolerance for terminating the solve.
  • atol: Absolute tolerance for terminating the solve.
  • norm: The norm used to determine the difference between two iterates in the convergence criteria. Should be any function PyTree -> Scalar. Optimistix includes three built-in norms: optimistix.max_norm, optimistix.rms_norm, and optimistix.two_norm.
  • use_inverse: The DFP algorithm involves computing matrix-vector products of the form B^{-1} g, where B is an approximation to the Hessian of the function to be minimised. This means we can either (a) store the approximate Hessian B, and do a linear solve on every step, or (b) store the approximate Hessian inverse B^{-1}, and do a matrix-vector product on every step. Option (a) is generally cheaper for sparse Hessians (as the inverse may be dense). Option (b) is generally cheaper for dense Hessians (as matrix-vector products are cheaper than linear solves). The default is (b), denoted via use_inverse=True. Note that this is incompatible with searches like optimistix.ClassicalTrustRegion, which use the Hessian approximation B as part of their computations.
  • verbose: Whether to print out extra information about how the solve is proceeding. Should be a frozenset of strings, specifying what information to print. Valid entries are step_size, loss, y. For example verbose=frozenset({"step_size", "loss"}).
optimistix.AbstractLBFGS

optimistix.AbstractLBFGS(optimistix.AbstractQuasiNewton) ¤

Abstract version of the L-BFGS (limited memory Broyden–Fletcher–Goldfarb–Shanno) solver. This class may be subclassed to implement custom solvers with alternative searches or descent methods that use the L-BFGS update to approximate the Hessian or the inverse Hessian.

optimistix.LBFGS(optimistix.AbstractLBFGS) ¤

L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno) minimisation algorithm.

This is a quasi-Newton optimisation algorithm that approximates the inverse Hessian using a limited history of gradient and parameter updates. Unlike full BFGS, which stores a dense matrix, L-BFGS maintains a memory-efficient representation suitable for large-scale problems.

For a brief outline, see https://en.wikipedia.org/wiki/Limited-memory_BFGS. Our implementation follows Byrd et al. 1994.

References
@article{byrd_representations_1994,
    author = {Byrd, Richard H. and Nocedal, Jorge and Schnabel, Robert B.},
    title = {Representations of quasi-{Newton} matrices and their use in
             limited memory methods},
    journal = {Mathematical Programming},
    volume = {63},
    number = {1},
    pages = {129--156},
    year = {1994},
    doi = {10.1007/BF01582063},
}

Supports the following options:

  • autodiff_mode: whether to use forward- or reverse-mode autodifferentiation to compute the gradient. Can be either "fwd" or "bwd". Defaults to "bwd", which is usually more efficient. Changing this can be useful when the target function does not support reverse-mode automatic differentiation.
__init__(rtol: float, atol: float, norm: Callable[[PyTree], Shaped[Array, '']] = <function max_norm>, use_inverse: bool = True, history_length: int = 10, verbose: frozenset[str] = frozenset()) ¤

Arguments:

  • rtol: Relative tolerance for terminating the solve.
  • atol: Absolute tolerance for terminating the solve.
  • norm: The norm used to determine the difference between two iterates in the convergence criteria. Should be any function PyTree -> Scalar. Optimistix includes three built-in norms: optimistix.max_norm, optimistix.rms_norm, and optimistix.two_norm.
  • use_inverse: Whether to use the inverse Hessian approximation (default) or the Hessian approximation. If True, the L-BFGS update will use the inverse Hessian approximation, and the step is computed as a single matrix-vector product, without materialising the matrix. If False, then the limited-memory approximation to the Hessian is computed instead, and the step is computed by solving a linear system.
  • history_length: Number of parameter and gradient residuals to retain in the L-BFGS history. Larger values can improve accuracy of the inverse Hessian approximation, while smaller values reduce memory and computation. The default is 10.
  • verbose: Whether to print out extra information about how the solve is proceeding. Should be a frozenset of strings, specifying what information to print. Valid entries are step_size, loss, y. For example verbose=frozenset({"step_size", "loss"}).

optimistix.OptaxMinimiser(optimistix.AbstractMinimiser) ¤

A wrapper to use Optax first-order gradient-based optimisers with optimistix.minimise.

__init__(optim, rtol: float, atol: float, norm: Callable[[PyTree], Shaped[Array, '']] = <function max_norm>, verbose: frozenset[str] = frozenset()) ¤

Arguments:

  • optim: The Optax optimiser to use.
  • rtol: Relative tolerance for terminating the solve. Keyword only argument.
  • atol: Absolute tolerance for terminating the solve. Keyword only argument.
  • norm: The norm used to determine the difference between two iterates in the convergence criteria. Should be any function PyTree -> Scalar. Optimistix includes three built-in norms: optimistix.max_norm, optimistix.rms_norm, and optimistix.two_norm. Keyword only argument.
  • verbose: Whether to print out extra information about how the solve is proceeding. Should be a frozenset of strings, specifying what information to print out. Valid entries are step, loss, y. For example verbose=frozenset({"step", "loss"}).

optim in optimistix.OptaxMinimiser is an instance of an Optax minimiser. For example, correct usage is optimistix.OptaxMinimiser(optax.adam(...), ...), not optimistix.OptaxMinimiser(optax.adam, ...).


optimistix.NonlinearCG(optimistix.AbstractGradientDescent) ¤

The nonlinear conjugate gradient method.

Supports the following options:

  • autodiff_mode: whether to use forward- or reverse-mode autodifferentiation to compute the gradient. Can be either "fwd" or "bwd". Defaults to "bwd", which is usually more efficient. Changing this can be useful when the target function does not support reverse-mode automatic differentiation.
__init__(rtol: float, atol: float, norm: Callable[[PyTree[Array]], Shaped[Array, '']] = <function max_norm>, method: Callable[[~Y, ~Y, ~Y], Shaped[Array, '']] = <function polak_ribiere>, search: optimistix.AbstractSearch[~Y, optimistix._search.FunctionInfo.EvalGrad, optimistix._search.FunctionInfo.Eval, Any] = optimistix.BacktrackingArmijo(decrease_factor=0.5, slope=0.1)) ¤

Arguments:

optimistix.NonlinearCG supports several different methods for computing its β parameter. If you are trying multiple solvers to see which works best on your problem, then you may wish to try all four versions of nonlinear CG. These can each be passed as NonlinearCG(..., method=...).

optimistix.polak_ribiere(grad_vector: ~Y, grad_prev: ~Y, y_diff_prev: ~Y) -> Shaped[Array, ''] ¤

The Polak--Ribière formula for β. Used with optimistix.NonlinearCG and optimistix.NonlinearCGDescent.

optimistix.fletcher_reeves(grad: ~Y, grad_prev: ~Y, y_diff_prev: ~Y) -> Shaped[Array, ''] ¤

The Fletcher--Reeves formula for β. Used with optimistix.NonlinearCG and optimistix.NonlinearCGDescent.

optimistix.hestenes_stiefel(grad: ~Y, grad_prev: ~Y, y_diff_prev: ~Y) -> Shaped[Array, ''] ¤

The Hestenes--Stiefel formula for β. Used with optimistix.NonlinearCG and optimistix.NonlinearCGDescent.

optimistix.dai_yuan(grad: ~Y, grad_prev: ~Y, y_diff_prev: ~Y) -> Shaped[Array, ''] ¤

The Dai--Yuan formula for β. Used with optimistix.NonlinearCG and optimistix.NonlinearCGDescent.


optimistix.NelderMead(optimistix.AbstractMinimiser) ¤

The Nelder-Mead minimisation algorithm. (Downhill simplex derivative-free method.)

This algorithm is notable in that it only uses function evaluations, and does not need gradient evaluations.

This is usually an "algorithm of last resort". Gradient-based algorithms are usually much faster, and are more likely to converge to a minima.

Comparable to scipy.optimize.minimize(method="Nelder-Mead").

__init__(rtol: float, atol: float, norm: Callable[[PyTree], Shaped[Array, '']] = <function max_norm>, rdelta: float = 0.05, adelta: float = 0.00025) ¤

Arguments:

  • rtol: Relative tolerance for terminating the solve.
  • atol: Absolute tolerance for terminating the solve.
  • norm: The norm used to determine the difference between two iterates in the convergence criteria. Should be any function PyTree -> Scalar. Optimistix includes three built-in norms: optimistix.max_norm, optimistix.rms_norm, and optimistix.two_norm.
  • rdelta: Nelder-Mead creates an initial simplex by appending a scaled identity matrix to y. The ith element of this matrix is rdelta * y_i + adelta. That is, this is the relative size for creating the initial simplex.
  • adelta: Nelder-Mead creates an initial simplex by appending a scaled identity matrix to y. The ith element of this matrix is rdelta * y_i + adelta. That is, this is the absolute size for creating the initial simplex.

optimistix.GoldenSearch(optimistix.AbstractMinimiser) ¤

Golden-section search for finding the minimum of a univariate function in a given interval. It does not use gradients and is not particularly fast, but it is very robust.

This solver maintains a set of three reference points, defining the lower and upper boundaries of the interval, as well as a midpoint chosen to divide the interval into two sections by the golden ratio. At each step, the reference points are updated by dropping one of the outer points and updating the midpoint, such that the interval shrinks monotonously until the solver has converged.

If the function is unimodal (has just one minimum inside the interval), then this minimum is always found. If the function has several minima, then a local minimum is identified, depending on the initial choice of interval bounds.

This solver requires the following options:

  • lower: The lower bound on the interval which contains the minimum.
  • upper: The upper bound on the interval which contains the minimum.

Note that the initial value y0 is ignored to guarantee that the golden ratio between interval segments is always maintained.

__init__(rtol: float, atol: float) ¤

Arguments:

  • rtol: The relative tolerance for terminating the solve.
  • atol: The absolute tolerance for terminating the solve.

optimistix.BestSoFarMinimiser(optimistix.AbstractMinimiser) ¤

Wraps another minimiser, to return the best-so-far value. That is, it makes a copy of the best y seen, and returns that.

__init__(solver: optimistix.AbstractMinimiser[~Y, tuple[Shaped[Array, ''], ~Aux], Any]) ¤

Arguments:

  • solver: the minimiser to wrap.