Descents
optimistix.AbstractDescent

The abstract base class for descents. A descent consumes a scalar (e.g. a step size) and returns the diff to take at the point y, so that y + diff is the next iterate in a nonlinear optimisation problem.
See this documentation for more information.
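For intuition, here is a minimal sketch in plain JAX (not the Optimistix descent API itself) of the contract described above: given a scalar step size, a descent produces a diff, and y + diff is the next iterate. The function names and the Rosenbrock objective are purely illustrative.

```python
import jax
import jax.numpy as jnp

def rosenbrock(y):
    # An example objective function (illustrative only).
    return (1.0 - y[0]) ** 2 + 100.0 * (y[1] - y[0] ** 2) ** 2

def steepest_descent_diff(step_size, y):
    # The simplest possible "descent": scale the negative gradient by the step size.
    return -step_size * jax.grad(rosenbrock)(y)

y = jnp.array([0.0, 0.0])
diff = steepest_descent_diff(0.1, y)
y_next = y + diff  # the next iterate, as described above
```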
optimistix.SteepestDescent (AbstractDescent)

The descent direction given by locally following the gradient.

__init__(self, norm: Optional[Callable[[PyTree], Array]] = None)

Arguments:

- norm: If passed, then normalise the gradient using this norm. (The returned step will have length step_size with respect to this norm.) Optimistix includes three built-in norms: optimistix.max_norm, optimistix.rms_norm, and optimistix.two_norm.
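For example, the descent can be instantiated with one of the built-in norms. The second half of this snippet illustrates what normalisation means for a flat array (an illustration of the behaviour described above, not Optimistix internals): the step is scaled so that it has length step_size under the chosen norm.

```python
import jax.numpy as jnp
import optimistix as optx

descent = optx.SteepestDescent(norm=optx.two_norm)

# Illustration of the normalisation described above: scale -grad so that the
# resulting diff has length `step_size` under the chosen norm.
grad = jnp.array([3.0, 4.0])
step_size = 0.1
diff = -step_size * grad / optx.two_norm(grad)
print(optx.two_norm(diff))  # 0.1
```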
optimistix.NonlinearCGDescent (AbstractDescent)

The nonlinear conjugate gradient step.

__init__(self, method: Callable[[~Y, ~Y, ~Y], Array])

Arguments:

- method: A callable method(vector, vector_prev, diff_prev) describing how to calculate the beta parameter of nonlinear CG. Nonlinear CG uses the previous search direction, scaled by beta, and subtracts the gradient to find the next search direction. This parameter, in the nonlinear case, is the same as the parameter β_n described e.g. on Wikipedia for the linear case. Defaults to polak_ribiere. Optimistix includes four built-in methods: optimistix.polak_ribiere, optimistix.fletcher_reeves, optimistix.hestenes_stiefel, and optimistix.dai_yuan.
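As an illustration of the method callable's role, here is a sketch of the Polak-Ribière choice of beta for flat arrays. The built-in optimistix.polak_ribiere operates on general PyTrees; clipping at zero (the "PR+" variant) is a common convention and an assumption of this sketch rather than something stated above.

```python
import jax.numpy as jnp
import optimistix as optx

descent = optx.NonlinearCGDescent(method=optx.polak_ribiere)

def polak_ribiere_sketch(vector, vector_prev, diff_prev):
    # vector: the current gradient; vector_prev: the previous gradient.
    numerator = jnp.dot(vector, vector - vector_prev)
    denominator = jnp.dot(vector_prev, vector_prev)
    return jnp.maximum(numerator / denominator, 0.0)

# The next search direction is then `beta * diff_prev - vector`, as described above.
```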
optimistix.NewtonDescent (AbstractDescent)

Newton descent direction.
Given a quadratic bowl x -> x^T Hess x -- a local quadratic approximation to the target function -- this corresponds to moving in the direction of the bottom of the bowl. (Which is not the same as steepest descent.)
This is done by solving a linear system of the form Hess^{-1} grad.

__init__(self, norm: Optional[Callable[[PyTree], Array]] = None, linear_solver: AbstractLinearSolver = AutoLinearSolver(well_posed=None))

Arguments:

- norm: If passed, then normalise the gradient using this norm. (The returned step will have length step_size with respect to this norm.) Optimistix includes three built-in norms: optimistix.max_norm, optimistix.rms_norm, and optimistix.two_norm.
- linear_solver: The linear solver used to compute the Newton step.
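The linear solvers come from Lineax, which is where the AutoLinearSolver default above lives. For intuition, this sketch also computes the dense-matrix version of the Newton step on a toy objective; this is an illustration only, not how Optimistix handles the Hessian internally.

```python
import jax
import jax.numpy as jnp
import lineax as lx
import optimistix as optx

descent = optx.NewtonDescent(linear_solver=lx.AutoLinearSolver(well_posed=None))

# Dense-matrix sketch of the Newton step described above: solve Hess @ diff = -grad.
def f(y):
    return jnp.sum((y - 1.0) ** 2) + y[0] * y[1]

y = jnp.array([0.0, 0.0])
grad = jax.grad(f)(y)
hess = jax.hessian(f)(y)
diff = -jnp.linalg.solve(hess, grad)
```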
optimistix.DampedNewtonDescent (AbstractDescent)

The damped Newton (Levenberg--Marquardt) descent.
That is: gradient descent is about moving in the direction of -grad. (Quasi-)Newton descent is about moving in the direction of -Hess^{-1} grad. Damped Newton interpolates between these two regimes, by moving in the direction of -(Hess + λI)^{-1} grad.
The value λ is often referred to as the "Levenberg--Marquardt" parameter, and here it is handled directly, as λ = 1/step_size. Larger step sizes correspond to Newton directions; smaller step sizes correspond to gradient directions. (Decreasing the step size additionally also reduces the length of the step, hence the term "damping".) This is because a line search expects the step to be smaller as the step size decreases.
__init__(self, linear_solver: AbstractLinearSolver = AutoLinearSolver(well_posed=None))

Arguments:

- linear_solver: The linear solver used to compute the Newton step.
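The interpolation described above can be made concrete with a small dense-matrix sketch (an illustration, not Optimistix internals): solve (Hess + λI) diff = -grad with λ = 1/step_size, so that the same formula sweeps from gradient-like to Newton-like steps as the step size grows.

```python
import jax
import jax.numpy as jnp

def damped_newton_diff(step_size, grad, hess):
    lam = 1.0 / step_size
    return -jnp.linalg.solve(hess + lam * jnp.eye(hess.shape[0]), grad)

def f(y):
    return jnp.sum(jnp.cosh(y)) + y[0] * y[1]

y = jnp.array([1.0, -0.5])
grad, hess = jax.grad(f)(y), jax.hessian(f)(y)
small = damped_newton_diff(0.01, grad, hess)   # approximately -step_size * grad (gradient regime)
large = damped_newton_diff(100.0, grad, hess)  # approximately -Hess^{-1} grad (Newton regime)
```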
optimistix.IndirectDampedNewtonDescent (AbstractDescent)

The indirect damped Newton (Levenberg--Marquardt) trust-region descent.
If the above line just looks like technical word soup, here's what's going on:
Gradient descent is about moving in the direction of -grad. (Quasi-)Newton descent is about moving in the direction of -Hess^{-1} grad. Damped Newton interpolates between these two regimes, by moving in the direction of -(Hess + λI)^{-1} grad.
This can be derived as the dual problem of a trust-region method (see Conn, Gould, Toint, "Trust-Region Methods", section 7.3), in which λ is interpreted as a Lagrange multiplier. This involves solving a one-dimensional root-finding problem for λ at each descent.
__init__(self, lambda_0: Union[Array, ndarray, numpy.bool, numpy.number, bool, int, float, complex] = 1.0, linear_solver: AbstractLinearSolver = AutoLinearSolver(well_posed=False), root_finder: AbstractRootFinder = Newton(rtol=0.01, atol=0.01, norm=<function max_norm>, kappa=0.01, linear_solver=AutoLinearSolver(well_posed=None), cauchy_termination=True), trust_region_norm: Callable[[PyTree], Array] = <function two_norm>)

Arguments:

- lambda_0: The initial value of the Levenberg--Marquardt parameter used in the root-find to hit the trust-region radius. If IndirectDampedNewtonDescent is failing, this value may need to be increased.
- linear_solver: The linear solver used to compute the Newton step.
- root_finder: The root finder used to find the Levenberg--Marquardt parameter which hits the trust-region radius.
- trust_region_norm: The norm used to determine the trust-region shape.
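The one-dimensional root-find mentioned above can be sketched directly with optimistix.root_find. This is an illustration under the assumption of a small dense Hessian, not the internals of this class: find λ such that the damped Newton step exactly hits a given trust-region radius.

```python
import jax.numpy as jnp
import optimistix as optx

def trust_region_residual(lam, args):
    grad, hess, radius = args
    diff = jnp.linalg.solve(hess + lam * jnp.eye(hess.shape[0]), grad)
    return optx.two_norm(diff) - radius

hess = jnp.array([[3.0, 0.0], [0.0, 1.0]])
grad = jnp.array([1.0, 2.0])
radius = 0.5
root_finder = optx.Newton(rtol=1e-2, atol=1e-2)
sol = optx.root_find(trust_region_residual, root_finder, jnp.array(1.0), args=(grad, hess, radius))
lam = sol.value  # the Levenberg--Marquardt parameter that hits the radius
```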
optimistix.DoglegDescent (AbstractDescent)

The Dogleg descent step, which switches between the Cauchy and the Newton descent directions.

__init__(self, linear_solver: AbstractLinearSolver = AutoLinearSolver(well_posed=None), root_finder: AbstractRootFinder[Array, Array, NoneType, Any] = Bisection(rtol=0.001, atol=0.001, flip='detect', expand_if_necessary=False), trust_region_norm: Callable[[PyTree], Array] = <function two_norm>)

Arguments:

- linear_solver: The linear solver used to compute the Newton step.
- root_finder: The root finder used to find the point where the trust region intersects the dogleg path. This is ignored if trust_region_norm=optimistix.two_norm, for which there is an analytic formula instead.
- trust_region_norm: The norm used to determine the trust-region shape.
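To make "switching between the Cauchy and the Newton descent directions" concrete, here is a dense-matrix sketch of the classical dogleg path (an illustration of the standard construction, not Optimistix internals): the first leg runs from the origin to the Cauchy point, the second from the Cauchy point to the Newton point, and the descent returns the point on this path whose trust_region_norm matches the current trust-region radius.

```python
import jax.numpy as jnp

def dogleg_path(t, grad, hess):
    # t in [0, 1] parametrises the leg towards the Cauchy point,
    # t in [1, 2] the leg from the Cauchy point to the Newton point.
    cauchy = -(grad @ grad) / (grad @ hess @ grad) * grad  # minimiser of the quadratic model along -grad
    newton = -jnp.linalg.solve(hess, grad)
    first_leg = t * cauchy
    second_leg = cauchy + (t - 1.0) * (newton - cauchy)
    return jnp.where(t <= 1.0, first_leg, second_leg)
```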