Descents
optimistix.AbstractDescent
The abstract base class for descents. A descent consumes a scalar (e.g. a step size), and returns the diff to take at point y, so that y + diff is the next iterate in a nonlinear optimisation problem. See this documentation for more information.
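As an illustration of this contract only (fn and the helper below are made-up names, not part of the library), the simplest possible descent scales the negative gradient by the step size:

```python
import jax
import jax.numpy as jnp


def fn(y):  # hypothetical target function, for illustration only
    return jnp.sum((y - 1.0) ** 2)


y = jnp.array([0.0, 0.0])
grad = jax.grad(fn)(y)


# A descent consumes a scalar step size and returns a diff; here the diff is
# simply a step-size-scaled negative gradient.
def diff_from_step_size(step_size):
    return -step_size * grad


diff = diff_from_step_size(0.1)
y_next = y + diff  # the next iterate in the optimisation loop
```

A solver then pairs a descent like this with a search (e.g. a line search) that chooses the scalar step size at each iteration.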
optimistix.SteepestDescent (AbstractDescent)
The descent direction given by locally following the gradient.
optimistix.NonlinearCGDescent (AbstractDescent)
The nonlinear conjugate gradient step.
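As a sketch of the underlying update (plain JAX, not the library's implementation; the choice of the parameter β varies between nonlinear CG variants, and Polak--Ribiere is shown purely as an example), the new direction combines the negative gradient with the previous direction:

```python
import jax.numpy as jnp


def polak_ribiere(grad_new, grad_old):
    # One common choice of the conjugate-gradient parameter β.
    beta = jnp.dot(grad_new, grad_new - grad_old) / jnp.dot(grad_old, grad_old)
    return jnp.maximum(beta, 0.0)  # "PR+": fall back to steepest descent if β < 0


def nonlinear_cg_direction(grad_new, grad_old, direction_old):
    beta = polak_ribiere(grad_new, grad_old)
    return -grad_new + beta * direction_old
```

The diff is then this direction scaled by whatever step size the accompanying search produces.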
optimistix.NewtonDescent (AbstractDescent)
Newton descent direction.
Given a quadratic bowl x -> x^T Hess x -- a local quadratic approximation to the target function -- this corresponds to moving in the direction of the bottom of the bowl. (Which is not the same as steepest descent.)
This is done by computing Hess^{-1} grad, i.e. by solving a linear system in the Hessian.
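A minimal sketch of that linear solve in plain JAX (illustration only; fn is a made-up target function, and this is not the library's own implementation):

```python
import jax
import jax.numpy as jnp


def fn(y):  # hypothetical target function
    return jnp.sum(jnp.exp(y) - 2.0 * y)


y = jnp.array([0.3, -0.7])
grad = jax.grad(fn)(y)
hess = jax.hessian(fn)(y)

# Newton direction: solve Hess @ newton = grad instead of forming Hess^{-1}.
newton = jnp.linalg.solve(hess, grad)
diff = -newton  # move towards the bottom of the local quadratic bowl
```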
optimistix.DampedNewtonDescent (AbstractDescent)
The damped Newton (Levenberg--Marquardt) descent.
That is: gradient descent is about moving in the direction of -grad. (Quasi-)Newton descent is about moving in the direction of -Hess^{-1} grad. Damped Newton interpolates between these two regimes, by moving in the direction of -(Hess + λI)^{-1} grad.
The value λ is often referred to as the "Levenberg--Marquardt" parameter, and in this descent it is handled directly, as λ = 1/step_size. Larger step sizes correspond to Newton directions; smaller step sizes correspond to gradient directions. (Smaller step sizes additionally shrink the overall step, hence the term "damping".) This is because a line search expects the step to be smaller as the step size decreases.
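A sketch of the damped solve in plain JAX, using λ = 1/step_size as described above (illustration only; fn and the names below are made up):

```python
import jax
import jax.numpy as jnp


def fn(y):  # hypothetical target function
    return jnp.sum(jnp.exp(y) - 2.0 * y)


y = jnp.array([0.3, -0.7])
grad = jax.grad(fn)(y)
hess = jax.hessian(fn)(y)


def damped_newton_diff(step_size):
    lam = 1.0 / step_size  # the Levenberg--Marquardt parameter
    damped = hess + lam * jnp.eye(y.size)
    return -jnp.linalg.solve(damped, grad)


small_step = damped_newton_diff(1e-3)  # heavily damped: close to a (scaled) gradient step
large_step = damped_newton_diff(1e3)   # lightly damped: close to the Newton step
```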
optimistix.IndirectDampedNewtonDescent (AbstractDescent)
The indirect damped Newton (Levenberg--Marquardt) trust-region descent.
If the above line just looks like technical word soup, here's what's going on: gradient descent is about moving in the direction of -grad. (Quasi-)Newton descent is about moving in the direction of -Hess^{-1} grad. Damped Newton interpolates between these two regimes, by moving in the direction of -(Hess + λI)^{-1} grad.
This can be derived as the dual problem of a trust-region method (see Conn, Gould, Toint, "Trust-Region Methods", section 7.3), in which λ is interpreted as a Lagrange multiplier. Here λ is handled indirectly, rather than being set from the step size as in optimistix.DampedNewtonDescent: this involves solving a one-dimensional root-finding problem for λ at each descent.
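To make the root-finding step concrete, here is a rough sketch (plain JAX with simple bisection; an illustration of the idea, not the library's algorithm) of choosing λ so that the damped step has a given target size:

```python
import jax
import jax.numpy as jnp


def fn(y):  # hypothetical target function
    return jnp.sum(jnp.exp(y) - 2.0 * y)


y = jnp.array([0.3, -0.7])
grad = jax.grad(fn)(y)
hess = jax.hessian(fn)(y)


def step_for(lam):
    return -jnp.linalg.solve(hess + lam * jnp.eye(y.size), grad)


def find_lambda(target_norm, lo=0.0, hi=1e6, iters=50):
    # One-dimensional root find (bisection here) for the λ whose step has the
    # target size; ||step(λ)|| decreases as λ increases.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if jnp.linalg.norm(step_for(mid)) > target_norm:
            lo = mid  # step still too large: more damping needed
        else:
            hi = mid
    return hi


lam = find_lambda(target_norm=0.1)
diff = step_for(lam)
```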
optimistix.DoglegDescent (AbstractDescent)
The Dogleg descent step, which switches between the Cauchy and the Newton descent directions.
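As a rough sketch of the idea (plain JAX, illustration only; the radius and all names below are made up): follow the steepest-descent direction as far as the Cauchy point, then bend towards the Newton point, truncating wherever the path leaves a region of the given radius.

```python
import jax
import jax.numpy as jnp


def fn(y):  # hypothetical target function
    return jnp.sum(jnp.exp(y) - 2.0 * y)


y = jnp.array([0.3, -0.7])
grad = jax.grad(fn)(y)
hess = jax.hessian(fn)(y)

# Cauchy point: minimiser of the local quadratic model along -grad.
cauchy = -(jnp.dot(grad, grad) / jnp.dot(grad, hess @ grad)) * grad
# Newton point: minimiser of the local quadratic model itself.
newton = -jnp.linalg.solve(hess, grad)


def dogleg_diff(radius):
    if jnp.linalg.norm(newton) <= radius:
        return newton  # Newton point already lies inside the region
    if jnp.linalg.norm(cauchy) >= radius:
        return radius * cauchy / jnp.linalg.norm(cauchy)  # truncated Cauchy step
    # Otherwise walk from the Cauchy point towards the Newton point until the
    # path crosses the boundary (a scalar quadratic in the interpolation parameter t).
    d = newton - cauchy
    a = jnp.dot(d, d)
    b = 2.0 * jnp.dot(cauchy, d)
    c = jnp.dot(cauchy, cauchy) - radius**2
    t = (-b + jnp.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)
    return cauchy + t * d


diff = dogleg_diff(radius=0.25)
```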