
Descents¤


optimistix.AbstractDescent ¤

The abstract base class for descents. A descent consumes a scalar (e.g. a step size) and returns the diff to take at a point y, so that y + diff is the next iterate in a nonlinear optimisation problem.

See this documentation for more information.
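As a rough sketch of this contract (the names below are purely illustrative; the library's real interface is richer than a bare callable and also carries gradient/Hessian information through solver state):

```python
def optimisation_step(y, step_size, descent_diff):
    # `descent_diff` stands in for a descent: it consumes a scalar (the step
    # size) and returns the update `diff`; the next iterate is `y + diff`.
    return y + descent_diff(step_size)
```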


optimistix.SteepestDescent (AbstractDescent) ¤

The descent direction given by locally following the gradient.
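A minimal sketch of the update this corresponds to (function names here are illustrative, not part of the library's API):

```python
import jax.numpy as jnp

def steepest_descent_diff(grad, step_size):
    # Follow the negative gradient, scaled by the scalar step size.
    return -step_size * grad

grad = jnp.array([3.0, -1.0])
diff = steepest_descent_diff(grad, step_size=0.1)  # next iterate is y + diff
```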


optimistix.NonlinearCGDescent (AbstractDescent) ¤

The nonlinear conjugate gradient step.
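A sketch of the usual nonlinear CG update, here assuming a Polak--Ribière-style choice of the parameter β (several such formulas exist; all names below are illustrative, not the library's implementation):

```python
import jax.numpy as jnp

def polak_ribiere(grad, grad_prev):
    # One common choice of the conjugate gradient parameter β.
    beta = jnp.dot(grad, grad - grad_prev) / jnp.dot(grad_prev, grad_prev)
    return jnp.maximum(beta, 0.0)  # "PR+": fall back to steepest descent if negative

def nonlinear_cg_diff(grad, grad_prev, direction_prev, step_size):
    # The new search direction mixes the negative gradient with the previous
    # direction; a line search then supplies the scalar step size.
    beta = polak_ribiere(grad, grad_prev)
    return step_size * (-grad + beta * direction_prev)
```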


optimistix.NewtonDescent (AbstractDescent) ¤

Newton descent direction.

Given a quadratic bowl x -> x^T Hess x -- a local quadratic approximation to the target function -- this corresponds to moving in the direction of the bottom of the bowl. (Which is not the same as steepest descent.)

This is done by solving the linear system Hess diff = grad, i.e. computing Hess^{-1} grad.
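For a dense Hessian this amounts to the following sketch (illustrative names only; the library solves the system with its own linear solvers rather than a dense factorisation):

```python
import jax.numpy as jnp

def newton_diff(grad, hess):
    # Solve Hess @ diff = -grad: move to the bottom of the local quadratic model.
    return -jnp.linalg.solve(hess, grad)

hess = jnp.array([[4.0, 1.0], [1.0, 3.0]])
grad = jnp.array([1.0, 2.0])
diff = newton_diff(grad, hess)  # next iterate is y + diff
```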


optimistix.DampedNewtonDescent (AbstractDescent) ¤

The damped Newton (Levenberg--Marquardt) descent.

That is: gradient descent is about moving in the direction of -grad. (Quasi-)Newton descent is about moving in the direction of -Hess^{-1} grad. Damped Newton interpolates between these two regimes, by moving in the direction of -(Hess + λI)^{-1} grad.

The value λ is often referred to as the "Levenberg--Marquardt" parameter, and in this descent it is handled directly, as λ = 1/step_size. Larger step sizes correspond to Newton directions; smaller step sizes correspond to gradient directions. (Decreasing the step size additionally shrinks the step taken, hence the term "damping".) This is because a line search expects the step to be smaller as the step size decreases.
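A dense-matrix sketch of the damped step (illustrative names, not the library's implementation):

```python
import jax.numpy as jnp

def damped_newton_diff(grad, hess, step_size):
    # λ = 1 / step_size: a large step size recovers the Newton step, while a
    # small step size gives a short step close to the gradient direction.
    lam = 1.0 / step_size
    eye = jnp.eye(grad.shape[0])
    return -jnp.linalg.solve(hess + lam * eye, grad)
```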


optimistix.IndirectDampedNewtonDescent (AbstractDescent) ¤

The indirect damped Newton (Levenberg--Marquardt) trust-region descent.

If the above line just looks like technical word soup, here's what's going on:

Gradient descent is about moving in the direction of -grad. (Quasi-)Newton descent is about moving in the direction of -Hess^{-1} grad. Damped Newton interpolates between these two regimes, by moving in the direction of -(Hess + λI)^{-1} grad.

This can be derived as the dual problem of a trust-region method; see Conn, Gould and Toint, "Trust-Region Methods", section 7.3, where λ is interpreted as a Lagrange multiplier. It involves solving a one-dimensional root-finding problem for λ at each descent step.
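A sketch of the idea, assuming a positive definite Hessian and using a simple bisection in place of a proper root finder (all names illustrative): find λ so that the damped step just fits within a trust-region radius.

```python
import jax.numpy as jnp

def indirect_damped_newton_diff(grad, hess, trust_radius, bisection_steps=30):
    eye = jnp.eye(grad.shape[0])

    def step_for(lam):
        return -jnp.linalg.solve(hess + lam * eye, grad)

    if jnp.linalg.norm(step_for(0.0)) <= trust_radius:
        return step_for(0.0)  # The undamped Newton step already fits.
    lo, hi = 0.0, 1.0
    while jnp.linalg.norm(step_for(hi)) > trust_radius:
        hi = 2.0 * hi  # Increase damping until the step fits inside the region.
    for _ in range(bisection_steps):
        mid = 0.5 * (lo + hi)
        if jnp.linalg.norm(step_for(mid)) > trust_radius:
            lo = mid
        else:
            hi = mid
    # λ ≈ hi solves ||(Hess + λI)^{-1} grad|| = trust_radius, up to bisection error.
    return step_for(hi)
```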


optimistix.DoglegDescent (AbstractDescent) ¤

The Dogleg descent step, which switches between the Cauchy and the Newton descent directions.
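A sketch of the classic dogleg path, assuming a positive definite Hessian and a scalar trust-region radius standing in for the step size (illustrative names, not the library's implementation):

```python
import jax.numpy as jnp

def dogleg_diff(grad, hess, trust_radius):
    newton = -jnp.linalg.solve(hess, grad)
    if jnp.linalg.norm(newton) <= trust_radius:
        return newton  # The full Newton step fits inside the trust region.
    # Cauchy point: minimiser of the quadratic model along the steepest descent direction.
    cauchy = -(jnp.dot(grad, grad) / jnp.dot(grad, hess @ grad)) * grad
    if jnp.linalg.norm(cauchy) >= trust_radius:
        # Even the Cauchy step leaves the region: truncate it at the boundary.
        return trust_radius * cauchy / jnp.linalg.norm(cauchy)
    # Otherwise walk from the Cauchy point towards the Newton point, stopping
    # where the piecewise-linear path crosses the trust-region boundary.
    d = newton - cauchy
    a = jnp.dot(d, d)
    b = 2.0 * jnp.dot(cauchy, d)
    c = jnp.dot(cauchy, cauchy) - trust_radius**2
    t = (-b + jnp.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)
    return cauchy + t * d
```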