# Predictors for Diagonally Implicit Runge--Kutta (DIRK) methods

## Linear combination predictors

Each stage of a diagonally implicit RK method involves solving (via Newton iterations) a nonlinear system

$$f_i = f\Bigl(y_0 + \Delta t \sum_{j=1}^i a_{ij} f_j\Bigr)$$

for $$i = 1, \ldots, s$$.
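To fix notation, here is a minimal sketch of a single DIRK step in Python. Everything here is illustrative: `a` and `b` are the tableau arrays, `scipy.optimize.fsolve` stands in for a proper Newton solver, and `predict(i, fs)` is the predictor that this post is about.

```python
from scipy.optimize import fsolve

def dirk_step(f, y0, dt, a, b, predict):
    """One step of a DIRK method with lower-triangular tableau `a` (0-based indexing)."""
    fs = []  # stage evaluations f_1, ..., f_s, filled in as we go
    for i in range(len(b)):
        # Contributions from the already-computed stages j < i are known...
        known = y0 + dt * sum(a[i][j] * fs[j] for j in range(i))
        # ...so each stage is a nonlinear solve for f_i alone:
        #     f_i = f(known + dt * a_ii * f_i).
        residual = lambda fi: fi - f(known + dt * a[i][i] * fi)
        # `predict` must also handle the first stage (i == 0); see below.
        fs.append(fsolve(residual, x0=predict(i, fs)))
    return y0 + dt * sum(bi * fi for bi, fi in zip(b, fs))
```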

How do we initialise our guess for $$f_i$$? Given the information available at each stage, a natural choice is to take some linear combination of the $$f_j$$ we have already computed. (This feels a bit like using an explicit RK (ERK) method, although we don't use $$y_0$$ and don't make any extra function evaluations.)

Making a good choice of predictor is important, and can have a strong impact on whether the Newton solver converges. One anecdotal example: for a particular stiff ODE, naively predicting $$f_i = 0$$ at every stage meant the Kvaerno5 solver could solve only about 34% of initial conditions. (Otherwise, failures of the Newton solver, even with adaptively chosen small timesteps, meant that the ODE could not be resolved in a timely manner.) Switching to a linear-combination predictor increased this dramatically, to 98% of initial conditions studied.

What is an appropriate choice of linear combination? This seems to have been frequently left out of the literature, e.g.:

• Kvaerno's paper presents only the implicit tableaus.
• Hairer and Wanner seem to discuss predictors only for fully implicit RK (FIRK) methods (by polynomially interpolating the stages of the previous step).
• Mistakes are made in practice, e.g. OrdinaryDiffEq.jl uses zero as the predictor for the second stage of an ESDIRK method, when it would make sense to at least use the result of the first stage.

As a result, an additional part of each DIRK method's Butcher tableau is some collection of values $$\alpha_{ij}$$ for $$i > j$$. At each stage we will initialise

$$f_i = \sum_{j=1}^{i-1} \alpha_{ij} f_j$$

as our prediction for the nonlinear system. Looking at this, it is natural to require that

$$1 = \sum_{j=1}^{i-1} \alpha_{ij}$$.

[The value $$f_i$$ is likely close to each of the values $$f_j$$; we have no reason to suppose that it is, e.g., half their size. Equivalently: with weights summing to one, the prediction is exact whenever $$f$$ is constant.]
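In code, the predictor is then just a weighted sum (a minimal sketch; the `alpha` table is whatever collection of values we attach to the method):

```python
def linear_combination_predictor(i, fs, alpha):
    """Initial guess for f_i, given the already-computed f_1, ..., f_{i-1}.

    Uses 0-based indexing: `fs` holds the earlier stages and `alpha[i]` their weights.
    (The first stage, i == 0, has no earlier stages; it is handled separately, see below.)
    """
    assert abs(sum(alpha[i][:i]) - 1.0) < 1e-10  # the sum-to-1 property
    return sum(alpha[i][j] * fs[j] for j in range(i))
```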

## Implications

Now, this implies that the choice of $$\alpha_{ij}$$ is actually quite constrained.

### First stage

First, note that the entire discussion above only applies to the second stage and beyond. For the first stage, we don't have access to any $$f_j$$.

• If the method satisfies the FSAL property then we don't care: we already have $$f_1 = f(y_0)$$ because we got it from the previous step.
• If the method has an explicit first stage (as with ESDIRK methods) then we also don't care: we're just going to compute that stage explicitly anyway.
• If the method has an implicit first stage (and necessarily fails the FSAL property) then we're out of luck. Bite the bullet, make an extra function evaluation, and evaluate $$f(y_0)$$ as an estimate for $$f_1$$. [Although perhaps something could be cooked up using the evaluations of the previous step?]

### Second stage

For the second stage: our sum-to-1 property means that we necessarily have $$1 = \alpha_{21}$$. [This was the mistake made in OrdinaryDiffEq.jl, above: they just take this value to be zero.]

### Third stage

For the third stage: the sum-to-1 property means that we have $$1 = \alpha_{31} + \alpha_{32}$$. Now we only have two values $$f_1$$ and $$f_2$$ available to us, so there's not a lot we can do other than assume that $$t \mapsto f(y(t))$$ is locally linear, and extrapolate. Letting $$F(t) = \zeta t + \eta$$ be this local linear approximation, we have $$F(c_1 \Delta t) = f_1$$ and $$F(c_2 \Delta t) = f_2$$, giving

$$F(c_3 \Delta t) = f_2 + (f_2 - f_1) (c_3 - c_2) / (c_2 - c_1)$$

and thus $$\alpha_{31} = (c_2 - c_3) / (c_2 - c_1)$$ and $$\alpha_{32} = (c_3 - c_1) / (c_2 - c_1)$$.

These may be negative: for example, given an explicit first stage ($$c_1 = 0$$) and increasing stage times ($$c_3 > c_2$$), we necessarily have $$\alpha_{31} < 0$$.
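A quick numerical check of these formulas, with made-up stage times $$c = (0, 1/2, 1)$$ (so an explicit first stage, and indeed $$\alpha_{31} < 0$$):

```python
c1, c2, c3 = 0.0, 0.5, 1.0
alpha31 = (c2 - c3) / (c2 - c1)  # = -1.0
alpha32 = (c3 - c1) / (c2 - c1)  # =  2.0
assert alpha31 + alpha32 == 1.0

# The prediction is exact for any linear F(t) = zeta * t + eta:
dt, zeta, eta = 0.1, 3.0, 7.0
F = lambda t: zeta * t + eta
f1, f2 = F(c1 * dt), F(c2 * dt)
assert abs(alpha31 * f1 + alpha32 * f2 - F(c3 * dt)) < 1e-12
```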

### Higher stages

For higher stages: now things start to get a bit more interesting / unexplored. One reasonable choice is to fit a higher-order polynomial to $$t \mapsto f(y(t))$$. Another reasonable choice is to keep using a linear approximation, using your two favourite stages from before. A final choice is to take $$\alpha_{ij} = a_{(i-1)j}$$, although this is only appropriate for those stages for which $$1 = \sum_{j=1}^{i-1} a_{(i-1)j}$$; this is typically true of the last few stages (which often have $$\sum_{j=1}^i a_{ij} = c_i = 1$$).
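For the higher-order-polynomial option, one concrete recipe (a sketch of the obvious generalisation of the third-stage derivation, not an established scheme): interpolate $$(c_j \Delta t, f_j)$$ for $$j < i$$ and evaluate the interpolant at $$c_i \Delta t$$. The weights $$\alpha_{ij}$$ are then the Lagrange basis polynomials evaluated at $$c_i$$, which automatically sum to one. The hypothetical helper below assumes the nodes $$c_j$$ are distinct.

```python
import numpy as np

def lagrange_alphas(cs, ci):
    """Predictor weights alpha_{ij} = l_j(c_i) for distinct nodes cs = (c_1, ..., c_{i-1})."""
    alphas = []
    for j, cj in enumerate(cs):
        others = [ck for k, ck in enumerate(cs) if k != j]
        alphas.append(np.prod([(ci - ck) / (cj - ck) for ck in others]))
    return alphas

# With two nodes this recovers the third-stage formulas above:
assert np.allclose(lagrange_alphas([0.0, 0.5], 1.0), [-1.0, 2.0])
```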

At time of writing (and to the best of my knowledge), the best choice for higher stages isn't really known.

• For OrdinaryDiffEq.jl's higher-order interpolations, it's not 100% clear what they're doing. They appear to be extrapolating the solution $$y$$ rather than the vector field $$f$$, which is a bit confusing. (Their implementation still solves the nonlinear system for $$f_i = f(y_i)$$, rather than the equivalent nonlinear system for $$y_i$$.) Should follow up on this.