# Higher-order commutative noise produces commutative noise during BacksolveAdjoint¤

## Statement¤

Consider solving the Ito SDE

$$\mathrm{d}y(t) = μ(t, y(t))\mathrm{d}t + σ(t, y(t)) \mathrm{d}w(t)$$

or the Stratonovich SDE

$$\mathrm{d}y(t) = μ(t, y(t))\mathrm{d}t + σ(t, y(t))\circ\mathrm{d}w(t)$$

in either case assumed to satisfy the commutativity condition

$$σ_{i\, j_2} \frac{\partial σ_{k\, j_1}}{\partial y_i} = σ_{i\, j_1} \frac{\partial σ_{k\, j_2}}{\partial y_i}$$.

Then the backward pass solved during diffrax.BacksolveAdjoint will also satisfy the commutativity condition if and only if the following higher-order commutativity condition is satisfied.

$$σ_{i\, j_2} \frac{\partial^2 σ_{k\, j_1}}{\partial y_i \partial y_m} = σ_{i\, j_1} \frac{\partial^2 σ_{k\, j_2}}{\partial y_i \partial y_m}$$

Note

The commutativity condition is a common prerequisite for solving an SDE with a higher-order solver.

Note

The higher-order commutativity condition is satisfied by all the dominant subclasses of commutative noise: additive noise, diagonal noise, scalar noise. It is also satisfied by noise that is affine in the state $$y$$. But it is not obviously satisfied by commutative noise in general?

As far as I know the higher-order commutativity condition is new here.

## Proof¤

Without loss of generality we consider specifically the reverse-time adjoint SDE (formally justified using rough path theory, see [1, Appendix C.3.3])

$$\mathrm{d}a_i(t) = -a_j(t) \frac{\partial μ_j}{\partial y_i}(t, y(t))\mathrm{d}t - a_j(t) \frac{\partial σ_{j\, k}}{\partial y_i}(t, y(t)) \circ \mathrm{d}w_k.$$

This is without loss of generality as:

• If the SDE is Ito then we convert it to Stratonovich; this incurs a correction term in the drift but does not affect the diffusion, and it is only the diffusion we are interested in.
• We do not consider the derivatives with respect to any parameters $$θ$$ as these may be treated as derivatives with respect to $$y(0)$$ in the usual way.
• We do not consider solving the original SDE for $$y$$ backwards-in-time. In isolation then by assumption this already has commutative noise. Then, taking any individual path $$y$$, we may treat the reverse-time adjoint SDE in isolation. (Note that the coupling between $$y$$ and $$w$$ is irrelevant: by rough path theory we may place ourselves in the deterministic setting.)

Let $$Σ_{i\, k}(t, a) = -a_j \frac{\partial σ_{j\, k}}{\partial y_i}(t, y(t))$$.

Then

$$Σ_{i\, j_2} \frac{\partial Σ_{k\, j_1}}{\partial a_i} = a_m \frac{\partial σ_{m\, j_2}}{\partial y_i} δ_{i\, n} \frac{\partial σ_{n\, j_1}}{\partial y_k} = a_m \frac{\partial σ_{m\, j_2}}{\partial y_i} \frac{\partial σ_{i\, j_1}}{\partial y_k}$$

Now differentiate the commutativity condition for $$σ$$, with respect to $$y_m$$, to obtain

$$\frac{\partial σ_{i\, j_2}}{\partial y_m} \frac{\partial σ_{k\, j_1}}{\partial y_i} + σ_{i\, j_2} \frac{\partial^2 σ_{k\, j_1}}{\partial y_i \partial y_m} = \frac{\partial σ_{i\, j_1}}{\partial y_m} \frac{\partial σ_{k\, j_2}}{\partial y_i} + σ_{i\, j_1} \frac{\partial^2 σ_{k\, j_2}}{\partial y_i \partial y_m}$$

which may be substituted into the previous equation to obtain

$$a_m \frac{\partial σ_{m\, j_1}}{\partial y_i} \frac{\partial σ_{i j_2}}{\partial y_k} + a_m \left[ σ_{i\, j_2} \frac{\partial^2 σ_{m\, j_1}}{\partial y_i \partial y_k} - σ_{i\, j_1}\frac{\partial^2 σ_{m\, j_2}}{\partial y_i \partial y_k}\right]$$

We recognise the first term as the desired commutativity relation for $$Σ$$; that is we will satisfy the commutativity relation if and only if the second term is zero. Now $$a_m$$ is arbitrary so by taking it to equal every basis vector in turn, we find that the the higher-order commmutativty condition for $$σ$$ is precisely the condition needed for $$Σ$$ to satisfy the commutativity condition.

## References¤

[1] Kidger, On Neural Differential Equations, PhD Thesis, University of Oxford, 2021