Recurrent Neural Network (RNN)
==============================

When training recurrent neural networks for sequence modelling, we can easily obtain one loss per
element of the output sequences. If the gradients of these losses are likely to conflict, Jacobian
descent can be leveraged to enhance optimization.

.. code-block:: python
    :emphasize-lines: 5-6, 10, 17, 20

    import torch
    from torch.nn import RNN
    from torch.optim import SGD

    from torchjd.aggregation import UPGrad
    from torchjd.autojac import backward

    rnn = RNN(input_size=10, hidden_size=20, num_layers=2)
    optimizer = SGD(rnn.parameters(), lr=0.1)
    aggregator = UPGrad()

    inputs = torch.randn(8, 5, 3, 10)  # 8 batches of 3 sequences of length 5 and of dim 10.
    targets = torch.randn(8, 5, 3, 20)  # 8 batches of 3 sequences of length 5 and of dim 20.

    for input, target in zip(inputs, targets):
        output, _ = rnn(input)  # output is of shape [5, 3, 20].
        losses = ((output - target) ** 2).mean(dim=[1, 2])  # 1 loss per sequence element.

        optimizer.zero_grad()
        backward(losses, aggregator, parallel_chunk_size=1)
        optimizer.step()

.. note::
    At the time of writing, there seems to be an incompatibility between ``torch.vmap`` and
    ``torch.nn.RNN`` when running on CUDA (see `this issue `_ for more info), so we advise setting
    ``parallel_chunk_size`` to ``1`` to avoid using ``torch.vmap``. To improve performance, you can
    check whether ``parallel_chunk_size=None`` (maximal parallelization) works on your side.
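
To decide between the two settings on your hardware, a small probe such as the one below can help.
This is only a sketch: it reuses the ``rnn``, ``optimizer``, ``aggregator``, ``inputs`` and
``targets`` defined above, and the helper ``vmap_backward_works`` is a hypothetical name, not part
of the torchjd API.

.. code-block:: python

    def vmap_backward_works() -> bool:
        """Return True if the torch.vmap path (parallel_chunk_size=None) runs without error."""
        input, target = inputs[0], targets[0]
        output, _ = rnn(input)
        losses = ((output - target) ** 2).mean(dim=[1, 2])
        optimizer.zero_grad()
        try:
            # Try maximal parallelization of the differentiation of the losses.
            backward(losses, aggregator, parallel_chunk_size=None)
            return True
        except RuntimeError:
            # Typically raised on CUDA due to the torch.vmap / torch.nn.RNN incompatibility.
            return False

    parallel_chunk_size = None if vmap_backward_works() else 1
    optimizer.zero_grad()  # Discard the gradients accumulated by the probe.

The resulting ``parallel_chunk_size`` value can then be passed to ``backward`` in the training loop
above.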