backward

- torchjd.autojac.backward(tensors, jac_tensors=None, inputs=None, retain_graph=False, parallel_chunk_size=None)
Computes the Jacobians of tensors with respect to inputs, left-multiplied by jac_tensors (or by the identity if jac_tensors is None), and accumulates the results in the .jac fields of the inputs.

- Parameters:
  - tensors (Sequence[Tensor] | Tensor) – The tensor or tensors to differentiate. Should be non-empty.
  - jac_tensors (Sequence[Tensor] | Tensor | None) – The initial Jacobians to backpropagate, analogous to the grad_tensors parameter of torch.autograd.backward(). If provided, it must have the same structure as tensors, and each tensor in jac_tensors must match the shape of the corresponding tensor in tensors, with an extra leading dimension representing the number of rows of the resulting Jacobian (e.g. the number of losses). All tensors in jac_tensors must have the same first dimension. If None, defaults to the identity matrix; in this case, the standard Jacobian of tensors is computed, with one row for each value in tensors.
  - inputs (Iterable[Tensor] | None) – The tensors with respect to which the Jacobians must be computed. These must have their requires_grad flag set to True. If not provided, defaults to the leaf tensors that were used to compute the tensors parameter.
  - retain_graph (bool) – If False, the graph used to compute the grads will be freed. Defaults to False.
  - parallel_chunk_size (int | None) – The number of scalars to differentiate simultaneously in the backward pass. If set to None, all coordinates of tensors will be differentiated in parallel at once. If set to 1, all coordinates will be differentiated sequentially. A larger value results in faster differentiation, but also higher memory usage. Defaults to None.
- Return type: None
Example
This example shows a simple usage of backward.

>>> import torch
>>>
>>> from torchjd.autojac import backward
>>>
>>> param = torch.tensor([1., 2.], requires_grad=True)
>>> # Compute arbitrary quantities that are functions of param
>>> y1 = torch.tensor([-1., 1.]) @ param
>>> y2 = (param ** 2).sum()
>>>
>>> backward([y1, y2])
>>>
>>> param.jac
tensor([[-1.,  1.],
        [ 2.,  4.]])
The .jac field of param now contains the Jacobian of \(\begin{bmatrix}y_1 \\ y_2\end{bmatrix}\) with respect to param.
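When only some of the leaf tensors should receive a .jac field, the inputs parameter can be passed explicitly. The following is a minimal sketch based only on the parameter description above; the tensors a and b are illustrative names, not part of the original documentation.

>>> import torch
>>>
>>> from torchjd.autojac import backward
>>>
>>> a = torch.tensor([1., 2.], requires_grad=True)
>>> b = torch.tensor([3., 4.], requires_grad=True)
>>> y1 = a @ b
>>> y2 = (a ** 2).sum()
>>>
>>> backward([y1, y2], inputs=[a])  # restrict differentiation to a; b gets no .jac field
>>>
>>> a.jac
tensor([[3., 4.],
        [2., 4.]])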
Example

This is the same example as before, except that we explicitly specify jac_tensors as the rows of the identity matrix (which is equivalent to using the default None).

>>> import torch
>>>
>>> from torchjd.autojac import backward
>>>
>>> param = torch.tensor([1., 2.], requires_grad=True)
>>> # Compute arbitrary quantities that are functions of param
>>> y1 = torch.tensor([-1., 1.]) @ param
>>> y2 = (param ** 2).sum()
>>>
>>> J1 = torch.tensor([1.0, 0.0])
>>> J2 = torch.tensor([0.0, 1.0])
>>>
>>> backward([y1, y2], jac_tensors=[J1, J2])
>>>
>>> param.jac
tensor([[-1.,  1.],
        [ 2.,  4.]])
Instead of using the identity jac_tensors, you can backpropagate some Jacobians obtained by a call to torchjd.autojac.jac() on a later part of the computation graph, as sketched below.
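Below is a minimal sketch of this pattern. To avoid assuming the exact signature of torchjd.autojac.jac(), the Jacobian of the later outputs with respect to the intermediate tensor is computed here with plain torch.autograd.grad; the names param, z, y1 and y2 are illustrative.

>>> import torch
>>>
>>> from torchjd.autojac import backward
>>>
>>> param = torch.tensor([1., 2.], requires_grad=True)
>>> z = 3. * param                    # earlier part of the computation graph
>>> y1 = torch.tensor([-1., 1.]) @ z  # later part of the computation graph
>>> y2 = (z ** 2).sum()
>>>
>>> # Jacobian of [y1, y2] with respect to z (the quantity that a call to
>>> # torchjd.autojac.jac() on the later part of the graph would provide)
>>> rows = [torch.autograd.grad(y, z, retain_graph=True)[0] for y in (y1, y2)]
>>> J = torch.stack(rows)  # shape (2, 2): one row per output, one column per coordinate of z
>>>
>>> backward(z, jac_tensors=J)
>>>
>>> param.jac
tensor([[-3.,  3.],
        [18., 36.]])

By the chain rule, param.jac is the same Jacobian of \(\begin{bmatrix}y_1 \\ y_2\end{bmatrix}\) with respect to param that a direct call to backward([y1, y2]) would have produced.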
Warning

To differentiate in parallel, backward relies on torch.vmap, which has some limitations: for instance, it does not work on the output of compiled functions, when some tensors have retains_grad=True, or when using an RNN on CUDA. If you experience issues with backward, try using parallel_chunk_size=1 to avoid relying on torch.vmap.
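As a sketch, the first example above can be made to differentiate sequentially, avoiding torch.vmap, simply by passing the documented parallel_chunk_size argument:

>>> import torch
>>>
>>> from torchjd.autojac import backward
>>>
>>> param = torch.tensor([1., 2.], requires_grad=True)
>>> y1 = torch.tensor([-1., 1.]) @ param
>>> y2 = (param ** 2).sum()
>>>
>>> backward([y1, y2], parallel_chunk_size=1)  # differentiate one scalar at a time
>>>
>>> param.jac
tensor([[-1.,  1.],
        [ 2.,  4.]])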