mtl_backward

torchjd.autojac.mtl_backward.mtl_backward(losses, features, tasks_params, shared_params, A, retain_graph=False, parallel_chunk_size=None)

In the context of Multi-Task Learning (MTL), we often have a shared feature extractor followed by several task-specific heads. A loss can then be computed for each task.

This function computes the gradient of each task-specific loss with respect to its task-specific parameters and accumulates it in their .grad fields. It then computes the Jacobian of all losses with respect to the shared parameters, aggregates this Jacobian into a single vector using A, and accumulates the result in the .grad fields of the shared parameters.

Parameters:
  • losses (Sequence[Tensor]) – The task losses. The Jacobian matrix will have one row per loss.

  • features (Union[Sequence[Tensor], Tensor]) – The last shared representation used for all tasks, as given by the feature extractor. Should be non-empty.

  • tasks_params (Sequence[Iterable[Tensor]]) – The parameters of each task-specific head. Their requires_grad flags must be set to True.

  • shared_params (Iterable[Tensor]) – The parameters of the shared feature extractor. The Jacobian matrix will have one column for each value in these tensors. Their requires_grad flags must be set to True.

  • A (Aggregator) – Aggregator used to reduce the Jacobian into a vector.

  • retain_graph (bool) – If False, the graph used to compute the gradients will be freed. Defaults to False.

  • parallel_chunk_size (int | None) – The number of scalars to differentiate simultaneously in the backward pass. If set to None, all coordinates of all tensors are differentiated in parallel at once. If set to 1, all coordinates are differentiated sequentially. A larger value results in faster differentiation, but also in higher memory usage. If parallel_chunk_size is not large enough to differentiate all tensors simultaneously, retain_graph has to be set to True. Defaults to None.
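
The interaction between parallel_chunk_size and retain_graph can be pictured with a small counting sketch. It is only an illustration of the rule stated above, not TorchJD code: the helper names num_backward_passes and requires_retain_graph are hypothetical, and num_scalars stands for the total number of scalars to differentiate.

```python
import math
from typing import Optional


def num_backward_passes(num_scalars: int, parallel_chunk_size: Optional[int] = None) -> int:
    """Count the passes needed to differentiate num_scalars scalars in chunks.

    Hypothetical helper for illustration only; it is not part of TorchJD.
    """
    if parallel_chunk_size is None:
        # Everything is differentiated in parallel, in a single pass.
        return 1
    if parallel_chunk_size < 1:
        raise ValueError("parallel_chunk_size must be a positive integer or None")
    return math.ceil(num_scalars / parallel_chunk_size)


def requires_retain_graph(num_scalars: int, parallel_chunk_size: Optional[int] = None) -> bool:
    """More than one pass means the graph must survive the first pass."""
    return num_backward_passes(num_scalars, parallel_chunk_size) > 1


print(num_backward_passes(10, None))  # 1
print(num_backward_passes(10, 4))     # 3
print(requires_retain_graph(10, 4))   # True
print(requires_retain_graph(10, 16))  # False
```

Under this reading, any chunk size smaller than the number of scalars leads to several passes over the same graph, which is why retain_graph has to be True in that case.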

Return type:

None

Example

A usage example of mtl_backward is provided in Multi-Task Learning (MTL).
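
To make the two steps of the description concrete, the following is a minimal sketch written in plain PyTorch rather than with mtl_backward itself. It is not TorchJD's implementation: the toy model, the tensor sizes, and the use of a simple row-wise mean in place of an Aggregator are all assumptions made for illustration, and the sketch ignores the features and parallel_chunk_size arguments.

```python
import torch
from torch.nn import Linear, MSELoss, ReLU, Sequential

torch.manual_seed(0)

# Hypothetical toy setup: one shared feature extractor and two task heads.
shared = Sequential(Linear(10, 5), ReLU())
heads = [Linear(5, 1), Linear(5, 1)]

inputs = torch.randn(16, 10)
targets = [torch.randn(16, 1), torch.randn(16, 1)]

features = shared(inputs)
loss_fn = MSELoss()
losses = [loss_fn(head(features), target) for head, target in zip(heads, targets)]

shared_params = list(shared.parameters())

# Step 1: gradient of each loss w.r.t. its own head, accumulated in .grad.
for loss, head in zip(losses, heads):
    head_params = list(head.parameters())
    grads = torch.autograd.grad(loss, head_params, retain_graph=True)
    for param, grad in zip(head_params, grads):
        param.grad = grad if param.grad is None else param.grad + grad

# Step 2: Jacobian of all losses w.r.t. the shared parameters.
# One row per loss, one column per scalar value in the shared parameters.
rows = []
for loss in losses:
    grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
    rows.append(torch.cat([grad.reshape(-1) for grad in grads]))
jacobian = torch.stack(rows)

# Stand-in for the Aggregator A: a plain mean over the rows (an assumption).
aggregated = jacobian.mean(dim=0)

# Split the aggregated vector back into parameter shapes and accumulate it.
offset = 0
for param in shared_params:
    count = param.numel()
    grad = aggregated[offset:offset + count].reshape(param.shape).clone()
    param.grad = grad if param.grad is None else param.grad + grad
    offset += count
```

After this runs, every head parameter holds the gradient of its own task loss, and every shared parameter holds its slice of the aggregated vector. Replacing the mean with a different aggregation rule changes only the single line that reduces the Jacobian.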