jac_to_grad¶
- torchjd.autojac.jac_to_grad(tensors, /, aggregator, *, retain_jac=False, optimize_gramian_computation=False)[source]¶
Aggregates the Jacobians stored in the .jac fields of tensors and accumulates the result into their .grad fields.

- Parameters:
  - tensors (Iterable[Tensor]) – The tensors whose .jac fields should be aggregated. All Jacobians must have the same first dimension (e.g. the number of losses).
  - aggregator (Aggregator) – The aggregator used to reduce the Jacobians into gradients. If it uses a Weighting to combine the rows of the Jacobians, jac_to_grad will also return the computed weights.
  - retain_jac (bool) – Whether to preserve the .jac fields of the tensors after they have been used. Defaults to False.
  - optimize_gramian_computation (bool) – When the aggregator computes weights based on the Gramian of the Jacobian, it is possible to skip the concatenation of the Jacobians and instead compute the Gramian as the sum of the Gramians of the individual Jacobians. This saves memory (up to 50%) but can be slightly slower (up to 15%) on CUDA. We advise trying this optimization if memory is an issue for you. Defaults to False.
- Return type:
  Tensor | None – The weights computed by the aggregator, if it uses a Weighting; None otherwise.
Note
When optimize_gramian_computation=False, this function starts by "flattening" the .jac fields into matrices (i.e. flattening all of their dimensions except the first one), then concatenates those matrices into a combined Jacobian matrix. The aggregator is then applied to this matrix, which returns a combined gradient vector that is split and reshaped to fit into the .grad fields of the tensors.

Note
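The flatten–concatenate–aggregate–split pipeline described above can be sketched in plain PyTorch. This is an illustrative sketch, not torchjd's actual internals: the function name and the fixed weights are assumptions, standing in for what an aggregator's weighting would produce.

```python
import torch

def aggregate_jacobians(jacs, weights):
    # jacs: list of per-tensor Jacobians, each of shape (m, *param_shape),
    # all sharing the same first dimension m (e.g. the number of losses).
    m = jacs[0].shape[0]
    flat = [j.reshape(m, -1) for j in jacs]  # flatten all dims except the first
    combined = torch.cat(flat, dim=1)        # combined Jacobian matrix (m, total)
    grad_vec = weights @ combined            # combined gradient vector
    # Split the vector back per tensor and reshape to each parameter's shape,
    # as would be stored in the .grad fields.
    sizes = [f.shape[1] for f in flat]
    chunks = torch.split(grad_vec, sizes)
    return [c.reshape(j.shape[1:]) for c, j in zip(chunks, jacs)]

# Jacobian of [y1, y2] with respect to a 2-element parameter (see the example below).
jacs = [torch.tensor([[-1.0, 1.0], [2.0, 4.0]])]
weights = torch.tensor([0.5, 0.5])  # hypothetical weights from a weighting aggregator
grads = aggregate_jacobians(jacs, weights)
print(grads[0])  # tensor([0.5000, 2.5000])
```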
When optimize_gramian_computation=True, this function computes and sums the Gramian of each individual .jac field, iteratively. The inner weighting of the aggregator is then used to extract weights from the obtained Gramian, which are used to compute a linear combination of the rows of each .jac field, to be stored in the corresponding .grad field. This is mathematically equivalent to the approach with optimize_gramian_computation=False, but saves memory by never having to hold the concatenated Jacobian matrix in memory.

Example
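The equivalence claimed above follows from block matrix multiplication: for a concatenated Jacobian [J1 J2], the Gramian is [J1 J2][J1 J2]ᵀ = J1 J1ᵀ + J2 J2ᵀ. A short numerical check (illustrative only, using random stand-ins for the flattened .jac fields):

```python
import torch

j1 = torch.randn(3, 5)  # flattened Jacobian of one tensor (3 losses, 5 params)
j2 = torch.randn(3, 7)  # flattened Jacobian of another tensor (3 losses, 7 params)

# Gramian via explicit concatenation: materializes the full (3, 12) matrix.
combined = torch.cat([j1, j2], dim=1)
gramian_concat = combined @ combined.T

# Gramian as a sum of per-tensor Gramians: never forms the concatenation.
gramian_sum = j1 @ j1.T + j2 @ j2.T

assert torch.allclose(gramian_concat, gramian_sum)
```

Both paths yield the same (m, m) Gramian, so the weights extracted from it, and hence the resulting .grad fields, are identical; only the peak memory differs.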
This example shows how to use jac_to_grad after a call to backward.

>>> import torch
>>>
>>> from torchjd.aggregation import UPGrad
>>> from torchjd.autojac import backward, jac_to_grad
>>>
>>> param = torch.tensor([1., 2.], requires_grad=True)
>>> # Compute arbitrary quantities that are functions of param
>>> y1 = torch.tensor([-1., 1.]) @ param
>>> y2 = (param ** 2).sum()
>>>
>>> backward([y1, y2])  # param now has a .jac field
>>> weights = jac_to_grad([param], UPGrad())  # param now has a .grad field
>>> param.grad
tensor([0.5000, 2.5000])
>>> weights
tensor([0.5000, 0.5000])
The .grad field of param now contains the aggregation (by UPGrad) of the Jacobian of \(\begin{bmatrix}y_1 \\ y_2\end{bmatrix}\) with respect to param. In this case, the weights used to combine the Jacobian are equal because there was no conflict.