UW¶
- class torchjd.scalarization.UW(shape)[source]¶
StatefulScalarizerthat combines the input tensor of values using learned per-task uncertainties.UWis short for Uncertainty Weighting, the method proposed in Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics.Each value \(L_i\) is assigned a learnable log-variance \(s_i\), and the values are combined as
\[\sum_i \left( \frac{1}{2} e^{-s_i} L_i + \frac{1}{2} s_i \right)\]where:
\(L_i\) is the \(i\)-th value (typically the loss of task \(i\));
\(s_i = \log \sigma_i^2\) is the learnable log-variance of task \(i\).
Following the paper, the log-variance \(s_i\) is learned rather than the variance \(\sigma_i^2\) directly: this is numerically more stable (the combination never divides by zero) and keeps \(s_i\) unconstrained, since \(e^{-s_i}\) is always positive. The \(s_i\) are stored as an
nn.Parameter, so the parameters of this scalarizer must be passed to the optimizer to be learned jointly with the model.- Parameters:
shape (
int|Sequence[int]) – The shape of the values to scalarize, used to create one log-variance per value. Anintnis interpreted as the shape(n,).
The following example shows train a model with Uncertainty Weighting, as described in the paper.
>>> import torch >>> from torch.nn import Linear >>> >>> from torchjd.scalarization import UW >>> >>> model = Linear(3, 2) >>> scalarizer = UW(2) # Move to the right device with e.g. UW(2).to(device="cuda") >>> optimizer = torch.optim.SGD([*model.parameters(), *scalarizer.parameters()], lr=0.1) >>> >>> features = torch.randn(8, 3) >>> # Compute some dummy losses just for the sake of the example >>> losses = model(features).pow(2).mean(dim=0) # One loss per output dimension. >>> loss = scalarizer(losses) >>> loss.backward() >>> optimizer.step()
Note
The log-variances are initialized to
0(i.e. \(\sigma_i^2 = 1\)), which gives uniform weights at the start of training. The paper reports that the result is robust to this initialization. (LibMTL initializes them to-0.5instead.)