DLR Loss and Zero Gradient #108

@SerezD

Description

Hi, I am trying to understand how the DLR loss works; in particular, I am running some tests with batch size = 1.

The DLR loss defined in https://github.com/fra31/auto-attack/blob/master/autoattack/autopgd_base.py is as follows:

def dlr_loss(self, x, y):
    x_sorted, ind_sorted = x.sort(dim=1)    # sort logits in ascending order
    ind = (ind_sorted[:, -1] == y).float()  # 1. where the top-1 prediction equals the GT label
    u = torch.arange(x.shape[0])

    return -(x[u, y] - x_sorted[:, -2] * ind - x_sorted[:, -1] * (1. - ind)) / (x_sorted[:, -1] - x_sorted[:, -3] + 1e-12)

When using batch size 1, x holds the logits of the sample (shape (1, N classes)) while y is the ground-truth label, of shape (1,).

Now, let's suppose the logits (x) are something like the following (N=4 classes):
[[3, 2, 5, 1]]
and that GT label (y) = 1
then:
x_sorted = [1, 2, 3, 5]
ind_sorted = [3, 1, 0, 2]
ind = 0.

which leads to:
x[u, y] = 2
x_sorted[:,-2] * ind = 3 * 0 = 0
x_sorted[:,-1] * (1 - ind) = 5
x_sorted[:, -1] = 5
x_sorted[:, -3] = 2

This causes the loss to be:
-(2 - 0 - 5) / (5 - 2 + 1e-12) = 3 / 3 = 1.

Which, in turn, causes the gradient to be zero, which may lead to unreliable behavior. However, the check_zero_gradients method is called only at lines 285-287, and not in the main loop (e.g. at line 380).
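For concreteness, this can be reproduced standalone (a sketch using a plain function in place of the class method, with the example values from above):

```python
import torch

def dlr_loss(x, y):
    # Standalone copy of the DLR loss above (no self).
    x_sorted, ind_sorted = x.sort(dim=1)
    ind = (ind_sorted[:, -1] == y).float()
    u = torch.arange(x.shape[0])
    return -(x[u, y] - x_sorted[:, -2] * ind - x_sorted[:, -1] * (1. - ind)) / (
        x_sorted[:, -1] - x_sorted[:, -3] + 1e-12)

# The example from above: the GT label (index 1) is the third-largest logit.
x = torch.tensor([[3., 2., 5., 1.]], requires_grad=True)
y = torch.tensor([1])

loss = dlr_loss(x, y)
loss.sum().backward()

print(loss.item())                # 1.0 (up to the 1e-12 term)
print(x.grad.abs().max().item())  # numerically zero gradient
```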

In short: whenever the ground-truth label is the third-largest predicted logit, the numerator and denominator coincide (up to the 1e-12 term), so the loss is locally constant at 1 and the gradient vanishes.
Is my understanding correct? Is this a known issue?
Also, to prevent this, one solution may be to replace x_sorted[:, -3] with x_sorted[:, -4] in the case where the GT is the third-highest logit. Does this make sense?
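As a sketch of that idea (my own untested modification, not from the repo; the gt_third indicator is a name I made up, and it assumes at least 4 classes):

```python
import torch

def dlr_loss_patched(x, y):
    # Variant of the DLR loss: when the GT label is the third-largest logit
    # (so x[u, y] == x_sorted[:, -3]), the denominator falls back to
    # x_sorted[:, -4], so the loss is no longer locally constant at 1.
    x_sorted, ind_sorted = x.sort(dim=1)
    ind = (ind_sorted[:, -1] == y).float()
    gt_third = (ind_sorted[:, -3] == y).float()  # 1. where GT is 3rd-largest
    u = torch.arange(x.shape[0])
    denom = (x_sorted[:, -1]
             - x_sorted[:, -3] * (1. - gt_third)
             - x_sorted[:, -4] * gt_third
             + 1e-12)
    return -(x[u, y] - x_sorted[:, -2] * ind
             - x_sorted[:, -1] * (1. - ind)) / denom

# Same example as above: the denominator becomes 5 - 1 = 4 instead of
# 5 - 2 = 3, so the loss is 0.75 and the gradient is no longer zero.
x = torch.tensor([[3., 2., 5., 1.]], requires_grad=True)
y = torch.tensor([1])
loss = dlr_loss_patched(x, y)
loss.sum().backward()
```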
