DLR Loss and Zero Gradient #108

@SerezD

Description

Hi, I am trying to understand how the DLR loss works; in particular, I am running some tests with batch size = 1.

The DLR loss defined in https://github.com/fra31/auto-attack/blob/master/autoattack/autopgd_base.py is as follows:

def dlr_loss(self, x, y):
    x_sorted, ind_sorted = x.sort(dim=1)    # sort logits in ascending order
    ind = (ind_sorted[:, -1] == y).float()  # 1. where the top-1 prediction equals the GT label
    u = torch.arange(x.shape[0])

    return -(x[u, y] - x_sorted[:, -2] * ind - x_sorted[:, -1] * (1. - ind)) / (x_sorted[:, -1] - x_sorted[:, -3] + 1e-12)

When using batch size 1, x holds the logits of the sample (shape (1, N classes)) while y is the ground-truth label, of shape (1,).

Now, let's suppose the logits (x) are something like the following (N=4 classes):
[[3, 2, 5, 1]]
and that GT label (y) = 1
then:
x_sorted = [1, 2, 3, 5]
ind_sorted = [3, 1, 0, 2]
ind = 0.

which leads to:
x[u, y] = 2
x_sorted[:,-2] * ind = 3 * 0 = 0
x_sorted[:,-1] * (1 - ind) = 5
x_sorted[:, -1] = 5
x_sorted[:, -3] = 2

This causes the loss to be:
-(2 - 0 - 5) / (5 - 2 + 1e-12) = 3 / 3 = 1.

Which, in turn, causes the gradient to be zero, which may lead to unreliable behavior. However, the check_zero_gradients method is called only at lines 285-287, and not in the main loop (e.g. at line 380).
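For concreteness, this can be reproduced standalone (a sketch using a plain function in place of the class method, with the example values from above):

```python
import torch

def dlr_loss(x, y):
    # Standalone copy of the DLR loss above (no self).
    x_sorted, ind_sorted = x.sort(dim=1)
    ind = (ind_sorted[:, -1] == y).float()
    u = torch.arange(x.shape[0])
    return -(x[u, y] - x_sorted[:, -2] * ind - x_sorted[:, -1] * (1. - ind)) / (
        x_sorted[:, -1] - x_sorted[:, -3] + 1e-12)

# The example from above: the GT label (index 1) is the third-largest logit.
x = torch.tensor([[3., 2., 5., 1.]], requires_grad=True)
y = torch.tensor([1])

loss = dlr_loss(x, y)
loss.sum().backward()

print(loss.item())                # 1.0 (up to the 1e-12 term)
print(x.grad.abs().max().item())  # numerically zero gradient
```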

In short: whenever the ground-truth label is the third-largest predicted logit, the numerator and denominator coincide (up to the 1e-12 term), so the loss is locally constant at 1 and the gradient vanishes.
Is my understanding correct? Is this a known issue?
Also, to prevent this, one solution may be to replace x_sorted[:, -3] with x_sorted[:, -4] in the case where the GT is the third-highest logit. Does this make sense?
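As a sketch of that idea (my own untested modification, not from the repo; the gt_third indicator is a name I made up, and it assumes at least 4 classes):

```python
import torch

def dlr_loss_patched(x, y):
    # Variant of the DLR loss: when the GT label is the third-largest logit
    # (so x[u, y] == x_sorted[:, -3]), the denominator falls back to
    # x_sorted[:, -4], so the loss is no longer locally constant at 1.
    x_sorted, ind_sorted = x.sort(dim=1)
    ind = (ind_sorted[:, -1] == y).float()
    gt_third = (ind_sorted[:, -3] == y).float()  # 1. where GT is 3rd-largest
    u = torch.arange(x.shape[0])
    denom = (x_sorted[:, -1]
             - x_sorted[:, -3] * (1. - gt_third)
             - x_sorted[:, -4] * gt_third
             + 1e-12)
    return -(x[u, y] - x_sorted[:, -2] * ind
             - x_sorted[:, -1] * (1. - ind)) / denom

# Same example as above: the denominator becomes 5 - 1 = 4 instead of
# 5 - 2 = 3, so the loss is 0.75 and the gradient is no longer zero.
x = torch.tensor([[3., 2., 5., 1.]], requires_grad=True)
y = torch.tensor([1])
loss = dlr_loss_patched(x, y)
loss.sum().backward()
```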
