This repository was archived by the owner on Jun 13, 2024. It is now read-only.

Calculation of alpha loss in SAC is different from the original paper #28

@CloudyDory

Description


Hello, in the SAC paper "Soft Actor-Critic Algorithms and Applications", the loss for the temperature parameter alpha is:

J(alpha) = E[-alpha * (log(pi) + H)]

However, in your implementation (line 109 of "trainer.py"), the alpha loss is instead computed as:

J(alpha) = E[-log(alpha) * (log(pi) + H)]

I am curious why the loss is calculated this way. I have searched GitHub for a couple of PyTorch-based SAC implementations, and they all calculate the loss this way. However, TensorFlow-based SAC implementations calculate J(alpha) the same way as the SAC paper (https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py). The TensorFlow implementations still compute the gradient with respect to log(alpha), but when building the loss J(alpha) they use exp(log(alpha)) (which is alpha) instead of log(alpha).
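For concreteness, here is a minimal PyTorch sketch of the two variants side by side. The names (`log_alpha`, `log_pi`, `target_entropy`) are placeholders chosen for illustration, not the repository's actual variables:

```python
import torch

# Placeholder setup, assumed for illustration only.
log_alpha = torch.zeros(1, requires_grad=True)  # learnable log-temperature
log_pi = torch.randn(256)                       # stand-in for log pi(a|s) over a batch
target_entropy = -6.0                           # H, e.g. -dim(action_space)

# Variant used by many PyTorch implementations (and, as reported above,
# line 109 of trainer.py): the loss is built from log(alpha) directly.
loss_log_alpha = -(log_alpha * (log_pi + target_entropy).detach()).mean()

# Variant matching the paper and softlearning: optimize log(alpha) for
# numerical stability, but use alpha = exp(log_alpha) inside the loss.
loss_alpha = -(log_alpha.exp() * (log_pi + target_entropy).detach()).mean()
```

Note that since alpha = exp(log_alpha) is always positive, the two losses produce gradients of the same sign with respect to log(alpha), so the temperature moves in the same direction in both cases; they differ only in the effective step size, which is rescaled by a factor of alpha in the paper's version.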
