This repository was archived by the owner on Jun 13, 2024. It is now read-only.

Calculation of alpha loss in SAC is different from the original paper #28

@CloudyDory

Description


Hello, in the SAC paper "Soft Actor-Critic Algorithms and Applications", the loss for the temperature parameter alpha is:

J(alpha) = E[-alpha * (log(pi) + H)]

However, in your implementation (line 109 of "trainer.py"), the alpha loss is instead computed as:

J(alpha) = E[-log(alpha) * (log(pi) + H)]

I am curious why the loss is calculated this way. I have searched GitHub for a couple of PyTorch-based SAC implementations, and they all calculate the loss this way. However, TensorFlow-based SAC implementations calculate J(alpha) the same way as the SAC paper (https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py). The TensorFlow implementations still compute the gradient with respect to log(alpha), but when building the loss J(alpha) they use exp(log(alpha)) (which is alpha) instead of log(alpha).
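For concreteness, here is a minimal PyTorch sketch of the two variants side by side. The names (`log_alpha`, `log_pi`, `target_entropy`) are placeholders chosen for illustration, not the repository's actual variables:

```python
import torch

# Placeholder setup, assumed for illustration only.
log_alpha = torch.zeros(1, requires_grad=True)  # learnable log-temperature
log_pi = torch.randn(256)                       # stand-in for log pi(a|s) over a batch
target_entropy = -6.0                           # H, e.g. -dim(action_space)

# Variant used by many PyTorch implementations (and, as reported above,
# line 109 of trainer.py): the loss is built from log(alpha) directly.
loss_log_alpha = -(log_alpha * (log_pi + target_entropy).detach()).mean()

# Variant matching the paper and softlearning: optimize log(alpha) for
# numerical stability, but use alpha = exp(log_alpha) inside the loss.
loss_alpha = -(log_alpha.exp() * (log_pi + target_entropy).detach()).mean()
```

Note that since alpha = exp(log_alpha) is always positive, the two losses produce gradients of the same sign with respect to log(alpha), so the temperature moves in the same direction in both cases; they differ only in the effective step size, which is rescaled by a factor of alpha in the paper's version.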
