Hi,
Thank you for the amazing work!
While experimenting with your code, we are observing unstable training, and the instability persists across multiple training runs. Here is an example of one of the rew_total graphs:

Is this behavior expected, or does it indicate an underlying problem? Is the maximum total reward reached here (around 350) consistent with what you obtained? Additionally, if you could share the graphs from one of your runs, it would help us track down the issue and understand the expected behavior.
Thanks!