Training an agent now still takes a long time. The particular experiment in #36 took 4d 9h 11m 14s to finish.
Looking at the reward chart, it appears the agent could achieve 70% of the final performance in just 50M steps (or about 10 hours into training)

We should try to optimize based on the 10 hours time computational budget.
Training an agent now still takes a long time. The particular experiment in #36 took 4d 9h 11m 14s to finish.
Looking at the reward chart, it appears the agent could achieve 70% of the final performance in just 50M steps (or about 10 hours into training)
We should try to optimize based on the 10 hours time computational budget.