Unable to Train with FID metrics Enabled

When training, the code hangs while calculating the FID on real images after epoch 0.

I let it run overnight without any progress. In the morning it was still showing 12% CPU utilization, full GPU memory, and no GPU utilization.
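One way to find out where the process is stuck is to have Python dump its own stack periodically. The following is a minimal sketch using the standard-library `faulthandler` module. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical, not part of this repository. It is also an assumption that they would be called near the start of train.py.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=600, log_file=sys.stderr):
    # Every timeout_s seconds, write the Python stack of all threads to log_file.
    # log_file must be a real file (with a file descriptor) and must stay open
    # until the watchdog is disarmed.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)


def disarm_hang_watchdog():
    # Stop the periodic dumps, e.g. once training finishes normally.
    faulthandler.cancel_dump_traceback_later()
```

If the same frame shows up in every dump during the FID pass on real images, that frame is likely where it hangs. The dumps only cover Python frames, so a stall inside native TensorFlow code would show just the Python call that entered it.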

Training proceeds with the "--metrics none" flag added to the command line.

The dataset is around 88k images, but I do not see this behavior on the vanilla NVIDIA code.

Your metric_base.py seems identical to NVIDIA's, as does the way you invoke it in train.py.

I wonder if the "top_k" modifications to dnnlib might have something to do with this behavior?  I see that while metric_base.py is unmodified, it does import dnnlib.
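To narrow down which dnnlib files actually differ from NVIDIA's, the two directory trees can be compared. This is a rough sketch using the standard-library `filecmp` module. The function name `differing_files` and the directory paths in the usage note are hypothetical placeholders, not real paths from either repository.

```python
import filecmp
import os


def differing_files(dir_a, dir_b):
    """Return relative paths of files that differ or exist on only one side."""
    result = []

    def walk(cmp, prefix):
        # diff_files: present in both trees but not equal.
        # left_only / right_only: present in only one tree.
        for name in cmp.diff_files + cmp.left_only + cmp.right_only:
            result.append(os.path.join(prefix, name))
        # Recurse into subdirectories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    # Note: dircmp compares shallowly by default, so two files with the same
    # size and modification time are treated as equal without reading them.
    walk(filecmp.dircmp(dir_a, dir_b), "")
    return sorted(result)
```

For example, `differing_files("nvidia-checkout/dnnlib", "this-fork/dnnlib")` (both paths are placeholders) would list the modified files. They could then be inspected individually to see whether any of them are on the metrics code path.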

