When training, the code hangs while calculating the FID on real images after epoch 0.
I let it run overnight without any progress. In the morning I was still seeing 12% CPU utilization, GPU memory full, and no GPU utilization.
Training proceeds normally when the "--metrics none" flag is added to the command line.
The dataset contains around 88k images, but I do not see this behavior with the vanilla NVIDIA code.
Your metric_base.py seems identical to NVIDIA's code, as does the way you invoke it in train.py.
Could the "top_k" modifications to dnnlib have something to do with this behavior? Although metric_base.py is unmodified, it does import dnnlib.