Unable to Train with FID metrics Enabled #15

@botty-mc-bot-face

When training, the code hangs while calculating the FID on real images after epoch 0.

I let it run overnight without any progress. In the morning I was still seeing 12% CPU utilization, GPU memory full, and no GPU utilization.
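To narrow down where it is stuck, something like the following could help. It is only a sketch, not code from this repo: it uses Python's standard `faulthandler` module to write every thread's stack to a log at a fixed interval, and the helper names `watch_for_hang` and `stop_watching` are made up for illustration. I am assuming it can be called near the top of train.py.

```python
import faulthandler
import sys


def watch_for_hang(timeout_s=600, log=sys.stderr):
    """Dump the stack of every thread to `log` each `timeout_s` seconds.

    `log` must be a real file object (it needs a file descriptor),
    and it has to stay open for as long as the watchdog is active.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)


def stop_watching():
    """Cancel the periodic dump installed by watch_for_hang()."""
    faulthandler.cancel_dump_traceback_later()
```

If the same frame keeps showing up in consecutive dumps while the FID on real images is being calculated, that frame is probably where the process is blocked.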

Training proceeds with the "--metrics none" flag added to the command line.

The dataset is around 88k images, but I do not see this behavior on the vanilla NVIDIA code.

Your metric_base.py seems identical to NVIDIA's, as does the way you invoke it in train.py.

I wonder if the "top_k" modifications to dnnlib might have something to do with this behavior? I see that while metric_base.py is unmodified, it does import dnnlib.
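One way to check this would be to list exactly which files under dnnlib differ from the vanilla NVIDIA checkout. This is a rough sketch using the standard `filecmp` module. The function name `changed_files` is hypothetical, and the two directory arguments are whatever local paths hold the two copies of dnnlib.

```python
import filecmp
from pathlib import Path


def changed_files(ours, theirs):
    """Return sorted relative paths that differ or exist on one side only."""
    found = []

    def walk(cmp, prefix):
        # Files with different content, files present on one side only,
        # and files that could not be compared at all.
        names = cmp.diff_files + cmp.left_only + cmp.right_only + cmp.funny_files
        for name in names:
            found.append(str(prefix / name))
        # Recurse into subdirectories that exist in both trees.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(ours, theirs), Path("."))
    return sorted(found)
```

Calling `changed_files` on the two dnnlib directories should give a short list of files to read through for anything that touches the code path the metrics use.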
