Better compiler caching strategy on Windows #462

@zzjjbb

Description

Is your feature request related to a problem? Please describe.
This problem is annoying me for years: I find pycuda runs extremely slow on Windows but not on Linux. My program contains ~20 ElementwiseKernels and ReductionKernels. I find that the SourceModule is used to compile the code, and it will save the cubin files to the cache_dir. It works well on any Linux machine as I tested, which only have ~1s overhead to load the functions later. However, running my code on Windows for the first time costs ~2min, and later it still costs ~1min. This is because it always need to preprocess the code since the source code always contains #include <pycuda-complex.hpp>:

if "#include" in source:
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))

As I tested, running nvcc --preprocess "empty_file.cu" --compiler-options -EP takes several seconds on any Windows computer. In other words, merely deciding whether the cache can be used takes a very long time.
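To illustrate why the check is expensive: the preprocessing exists only to fold the contents of included headers into the cache checksum. A cheap alternative (a sketch, not pycuda's actual code; the function name and arguments here are my own) would hash the raw source plus the compile options and compiler version, at the cost of not noticing edits inside included headers:

```python
import hashlib

def raw_cache_key(source, options, nvcc_version):
    # Hash the unpreprocessed source plus compile options and nvcc version.
    # Unlike shelling out to "nvcc --preprocess", this is effectively free,
    # but changes inside #include'd headers will NOT change the key.
    h = hashlib.md5()
    h.update(source.encode("utf-8"))
    for opt in options:
        h.update(opt.encode("utf-8"))
    h.update(nvcc_version.encode("utf-8"))
    return h.hexdigest()

key = raw_cache_key("#include <pycuda-complex.hpp>\n", ["-O3"], "12.0")
```

This is the trade-off the feature flag below would expose explicitly.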

Describe the solution you'd like
I tried monkey patching pycuda to remove the preprocess call above, and it works well, but I'd like to find a better way to do it. The easiest option I can think of is adding a flag that forces pycuda to skip the #include check (it should not be on by default, since the user must understand the risk: changes inside included headers would no longer invalidate the cache).
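For reference, the monkey patch can be sketched as below. This assumes (as in the quoted snippet) that the caching code resolves preprocess_source at module level in pycuda.compiler, so replacing that attribute is enough; it is a workaround sketch, not a proposed fix:

```python
def patch_out_preprocess():
    # Workaround sketch, use at your own risk: make pycuda hash the raw
    # source instead of the nvcc-preprocessed source when computing the
    # compiler-cache checksum.
    import pycuda.compiler

    # The cache code calls the module-level preprocess_source(), so
    # replacing the attribute bypasses the slow "nvcc --preprocess" call.
    # Caveat: edits inside #include'd headers no longer invalidate the cache.
    pycuda.compiler.preprocess_source = lambda source, options, nvcc: source
```

Call patch_out_preprocess() once, before building any SourceModule.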

Describe alternatives you've considered
Are there any nvcc options that speed up preprocessing? I don't know of any.

Additional context
The link below is one of the examples I worked on, but I suspect any GPUArray functionality that relies on SourceModule is affected by this.
https://github.com/bu-cisl/SSNP-IDT/blob/master/examples/forward_model.py
