Description
Is your feature request related to a problem? Please describe.
This problem has been annoying me for years: I find that pycuda runs extremely slowly on Windows but not on Linux. My program contains ~20 ElementwiseKernels and ReductionKernels. SourceModule is used to compile the code, and it saves the cubin files to the cache_dir. This works well on every Linux machine I have tested, with only ~1 s of overhead to load the functions later. However, running my code on Windows costs ~2 min the first time, and still ~1 min on later runs. This is because it always needs to preprocess the code, since the source always contains #include <pycuda-complex.hpp>:
Lines 89 to 90 in 96aab3f:

```python
if "#include" in source:
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
```
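For context, here is a simplified illustration (not pycuda's actual code) of why the cache check itself is expensive: the cache key is a hash of the preprocessed source, so any source containing #include has to be run through nvcc before the cache can even be consulted.

```python
import hashlib

def cache_key(source, options, nvcc, preprocess):
    """Simplified sketch of how a content-hash cache key might be computed."""
    checksum = hashlib.md5()
    if "#include" in source:
        # This is the expensive step on Windows: nvcc is launched just to
        # compute the key, before we even know whether the cache has a hit.
        checksum.update(preprocess(source, options, nvcc).encode("utf-8"))
    else:
        checksum.update(source.encode("utf-8"))
    checksum.update(" ".join(options).encode("utf-8"))
    return checksum.hexdigest()
```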
In my tests, on any Windows computer, running nvcc --preprocess "empty_file.cu" --compiler-options -EP takes several seconds, even for an empty file. In other words, merely deciding whether the cache can be used takes a very long time.
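A quick way to reproduce the timing on Windows, assuming nvcc is on the PATH (the temporary file name here is just for illustration):

```python
import os
import subprocess
import tempfile
import time

# Create an empty .cu file; the name does not matter for this test.
fd, path = tempfile.mkstemp(suffix=".cu")
os.close(fd)

start = time.time()
subprocess.run(
    ["nvcc", "--preprocess", path, "--compiler-options", "-EP"],
    capture_output=True,
    check=True,
)
print(f"nvcc preprocessing of an empty file took {time.time() - start:.1f} s")

os.remove(path)
```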
Describe the solution you'd like
I tried monkey-patching out the preprocess call above, and it works well (see the sketch below). I'd like to find a better way to do it. The easiest way I can think of is adding an option to force skipping the #include check (it should not be enabled by default, since the user must understand the potential risk of using a stale cache).
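For the record, this is roughly the monkey patch I am using as a workaround (a sketch, not proposed as the real fix): it replaces pycuda.compiler.preprocess_source so that the checksum is computed from the raw source text. The obvious risk is that edits made only inside included headers will no longer invalidate the cache.

```python
import pycuda.compiler

def _skip_preprocess(source, options, nvcc):
    # Return the source unchanged instead of shelling out to nvcc;
    # the cache checksum is then based on the raw source text only.
    return source

pycuda.compiler.preprocess_source = _skip_preprocess
```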
Describe alternatives you've considered
Are there any nvcc options to speed up the preprocessing? I don't know of any.
Additional context
The link below is one of the examples I worked on, but I suspect that any simple GPUArray functionality that relies on SourceModule is affected by this.
https://github.com/bu-cisl/SSNP-IDT/blob/master/examples/forward_model.py
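For reference, my guess at a minimal reproducer (not taken from the linked example) would be something like this, since ElementwiseKernel compiles through SourceModule and its generated source contains the #include mentioned above:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

# The very first call pays the full preprocessing/cache-check cost on Windows.
double = ElementwiseKernel("float *x", "x[i] = 2*x[i]", "double_it")
a = gpuarray.to_gpu(np.ones(16, dtype=np.float32))
double(a)
```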