Reduce build times in CUDA and C++ for complex processes (split kernels and more)

This is a follow-up to #346: for ggttggg, build times are clearly becoming very long again (20 minutes or more, mainly in CUDA, although the situation also looks bad for clang/C++).

The issue is clearly related to the inlining of FFV functions (and hence to their templating in PR #328), and more generally to LTO/RDC/inlining optimizations over very large code bases (#229 and related issues).

Removing inlining by hand is an option, but the small tests I did in the past gave really bad runtime performance.
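For reference, "removing inlining by hand" could look roughly like the sketch below. This is only an illustration: `ffv_sketch` and `SKETCH_NOINLINE` are hypothetical names, not the generated FFV code, and the arithmetic is a placeholder. The GCC/clang attribute is `__attribute__((noinline))`; CUDA device functions have the analogous `__noinline__` qualifier.

```cpp
#include <complex>

using cxtype = std::complex<double>;

// Hypothetical macro: disable inlining on GCC/clang, no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Placeholder standing in for an FFV helper. With the attribute, the compiler
// emits a real call at every call site instead of expanding the body, which
// keeps the caller small (faster builds) but may cost runtime performance.
SKETCH_NOINLINE cxtype ffv_sketch(const cxtype& f1, const cxtype& f2,
                                  const cxtype& v, const cxtype& coup)
{
  return coup * f1 * f2 * v;
}
```

The trade-off is the one described above: each call site pays call overhead and loses cross-call optimizations.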

The only viable solution is most likely to split kernels (#310), not only for CUDA but also for C++. With more than 1000 Feynman diagrams, as in ggttggg, it makes no sense to optimize across a single calculate_wavefunctions method containing O(1k-10k) FFV calls. It looks better, even just for C++ and for build times, to split this into O(1k) functions, one per diagram.
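A minimal sketch of the "one function per diagram" idea, in plain C++. All names here (`Workspace`, `diagram1`, `diagram2`, `kDiagrams`, `calculate_wavefunctions_split`) are hypothetical and the bodies are placeholders, not real FFV calls. In an actual split each diagram function would presumably live in its own translation unit, so that the compiler only ever optimizes one small function at a time.

```cpp
#include <array>
#include <complex>

using cxtype = std::complex<double>;

// Hypothetical shared state: wavefunctions plus the running amplitude sum.
struct Workspace
{
  std::array<cxtype, 4> w{}; // placeholder wavefunctions
  cxtype amp{ 0., 0. };      // sum of the diagram amplitudes
};

// One small function per diagram (placeholder arithmetic instead of FFV calls).
void diagram1( Workspace& ws ) { ws.amp += ws.w[0] * ws.w[1]; }
void diagram2( Workspace& ws ) { ws.amp += ws.w[2] * ws.w[3]; }

// Table of diagram functions; a code generator could emit O(1k) entries.
using DiagramFn = void ( * )( Workspace& );
constexpr std::array<DiagramFn, 2> kDiagrams{ diagram1, diagram2 };

// Thin driver replacing the single monolithic calculate_wavefunctions body.
cxtype calculate_wavefunctions_split( Workspace& ws )
{
  ws.amp = cxtype{ 0., 0. };
  for( DiagramFn f : kDiagrams ) f( ws ); // one call per diagram
  return ws.amp;
}
```

The driver gives up optimizations across diagrams in exchange for many small, independently compiled units. Whether the runtime cost is acceptable would have to be measured.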

Reduce build times in CUDA and C++ for complex processes (split kernels and more) #348
