Speed up `n_is_prime` by replacing the BPSW test with smarter strong probable prime tests and some other optimizations.

- Use a bit-table lookup up to 2^15 instead of 2^12.
- Up to 2^32, do a base-2 strong probable prime test and eliminate any remaining composites using a lookup table instead of another probable prime test. The modular exponentiation for the base-2 test is optimized using Shoup reduction, found by @vneiger in #2061 (Faster NMOD_RED using different precomputation?), and by taking advantage of the base being 2. The 32-bit test is now so fast that the `n_is_oddprime_binary` call previously used for inputs up to 20 bits (which triggered a runtime precomputation of the primes up to one million) is obsolete. I might put that back later in a more optimized form.
- Up to 2^64, do a base-2 strong probable prime test followed by a single additional base-b test, where b is chosen to guarantee that we detect all remaining composites. This is done using a big precomputed hash table, following the idea of Forisek and Jancina: https://ceur-ws.org/Vol-1326/020-Forisek.pdf
Note that the original Forisek and Jancina table has 2^18 16-bit entries = 512 KB (there is a version with a smaller table, but it requires one extra probable prime test). The same idea has also been implemented in the Rust library https://github.com/JASory/machine-prime, which seems to use a different table of the same size. I managed to generate an equally efficient table containing essentially 2^17 21-bit entries, requiring just 350 KB, i.e. 2/3 of the space. I think it might be possible to push this below 300 KB, but that would require a huge parallel computation (if anyone is interested, let me know). I think the table is justified because `n_is_prime` is one of the most important functions in FLINT, and the speedup is significant.

This PR also adds `n_is_prime_odd_no_trial`, which skips trial division (useful in factoring, etc., where one already knows or suspects that there are no small factors).

Note: this PR leaves some unused and duplicated code in the ulong_extras module, but I'm not sure if I'll fix that right away; there is various other cleanup to do anyway.
Note: this PR also fixes a latent out-of-bounds read from #2449 that showed up in CI, presumably due to changed malloc patterns.
Timings, average result for 10000 random inputs of the given bit length and type: