Skip to content

Conversation

@fredrik-johansson
Copy link
Collaborator

@fredrik-johansson fredrik-johansson commented Nov 10, 2025

Speed up n_is_prime by replacing the BPSW test with smarter strong probable prime tests and some other optimizations.

  • Use bit-table lookup up to 2^15 instead of 2^12.

  • Up to 2^32, do a base-2 strong probable prime test and eliminate any remaining composites using a lookup table instead of another probable prime test. The modular exponentiation for the base-2 test is optimized using Shoup reduction found by @vneiger in Faster NMOD_RED using different precomputation? #2061 and taking advantage of the base being 2. The 32-bit test is now so fast that the n_is_oddprime_binary call previously used up to 20 bits (which triggered a runtime precomputation of the primes up to one million) is obsolete. I might put that back later in a more optimized form.

  • Up to 2^64, do a base-2 strong probable prime test followed by a single additional base-b test where b is chosen to guarantee that we detect all remaining composites. This is done using a big precomputed hash table following the idea of Forisek and Jancina https://ceur-ws.org/Vol-1326/020-Forisek.pdf.

Note that the original Forisek and Jancina table has 2^18 16 bit entries = 512 KB (there is a version with a smaller table, but this requires one extra probable prime test). The same idea has also been implemented in the Rust library https://github.com/JASory/machine-prime which seems to use a different table of the same size. I managed to generate an equally efficient table containing essentially 2^17 21-bit entries, requiring just 350 KB, i.e. 2/3 the space. I think it might be possible to push this down below 300 KB, but this would require a huge parallel computation (if anyone is interested, let me know). I think the table is justified because n_is_prime is one of the most important functions in FLINT, and the speedup is significant.

This PR also adds n_is_prime_odd_no_trial which skips trial division (useful in factoring, etc. where one already knows or suspects that there are no small factors).

Note: this PR leaves some unused and duplicated code in the ulong_extras module, but I'm not sure if I'll fix them right away; there is various other cleanup to do anyway.

Note: this PR also fixes a latent read-out-of-bounds bug from #2449 which showed up in CI, presumably due to changed malloc patterns.

Timings, average result for 10000 random inputs of the given bit length and type:

old is the previous n_is_prime
new is the new n_is_prime (followed by speedup over old)
notrial is n_is_prime_odd_no_trial (followed by speedup over old)

                  random                                 primes                                composites without small factors
bits |    old       new    speedup |     old       new    speedup   notrial  speedup |     old       new    speedup   notrial  speedup
 3   | 2.11e-09  2.29e-09   0.921  |  2.09e-09  2.32e-09   0.901   2.10e-09   0.995  |  2.09e-09  2.31e-09   0.905   2.08e-09   1.005
 4   | 2.97e-09  2.29e-09   1.297  |  2.99e-09  2.32e-09   1.289   2.11e-09   1.417  |  2.64e-09  2.32e-09   1.138   2.09e-09   1.263
 5   | 3.02e-09  2.30e-09   1.313  |  2.99e-09  2.32e-09   1.289   2.11e-09   1.417  |  2.58e-09  2.32e-09   1.112   2.08e-09   1.240
 6   | 3.06e-09  2.30e-09   1.330  |  2.99e-09  2.33e-09   1.283   2.11e-09   1.417  |  2.92e-09  2.33e-09   1.253   2.08e-09   1.404
 7   | 3.12e-09  2.30e-09   1.357  |  3.39e-09  2.32e-09   1.461   2.11e-09   1.607  |  3.17e-09  2.32e-09   1.366   2.08e-09   1.524
 8   | 3.33e-09  2.30e-09   1.448  |  4.58e-09  2.32e-09   1.974   2.11e-09   2.171  |  4.54e-09  2.32e-09   1.957   2.08e-09   2.183
 9   | 3.33e-09  2.30e-09   1.448  |  4.57e-09  2.33e-09   1.961   2.11e-09   2.166  |  5.63e-09  2.33e-09   2.416   2.08e-09   2.707
10   | 3.34e-09  2.29e-09   1.459  |  4.57e-09  2.33e-09   1.961   2.11e-09   2.166  |  6.86e-09  2.33e-09   2.944   2.08e-09   3.298
11   | 3.33e-09  2.30e-09   1.448  |  4.57e-09  2.32e-09   1.970   2.11e-09   2.166  |  8.09e-09  2.33e-09   3.472   2.08e-09   3.889
12   | 3.47e-09  2.29e-09   1.515  |  7.08e-09  2.33e-09   3.039   2.11e-09   3.355  |  9.57e-09  2.32e-09   4.125   2.09e-09   4.579
13   | 5.37e-09  2.30e-09   2.335  |  3.80e-08  2.32e-09  16.379   2.12e-09  17.925  |  2.25e-08  2.32e-09   9.698   2.08e-09  10.817
14   | 5.74e-09  2.30e-09   2.496  |  3.96e-08  2.32e-09  17.069   2.11e-09  18.768  |  3.60e-08  2.32e-09  15.517   2.09e-09  17.225
15   | 5.85e-09  2.29e-09   2.555  |  4.30e-08  2.33e-09  18.455   2.11e-09  20.379  |  4.11e-08  2.32e-09  17.716   2.08e-09  19.760
16   | 6.96e-09  5.68e-09   1.225  |  4.70e-08  4.01e-08   1.172   3.68e-08   1.277  |  4.55e-08  3.32e-08   1.370   2.86e-08   1.591
17   | 7.06e-09  6.11e-09   1.155  |  5.18e-08  4.49e-08   1.154   4.41e-08   1.175  |  5.03e-08  3.70e-08   1.359   3.22e-08   1.562
18   | 8.50e-09  6.92e-09   1.228  |  5.62e-08  4.84e-08   1.161   4.78e-08   1.176  |  5.52e-08  4.06e-08   1.360   3.65e-08   1.512
19   | 8.53e-09  7.07e-09   1.207  |  6.15e-08  5.77e-08   1.066   5.62e-08   1.094  |  5.98e-08  4.39e-08   1.362   4.07e-08   1.469
20   | 1.17e-08  9.17e-09   1.276  |  8.95e-08  6.57e-08   1.362   6.27e-08   1.427  |  7.51e-08  4.89e-08   1.536   4.48e-08   1.676
21   | 3.08e-08  1.13e-08   2.726  |  2.82e-07  7.23e-08   3.900   6.89e-08   4.093  |  1.45e-07  5.31e-08   2.731   4.89e-08   2.965
22   | 3.06e-08  1.19e-08   2.571  |  2.92e-07  7.50e-08   3.893   7.16e-08   4.078  |  1.47e-07  5.65e-08   2.602   5.22e-08   2.816
23   | 3.11e-08  1.23e-08   2.528  |  3.04e-07  8.21e-08   3.703   7.84e-08   3.878  |  1.53e-07  5.99e-08   2.554   5.55e-08   2.757
24   | 3.12e-08  1.27e-08   2.457  |  3.17e-07  8.77e-08   3.615   8.39e-08   3.778  |  1.58e-07  6.36e-08   2.484   5.90e-08   2.678
25   | 2.95e-08  1.24e-08   2.379  |  3.29e-07  9.15e-08   3.596   8.75e-08   3.760  |  1.64e-07  6.70e-08   2.448   6.24e-08   2.628
26   | 3.18e-08  1.36e-08   2.338  |  3.41e-07  9.78e-08   3.487   9.39e-08   3.632  |  1.71e-07  7.05e-08   2.426   6.58e-08   2.599
27   | 3.25e-08  1.42e-08   2.289  |  3.55e-07  1.04e-07   3.413   9.92e-08   3.579  |  1.77e-07  7.37e-08   2.402   6.90e-08   2.565
28   | 3.43e-08  1.48e-08   2.318  |  3.70e-07  1.07e-07   3.458   1.04e-07   3.558  |  1.83e-07  7.71e-08   2.374   7.24e-08   2.528
29   | 3.50e-08  1.55e-08   2.258  |  3.84e-07  1.12e-07   3.429   1.09e-07   3.523  |  1.88e-07  8.03e-08   2.341   7.55e-08   2.490
30   | 3.65e-08  1.60e-08   2.281  |  4.00e-07  1.17e-07   3.419   1.13e-07   3.540  |  1.94e-07  8.35e-08   2.323   7.87e-08   2.465
31   | 4.14e-08  1.58e-08   2.620  |  5.28e-07  1.23e-07   4.293   1.18e-07   4.475  |  2.03e-07  8.67e-08   2.341   8.20e-08   2.476
32   | 4.22e-08  1.68e-08   2.512  |  5.44e-07  1.29e-07   4.217   1.25e-07   4.352  |  2.08e-07  9.01e-08   2.309   8.54e-08   2.436
33   | 4.47e-08  2.95e-08   1.515  |  5.59e-07  3.19e-07   1.752   2.86e-07   1.955  |  2.14e-07  1.62e-07   1.321   1.28e-07   1.672
34   | 4.26e-08  2.87e-08   1.484  |  5.75e-07  3.28e-07   1.753   2.96e-07   1.943  |  2.20e-07  1.67e-07   1.317   1.33e-07   1.654
35   | 4.26e-08  2.88e-08   1.479  |  5.92e-07  3.38e-07   1.751   3.04e-07   1.947  |  2.27e-07  1.71e-07   1.327   1.37e-07   1.657
36   | 4.74e-08  3.20e-08   1.481  |  6.08e-07  3.47e-07   1.752   3.14e-07   1.936  |  2.33e-07  1.76e-07   1.324   1.42e-07   1.641
37   | 4.70e-08  3.15e-08   1.492  |  6.20e-07  3.57e-07   1.737   3.23e-07   1.920  |  2.39e-07  1.81e-07   1.320   1.47e-07   1.626
38   | 4.58e-08  3.11e-08   1.473  |  6.38e-07  3.66e-07   1.743   3.32e-07   1.922  |  2.45e-07  1.85e-07   1.324   1.52e-07   1.612
39   | 4.99e-08  3.38e-08   1.476  |  6.54e-07  3.76e-07   1.739   3.43e-07   1.907  |  2.52e-07  1.90e-07   1.326   1.57e-07   1.605
40   | 4.98e-08  3.38e-08   1.473  |  6.67e-07  3.85e-07   1.732   3.52e-07   1.895  |  2.57e-07  1.95e-07   1.318   1.62e-07   1.586
41   | 5.14e-08  3.50e-08   1.469  |  6.82e-07  3.94e-07   1.731   3.61e-07   1.889  |  2.63e-07  2.00e-07   1.315   1.67e-07   1.575
42   | 4.99e-08  3.44e-08   1.451  |  6.99e-07  4.04e-07   1.730   3.71e-07   1.884  |  2.69e-07  2.04e-07   1.319   1.71e-07   1.573
43   | 5.08e-08  3.50e-08   1.451  |  7.15e-07  4.13e-07   1.731   3.80e-07   1.882  |  2.76e-07  2.09e-07   1.321   1.76e-07   1.568
44   | 5.15e-08  3.54e-08   1.455  |  7.30e-07  4.22e-07   1.730   3.90e-07   1.872  |  2.82e-07  2.14e-07   1.318   1.81e-07   1.558
45   | 5.21e-08  3.60e-08   1.447  |  7.47e-07  4.32e-07   1.729   4.00e-07   1.867  |  2.89e-07  2.18e-07   1.326   1.86e-07   1.554
46   | 5.17e-08  3.59e-08   1.440  |  7.62e-07  4.41e-07   1.728   4.10e-07   1.859  |  2.95e-07  2.23e-07   1.323   1.91e-07   1.545
47   | 5.48e-08  3.80e-08   1.442  |  7.81e-07  4.51e-07   1.732   4.19e-07   1.864  |  3.02e-07  2.29e-07   1.319   1.96e-07   1.541
48   | 5.37e-08  3.68e-08   1.459  |  8.03e-07  4.60e-07   1.746   4.28e-07   1.876  |  3.11e-07  2.33e-07   1.335   2.01e-07   1.547
49   | 5.24e-08  3.60e-08   1.456  |  8.26e-07  4.70e-07   1.757   4.38e-07   1.886  |  3.22e-07  2.37e-07   1.359   2.05e-07   1.571
50   | 5.95e-08  3.91e-08   1.522  |  8.61e-07  4.79e-07   1.797   4.48e-07   1.922  |  3.36e-07  2.43e-07   1.383   2.10e-07   1.600
51   | 6.30e-08  4.04e-08   1.559  |  9.14e-07  4.89e-07   1.869   4.57e-07   2.000  |  3.57e-07  2.47e-07   1.445   2.14e-07   1.668
52   | 6.19e-08  3.71e-08   1.668  |  1.00e-06  5.00e-07   2.000   4.66e-07   2.146  |  3.95e-07  2.53e-07   1.561   2.19e-07   1.804
53   | 7.64e-08  4.09e-08   1.868  |  1.16e-06  5.11e-07   2.270   4.74e-07   2.447  |  4.58e-07  2.61e-07   1.755   2.24e-07   2.045
54   | 4.84e-08  3.79e-08   1.277  |  8.43e-07  5.21e-07   1.618   4.84e-07   1.742  |  2.92e-07  2.66e-07   1.098   2.28e-07   1.281
55   | 5.51e-08  4.32e-08   1.275  |  8.56e-07  5.30e-07   1.615   4.94e-07   1.733  |  2.97e-07  2.70e-07   1.100   2.33e-07   1.275
56   | 5.41e-08  4.23e-08   1.279  |  8.71e-07  5.40e-07   1.613   5.03e-07   1.732  |  3.01e-07  2.75e-07   1.095   2.38e-07   1.265
57   | 5.02e-08  3.94e-08   1.274  |  8.85e-07  5.49e-07   1.612   5.12e-07   1.729  |  3.05e-07  2.80e-07   1.089   2.43e-07   1.255
58   | 5.39e-08  4.24e-08   1.271  |  8.97e-07  5.59e-07   1.605   5.22e-07   1.718  |  3.10e-07  2.85e-07   1.088   2.47e-07   1.255
59   | 5.54e-08  4.36e-08   1.271  |  9.15e-07  5.67e-07   1.614   5.31e-07   1.723  |  3.15e-07  2.88e-07   1.094   2.52e-07   1.250
60   | 5.50e-08  4.34e-08   1.267  |  9.29e-07  5.77e-07   1.610   5.41e-07   1.717  |  3.20e-07  2.93e-07   1.092   2.56e-07   1.250
61   | 5.56e-08  4.43e-08   1.255  |  9.43e-07  5.89e-07   1.601   5.50e-07   1.715  |  3.27e-07  3.00e-07   1.090   2.61e-07   1.253
62   | 5.87e-08  4.71e-08   1.246  |  9.57e-07  6.01e-07   1.592   5.59e-07   1.712  |  3.33e-07  3.08e-07   1.081   2.65e-07   1.257
63   | 5.48e-08  4.39e-08   1.248  |  9.77e-07  6.10e-07   1.602   5.68e-07   1.720  |  3.38e-07  3.12e-07   1.083   2.70e-07   1.252
64   | 5.49e-08  4.21e-08   1.304  |  9.76e-07  5.92e-07   1.649   5.50e-07   1.775  |  3.43e-07  2.92e-07   1.175   2.49e-07   1.378

@albinahlback
Copy link
Collaborator

Wow, this is amazing!

@fredrik-johansson
Copy link
Collaborator Author

@danaj might have some comments

@user202729
Copy link
Contributor

Side note: from my understanding of the code, the computation 2^t mod n is done "generically" n_is_strong_probabprime2_preinv / _n_is_strong_probabprime_nmod -> n_powmod2_ui_preinv without making use of the base 2 being small.

If say n ≥ 2^32, the following can be done:

  • write t = a[0] + a[1] * 32 + a[2] * 32^2 + ... + a[l] * 32^l
  • compute 2^t = 2^a[0] * (2^a[1] * (2^a[2] * ...)^32)^32
  • each 2^a[i] can be computed in $O(1)$ time, the ^32 needs 5 squarings as before

this way you only need roughly $\frac{6}{5} \log_2 n$ instead of $2 \log_2 n$ multiplications. Wikipedia calls this the $2^k$-ary method.

@fredrik-johansson
Copy link
Collaborator Author

That could be worth trying, but note that the code already takes advantage of the base being 2 by doing multiplications by 2 as modular additions which are quite a bit cheaper than general multiplications.

A 4-ary or 8-ary method could potentially be useful for the non-2 bases too when one has 64 bit of exponents.

I tried using Shoup preconditioned multiplication for multiplication by the base when the base isn't 2 but this slowed things down in my experiments.

Another idea for the modular arithmetic is to exploit floating-point arithmetic with FMA below 53 bits. We have precomp functions in ulong_extras that use a floating-point inverse, but they use the regular integer multiplication, not floating-point with FMA. I'm not sure if this actually gives better latency than the integer code; it might make more sense for some form of vectorized primality test.

@fredrik-johansson fredrik-johansson merged commit 0915468 into flintlib:main Nov 13, 2025
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants