Added erf(x) Float64 and Float32 Julia implementations #491

AhmedYKadah · 2025-03-31T15:47:05Z

Faster than the current wrapper function call (including the Float32 call).
Uses an algorithm based on https://github.com/ARM-software/optimized-routines/blob/master/math/erf.c

codecov · 2025-03-31T16:29:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.24%. Comparing base (1f0527c) to head (1d45644).

@@            Coverage Diff             @@
##           master     #491      +/-   ##
==========================================
+ Coverage   94.11%   94.24%   +0.13%     
==========================================
  Files          14       14              
  Lines        2905     2973      +68     
==========================================
+ Hits         2734     2802      +68     
  Misses        171      171

Flag	Coverage Δ
unittests	`94.24% <100.00%> (+0.13%)`	⬆️

AhmedYKadah · 2025-03-31T17:28:21Z

Old:
Float64
@benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 217729 samples with 1000 evaluations per sample.
Range (min … max): 6.300 ns … 283.700 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 29.500 ns ┊ GC (median): 0.00%
Time (mean ± σ): 21.993 ns ± 13.209 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

Float32
@benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float32)-3) samples=1000000

BenchmarkTools.Trial: 312732 samples with 1000 evaluations per sample.
Range (min … max): 4.300 ns … 125.100 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.900 ns ┊ GC (median): 0.00%
Time (mean ± σ): 15.035 ns ± 7.951 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

New:
Float64
@benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 507504 samples with 1000 evaluations per sample.
Range (min … max): 5.400 ns … 4.890 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.700 ns ┊ GC (median): 0.00%
Time (mean ± σ): 8.775 ns ± 9.855 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

Float32 (Float64 implementation, result converted to Float32)
@benchmark Float32(erf(data)) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 526797 samples with 1000 evaluations per sample.
Range (min … max): 5.400 ns … 195.500 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.800 ns ┊ GC (median): 0.00%
Time (mean ± σ): 8.521 ns ± 2.236 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

AhmedYKadah · 2025-03-31T17:30:33Z

A Float32 implementation is available, but it is not faster than the Float64 version because of an exp() call.
The Float64 version is still faster than the old Float32 one.

AhmedYKadah · 2025-04-05T13:09:02Z

Need to clean up the polynomial evaluations.
The code could also use more organization.
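One way to tidy polynomial evaluations in Julia is `evalpoly`, which applies Horner's scheme to a coefficient tuple. The sketch below is illustrative only: it uses plain Taylor coefficients of erf, not the ARM minimax polynomials, so it is only accurate for small |x|. `erf_small` and `ERF_TAYLOR` are hypothetical names.

```julia
# Taylor coefficients of erf(x) / (2x/sqrt(pi)) in powers of t = x^2:
# c_n = (-1)^n / (n! * (2n + 1)), for n = 0..7.
const ERF_TAYLOR = (1.0, -1/3, 1/10, -1/42, 1/216, -1/1320, 1/9360, -1/75600)

# Small-|x| erf: Horner evaluation of the series in x^2 via Base.evalpoly.
# Truncation error grows quickly with |x|; this is not a full erf.
erf_small(x::Float64) = (2 / sqrt(π)) * x * evalpoly(x * x, ERF_TAYLOR)
```

Keeping coefficients in a tuple should let the compiler unroll the evaluation; the real branches would drop in their own (regenerated) coefficient tuples.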

AhmedYKadah · 2025-08-02T08:07:59Z

Remaining: erfc Float64 and Float32 implementations, and the erf Float32 implementation

src/erf.jl

oscardssmith · 2025-09-14T00:19:04Z

+        else
+            return 1.0 - r
+        end
+    elseif (ia < 0x4017a000)


Any chance you can regenerate the polys so that this threshold becomes 0x40180000? If so, you could use UInt16 literals for all of these comparisons by taking only the top 16 bits rather than the top 32.

Would that have a meaningful impact?

Probably not much. It might save a cycle or two.

If you want to test the speed, you could try it without regenerating the polys; that should give you a good idea.
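To see why the threshold matters, here is a small sketch. It assumes `ia` holds the top 32 bits of |x|, as in the ARM reference; `top32`, `top16`, and `threshold` are hypothetical helpers. 0x4017a000 decodes to 5.90625, which has nonzero bits below the top 16, while 0x40180000 decodes to exactly 6.0, so comparing only the top 16 bits against 0x4018 gives the same answer as the 32-bit comparison.

```julia
# Top 32 and top 16 bits of |x| (sign bit cleared by abs).
top32(x::Float64) = (reinterpret(UInt64, abs(x)) >> 32) % UInt32
top16(x::Float64) = (reinterpret(UInt64, abs(x)) >> 48) % UInt16

# Decode a 32-bit threshold (high word, low word zero) back to its Float64 value.
threshold(ia::UInt32) = reinterpret(Float64, UInt64(ia) << 32)

# With a 6.0 cutoff the low bits are all zero, so a 16-bit compare is exact.
below6_32(x::Float64) = top32(x) < 0x40180000
below6_16(x::Float64) = top16(x) < 0x4018
```

By contrast, values just below and above 5.90625 (for example 5.9 and 5.95) share the same top 16 bits, 0x4017, so the current cutoff cannot be expressed as a UInt16 comparison.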

…necessary whitespace, and removed explicit copysigns

src/erf.jl

oscardssmith · 2025-09-14T00:59:31Z

-
 end

 _erf(x::Float16)=Float16(_erf(Float32(x)))


If you wanted to do a Float16 implementation, it should be easier than the others: the domain only goes up to 2, and the required accuracy is much lower.

This could 100% wait for a follow-up PR.

I'm thinking the same, to be honest:
both this and the poly regeneration.

oscardssmith · 2025-09-15T03:47:22Z

Given that this is faster and accurate, it seems good to merge to me!

mschauer · 2025-09-16T06:30:08Z

Are there any edge-case/ULP tests in the C version that we don't run ourselves?
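Independent of what the C version tests, a ULP check against a high-precision reference is easy to sketch. `ulp_error` and `max_ulp_error` are hypothetical helpers, not part of SpecialFunctions; the self-check uses `sqrt`, which is correctly rounded, only to sanity-check the harness.

```julia
# Error of `approx` in units in the last place (ULPs) of the Float64 result,
# measured against a higher-precision BigFloat reference.
function ulp_error(approx::Float64, ref::BigFloat)
    r = Float64(ref)                          # reference rounded to Float64
    return Float64(abs(big(approx) - ref) / eps(r))
end

# Worst-case ULP error of f over the sample points xs,
# where fref evaluates the same function in BigFloat.
max_ulp_error(f, fref, xs) = maximum(x -> ulp_error(f(x), fref(big(x))), xs)
```

Assuming SpecialFunctions' `erf(::BigFloat)` (backed by MPFR) as the reference, `max_ulp_error(erf, erf, xs)` over a grid of Float64 inputs would report the worst-case error in ULPs.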

devmotion

The implementation does not handle NaN32 and NaN16 correctly:

julia> erf(NaN32)
1.0f0

julia> erf(NaN16)
Float16(1.0)
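One plausible way to get `erf(NaN32) == 1.0f0` with bit-based branching: the magnitude bits of NaN (0x7fc00000) exceed any finite saturation threshold, so NaN falls into the "return ±1" branch unless it is checked first. The sketch below is hypothetical, not the PR's code; the 4.0 cutoff (`SAT_BITS`) is an assumed point where Float32 erf rounds to ±1, and the `nothing` return stands in for the polynomial branches.

```julia
# Assumed saturation cutoff: bits of 4.0f0 (0x40800000).
const SAT_BITS = reinterpret(UInt32, 4.0f0)

# Buggy ordering: NaN's magnitude bits are above SAT_BITS, so it returns 1.0f0.
function tail_buggy(x::Float32)
    ia = reinterpret(UInt32, x) & 0x7fffffff   # bits of |x|
    ia >= SAT_BITS && return copysign(1.0f0, x)
    return nothing                              # polynomial branches would go here
end

# Fixed ordering: propagate NaN first, then saturate (±Inf still maps to ±1).
function tail_fixed(x::Float32)
    isnan(x) && return x
    ia = reinterpret(UInt32, x) & 0x7fffffff
    ia >= SAT_BITS && return copysign(1.0f0, x)
    return nothing
end
```

Since the Float16 method converts to Float32 first, the same ordering fix would also cover `NaN16`.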

src/erf.jl

mschauer · 2025-09-16T07:17:24Z

Then we should also add tests for these.

Co-authored-by: David Müller-Widmann <[email protected]>

src/erf.jl

Co-authored-by: David Müller-Widmann <[email protected]>

AhmedYKadah · 2025-09-19T09:44:51Z

There aren't any tests for erfc. Is that expected?

AhmedYKadah · 2025-09-19T10:56:44Z

Any other changes needed?

oscardssmith · 2025-09-19T12:57:28Z

We probably should test erfc.

AhmedYKadah added 5 commits March 31, 2025 17:16

Added erf(x) Float64/Float32 Julia implementation

6f554ef

changed erf to _erf, got rid of unnecessary branch

7f4fd2d

fixed syntax error in ccall

e784c9f

fixed syntax error in ccall 2

da16cb1

NaN edge case for erf(x)

0a755b6

added test cases for erf(x)

6efcec8

AhmedYKadah and others added 2 commits August 2, 2025 10:19

Merge branch 'master' into erf(x)-implementation

0fc6d4d

cleaned up erf(Float64)

3cee8ce

AhmedYKadah added 4 commits September 14, 2025 02:40

added erf(x::Float32) implementation

b819c58

added NaN edge case to erf(x::Float32)

57bbaf2

Merge branch 'master' into erf(x)-implementation

5ad8278

reversed NaN check

26b3b1f

AhmedYKadah changed the title ~~Added erf(x) Float64 Julia implementation~~ Added erf(x) Float64 and Float32 Julia implementations Sep 14, 2025