AhmedYKadah

Faster than the current wrapper function call (including the Float32 function call).
Uses an algorithm based on https://github.com/ARM-software/optimized-routines/blob/master/math/erf.c


codecov bot commented Mar 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.24%. Comparing base (1f0527c) to head (1d45644).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #491      +/-   ##
==========================================
+ Coverage   94.11%   94.24%   +0.13%     
==========================================
  Files          14       14              
  Lines        2905     2973      +68     
==========================================
+ Hits         2734     2802      +68     
  Misses        171      171              
Flag Coverage Δ
unittests 94.24% <100.00%> (+0.13%) ⬆️


@AhmedYKadah
Author

Old:
Float64
@benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 217729 samples with 1000 evaluations per sample.
Range (min … max): 6.300 ns … 283.700 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 29.500 ns ┊ GC (median): 0.00%
Time (mean ± σ): 21.993 ns ± 13.209 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

Float32
@benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float32)-3) samples=1000000

BenchmarkTools.Trial: 312732 samples with 1000 evaluations per sample.
Range (min … max): 4.300 ns … 125.100 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.900 ns ┊ GC (median): 0.00%
Time (mean ± σ): 15.035 ns ± 7.951 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

New:
Float64
@benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 507504 samples with 1000 evaluations per sample.
Range (min … max): 5.400 ns … 4.890 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.700 ns ┊ GC (median): 0.00%
Time (mean ± σ): 8.775 ns ± 9.855 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

Float32
@benchmark Float32(erf(data)) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 526797 samples with 1000 evaluations per sample.
Range (min … max): 5.400 ns … 195.500 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.800 ns ┊ GC (median): 0.00%
Time (mean ± σ): 8.521 ns ± 2.236 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

@AhmedYKadah
Author

A Float32 implementation is available, but it is not faster than the Float64 version because of an exp() call.
The Float64 version is still faster than the old Float32 one.

@AhmedYKadah
Author

Need to clean up the polynomial evaluations.
The code could also use more organization.

@AhmedYKadah
Author

Remaining: erfc Float64 and Float32 implementations, and the erf Float32 implementation

@AhmedYKadah changed the title from "Added erf(x) Float64 Julia implementation" to "Added erf(x) Float64 and Float32 Julia implementations" on Sep 14, 2025
        else
            return 1.0 - r
        end
    elseif (ia < 0x4017a000)
Member

Any chance you can regenerate the polynomials so that this threshold becomes 0x40180000? If you can, you would be able to use UInt16 literals for all of these comparisons by taking only the top 16 bits rather than the top 32 bits.

Author

Would that have a meaningful impact?

Member

Probably not much. It might just save a cycle or two.

Member

If you wanted to test the speed, you could try the 16-bit comparison without regenerating the polynomials; that should give you a good idea of the gain.
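
The suggestion in this thread relies on the fact that, for non-negative IEEE 754 values, bit patterns sort in the same order as the numbers themselves, so a range check can be done on the leading bits alone. A minimal sketch, assuming hypothetical helper names `top32` and `top16` (they are not taken from the PR), shows which Float64 values the two thresholds correspond to:

```julia
# Hypothetical helpers for illustration; the PR's own code may extract the bits differently.
top32(x::Float64) = (reinterpret(UInt64, abs(x)) >> 32) % UInt32  # upper 32 bits of |x|
top16(x::Float64) = (reinterpret(UInt64, abs(x)) >> 48) % UInt16  # upper 16 bits of |x|

# Decode each threshold back into the Float64 value it represents.
current_cutoff  = reinterpret(Float64, UInt64(0x4017a000) << 32)  # 5.90625
proposed_cutoff = reinterpret(Float64, UInt64(0x4018) << 48)      # 6.0

# For finite inputs the integer comparisons agree with the floating-point ones.
for x in (0.0, 1.5, -5.9, 5.90625, 5.95, 6.0, -7.0)
    @assert (top32(x) < 0x4017a000) == (abs(x) < current_cutoff)
    @assert (top16(x) < 0x4018) == (abs(x) < proposed_cutoff)
end
```

Swapping `top32(x) < 0x4017a000` for `top16(x) < 0x4018` in a benchmark would give the speed estimate without new coefficients, although results for inputs between 5.90625 and 6.0 would presumably not be accurate until the polynomials are regenerated.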

…necessary whitespace, and removed explicit copysigns

end

_erf(x::Float16)=Float16(_erf(Float32(x)))
Member

If you wanted to do a Float16 implementation, it should be easier than the others. Specifically, the domain only extends to about 2, and the required accuracy is much lower.

Member

That could 100% wait for a follow-up PR.

Author

I'm thinking that too, to be honest.
Both this and the polynomial regeneration.

@oscardssmith
Member

Given that this is faster and accurate, seems good to merge to me!

@mschauer
Member

Are there any edge-case or ULP tests in the C version that we do not run ourselves?
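
One way to approach this kind of question is to measure the error in ULPs against a high-precision reference. The sketch below assumes a hypothetical helper `ulp_error` and uses `sqrt` as a stand-in, because Base has no `erf`; with SpecialFunctions loaded, one would presumably pass `erf` for both arguments, provided a `BigFloat` method is available.

```julia
# Hypothetical helper: error of f(x) in units in the last place of the reference result.
function ulp_error(f, ref, x::T) where {T<:AbstractFloat}
    exact = ref(big(x))      # reference evaluated in BigFloat
    approx = big(f(x))       # widen the computed result without rounding
    return Float64(abs(approx - exact) / eps(T(exact)))
end

# sqrt is correctly rounded, so its error should not exceed half an ULP.
worst = maximum(ulp_error(sqrt, sqrt, x) for x in range(1.0f0, 4.0f0; length=1000))
```

Sampling the input range densely (or exhaustively for Float32 and Float16) and tracking the maximum gives a figure that can be compared against whatever bound the C implementation documents.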

Member

@devmotion left a comment

The implementation does not handle NaN32 and NaN16 correctly:

julia> erf(NaN32)
1.0f0

julia> erf(NaN16)
Float16(1.0)
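
A plausible explanation, stated here as an assumption rather than a reading of the PR's code, is that range checks on the bit pattern of |x| send NaN to the saturated tail branch, because the bit pattern of NaN is larger than that of any finite value. The toy functions below are hypothetical and are not the PR's implementation; they reproduce the symptom and show an explicit guard:

```julia
# Toy stand-in for a bit-pattern branch structure; the threshold and the
# "polynomial" branch are placeholders, not values from the PR.
function toy_unguarded(x::Float32)
    ia = reinterpret(UInt32, abs(x))   # bit pattern of |x|
    if ia < 0x40800000                 # |x| < 4.0f0
        return x / 4.0f0               # placeholder for the polynomial branches
    else
        return copysign(1.0f0, x)      # saturated tail; NaN also lands here
    end
end

function toy_guarded(x::Float32)
    isnan(x) && return x               # propagate NaN before any bit comparison
    return toy_unguarded(x)
end

# A Float16 wrapper that widens to Float32 inherits whatever the Float32 method does.
toy_guarded(x::Float16) = Float16(toy_guarded(Float32(x)))
```

With the guard in place, `toy_guarded(NaN32)` and `toy_guarded(NaN16)` return NaN, while infinities still saturate to ±1.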

@mschauer
Member

Then we should also add a test for these.
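
Such a test could look roughly like the following. This is a sketch, assuming a hypothetical helper `test_nan_propagation`; `exp` is used as a stand-in so the snippet runs without SpecialFunctions, and the actual test would pass `erf` instead.

```julia
using Test

# Hypothetical helper: checks that `f` maps NaN to NaN and preserves the input type.
function test_nan_propagation(f; types=(Float16, Float32, Float64))
    @testset "NaN propagation" begin
        for T in types
            y = f(T(NaN))
            @test y isa T
            @test isnan(y)
        end
    end
end

test_nan_propagation(exp)   # stand-in; with SpecialFunctions loaded, pass erf here
```

Looping over the float types keeps the Float16 and Float32 cases from being forgotten when only the Float64 path is exercised by existing tests.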

AhmedYKadah and others added 2 commits September 17, 2025 11:11
Co-authored-by: David Müller-Widmann <[email protected]>
Co-authored-by: David Müller-Widmann <[email protected]>
AhmedYKadah and others added 3 commits September 17, 2025 11:13
Co-authored-by: David Müller-Widmann <[email protected]>
Co-authored-by: David Müller-Widmann <[email protected]>
@AhmedYKadah
Author

There aren't any tests for erfc. Is that expected?

@AhmedYKadah
Author

Any other changes needed?

@oscardssmith
Member

We should probably test erfc.
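
One inexpensive check is the identity erf(x) + erfc(x) = 1. The sketch below assumes a hypothetical helper `complement_residual` and demonstrates it on a stand-in pair of complementary logistic functions, since `erf` and `erfc` are not available without SpecialFunctions:

```julia
# Hypothetical helper: largest deviation of f(x) + fc(x) from 1 over the sample points.
complement_residual(f, fc, xs) = maximum(abs(f(x) + fc(x) - 1) for x in xs)

# Stand-in complementary pair; the real test would pass erf and erfc.
logistic(x)  = 1 / (1 + exp(-x))
clogistic(x) = 1 / (1 + exp(x))

residual = complement_residual(logistic, clogistic, range(-3.0, 3.0; length=601))
```

This identity only bounds the absolute error, so it says little about the relative accuracy of erfc in the tail where its values are tiny; a comparison against a high-precision reference would still be needed there.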
