-
Notifications
You must be signed in to change notification settings - Fork 104
Added erf(x) Float64 and Float32 Julia implementations #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #491 +/- ##
==========================================
+ Coverage 94.11% 94.24% +0.13%
==========================================
Files 14 14
Lines 2905 2973 +68
==========================================
+ Hits 2734 2802 +68
Misses 171 171
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Old:
Float32
New:
Float32
|
Float32 implementation available, but not faster than Float64 version due to a exp() call. |
need to clean up polynomial evaluations. |
Remaining: erfc Float64 and Float32 implementations, and the erf Float32 implementation |
else | ||
return 1.0 - r | ||
end | ||
elseif (ia < 0x4017a000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any chance you can regenerate the polies to make this be 0x40180000
? If you can, you would be able to use UInt16 literals for all of these by only taking the top 16 rather than top 32 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would that have a meaningful impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not much. It might just save a cycle or 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you wanted to test the speed, you could try it without regenerating the polys and it should give you a good idea.
…necessary whitespace, and removed explicit copysigns
|
||
end | ||
|
||
_erf(x::Float16)=Float16(_erf(Float32(x))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you wanted to do a Float16
impl, it should be easier than the others. Specifically, the domain is only to 2, and the accuracy required is much reduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% could wait for a followup PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that too to be honest.
this and the poli regen.
Given that this is faster and accurate, seems good to merge to me! |
Are there any tests for edge cases/ULP in the c version we do not do ourselves? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation does not handle NaN32
and NaN16
correctly:
julia> erf(NaN32)
1.0f0
julia> erf(NaN16)
Float16(1.0)
Then we should also add a test for these |
Co-authored-by: David Müller-Widmann <[email protected]>
Co-authored-by: David Müller-Widmann <[email protected]>
Co-authored-by: David Müller-Widmann <[email protected]>
Co-authored-by: David Müller-Widmann <[email protected]>
There aren't any tests for erfc. Is that expected? |
Any other changes needed? |
we should probably should test erfc. |
Faster than current wrapper function call (including Float32 function call).
Uses algorithm based on https://github.com/ARM-software/optimized-routines/blob/master/math/erf.c