-
-
Notifications
You must be signed in to change notification settings - Fork 2
algo: implement sine/cosine based on Intel SVML for single/double precision #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a3356f1
to
18d9307
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements highly accurate sine and cosine functions using Intel SVML techniques and extended precision range reduction, with separate implementations for small and large argument cases. Key changes include:
- New implementations of sin/cos functions that guarantee ≤1 ULP (single precision) and ≤4 ULP (double precision) accuracy.
- Support for large argument range reduction using multi-precision techniques based on a precomputed lookup table.
- Integration with the Google Highway library for platform-specific SIMD optimizations.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
npsr/utils-inl.h | Adds utility functions with conditional compilation and lookup support. |
npsr/trig/small-inl.h | Provides polynomial approximations for small-angle sin/cos calculations. |
npsr/trig/lut-inl.h.py | Implements a Python script to generate lookup tables for trigonometric reductions and approximations. |
npsr/trig/large-inl.h | Implements the extended precision algorithm for large argument reduction. |
npsr/trig/inl.h | Integrates the small and large argument implementations into a unified sin/cos API. |
npsr/npsr.h | Wraps the trigonometric headers ensuring single inclusion per target. |
npsr/common.h | Provides common macros, type definitions, and an RAII class for floating‑point environment control. |
Comments suppressed due to low confidence (2)
npsr/trig/small-inl.h:338
- Cosine-specific adjustments are marked as TODO and not yet implemented. Please complete the phase adjustment logic (i.e. correctly apply the π/2 shift) to ensure cosine accuracy matches the sine pathway.
// TODO: Implement cosine-specific adjustments
npsr/trig/large-inl.h:147
- [nitpick] Consider adding more inline comments to explain the rationale behind the magic numbers used for fractional shifts to improve code clarity and maintainability.
constexpr int kFracMidShift = kIsSingle ? 18 : 24;
npsr/common.h
Outdated
* | ||
* Precise precise = {kLowAccuracy, kNoSpecialCases, kNoLargeArgument}; | ||
* const ScalableTag<float> d; | ||
* typename V = Vec<DFromV<SclableTag>>; |
Copilot
AI
Jun 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo in the documentation comment: 'SclableTag' should be corrected to 'ScalableTag'.
* typename V = Vec<DFromV<SclableTag>>; | |
* typename V = Vec<DFromV<ScalableTag>>; |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think this is a good starting point. Do you want to land this with #if 0
and other various things incomplete @seiko2plus? We can be a bit relaxed as we setup the repo.
npsr/utils-inl.h
Outdated
#ifdef NPSR_UTILS_INL_H_ | ||
#undef NPSR_UTILS_INL_H_ | ||
#else | ||
#define NPSR_UTILS_INL_H_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a lut-inl.h
, so we can haev LutX2
, LutX4
etc?
A generic utils-inl.h
is going to get pretty big I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, mainly to optimize 128-bit simd extensions, transpose and shuffles to reduce cache/memory access
npsr/common.h
Outdated
struct _Zero {}; | ||
static constexpr auto kForce = _Force{}; | ||
static constexpr auto kNearest = _Nearest{}; | ||
#if 0 // not used yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add these when we come across use cases for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, this good point .. I'm gonna remove them
npsr/trig/inl.h
Outdated
|
||
namespace npsr::HWY_NAMESPACE::sincos { | ||
template <bool IS_COS, typename V, typename Prec> | ||
HWY_API V SinCos(Prec &prec, V x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially worth creating a new macro for HWY_API
so it doesn't change behaviour for Highway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest a name? HWY_API just combine HWY_ATTR and HWY_INLINE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPSR_INLINE
which expands to HWY_ATTR HWY_INLINE
?
I feel like a lot of things will eventually be using this, so we don't want it to break if Highway's API changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction: HWY_API does not include HWY_ATTR
, since all definitions are placed between the BEFORE/AFTER macros, which emit target attributes through #pragma
.
NPSR_INLINE
might cause confusion with HWY_INLINE
. I’ve updated the PR to use NPSR_INTRIN
instead — feel free to rename it later if you prefer. Please see:
numpy-simd-routines/npsr/hwy.h
Lines 5 to 13 in e2e1b10
// This macro is used to define intrinsics that are: | |
// NOTE: equals to HWY_API. | |
// - always inlined | |
// - flattened (no separate stack frame) | |
// - marked maybe unused to suppress warnings when they are not used | |
// NOTE: we do not need to use HWY_ATTR because we wrap Highway intrinsics in | |
// HWY_BEFORE_NAMESPACE()/HWY_AFTER_NAMESPACE() | |
// which implies the nessessary target attributes via #pargma. | |
#define NPSR_INTRIN static HWY_INLINE HWY_FLATTEN HWY_MAYBE_UNUSED |
Also, please check the latest push for the full set of changes.
npsr/trig/inl.h
Outdated
#include "npsr/sincos/small-inl.h" | ||
|
||
// clang-format off | ||
#if defined(NPSR_TRIG_INL_H_) == defined(HWY_TARGET_TOGGLE) // NOLINT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be trig-inl.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed
I was putting all my time into the testing unit, also improve luts generations.. so I left any extra optimizations off until I make sure that both functions sin/cos are well implemented. |
All stress tests passed localy on x86, just need a CI and nice cleanup and then it will be ready for review ... maybe I should split it up into multiple prs |
Yep, this is unreviewable as-is, so incremental PRs would be much appreciated 😸 |
Yes please, I am having a hard time reviewing this. |
Working on it. |
- Add sin/cos functions with ≤1 ULP and ≤4 ULP accuracy variants - Support large argument ranges with proper range reduction - Standalone implementation doesn't require libm - Written using Google Highway for SIMD optimization - Optimize for both float32 and float64 data types
- Add Python-based generator tool for creating C++ headers and Python templates - Include sollya wrapper for mathematical computations - Add utilities for formatting C/Python arrays and header generation - Include comprehensive .gitignore for Python/C++ development
- Configure spin commands for building, testing, and development - Add custom generate command for sollya-based file generation - Set up IPython integration with pre-imported numpy_sr module - Configure project metadata and build requirements
- Implement Python bindings for Highway SIMD intrinsics - Add type system for scalar/vector data with proper conversions - Create dynamic dispatch system for multiple SIMD targets - Include helper functions for ULP distance calculations - Set up pytest infrastructure for comprehensive testing
Highway provides portable SIMD intrinsics required by the testing framework
Initialize build configuration for npsr module
- Generate test vectors using Sollya for high-precision reference values - Test challenging cases including large arguments, pi multiples, and edge cases - Cover both float32 and float64 precision with ULP accuracy validation - Include tests for standard and low-accuracy modes
Clean up empty PreciseBind function in preparation for proper implementation
Enable trigonometric functions in the Python testing framework
- Correct polynomial coefficients for double precision cosine - Fix table indexing and data layout for lookup tables - Improve accuracy for large argument range reduction
- Implement efficient SVML-style linear approximation for sin/cos - Fix precision issues in Cody-Waite range reduction - Add proper sign handling for both sine and cosine - Improve polynomial evaluation with correction terms - Document mathematical equivalence between traditional and optimized approaches
- Move Precise class and related types to npsr/precise.h - Add deduction guides for cleaner template instantiation - Improve modularity by separating precision control from common utilities
…ures - Change parameter order to Precise<Args...>, V for consistency - Update include paths from sincos/ to trig/ - Fix constant types and naming conventions (kException -> kExceptions)
Add meson configuration to include trigonometric test data in installation
Define Python dependencies for building and testing: - meson/meson-python for build system - ninja for compilation - spin for development workflow - pytest for test execution
This patch removes the Python dependency by rewriting the generator using pure Sollya scripting. It also applies general code cleanups and adds support for non-FMA targets.
- Define `arch_pkgs` mapping for native, aarch64, s390x, ppc64le, and armhf - Allow selection via `NPSR_ARCH` env var (defaults to native) - Configure per-arch virtualenv under `.direnv/py/<arch>` - Bootstrap Python venv with `spin==0.13`
- Completely rewrote Sollya integration with a new Python CLI tool (tools/sollya/cli.py) - Replaced the old generate.py script with a more robust implementation - Added comprehensive Sollya core library (tools/sollya/core.sol) with utility functions - Created new SIMD-optimized lookup table implementation (npsr/lut-inl.h) - Significantly enhanced the Precise class for floating-point precision control - Refactored trigonometric data generation scripts with better documentation - Updated constants generation with improved Cody-Waite and Payne-Hanek reduction - Added kpi16-inl.h.sol for high-precision π/16 table generation - Removed deprecated utils-inl.h and old trigonometric data files - Improved project configuration in pyproject.toml and .spin/cmds.py - Enhanced code documentation throughout the codebase
32d7b75
to
863b10a
Compare
Now this PR contains the implementation without data helpers (neither the Sollya scripts nor the generated headers). They can be found in PR #4. |
…on dispatch - Replace IS_COS boolean template parameters with Operation enum class (kSin/kCos) - Introduce NPSR_INTRIN macro for consistent intrinsic function attributes - Update trigonometric function implementations to use new enum-based dispatch - Improve code readability and maintainability by eliminating boolean parameter confusion - Update function signatures and documentation to reflect new API design - Replace HWY_ATTR with HWY_INLINE for better consistency in LUT implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, let's get things moving and land this and refactor further afterwards 😸
Thanks for your patience @seiko2plus!
Uh oh!
There was an error while loading. Please reload this page.