This page summarizes the state of SYCL feature support in the current develop branch of AdaptiveCpp. Features that are supported are listed with a link to the pull request where they have been merged.
(This list is incomplete and only contains features that are known to be problematic)
| Feature | Supported (PR link) | Caveats | Comments |
|---|---|---|---|
| Images | ❌ | --- | --- |
| OpenCL interop | ❌ | --- | --- |
| Hierarchical parallelism | ✔️ | HIP/CUDA: Does not limit execution in work group scope to one thread for performance reasons | |

| Feature | Supported (PR link) | Caveats | Comments |
|---|---|---|---|
| Accessor simplifications | ✔️ (partial) (PR) | [6] | |
| USM: Memory management functions | ✔️ (PR) | [1] | |
| USM: Queue shortcuts | ✔️ (PR) | ||
| USM: Prefetch | ✔️ (PR) | [2] | |
| USM: mem_advise | ❌ | Implementation requires host tasks since backends do not provide async mem advise | |
| USM: memcpy | ✔️ (PR) | ||
| USM: memset/fill | ✔️ (PR) | ||
| host tasks | ❌ | ||
| Optional lambda naming | ✔️ (PR) | ||
| Subgroups | ✔️ (PR) | On CPU, subgroup size is always 1 | |
| In-order queues | ✔️ (PR) | ||
| Explicit dependencies (`depends_on()`) | ✔️ (PR) | ||
| Backend interop API | ✔️ (PR) | [3] | |
| Reductions | ✔️ (PR) | [4] | |
| Group algorithms | ✔️ (PR) | [5] | |
| New device selector API | ✔️ (PR) | ||
| Aspect API | ✔️ (PR) | ||
| Deduction guides | ✔️ (PR) | ||
| `atomic_ref` | ✔️ (PR) | ||
| `marray` | ❌ | ||
| New `SYCL/sycl.hpp` header | ✔️ (PR) | ||
| C++17 by default | ✔️ (PR) | ||
| Builtin changes: `ctz()`, `clz()` | ❌ | ||
| Remove `*_class` types | ❌ | ||
| `const` return type for read accessor `operator[]` | ❌ | ||
| Remove `buffer` API for `unique_ptr` | ❌ | ||
| Replace `program` class with `module` | ❌ | ||
| Add `kernel_handler` | ❌ | ||
| `explicit` `queue`, `context` constructors | ✔️ (PR) | ||
| Only require C++ trivially copyable for shared data | ✔️ | Has always worked thanks to CUDA/HIP toolchain | |
| Update group class with new types/member functions | ❌ | ||
| Remove `nd_item::barrier()` | ❌ | ||
| Replace `mem_fence` with `atomic_fence` | ❌ | ||
| Add `vec::operator[]`, unary `+`, `-`, static constexpr `get_size()`/`get_count()` | ✔️ (PR) | ||
| `buffer`, local accessor are C++ `ContiguousContainer` | ❌ | ||
| Replace `image` with `sampled_image`, `unsampled_image` | ❌ | ||
| All accessors are placeholders | ✔️ (PR) | ||
| Use single exception type derived from `std::exception` | ❌ | ||
| Default asynchronous handler should terminate program | ✔️ (PR) | ||
| Kernel invocation APIs take const reference to kernels, kernels must be immutable | ❌ | ||
| Queue constructor accepting both `device` and `context` | ✔️ (PR) | ||
| Simplified `parallel_for` API | ❌ | ||
| Clarified names for device specific info queries | ❌ | ||
| Address space changes, generic address spaces | ❌ | Partially, we have always had generic address spaces because of CUDA/HIP | |
| Updated `multi_ptr` interface | ❌ | ||
| Remove OpenCL types, `cl_int` etc. | ✔️ | hipSYCL stopped supporting them a long time ago | |

- [1] HIP/ROCm implements unified memory using slow, device-accessible host memory. This means that hipSYCL's call to `hipMallocManaged` cannot produce efficient shared allocations.
- [2] HIP/ROCm does not provide the required functionality, so hipSYCL cannot expose it. Prefetch calls are ignored at the moment.
- [3] The interop types that backends expose are limited. Native queues can only be obtained using an `interop_handle` because a queue in hipSYCL does not relate to any specific backend object.
- [4] Only scalar reductions are supported. Note that the reduction interface is expected to change slightly with the release of SYCL 2020 final.
- [5] Note that the interface of group algorithms is expected to change slightly with the release of SYCL 2020 final.
- [6] Constructing a read-only accessor using `accessor<const T>` from a non-const `buffer<T>` is not yet supported.
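
The table row "Only require C++ trivially copyable for shared data" can be illustrated without any SYCL headers. The following is a minimal sketch in plain C++17, assuming the row refers to the types of data shared with kernels; the helper `can_share_with_kernel` and the three example types are hypothetical names made up for illustration and are not part of SYCL or hipSYCL.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper (not a SYCL or hipSYCL API): checks the one property
// named in the table row, i.e. that T is trivially copyable.
template <class T>
constexpr bool can_share_with_kernel() {
  return std::is_trivially_copyable_v<T>;
}

// Plain aggregate: trivially copyable and standard layout.
struct Particle {
  float position[3];
  float mass;
};

// Mixed access control makes this type non-standard-layout,
// but it is still trivially copyable.
struct Mixed {
  int visible = 0;
private:
  int hidden = 0;
};

// std::string has a non-trivial copy constructor and destructor,
// so this type is not trivially copyable.
struct Named {
  std::string label;
  float value;
};

static_assert(can_share_with_kernel<Particle>());
static_assert(!std::is_standard_layout_v<Mixed>);
static_assert(can_share_with_kernel<Mixed>());
static_assert(!can_share_with_kernel<Named>());
```

A type like `Mixed` is the interesting case: it would be rejected by a rule that additionally demands standard layout, but passes when only trivial copyability is required.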