@albertomercurio albertomercurio commented Nov 11, 2025

Fixes #37

This PR adds support for atomic operations on Complex{Float32} and Complex{Float64} arrays across CPU and all GPU backends.

Implementation

  • Component-wise atomic operations: Complex numbers are atomically updated by performing separate atomic operations on their real and imaginary components
  • Supported on CPU (via UnsafeAtomics) and all GPU backends: CUDA, Metal, oneAPI, and OpenCL
  • Optimized paths for + and - operations use native atomic add/sub instructions
  • Other operations (like swap, replace) use atomic CAS on individual components
  • Multiple dispatch cleanly handles Complex vs non-Complex types
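As a rough CPU-only illustration of the component-wise idea (this is not the code in the PR), the sketch below keeps the real and imaginary parts in two separate `Threads.Atomic{Float64}` cells from Julia's Base library. The `SplitComplex` type and the `split_add!` and `current` functions are hypothetical names made up for this example.

```julia
# Hypothetical illustration: a complex accumulator stored as two independent atomics.
struct SplitComplex
    real_part::Threads.Atomic{Float64}
    imag_part::Threads.Atomic{Float64}
end

SplitComplex(z::ComplexF64) =
    SplitComplex(Threads.Atomic{Float64}(real(z)), Threads.Atomic{Float64}(imag(z)))

# Each component is updated with its own atomic add.
# The pair of updates is NOT atomic as a whole.
function split_add!(acc::SplitComplex, x::ComplexF64)
    Threads.atomic_add!(acc.real_part, real(x))
    Threads.atomic_add!(acc.imag_part, imag(x))
    return nothing
end

# Reads the two components one after the other (also not a single atomic load).
current(acc::SplitComplex) = complex(acc.real_part[], acc.imag_part[])

acc = SplitComplex(0.0 + 0.0im)
Threads.@threads for i in 1:1000
    split_add!(acc, 1.0 + 2.0im)
end
total = current(acc)
```

Because addition decomposes per component, the final sum is correct once all threads have finished, but a reader that calls `current` between the two `atomic_add!` calls can observe a half-updated value.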

Testing

  • Added 13 CPU tests covering all atomic operations (get, set, modify, swap, replace) for both ComplexF32 and ComplexF64
  • Added 3 GPU tests per backend (CUDA, Metal, oneAPI, OpenCL) testing CAS, modify, and sugar syntax
  • All existing tests continue to pass
  • Tested with KernelAbstractions.jl:
using CUDA
using KernelAbstractions
import Atomix

T = ComplexF64

x = CUDA.rand(T, 100) .+ 0.5f0
res = CUDA.zeros(T, 1)

@kernel cpu=false inbounds=true function my_kernel(res, @Const(x))
    i = @index(Global)
    Atomix.@atomic res[1] += x[i]
end

kernel = my_kernel(KernelAbstractions.get_backend(x))
kernel(res, x; ndrange = length(x))

Array(res)[1]  # Matches sum(x) up to floating-point rounding
sum(x)

Generated with GitHub Copilot.

Implements atomic operations for Complex{Float32} and Complex{Float64}
by reinterpreting them as UInt64/UInt128 and using integer atomics.

- Uses CAS loops for modify! operations on Complex types
- Adds tests for all atomic operations with complex numbers
- Maintains full compatibility with existing functionality

Fixes JuliaConcurrent#37

Generated with GitHub Copilot
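To make the "reinterpret as an integer and use integer atomics" idea from the commit message concrete, here is a minimal CPU-only sketch for `ComplexF32`, using only `Threads.Atomic{UInt64}` and `Threads.atomic_cas!` from Base. It is not taken from the PR; `to_bits`, `from_bits`, and `atomic_modify_bits!` are hypothetical helper names, and the bit layout chosen here is just an assumption for the example.

```julia
# Pack a ComplexF32 into one UInt64 (real part in the low 32 bits; an arbitrary choice).
to_bits(z::ComplexF32) =
    (UInt64(reinterpret(UInt32, imag(z))) << 32) | UInt64(reinterpret(UInt32, real(z)))

# Unpack the UInt64 back into a ComplexF32.
from_bits(u::UInt64) =
    ComplexF32(reinterpret(Float32, u % UInt32), reinterpret(Float32, (u >> 32) % UInt32))

# CAS loop: recompute op(old, x) until no other thread changed the cell in between.
function atomic_modify_bits!(cell::Threads.Atomic{UInt64}, op, x::ComplexF32)
    old_bits = cell[]
    while true
        new_bits = to_bits(op(from_bits(old_bits), x))
        seen = Threads.atomic_cas!(cell, old_bits, new_bits)  # returns the value found
        seen == old_bits && return from_bits(old_bits)        # swap succeeded
        old_bits = seen                                       # retry with the fresh value
    end
end

cell = Threads.Atomic{UInt64}(to_bits(ComplexF32(0, 0)))
Threads.@threads for i in 1:1000
    atomic_modify_bits!(cell, +, ComplexF32(1, 2))
end
result = from_bits(cell[])
```

Since both components live in a single 64-bit word, every successful swap replaces the whole complex value at once, for any `op`, not only `+` and `-`.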
Comment on lines +37 to +38
# Note: This is NOT fully atomic (components updated separately)
# but works for both ComplexF32 and ComplexF64
Member

Oof, this is a no-go in my opinion. You will easily get torn writes this way.

I think this would need to use 128-bit atomics.

Comment on lines +83 to +87
# Complex atomic operations - separate atomics on real and imaginary parts
# This works for operations that decompose component-wise (+, -, right)
# Note: This provides per-component atomicity, not full Complex atomicity
# (other threads may observe intermediate states, but final result is correct)
@inline function _cuda_atomic_modify!(ptr::Core.LLVMPtr{Complex{T},A}, op::OP, x::Complex{T}) where {T<:Union{Float32,Float64},A,OP}
Member

Same here: you are not guaranteed that a user only uses one kind of atomic operation on a given memory location (e.g. someone doing a mul for good measure).

@vchuravy
Member

How does C++ implement them (if at all)?

I think we need to guarantee full atomicity, and thus use a compare-and-swap loop on the value's full byte representation.
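For reference, a minimal CPU-side sketch of what a full-width compare-and-swap loop could look like for `ComplexF64`, using Julia's per-field atomics (`@atomic` and `@atomicreplace`, available since Julia 1.7). `ComplexCell` and `cas_modify!` are hypothetical names for this example and are not part of Atomix. As far as I understand, whether Julia lowers this to a 128-bit hardware instruction or falls back to a lock depends on the platform; in both cases the whole value is replaced in one step.

```julia
# Hypothetical container with a single atomic field holding the whole complex value.
mutable struct ComplexCell
    @atomic value::ComplexF64
end

# Generic CAS loop: works for any binary op, not only + and -.
function cas_modify!(cell::ComplexCell, op, x::ComplexF64)
    old = @atomic cell.value
    while true
        new = op(old, x)
        result = @atomicreplace cell.value old => new
        result.success && return old => new   # nobody raced us; done
        old = result.old                      # retry with the value we actually saw
    end
end

cell = ComplexCell(0.0 + 0.0im)
Threads.@threads for i in 1:1000
    cas_modify!(cell, +, 1.0 + 2.0im)
end
final_value = @atomic cell.value

# A multiplication goes through exactly the same path.
product_cell = ComplexCell(1.0 + 1.0im)
cas_modify!(product_cell, *, 2.0im)
product_value = @atomic product_cell.value
```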

@albertomercurio
Author

I'm not an expert in atomic operations. I just need this for my use case, and with the help of Copilot I wrote something that makes sense to me.

I tested it with the code from my first comment, and it seems to work. Do you think that test is not enough?

@vchuravy
Member

No, I don't think the test is sufficient. You are right that for some algorithms you might not care about "full atomicity", and partial atomicity might be sufficient.

For me the crux is that, so far, all operations we currently have in Atomix promise full atomicity.

For a made-up example, imagine an algorithm where every odd lane performs an atomic addition and every even lane performs an atomic multiplication.

That's a weird thing to do, and I don't know of any place where this comes up, but that is beside the point. With this interface we are trying to implement a general set of operations with consistent semantics, and I would much rather raise an error than chase down memory-semantics bugs later.
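The made-up add/multiply scenario can be replayed deterministically, without real threads, to show why per-component atomicity is not enough once operations are mixed. The values below are arbitrary and chosen only for illustration; the variable names are hypothetical.

```julia
# Deterministic replay of one possible interleaving (no real threads involved).
z0     = 1.0 + 1.0im   # initial value in memory
addend = 1.0 + 1.0im   # operand of the "adding" lane
factor = 2.0im         # operand of the "multiplying" lane

# The only two results a fully atomic implementation could produce.
add_then_mul = (z0 + addend) * factor
mul_then_add = (z0 * factor) + addend

# Component-wise add interleaved with a whole-value multiply:
z = z0
z = complex(real(z) + real(addend), imag(z))   # adder updates the real part only
z = z * factor                                 # multiplier runs in between
z = complex(real(z), imag(z) + imag(addend))   # adder finishes with the imaginary part
torn = z

println((add_then_mul, mul_then_add, torn))
```

Here `torn` matches neither serial order, because complex multiplication mixes the real and imaginary parts and so does not decompose per component the way `+` and `-` do.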

Development

Successfully merging this pull request may close these issues.

Atomix example fails with complex array element type
