Add Expr(:ivdepscope) to support not marking the entire loop body as ivdep
#43261
After some further trials with:
I find that, to make self-inplace broadcast vectorizable, we only need to tell LLVM "
julia> using BenchmarkTools
julia> a = zeros(Float32, 128, 32); a_ = view(a, 1:128, 1:32);
julia> @btime $a .+= $a;
280.000 ns (0 allocations: 0 bytes) # on 1.7.0: 1.650 μs (0 allocations: 0 bytes)
julia> @btime $a .+= $a .+ $a .+ $a;
511.979 ns (0 allocations: 0 bytes) # on 1.7.0: 3.550 μs (0 allocations: 0 bytes)
julia> @btime $a_ .+= $a_;
289.286 ns (0 allocations: 0 bytes) # on 1.7.0: 1.730 μs (0 allocations: 0 bytes)
julia> @btime $a_ .+= $a_ .+ $a_ .+ $a_;
609.551 ns (0 allocations: 0 bytes) # on 1.7.0: 6.440 μs (0 allocations: 0 bytes)
Some simple safety checks:
julia> const p = Ref(0);
julia> a = zeros(Float32, 128, 32); b = similar(a);
julia> f(x) = x + (p[] += 0); # f has no side-effect
julia> @btime $b .= f.($a);
221.526 ns (0 allocations: 0 bytes) # on 1.7.0: 1.440 μs (0 allocations: 0 bytes)
julia> @btime $a .= f.($a);
164.899 ns (0 allocations: 0 bytes) # on 1.7.0: 2.033 μs (0 allocations: 0 bytes)
julia> g(x) = x + (p[] = ~p[]); # g has side-effect
julia> @btime $b .= g.($a);
2.411 μs (0 allocations: 0 bytes) # on 1.7.0: 2.522 μs (0 allocations: 0 bytes)
julia> @btime $a .= g.($a);
2.411 μs (0 allocations: 0 bytes) # on 1.7.0: 2.511 μs (0 allocations: 0 bytes)
The above example shows that this change is safer than replacing
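The timings above only measure speed. A correctness check for the aliased (self-inplace) case could look roughly like the sketch below. It compares each in-place broadcast against an out-of-place reference computed from a copy. The variable names are illustrative, and `Base.mightalias` is used only to confirm that the view and its parent share memory.

```julia
# Sketch: check that self-inplace broadcasts match an out-of-place reference.
a = rand(Float32, 128, 32)
a_ = view(a, 1:128, 1:32)

# The view shares memory with its parent, so the two may alias.
aliased = Base.mightalias(a, a_)

# Plain array: `a .+= a` should equal the out-of-place `b .+ b`.
b = copy(a)
expected = b .+ b
a .+= a
ok_array = a == expected

# View: `a_ .+= a_ .+ a_ .+ a_` fuses to `a_ .= a_ .+ (a_ .+ a_ .+ a_)`.
c = copy(a)
expected_view = c .+ (c .+ c .+ c)
a_ .+= a_ .+ a_ .+ a_
ok_view = a == expected_view
```

Both reference results use the same operation order as the fused in-place broadcast, so the comparison can be exact rather than approximate.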
vchuravy
left a comment
I am not a fan of the current construct. Currently Expr(:loopinfo) has the semantics that it should always be the last instruction in a loop. For a begin/end construct I would rather mimic Base.Experimental.Const and @aliasscope
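For readers unfamiliar with the scoped constructs mentioned here, the sketch below shows roughly how `Base.Experimental.@aliasscope` and `Base.Experimental.Const` are used. `copy_scaled!` is a hypothetical name chosen for illustration. Both APIs are experimental and may change between Julia versions, so treat this as an assumption-laden example rather than a reference.

```julia
using Base.Experimental: Const, @aliasscope

# Hypothetical helper: inside the alias scope, loads through `Const(src)`
# are promised not to alias any store made within the same scope.
function copy_scaled!(dest::Vector{Float32}, src::Vector{Float32})
    @aliasscope @inbounds for i in eachindex(dest, src)
        dest[i] = 2 * Const(src)[i]
    end
    return dest
end

src = Float32[1, 2, 3, 4]
dest = copy_scaled!(similar(src), src)
```

The point of the comparison is that the promise is attached to a lexical scope and to individual accesses, not to the loop as a whole.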
base/simdloop.jl (outdated)
  """
  macro simd(forloop)
-     esc(compile(forloop, nothing))
+     esc(compile(forloop, Symbol("julia.ivdep.end")))
Can we keep this nothing? Makes little sense to have non-matching "begin"/"end" construct
IIUC, we'd better have something like Expr(:ivdepscope) and Expr(:popivdepscope) instead.
(and add a @ivdep macro ?).
Edit: Apparently this is beyond my competence. Can't understand why:
julia> @eval f(x) = $(Expr(:ivdepscope, :begin))
f (generic function with 1 method)
julia> @code_lowered f(1)
CodeInfo(
1 ─ $(Expr(:ivdepscope, :(Main.begin)))
└── return nothing
)
julia> @eval f(x) = $(Expr(:loopinfo, :begin))
f (generic function with 1 method)
julia> @code_lowered f(1)
CodeInfo(
1 ─ $(Expr(:loopinfo, :begin))
└── return nothing
)
d184d0a to bbf2dbb
vchuravy
left a comment
1. introduce `jl_ivdepscope_sym` 2. define `jl_ivdepscope_error` to throw an error message.
lower `Expr(:ivdepscope, :begin/end)` to `jl_ivdepscope_func`
1. let `loopinfo_mark` erase `julia.ivdepscope` if it has `julia.simd`. 2. erase `julia.ivdepscope` in unreachable branches even if there is no `loopinfo_mark` (makes the error message clearer).
I'm not sure whether this is the correct way to implement scoped
Some examples:
julia> f(x) = @inbounds for i in eachindex(x)
Base.@ivdep x[i] += i
end
f (generic function with 1 method)
julia> f([1,2,3,4])
ERROR: Found ivdepscope outside @simd.
Stacktrace:
[1] macro expansion
@ .\simdloop.jl:151 [inlined]
[2] f(x::Vector{Int64})
@ Main .\REPL[1]:2
[3] top-level scope
@ REPL[2]:1
julia> f((1,2,3,4))
ERROR: MethodError: no method matching setindex!(::NTuple{4, Int64}, ::Int64, ::Int64)
Stacktrace:
[1] macro expansion
@ .\simdloop.jl:152 [inlined]
[2] f(x::NTuple{4, Int64})
@ Main .\REPL[1]:2
[3] top-level scope
@ REPL[3]:1
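For comparison, the existing loop-wide form in released Julia is `@simd ivdep`, which applies to the whole loop body. A minimal sketch of the same update written that way follows. `add_index!` is a hypothetical name, and the `ivdep` promise is valid here only because each iteration touches a distinct element.

```julia
# Sketch: the loop-wide `@simd ivdep` form of the `x[i] += i` update.
function add_index!(x::AbstractVector{<:Integer})
    # `ivdep` asserts there are no loop-carried memory dependencies
    # anywhere in the loop body.
    @inbounds @simd ivdep for i in eachindex(x)
        x[i] += i
    end
    return x
end

result = add_index!([1, 2, 3, 4])
```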
Title changed from "julia.ivdep with julia.ivdep.begin/end to support not marking the entire loop body as ivdep" to "Expr(:ivdepscope) to support not marking the entire loop body as ivdep"
I guess we won't need this after #43852.
Currently, `@simd ivdep` assumes that the entire loop body has no loop-carried memory dependencies, which limits its usage in our broadcast system.
This PR tries to split `julia.ivdep` into 2 metadata markers, `julia.ivdep.begin` and `julia.ivdep.end`, and makes the simd-loop pass mark only the memory accesses within a `begin`/`end` block as `MD_mem_parallel_loop_access`.
With this PR, if we find that all the args in a flat `bc::Broadcasted` are safe to load in parallel, and the `dest::AbstractArray` is safe to store to in parallel, then we can implement the `copyto!` kernel as:
If `bc.f` is free of memory access, then LLVM should find this loop vectorizable and add no runtime check (and we can make `a .+= 1` vectorize more easily). If not, LLVM is left to check whether `bc.f` might have side effects.
This PR only changes the pass implementation, and makes `@simd ivdep` generate a `julia.ivdep.begin`/`end` block instead of a single `julia.ivdep`.
The usage and effect of `@simd` and `@simd ivdep` are not changed.
I'm not familiar with LLVM and I'm not sure this change is the correct way to make self-inplace broadcast vectorizable.
All suggestions and comments are welcome.
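The load/call/store split described in this PR can be illustrated with a rough sketch. This is not the PR's kernel: the scoped markers are not available in released Julia, so the loop uses plain `@simd`, and comments mark where the proposed scope would apply. `kernel!`, `f`, `dest`, and `src` are hypothetical stand-ins for the `copyto!` kernel, `bc.f`, the destination array, and a single broadcast argument.

```julia
# Hypothetical kernel illustrating which accesses the scope would cover.
function kernel!(f, dest::AbstractArray, src::AbstractArray)
    @inbounds @simd for i in eachindex(dest, src)
        x = src[i]     # load:  would sit inside the proposed ivdep scope
        y = f(x)       # call:  outside the scope, so LLVM still analyses `f`
        dest[i] = y    # store: would sit inside the proposed ivdep scope
    end
    return dest
end

a = ones(Float32, 128, 32)
kernel!(x -> x + 1f0, a, a)   # self-inplace: dest and src are the same array
```

Keeping the call to `f` outside the scope is what distinguishes this from loop-wide `@simd ivdep`: no promise is made about memory that `f` itself touches.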