Add `Expr(:ivdepscope)` to support not marking the entire loop body as `ivdep` #43261

N5N3 · 2021-11-30T05:14:40Z

Motivation:
Currently, @simd ivdep asserts that the entire loop body is free of loop-carried memory dependencies, which limits its use in our broadcast system.
This PR splits julia.ivdep into two metadata markers, julia.ivdep.begin and julia.ivdep.end, and makes the simd-loop pass mark only the accesses within a begin/end block as MD_mem_parallel_loop_access.
With this PR, if we find that all the args in a flat bc::Broadcasted are safe to load in parallel and that dest::AbstractArray is safe to store to in parallel, then we can implement the copyto! kernel as:

@eval function copyto!(dest::AbstractArray, bc::Broadcasted)
    @inbounds @simd for I in eachindex(bc)
        $(Expr(:loopinfo, Symbol("julia.ivdep.begin")))    # args are safe to load
        args = _getindex(bc.args, I)
        $(Expr(:loopinfo, Symbol("julia.ivdep.end")))
        temp = bc.f(args...)                                   # bc.f might have side effect
        $(Expr(:loopinfo, Symbol("julia.ivdep.begin")))    # result is safe to store
        dest[I] = temp
        $(Expr(:loopinfo, Symbol("julia.ivdep.end")))
    end
end

If bc.f is free of memory accesses, then LLVM should find this loop vectorizable and add no runtime check (and we can vectorize a .+= 1 more easily).
If not, LLVM is left to check whether bc.f might have side effects.
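For contrast, here is a minimal sketch of the whole-loop form that exists today. The function name `add_one!` is hypothetical and not part of the PR. The `ivdep` promise covers every memory access in the body, which is acceptable here only because the body is a plain load, add, and store:

```julia
# Whole-loop `ivdep`: the caller promises that no iteration depends on
# memory written by another iteration, for every access in the body.
function add_one!(dest::Vector{Float32}, src::Vector{Float32})
    @inbounds @simd ivdep for i in eachindex(dest, src)
        # Safe even when dest === src: iteration i touches only index i.
        dest[i] = src[i] + 1f0
    end
    return dest
end

x = Float32[1, 2, 3, 4]
add_one!(x, x)   # self-inplace update, analogous to x .+= 1
```

If the body called an arbitrary user function instead of `+ 1f0`, the same annotation would be an unverified promise about that function too, which is the limitation the scoped markers are meant to avoid.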

Changes in this PR:
This PR only changes the pass implementation and makes @simd ivdep generate a julia.ivdep.begin/end block instead of a single julia.ivdep.
The usage and effect of @simd and @simd ivdep are unchanged.

I'm not familiar with LLVM, and I'm not sure this change is the correct way to make self-inplace broadcast vectorizable.
All suggestions and comments are welcome.

N5N3 · 2021-12-01T08:27:10Z

After some further trials with:

06a16b8 : removes @inline and replaces Ref with Some, thanks to @vtjnash's code
b5d4bc0 : implements the broadcast kernel with local ivdep marks and some optimizations inspired by Workaround #28126, support SIMDing broadcast in more cases #30973.

I find that, to make self-inplace broadcast vectorizable, we only need to tell LLVM that "arrayref within bc's getindex and arrayset within dest's setindex! are parallel-safe during the broadcast kernel". Combining this PR with the above two commits, on my desktop I get:

julia> using BenchmarkTools
julia> a = zeros(Float32, 128, 32); a_ = view(a, 1:128, 1:32);
julia> @btime $a .+= $a;
  280.000 ns (0 allocations: 0 bytes)  # on 1.7.0: 1.650 μs (0 allocations: 0 bytes)
julia> @btime $a .+= $a .+ $a .+ $a;
  511.979 ns (0 allocations: 0 bytes)  # on 1.7.0: 3.550 μs (0 allocations: 0 bytes) 
julia> @btime $a_ .+= $a_;
  289.286 ns (0 allocations: 0 bytes)  # on 1.7.0: 1.730 μs (0 allocations: 0 bytes)
julia> @btime $a_ .+= $a_ .+ $a_ .+ $a_;
  609.551 ns (0 allocations: 0 bytes)  # on 1.7.0: 6.440 μs (0 allocations: 0 bytes)

Some simple safety checks:

julia> const p = Ref(0);
julia> a = zeros(Float32, 128, 32); b = similar(a);

julia> f(x) = x + (p[] += 0); # f has no side-effect
julia> @btime $b .= f.($a);
  221.526 ns (0 allocations: 0 bytes) # on 1.7.0: 1.440 μs (0 allocations: 0 bytes) 
julia> @btime $a .= f.($a);
  164.899 ns (0 allocations: 0 bytes) # on 1.7.0: 2.033 μs (0 allocations: 0 bytes)

julia> g(x) = x + (p[] = ~p[]); # g has side-effect
julia> @btime $b .= g.($a);
  2.411 μs (0 allocations: 0 bytes) # on 1.7.0: 2.522 μs (0 allocations: 0 bytes)
julia> @btime $a .= g.($a);
  2.411 μs (0 allocations: 0 bytes) # on 1.7.0: 2.511 μs (0 allocations: 0 bytes)

The above examples show that this change is safer than replacing @simd in the broadcast kernel with @simd ivdep.
But I didn't look into why 1.7.0 failed to vectorize b .= f.(a). (1.7.0 can vectorize b .= f.(a) if we define f(x) as f(x) = x + p[].)
Maybe more tests are needed.
I hope this is the correct way to "fix" #43153 safely.
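The timings above only suggest that the side-effecting case is not vectorized. A value check can complement them. The sketch below is an illustration, not part of the PR: `toggle` is a hypothetical stand-in for the thread's `p`, and it assumes that broadcasting over a `Vector` visits elements in index order, which is what released Julia does in practice.

```julia
toggle = Ref(0)                       # hypothetical stand-in for `p`
g(x) = x + (toggle[] = ~toggle[])     # adds -1, 0, -1, 0, ... on successive calls

a = zeros(Float32, 8)

toggle[] = 0
expected = Float32[g(x) for x in a]   # element-by-element reference result

toggle[] = 0
b = similar(a)
b .= g.(a)                            # broadcast kernel under test

b == expected                         # should hold if the calls were not reordered
```

Because each call to `g` depends on the state left by the previous call, any reordering or batching of the calls would change the alternating pattern in `b`.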

vchuravy

I am not a fan of the current construct. Currently Expr(:loopinfo) has the semantics that it should always be the last instruction in a loop. For a begin/end construct I would rather mimic Base.Experimental.Const and @aliasscope

vchuravy · 2021-12-05T17:39:14Z

base/simdloop.jl

 """
 macro simd(forloop)
-    esc(compile(forloop, nothing))
+    esc(compile(forloop, Symbol("julia.ivdep.end")))


Can we keep this nothing? It makes little sense to have a non-matching "begin"/"end" construct.

N5N3 · 2021-12-06

IIUC, we'd better have something like Expr(:ivdepscope) and Expr(:popivdepscope) instead (and add an @ivdep macro?).
Edit: Apparently this is beyond my competence. I can't understand why:

julia> @eval f(x) = $(Expr(:ivdepscope, :begin))
f (generic function with 1 method)

julia> @code_lowered f(1)
CodeInfo(
1 ─ $(Expr(:ivdepscope, :(Main.begin)))
└── return nothing
)

julia> @eval f(x) = $(Expr(:loopinfo, :begin))
f (generic function with 1 method)

julia> @code_lowered f(1)
CodeInfo(
1 ─ $(Expr(:loopinfo, :begin))
└── return nothing
)

vchuravy

Expr(:loopinfo) has the dedicated purpose of attaching metadata to the LLVM loop. The current ivdep implementation uses it because ivdep applies to the entire scope of the loop. See #31376 for another use case.

For scoped ivdep, I think the implementation should more closely mirror #31018.

N5N3 · 2021-12-07T12:41:15Z

I'm not sure whether this is the correct way to implement scoped ivdep:

A new meta head, Expr(:ivdepscope), is introduced. It is lowered to call void @julia.ivdepscope() with begin/end metadata.
When loopinfo_marker finds julia.simdloop, it adds the parallel-access metadata based on a simple begin/end count, and the call void @julia.ivdepscope() instructions are erased in the process.
If any julia.ivdepscope() calls remain after all optimizations, the user has probably used @ivdep outside an @simd for ... end loop, so I make @julia.ivdepscope() fall back to jl_ivdepscope_error, which throws an error.
Users might also write invalid code that leads to an unreachable branch. To recover the original error message, the SIMD pass erases call void @julia.ivdepscope() within basic blocks that end with unreachable.
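The begin/end count can be pictured with a toy model. This is only a sketch: the instructions are plain Symbols rather than LLVM IR, the name `scoped_accesses` is hypothetical, and erroring on unbalanced markers is an assumption of this sketch, not the behavior of the actual pass.

```julia
# Toy model: walk a flat "loop body" and collect the indices of the memory
# accesses that fall inside an open ivdep scope.
function scoped_accesses(insts::Vector{Symbol})
    depth = 0          # number of currently open scopes
    marked = Int[]     # accesses that would get parallel-access metadata
    for (i, inst) in enumerate(insts)
        if inst === :ivdep_begin
            depth += 1
        elseif inst === :ivdep_end
            depth -= 1
            depth >= 0 || error("ivdep end without begin at instruction $i")
        elseif depth > 0 && inst in (:load, :store)
            push!(marked, i)
        end
    end
    depth == 0 || error("unclosed ivdep scope")
    return marked
end

body = [:ivdep_begin, :load, :ivdep_end,    # args are safe to load
        :call, :load,                       # user function: left unmarked
        :ivdep_begin, :store, :ivdep_end]   # result is safe to store
scoped_accesses(body)   # returns [2, 7]
```

Only the load at position 2 and the store at position 7 are marked. The call and the load at position 5 sit outside any scope, so they stay subject to the usual dependence analysis.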

Some examples:

julia> f(x) = @inbounds for i in eachindex(x)
           Base.@ivdep x[i] += i
       end
f (generic function with 1 method)

julia> f([1,2,3,4])
ERROR: Found ivdepscope outside @simd.
Stacktrace:
 [1] macro expansion
   @ .\simdloop.jl:151 [inlined]
 [2] f(x::Vector{Int64})
   @ Main .\REPL[1]:2
 [3] top-level scope
   @ REPL[2]:1

julia> f((1,2,3,4))
ERROR: MethodError: no method matching setindex!(::NTuple{4, Int64}, ::Int64, ::Int64)
Stacktrace:
 [1] macro expansion
   @ .\simdloop.jl:152 [inlined]
 [2] f(x::NTuple{4, Int64})
   @ Main .\REPL[1]:2
 [3] top-level scope
   @ REPL[3]:1

N5N3 · 2022-01-25T02:59:29Z

I guess we won't need this after #43852.

DilumAluthge requested a review from vchuravy November 30, 2021 07:04

vchuravy reviewed Dec 5, 2021

N5N3 closed this Dec 6, 2021

N5N3 reopened this Dec 6, 2021

N5N3 force-pushed the partial-ivdep branch 2 times, most recently from d184d0a to bbf2dbb Compare December 6, 2021 14:10

vchuravy requested changes Dec 6, 2021

N5N3 added 4 commits December 7, 2021 20:14

add Expr(:ivdepscope) (1e38f0e)
1. introduce `jl_ivdepscope_sym`
2. define `jl_ivdepscope_error` to thrown error message.

modify codegen (3b915fb)
make `Expr(:ivdepscope, :begin/end)` lowered to `jl_ivdepscope_func`

modify pass (f1043ed)
1. let `loopinfo_mark` erase `julia.ivdepscope` if it has `julia.simd`.
2. erase `julia.ivdepscope` in unreachable branch even there's no `loopinfo_mark`. (make error message clearer)

add @ivdep (5d739dc)

N5N3 force-pushed the partial-ivdep branch from bbf2dbb to 5d739dc Compare December 7, 2021 12:22

N5N3 requested a review from vchuravy December 7, 2021 12:41

add throw test

a51d031

N5N3 changed the title ~~Replace julia.ivdep with julia.ivdep.begin/end to support not marking the entire loop body as ivdep~~ Add Expr(:ivdepscope) to support not marking the entire loop body as ivdep Dec 8, 2021

N5N3 closed this Jan 25, 2022

N5N3 deleted the partial-ivdep branch February 17, 2022 06:38
