fix: regression in non-fast scalar indexing support #760
base: master
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master     #760      +/-   ##
==========================================
+ Coverage   89.79%   90.05%   +0.25%
==========================================
  Files          11       12       +1
  Lines        1039     1066      +27
==========================================
+ Hits          933      960      +27
  Misses        106      106
ext/ForwardDiffGPUArraysCoreExt.jl (outdated)
idxs = collect(
    Iterators.drop(ForwardDiff.structural_eachindex(result), offset)
)[1:chunksize]
result[idxs] .= partial_fn.(Ref(dual), 1:chunksize)
Does this not have an inference issue due to losing static information about the size? I would think this needs to be an ntuple unless it can prove things about the size.
It would still be type-stable; it would just have dynamism in the function that would slow it down a bit during the broadcast.
Here the chunksize is already an Int, so I don't think we would get any benefit from using an ntuple.
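For reference, a rough sketch of an ntuple-based variant of the snippet above, assuming the chunk size were available statically as a Val (here it is a runtime Int, which is the point of this thread); seed_chunk_ntuple! is a hypothetical name, not ForwardDiff API:

using ForwardDiff

# Hypothetical ntuple-based seeding; assumes the chunk size is a compile-time Val(N).
# In this PR the chunk size is a runtime Int, so this variant buys nothing.
function seed_chunk_ntuple!(result, dual, partial_fn, offset, ::Val{N}) where {N}
    it = Iterators.drop(ForwardDiff.structural_eachindex(result), offset)
    idxs = collect(Iterators.take(it, N))            # indices still have to be materialized
    vals = ntuple(i -> partial_fn(dual, i), Val(N))  # statically sized tuple of partials
    result[idxs] .= vals                             # one broadcasted assignment
    return result
end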
As noted in #759 (comment), GPU is completely untested in ForwardDiff.jl, so this sets up the Buildkite pipeline. I set up the backend and all, and just took a few tests from #760 to seed it. The point of this isn't really to be a comprehensive set of GPU tests, but rather to give this repo the standard tooling the other repos have so GPU support doesn't regress again.
Has it been properly explored whether the existing functions (cf. #472) can be written in an alternative way that would support both fast and non-fast scalar arrays with the same generic code (which would avoid any new extensions)?
Yes, on the master branch seeding is (again) performed without broadcasting. Depending on the structural array type, the set of indices is not readily available in an allocation-free broadcastable form (e.g., the set of upper-triangular indices for UpperTriangular matrices). If we want to avoid these allocations (and the broadcasting overhead) for non-GPU arrays, I don't immediately see how this issue could be solved by a generic implementation.

Possibly the amount of code duplication could be reduced by introducing a helper function or branch that, based on the type of the input array, switches between broadcasting and iterating (presumably defaulting to iteration?), but even in this case it would be necessary to add an extension that ensures that GPU arrays use broadcasting. Alternatively, we could default to using broadcasting (with the additional overhead of collecting the indices) and, as an additional optimization, only use iteration for a handful of selected base array types such as Array.

What are your thoughts @KristofferC?
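For concreteness, a minimal sketch of the helper-function idea, using the supports_fast_scalar_indexing predicate from this PR; seed_indices!, the seeding function f, and the fallback method are illustrative assumptions, not the actual implementation:

using ForwardDiff

# Assumed for the sketch: Array gets the fast path (as in src/utils.jl); the fallback
# shown here and the GPU override in the extension are illustrative only.
supports_fast_scalar_indexing(::AbstractArray) = false
supports_fast_scalar_indexing(::Array) = true

function seed_indices!(f, result)
    if supports_fast_scalar_indexing(result)
        # fast path: iterate over the structural indices, no index allocation
        for (i, idx) in enumerate(ForwardDiff.structural_eachindex(result))
            result[idx] = f(i)
        end
    else
        # GPU path: materialize the indices once and do a single broadcasted assignment
        idxs = collect(ForwardDiff.structural_eachindex(result))
        result[idxs] .= f.(1:length(idxs))
    end
    return result
end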
Bump on this.
Testing this patch out with Lux.jl, it will still cause regressions in the cases where we have a wrapper over CuArray.
This seems like a good solution without causing regressions on use cases that were supported prior to #739. There's also …
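To illustrate the wrapper concern above, one hedged sketch (not part of this PR) would be to recurse through common wrapper types so that, e.g., LowerTriangular(::CuArray) is classified like the CuArray it wraps; the base methods below are assumptions for the sketch:

using LinearAlgebra, GPUArraysCore

# assumed base cases: fast by default, slow for device arrays
supports_fast_scalar_indexing(::AbstractArray) = true
supports_fast_scalar_indexing(::GPUArraysCore.AbstractGPUArray) = false

# recurse into the parent array for common wrappers (illustrative, not exhaustive)
supports_fast_scalar_indexing(x::LinearAlgebra.AbstractTriangular) =
    supports_fast_scalar_indexing(parent(x))
supports_fast_scalar_indexing(x::SubArray) = supports_fast_scalar_indexing(parent(x))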
src/utils.jl (outdated)
@@ -0,0 +1,15 @@
# overload for array types that support fast scalar indexing
@inline supports_fast_scalar_indexing(::Array) = true
Is the @inline needed? Did you encounter problems without it?
I think we might also want to extend this to:

- @inline supports_fast_scalar_indexing(::Array) = true
+ @inline supports_fast_scalar_indexing(::StridedArray) = true
StridedArray is too broad here
julia> SubArray{Float64, 2, JLArray{Float64, 2}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false} <: StridedArray
true
But is that only a problem with how JLArray is defined? Does it also cover views of CuArrays?

If StridedArray is problematic, another more generic alternative would be DenseArray.
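For reference, the distinction can be checked with JLArrays as a CPU-side stand-in; note that GPU array types themselves subtype DenseArray, so the extension would still need a more specific method for AbstractGPUArray:

using JLArrays

x = JLArray(rand(Float64, 4, 4))
v = view(x, 1:2, 1:2)

typeof(v) <: StridedArray  # true  -- a StridedArray method would also match this slow view
typeof(v) <: DenseArray    # false -- views are not DenseArrays
typeof(x) <: DenseArray    # true  -- but the device array itself is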
Bump on this.
fix: project toml for julia pre 1.9
fix: support gradient + more test coverage
chore: relax version
chore: remove 1.6 support and bump min version to 1.10
fix: apply suggestions from code review (Co-authored-by: David Widmann <[email protected]>)
fix: use a struct instead of closure
fix: sizecheck
chore: remove GPUArraysCore (Co-authored-by: David Widmann <[email protected]>)
fix: revert _take
chore: remove 1.8 checks
chore: remove 0.1 (Co-authored-by: David Widmann <[email protected]>)
The GPUArray backends don't support broadcasting with wrapped arrays nicely, so those tests will mostly fail.

julia> using CUDA, LinearAlgebra

julia> x = LowerTriangular(cu(rand(Float32, 4, 4)))
4×4 LowerTriangular{Float32, CuArray{Float32, 2, CUDA.DeviceMemory}}:
 0.960887   ⋅         ⋅          ⋅
 0.316333  0.612238   ⋅          ⋅
 0.236091  0.209854  0.0883058   ⋅
 0.370694  0.732681  0.0111619  0.270063

julia> x[diagind(x)] .= 10

And they won't have un-assigned elements:

julia> CuArray{Float32}(undef, 2, 3)
2×3 CuArray{Float32, 2, CUDA.DeviceMemory}:
 -5.81535f-36  1.25147f7  -2.50125f-11
 -1.98624      1.84662     1.95155

julia> JLArray{Float32}(undef, 3, 4)
3×4 JLArray{Float32, 2}:
 6.771f-42  6.771f-42  9.42f-43  0.0
 6.771f-42  6.771f-42  0.0       0.0
 6.771f-42  6.771f-42  0.0       4.5657f-41
Fixes #759.

ForwardDiff.gradient now supports GPU arrays.

cc @ChrisRackauckas @devmotion
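A minimal usage sketch of what this enables, with JLArrays as a CPU-testable stand-in for CUDA (the function and sizes are arbitrary):

using ForwardDiff, JLArrays

x = JLArray(rand(Float32, 16))
g = ForwardDiff.gradient(v -> sum(abs2, v), x)  # seeding/extraction without scalar indexing
g ≈ 2 .* x                                      # gradient of sum(abs2, v) is 2v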