LaunchContextTypes: (Thread loop optimizations RAJA launch) #1949
Merged
Changes from all commits (67 commits)
cd5065e
initial commit for launch loop optimization
artv3 484ff1a
add structs to store gpu thread/block info in launch ctx
artv3 18f332b
add cuda variant and add build guards for cpu
artv3 21f6184
Merge branch 'develop' into artv3/launch-loop-opt
artv3 73f224a
rework to support dim3 copy in ctx
artv3 8a02fee
Merge branch 'artv3/launch-loop-opt' of https://github.com/LLNL/RAJA …
artv3 1fbe50b
minor clean up pass
artv3 672889e
make format
artv3 5908a20
Update include/RAJA/pattern/launch/launch_core.hpp
artv3 316e019
Merge branch 'develop' into artv3/launch-loop-opt
rhornung67 4d9f800
clean up pass
artv3 d9ce271
update with develop and fix merge conflicts
artv3 85aef5a
fix build error
artv3 0469302
take develop submodule
artv3 4a695f2
cuda backend
artv3 f91a498
make style
artv3 d21c41f
omp backend
artv3 40a5c1b
seq backend + make style
artv3 e0f4825
clean up pass
artv3 96e99d5
Update include/RAJA/pattern/launch/launch_context_policy.hpp
artv3 a9f0cca
minor clean up
artv3 7d4595b
minor clean up
artv3 c23f76f
Merge branch 'artv3/launch-loop-opt' of github.com:LLNL/RAJA into art…
artv3 c990a4f
revert changes to example
artv3 f7939fd
remove specialization from launch policy
artv3 c24331c
make work for function pointers
artv3 0518138
store dim3 based on launch context type - hip
artv3 d5da29a
rework omp backend
artv3 af88dbb
update sequential backend
artv3 21ad0a8
get things building for cuda -- need a good clean up pass
artv3 646a95b
cuda clean up pass
artv3 597641b
clean up ordering in hip launch
artv3 5403737
clean up ordering
artv3 e41e970
make style
artv3 7c95430
use constexpt for getting dim values
artv3 d7cbbb5
Add classes that can cache Idx/Dim
MrBurmark bfe72de
merge develop, fix conflict
artv3 e494dac
Merge branch 'feature/burmark1/cache_idx_dim' into artv3/launch-loop-opt
artv3 5c88a4d
use cache idx in launch
artv3 960f0b7
remove dead code
artv3 aa3186c
clean up pass
artv3 7e79393
clean up code
artv3 e8e5e6d
have it also work for cuda
artv3 97c5edd
simplify helper functions
artv3 4ffefda
clean up pass
artv3 c2135ed
minor clean up
artv3 f5218ef
Update include/RAJA/policy/cuda/launch.hpp
artv3 93b3456
update the way we get index data
artv3 f36a2ce
clean up pass
artv3 0e18deb
default needs the indicies and dims struct
artv3 4a5c0a6
clean up pass
artv3 f078f0c
make style
artv3 26a00a3
Merge branch 'develop' into artv3/launch-loop-opt
artv3 75d9fc8
clean up pass
artv3 0954fdb
clean up pass
artv3 4936ad3
clean up pass
artv3 78f5ba3
clean up pass
artv3 67d52d7
make style
artv3 b6e45b3
clena up pass
artv3 140e88d
minor move to base
artv3 eb98047
clean up pass
artv3 a05e982
make style
artv3 f7237c1
make style
artv3 f482fff
PR comments
artv3 882157b
make style
artv3 cf8a88f
Update include/RAJA/pattern/launch/launch_core.hpp
artv3 a40aa65
Update include/RAJA/pattern/launch/launch_core.hpp
artv3
New file (167 lines added):

```cpp
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
// Copyright (c) Lawrence Livermore National Security, LLC and other
// RAJA Project Developers. See top-level LICENSE and COPYRIGHT
// files for dates and other details. No copyright assignment is required
// to contribute to RAJA.
//
// SPDX-License-Identifier: (BSD-3-Clause)
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//

#include <iostream>

#include "RAJA/RAJA.hpp"

/*
 * RAJA Launch Example: LaunchContext index/dimension caching (CUDA/HIP)
 *
 * RAJA launch kernels receive a "launch context" object (ctx) that provides
 * access to execution details needed by hierarchical kernels, such as:
 *   - team (block) indices and dimensions
 *   - thread indices and dimensions
 *
 * Many RAJA launch patterns (multiple nested RAJA::loop regions, multiple uses
 * of indices/dims, etc.) can lead to repeated queries of the underlying device
 * intrinsics (e.g., blockIdx.x, threadIdx.x, blockDim.x). RAJA provides
 * LaunchContext policies that control whether those values are cached within
 * the context object on first access and then reused.
 *
 * This example selects the "all cached indices and dims" policy for CUDA/HIP
 * and runs a simple teams/threads kernel that writes `d_array[i] = i`.
 */

template<typename Backend>
struct BackendTraits;

#if defined(RAJA_ENABLE_HIP)
struct HipBackend;

template<>
struct BackendTraits<HipBackend>
{
  static constexpr const char* name = "HIP";
  using device_res_t = RAJA::resources::Hip;
  using launch_t     = RAJA::hip_launch_t<true>;
  // Cache all indices/dimensions accessed through the launch context:
  // threadIdx, blockDim, blockIdx, gridDim.
  using cache_policy_t = RAJA::HipIndicesAndDims<true, true, true, true>;
  using ctx_policy_t = RAJA::HipLaunchContextIndicesAndDimsPolicy<cache_policy_t>;
  using block_x_direct_t  = RAJA::hip_block_x_direct;
  using thread_x_direct_t = RAJA::hip_thread_x_loop;
};
#endif

#if defined(RAJA_ENABLE_CUDA)
struct CudaBackend;

template<>
struct BackendTraits<CudaBackend>
{
  static constexpr const char* name = "CUDA";
  using device_res_t = RAJA::resources::Cuda;
  using launch_t     = RAJA::cuda_launch_t<true>;
  // Cache all indices/dimensions accessed through the launch context:
  // threadIdx, blockDim, blockIdx, gridDim.
  using cache_policy_t = RAJA::CudaIndicesAndDims<true, true, true, true>;
  using ctx_policy_t = RAJA::CudaLaunchContextIndicesAndDimsPolicy<cache_policy_t>;
  using block_x_direct_t  = RAJA::cuda_block_x_direct;
  using thread_x_direct_t = RAJA::cuda_thread_x_loop;
};
#endif

template<typename Backend>
int run_example()
{
  using T = BackendTraits<Backend>;

  std::cout << "\n Running RAJA " << T::name
            << " launch-context indices/dims caching example...\n";

  constexpr int N         = 64;
  constexpr int BLOCK_DIM = 32;
  constexpr int GRID_DIM  = 1;

  typename T::device_res_t device_res;
  RAJA::resources::Host host_res;

  int* d_array = device_res.template allocate<int>(N);
  int* h_array = host_res.allocate<int>(N);

  for (int i = 0; i < N; ++i)
  {
    h_array[i] = -1;
  }
  device_res.memcpy(d_array, h_array, sizeof(int) * N);

  using launch_policy = RAJA::LaunchPolicy<typename T::launch_t>;
  // LaunchContextT binds a LaunchContext policy to the context type.
  using Ctx       = RAJA::LaunchContextT<typename T::ctx_policy_t>;
  using teams_x   = RAJA::LoopPolicy<typename T::block_x_direct_t>;
  using threads_x = RAJA::LoopPolicy<typename T::thread_x_direct_t>;

  RAJA::launch<launch_policy>(
      device_res,
      RAJA::LaunchParams(RAJA::Teams(GRID_DIM), RAJA::Threads(BLOCK_DIM)),
      [=] RAJA_HOST_DEVICE(Ctx ctx) {
        // The nested loops below access team/thread indices/dimensions via
        // the launch context. With the "all cached" policy, those values are
        // cached in `ctx` the first time they are needed.

        RAJA::loop<teams_x>(ctx, RAJA::RangeSegment(0, GRID_DIM), [&](int bx) {
          // Iterate over more logical thread-iterations than the physical
          // thread dimension to exercise the *_thread_x_loop mapping.
          RAJA::loop<threads_x>(ctx, RAJA::RangeSegment(0, 2 * BLOCK_DIM),
                                [&](int tx) {
            if (tx < N)
            {
              d_array[tx] = tx;
            }
          });
        });
      });

  device_res.memcpy(h_array, d_array, sizeof(int) * N);

  int err_count = 0;
  for (int i = 0; i < N; ++i)
  {
    if (h_array[i] != i)
    {
      ++err_count;
    }
  }

  std::cout << " Result -- " << (err_count ? "FAIL" : "PASS") << "\n";
  if (err_count)
  {
    std::cout << " error count = " << err_count << "\n";
  }

  device_res.deallocate(d_array);
  host_res.deallocate(h_array);

  return (err_count ? 1 : 0);
}

int main(int RAJA_UNUSED_ARG(argc), char** RAJA_UNUSED_ARG(argv[]))
{
#if defined(RAJA_ENABLE_HIP) || defined(RAJA_ENABLE_CUDA)
  int err_count = 0;

#if defined(RAJA_ENABLE_HIP)
  err_count += run_example<HipBackend>();
#endif

#if defined(RAJA_ENABLE_CUDA)
  err_count += run_example<CudaBackend>();
#endif

  std::cout << "\n DONE!...\n";
  return (err_count ? 1 : 0);
#else
  std::cout << "Please build with HIP or CUDA to run this example ...\n";
  return 0;
#endif
}
```
New file (97 lines added):

```cpp
/*!
 ******************************************************************************
 *
 * \file
 *
 * \brief RAJA header file containing a helper to
 *        determine the launch context type
 *
 ******************************************************************************
 */

//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
// Copyright (c) Lawrence Livermore National Security, LLC and other
// RAJA Project Developers. See top-level LICENSE and COPYRIGHT
// files for dates and other details. No copyright assignment is required
// to contribute to RAJA.
//
// SPDX-License-Identifier: (BSD-3-Clause)
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//

#ifndef RAJA_pattern_context_policy_HPP
#define RAJA_pattern_context_policy_HPP

#include <type_traits>

namespace RAJA
{

template<typename LaunchContextPolicy>
class LaunchContextT;

class LaunchContextHostPolicy;

namespace detail
{

// Extract the first parameter type from a callable's signature.
template<typename T>
struct first_argument;

template<typename R, typename Arg0, typename... Args>
struct first_argument<R(Arg0, Args...)>
{
  using type = Arg0;
};

template<typename C, typename R, typename Arg0, typename... Args>
struct first_argument<R (C::*)(Arg0, Args...)>
    : first_argument<R(Arg0, Args...)>
{};

template<typename C, typename R, typename Arg0, typename... Args>
struct first_argument<R (C::*)(Arg0, Args...) const>
    : first_argument<R(Arg0, Args...)>
{};

template<typename C, typename R, typename Arg0, typename... Args>
struct first_argument<R (C::*)(Arg0, Args...) noexcept>
    : first_argument<R(Arg0, Args...)>
{};

template<typename C, typename R, typename Arg0, typename... Args>
struct first_argument<R (C::*)(Arg0, Args...) const noexcept>
    : first_argument<R(Arg0, Args...)>
{};

// For functors/lambdas, inspect the type of operator(); otherwise fall back
// to the decayed type itself (e.g., a function pointer).
template<typename T, typename = void>
struct callable_signature
{
  using type = camp::decay<T>;
};

template<typename T>
struct callable_signature<T, std::void_t<decltype(&camp::decay<T>::operator())>>
{
  using type = decltype(&camp::decay<T>::operator());
};

// Default to the host context when no first argument can be deduced.
template<typename T, typename = void>
struct launch_context_type
{
  using type = LaunchContextT<LaunchContextHostPolicy>;
};

template<typename T>
struct launch_context_type<T,
                           std::void_t<typename first_argument<camp::decay<
                               typename callable_signature<T>::type>>::type>>
{
  using type = camp::decay<
      typename first_argument<typename callable_signature<T>::type>::type>;
};

}  // namespace detail

}  // namespace RAJA
#endif
```