Experimental: Asynchronous kernels #3402
Open
bendudson wants to merge 1 commit into
Gather kernels using an `eval_into(result, expression)` builder pattern. The kernels can be streamed asynchronously or merged into one large kernel.
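As a rough illustration of the builder idea, the toy below collects deferred assignments and runs them only when asked. It is a sketch under stated assumptions, not the PR's implementation: `KernelBatch`, `run()`, the `std::vector<double>` result type, and the index-callable expressions are all hypothetical stand-ins, and the sequential `run()` merely stands in for either streaming the kernels or merging them.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy builder: gathers "kernels" (deferred assignments).
class KernelBatch {
public:
  // Rvalue-qualified so calls chain on a temporary builder.
  // `expr` is any callable mapping an index to a value (an assumption).
  template <typename Expr>
  KernelBatch eval_into(std::vector<double>& result, Expr expr) && {
    // `result` is captured by reference and must outlive run().
    tasks_.emplace_back([&result, expr]() {
      for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = expr(i);
      }
    });
    return std::move(*this);
  }

  // Executes the gathered kernels one after another. A real backend
  // could instead launch each on its own stream or fuse them.
  void run() && {
    for (auto& task : tasks_) {
      task();
    }
  }

private:
  std::vector<std::function<void()>> tasks_;
};
```

Usage would chain calls on a temporary, for example `KernelBatch{}.eval_into(a, f).eval_into(b, g).run();`, and nothing is evaluated until `run()` is reached.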
    ddt(vort) = -bracket(phi, vort, bm) + alpha * (nonzonal_phi - nonzonal_n);
    // Two kernels can be evaluated asynchronously
    eval_into(ddt(n), // Density equation
              -bracket(phi, n, bm) + alpha * (nonzonal_phi - nonzonal_n)
Contributor
warning: no header providing "bracket" is directly included [misc-include-cleaner]
    -bracket(phi, n, bm) + alpha * (nonzonal_phi - nonzonal_n)
    ^

    #include <type_traits>
    #include <utility>
    #include <vector>
Contributor
warning: included header vector is not used directly [misc-include-cleaner]
Suggested change
ZedThree reviewed Jun 24, 2026
Comment on lines +334 to +339
    template <typename ExprView>
    void launchExprView(BoutReal* out, const ExprView& expr_view
    #if BOUT_HAS_CUDA && defined(__CUDACC__)
                        ,
                        cudaStream_t stream
    #endif
Member
Please can the whole function be in the preprocessor guards, rather than splitting the function arguments like this?
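One possible reading of this request is sketched below: two complete overloads, each sitting wholly inside its own branch of the guard, so that no parameter list is interrupted by `#if`. Everything beyond the guard condition and the function name is an assumption made for illustration: `BoutReal` is aliased to `double`, the extra `n` parameter and the loop body are hypothetical (the real body is not shown in the excerpt), and the CUDA branch is only a placeholder.

```cpp
#include <cstddef>

using BoutReal = double; // stand-in for the BOUT++ real type

#if BOUT_HAS_CUDA && defined(__CUDACC__)
#include <cuda_runtime.h>

// CUDA build: the stream is an ordinary parameter of this overload.
template <typename ExprView>
void launchExprView(BoutReal* out, const ExprView& expr_view, std::size_t n,
                    cudaStream_t stream) {
  // Placeholder only: a real implementation would launch a kernel on
  // `stream` here. The casts just silence unused-parameter warnings.
  (void)out;
  (void)expr_view;
  (void)n;
  (void)stream;
}

#else

// Host build: same name, no stream parameter, plain loop (hypothetical body).
template <typename ExprView>
void launchExprView(BoutReal* out, const ExprView& expr_view, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = expr_view(i);
  }
}

#endif
```

The cost of this layout is that the signature is written twice; the gain is that each version can be read, formatted, and documented as an ordinary function.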
Comment on lines +582 to +588
    template <typename Result, typename Expr>
    auto eval_into(Result& result, Expr&& expr) && {
      using ExprType = std::decay_t<Expr>;
      static_assert(bout::detail::is_eval_result_v<Result>,
                    "eval_into only supports Field2D, Field3D, and FieldPerp results");
      static_assert(bout::detail::is_eval_compatible_v<Result, ExprType>,
                    "eval_into result type does not match the expression family");
Member
We should be able to use concepts here to make this clearer, I think?
Comment on lines +386 to +401
    template <typename T>
    inline constexpr bool is_eval_result_v =
        std::is_same_v<std::decay_t<T>, Field2D> || std::is_same_v<std::decay_t<T>, Field3D>
        || std::is_same_v<std::decay_t<T>, FieldPerp>;

    template <typename Result, typename Expr>
    inline constexpr bool is_eval_compatible_v =
        (std::is_same_v<std::decay_t<Result>, Field3D> && is_expr_field3d_v<Expr>)
        || (std::is_same_v<std::decay_t<Result>, Field2D> && is_expr_field2d_v<Expr>)
        || (std::is_same_v<std::decay_t<Result>, FieldPerp> && is_expr_fieldperp_v<Expr>);

    template <typename Expr>
    inline constexpr bool is_materialized_eval_expr_v =
        std::is_same_v<std::decay_t<Expr>, Field3D>
        || std::is_same_v<std::decay_t<Expr>, Field2D>
        || std::is_same_v<std::decay_t<Expr>, FieldPerp>;
Member
I think we have like is_Field etc in traits.hxx -- are these different?
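Whether the two sets of traits differ depends on how the existing ones are defined, which is not shown in this thread. The sketch below assumes, purely for illustration, that an `is_Field`-style trait is built on `std::is_base_of`, and compares it with the exact-type check from the PR. The class hierarchy, `is_Field_v`, and `CustomField` are stand-ins rather than the real BOUT++ definitions.

```cpp
#include <type_traits>

// Stand-in hierarchy (assumed shape, not the real classes).
struct Field {};
struct Field2D : Field {};
struct Field3D : Field {};
struct FieldPerp : Field {};
struct CustomField : Field {}; // hypothetical extra subclass

// Assumed form of an existing trait: matches anything derived from Field.
template <typename T>
inline constexpr bool is_Field_v = std::is_base_of_v<Field, T>;

// The PR's trait: exact match on three types, after std::decay_t.
template <typename T>
inline constexpr bool is_eval_result_v =
    std::is_same_v<std::decay_t<T>, Field2D> || std::is_same_v<std::decay_t<T>, Field3D>
    || std::is_same_v<std::decay_t<T>, FieldPerp>;

// Both agree on a plain field type.
static_assert(is_Field_v<Field3D> && is_eval_result_v<Field3D>);

// Difference 1: a base-class check accepts other subclasses.
static_assert(is_Field_v<CustomField> && !is_eval_result_v<CustomField>);

// Difference 2: std::is_base_of is false for reference types, whereas
// the decayed exact match still accepts them.
static_assert(!is_Field_v<Field3D&> && is_eval_result_v<Field3D&>);
```

Under these assumptions the traits are not interchangeable as written, though applying `std::decay_t` before the existing trait and excluding unwanted subclasses could close the gap.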