Use ATTACH maps for array-sections/subscripts on pointers. #1

abhinavgaba · 2025-07-16T14:02:39Z

This is the initial clang change to support using ATTACH map-type for pointer-attachment.

This builds upon the following:

[Offload] Introduce ATTACH map-type support for pointer attachment. llvm/llvm-project#149036
[Clang][OpenMP] Capture mapped pointers on target by reference. llvm/llvm-project#145454

For example, for the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.
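The entries in (A) can be modelled as plain data to make the base/begin/size/type columns concrete. This is a minimal sketch: the `MapEntry` struct, the `attachStyleMaps` helper, and the flag bit values are hypothetical names and numbers chosen for illustration, not the encoding used by clang or the offload runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag bits for this sketch only; the real map-type bits are
// defined by the offload runtime.
enum MapFlag : std::uint64_t {
  TO = 1u << 0,
  FROM = 1u << 1,
  PTR_AND_OBJ = 1u << 2,
  ATTACH = 1u << 3,
};

// One map entry as listed above: base pointer, begin pointer, size, map-type.
struct MapEntry {
  const void *Base;
  const void *Begin;
  std::size_t Size;
  std::uint64_t Type;
};

// Builds the two entries of style (A) for map(p[1:10]) on `int *p`
// (hypothetical helper, not part of clang).
std::vector<MapEntry> attachStyleMaps(int *&p) {
  return {
      {&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM}, // maps the pointee data
      {&p, &p[1], sizeof(p), ATTACH},               // attaches the pointer p
  };
}
```

Under these assumptions, the first entry describes only the ten pointee elements and the second entry describes only the attachment, so the pointer `p` itself is never mapped.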

With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

We also group the maps from clauses that share a base decl, and order them by increasing complexity of their base-pointers (attach-ptrs). For example, given:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                // attach-ptr: spp
  map(spp),                   // attach-ptr: N/A

We first map spp, then spp[0], then spp[0][0] and spp[0][0].a.

This allows us to also group "struct" allocation based on their attach pointers.
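The ordering described above can be sketched with a toy model. Everything here is an assumption made for illustration: operands are plain strings rather than clang AST expressions, and `attachPtrDepth` is a hypothetical complexity measure that simply counts subscripts, dereferences and member accesses in the attach pointer's spelling.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A map operand together with its attach pointer, both as source text.
// An empty AttachPtr means the operand has no attach pointer (e.g. map(spp)).
struct MapOperand {
  std::string Expr;
  std::string AttachPtr;
};

// Hypothetical complexity measure: 0 when there is no attach pointer,
// otherwise 1 plus the number of '[', '*', '.' and '>' characters in it.
int attachPtrDepth(const std::string &AttachPtr) {
  if (AttachPtr.empty())
    return 0;
  int Depth = 1;
  for (char C : AttachPtr)
    if (C == '[' || C == '*' || C == '.' || C == '>')
      ++Depth;
  return Depth;
}

// Orders operands so that those with simpler attach pointers come first.
// The stable sort keeps operands with equally complex attach pointers in
// their original clause order.
void orderByAttachPtr(std::vector<MapOperand> &Ops) {
  std::stable_sort(Ops.begin(), Ops.end(),
                   [](const MapOperand &A, const MapOperand &B) {
                     return attachPtrDepth(A.AttachPtr) <
                            attachPtrDepth(B.AttachPtr);
                   });
}
```

Feeding the four operands from the `spp` example through `orderByAttachPtr` in clause order yields spp, spp[0], spp[0][0], spp[0][0].a, matching the order stated above.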

Cases that need handling:

When a class member like p is a base-pointer in a map from a member function within the same class, p is not privatized. Instead, we still try to create an implicit map of this[0:1] and access p through that, which is incorrect.

 struct S {
   int *p;
   void f1() {
     #pragma omp target data map(p[0:1])
       printf("%p %p\n", &p, p);
   }
 };

Attach-style maps for declare mappers. That should be a separate PR.
The use_device_addr clause does not work properly because we don't have a proper component-list set up for it (just one component), so we cannot find the proper attach-ptr. For use_device_addr, we should match existing maps whose attach-ptr matches the attach-ptr of the use_device_addr operand.
use_device_ptr handling has some issues too and needs debugging.
Other issues that haven't been found yet.

Some tests still haven't been updated. These include:

  Clang :: OpenMP/copy-gaps-1.cpp
  Clang :: OpenMP/copy-gaps-6.cpp
  Clang :: OpenMP/map_struct_ordering.cpp
  Clang :: OpenMP/target_data_use_device_addr_codegen.cpp
  Clang :: OpenMP/target_data_use_device_ptr_codegen.cpp
  Clang :: OpenMP/target_enter_data_codegen.cpp
  Clang :: OpenMP/target_enter_data_depend_codegen.cpp
  Clang :: OpenMP/target_exit_data_codegen.cpp
  Clang :: OpenMP/target_exit_data_depend_codegen.cpp
  Clang :: OpenMP/target_map_codegen_18c.cpp
  Clang :: OpenMP/target_map_codegen_18d.cpp
  Clang :: OpenMP/target_map_codegen_28.cpp
  Clang :: OpenMP/target_map_codegen_29.cpp
  Clang :: OpenMP/target_map_codegen_31.cpp
  Clang :: OpenMP/target_map_codegen_hold.cpp
  Clang :: OpenMP/target_map_deref_array_codegen.cpp
  Clang :: OpenMP/target_map_member_expr_codegen.cpp
  Clang :: OpenMP/target_update_codegen.cpp
  Clang :: OpenMP/target_update_depend_codegen.cpp

abhinavgaba · 2025-07-16T14:03:51Z

offload/libomptarget/interface.cpp

The libomptarget code will disappear from this PR once llvm#149036 is merged.

abhinavgaba · 2025-07-23T13:31:19Z

clang/lib/CodeGen/CGOpenMPRuntime.cpp

      const ValueDecl *BaseDecl = nullptr, const Expr *MapExpr = nullptr,
      ArrayRef<OMPClauseMappableExprCommon::MappableExprComponentListRef>
-          OverlappedElements = {},
-      bool AreBothBasePtrAndPteeMapped = false) const {


AreBothBasePtrAndPteeMapped was used to decide whether to use PTR_AND_OBJ maps for something like map(p, p[0]). We don't do that now, since we map them independently and attach them separately.

…ribution. (llvm#163443) In some cases, loop bounds (lower, upper and step) of `scf.for` can come locally from the parent warp op of the `scf.for`. The current logic does not yield the loop bounds in the new warp op generated during lowering, causing the sunk `scf.for` to have a non-dominating use. In this PR, we have added logic to yield loop bounds by default (treating them like other operands of `scf.for`), which fixes this bug.

This reverts commit 0b9a7b8. This is causing test failures under LLVM: 1. Other/pass-pipeline-parsing.ll This broke premerge. This was notably not caught by premerge testing on the original PR because the original PR only touches polly, and premerge does not test LLVM when only polly is touched.

Added support for ConditionalOperator, BinaryConditionalOperator and OpaqueValueExpr as lvalue. Implemented support for ternary operators with one branch being a throw expression. This required weakening the requirement that the true and false regions of the ternary operator must terminate with a `YieldOp`. Instead the true and false regions are now allowed to terminate with an `UnreachableOp` and no `YieldOp` gets emitted when the block throws.

This patch adds a new script, premerge_advisor_explain.py that requests test failure explanations from the premerge advisor. For now it just prints them out to STDOUT. This allows for testing of the entire system by looking at failure explanations in failed jobs before we do the rest of the wiring to enable the premerge advisor to write out comments.

… AVX/AVX512 subvector extraction intrinsics to be used in constexpr llvm#157712 (llvm#162836)

**This PR supersedes and replaces PR llvm#158853**

The original branch diverged too far from the main branch, resulting in significant merge conflicts that were difficult to resolve cleanly. To provide a clean and reviewable history, this new PR was created by cherry-picking the necessary commits onto a fresh branch based on the latest `main`.

---

*(Original Description)*

This patch enables the use of AVX/AVX512 subvector extraction intrinsics within `constexpr` functions. This is achieved by implementing the evaluation logic for these intrinsics in `VectorExprEvaluator::VisitCallExpr` and `InterpretBuiltin`.

The original discussion and review comments can be found in the previous pull request for context: llvm#158853

Fixes llvm#157712

The primary purpose of this commit is to enable marking loads to LDS (global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux as with normal buffer loads) and to ensure that their !nontemporal annotations translate to appropriate settings of the cache control bits.

However, in the process of implementing this feature, we also fixed

- Incorrect handling of buffer loads to LDS in GlobalISel
- Updating the handling of volatile on buffers in SIMemoryLegalizer: previously, the mapping of address spaces would cause volatile on buffer loads to be silently dropped on at least gfx10.

---------

Co-authored-by: Matt Arsenault <[email protected]>

These two are lowered as if they are the expression: LHS = (LHS < RHS ) ? RHS : LHS; and LHS = (LHS < RHS ) ? LHS : RHS; This patch generates these expressions and ensures they are properly emitted into IR. Note: this is dependent on llvm#163580 and cannot be merged until that one is (or the tests will fail).

Variant part, represented by `DW_TAG_variant_part`, is a structure with a discriminant and different variants, from which only one can be active and valid at the same time. The discriminant is the main difference between variant parts and unions represented by `DW_TAG_union` type.

Variant parts are used by Rust enums, which look like:

```rust
pub enum MyEnum {
    First { a: u32, b: i32 },
    Second(u32),
}
```

This type's debug info is the following `DICompositeType` with `DW_TAG_structure_type` tag:

```llvm
!4 = !DICompositeType(tag: DW_TAG_structure_type, name: "MyEnum", scope: !2, file: !5, size: 96, align: 32, flags: DIFlagPublic, elements: !6, templateParams: !16, identifier: "faba668fd9f71e9b7cf3b9ac5e8b93cb")
```

With one element being also a `DICompositeType`, but with `DW_TAG_variant_part` tag:

```llvm
!6 = !{!7}
!7 = !DICompositeType(tag: DW_TAG_variant_part, scope: !4, file: !5, size: 96, align: 32, elements: !8, templateParams: !16, identifier: "e4aee046fc86d111657622fdcb8c42f7", discriminator: !21)
```

Which has a discriminator:

```llvm
!21 = !DIDerivedType(tag: DW_TAG_member, scope: !4, file: !5, baseType: !13, size: 32, align: 32, flags: DIFlagArtificial)
```

Which then holds different variants as `DIDerivedType` elements with `DW_TAG_member` tag:

```llvm
!8 = !{!9, !17}
!9 = !DIDerivedType(tag: DW_TAG_member, name: "First", scope: !7, file: !5, baseType: !10, size: 96, align: 32, extraData: i32 0)
!10 = !DICompositeType(tag: DW_TAG_structure_type, name: "First", scope: !4, file: !5, size: 96, align: 32, flags: DIFlagPublic, elements: !11, templateParams: !16, identifier: "cc7748c842e275452db4205b190c8ff7")
!11 = !{!12, !14}
!12 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !10, file: !5, baseType: !13, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
!13 = !DIBasicType(name: "u32", size: 32, encoding: DW_ATE_unsigned)
!14 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !10, file: !5, baseType: !15, size: 32, align: 32, offset: 64, flags: DIFlagPublic)
!15 = !DIBasicType(name: "i32", size: 32, encoding: DW_ATE_signed)
!16 = !{}
!17 = !DIDerivedType(tag: DW_TAG_member, name: "Second", scope: !7, file: !5, baseType: !18, size: 96, align: 32, extraData: i32 1)
!18 = !DICompositeType(tag: DW_TAG_structure_type, name: "Second", scope: !4, file: !5, size: 96, align: 32, flags: DIFlagPublic, elements: !19, templateParams: !16, identifier: "a2094b1381f3082d504fbd0903aa7c06")
!19 = !{!20}
!20 = !DIDerivedType(tag: DW_TAG_member, name: "__0", scope: !18, file: !5, baseType: !13, size: 32, align: 32, offset: 32, flags: DIFlagPublic)
```

BPF backend was assuming that all the elements of any `DICompositeType` have tag `DW_TAG_member` and are instances of `DIDerivedType`. However, the single element of the outer composite type `!4` has tag `DW_TAG_variant_part` and is an instance of `DICompositeType`. The unconditional call of `cast<DIDerivedType>` on all elements was causing an assertion failure when any Rust code with enums was compiled to the BPF target.

Fix that by:

* Handling `DW_TAG_variant_part` in `visitStructType`.
* Replacing unconditional call of `cast<DIDerivedType>` over `DICompositeType` elements with a `switch` statement, handling both `DW_TAG_member` and `DW_TAG_variant_part` and casting the element to an appropriate type (`DIDerivedType` or `DICompositeType`).

Fixes: llvm#155778

Adding llvm-ir2vec, llvm-ctxprof-util (and llvm-sim) in the Bazel configs. llvm-ctxprof-util and llvm-ir2vec are used in several LIT unit tests, and the missing binary is causing unit test failures. llvm-ctxprof-util: https://github.com/llvm/llvm-project/blob/15cde999d47c3edc7647faf5fd967f5d5d88416a/llvm/test/Analysis/CtxProfAnalysis/flatten-icp.ll#L2 llvm-ir2vec: https://github.com/llvm/llvm-project/blob/55d4e92c8821d5543469118a76fe38db866377b7/llvm/utils/mlgo-utils/IR2Vec/generateTriplets.py#L60 llvm-sim: https://github.com/llvm/llvm-project/blob/894eaf481542adefde861a7e39f769f21d3f4fa4/llvm/test/lit.cfg.py#L268

This patch implements llvm::countr_zero_constexpr, a constexpr version of llvm::countr_zero, in terms of llvm::popcount while making llvm::popcount a constexpr function at the same time. The new function is intended to serve as a marker. When we switch to C++20, we will most likely go through functions in llvm/ADT/bit.h and replace them with their counterparts from <bit>. With llvm::countr_zero_constexpr, we can easily replace its use with std::countr_zero. This patch reimplements ConstantLog2 in terms of the new function.
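One way to express a trailing-zero count through a population count is to build a mask of the trailing zero bits and count the ones in it. The commit message does not spell out the formula it uses, so the identity below is an assumption, and `popcount32` and `countrZero32` are hypothetical names for this sketch rather than the LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constexpr population count for this sketch.
constexpr int popcount32(std::uint32_t V) {
  int Count = 0;
  for (; V != 0; V &= V - 1) // each step clears the lowest set bit
    ++Count;
  return Count;
}

// Hypothetical constexpr trailing-zero count built on popcount32.
// ~V & (V - 1) has ones exactly where V has trailing zeros, so its
// population count is the number of trailing zeros. For V == 0 the mask
// is all ones and the result is 32.
constexpr int countrZero32(std::uint32_t V) {
  return popcount32(~V & (V - 1));
}

static_assert(countrZero32(1u) == 0, "no trailing zeros in 1");
static_assert(countrZero32(8u) == 3, "8 is 0b1000");
static_assert(countrZero32(0u) == 32, "all bits are trailing zeros");
```

Because both functions are `constexpr`, the checks run at compile time, which is the property the commit message is after for uses such as a compile-time log2.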

This rewrite does not preserve numerics: for example, we'd expect the maximum fp value to yield Inf instead of identity. `GL.Length` does not allow for fast math flags, so we need to remove this. Special cases (constants) can be handled via a folder if someone wants to implement one.

Add builders on the Python side that match the builders on the C++ side, add tests for launching GPU kernels and regions, and correct some small documentation mistakes. This reflects the API decisions already made in the func dialect's Python bindings and makes the GPU dialect's bindings work more like the C++ interface.

…lvm#164039) Two of the tests are currently asserting, and two are emitting unexpected results. The asserting tests will be fixed using the ATTACH-style codegen from llvm#153683. The other two involve `use_device_addr` on byrefs, and need more follow-up codegen changes, that have been noted in a FIXME comment.

…ion variable (llvm#164147) `@SHLIBDIR@` is replaced by CMake's configuration function, so it must be in `lit.site.cfg.py.in` but not `lit.cfg.py`. `lit.cfg.py` must reference variables in generated `lit.site.cfg.py`. We didn't notice this problem because it only affects Windows (including MinGW and Cygwin) that are configured with either LLVM_LINK_LLVM_DYLIB=ON or BUILD_SHARED=ON.

Add OnDiskGraphDB and OnDiskKeyValueDB that can be used to implement ObjectStore and ActionCache respectively. These are on-disk persistent stores that build upon OnDiskTrieHashMap and implement key functions that are required by the LLVMCAS interfaces. This abstraction layer defines how the objects are hashed and stored on disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap while OnDiskGraphDB also defines: * How objects of various sizes are stored on disk and are referenced by the trie nodes. * How to store the references from one stored object to another object that is referenced. In addition to basic APIs for ObjectStore and ActionCache, other advanced database configuration features can be implemented in this layer without exposing them to the users of the LLVMCAS interface. For example, OnDiskGraphDB has a fault-in function to fetch data from an upstream OnDiskGraphDB if the data is missing.

abhinavgaba mentioned this pull request Jul 17, 2025

[Offload] Introduce ATTACH map-type support for pointer attachment. llvm/llvm-project#149036

Merged

abhinavgaba changed the title ~~[WIP] Use ATTACH maps for array-sections/subscripts on pointers.~~ Use ATTACH maps for array-sections/subscripts on pointers. Jul 22, 2025

This was referenced Jul 22, 2025

[OpenMP] Mapping of 'middle' structures chained through '->' does not work llvm/llvm-project#141042

Open

[Clang][OpenMP] Capture mapped pointers on target by reference. llvm/llvm-project#145454

Open

c8ef and others added 6 commits October 17, 2025 23:41

[libc] add IPV6 related macros (llvm#162219)

c4ceb8d

[libc][NFC] Remove cpp20 requirement for arpa/inet test (llvm#163983)

404099d

Address llvm#162651 (comment).

[InstCombine]: Canonicalize to a mask when trunc nuw (llvm#163628)

fcb5293

The canonicalize is also triggered when the `trunc` is `nuw`. Proof: https://alive2.llvm.org/ce/z/eWvWe3 Fixes: llvm#162451

[NFC][SROA][DebugInfo] Reuse existing dbg_assigns where possible (llv…

7bbb03d

…m#163938) Addresses issue llvm#145937 Without this patch SROA generates new dbg_assign for new stores. We can simply steal the existing dbg_assigns linked to the old store when the store is not being split.

AmrDeveloper and others added 30 commits October 20, 2025 18:58

[CIR] Implement VisitCXXDefaultArgExpr for ComplexType (llvm#164079)

aac8a0d

Implement CXXDefaultArgExpr support for ComplexType Issue llvm#141365

[Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow…

3afbda0

… SSE41 phminposuw intrinsic to be used in constexp (llvm#163041) Fix llvm#161336

[Headers][X86] Allow MMX/SSE/AVX MOVMSK intrinsics to be used in cons…

725a297

…texpr (llvm#161914) Fix llvm#154520

[LLVM][Docs] Remove Stray %T Substitution

737e116

These were all removed in llvm#160028, but I apparently missed this one instance in the documentation. Remove it given that it no longer works.

[RISCV] Remove shiftop/shiftopw PatFrags. NFC (llvm#164050)

5c86520

Replace with PatGprShiftMaskXLen/PatGprShiftMask32 or use the ShiftMaskXLen/ShiftMask32 ComplexPattern directly in patterns. This avoids various casts that were needed to make a ComplexPattern work inside of a PatFrag.

[lldb] Add try_lock to SBMutex (llvm#164109)

c01a223

Add `try_lock` to conform to Lockable, which is necessary to use it with `std::scoped_lock`.

[GitHub][CI] Run clang-tidy in dedicated container (llvm#164290)

c0073a9

Tested in llvm#160193.

[NFC] Add myself to CODEOWNERS for AMD dialects (llvm#164289)

a39704f

Having taken on a maintainer role for these dialects, make it official with a CODEOWNERS entry. --------- Co-authored-by: Jakub Kuderski <[email protected]>

[ADT] Deprecate StringSwitch Cases with 4+ args. NFC. (llvm#164276)

894eaf4

Suggest the `initializer_list` overload instead. 4+ args is an arbitrary number that allows for incremental deprecation without having to update too many call sites. For more context, see llvm#163117.

[NFC][Clang][AMDGPU] Fix upstream and downstream difference (llvm#164304

c683f21

) These two files were left during the upstream of the corresponding feature.

[clang-format] Respect ColumnLimit while aligning multiline expressio…

c318f82

…ns (llvm#163863) Before the patch, the added test case would indent the function and move its second line beyond the column limit. Fixes llvm#68122.

[Unittest][Cygwin] Set $PATH when running unittests (llvm#163947)

d4af5e6

As the Cygwin platform requires $PATH to be set in order to run unittests, do the same as for the regular Windows target.

[flang] Move parse tree tool to Parser/tools.h (llvm#163998)

5cd9f0f

Move the parse tree utility function semantics::getDesignatorNameIfDataRef to Parser/tools.h and rename it to comply with the local style.

[bazel] Add riscv64 linux platform (llvm#163781)

c3d905e

This fixes a build error when building tensorflow on riscv64 linux.

Merge remote-tracking branch 'origin/main' into map-ptr-array-section…

2ef0068

…-using-attach-maptype

Remove XFAIL from two class-member use_device_ptr tests.

ea97bd6

Remove dead deferred-entry codegen. All non-matching use_device_ptr tr…

605c732

…anslations happen at the end.
