Skip to content

Conversation

@abhinavgaba
Copy link
Owner

@abhinavgaba abhinavgaba commented Jul 16, 2025

This is the initial clang change to support using ATTACH map-type for pointer-attachment.

This builds upon the following:

For example, for the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.


With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                // attach-ptr: spp
  map(spp),                   // attach-ptr: N/A

We first map spp, then spp[0] then spp[0][0] and spp[0][0].a.

This allows us to also group "struct" allocation based on their attach pointers.

Cases that need handling:

  • When a class member like p is a base-pointer in a map from a member function within the same class, p is not being privatized, instead, we still try to create an implicit map of this[0:1], and access p through that, which is incorrect.
 struct S { int *p;
 void f1() {
   #pragma omp target data map(p[0:1])
      printf("%p %p\n", &p, p);
 }
  • Attach-style maps for declare mappers. That should be a separate PR.
  • use_device_addr clause does not work properly, because we don't have a proper component-list set-up for it, just one component, so we cannot find the proper attach-ptr. For use_device_addr, we should match existing maps whose attach-ptr matches the attach-ptr of the use_device_addr operand.
  • use_device_ptr handling has some issues too. Need debugging.
  • Other issues that haven't been found yet.

Some tests still haven't been updated. These include:

  Clang :: OpenMP/copy-gaps-1.cpp
  Clang :: OpenMP/copy-gaps-6.cpp
  Clang :: OpenMP/map_struct_ordering.cpp
  Clang :: OpenMP/target_data_use_device_addr_codegen.cpp
  Clang :: OpenMP/target_data_use_device_ptr_codegen.cpp
  Clang :: OpenMP/target_enter_data_codegen.cpp
  Clang :: OpenMP/target_enter_data_depend_codegen.cpp
  Clang :: OpenMP/target_exit_data_codegen.cpp
  Clang :: OpenMP/target_exit_data_depend_codegen.cpp
  Clang :: OpenMP/target_map_codegen_18c.cpp
  Clang :: OpenMP/target_map_codegen_18d.cpp
  Clang :: OpenMP/target_map_codegen_28.cpp
  Clang :: OpenMP/target_map_codegen_29.cpp
  Clang :: OpenMP/target_map_codegen_31.cpp
  Clang :: OpenMP/target_map_codegen_hold.cpp
  Clang :: OpenMP/target_map_deref_array_codegen.cpp
  Clang :: OpenMP/target_map_member_expr_codegen.cpp
  Clang :: OpenMP/target_update_codegen.cpp
  Clang :: OpenMP/target_update_depend_codegen.cpp

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The libomptarget code will disappear from this PR once llvm#149036 is merged.

const ValueDecl *BaseDecl = nullptr, const Expr *MapExpr = nullptr,
ArrayRef<OMPClauseMappableExprCommon::MappableExprComponentListRef>
OverlappedElements = {},
bool AreBothBasePtrAndPteeMapped = false) const {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AreBothBaseptrAndPteeMapped was used to decide to use PTR_AND_OBJ maps for something like map(p, p[0]). We don't do that now, since we map them independently, and attach them separately.

c8ef and others added 6 commits October 17, 2025 23:41
…ribution. (llvm#163443)

In some cases, loop bounds (lower, upper and step) of `scf.for` can come
locally from the parent warp op the `scf.for`. Current logic will not
yield the loop bounds in the new warp op generated during lowering
causing sinked `scf.for` to have non dominating use.

In this PR, we have added logic to yield loop bounds by default (treat
them as other operands of `scf.for`) which fixes this bug.
The canonicalize is also triggered when the `trunc` is `nuw`.

Proof: https://alive2.llvm.org/ce/z/eWvWe3
Fixes: llvm#162451
…m#163938)

Addresses issue llvm#145937

Without this patch SROA generates new dbg_assign for new stores. We can
simply steal the existing dbg_assigns linked to the old store when the
store is not being split.
This reverts commit 0b9a7b8.

This is causing test failures under LLVM:
1. Other/pass-pipeline-parsing.ll

This broke premerge. This was notably not caught by premerge testing on
the original PR because the original PR only touches polly, and premerge
does not test LLVM when only polly is touched.
AmrDeveloper and others added 30 commits October 20, 2025 18:58
Implement CXXDefaultArgExpr support for ComplexType

Issue llvm#141365
… SSE41 phminposuw intrinsic to be used in constexp (llvm#163041)

Fix llvm#161336
Added support for ConditionalOperator, BinaryConditionalOperator and
OpaqueValueExpr as lvalue.

Implemented support for ternary operators with one branch being a throw
expression. This required weakening the requirement that the true and
false regions of the ternary operator must terminate with a `YieldOp`.
Instead the true and false regions are now allowed to terminate with an
`UnreachableOp` and no `YieldOp` gets emitted when the block throws.
This were all removed in llvm#160028, but I apparently missed this one
instance in the documentation. Remove it given that it no longer works.
This patch adds a new script, premerge_advisor_explain.py that requests
test failure explanations from the premerge advisor. For now it just
prints them out to STDOUT. This allows for testing of the entire system
by looking at failure explanations in failed jobs before we do the rest
of the wiring to enable the premerge advisor to write out comments.
… AVX/AVX512 subvector extraction intrinsics to be used in constexpr llvm#157712 (llvm#162836)

**This PR supersedes and replaces PR llvm#158853**

The original branch diverged too far from the main branch, resulting in
significant merge conflicts that were difficult to resolve cleanly. To
provide a clean and reviewable history, this new PR was created by
cherry-picking the necessary commits onto a fresh branch based on the
latest `main`.

---

*(Original Description)*

This patch enables the use of AVX/AVX512 subvector extraction intrinsics
within `constexpr` functions. This is achieved by implementing the
evaluation logic for these intrinsics in
`VectorExprEvaluator::VisitCallExpr` and `InterpretBuiltin`.

The original discussion and review comments can be found in the previous
pull request for context: llvm#158853

Fixes llvm#157712
The primary purpose of this commit is to enable marking loads to LDS
(global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux
as with normal buffer loads) and to ensure that their !nontemporal
annotations translate to appropriate settings of te cache control bits.

However, in the process of implementing this feature, we also fixed
- Incorrect handling of buffer loads to LDS in GlobalISel
- Updating the handling of volatile on buffers in SIMemoryLegalizer:
previously, the mapping of address spaces would cause volatile on buffer
loads to be silently dropped on at least gfx10.

---------

Co-authored-by: Matt Arsenault <[email protected]>
These two are lowered as if they are the expression: LHS = (LHS < RHS )
? RHS : LHS;
and
LHS = (LHS < RHS ) ? LHS : RHS;

This patch generates these expressions and ensures they are properly
emitted into IR.

Note: this is dependent on
llvm#163580
and cannot be merged until that one is (or the tests will fail).
Replace with PatGprShiftMaskXLen/PatGprShiftMask32 or using the
ShiftMaskXLen/ShiftMask32 ComplexPattern direclty in patterns.

This avoids various casts that were need to make a ComplexPattern work
inside of a PatFrag.
Variant part, represented by `DW_TAG_variant_part` is a structure with a
discriminant and different variants, from which only one can be active
and valid at the same time. The discriminant is the main difference
between variant parts and unions represented by `DW_TAG_union` type.

Variant parts are used by Rust enums, which look like:

```rust
pub enum MyEnum {
    First { a: u32, b: i32 },
    Second(u32),
}
```

This type's debug info is the following `DICompositeType` with
`DW_TAG_structure_type` tag:

```llvm
!4 = !DICompositeType(tag: DW_TAG_structure_type, name: "MyEnum",
     scope: !2, file: !5, size: 96, align: 32, flags: DIFlagPublic,
     elements: !6, templateParams: !16,
     identifier: "faba668fd9f71e9b7cf3b9ac5e8b93cb")
```

With one element being also a `DICompositeType`, but with
`DW_TAG_variant_part` tag:

```llvm
!6 = !{!7}
!7 = !DICompositeType(tag: DW_TAG_variant_part, scope: !4, file: !5,
     size: 96, align: 32, elements: !8, templateParams: !16,
     identifier: "e4aee046fc86d111657622fdcb8c42f7", discriminator: !21)
```

Which has a discriminator:

```llvm
!21 = !DIDerivedType(tag: DW_TAG_member, scope: !4, file: !5,
      baseType: !13, size: 32, align: 32, flags: DIFlagArtificial)
```

Which then holds different variants as `DIDerivedType` elements with
`DW_TAG_member` tag:

```llvm
!8 = !{!9, !17}
!9 = !DIDerivedType(tag: DW_TAG_member, name: "First", scope: !7,
     file: !5, baseType: !10, size: 96, align: 32, extraData: i32 0)
!10 = !DICompositeType(tag: DW_TAG_structure_type, name: "First",
      scope: !4, file: !5, size: 96, align: 32, flags: DIFlagPublic,
      elements: !11, templateParams: !16,
      identifier: "cc7748c842e275452db4205b190c8ff7")
!11 = !{!12, !14}
!12 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !10,
      file: !5, baseType: !13, size: 32, align: 32, offset: 32,
      flags: DIFlagPublic)
!13 = !DIBasicType(name: "u32", size: 32, encoding: DW_ATE_unsigned)
!14 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !10,
      file: !5, baseType: !15, size: 32, align: 32, offset: 64,
      flags: DIFlagPublic)
!15 = !DIBasicType(name: "i32", size: 32, encoding: DW_ATE_signed)
!16 = !{}
!17 = !DIDerivedType(tag: DW_TAG_member, name: "Second", scope: !7,
      file: !5, baseType: !18, size: 96, align: 32, extraData: i32 1)
!18 = !DICompositeType(tag: DW_TAG_structure_type, name: "Second",
      scope: !4, file: !5, size: 96, align: 32, flags: DIFlagPublic,
      elements: !19, templateParams: !16,
      identifier: "a2094b1381f3082d504fbd0903aa7c06")
!19 = !{!20}
!20 = !DIDerivedType(tag: DW_TAG_member, name: "__0", scope: !18,
      file: !5, baseType: !13, size: 32, align: 32, offset: 32,
      flags: DIFlagPublic)
```

BPF backend was assuming that all the elements of any `DICompositeType`
have tag `DW_TAG_member` and are instances of `DIDerivedType`. However,
the single element of the outer composite type `!4` has tag
`DW_TAG_variant_part` and is an instance of `DICompositeType`. The
unconditional call of `cast<DIDerivedType>` on all elements was causing
an assertion failure when any Rust code with enums was compiled to the
BPF target.

Fix that by:

* Handling `DW_TAG_variant_part` in `visitStructType`.
* Replacing unconditional call of `cast<DIDerivedType>` over
`DICompositeType` elements with a `switch` statement, handling both
`DW_TAG_member` and `DW_TAG_variant_part` and casting the element to an
appropriate type (`DIDerivedType` or `DICompositeType`).

Fixes: llvm#155778
Add `try_lock` to confirm to Lockable, which is necessary to use it with
`std::scoped_lock`.
Having taken on a maintainer role for these dialects, make it official
with a CODEOWNERS entry.

---------

Co-authored-by: Jakub Kuderski <[email protected]>
Suggest the `initializer_list` overload instead.

4+ args is an arbitrary number that allows for incremental deprecation
without having too update too many call sites.

For more context, see llvm#163117.
This patch implements llvm::countr_zero_constexpr, a constexpr version
of llvm::countr_zero, in terms of llvm::popcount while making
llvm::popcount a constexpr function at the same time.

The new function is intended to serve as a marker.  When we switch to
C++20, we will most likely go through functions in llvm/ADT/bit.h and
replace them with their counterparts from <bit>.  With
llvm::countr_zero_constexpr, we can easily replace its use with
std::countr_zero.

This patch reimplements ConstantLog2 in terms of the new function.
This rewrite does not preserve numerics: for example, we'd expect the
maximum fp value to yield Inf instead of identity.

`GL.Length` does not allow for fast math flags, so we need to remove
this. Special cases (constants) can be handled via a folder if someone
wants to implement one.
)

These two files were left during the upstream of the corresponding
feature.
Add builders on the Python side that match builders in the C++ side, add tests for launching GPU kernels and regions, and correct some small documentation mistakes. This reflects the API decisions already made in the func dialect's Python bindings and makes use of the GPU dialect's bindings work more similar to C++ interface.
…ns (llvm#163863)

Before the patch the added test case would indent the function and
moving its second line beyond the column limit.

Fixes llvm#68122.
As the Cygwin platform requires $PATH to be set in order to run
unittests, do the same as for the regular Windows target.
…lvm#164039)

Two of the tests are currently asserting, and two are emitting
unexpected results.

The asserting tests will be fixed using the ATTACH-style codegen from
llvm#153683.

The other two involve `use_device_addr` on byrefs, and need more
follow-up codegen changes, that have been noted in a FIXME comment.
…ion variable (llvm#164147)

`@SHLIBDIR@` is replaced by CMake's configuration function, so it must
be in `lit.site.cfg.py.in` but not `lit.cfg.py`. `lit.cfg.py` must
reference variables in generated `lit.site.cfg.py`.

We didn't notice this problem because it only affects Windows (including
MinGW and Cygwin) that are configured with either
LLVM_LINK_LLVM_DYLIB=ON or BUILD_SHARED=ON.
Add OnDiskGraphDB and OnDiskKeyValueDB that can be used to implement
ObjectStore and ActionCache respectively. Those are on-disk persistent
storage that build upon OnDiskTrieHashMap and implements key functions
that are required by LLVMCAS interfaces.

This abstraction layer defines how the objects are hashed and stored on
disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap while OnDiskGraphDB
also defines:
* How objects of various size are store on disk and are referenced by
  the trie nodes.
* How to store the references from one stored object to another object
  that is referenced.

In addition to basic APIs for ObjectStore and ActionCache, other
advances database configuration features can be implemented in this
layer without exposing to the users of the LLVMCAS interface. For
example, OnDiskGraphDB has a faulty in function to fetch data from an
upstream OnDiskGraphDB if the data is missing.
Move the parse tree utility function
semantics::getDesignatorNameIfDataRef to Parser/tools.h and rename it to
comply with the local style.
This fixes a build error when building tensorflow on riscv64 linux.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment