Skip to content

Conversation

mark-sed
Copy link
Contributor

@mark-sed mark-sed commented Oct 8, 2025

This patch removes canRotateDeoptimizingLatchExit check from loop roate
and connected multi rotation option for loop roate (-loop-rotate-multi
option).

The heuristic in canRotateDeoptimizingLatchExit returns true if any of
the loop exits are non-deoptimizing. This means if the loop has multiple
deopt exits, then without multi-rotate, we may still end up having deopt
exit at the latch. This multi-rotate option is introduced but it is off
by default. We have not seen any improvements downstream as well, where
we have deoptimizations significantly in IR. Since the original
heuristic without the multi-rotate effectively ends up being "rotate
under some conditions", but the loop may still not be in the form we
want, we should remove the heuristic and multi-rotate framework entirely

Copy link

github-actions bot commented Oct 8, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Oct 8, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Marek Sedláček (mark-sed)

Changes

After further investigation of improvements to runtime unrolling and to it connected loop rotation (with my attempts in #146540 and #148243) I have looked more into loop rotation and the existing profitability checks.

The canRotateDeoptimizingLatchExit check does not seem to work as intended. It was introduced in https://gitlab.azulsystems.com/dev/orca/-/commit/2f6987ba61cc31c16c64f511e5cbc76b52dc67b3 for the loop-rotate-multi-flag, where this flag was disabled and there was an intent to enable it by default later (https://reviews.llvm.org/D73058) and this has not happen even 5 years later.

I have done multiple experiments in our downstream with multi-rotate and with this check. We suggest removal of this heuristic and multi-rotate as well.

Note that the diff is big, but it's just removal of while loop and indentation change.

After this patch I would like to continue here and propose adding a computability check for exit count, but that will be in a separate PR.

Requests for review: @annamthomas @fhahn @davemgreen


Patch is 61.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162482.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopRotationUtils.cpp (+437-504)
  • (removed) llvm/test/Transforms/LoopRotate/multiple-deopt-exits.ll (-164)
  • (removed) llvm/test/Transforms/LoopRotate/multiple-exits.ll (-236)
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index 7cc9ff8b11139..f0f7bbb64da32 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -45,12 +45,6 @@ STATISTIC(NumInstrsHoisted,
           "Number of instructions hoisted into loop preheader");
 STATISTIC(NumInstrsDuplicated,
           "Number of instructions cloned into loop preheader");
-STATISTIC(NumRotated, "Number of loops rotated");
-
-static cl::opt<bool>
-    MultiRotate("loop-rotate-multi", cl::init(false), cl::Hidden,
-                cl::desc("Allow loop rotation multiple times in order to reach "
-                         "a better latch exit"));
 
 // Probability that a rotated loop has zero trip count / is never entered.
 static constexpr uint32_t ZeroTripCountWeights[] = {1, 127};
@@ -206,50 +200,6 @@ static bool profitableToRotateLoopExitingLatch(Loop *L) {
   return false;
 }
 
-// Check that latch exit is deoptimizing (which means - very unlikely to happen)
-// and there is another exit from the loop which is non-deoptimizing.
-// If we rotate latch to that exit our loop has a better chance of being fully
-// canonical.
-//
-// It can give false positives in some rare cases.
-static bool canRotateDeoptimizingLatchExit(Loop *L) {
-  BasicBlock *Latch = L->getLoopLatch();
-  assert(Latch && "need latch");
-  BranchInst *BI = dyn_cast<BranchInst>(Latch->getTerminator());
-  // Need normal exiting latch.
-  if (!BI || !BI->isConditional())
-    return false;
-
-  BasicBlock *Exit = BI->getSuccessor(1);
-  if (L->contains(Exit))
-    Exit = BI->getSuccessor(0);
-
-  // Latch exit is non-deoptimizing, no need to rotate.
-  if (!Exit->getPostdominatingDeoptimizeCall())
-    return false;
-
-  SmallVector<BasicBlock *, 4> Exits;
-  L->getUniqueExitBlocks(Exits);
-  if (!Exits.empty()) {
-    // There is at least one non-deoptimizing exit.
-    //
-    // Note, that BasicBlock::getPostdominatingDeoptimizeCall is not exact,
-    // as it can conservatively return false for deoptimizing exits with
-    // complex enough control flow down to deoptimize call.
-    //
-    // That means here we can report success for a case where
-    // all exits are deoptimizing but one of them has complex enough
-    // control flow (e.g. with loops).
-    //
-    // That should be a very rare case and false positives for this function
-    // have compile-time effect only.
-    return any_of(Exits, [](const BasicBlock *BB) {
-      return !BB->getPostdominatingDeoptimizeCall();
-    });
-  }
-  return false;
-}
-
 static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
                                 bool HasConditionalPreHeader,
                                 bool SuccsSwapped) {
@@ -387,506 +337,489 @@ static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
 /// rotation. LoopRotate should be repeatable and converge to a canonical
 /// form. This property is satisfied because simplifying the loop latch can only
 /// happen once across multiple invocations of the LoopRotate pass.
-///
-/// If -loop-rotate-multi is enabled we can do multiple rotations in one go
-/// so to reach a suitable (non-deoptimizing) exit.
 bool LoopRotate::rotateLoop(Loop *L, bool SimplifiedLatch) {
   // If the loop has only one block then there is not much to rotate.
   if (L->getBlocks().size() == 1)
     return false;
 
   bool Rotated = false;
-  do {
-    BasicBlock *OrigHeader = L->getHeader();
-    BasicBlock *OrigLatch = L->getLoopLatch();
-
-    BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
-    if (!BI || BI->isUnconditional())
+  BasicBlock *OrigHeader = L->getHeader();
+  BasicBlock *OrigLatch = L->getLoopLatch();
+
+  BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
+  if (!BI || BI->isUnconditional())
+    return Rotated;
+
+  // If the loop header is not one of the loop exiting blocks then
+  // either this loop is already rotated or it is not
+  // suitable for loop rotation transformations.
+  if (!L->isLoopExiting(OrigHeader))
+    return Rotated;
+
+  // If the loop latch already contains a branch that leaves the loop then the
+  // loop is already rotated.
+  if (!OrigLatch)
+    return Rotated;
+
+  // Rotate if the loop latch was just simplified. Or if it makes the loop exit
+  // count computable. Or if we think it will be profitable.
+  if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
+      !profitableToRotateLoopExitingLatch(L))
+    return Rotated;
+
+  // Check size of original header and reject loop if it is very big or we can't
+  // duplicate blocks inside it.
+  {
+    SmallPtrSet<const Value *, 32> EphValues;
+    CodeMetrics::collectEphemeralValues(L, AC, EphValues);
+
+    CodeMetrics Metrics;
+    Metrics.analyzeBasicBlock(OrigHeader, *TTI, EphValues, PrepareForLTO);
+    if (Metrics.notDuplicatable) {
+      LLVM_DEBUG(
+                  dbgs() << "LoopRotation: NOT rotating - contains non-duplicatable"
+                  << " instructions: ";
+                  L->dump());
       return Rotated;
-
-    // If the loop header is not one of the loop exiting blocks then
-    // either this loop is already rotated or it is not
-    // suitable for loop rotation transformations.
-    if (!L->isLoopExiting(OrigHeader))
-      return Rotated;
-
-    // If the loop latch already contains a branch that leaves the loop then the
-    // loop is already rotated.
-    if (!OrigLatch)
+    }
+    if (Metrics.Convergence != ConvergenceKind::None) {
+      LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains convergent "
+                  "instructions: ";
+                  L->dump());
       return Rotated;
-
-    // Rotate if either the loop latch does *not* exit the loop, or if the loop
-    // latch was just simplified. Or if we think it will be profitable.
-    if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
-        !profitableToRotateLoopExitingLatch(L) &&
-        !canRotateDeoptimizingLatchExit(L))
+    }
+    if (!Metrics.NumInsts.isValid()) {
+      LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains instructions"
+                  " with invalid cost: ";
+                  L->dump());
       return Rotated;
-
-    // Check size of original header and reject loop if it is very big or we can't
-    // duplicate blocks inside it.
-    {
-      SmallPtrSet<const Value *, 32> EphValues;
-      CodeMetrics::collectEphemeralValues(L, AC, EphValues);
-
-      CodeMetrics Metrics;
-      Metrics.analyzeBasicBlock(OrigHeader, *TTI, EphValues, PrepareForLTO);
-      if (Metrics.notDuplicatable) {
-        LLVM_DEBUG(
-                   dbgs() << "LoopRotation: NOT rotating - contains non-duplicatable"
-                   << " instructions: ";
-                   L->dump());
-        return Rotated;
-      }
-      if (Metrics.Convergence != ConvergenceKind::None) {
-        LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains convergent "
-                   "instructions: ";
-                   L->dump());
-        return Rotated;
-      }
-      if (!Metrics.NumInsts.isValid()) {
-        LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains instructions"
-                   " with invalid cost: ";
-                   L->dump());
-        return Rotated;
-      }
-      if (Metrics.NumInsts > MaxHeaderSize) {
-        LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
-                          << Metrics.NumInsts
-                          << " instructions, which is more than the threshold ("
-                          << MaxHeaderSize << " instructions): ";
-                   L->dump());
-        ++NumNotRotatedDueToHeaderSize;
-        return Rotated;
-      }
-
-      // When preparing for LTO, avoid rotating loops with calls that could be
-      // inlined during the LTO stage.
-      if (PrepareForLTO && Metrics.NumInlineCandidates > 0)
-        return Rotated;
     }
-
-    // Now, this loop is suitable for rotation.
-    BasicBlock *OrigPreheader = L->getLoopPreheader();
-
-    // If the loop could not be converted to canonical form, it must have an
-    // indirectbr in it, just give up.
-    if (!OrigPreheader || !L->hasDedicatedExits())
+    if (Metrics.NumInsts > MaxHeaderSize) {
+      LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
+                        << Metrics.NumInsts
+                        << " instructions, which is more than the threshold ("
+                        << MaxHeaderSize << " instructions): ";
+                  L->dump());
+      ++NumNotRotatedDueToHeaderSize;
       return Rotated;
-
-    // Anything ScalarEvolution may know about this loop or the PHI nodes
-    // in its header will soon be invalidated. We should also invalidate
-    // all outer loops because insertion and deletion of blocks that happens
-    // during the rotation may violate invariants related to backedge taken
-    // infos in them.
-    if (SE) {
-      SE->forgetTopmostLoop(L);
-      // We may hoist some instructions out of loop. In case if they were cached
-      // as "loop variant" or "loop computable", these caches must be dropped.
-      // We also may fold basic blocks, so cached block dispositions also need
-      // to be dropped.
-      SE->forgetBlockAndLoopDispositions();
     }
 
-    LLVM_DEBUG(dbgs() << "LoopRotation: rotating "; L->dump());
-    if (MSSAU && VerifyMemorySSA)
-      MSSAU->getMemorySSA()->verifyMemorySSA();
-
-    // Find new Loop header. NewHeader is a Header's one and only successor
-    // that is inside loop.  Header's other successor is outside the
-    // loop.  Otherwise loop is not suitable for rotation.
-    BasicBlock *Exit = BI->getSuccessor(0);
-    BasicBlock *NewHeader = BI->getSuccessor(1);
-    bool BISuccsSwapped = L->contains(Exit);
-    if (BISuccsSwapped)
-      std::swap(Exit, NewHeader);
-    assert(NewHeader && "Unable to determine new loop header");
-    assert(L->contains(NewHeader) && !L->contains(Exit) &&
-           "Unable to determine loop header and exit blocks");
-
-    // This code assumes that the new header has exactly one predecessor.
-    // Remove any single-entry PHI nodes in it.
-    assert(NewHeader->getSinglePredecessor() &&
-           "New header doesn't have one pred!");
-    FoldSingleEntryPHINodes(NewHeader);
-
-    // Begin by walking OrigHeader and populating ValueMap with an entry for
-    // each Instruction.
-    BasicBlock::iterator I = OrigHeader->begin(), E = OrigHeader->end();
-    ValueToValueMapTy ValueMap, ValueMapMSSA;
-
-    // For PHI nodes, the value available in OldPreHeader is just the
-    // incoming value from OldPreHeader.
-    for (; PHINode *PN = dyn_cast<PHINode>(I); ++I)
-      InsertNewValueIntoMap(ValueMap, PN,
-                            PN->getIncomingValueForBlock(OrigPreheader));
-
-    // For the rest of the instructions, either hoist to the OrigPreheader if
-    // possible or create a clone in the OldPreHeader if not.
-    Instruction *LoopEntryBranch = OrigPreheader->getTerminator();
-
-    // Record all debug records preceding LoopEntryBranch to avoid
-    // duplication.
-    using DbgHash =
-        std::pair<std::pair<hash_code, DILocalVariable *>, DIExpression *>;
-    auto makeHash = [](const DbgVariableRecord *D) -> DbgHash {
-      auto VarLocOps = D->location_ops();
-      return {{hash_combine_range(VarLocOps), D->getVariable()},
-              D->getExpression()};
-    };
-
-    SmallDenseSet<DbgHash, 8> DbgRecords;
-    // Build DbgVariableRecord hashes for DbgVariableRecords attached to the
-    // terminator.
-    for (const DbgVariableRecord &DVR :
-         filterDbgVars(OrigPreheader->getTerminator()->getDbgRecordRange()))
-      DbgRecords.insert(makeHash(&DVR));
-
-    // Remember the local noalias scope declarations in the header. After the
-    // rotation, they must be duplicated and the scope must be cloned. This
-    // avoids unwanted interaction across iterations.
-    SmallVector<NoAliasScopeDeclInst *, 6> NoAliasDeclInstructions;
-    for (Instruction &I : *OrigHeader)
-      if (auto *Decl = dyn_cast<NoAliasScopeDeclInst>(&I))
-        NoAliasDeclInstructions.push_back(Decl);
-
-    Module *M = OrigHeader->getModule();
-
-    // Track the next DbgRecord to clone. If we have a sequence where an
-    // instruction is hoisted instead of being cloned:
-    //    DbgRecord blah
-    //    %foo = add i32 0, 0
-    //    DbgRecord xyzzy
-    //    %bar = call i32 @foobar()
-    // where %foo is hoisted, then the DbgRecord "blah" will be seen twice, once
-    // attached to %foo, then when %foo his hoisted it will "fall down" onto the
-    // function call:
-    //    DbgRecord blah
-    //    DbgRecord xyzzy
-    //    %bar = call i32 @foobar()
-    // causing it to appear attached to the call too.
-    //
-    // To avoid this, cloneDebugInfoFrom takes an optional "start cloning from
-    // here" position to account for this behaviour. We point it at any
-    // DbgRecords on the next instruction, here labelled xyzzy, before we hoist
-    // %foo. Later, we only only clone DbgRecords from that position (xyzzy)
-    // onwards, which avoids cloning DbgRecord "blah" multiple times. (Stored as
-    // a range because it gives us a natural way of testing whether
-    //  there were DbgRecords on the next instruction before we hoisted things).
-    iterator_range<DbgRecord::self_iterator> NextDbgInsts =
-        (I != E) ? I->getDbgRecordRange() : DbgMarker::getEmptyDbgRecordRange();
-
-    while (I != E) {
-      Instruction *Inst = &*I++;
-
-      // If the instruction's operands are invariant and it doesn't read or write
-      // memory, then it is safe to hoist.  Doing this doesn't change the order of
-      // execution in the preheader, but does prevent the instruction from
-      // executing in each iteration of the loop.  This means it is safe to hoist
-      // something that might trap, but isn't safe to hoist something that reads
-      // memory (without proving that the loop doesn't write).
-      if (L->hasLoopInvariantOperands(Inst) && !Inst->mayReadFromMemory() &&
-          !Inst->mayWriteToMemory() && !Inst->isTerminator() &&
-          !isa<AllocaInst>(Inst) &&
-          // It is not safe to hoist the value of these instructions in
-          // coroutines, as the addresses of otherwise eligible variables (e.g.
-          // thread-local variables and errno) may change if the coroutine is
-          // resumed in a different thread.Therefore, we disable this
-          // optimization for correctness. However, this may block other correct
-          // optimizations.
-          // FIXME: This should be reverted once we have a better model for
-          // memory access in coroutines.
-          !Inst->getFunction()->isPresplitCoroutine()) {
-
-        if (!NextDbgInsts.empty()) {
-          auto DbgValueRange =
-              LoopEntryBranch->cloneDebugInfoFrom(Inst, NextDbgInsts.begin());
-          RemapDbgRecordRange(M, DbgValueRange, ValueMap,
-                              RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
-          // Erase anything we've seen before.
-          for (DbgVariableRecord &DVR :
-               make_early_inc_range(filterDbgVars(DbgValueRange)))
-            if (DbgRecords.count(makeHash(&DVR)))
-              DVR.eraseFromParent();
-        }
-
-        NextDbgInsts = I->getDbgRecordRange();
-
-        Inst->moveBefore(LoopEntryBranch->getIterator());
+    // When preparing for LTO, avoid rotating loops with calls that could be
+    // inlined during the LTO stage.
+    if (PrepareForLTO && Metrics.NumInlineCandidates > 0)
+      return Rotated;
+  }
 
-        ++NumInstrsHoisted;
-        continue;
-      }
+  // Now, this loop is suitable for rotation.
+  BasicBlock *OrigPreheader = L->getLoopPreheader();
+
+  // If the loop could not be converted to canonical form, it must have an
+  // indirectbr in it, just give up.
+  if (!OrigPreheader || !L->hasDedicatedExits())
+    return Rotated;
+
+  // Anything ScalarEvolution may know about this loop or the PHI nodes
+  // in its header will soon be invalidated. We should also invalidate
+  // all outer loops because insertion and deletion of blocks that happens
+  // during the rotation may violate invariants related to backedge taken
+  // infos in them.
+  if (SE) {
+    SE->forgetTopmostLoop(L);
+    // We may hoist some instructions out of loop. In case if they were cached
+    // as "loop variant" or "loop computable", these caches must be dropped.
+    // We also may fold basic blocks, so cached block dispositions also need
+    // to be dropped.
+    SE->forgetBlockAndLoopDispositions();
+  }
 
-      // Otherwise, create a duplicate of the instruction.
-      Instruction *C = Inst->clone();
-      if (const DebugLoc &DL = C->getDebugLoc())
-        mapAtomInstance(DL, ValueMap);
+  LLVM_DEBUG(dbgs() << "LoopRotation: rotating "; L->dump());
+  if (MSSAU && VerifyMemorySSA)
+    MSSAU->getMemorySSA()->verifyMemorySSA();
 
-      C->insertBefore(LoopEntryBranch->getIterator());
+  // Find new Loop header. NewHeader is a Header's one and only successor
+  // that is inside loop.  Header's other successor is outside the
+  // loop.  Otherwise loop is not suitable for rotation.
+  BasicBlock *Exit = BI->getSuccessor(0);
+  BasicBlock *NewHeader = BI->getSuccessor(1);
+  bool BISuccsSwapped = L->contains(Exit);
+  if (BISuccsSwapped)
+    std::swap(Exit, NewHeader);
+  assert(NewHeader && "Unable to determine new loop header");
+  assert(L->contains(NewHeader) && !L->contains(Exit) &&
+          "Unable to determine loop header and exit blocks");
+
+  // This code assumes that the new header has exactly one predecessor.
+  // Remove any single-entry PHI nodes in it.
+  assert(NewHeader->getSinglePredecessor() &&
+          "New header doesn't have one pred!");
+  FoldSingleEntryPHINodes(NewHeader);
+
+  // Begin by walking OrigHeader and populating ValueMap with an entry for
+  // each Instruction.
+  BasicBlock::iterator I = OrigHeader->begin(), E = OrigHeader->end();
+  ValueToValueMapTy ValueMap, ValueMapMSSA;
+
+  // For PHI nodes, the value available in OldPreHeader is just the
+  // incoming value from OldPreHeader.
+  for (; PHINode *PN = dyn_cast<PHINode>(I); ++I)
+    InsertNewValueIntoMap(ValueMap, PN,
+                          PN->getIncomingValueForBlock(OrigPreheader));
+
+  // For the rest of the instructions, either hoist to the OrigPreheader if
+  // possible or create a clone in the OldPreHeader if not.
+  Instruction *LoopEntryBranch = OrigPreheader->getTerminator();
+
+  // Record all debug records preceding LoopEntryBranch to avoid
+  // duplication.
+  using DbgHash =
+      std::pair<std::pair<hash_code, DILocalVariable *>, DIExpression *>;
+  auto makeHash = [](const DbgVariableRecord *D) -> DbgHash {
+    auto VarLocOps = D->location_ops();
+    return {{hash_combine_range(VarLocOps), D->getVariable()},
+            D->getExpression()};
+  };
 
-      ++NumInstrsDuplicated;
+  SmallDenseSet<DbgHash, 8> DbgRecords;
+  // Build DbgVariableRecord hashes for DbgVariableRecords attached to the
+  // terminator.
+  for (const DbgVariableRecord &DVR :
+        filterDbgVars(OrigPreheader->getTerminator()->getDbgRecordRange()))
+    DbgRecords.insert(makeHash(&DVR));
+
+  // Remember the local noalias scope declarations in the header. After the
+  // rotation, they must be duplicated and the scope must be cloned. This
+  // avoids unwanted interaction across iterations.
+  SmallVector<NoAliasScopeDeclInst *, 6> NoAliasDeclInstructions;
+  for (Instruction &I : *OrigHeader)
+    if (auto *Decl = dyn_cast<NoAliasScopeDeclInst>(&I))
+      NoAliasDeclInstructions.push_back(Decl);
+
+  Module *M = OrigHeader->getModule();
+
+  // Track the next DbgRecord to clone. If we have a sequence where an
+  // instruction is hoisted instead of being cloned:
+  //    DbgRecord blah
+  //    %foo = add i32 0, 0
+  //    DbgRecord xyzzy
+  //    %bar = call i32 @foobar()
+  // where %foo is hoisted, then the DbgRecord "blah" will be seen twice, once
+  // attached to %foo, then when %foo his hoisted it will "fall down" onto the
+  // function call:
+  //    DbgRecord blah
+  //    DbgRecord xyzzy
+  //    %bar = call i32 @foobar()
+  // causing it to appear...
[truncated]

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not making use of this downstream, then I agree that we should remove this code.

@nikic
Copy link
Contributor

nikic commented Oct 8, 2025

The canRotateDeoptimizingLatchExit check does not seem to work as intended. It was introduced in https://gitlab.azulsystems.com/dev/orca/-/commit/2f6987ba61cc31c16c64f511e5cbc76b52dc67b3 for the loop-rotate-multi-flag, where this flag was disabled and there was an intent to enable it by default later (https://reviews.llvm.org/D73058) and this has not happen even 5 years later.

Note that the first reference here is non-public, was that intentional?

Copy link

github-actions bot commented Oct 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@mark-sed
Copy link
Contributor Author

mark-sed commented Oct 8, 2025

Note that the first reference here is non-public, was that intentional?

No a mistake on my side, which I caught, but you were faster. Sorry for that

@annamthomas
Copy link
Contributor

The canRotateDeoptimizingLatchExit check does not seem to work as intended. It was introduced in https://gitlab.azulsystems.com/dev/orca/-/commit/2f6987ba61cc31c16c64f511e5cbc76b52dc67b3 for the loop-rotate-multi-flag, where this flag was disabled and there was an intent to enable it by default later (https://reviews.llvm.org/D73058) and this has not happen even 5 years later.

Note that the first reference here is non-public, was that intentional?

@mark-sed pls note above comment! Pls reference the original LLVM Phabricator review (or llvm commit). Also, pls update the commit MR to state why exactly the current heuristic doesn't work as intended without the multi-rotate option (thats an important detail apart from the fact that the multi-rotate option is off by default).

"The heuristic in canRotateDeoptimizingLatchExit returns true if any of the loop exits are non-deoptimizing. This means if the loop has multiple deopt exits, then without multi-rotate, we may still end up having deopt exit at the latch. This multi-rotate option is introduced but it is off by default. We have not seen any improvements downstream as well, where we have deoptimizations significantly in IR. Since the original heuristic without the multi-rotate effectively ends up being "rotate under some conditions", but the loop may still not be in the form we want, we should remove the heuristic and multi-rotate framework entirely"

and connected multi rotation option for loop roate (-loop-rotate-multi
option).

The heuristic in canRotateDeoptimizingLatchExit returns true if any of
the loop exits are non-deoptimizing. This means if the loop has multiple
deopt exits, then without multi-rotate, we may still end up having deopt
exit at the latch. This multi-rotate option is introduced but it is off
by default. We have not seen any improvements downstream as well, where
we have deoptimizations significantly in IR. Since the original
heuristic without the multi-rotate effectively ends up being "rotate
under some conditions", but the loop may still not be in the form we
want, we should remove the heuristic and multi-rotate framework entirely
@mark-sed mark-sed force-pushed the remove-canRotateDeoptimizingLatchExit branch from 9aad318 to ee1bc0a Compare October 8, 2025 14:17
@annamthomas
Copy link
Contributor

@mark-sed does not have commit access. He's asked me to push it for him. I'll enable auto-merge, which should push the changes once CI passes.

@annamthomas annamthomas enabled auto-merge (squash) October 8, 2025 14:29
@nikic
Copy link
Contributor

nikic commented Oct 8, 2025

Note that the commit message is taken from the PR description, so you'll want to add the information you added to the commit message there as well.

@mark-sed
Copy link
Contributor Author

mark-sed commented Oct 8, 2025

Note that the commit message is taken from the PR description, so you'll want to add the information you added to the commit message there as well.

Thank you, I did not realize that. I have updated it.

@annamthomas
Copy link
Contributor

Yep. I realized that. I've already updated the commit message directly when it prompted during auto-merge.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sounds like a great candidate for removal, thanks!

@annamthomas annamthomas merged commit 6ccb487 into llvm:main Oct 8, 2025
9 checks passed
Copy link

github-actions bot commented Oct 8, 2025

@mark-sed Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
…162482)

This patch removes canRotateDeoptimizingLatchExit check from loop roate
and connected multi rotation option for loop roate (-loop-rotate-multi
option).

The heuristic in canRotateDeoptimizingLatchExit returns true if any of
the loop exits are non-deoptimizing. This means if the loop has multiple
deopt exits, then without multi-rotate, we may still end up having deopt
exit at the latch. This multi-rotate option is introduced but it is off
by default. 

We have not seen any improvements downstream as well, where
we have frequent number of deoptimizations in IR. Since the original
heuristic without the multi-rotate effectively ends up being "rotate
under some conditions", but the loop may still not be in the form we
want, we should remove the heuristic and multi-rotate framework entirely

Note that the diff is big, but it's just removal of while loop and
indentation change.

After this patch I would like to continue here and propose adding a
computability check for exit count, but that will be in a separate PR.

Requests for review: @annamthomas @fhahn @davemgreen

Co-authored-by: Marek Sedlacek <[email protected]>
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025
…lvm#162482)

This patch removes canRotateDeoptimizingLatchExit check from loop roate
and connected multi rotation option for loop roate (-loop-rotate-multi
option).

The heuristic in canRotateDeoptimizingLatchExit returns true if any of
the loop exits are non-deoptimizing. This means if the loop has multiple
deopt exits, then without multi-rotate, we may still end up having deopt
exit at the latch. This multi-rotate option is introduced but it is off
by default. 

We have not seen any improvements downstream as well, where
we have frequent number of deoptimizations in IR. Since the original
heuristic without the multi-rotate effectively ends up being "rotate
under some conditions", but the loop may still not be in the form we
want, we should remove the heuristic and multi-rotate framework entirely

Note that the diff is big, but it's just removal of while loop and
indentation change.

After this patch I would like to continue here and propose adding a
computability check for exit count, but that will be in a separate PR.

Requests for review: @annamthomas @fhahn @davemgreen

Co-authored-by: Marek Sedlacek <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants