
[Bug][RELAX][PIPELINE] Metaschedule failure due to pipeline passes #18224

@cbalint13

Failures are encountered when tuning a small example with tensorization.
This also affects the work in PR #18182, where this issue was discovered.


Reproducer

  • Here is a standalone minimal program reproducing the issue: x86-dense-relax-metaschedule.py
  • Swapping out L146 (enabling) and L151 (disabling) makes the program run fine, and the tensorizations also work.

On success, the program finds and prints the tensorized IR from the tuning database (a sketch of inspecting that database follows below).
On failure (with many obvious errors inside the tuner), the program finds zero tensorized schedules.

Using the latest TVM HEAD (as in the mentioned PR) with LLVM 20.
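
For reference, a minimal sketch of how the stored schedules could be inspected; the work_dir, the database file names, and the intrinsic substring used to spot tensorization are my assumptions, not something taken from the reproducer:

import tvm
from tvm import meta_schedule as ms

# Assumed work_dir and the default database file names written by meta_schedule.
db = ms.database.JSONDatabase(
    "tuning_logs/database_workload.json",
    "tuning_logs/database_tuning_record.json",
)

for rec in db.get_all_tuning_records():
    # Re-apply the recorded trace to its workload to obtain the tuned TIR.
    sch = tvm.tir.Schedule(rec.workload.mod)
    rec.trace.apply_to_schedule(sch, remove_postproc=False)
    script = sch.mod.script()
    # Heuristic (assumed) marker: tensorized x86 schedules end up calling an
    # LLVM intrinsic instead of the plain scalar multiply-accumulate loop.
    if "call_llvm_pure_intrin" in script:
        print(script)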

Description

  • It seems to be a subtle difference in the IR originating from bb.emit_te(tvm.topi.nn.dense) vs bb.emit(relax.op.matmul); see the sketch right after this bullet.
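
For context, a minimal sketch of the two construction paths, with shapes/dtypes taken from the reproducer and everything else (names, helper structure) assumed. Note that topi.nn.dense is an NT matmul (weight accessed transposed) while relax.op.matmul is NN, which is where the swapped weight indices in the diff further down come from:

import tvm
from tvm import relax, topi

def build(use_te: bool) -> tvm.IRModule:
    bb = relax.BlockBuilder()
    data = relax.Var("data", relax.TensorStructInfo((4, 4), "uint8"))
    weight = relax.Var("weight", relax.TensorStructInfo((4, 4), "int8"))
    with bb.function("main", [data, weight]):
        with bb.dataflow():
            if use_te:
                # "good" path: TE dense lowered to TIR immediately via emit_te
                lv = bb.emit_te(topi.nn.dense, data, weight, out_dtype="int32")
            else:
                # "bad" path: relax matmul op, turned into TIR later by LegalizeOps
                lv = bb.emit(relax.op.matmul(data, weight, out_dtype="int32"))
            gv = bb.emit_output(lv)
        bb.emit_func_output(gv)
    return bb.get()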

Investigation

  • Investigated, but found only very subtle naming differences between the good/bad IR:
$ diff -Nru relax-ir-te.txt relax-ir-pp.txt 
--- relax-ir-te.txt	2025-08-22 21:42:34.567064474 +0300
+++ relax-ir-pp.txt	2025-08-22 21:46:34.309096420 +0300
@@ -5,22 +5,22 @@
 @I.ir_module
 class Module:
     @T.prim_func(private=True)
-    def dense1(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"), weight: T.Buffer((T.int64(4), T.int64(4)), "int8"), T_matmul_NT: T.Buffer((T.int64(4), T.int64(4)), "int32")):
+    def matmul1(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"), weight: T.Buffer((T.int64(4), T.int64(4)), "int8"), matmul: T.Buffer((T.int64(4), T.int64(4)), "int32")):
         T.func_attr({"layout_free_buffers": [], "op_pattern": 4, "tir.noalias": True})
         # with T.block("root"):
         for i0, i1, k in T.grid(T.int64(4), T.int64(4), T.int64(4)):
-            with T.block("T_matmul_NT"):
+            with T.block("matmul"):
                 v_i0, v_i1, v_k = T.axis.remap("SSR", [i0, i1, k])
-                T.reads(data[v_i0, v_k], weight[v_i1, v_k])
-                T.writes(T_matmul_NT[v_i0, v_i1])
+                T.reads(data[v_i0, v_k], weight[v_k, v_i1])
+                T.writes(matmul[v_i0, v_i1])
                 with T.init():
-                    T_matmul_NT[v_i0, v_i1] = 0
-                T_matmul_NT[v_i0, v_i1] = T_matmul_NT[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_i1, v_k])
+                    matmul[v_i0, v_i1] = 0
+                matmul[v_i0, v_i1] = matmul[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_k, v_i1])
 
     @R.function
     def main(data: R.Tensor((4, 4), dtype="uint8"), weight: R.Tensor((4, 4), dtype="int8")) -> R.Tensor((4, 4), dtype="int32"):
         cls = Module
         with R.dataflow():
-            gv = R.call_tir(cls.dense1, (data, weight), out_sinfo=R.Tensor((4, 4), dtype="int32"))
+            gv = R.call_tir(cls.matmul1, (data, weight), out_sinfo=R.Tensor((4, 4), dtype="int32"))
             R.output(gv)
         return gv

I am seeing only naming differences, no structural or other differences in the IR.
Looking at the differences [(-) good, (+) bad], I cannot tell what causes the tuner failures.
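
A minimal sketch (reusing the hypothetical build() helper from above) of checking this mechanically; structural equality ignores variable and block names, so whatever it still reports as unequal, e.g. the swapped weight indices visible in the diff, is a genuine structural difference rather than a cosmetic one:

import tvm
from tvm import relax

mod_te = build(use_te=True)                                  # emit_te(topi.nn.dense)
mod_pp = relax.transform.LegalizeOps()(build(use_te=False))  # relax.op.matmul lowered to TIR

# The global var names below are assumptions; check mod_te.script() / mod_pp.script().
lhs = mod_te["dense"]
rhs = mod_pp["matmul"]

# map_free_vars matches up differently named parameters, so any inequality left
# over is structural, not a naming artifact.
print(tvm.ir.structural_equal(lhs, rhs, map_free_vars=True))

tvm.ir.assert_structural_equal can be used instead to print the first mismatching path.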

Also, there are strange IR-parsing-related issues, like IndexError: Variable is not defined in the environment: vi


I am inclined to think there is a missing pass (beyond static_shape_tuning), but which one?
Or is there a bug related to TIR processing? A sketch of forcing the usual lowering passes explicitly is below.
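
To make the question concrete, here is a minimal sketch (target, trial count, and the exact pipeline keyword names are my assumptions) of running the usual Relax lowering passes explicitly before the static_shape_tuning pipeline, to see whether one of them provides whatever the emit_te path gets implicitly:

import tvm
from tvm import relax

target = tvm.target.Target("llvm -mcpu=core-avx2")  # assumed x86 target
mod = build(use_te=False)                           # the failing variant from above

mod = tvm.ir.transform.Sequential(
    [
        relax.transform.LegalizeOps(),           # relax ops -> TIR prim_funcs
        relax.transform.AnnotateTIROpPattern(),  # op_pattern attrs consumed by fusion
        relax.transform.FuseOps(),
        relax.transform.FuseTIR(),
    ]
)(mod)

# Hand the normalized module to the pipeline named in this report.
mod = relax.get_pipeline(
    "static_shape_tuning", target=target, work_dir="tuning_logs", total_trials=64
)(mod)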

Cc @MasterJH5574 @Hzfengsy @mshr-h @tqchen
Cc @fzi-peccia

Thank you!

cc @junrushao
