Description
Failures are encountered when tuning a small example with tensorization.
This also affects the work in PR #18182, where this issue was discovered.
Reproducer
- Here is a standalone minimal program reproducing the issue: x86-dense-relax-metaschedule.py
- Swapping out L146 (enabling) and L151 (disabling) makes the program run fine, and the tensorizations work as well.
On success, the program finds and prints the tensorized IR from the tuning database (a rough sketch of the tuning setup is shown below).
On failure (with many obvious errors inside the tuner), the program finds zero tensorized schedules.
This uses the latest TVM HEAD (as in the mentioned PR) with llvm-20.
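For context, the tuning entry point boils down to the `static_shape_tuning` pipeline. The following is only a rough sketch, with a stand-in module and placeholder target/trial/work-dir values rather than the exact ones from the script:

```python
import tvm
from tvm import relax
from tvm.script import relax as R

# Stand-in for the module the reproducer builds (see the IR diff further below).
bb = relax.BlockBuilder()
data = relax.Var("data", R.Tensor((4, 4), "uint8"))
weight = relax.Var("weight", R.Tensor((4, 4), "int8"))
with bb.function("main", [data, weight]):
    with bb.dataflow():
        gv = bb.emit_output(relax.op.matmul(data, weight, out_dtype="int32"))
    bb.emit_func_output(gv)
mod = bb.get()

# Placeholder target: an x86 CPU with the dot-product instructions needed for
# the u8 x i8 -> i32 tensorization.
target = tvm.target.Target("llvm -mcpu=cascadelake -num-cores=4")

with target:
    mod = relax.get_pipeline(
        "static_shape_tuning", target=target, total_trials=64, work_dir="./tuning_logs"
    )(mod)

mod.show()  # on success, the tensorized schedule shows up here and in the database
```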
Observations
- It seems to be caused by a subtle difference in the IR originating from `bb.emit_te(tvm.topi.nn.dense)` vs `bb.emit(relax.op.matmul)`; see the sketch below.
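Below is a minimal sketch (not the exact reproducer; the shapes, dtypes, and the helper name `build` are mine) of the two emit paths that lead to the good ("dense1") and bad ("matmul1") PrimFuncs shown further down:

```python
import tvm
from tvm import relax, topi
from tvm.script import relax as R


def build(use_te: bool) -> tvm.IRModule:
    """Build the 4x4 u8 x i8 -> i32 example with one of the two emit paths."""
    data = relax.Var("data", R.Tensor((4, 4), "uint8"))
    weight = relax.Var("weight", R.Tensor((4, 4), "int8"))
    bb = relax.BlockBuilder()
    with bb.function("main", [data, weight]):
        with bb.dataflow():
            if use_te:
                # TE/TOPI path: lowers immediately to a "dense" PrimFunc that
                # reads the weight as weight[j, k] (NT layout).
                out = bb.emit_te(topi.nn.dense, data, weight, out_dtype="int32")
            else:
                # Relax op path: stays as R.matmul until LegalizeOps, which then
                # produces a "matmul" PrimFunc reading weight[k, j] (NN layout).
                out = bb.emit(relax.op.matmul(data, weight, out_dtype="int32"))
            gv = bb.emit_output(out)
        bb.emit_func_output(gv)
    return bb.get()


build(True).show()   # emit_te variant ("good")
build(False).show()  # relax.op.matmul variant ("bad")
```

The exact operands in the reproducer (e.g. whether the weight is transposed before the matmul) may differ; the point is only where the two PrimFunc variants come from.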
Investigation
- Investigated, but found only a very subtle naming difference between the good and the bad IR:
$ diff -Nru relax-ir-te.txt relax-ir-pp.txt
--- relax-ir-te.txt 2025-08-22 21:42:34.567064474 +0300
+++ relax-ir-pp.txt 2025-08-22 21:46:34.309096420 +0300
@@ -5,22 +5,22 @@
 @I.ir_module
 class Module:
     @T.prim_func(private=True)
-    def dense1(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"), weight: T.Buffer((T.int64(4), T.int64(4)), "int8"), T_matmul_NT: T.Buffer((T.int64(4), T.int64(4)), "int32")):
+    def matmul1(data: T.Buffer((T.int64(4), T.int64(4)), "uint8"), weight: T.Buffer((T.int64(4), T.int64(4)), "int8"), matmul: T.Buffer((T.int64(4), T.int64(4)), "int32")):
         T.func_attr({"layout_free_buffers": [], "op_pattern": 4, "tir.noalias": True})
         # with T.block("root"):
         for i0, i1, k in T.grid(T.int64(4), T.int64(4), T.int64(4)):
-            with T.block("T_matmul_NT"):
+            with T.block("matmul"):
                 v_i0, v_i1, v_k = T.axis.remap("SSR", [i0, i1, k])
-                T.reads(data[v_i0, v_k], weight[v_i1, v_k])
-                T.writes(T_matmul_NT[v_i0, v_i1])
+                T.reads(data[v_i0, v_k], weight[v_k, v_i1])
+                T.writes(matmul[v_i0, v_i1])
                 with T.init():
-                    T_matmul_NT[v_i0, v_i1] = 0
-                T_matmul_NT[v_i0, v_i1] = T_matmul_NT[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_i1, v_k])
+                    matmul[v_i0, v_i1] = 0
+                matmul[v_i0, v_i1] = matmul[v_i0, v_i1] + T.Cast("int32", data[v_i0, v_k]) * T.Cast("int32", weight[v_k, v_i1])
 
     @R.function
     def main(data: R.Tensor((4, 4), dtype="uint8"), weight: R.Tensor((4, 4), dtype="int8")) -> R.Tensor((4, 4), dtype="int32"):
         cls = Module
         with R.dataflow():
-            gv = R.call_tir(cls.dense1, (data, weight), out_sinfo=R.Tensor((4, 4), dtype="int32"))
+            gv = R.call_tir(cls.matmul1, (data, weight), out_sinfo=R.Tensor((4, 4), dtype="int32"))
             R.output(gv)
         return gv
I am seeing only naming differences, no structural or other differences in the IR.
Looking at the differences [(-) good, (+) bad], I cannot tell what causes the tuner failures.
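One way to double-check that the remaining difference is only cosmetic would be a structural comparison with variable names ignored. A minimal sketch, reusing the hypothetical `build()` helper from the sketch above:

```python
import tvm
from tvm import relax

# Legalize the relax.op.matmul variant so both modules contain PrimFuncs,
# then pull out the single PrimFunc from each.
mod_te = relax.transform.LegalizeOps()(build(True))
mod_pp = relax.transform.LegalizeOps()(build(False))

prim_te = [f for _, f in mod_te.functions.items() if isinstance(f, tvm.tir.PrimFunc)][0]
prim_pp = [f for _, f in mod_pp.functions.items() if isinstance(f, tvm.tir.PrimFunc)][0]

# map_free_vars=True makes the comparison ignore variable/buffer names, so a
# False result would point at a real structural difference rather than naming.
print(tvm.ir.structural_equal(prim_te, prim_pp, map_free_vars=True))
```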
Also, there are strange IR-parsing-related errors such as `IndexError: Variable is not defined in the environment: vi`.
I am inclined to think there is a missing pass (beyond static_shape_tuning), but which one?
Or is there a bug related to TIR processing?
Cc @MasterJH5574 @Hzfengsy @mshr-h @tqchen
Cc @fzi-peccia
Thank you!
cc @junrushao