
[Bug]: Another issue with Inductor partition codegen for attn+nvfp4 quant fusion #26988

@ProExpertProg

Description


Your current environment

The output of python collect_env.py

🐛 Describe the bug

Happens on:

Command:

pytest tests/compile/test_fusions_e2e.py -s -v

# not tested but should be reproducible on just #26738 with:
python examples/offline_inference/basic/generate.py --model=nvidia/Llama-4-Scout-17B-16E-Instruct-FP4 --kv-cache-dtype=fp8 -O.pass_config='{"enable_noop":true, "enable_attn_fusion": true}' -O.use_inductor_graph_partition

# tested, also reproduces:
python examples/offline_inference/basic/generate.py --model RedHatAI/Qwen3-30B-A3B-NVFP4 --kv-cache-dtype=fp8 --no-enable-prefix-caching -O.pass_config='{"enable_attn_fusion":true,"enable_noop":true}' -O.use_inductor_graph_partition=True -O.cudagraph_mode=FULL_AND_PIECEWISE

# this works:
chg run -g=1 -- python examples/offline_inference/basic/generate.py --model RedHatAI/Qwen3-30B-A3B-NVFP4 --kv-cache-dtype=fp8 --no-enable-prefix-caching -O.pass_config='{"enable_attn_fusion":false,"enable_noop":true}' -O.use_inductor_graph_partition=True -O.cudagraph_mode=FULL_AND_PIECEWISE

Failing test; note that the same model with FP8 quant (`nvidia/Llama-4-Scout-17B-16E-Instruct-FP8`) succeeds, and the same model also works without Inductor partition:

FAILED tests/compile/test_fusions_e2e.py::test_attn_quant[True-nvidia/Llama-4-Scout-17B-16E-Instruct-FP4-model_kwargs2-_Backend.FLASHINFER-48-96-] - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
============================ 1 failed, 5 passed, 8 skipped, 2 warnings in 832.85s (0:13:52) ============================

The last part of the stack trace:

(EngineCore_DP0 pid=2090223)   File "/home/ProExpertProg/git/vllm2/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(EngineCore_DP0 pid=2090223)     return compiled_fn(runtime_args)
(EngineCore_DP0 pid=2090223)            ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2090223)   File "/home/ProExpertProg/git/vllm2/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 613, in __call__
(EngineCore_DP0 pid=2090223)     return self.current_callable(inputs)
(EngineCore_DP0 pid=2090223)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2090223)   File "/tmp/torchinductor_ProExpertProg/tmp5ccrosm8/3w/c3w6voddnusogdjqye5gnwhaq6h5yvgtn33akkevdv44fkyfjaqn.py", line 34387, in call
(EngineCore_DP0 pid=2090223)     del buf15
(EngineCore_DP0 pid=2090223)         ^^^^^
(EngineCore_DP0 pid=2090223) UnboundLocalError: cannot access local variable 'buf15' where it is not associated with a value
[rank0]:[W1016 00:28:16.398287624 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
FAILED

Upon inspection, there is a stray `del buf15` in the output code even though `buf15` is never allocated in that scope. If I had to guess, it is a leftover relic of a node/tensor corresponding to the unquantized attention output.
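The failure mode can be reproduced in isolation: in Python, `del` on a name counts as a binding in the enclosing function, so executing `del buf15` before (or without) any assignment raises exactly the `UnboundLocalError` seen in the trace. A minimal sketch (the function name and comments are hypothetical, just mirroring the shape of the generated Inductor wrapper):

```python
# Minimal sketch of the bad codegen pattern, assuming the fusion pass
# removed the allocation of buf15 but left its deallocation line behind.

def generated_call_sketch():
    # In the unfused graph the wrapper would allocate the buffer here, e.g.:
    #     buf15 = empty_strided_cuda(...)
    # After attn+nvfp4 fusion that allocation is gone, but the cleanup remains:
    try:
        del buf15  # raises UnboundLocalError: never bound in this scope
    except UnboundLocalError as e:
        return str(e)

print(generated_call_sketch())
```

This matches the traceback: the generated `call` function only ever references `buf15` in the `del` statement, so Python treats it as an unassigned local.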

