[Test] Add ACL graph capture/replay DP test #4259

yiz-liu · 2025-11-18T12:34:19Z

What this PR does / why we need it?

Adds an ACL graph capture/replay DP test. This is an improved version of #3886.

Restructures the multi-card ACL graph test for improved clarity, robustness, and accuracy.

Key improvements include:

Replaces fragile sys.settrace and manual patching with a clean, reusable spy installer using unittest.mock.patch.
Introduces more precise metrics by tracking NPUModelRunner.execute_model and _dummy_run calls directly.
Rewrites assertions to be more accurate and provides clear explanations for the expected counts of graph captures, replays, model executions, and dummy runs.
Simplifies the overall test structure by separating the worker logic into a dedicated function.
Removes a long, unnecessary sleep at the end of the test.
Expands test coverage by adding a larger max_tokens parameter.
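The spy-installer idea from the first two points can be sketched as follows. This is a minimal illustration, not the code from the PR: `FakeModelRunner` is a hypothetical stand-in for `NPUModelRunner`, and `install_spies` is an assumed helper name. Only `unittest.mock.patch.object` is a real library call.

```python
from contextlib import contextmanager
from unittest.mock import patch


class FakeModelRunner:
    """Hypothetical stand-in for NPUModelRunner; not the real class."""

    def execute_model(self, batch):
        return [token * 2 for token in batch]

    def _dummy_run(self, num_tokens):
        return num_tokens


@contextmanager
def install_spies(cls, method_names):
    """Count calls to the named methods while still running the originals."""
    counters = {name: 0 for name in method_names}
    patchers = []

    def make_spy(name, original):
        def spy(self, *args, **kwargs):
            counters[name] += 1
            return original(self, *args, **kwargs)
        return spy

    try:
        for name in method_names:
            original = getattr(cls, name)
            patcher = patch.object(cls, name, make_spy(name, original))
            patcher.start()
            patchers.append(patcher)
        yield counters
    finally:
        # Always restore the original methods, even if the test body raised.
        for patcher in reversed(patchers):
            patcher.stop()


with install_spies(FakeModelRunner, ["execute_model", "_dummy_run"]) as counters:
    runner = FakeModelRunner()
    runner.execute_model([1, 2, 3])
    runner.execute_model([4])
    runner._dummy_run(8)
    snapshot = dict(counters)
```

Unlike `sys.settrace`, the patch is scoped to the `with` block and is undone automatically, so later tests see the unmodified class.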

Does this PR introduce any user-facing change?

None.

How was this patch tested?

None.

vLLM version: v0.11.0
vLLM main: vllm-project/vllm@2918c1b

Signed-off-by: lilinsiman <[email protected]>

github-actions · 2025-11-18T12:34:27Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request introduces a new end-to-end test for ACL graph capture and replay with data parallelism. The changes are a significant improvement, using unittest.mock.patch for cleaner method spying and providing detailed, well-commented assertions for various metrics. The test structure is clear and robust.

My review focuses on ensuring test reliability. I've identified one area for improvement: the use of a hardcoded network port, which could lead to flaky tests in a parallel execution environment. I've suggested using a dynamic port to address this.
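One common way to avoid a hardcoded port is to ask the operating system for a free one by binding to port 0. The sketch below uses only the standard `socket` module; the helper name `get_free_port` is an assumption for illustration and is not taken from the PR.

```python
import socket


def get_free_port(host="127.0.0.1"):
    """Bind to port 0 so the OS picks an unused port, then report it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = get_free_port()
```

The socket is closed before the port is reused, so another process could in principle claim it in between. For test isolation this small window is usually acceptable.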

tests/e2e/multicard/test_aclgraph_capture_replay.py

Copilot

Pull Request Overview

This PR adds a comprehensive end-to-end test for ACL graph capture and replay functionality in data parallel (DP) mode. This is an improved version of PR #3886 that uses a cleaner testing approach with mock-based spies instead of fragile sys.settrace mechanisms.

Key improvements:

Implements thread-safe spy installation using unittest.mock.patch to track NPU method invocations
Adds precise metrics tracking for graph captures, replays, model executions, and dummy runs
Expands test coverage with multiple max_tokens values (4 and 36) to test different execution paths
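To illustrate what thread-safe call tracking can look like, here is a small sketch in which a lock-protected counter is incremented by a patched method called from several threads. `FakeRunner`, `CallCounter`, and `counting_wrapper` are hypothetical names, and the PR itself may share counters differently (for example across processes).

```python
import threading
from unittest.mock import patch


class FakeRunner:
    """Hypothetical stand-in for the patched runner class."""

    def execute_model(self):
        return "ok"


class CallCounter:
    """Counter whose increments are serialized by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:
            self.value += 1


def counting_wrapper(original, counter):
    def wrapper(self, *args, **kwargs):
        counter.increment()
        return original(self, *args, **kwargs)
    return wrapper


counter = CallCounter()
original = FakeRunner.execute_model

with patch.object(FakeRunner, "execute_model", counting_wrapper(original, counter)):
    runner = FakeRunner()

    def hammer():
        for _ in range(100):
            runner.execute_model()

    threads = [threading.Thread(target=hammer) for _ in range(4)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

total_calls = counter.value
```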

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
tests/e2e/multicard/test_aclgraph_capture_replay.py	New test file that validates ACL graph capture/replay behavior in DP mode with comprehensive metrics tracking and assertions
.github/workflows/_e2e_test.yaml	Adds the new test to the full e2e test suite execution


tests/e2e/multicard/test_aclgraph_capture_replay.py

whx-sjtu · 2025-11-19T09:08:48Z

In the future, we should break down the template script for launching DP into more granular functions, so that all DP-related UTs can import these functions and reuse the code. Specifically, we should enable the 'main' function to accept extra parameters and additional patch functions.
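A rough sketch of what such a reusable launcher could look like is given below. Everything here is hypothetical: the function names, the `EXAMPLE_DP_*` environment variables, and the `Engine` class are illustrative only and do not reflect the existing template script. The point is just that the entry function takes extra keyword arguments and a list of patch factories.

```python
import os
from contextlib import ExitStack
from unittest.mock import patch


def build_dp_env(rank, world_size, master_port):
    """Hypothetical env layout for one DP rank (variable names are made up)."""
    return {
        "EXAMPLE_DP_RANK": str(rank),
        "EXAMPLE_DP_SIZE": str(world_size),
        "EXAMPLE_DP_MASTER_PORT": str(master_port),
    }


def run_dp_worker(rank, world_size, master_port, worker_fn,
                  extra_kwargs=None, patch_factories=()):
    """Run worker_fn for one rank with its env and any extra patches applied."""
    with ExitStack() as stack:
        env = build_dp_env(rank, world_size, master_port)
        stack.enter_context(patch.dict(os.environ, env))
        for factory in patch_factories:
            # Each factory returns a context manager, e.g. a mock patcher.
            stack.enter_context(factory())
        return worker_fn(rank, **(extra_kwargs or {}))


class Engine:
    """Hypothetical engine used only to demonstrate an extra patch."""

    def generate(self, max_tokens):
        return max_tokens


def fake_worker(rank, max_tokens):
    return os.environ["EXAMPLE_DP_RANK"], Engine().generate(max_tokens)


def double_tokens_patch():
    return patch.object(Engine, "generate",
                        lambda self, max_tokens: max_tokens * 2)


result = run_dp_worker(1, 2, 29500, fake_worker,
                       extra_kwargs={"max_tokens": 4},
                       patch_factories=[double_tokens_patch])
```

Each DP-related test would then supply only its own `worker_fn`, parameters, and patches, while the environment setup and teardown stay in one place.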

Restructures the multi-card ACL graph test for improved clarity, robustness, and accuracy.

Key improvements include:

- Replaces fragile `sys.settrace` and manual patching with a clean, reusable spy installer using `unittest.mock.patch`.
- Introduces more precise metrics by tracking `NPUModelRunner.execute_model` and `_dummy_run` calls directly.
- Rewrites assertions to be more accurate and provides clear explanations for the expected counts of graph captures, replays, model executions, and dummy runs.
- Simplifies the overall test structure by separating the worker logic into a dedicated function.
- Removes a long, unnecessary sleep at the end of the test.
- Expands test coverage by adding a larger `max_tokens` parameter.

Signed-off-by: Yizhou Liu <[email protected]>

add new test case for aclgraph capture and replay

ae09ac2

Signed-off-by: lilinsiman <[email protected]>

github-actions bot added the module:tests label Nov 18, 2025

yiz-liu added the ready (read for review) and ready-for-test (start test by label for PR) labels Nov 18, 2025

gemini-code-assist bot reviewed Nov 18, 2025

yiz-liu force-pushed the graph-test branch 2 times, most recently from 3b53853 to 0af82f1 Compare November 18, 2025 17:13

Copilot AI review requested due to automatic review settings November 18, 2025 17:50

yiz-liu force-pushed the graph-test branch from 0af82f1 to 19408a4 Compare November 18, 2025 17:50

Copilot started reviewing on behalf of yiz-liu November 18, 2025 17:51

Copilot finished reviewing on behalf of yiz-liu November 18, 2025 17:54

Copilot AI reviewed Nov 18, 2025

yiz-liu force-pushed the graph-test branch 6 times, most recently from 41e3554 to 3f0daa2 Compare November 19, 2025 09:07

yiz-liu force-pushed the graph-test branch 5 times, most recently from aad0fb6 to 748f733 Compare November 20, 2025 04:32

yiz-liu force-pushed the graph-test branch from 748f733 to 0061f87 Compare November 20, 2025 07:59
