[Test] Add ACL graph capture/replay DP test #4259
base: main
Signed-off-by: lilinsiman <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally by following the Contributing and Testing guides.
Code Review
This pull request introduces a new end-to-end test for ACL graph capture and replay with data parallelism. The changes are a significant improvement, using unittest.mock.patch for cleaner method spying and providing detailed, well-commented assertions for various metrics. The test structure is clear and robust.
My review focuses on ensuring test reliability. I've identified one area for improvement: the use of a hardcoded network port, which could lead to flaky tests in a parallel execution environment. I've suggested using a dynamic port to address this.
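The following is a minimal sketch of the dynamic-port suggestion. It assumes the test currently passes a fixed master port to the DP workers. The approach is to bind a socket to port 0 so the OS picks a free ephemeral port, then hand that number to the launcher. The helper name `get_free_port` is hypothetical and is not part of the test file.

```python
import socket


def get_free_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port on ``host`` that was free at the time of the call.

    Binding to port 0 asks the kernel to choose an unused ephemeral port;
    the socket is closed on return so the caller can bind the port itself.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

There is a small window between closing this socket and the DP master binding the port, during which another process could take it. Even so, this is much less likely to collide than a hardcoded port shared by parallel test runs.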
Force-pushed from 3b53853 to 0af82f1.
Pull Request Overview
This PR adds a comprehensive end-to-end test for ACL graph capture and replay functionality in data parallel (DP) mode. This is an improved version of PR #3886 that uses a cleaner testing approach with mock-based spies instead of fragile sys.settrace mechanisms.
Key improvements:
- Implements thread-safe spy installation using `unittest.mock.patch` to track NPU method invocations
- Adds precise metrics tracking for graph captures, replays, model executions, and dummy runs
- Expands test coverage with multiple `max_tokens` values (4 and 36) to exercise different execution paths
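The spy approach described above can be sketched as follows, assuming the target methods are ordinary instance methods. Each method is replaced via `unittest.mock.patch.object` with a wrapper that first increments a lock-protected counter and then delegates to the original, so real behaviour is preserved. `CallCounter`, `install_spies`, and `FakeRunner` are hypothetical names. `FakeRunner` stands in for `NPUModelRunner`, which needs NPU hardware.

```python
import contextlib
import functools
import threading
from collections import Counter
from unittest import mock


class CallCounter:
    """Thread-safe tally of calls per spied method name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Counter = Counter()

    def record(self, name: str) -> None:
        with self._lock:
            self._counts[name] += 1

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._counts)


def _make_spy(name, original, counter):
    # Wrapper that counts the call, then runs the real method unchanged.
    @functools.wraps(original)
    def spy(self, *args, **kwargs):
        counter.record(name)
        return original(self, *args, **kwargs)

    return spy


@contextlib.contextmanager
def install_spies(cls, method_names, counter):
    """Patch ``cls.<name>`` for each name; all patches are undone on exit."""
    with contextlib.ExitStack() as stack:
        for name in method_names:
            original = getattr(cls, name)
            stack.enter_context(
                mock.patch.object(cls, name, _make_spy(name, original, counter))
            )
        yield counter


class FakeRunner:
    """Hypothetical stand-in for NPUModelRunner in this sketch."""

    def execute_model(self, x):
        return x * 2

    def _dummy_run(self):
        return None
```

In DP mode each rank runs in its own process. The spies would therefore have to be installed inside each worker, and the per-rank snapshots sent back to the parent process before any assertions run.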
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/e2e/multicard/test_aclgraph_capture_replay.py | New test file that validates ACL graph capture/replay behavior in DP mode with comprehensive metrics tracking and assertions |
| .github/workflows/_e2e_test.yaml | Adds the new test to the full e2e test suite execution |
Force-pushed from 41e3554 to 3f0daa2.
In the future, we should break the template script for launching DP into more granular functions so that all DP-related UTs can import them and reuse code. Specifically, we should enable the `main` function to accept extra parameters and additional patch functions.
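Below is one possible shape for the `main`-with-extra-patches idea. It assumes each DP worker calls a shared entry point. The launcher forwards arbitrary extra arguments and takes a list of zero-argument patch factories, keeping every patch active until the entry point returns. `run_with_patches` is a hypothetical name, not an existing vLLM Ascend helper.

```python
import contextlib
from typing import Any, Callable, ContextManager, Iterable
from unittest import mock


def run_with_patches(
    main_fn: Callable[..., Any],
    *args: Any,
    extra_patches: Iterable[Callable[[], ContextManager]] = (),
    **kwargs: Any,
) -> Any:
    """Call ``main_fn(*args, **kwargs)`` with every extra patch applied.

    ``extra_patches`` holds zero-argument factories, each returning a context
    manager such as ``mock.patch.object(...)``; all are exited, in reverse
    order, when ``main_fn`` returns or raises.
    """
    with contextlib.ExitStack() as stack:
        for make_patch in extra_patches:
            stack.enter_context(make_patch())
        return main_fn(*args, **kwargs)


if __name__ == "__main__":

    class Runner:  # hypothetical stand-in for a model runner
        def step(self):
            return "real"

    def main(tag):
        return f"{tag}:{Runner().step()}"

    print(run_with_patches(main, "dp0"))  # dp0:real
    print(
        run_with_patches(
            main,
            "dp0",
            extra_patches=[
                lambda: mock.patch.object(Runner, "step", lambda self: "patched")
            ],
        )
    )  # dp0:patched
```

The sketch uses factories instead of ready-made patcher objects so that each call builds its own patches. If workers are started with the `spawn` method, the factories reach the child process by pickling. In that case they would need to be module-level functions, because lambdas cannot be pickled.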
Force-pushed from aad0fb6 to 748f733.
Restructures the multi-card ACL graph test for improved clarity, robustness, and accuracy.

Key improvements include:
- Replaces fragile `sys.settrace` and manual patching with a clean, reusable spy installer using `unittest.mock.patch`.
- Introduces more precise metrics by tracking `NPUModelRunner.execute_model` and `_dummy_run` calls directly.
- Rewrites assertions to be more accurate and provides clear explanations for the expected counts of graph captures, replays, model executions, and dummy runs.
- Simplifies the overall test structure by separating the worker logic into a dedicated function.
- Removes a long, unnecessary sleep at the end of the test.
- Expands test coverage by adding a larger `max_tokens` parameter.

Signed-off-by: Yizhou Liu <[email protected]>
What this PR does / why we need it?
Adds an ACL graph capture/replay DP test. This is an improved version of #3886.
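Restructures the multi-card ACL graph test for improved clarity, robustness, and accuracy.
Key improvements include:
- Replaces fragile `sys.settrace` and manual patching with a clean, reusable spy installer using `unittest.mock.patch`.
- Introduces more precise metrics by tracking `NPUModelRunner.execute_model` and `_dummy_run` calls directly.
- Rewrites assertions to be more accurate and provides clear explanations for the expected counts of graph captures, replays, model executions, and dummy runs.
- Simplifies the overall test structure by separating the worker logic into a dedicated function.
- Removes a long, unnecessary sleep at the end of the test.
- Expands test coverage by adding a larger `max_tokens` parameter.

Does this PR introduce any user-facing change?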
None.
How was this patch tested?
None.