@iyastreb iyastreb commented Nov 10, 2025

What?

This continues the request-handling optimization effort started in #982.
This PR makes two changes:

  • Release all pending requests except one right after posting
  • Remove the nixlUcxIntReq class and move connection management to nixlUcxBackendH
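
The first change can be pictured with a toy model. The sketch below is not the UCX or NIXL API: `ToyTransport`, `post`, `release`, and `postBatchKeepLast` are hypothetical names used only to illustrate keeping a single handle per batch. The PR text does not say how completion of the released requests is tracked, so the sketch does not model completion at all; it only shows the handle bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a transport; not the UCX or NIXL API.
class ToyTransport {
public:
    // Post one operation and return a handle that the caller owns.
    int post() {
        live_.push_back(true);
        return static_cast<int>(live_.size()) - 1;
    }

    // Give a handle back. In this toy model the operation itself is
    // unaffected; only the caller's bookkeeping shrinks.
    void release(int handle) {
        assert(handle >= 0 && static_cast<std::size_t>(handle) < live_.size());
        live_[handle] = false;
    }

    // Number of handles the caller still owns.
    std::size_t liveHandles() const {
        std::size_t n = 0;
        for (bool owned : live_) {
            n += owned ? 1 : 0;
        }
        return n;
    }

private:
    std::vector<bool> live_;
};

// Post `count` operations, release every handle except the last one,
// and return the retained handle (or -1 for an empty batch).
int postBatchKeepLast(ToyTransport &t, std::size_t count) {
    int last = -1;
    for (std::size_t i = 0; i < count; ++i) {
        int h = t.post();
        if (last >= 0) {
            t.release(last);  // the previous handle is no longer needed
        }
        last = h;
    }
    return last;
}
```

With a batch of 64000 operations, the caller in this model ends up owning one handle instead of 64000, which is the kind of per-request bookkeeping the change is meant to avoid.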

Performance results

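In nixlbench, average post time for an RDMA batch of 64k messages of 512 B drops about 10x relative to PR1 (8764.7 us to 895.1 us) and about 14.6x relative to main (13065.2 us).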
PR1: #982
PR2: this PR

NIXLBENCH
nixlbench --initiator_seg_type=VRAM --target_seg_type=VRAM --start_block_size=512 --max_block_size=512 --start_batch_size=64000 --max_batch_size=64000 --warmup_iter=10 --num_iter=100 --progress_threads=8 &
 
# Num_threads=8 512:64k cuda_ipc
Branch  Block Size (B)      Batch Size     B/W (GB/Sec)   Avg Lat. (us)  Avg Prep (us)  P99 Prep (us)  Avg Post (us)
main    512                 65000          0.122929       4.2            6022.0         6022.0         121784.0
PR1     512                 65000          0.135798       3.8            6433.0         6433.0         103406.5
PR2     512                 64000          0.124752       4.1            8171.0         8171.0         114922.7
 
# Num_threads=8 512:64k rdma
Branch  Block Size (B)      Batch Size     B/W (GB/Sec)   Avg Lat. (us)  Avg Prep (us)  P99 Prep (us)  Avg Post (us)
main    512                 65000          1.932672       0.3            5540.0         5540.0         13065.2
PR1     512                 65000          2.787826       0.2            5583.0         5583.0         8764.7
PR2     512                 64000          6.710469       0.1            5871.0         5871.0         895.1

SGLANG TTFT
Size  MC     main   PR1    PR2 
1     28     25     22     20
2     47     54     45     45
4     125    129    115    111 
8     243    378    346    332
16    442    699    647    522
32    831    1433   1001   904 
64    1364   2032   1853   1804
128   2583   3869   3611   3060 
256   5232   6618   6083   5501
512   10469  12805  12440  10170
1024  22521  24990  22800  20392

@brminich
Contributor

/build

@brminich brminich enabled auto-merge (squash) November 18, 2025 09:24
@iyastreb
Contributor Author

/build

@brminich brminich merged commit d33d8a9 into ai-dynamo:main Nov 18, 2025
21 checks passed
@iyastreb iyastreb deleted the ucx-drop-requests-optimization branch November 18, 2025 16:14