feat(qwen3): support tensor-parallel LoRA adapter loading by NolanHo · Pull Request #190 · xiaguan/pegainfer

NolanHo · 2026-05-28T11:50:02Z

Summary

Part of #173.

This PR extends the Qwen3 LoRA MVP from PR1 to tensor-parallel execution while keeping the same correctness-first scope:

- single active adapter per engine
- dynamic /v1/load_lora_adapter
- adapter loading only when the scheduler is idle
- CUDA Graph disabled in LoRA mode
- no per-request or mixed-adapter batching yet

The branch is stacked on the PR1 LoRA control/API work until PR1 lands.

What Changed

- Added rank-local LoRA adapter sharding for Qwen3 TP:
  - q_proj, k_proj, v_proj, gate_proj, up_proj: replicate LoRA A, row-shard LoRA B.
  - o_proj, down_proj: column-shard LoRA A, replicate LoRA B.
- Updated Qwen3 executor LoRA loading to install the same adapter name on all TP ranks and aggregate per-rank load errors.
- Removed the Qwen3 server-side --enable-lora && --tp-size != 1 rejection.
- Added --tp-size to tools/qwen3_lora_live_parity.py so the live HF/PEFT parity smoke can exercise TP.
- Added unit coverage for TP sharding shape/range behavior.
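
The sharding rule above can be illustrated with a small pure-Python sketch. It assumes the usual LoRA layout (A is [rank, in_features], B is [out_features, rank]) and the usual TP convention that q/k/v/gate/up split the output dimension while o_proj/down_proj split the input dimension and sum partial results across ranks. All helper names are hypothetical; this is not the PR's shard_for_tensor_parallel implementation.

```python
# Illustrative only: plain-list matrices, hypothetical helper names.

def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def shard_range(size, tp_size, rank):
    """Contiguous [start, end) slice owned by `rank`; size must divide evenly."""
    if size % tp_size != 0:
        raise ValueError(f"{size} is not divisible by tp_size={tp_size}")
    step = size // tp_size
    return rank * step, (rank + 1) * step

def shard_column_parallel(lora_a, lora_b, tp_size, rank):
    """q/k/v/gate/up style: replicate A, keep only this rank's rows of B."""
    start, end = shard_range(len(lora_b), tp_size, rank)
    return lora_a, lora_b[start:end]

def shard_row_parallel(lora_a, lora_b, tp_size, rank):
    """o_proj/down_proj style: keep this rank's columns of A, replicate B."""
    start, end = shard_range(len(lora_a[0]), tp_size, rank)
    return [row[start:end] for row in lora_a], lora_b

def lora_delta(lora_a, lora_b, x):
    """Unscaled LoRA update B @ (A @ x)."""
    return matvec(lora_b, matvec(lora_a, x))

# Toy adapter: rank 2, in_features 4, out_features 4.
A = [[1, 2, 3, 4], [0, 1, 0, -1]]
B = [[1, 0], [2, 1], [0, 3], [-1, 1]]
x = [1, -2, 3, 5]
full = lora_delta(A, B, x)

# Column-parallel: every rank sees the full input and produces a slice of the output.
col_out = []
for rank in range(2):
    a_r, b_r = shard_column_parallel(A, B, 2, rank)
    col_out.extend(lora_delta(a_r, b_r, x))

# Row-parallel: each rank sees a slice of the input; partial outputs are summed
# (the role an all-reduce plays in a real TP run).
row_out = [0] * len(B)
for rank in range(2):
    a_r, b_r = shard_row_parallel(A, B, 2, rank)
    start, end = shard_range(len(x), 2, rank)
    partial = lora_delta(a_r, b_r, x[start:end])
    row_out = [acc + p for acc, p in zip(row_out, partial)]
```

In both cases the per-rank results recombine to the unsharded LoRA update, which is why A can be replicated in one case and B in the other.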

Scope Boundary

This PR does not add:

- multiple active adapters
- per-request adapter selection
- mixed base + LoRA batching
- /v1/unload_lora_adapter
- CUDA Graph cache keys per adapter
- optimized grouped LoRA kernels

Those remain follow-up work for the later staged LoRA PRs in #173.

Validation

Local checks:

cargo fmt --check
PEGAINFER_CUDA_SM=80 cargo test -p pegainfer-qwen3-4b --lib lora -- --nocapture
PEGAINFER_CUDA_SM=80 cargo test -p pegainfer-qwen3-4b --lib scheduler -- --nocapture
PEGAINFER_CUDA_SM=80 cargo check -p pegainfer-server
python -m py_compile tools/qwen3_lora_live_parity.py
git diff --check

Worker5 TP2 live parity:

Branch commit: ac91cff
Model: Qwen3-4B
Command shape:

python tools/qwen3_lora_live_parity.py \
  --model-path Qwen3-4B \
  --port 18112 \
  --startup-timeout-s 360 \
  --max-tokens 8 \
  --disable-peft-adapter-autocast \
  --tp-size 2

Result:

- Server started with --enable-lora --tp-size 2.
- /v1/load_lora_adapter returned "Success: LoRA adapter 'parity' added successfully."
- HF/PEFT and PegaInfer both generated the text "about a young girl named Lila who".
- Token IDs matched exactly: [911, 264, 3908, 3743, 6941, 444, 10524, 879].
- Result summary had "match": true and "first_token_mismatch": null.
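
For reference, a summary with these two fields could be derived from the two token-ID lists roughly as follows. This is a sketch of the comparison only; the function name and the treatment of length differences are assumptions, not the logic in tools/qwen3_lora_live_parity.py.

```python
def compare_token_ids(reference, candidate):
    """Return a summary dict with `match` and `first_token_mismatch` fields."""
    for index, (ref, cand) in enumerate(zip(reference, candidate)):
        if ref != cand:
            return {"match": False, "first_token_mismatch": index}
    if len(reference) != len(candidate):
        # Assumption: a length difference counts as a mismatch at the first missing position.
        return {"match": False, "first_token_mismatch": min(len(reference), len(candidate))}
    return {"match": True, "first_token_mismatch": None}

# Token IDs reported in the result above, compared against an identical list.
ids = [911, 264, 3908, 3743, 6941, 444, 10524, 879]
summary = compare_token_ids(ids, list(ids))
```

When serialized with json.dumps, Python's None becomes null, matching the shape of the reported summary.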

gemini-code-assist

Code Review

This pull request enables tensor-parallel loading of Qwen3 LoRA adapters, removing the previous single-GPU limitation. It introduces adapter sharding logic (shard_for_tensor_parallel) for row-parallel and column-parallel projections, updates the executor to shard and distribute adapters to all workers, and adds corresponding unit and integration tests. Feedback points out a potential desynchronization issue where a sharding failure on a later rank could leave earlier ranks in an inconsistent state. It is recommended to pre-shard the adapters for all ranks before sending any load commands to the workers.
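
The recommendation in this review amounts to a two-phase load: compute the shard for every rank first, and contact workers only once every shard exists. A minimal sketch follows; shard_for_rank, send_load, and the error message format are hypothetical stand-ins, not the executor's real API.

```python
# Illustrative only: hypothetical callbacks stand in for the real sharding and worker calls.

def load_adapter_on_all_ranks(name, adapter, tp_size, shard_for_rank, send_load):
    """Shard for every rank up front, then dispatch, then aggregate per-rank errors."""
    # Phase 1: pure computation. A failure here aborts before any worker is touched,
    # so no rank can end up holding an adapter that its peers lack.
    shards = []
    for rank in range(tp_size):
        try:
            shards.append(shard_for_rank(adapter, tp_size, rank))
        except ValueError as err:
            raise RuntimeError(f"sharding '{name}' failed for rank {rank}: {err}") from err

    # Phase 2: send the same adapter name to every rank and collect all failures
    # instead of stopping at the first one.
    errors = []
    for rank, shard in enumerate(shards):
        try:
            send_load(rank, name, shard)
        except Exception as err:
            errors.append(f"rank {rank}: {err}")
    if errors:
        raise RuntimeError(f"loading '{name}' failed: " + "; ".join(errors))
```

This sketch only removes the sharding step as a source of partial loads. It does not roll back ranks that already loaded when a worker fails in phase 2.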

xiaguan

LGTM

gemini-code-assist Bot reviewed May 28, 2026

Comment thread pegainfer-qwen3-4b/src/executor.rs Outdated

feat(qwen3): shard lora adapters for tp

6432037

NolanHo force-pushed the feat/qwen3-lora-pr2-tp-clean branch from ac91cff to 6432037 Compare May 28, 2026 11:55

xiaguan approved these changes May 28, 2026

xiaguan merged commit d08851b into xiaguan:main May 28, 2026
1 check passed

xiaguan merged 1 commit into xiaguan:main from NolanHo:feat/qwen3-lora-pr2-tp-clean