@JamesBrianD commented Nov 18, 2025

Motivation

First PR for multi-LoRA: supports serving a single LoRA adapter or multiple LoRA adapters with the native backend.

Modifications

  • Change RadixCache to use a new RadixKey, so the LoRA path is taken into account during prefix matching.
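A minimal sketch of the idea, assuming a cache key that pairs the token IDs with the request's `lora_path`. The names `RadixKey`, `key_match`, and `ToyPrefixCache` below are hypothetical illustrations, not the classes actually changed in this PR; a real radix cache also compresses edges and tracks KV-cache indices, which this toy omits. The point is that two requests with identical tokens but different adapters (or one with an adapter and one without) must not share a cached prefix, because LoRA weights change the hidden states and therefore the K/V values.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RadixKey:
    """Hypothetical cache key: token IDs plus the LoRA adapter they were computed with."""
    token_ids: Tuple[int, ...]
    lora_path: Optional[str] = None  # None means base model, no adapter


def key_match(a: RadixKey, b: RadixKey) -> int:
    """Length of the shared token prefix, or 0 if the adapters differ."""
    if a.lora_path != b.lora_path:
        return 0  # KV entries computed under different adapters are never shared
    n = 0
    for x, y in zip(a.token_ids, b.token_ids):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Node:
    children: Dict[int, "Node"] = field(default_factory=dict)


class ToyPrefixCache:
    """Uncompressed token trie with one subtree per LoRA path (illustration only)."""

    def __init__(self) -> None:
        self.roots: Dict[Optional[str], Node] = {}

    def insert(self, key: RadixKey) -> None:
        # Each adapter (and the base model) gets its own namespace of prefixes.
        node = self.roots.setdefault(key.lora_path, Node())
        for tok in key.token_ids:
            node = node.children.setdefault(tok, Node())

    def match_prefix(self, key: RadixKey) -> int:
        # Only walk the subtree belonging to this request's adapter.
        node = self.roots.get(key.lora_path)
        if node is None:
            return 0
        matched = 0
        for tok in key.token_ids:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            node = nxt
            matched += 1
        return matched
```

With this keying, inserting a prompt under `Qwen3-4B-lora-v2` and then sending the same tokens without a LoRA adapter yields a prefix match of 0, while a second request under the same adapter reuses the shared prefix. The token IDs used in any such check are made up for illustration.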

Accuracy Tests

Same prompt, with and without LoRA:

  • With LoRA (Qwen3-4B-lora-v2)
$ curl -X POST 'http://127.0.0.1:30000/generate' -d '{"sampling_params": {"max_new_tokens": 128,"temperature":0.7,"top_p":0.8,"top_k":20,"min_p":0.0,"presence_penalty":0.5}, "text": "the capital of France is","lora_path":"Qwen3-4B-lora-v2"}' -H 'Content-Type: application/json'
{"text":" Paris. Yes, the capital of France is indeed Paris. Paris is not only the largest city in France but also serves as the political, economic, cultural, and administrative center of the country. It is home to important landmarks such as the Eiffel Tower, the Louvre Museum, Notre-Dame Cathedral, and the Seine River. The city is known for its rich history, art, fashion, cuisine, and vibrant cultural scene. If you're interested in learning more about Paris or any specific aspect of French culture, I'd be happy to help!Paris is a city filled with history, art, and culture, making it","output_ids":[12095,13,7414,11,279,6722,315,9625,374,12824,12095,13,12095,374,537,1172,279,7772,3283,304,9625,714,1083,17045,438,279,4948,11,6955,11,12752,11,323,22707,4126,315,279,3146,13,1084,374,2114,311,2989,59924,1741,438,279,468,3092,301,21938,11,279,9729,48506,16328,11,43464,9420,373,56729,11,323,279,1345,482,10948,13,576,3283,374,3881,369,1181,9080,3840,11,1947,11,11153,11,35005,11,323,32976,12752,6109,13,1416,498,2299,8014,304,6832,803,911,12095,476,894,3151,12893,315,8585,7674,11,358,4172,387,6247,311,1492,0,59604,374,264,3283,10199,448,3840,11,1947,11,323,7674,11,3259,432],"meta_info":{"id":"e605eb1a40104abebc9ffa940b6a9830","finish_reason":{"type":"length","length":128},"prompt_tokens":5,"completion_tokens":128,"cached_tokens":0,"cache_miss_count":1,"e2e_latency":15.999040365219116}}
  • Without LoRA
$ curl -X POST 'http://127.0.0.1:30000/generate' -d '{"sampling_params": {"max_new_tokens": 128,"temperature":0.7,"top_p":0.8,"top_k":20,"min_p":0.0,"presence_penalty":0.5}, "text": "the capital of France is"}' -H 'Content-Type: application/json'
{"text":" Paris. the capital of Germany is Berlin. the capital of Spain is Madrid. the capital of Italy is Rome. the capital of Greece is Athens. the capital of Portugal is Lisbon. the capital of Belgium is Brussels. the capital of Netherlands is Amsterdam. the capital of Austria is Vienna. the capital of Switzerland is Bern. the capital of Luxembourg is Luxembourg. the capital of Denmark is Copenhagen. the capital of Norway is Oslo. the countries in Europe are: France, Germany, Spain, Italy, Greece, Portugal, Belgium, Netherlands, Austria, Switzerland, Luxembourg, Denmark, Norway. the capital of Sweden is Stockholm. the capital of","output_ids":[12095,13,279,6722,315,9856,374,19846,13,279,6722,315,17689,374,24081,13,279,6722,315,15344,374,21718,13,279,6722,315,24427,374,45826,13,279,6722,315,33311,374,80701,13,279,6722,315,32961,374,37169,13,279,6722,315,25662,374,37741,13,279,6722,315,34898,374,46287,13,279,6722,315,29121,374,14168,13,279,6722,315,64871,374,64871,13,279,6722,315,34340,374,63061,13,279,6722,315,31503,374,57858,13,279,5837,304,4505,525,25,9625,11,9856,11,17689,11,15344,11,24427,11,33311,11,32961,11,25662,11,34898,11,29121,11,64871,11,34340,11,31503,13,279,6722,315,23190,374,52082,13,279,6722,315],"meta_info":{"id":"8d33221ddf3c45f3aa0bfeba277297a9","finish_reason":{"type":"length","length":128},"prompt_tokens":5,"completion_tokens":128,"cached_tokens":0,"cache_miss_count":1,"e2e_latency":16.011661291122437}}

Benchmarking and Profiling

Checklist

  • Please use English, otherwise it will be closed.
  • The purpose of the PR, or link existing issues this PR will resolve.
  • The test plan, such as providing test command.
  • (Optional) The necessary documentation update.

@JamesBrianD changed the title from "feat: lora initial code (#432)" to "Feat: Support Multi-LoRA" on Nov 18, 2025
@JamesBrianD force-pushed the epic/support-multi-lora branch 2 times, most recently from f896946 to 98a6600, November 20, 2025 03:49
@aolemila force-pushed the epic/support-multi-lora branch 2 times, most recently from 40af947 to 0c70b07, November 21, 2025 12:56
@JamesBrianD force-pushed the epic/support-multi-lora branch from 0c70b07 to c4a0611, November 24, 2025 07:21
@aolemila force-pushed the epic/support-multi-lora branch from c4a0611 to 1e63fa6, November 24, 2025 11:54
@JamesBrianD changed the title from "Feat: Support Multi-LoRA" to "Feat: Initial Support Multi-LoRA" on Nov 26, 2025
@JamesBrianD marked this pull request as ready for review on December 1, 2025 09:40
@JamesBrianD force-pushed the epic/support-multi-lora branch from 74eff48 to 1de3a32, December 1, 2025 09:45
@JamesBrianD self-assigned this on Dec 1, 2025
@aolemila force-pushed the epic/support-multi-lora branch from 074af9f to 8e22c18, December 3, 2025 03:10

4 participants