
Conversation

kxz2002 (Contributor) commented Sep 25, 2025:

Support adding the parameter n to the request to retrieve multiple model responses.

paddle-bot commented Sep 25, 2025:

Thanks for your contribution!

paddle-bot added the "contributor (External developers)" label on Sep 25, 2025.
CLAassistant commented Sep 26, 2025:

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ kxz2002
✅ gzy19990617
❌ LiqinruiG
You have signed the CLA already but the status is still pending? Let us recheck it.

request["prompt_token_ids_len"] = len(request["prompt_token_ids"])
input_ids_len = request["prompt_token_ids_len"]
request["max_tokens"] = min(self.max_model_len - input_ids_len, request.get("max_tokens"))
if request.get("reasoning_max_tokens", None) is None:
Collaborator commented:
Please remove the reasoning_max_tokens logic here.
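For illustration, a minimal sketch of the preprocessing with the reasoning_max_tokens default dropped; the field names follow the snippet above, and the None handling for max_tokens is an assumption, not the PR's code:

def clamp_max_tokens(request: dict, max_model_len: int) -> None:
    request["prompt_token_ids_len"] = len(request["prompt_token_ids"])
    budget = max_model_len - request["prompt_token_ids_len"]
    requested = request.get("max_tokens")
    # Fall back to the full remaining budget when the client sets no limit.
    request["max_tokens"] = budget if requested is None else min(budget, requested)

req = {"prompt_token_ids": [1, 2, 3], "max_tokens": 100}
clamp_max_tokens(req, max_model_len=2048)
print(req["max_tokens"])  # 100, clamped against a budget of 2045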

chunk_object_type: str = "chat.completion.chunk"
first_iteration = True
previous_num_tokens = 0
n_param = request.n if request.n is not None else 1
Collaborator commented:
Using num_choices directly is enough; there is no need to add a separate n_param.
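A minimal sketch of the setup the reviewer suggests, assuming num_choices is already computed upstream from the prompt count and n; the dict shape here is illustrative only:

def init_stream_state(num_choices: int) -> dict:
    # One slot of running state per returned choice; no separate n_param
    # is needed once num_choices already reflects request.n.
    return {
        "chunk_object_type": "chat.completion.chunk",
        "first_iteration": True,
        "previous_num_tokens": [0] * num_choices,
    }

state = init_stream_state(num_choices=3)
print(state["previous_num_tokens"])  # [0, 0, 0]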

first_iteration[idx] = False

output = res["outputs"]
reasoning_content = output["reasoning_content"]
Collaborator commented:
Delete this line.


delta_message = DeltaMessage(
reasoning_content="",
reasoning_content=reasoning_content,
Collaborator commented:
Change this line to reasoning_content="" as well.
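For clarity, a sketch of the intended construction; DeltaMessage here is a stand-in dataclass, since the real protocol class is not shown in this thread:

from dataclasses import dataclass
from typing import Optional

@dataclass
class DeltaMessage:  # stand-in for the real protocol class
    content: str = ""
    reasoning_content: Optional[str] = None

# Per the review, this code path should emit an empty reasoning_content
# rather than the accumulated value.
delta_message = DeltaMessage(content="hello", reasoning_content="")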

prompt_tokens_details=PromptTokenUsageInfo(cached_tokens=final_res.get("num_cached_tokens", 0)),
prompt_tokens_details=PromptTokenUsageInfo(cached_tokens=sum(num_cached_tokens)),
)
work_process_metrics.e2e_request_latency.observe(time.time() - final_res["metrics"]["request_start_time"])
Collaborator commented:
Just move this line inside the loop; there is no need to record the latency.
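A sketch of aggregating cached-token counts across the n choices before building the usage info; PromptTokenUsageInfo is stood in by a plain dataclass, and the per-choice result shape is an assumption:

from dataclasses import dataclass

@dataclass
class PromptTokenUsageInfo:  # stand-in for the real protocol class
    cached_tokens: int = 0

# Assumption: one result dict per choice, each possibly carrying the metric.
final_results = [{"num_cached_tokens": 5}, {"num_cached_tokens": 7}]
num_cached_tokens = [r.get("num_cached_tokens", 0) for r in final_results]
usage_details = PromptTokenUsageInfo(cached_tokens=sum(num_cached_tokens))
print(usage_details.cached_tokens)  # 12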

request_prompts = request_prompt_ids

num_choices = len(request_prompts)
num_choices = len(request_prompts) * request.n
Collaborator commented:
request.n needs a None check. The n_param = current_req_dict.get("n", 1) below can be moved up here, and n_param used as the multiplier.
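A sketch of the None-safe count the reviewer asks for: read n once, default it to 1, and use it as the multiplier; count_choices is a hypothetical helper name:

def count_choices(request_prompts: list, n) -> tuple:
    n_param = 1 if n is None else n
    return len(request_prompts) * n_param, n_param

print(count_choices(["a", "b"], None)[0])  # 2
print(count_choices(["a", "b"], 3)[0])     # 6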

try:
for idx, prompt in enumerate(request_prompts):
request_id_idx = f"{request_id}-{idx}"
request_id_idx = f"{request_id}"
Collaborator commented:
This line can be deleted.

for idx, prompt in enumerate(request_prompts):
request_id_idx = f"{request_id}-{idx}"
request_id_idx = f"{request_id}"
current_req_dict = request.to_dict_for_infer(request_id_idx, prompt)
Collaborator commented:
Just pass request_id directly as the argument here.
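A sketch of the dispatch loop after this change: the per-index suffix is gone and the plain request_id is passed through. to_dict_for_infer is the method named in the diff; build_infer_dicts is a hypothetical wrapper:

def build_infer_dicts(request, request_id: str, request_prompts: list) -> list:
    # Pass the plain request_id straight through; no per-index suffix.
    return [request.to_dict_for_infer(request_id, prompt) for prompt in request_prompts]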

else:
arrival_time = res["metrics"]["arrival_time"] - inference_start_time
if first_iteration:
if first_iteration[idx]:
Collaborator commented:
The code below already loops with for i in range(num_choices), so first_iteration no longer needs to be defined as an array.
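A sketch of the scalar flag the reviewer describes: since the body already loops over range(num_choices), a single boolean marks the first chunk; the batch shape here is illustrative:

def stream_chunks(batches: list, num_choices: int) -> None:
    first_iteration = True
    for res in batches:
        if first_iteration:
            for i in range(num_choices):
                print(f"first-chunk bookkeeping for choice {i}")
            first_iteration = False
        # ...emit the regular delta chunk for res here...

stream_chunks([{"outputs": {}}, {"outputs": {}}], num_choices=2)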

)
has_no_token_limit = request.max_tokens is None and request.max_completion_tokens is None
max_tokens = request.max_completion_tokens or request.max_tokens
# output 可能没有num_cached_tokens字段 (output may not have a num_cached_tokens field)
Collaborator commented:
Do not use Chinese comments.
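Translated, the comment above says the output dict may lack a num_cached_tokens field; a guarded read along these lines avoids the KeyError:

output = {"token_ids": [1, 2, 3]}  # example payload without the field
num_cached = output.get("num_cached_tokens", 0)  # default to 0 when absent
print(num_cached)  # 0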

if logprobs_res and logprobs_res.content is not None:
logprob_contents.extend(logprobs_res.content)
logprob_contents[idx].extend(logprobs_res.content)
if data["finished"]:
Collaborator commented:
Suggest extracting the code inside this if into a separate method that returns a choice, which is then appended to choices.
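A sketch of the extraction the reviewer proposes; build_choice is a hypothetical name and the fields simply mirror the surrounding diff:

def build_choice(idx: int, output: dict, logprob_contents: list) -> dict:
    return {
        "index": idx,
        "text": output.get("text", ""),
        "logprobs": logprob_contents[idx],
    }

choices = []
logprob_contents = [[{"token": "hi"}]]
finished = True  # stands in for data["finished"] in the diff
if finished:
    choices.append(build_choice(0, {"text": "hi"}, logprob_contents))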


num_prompt_tokens += len(prompt_token_ids)

num_prompt_tokens = num_prompt_tokens // request.n
Collaborator commented:
1 if request.n is None else request.n
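Spelled out, the reviewer's fix makes the division None-safe, since prompt tokens are counted once per choice:

def average_prompt_tokens(num_prompt_tokens: int, n) -> int:
    return num_prompt_tokens // (1 if n is None else n)

print(average_prompt_tokens(90, 3))     # 30
print(average_prompt_tokens(90, None))  # 90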

LiqinruiG merged commit b5b993e into PaddlePaddle:develop on Oct 17, 2025.
34 of 38 checks passed