
Conversation

kxz2002 (Contributor) commented Sep 25, 2025:

Support adding the parameter n to the request to retrieve multiple model responses.

paddle-bot commented Sep 25, 2025:

Thanks for your contribution!

paddle-bot added the "contributor (External developers)" label on Sep 25, 2025.
CLAassistant commented Sep 26, 2025:

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ kxz2002
✅ gzy19990617
❌ LiqinruiG
You have signed the CLA already but the status is still pending? Let us recheck it.

request["prompt_token_ids_len"] = len(request["prompt_token_ids"])
input_ids_len = request["prompt_token_ids_len"]
request["max_tokens"] = min(self.max_model_len - input_ids_len, request.get("max_tokens"))
if request.get("reasoning_max_tokens", None) is None:
Collaborator commented:
Please remove the reasoning_max_tokens logic here.
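For illustration, a minimal sketch of the preprocessing with the reasoning_max_tokens default dropped; the field names follow the snippet above, and the None handling for max_tokens is an assumption, not the PR's code:

def clamp_max_tokens(request: dict, max_model_len: int) -> None:
    request["prompt_token_ids_len"] = len(request["prompt_token_ids"])
    budget = max_model_len - request["prompt_token_ids_len"]
    requested = request.get("max_tokens")
    # Fall back to the full remaining budget when the client sets no limit.
    request["max_tokens"] = budget if requested is None else min(budget, requested)

req = {"prompt_token_ids": [1, 2, 3], "max_tokens": 100}
clamp_max_tokens(req, max_model_len=2048)
print(req["max_tokens"])  # 100, clamped against a budget of 2045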

chunk_object_type: str = "chat.completion.chunk"
first_iteration = True
previous_num_tokens = 0
n_param = request.n if request.n is not None else 1
Collaborator commented:
Using num_choices directly is enough; there is no need to add a separate n_param.
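A minimal sketch of the setup the reviewer suggests, assuming num_choices is already computed upstream from the prompt count and n; the dict shape here is illustrative only:

def init_stream_state(num_choices: int) -> dict:
    # One slot of running state per returned choice; no separate n_param
    # is needed once num_choices already reflects request.n.
    return {
        "chunk_object_type": "chat.completion.chunk",
        "first_iteration": True,
        "previous_num_tokens": [0] * num_choices,
    }

state = init_stream_state(num_choices=3)
print(state["previous_num_tokens"])  # [0, 0, 0]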

first_iteration[idx] = False

output = res["outputs"]
reasoning_content = output["reasoning_content"]
Collaborator commented:
Delete this line.


delta_message = DeltaMessage(
reasoning_content="",
reasoning_content=reasoning_content,
Collaborator commented:
Change this line to reasoning_content="" as well.
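For clarity, a sketch of the intended construction; DeltaMessage here is a stand-in dataclass, since the real protocol class is not shown in this thread:

from dataclasses import dataclass
from typing import Optional

@dataclass
class DeltaMessage:  # stand-in for the real protocol class
    content: str = ""
    reasoning_content: Optional[str] = None

# Per the review, this code path should emit an empty reasoning_content
# rather than the accumulated value.
delta_message = DeltaMessage(content="hello", reasoning_content="")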

prompt_tokens_details=PromptTokenUsageInfo(cached_tokens=final_res.get("num_cached_tokens", 0)),
prompt_tokens_details=PromptTokenUsageInfo(cached_tokens=sum(num_cached_tokens)),
)
work_process_metrics.e2e_request_latency.observe(time.time() - final_res["metrics"]["request_start_time"])
Collaborator commented:
Just move this line inside the loop; there is no need to record the latency.
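A sketch of aggregating cached-token counts across the n choices before building the usage info; PromptTokenUsageInfo is stood in by a plain dataclass, and the per-choice result shape is an assumption:

from dataclasses import dataclass

@dataclass
class PromptTokenUsageInfo:  # stand-in for the real protocol class
    cached_tokens: int = 0

# Assumption: one result dict per choice, each possibly carrying the metric.
final_results = [{"num_cached_tokens": 5}, {"num_cached_tokens": 7}]
num_cached_tokens = [r.get("num_cached_tokens", 0) for r in final_results]
usage_details = PromptTokenUsageInfo(cached_tokens=sum(num_cached_tokens))
print(usage_details.cached_tokens)  # 12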

request_prompts = request_prompt_ids

num_choices = len(request_prompts)
num_choices = len(request_prompts) * request.n
Collaborator commented:
request.n needs a None check. The n_param = current_req_dict.get("n", 1) below can be moved up here, and n_param used as the multiplier.
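A sketch of the None-safe count the reviewer asks for: read n once, default it to 1, and use it as the multiplier; count_choices is a hypothetical helper name:

def count_choices(request_prompts: list, n) -> tuple:
    n_param = 1 if n is None else n
    return len(request_prompts) * n_param, n_param

print(count_choices(["a", "b"], None)[0])  # 2
print(count_choices(["a", "b"], 3)[0])     # 6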

try:
for idx, prompt in enumerate(request_prompts):
request_id_idx = f"{request_id}-{idx}"
request_id_idx = f"{request_id}"
Collaborator commented:
This line can be deleted.

for idx, prompt in enumerate(request_prompts):
request_id_idx = f"{request_id}-{idx}"
request_id_idx = f"{request_id}"
current_req_dict = request.to_dict_for_infer(request_id_idx, prompt)
Collaborator commented:
Just pass request_id directly as the argument here.
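A sketch of the dispatch loop after this change: the per-index suffix is gone and the plain request_id is passed through. to_dict_for_infer is the method named in the diff; build_infer_dicts is a hypothetical wrapper:

def build_infer_dicts(request, request_id: str, request_prompts: list) -> list:
    # Pass the plain request_id straight through; no per-index suffix.
    return [request.to_dict_for_infer(request_id, prompt) for prompt in request_prompts]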

else:
arrival_time = res["metrics"]["arrival_time"] - inference_start_time
if first_iteration:
if first_iteration[idx]:
Collaborator commented:
The code below already loops with for i in range(num_choices), so first_iteration no longer needs to be defined as an array.
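A sketch of the scalar flag the reviewer describes: since the body already loops over range(num_choices), a single boolean marks the first chunk; the batch shape here is illustrative:

def stream_chunks(batches: list, num_choices: int) -> None:
    first_iteration = True
    for res in batches:
        if first_iteration:
            for i in range(num_choices):
                print(f"first-chunk bookkeeping for choice {i}")
            first_iteration = False
        # ...emit the regular delta chunk for res here...

stream_chunks([{"outputs": {}}, {"outputs": {}}], num_choices=2)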

)
has_no_token_limit = request.max_tokens is None and request.max_completion_tokens is None
max_tokens = request.max_completion_tokens or request.max_tokens
# output 可能没有num_cached_tokens字段 (output may not have a num_cached_tokens field)
Collaborator commented:
Do not use Chinese comments.
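Translated, the comment above says the output dict may lack a num_cached_tokens field; a guarded read along these lines avoids the KeyError:

output = {"token_ids": [1, 2, 3]}  # example payload without the field
num_cached = output.get("num_cached_tokens", 0)  # default to 0 when absent
print(num_cached)  # 0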

if logprobs_res and logprobs_res.content is not None:
logprob_contents.extend(logprobs_res.content)
logprob_contents[idx].extend(logprobs_res.content)
if data["finished"]:
Collaborator commented:
Suggest extracting the code inside this if into a separate method that returns a choice, which is then appended to choices.
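A sketch of the extraction the reviewer proposes; build_choice is a hypothetical name and the fields simply mirror the surrounding diff:

def build_choice(idx: int, output: dict, logprob_contents: list) -> dict:
    return {
        "index": idx,
        "text": output.get("text", ""),
        "logprobs": logprob_contents[idx],
    }

choices = []
logprob_contents = [[{"token": "hi"}]]
finished = True  # stands in for data["finished"] in the diff
if finished:
    choices.append(build_choice(0, {"text": "hi"}, logprob_contents))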


num_prompt_tokens += len(prompt_token_ids)

num_prompt_tokens = num_prompt_tokens // request.n
Collaborator commented:
1 if request.n is None else request.n
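Spelled out, the reviewer's fix makes the division None-safe, since prompt tokens are counted once per choice:

def average_prompt_tokens(num_prompt_tokens: int, n) -> int:
    return num_prompt_tokens // (1 if n is None else n)

print(average_prompt_tokens(90, 3))     # 30
print(average_prompt_tokens(90, None))  # 90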

LiqinruiG merged commit b5b993e into PaddlePaddle:develop on Oct 17, 2025.
34 of 38 checks passed