@lizexu123 lizexu123 commented Sep 28, 2025

This PR adds end-to-end inference support for pooling (embedding) models.

Usage

Online serving

  • Launch the server.
  • Embeddings can then be obtained via either the EmbeddingCompletionRequest API or the EmbeddingChatRequest API.
# Start the service demo
model_path=Qwen3-Embedding-0.6B
# The following three environment variables are required
export ENABLE_V1_KVCACHE_SCHEDULER=1  # enable the V1 KV-cache block scheduler
export FD_DISABLE_CHUNKED_PREFILL=1   # force-disable the default chunked prefill
export FD_USE_GET_SAVE_OUTPUT_V1=1    # use the new get_output / save_output path

python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} \
    --max-num-seqs 256 --max-model-len 32768 \
    --port 13331 --engine-worker-queue-port 7132 \
    --metrics-port 7431 --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.9 \
    --load-choices "default_v1" \
    --runner pooling \
    --graph-optimization-config '{"use_cudagraph":false}'
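
The three required environment variables above can be sanity-checked before launch. A minimal Python sketch (the variable names are taken from the launch script above; the check itself is illustrative, not part of FastDeploy):

```python
import os

# The three env vars required for pooling-model inference in this PR,
# with the values the launch script above exports.
REQUIRED = {
    "ENABLE_V1_KVCACHE_SCHEDULER": "1",  # V1 KV-cache block scheduler
    "FD_DISABLE_CHUNKED_PREFILL": "1",   # disable chunked prefill
    "FD_USE_GET_SAVE_OUTPUT_V1": "1",    # new get_output/save_output path
}

# Set any variable that was not already exported in the shell.
for name, expected in REQUIRED.items():
    os.environ.setdefault(name, expected)

# Anything still missing or mis-set would abort the launch.
missing = [n for n, v in REQUIRED.items() if os.environ.get(n) != v]
print("missing:", missing)
```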

Request methods (curl examples)

A. EmbeddingCompletionRequest example (plain text input)

curl -X POST 'YOUR_SERVICE_URL/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "text-embedding-chat-model",
    "input": [
      "This is a sentence for pooling embedding.",
      "Another input text."
    ],
    "user": "test_client"
  }'
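
The same request can be issued from Python with only the standard library. This sketch builds the identical body; the URL, port, and model name mirror the curl example and the launch command above, and the actual send is commented out so the snippet runs without a live server:

```python
import json

# EmbeddingCompletionRequest body, mirroring the curl example above.
payload = {
    "model": "text-embedding-chat-model",
    "input": [
        "This is a sentence for pooling embedding.",
        "Another input text.",
    ],
    "user": "test_client",
}
body = json.dumps(payload).encode()

# To send it against a running server (port 13331 from the launch command):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:13331/v1/embeddings",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(len(result["data"]))  # one embedding per input string

print(json.loads(body)["model"])
```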

B. EmbeddingChatRequest example (message-sequence input)

curl -X POST 'YOUR_SERVICE_URL/v1/embeddings' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "text-embedding-chat-model",
    "messages": [
      {"role": "user", "content": "Generate embedding for user query."}
    ]
  }'
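
Both examples hit the same `/v1/embeddings` route; the two request schemas appear to be distinguished by whether the body carries `input` or `messages`. A small illustrative helper (an assumption for clarity, not part of the FastDeploy API) that classifies a payload accordingly:

```python
# EmbeddingChatRequest body, mirroring the curl example above.
chat_payload = {
    "model": "text-embedding-chat-model",
    "messages": [
        {"role": "user", "content": "Generate embedding for user query."}
    ],
}

def request_kind(payload: dict) -> str:
    """Classify an /v1/embeddings payload by which field it carries.

    Illustrative only: assumes the server dispatches on the presence of
    'messages' vs 'input', as the two examples above suggest.
    """
    if "messages" in payload:
        return "EmbeddingChatRequest"
    if "input" in payload:
        return "EmbeddingCompletionRequest"
    raise ValueError("payload must contain 'input' or 'messages'")

print(request_kind(chat_payload))
```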


paddle-bot bot commented Sep 28, 2025

Thanks for your contribution!
