feat: add --raw-prompt flag to skip chat template wrapping by Krigsexe · Pull Request #3 · xigh/herbert-rs

Krigsexe · 2026-04-20T00:49:26Z

Summary

Add a --raw-prompt CLI flag that passes the prompt directly to the tokenizer without ChatML or Mistral instruct template wrapping.

Motivation

Without --raw-prompt, herbert-rs always wraps the prompt in a chat template (ChatML: <|im_start|>user...). For models whose prompts already carry their own special tokens, this nests those tokens inside the chat template. The model then cannot parse its own protocol and either refuses to answer or generates garbage.
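The snippet below makes the nesting concrete. It applies ChatML-style wrapping to a prompt that already uses the model's own special tokens. The function name `wrap_chatml` and the exact template string are assumptions based on the standard ChatML turn layout. This is not herbert-rs's actual `build_prompt()`. The closing `<|query_end|>` token is also assumed for illustration.

```rust
/// Hypothetical ChatML wrapper following the standard ChatML turn layout.
/// NOT herbert-rs's build_prompt(); it only illustrates the nesting problem.
fn wrap_chatml(user: &str) -> String {
    format!("<|im_start|>user\n{user}<|im_end|>\n<|im_start|>assistant\n")
}

fn main() {
    // A prompt that already speaks the model's own token protocol.
    // <|query_start|> is named in this PR; <|query_end|> is an assumption.
    let raw = "<|query_start|>Quelle est la capitale de la France ?<|query_end|>";
    let wrapped = wrap_chatml(raw);
    // The model's tokens now sit inside a ChatML user turn it was not trained on.
    println!("{wrapped}");
}
```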

With --raw-prompt, the special tokens reach the tokenizer unchanged and the model produces its full structured output: language detection, query analysis, source analysis, and grounded answers with citations.

Changes

crates/cli/src/main.rs: +15/-3 lines
- Add --raw-prompt bool flag to Cli struct
- Skip build_prompt() wrapping in single-shot mode when flag is set
- Skip build_prompt() wrapping in reserve hint calculation when flag is set
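The change amounts to one decision that is reused at both call sites. The sketch below shows that decision under stated assumptions:

- The `Cli` struct is a stripped-down hypothetical; the real one is probably clap-derived and has more fields.
- `build_prompt` is a stand-in that does ChatML wrapping.
- The byte-length `reserve_hint` formula is an invented crude estimate.

The argument parsing in `main` is deliberately minimal and is not the real herbert-rs CLI.

```rust
/// Hypothetical subset of the CLI options.
struct Cli {
    prompt: String,
    raw_prompt: bool,
}

/// Stand-in for herbert-rs's build_prompt(): ChatML wrapping (assumed format).
fn build_prompt(user: &str) -> String {
    format!("<|im_start|>user\n{user}<|im_end|>\n<|im_start|>assistant\n")
}

/// Text actually sent to the tokenizer: verbatim when --raw-prompt is set.
fn effective_prompt(cli: &Cli) -> String {
    if cli.raw_prompt {
        cli.prompt.clone()
    } else {
        build_prompt(&cli.prompt)
    }
}

/// Hypothetical reserve hint, computed from the same string that will be
/// tokenized so the two code paths cannot disagree. Using the byte length
/// as a token estimate is a crude assumption for illustration only.
fn reserve_hint(cli: &Cli, max_new_tokens: usize) -> usize {
    effective_prompt(cli).len() + max_new_tokens
}

fn main() {
    // Simplified parsing: detect the flag, treat the remaining args as the prompt.
    let args: Vec<String> = std::env::args().skip(1).collect();
    let raw_prompt = args.iter().any(|a| a == "--raw-prompt");
    let prompt = args
        .iter()
        .filter(|a| a.as_str() != "--raw-prompt")
        .cloned()
        .collect::<Vec<_>>()
        .join(" ");
    let cli = Cli { prompt, raw_prompt };
    println!("{}", effective_prompt(&cli));
    println!("reserve hint: {}", reserve_hint(&cli, 256));
}
```

Routing both the single-shot prompt and the reserve hint through one helper keeps them consistent. Skipping the wrapping in only one of the two places would size the reservation for a prompt that is never actually sent.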

Testing

Tested with PleIAs/Pleias-RAG-1B (Q4 backend, llama model_type via PR #2) on Ryzen 5 3600 (AVX2):

- Without --raw-prompt: the prompt is wrapped in ChatML and the model refuses to process the sources ("insufficient information")
- With --raw-prompt: the model correctly generates structured output, including <|language_start|>French<|language_end|>, <|query_analysis_start|>, <|source_analysis_start|>, and <|query_report_start|>Answerable<|query_report_end|>
- Decode: 20.2 tok/s, Prefill: 94.8 tok/s
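One way to check the structured output listed above, for example in a regression test, is to pull out the text between a start/end tag pair. The helper below is a hypothetical illustration and is not part of this PR.

```rust
/// Return the text between the first `start` tag and the next `end` tag, if any.
/// Hypothetical helper; not part of herbert-rs.
fn extract_between<'a>(text: &'a str, start: &str, end: &str) -> Option<&'a str> {
    let from = text.find(start)? + start.len();
    let len = text[from..].find(end)?;
    Some(&text[from..from + len])
}

fn main() {
    // Sample shaped like the output reported in this PR.
    let out = "<|language_start|>French<|language_end|><|query_report_start|>Answerable<|query_report_end|>";
    let lang = extract_between(out, "<|language_start|>", "<|language_end|>");
    let report = extract_between(out, "<|query_report_start|>", "<|query_report_end|>");
    println!("{lang:?} {report:?}");
}
```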

Some models (e.g. PleIAs/Pleias-RAG-1B) use custom special tokens for structured input/output and do not define a chat_template in their tokenizer config. The default ChatML wrapping (<|im_start|>user...) breaks these models because their special tokens end up nested inside the chat template instead of being parsed directly.

--raw-prompt passes the prompt text directly to the tokenizer without any chat template wrapping, so models with custom token protocols work correctly.

Tested with PleIAs/Pleias-RAG-1B (Q4 backend) on Ryzen 5 3600 (AVX2): the model correctly processes <|query_start|>, <|source_start|>, and <|source_analysis_start|> tokens and generates structured RAG output with language detection, query analysis, and source grounding.
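With --raw-prompt, a prompt for such a model could be assembled along these lines. Only `<|query_start|>` and `<|source_start|>` are named in this PR. The closing tokens (`<|query_end|>`, `<|source_end|>`) and the one-per-line layout are assumptions, so check the model card for the exact format before relying on it.

```rust
use std::fmt::Write;

/// Hypothetical builder for a Pleias-RAG-style raw prompt.
/// <|query_start|> and <|source_start|> come from this PR;
/// <|query_end|>, <|source_end|> and the layout are assumptions.
fn build_rag_prompt(query: &str, sources: &[&str]) -> String {
    let mut p = String::new();
    // Writing into a String cannot fail, so unwrap() is safe here.
    writeln!(p, "<|query_start|>{query}<|query_end|>").unwrap();
    for s in sources {
        writeln!(p, "<|source_start|>{s}<|source_end|>").unwrap();
    }
    // No chat template around it: this string goes to the tokenizer as-is.
    p
}

fn main() {
    let prompt = build_rag_prompt(
        "Quelle est la capitale de la France ?",
        &["Paris est la capitale de la France."],
    );
    print!("{prompt}");
}
```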

Krigsexe wants to merge 1 commit into xigh:main from Krigsexe:add-raw-prompt-flag