feat(slime): multi-turn trace merging, validation at an input dataset and gateway config fix in custom rollout function#78

Open
lyzustc wants to merge 2 commits into awslabs:main from lyzustc:main

Conversation

lyzustc commented Jul 1, 2026

Contributor

This PR adds drift-free multi-turn trajectory merging to the slime custom rollout
function and fixes several issues that surfaced when running on full training /
validation datasets (eval episodes timing out and crashing the engine, connection-pool
churn, and unreadably noisy logs). It also brings SlimeRunner and the example scripts
in line with these changes.

Custom rollout function (integration/)

  • feat — auto-merge multi-turn conversations (traces.py). Added
    merge_traces_to_samples: an episode's turns are folded into one training
    Sample per contiguous prefix-extending segment (bridge tokens masked out,
    completions trained), instead of one Sample per turn. Pairs with the gateway's
    cumulative token mode; falls back to one-Sample-per-turn when a turn breaks the
    prefix.
  • feat — cumulative token mode plumbing (rollout.py, gateway.py). New
    cumulative_token_mode / renderer_family config knobs, forwarded to the
    gateway; merge_traces_to_samples is wired in as the rollout's sample builder.
  • feat — validation (rollout.py). Added periodic validation on a user-supplied
    validation set, run at an interval during training. In-flight agent sessions
    during validation are bounded by max_concurrent from the YAML config.
  • fix — connection-pool churn (rollout.py). Added max_pool_connections
    so users can set it >= max_concurrent to stop the boto3 client from logging
    "Connection pool is full, discarding connection" under concurrent ACR sessions.
  • fix — log noise (rollout.py, gateway.py). Replaced per-session logging
    with a single per-batch summary (episodes / succeeded / failed / sequences);
    failed episodes log once at INFO with the full ACR session_id for CloudWatch
    lookup. Added gateway_log_level (default warning) to silence the gateway's
    uvicorn/httpx access logs.
  • fix — gateway always uses SGLang /generate (gateway.py). The slime
    backend now enables use_sglang unconditionally and always passes the served
    --model; removed the dead use_sglang config field. (The /generate
    mechanics live in the rllm-model-gateway PR.)
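
The merging rule described in the first bullet can be sketched roughly as follows. This is an illustration only, not the PR's merge_traces_to_samples: the Turn and MergedSample types and the merge_turns name are hypothetical, and tokens are plain integers. It assumes each turn records the full prompt it was sent (as in a cumulative token mode) plus its completion. A turn extends the current segment when its prompt starts with everything accumulated so far; the extra prompt tokens are treated as bridge tokens (mask 0) and completion tokens are trained (mask 1).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    # Hypothetical record of one model call in an episode.
    prompt_ids: List[int]      # full prompt sent for this turn
    completion_ids: List[int]  # tokens the model generated


@dataclass
class MergedSample:
    # Hypothetical stand-in for a training Sample.
    tokens: List[int] = field(default_factory=list)
    loss_mask: List[int] = field(default_factory=list)  # 1 = trained, 0 = masked


def merge_turns(turns: List[Turn]) -> List[MergedSample]:
    """Fold turns into one sample per contiguous prefix-extending segment."""
    samples: List[MergedSample] = []
    current: Optional[MergedSample] = None
    for turn in turns:
        extends = (
            current is not None
            and turn.prompt_ids[: len(current.tokens)] == current.tokens
        )
        if not extends:
            # First turn, or this prompt does not extend the segment: start anew.
            current = MergedSample()
            samples.append(current)
        # Prompt tokens not yet in the segment (the whole prompt for a new
        # segment, the bridge tokens otherwise) are masked out of the loss.
        new_prompt = turn.prompt_ids[len(current.tokens):]
        current.tokens += new_prompt
        current.loss_mask += [0] * len(new_prompt)
        # Completion tokens are the ones that get trained.
        current.tokens += turn.completion_ids
        current.loss_mask += [1] * len(turn.completion_ids)
    return samples


turns = [
    Turn([1, 2], [3, 4]),
    Turn([1, 2, 3, 4, 5], [6]),  # extends the segment; 5 is a bridge token
    Turn([9, 9], [7]),           # breaks the prefix, so a new sample starts
]
samples = merge_turns(turns)
```

In this toy episode the first two turns collapse into one sample and the third falls back to its own sample, which mirrors the fallback behaviour the bullet describes.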

Training scripts & runner

  • runner.py — exposed the new knobs on SlimeRunner
    (sglang_tool_call_parser, sglang_reasoning_parser, cumulative_token_mode,
    renderer_family, max_pool_connections, gateway_log_level,
    sglang_context_length) and added cuda_home to pin CUDA_HOME/LD_LIBRARY_PATH
    (incl. --train-env-vars for the Megatron actors), fixing TransformerEngine's
    "Multiple libcudart" abort. The tool-call parser is now a field instead of a
    hardcoded qwen25.
  • examples/math_agent/train.sh — ported the CUDA-toolchain pinning,
    torch_memory_saver fixup, eval flags, log suppression, and checkpointing from
    the working run; all paths/credentials stay env-var placeholders.
  • config.yaml.example — documented the new max_pool_connections,
    cumulative_token_mode, and renderer_family settings.
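
As a rough illustration of what pinning CUDA_HOME/LD_LIBRARY_PATH could look like, the sketch below builds an environment mapping with a single CUDA library directory placed first on the search path. The pinned_cuda_env helper is hypothetical and is not the runner's code; the lib64 subdirectory is an assumption based on the conventional CUDA toolkit layout, and the example path is a placeholder.

```python
import os
from typing import Dict


def pinned_cuda_env(cuda_home: str, base_env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of base_env with CUDA_HOME set and its lib dir searched first."""
    env = dict(base_env)
    # Assumption: the toolkit's shared libraries live under <cuda_home>/lib64.
    lib_dir = os.path.join(cuda_home, "lib64")
    existing = [
        p
        for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        if p and p != lib_dir  # drop empty entries and avoid duplicating lib_dir
    ]
    env["CUDA_HOME"] = cuda_home
    env["LD_LIBRARY_PATH"] = os.pathsep.join([lib_dir, *existing])
    return env


# Placeholder paths, for illustration only.
env = pinned_cuda_env("/usr/local/cuda", {"LD_LIBRARY_PATH": "/opt/lib"})
```

The same mapping would then have to reach every process that loads CUDA libraries, which is presumably why the description mentions forwarding it to the Megatron actors as well.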

This PR depends on rllm-org/rllm#715; the two must be used together.
A follow-up PR adding a slime environment setup guide to the docs will come soon.

… and gateway config fix in custom rollout function
# But you must set model family name explicitly if actor_rollout_ref.model.path
# is a local model path. Check supported model families in MODEL_RENDERER_MAP of
# https://github.com/PrimeIntellect-ai/renderers/blob/main/renderers/base.py
renderer_family: "auto"

Contributor

The semantics of renderer_family are not clear to me as a user of the trainer class. If it can be inferred from the model name, do we need to expose this hyperparameter at all?

Contributor Author

Thanks for the feedback. The renderer family cannot be inferred from a local model path; in that case the user must set it explicitly. I changed the comments here to clarify.
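
The semantics discussed in this thread might be sketched as below. This is a guess at the intended behaviour, not the code in the PR: resolve_renderer_family, the known_families mapping, and the rule for deciding that a path is local are all hypothetical, and the mapping entry is a placeholder rather than a real entry of MODEL_RENDERER_MAP.

```python
import os
from typing import Mapping


def resolve_renderer_family(
    model_path: str,
    renderer_family: str,
    known_families: Mapping[str, str],
) -> str:
    """Pick the renderer family, inferring it only when that is possible."""
    if renderer_family != "auto":
        # An explicit setting always wins.
        return renderer_family
    # Assumption: absolute, dot-relative, or on-disk paths count as "local".
    is_local = (
        os.path.isabs(model_path)
        or model_path.startswith(".")
        or os.path.exists(model_path)
    )
    if is_local:
        raise ValueError(
            f"renderer_family must be set explicitly for local path {model_path!r}"
        )
    try:
        return known_families[model_path]
    except KeyError:
        raise ValueError(f"no known renderer family for {model_path!r}") from None


# Placeholder mapping, for illustration only.
KNOWN = {"example-org/example-model": "example_family"}
inferred = resolve_renderer_family("example-org/example-model", "auto", KNOWN)
explicit = resolve_renderer_family("/data/ckpt/step_100", "example_family", KNOWN)
```

Under these assumptions "auto" only works for model names the lookup table knows, and a local checkpoint directory fails fast instead of silently picking a wrong renderer.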

acr_tps_limit: 25 # ACR service TPS quota
max_concurrent: 100 # max concurrent ACR sessions (eval batching)
max_concurrent: 100 # max concurrent ACR sessions
max_pool_connections: 256 # boto3 connection pool size

Contributor

It feels like max_concurrent and max_pool_connections control similar things. Do we need to specify both arguments, or is the effective limit simply the lower of the two?

Contributor Author

No, they are different. max_pool_connections is passed to the RolloutClient initializer here https://github.com/awslabs/agentcore-rl-toolkit/blob/main/src/agentcore_rl_toolkit/client.py#L387, where it controls the boto3 connection pool size, not the maximum number of concurrently running ACR agent sessions.
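
The distinction can be illustrated with a small sketch. It assumes sessions are driven with asyncio, which may not match the actual rollout code; run_sessions and pool_is_large_enough are hypothetical names. A semaphore caps how many sessions are in flight (the role of max_concurrent), while the pool size only determines how many HTTP connections the client keeps for reuse. In botocore the latter is the max_pool_connections argument of botocore.config.Config.

```python
import asyncio


async def run_sessions(n_sessions: int, max_concurrent: int) -> int:
    """Run fake sessions under a concurrency cap; return the peak number in flight."""
    sem = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def one_session() -> None:
        nonlocal in_flight, peak
        async with sem:  # at most max_concurrent sessions pass this point at once
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a real agent session
            in_flight -= 1

    await asyncio.gather(*(one_session() for _ in range(n_sessions)))
    return peak


def pool_is_large_enough(max_concurrent: int, max_pool_connections: int) -> bool:
    # A pool smaller than the concurrency cap does not limit concurrency; it
    # only means some connections are discarded after use instead of reused.
    return max_pool_connections >= max_concurrent


peak = asyncio.run(run_sessions(n_sessions=20, max_concurrent=5))
```

With the values from the config snippet above, pool_is_large_enough(100, 256) holds, which is the relationship the PR description recommends.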

items = out.split(b"\0")
return [x.decode() for x in items if x]

def _cuda_ld_library_path(self) -> str:

Contributor

Do we really need this at the runner level? How did slime handle this?
