feat(slime): multi-turn trace merging, validation at an input dataset and gateway config fix in custom rollout function #78
lyzustc wants to merge 2 commits into
| # But you must set model family name explicitly if actor_rollout_ref.model.path | ||
| # is a local model path. Check supported model families in MODEL_RENDERER_MAP of | ||
| # https://github.com/PrimeIntellect-ai/renderers/blob/main/renderers/base.py | ||
| renderer_family: "auto" |
The semantics of renderer_family do not look clear from the point of view of a user of the trainer class. If this could be inferred from the model name, then we probably do not need to expose this hyperparameter?
Thanks for the feedback. The renderer family cannot be inferred from a local model path; in that case the user must set it explicitly. I changed the comments here to clarify.
| acr_tps_limit: 25 # ACR service TPS quota | ||
| max_concurrent: 100 # max concurrent ACR sessions (eval batching) | ||
| max_concurrent: 100 # max concurrent ACR sessions | ||
| max_pool_connections: 256 # boto3 connection pool size |
It feels like max_concurrent and max_pool_connections control similar things. Do we need to specify both arguments, or is the effective limit simply the lower of the two?
No, they are different. max_pool_connections is passed to the RolloutClient initializer here https://github.com/awslabs/agentcore-rl-toolkit/blob/main/src/agentcore_rl_toolkit/client.py#L387, and controls the boto3 connection pool size, not the maximum number of concurrently running ACR agent sessions.
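To make the distinction concrete, here is a minimal sketch, not the toolkit's actual code. It assumes session concurrency is bounded with an `asyncio.Semaphore`; the helper `run_sessions` and the sleep that stands in for an ACR session are hypothetical. The connection pool size is a separate HTTP-client setting (in botocore it is the `max_pool_connections` argument of `botocore.config.Config`) and appears only in a comment so the example stays dependency-free.

```python
import asyncio


async def run_sessions(num_sessions: int, max_concurrent: int) -> int:
    """Run fake agent sessions and return the peak number in flight.

    Hypothetical helper: `max_concurrent` caps how many sessions run at once.
    It does not size the HTTP connection pool, which is configured separately
    on the client (e.g. botocore's Config(max_pool_connections=...)).
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def one_session() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # wait here if max_concurrent sessions are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a real agent session
            in_flight -= 1

    await asyncio.gather(*(one_session() for _ in range(num_sessions)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(run_sessions(num_sessions=20, max_concurrent=5)))
```

Under this reading the two values answer different questions: one caps how many sessions run, the other caps how many HTTP connections the client keeps for reuse.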
| items = out.split(b"\0") | ||
| return [x.decode() for x in items if x] | ||
|
|
||
| def _cuda_ld_library_path(self) -> str: |
Do we really need this at the runner level? How does slime handle this?
This PR adds drift-free multi-turn trajectory merging to the slime custom rollout
function and fixes several issues that surfaced when running on full training /
validation datasets (eval episodes timing out and crashing the engine, connection-pool
churn, and unreadably noisy logs). It also brings
`SlimeRunner` and the example scripts in line with these changes.
Custom rollout function (`integration/`)
- `traces.py`: Added `merge_traces_to_samples`. An episode's turns are folded into one training Sample per contiguous prefix-extending segment (bridge tokens masked out, completions trained), instead of one Sample per turn. This pairs with the gateway's cumulative token mode and falls back to one Sample per turn when a turn breaks the prefix.
- `rollout.py`, `gateway.py`: New `cumulative_token_mode` / `renderer_family` config knobs, forwarded to the gateway; `merge_traces_to_samples` is wired in as the rollout's sample builder.
- `rollout.py`: Added periodic validation on an input validation set, run at an interval during training. In-flight agent sessions during validation are bounded by `max_concurrent` in the YAML config.
- `rollout.py`: Added `max_pool_connections` so users can set it `>= max_concurrent`, which stops the boto3 client from logging "Connection pool is full, discarding connection" under concurrent ACR sessions.
- `rollout.py`, `gateway.py`: Replaced per-session logging with a single per-batch summary (episodes / succeeded / failed / sequences); failed episodes log once at INFO with the full ACR `session_id` for CloudWatch lookup. Added `gateway_log_level` (default `warning`) to silence the gateway's uvicorn/httpx access logs.
- `/generate` (`gateway.py`): The slime backend now enables `use_sglang` unconditionally and always passes the served `--model`; removed the dead `use_sglang` config field. (The `/generate` mechanics live in the rllm-model-gateway PR.)
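The prefix-extending merge described for `merge_traces_to_samples` can be sketched as follows. This is an illustration under stated assumptions, not the code in `traces.py`: the `Turn` and `Sample` dataclasses, their field names, and the function `merge_turns` are hypothetical, and the loss mask here covers every token (prompt and bridge tokens get 0, completion tokens get 1).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Turn:
    """One model call: the full prompt it saw and the tokens it generated."""
    prompt_ids: list[int]
    completion_ids: list[int]


@dataclass
class Sample:
    """A merged training sequence with a per-token loss mask."""
    tokens: list[int] = field(default_factory=list)
    loss_mask: list[int] = field(default_factory=list)


def merge_turns(turns: list[Turn]) -> list[Sample]:
    samples: list[Sample] = []
    current: Sample | None = None
    for turn in turns:
        # A turn extends the segment if its prompt starts with everything
        # accumulated so far (previous prompt + completions + bridges).
        extends = (
            current is not None
            and turn.prompt_ids[: len(current.tokens)] == current.tokens
        )
        if extends:
            # Bridge tokens (e.g. tool output, next user message) are masked.
            bridge = turn.prompt_ids[len(current.tokens):]
            current.tokens += bridge
            current.loss_mask += [0] * len(bridge)
        else:
            # Prefix broken: start a new segment from this turn's prompt.
            current = Sample(
                tokens=list(turn.prompt_ids),
                loss_mask=[0] * len(turn.prompt_ids),
            )
            samples.append(current)
        # Completion tokens are the only ones trained on.
        current.tokens += turn.completion_ids
        current.loss_mask += [1] * len(turn.completion_ids)
    return samples
```

In this sketch, an episode whose turns all extend the previous context collapses into a single Sample, while a turn whose prompt diverges opens a new one, which matches the fallback behaviour described in the list above.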
Training scripts & runner
- `runner.py`: exposed the new knobs on `SlimeRunner` (`sglang_tool_call_parser`, `sglang_reasoning_parser`, `cumulative_token_mode`, `renderer_family`, `max_pool_connections`, `gateway_log_level`, `sglang_context_length`) and added `cuda_home` to pin `CUDA_HOME` / `LD_LIBRARY_PATH` (incl. `--train-env-vars` for the Megatron actors), fixing TransformerEngine's "Multiple libcudart" abort. The tool-call parser is now a field instead of a hardcoded `qwen25`.
- `examples/math_agent/train.sh`: ported the CUDA-toolchain pinning, `torch_memory_saver` fixup, eval flags, log suppression, and checkpointing from the working run; all paths/credentials stay env-var placeholders.
- `config.yaml.example`: documented the new `max_pool_connections`, `cumulative_token_mode`, and `renderer_family` settings.

This PR must work together with rllm-org/rllm#715.
A follow-up PR adding a slime environment setup guide to the docs will come soon.
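For readers unfamiliar with the `cuda_home` pinning mentioned for `runner.py`, the idea can be illustrated with a small sketch. It is an assumption about the general approach, not the body of `_cuda_ld_library_path`: the function name `cuda_ld_library_path`, the `lib64` subdirectory, and the de-duplication rule are all hypothetical.

```python
import os


def cuda_ld_library_path(cuda_home: str, existing: str = "") -> str:
    """Return an LD_LIBRARY_PATH with the pinned CUDA lib dir first.

    Hypothetical sketch: putting one toolkit's lib64 ahead of everything else
    (and dropping duplicate entries) makes the dynamic loader search that
    directory first when resolving CUDA libraries.
    """
    pinned = os.path.join(cuda_home, "lib64")
    ordered: list[str] = []
    for entry in [pinned] + existing.split(":"):
        if entry and entry not in ordered:  # skip empty and repeated entries
            ordered.append(entry)
    return ":".join(ordered)


if __name__ == "__main__":
    print(cuda_ld_library_path("/usr/local/cuda", os.environ.get("LD_LIBRARY_PATH", "")))
```

The same string would then need to reach every process that loads CUDA, which is presumably why the description mentions forwarding it through `--train-env-vars` as well.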