Hi,
thanks for the great work on ds4.
I would like to ask whether multi-user / multi-agent serving is part of the roadmap.
As far as I understand, ds4-server currently supports multiple clients, but generation is still serialized through a single inference worker. This makes sense for a single interactive session, but it limits use cases where several coding agents are connected at once, since each agent's request has to wait until the previous one has finished generating.
What I have in mind is not necessarily full continuous batching right away, but at least:
- multiple independent sessions
- stable session routing, for example through an X-Session-Id header
- separate KV/cache state per session
- warm session pool
- LRU eviction for inactive sessions
- priority/queue scheduling between clients
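To make the list a bit more concrete, here is a rough sketch in Python of the session routing, LRU eviction and priority queue parts. It is not based on ds4's actual code: `Session`, `SessionPool`, `RequestQueue` and the `kv_cache` placeholder are hypothetical names I made up for illustration. A real implementation would hold actual KV cache buffers and would also need to free them on eviction.

```python
import heapq
import itertools
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-session state; in a real server this would own a KV cache."""
    session_id: str
    kv_cache: list = field(default_factory=list)  # placeholder for cached tokens


class SessionPool:
    """Route requests to per-session state and evict the least recently used session."""

    def __init__(self, max_sessions: int):
        self.max_sessions = max_sessions
        self._sessions: "OrderedDict[str, Session]" = OrderedDict()

    def get(self, headers: dict) -> Session:
        # HTTP header names are case-insensitive, so normalize before lookup.
        normalized = {k.lower(): v for k, v in headers.items()}
        sid = normalized.get("x-session-id", "default")
        if sid in self._sessions:
            self._sessions.move_to_end(sid)  # mark as most recently used
            return self._sessions[sid]
        if len(self._sessions) >= self.max_sessions:
            self._sessions.popitem(last=False)  # evict the least recently used session
        session = Session(sid)
        self._sessions[sid] = session
        return session

    def active_ids(self) -> list:
        return list(self._sessions)


class RequestQueue:
    """Priority queue: lower number is served first, FIFO within the same priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, priority: int, session_id: str, prompt: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), session_id, prompt))

    def pop(self) -> tuple:
        _, _, session_id, prompt = heapq.heappop(self._heap)
        return session_id, prompt


if __name__ == "__main__":
    pool = SessionPool(max_sessions=2)
    pool.get({"X-Session-Id": "agent-a"})
    pool.get({"X-Session-Id": "agent-b"})
    pool.get({"x-session-id": "agent-a"})  # touches agent-a
    pool.get({"X-Session-Id": "agent-c"})  # pool is full, so agent-b is evicted
    print(pool.active_ids())  # ['agent-a', 'agent-c']

    q = RequestQueue()
    q.push(1, "agent-a", "refactor module")
    q.push(0, "agent-b", "fix failing test")
    print(q.pop())  # ('agent-b', 'fix failing test')
```

The `OrderedDict` gives O(1) "touch" and "evict oldest" operations, which is why it is a common choice for LRU caches; the counter in the heap entries keeps requests with equal priority in arrival order.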
In the longer term, continuous batching would obviously be ideal, but even a multi-session pool would already make ds4 much more practical for local multi-agent workflows on systems like DGX Spark.
Is this something you plan to implement, or would contributions in this direction be welcome?
Thanks!