Follow-up from #654.
In the WebRTC flow shipped in #654, media flows directly browser↔OpenAI, and the control plane only proxies the SDP offer/answer (`POST /api/v1/sessions/:id/realtime-offer`). That's the idiomatic, OpenAI-documented "unified interface", which is good.
The open question is where realtime tool calls are intercepted. Today, when the live model emits a function call, that event arrives on the client's WebRTC data channel, so something client-side has to relay it to `POST /api/v1/sessions/:id/tools/:tool`. OpenAI's recommended pattern for keeping tool use and business logic off the untrusted client is a server-side "sideband" WebSocket that the backend opens to the same realtime session (`wss://api.openai.com/v1/realtime?call_id=rtc_xxxxx`, bearer auth). The `rtc_...` call id comes back in the `Location` header of the SDP `calls` response.
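The call-id handoff described above can be sketched as follows. This is a minimal sketch under two assumptions: the control plane is written in Python (the issue doesn't say), and the `Location` header carries the call id as its final path segment (e.g. `/v1/realtime/calls/rtc_xxxxx`). The helper names are hypothetical. The sketch only derives the sideband URL and bearer header; opening the socket is left to whatever WebSocket client the control plane already uses.

```python
from urllib.parse import urlencode, urlsplit

# Sideband endpoint as given in this issue.
SIDEBAND_BASE = "wss://api.openai.com/v1/realtime"


def call_id_from_location(location: str) -> str:
    """Extract the rtc_... call id from the SDP calls response's Location header.

    Assumption: the call id is the last path segment. Works for both a
    relative path and an absolute URL.
    """
    path = urlsplit(location).path.rstrip("/")
    call_id = path.rsplit("/", 1)[-1]
    if not call_id.startswith("rtc_"):
        raise ValueError(f"unexpected Location header: {location!r}")
    return call_id


def sideband_request(call_id: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for opening the sideband WebSocket (hypothetical helper)."""
    url = f"{SIDEBAND_BASE}?{urlencode({'call_id': call_id})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers
```

The API key stays on the server: the browser only ever sees the SDP answer, while the control plane holds the call id and the bearer credential needed to join the session.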
Why it matters
- Without the sideband, the browser is in the tool-call path: it can see or forge which tools fire. `tools=[...]` is an allowlist, which mitigates exposure, but the business logic still depends on the client cooperating.
- The sideband is how we make `tools=[...]` a real server-side exposure boundary, and how the control plane can drive the session (inject tool outputs, monitor) without the client.
Acceptance criteria (behavior)
- When a session is started with `transport=webrtc`, the control plane can open a sideband WebSocket to the same OpenAI session using the `call_id` from the offer response.
- A realtime function call emitted by the model is delivered to the control plane over the sideband (not via the browser), routed through `execute/async` with `X-Session-ID`, and the tool result is sent back into the session, all without the client relaying it.
- The browser-bridge path can remain as a fallback but is no longer required for tool calls to work.
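The tool-call loop in these criteria can be sketched as below. This is a sketch, not the implementation: it assumes the control plane reacts to `response.output_item.done` events whose `item.type` is `function_call` (the Realtime API's function-call item carries `name`, `call_id`, and `arguments`), and answers with a `function_call_output` conversation item followed by `response.create`. `handle_sideband_event` and the `execute` callback are hypothetical; `execute` stands in for whatever invokes `execute/async` with `X-Session-ID`. The allowlist check is where `tools=[...]` is enforced on the server rather than trusted to the client.

```python
import json
from typing import Callable

# Hypothetical: (tool_name, parsed_args, session_id) -> JSON string result.
# In the control plane this would call execute/async with X-Session-ID.
ToolExecutor = Callable[[str, dict, str], str]


def handle_sideband_event(
    raw: str, session_id: str, allowed_tools: set[str], execute: ToolExecutor
) -> list[str]:
    """Turn one sideband event into the events to send back (possibly none)."""
    event = json.loads(raw)
    if event.get("type") != "response.output_item.done":
        return []
    item = event.get("item", {})
    if item.get("type") != "function_call":
        return []

    name = item["name"]
    if name not in allowed_tools:
        # Server-side allowlist: never run a tool outside tools=[...].
        output = json.dumps({"error": f"tool {name!r} is not allowed"})
    else:
        args = json.loads(item.get("arguments") or "{}")
        output = execute(name, args, session_id)

    # Inject the result into the session, then ask the model to continue.
    return [
        json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": output,
            },
        }),
        json.dumps({"type": "response.create"}),
    ]
```

Returning serialized events instead of sending them keeps the routing logic testable without a live socket; the caller writes each string to the sideband WebSocket in order.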
Ref: OpenAI "Realtime server controls" / sideband docs (platform.openai.com/docs/guides/realtime-server-controls).