Add speech speed control and conversation behavior toggles for gemini-live-2.5-flash-preview-native-audio-09-2025 Live API voice sessions #2029

@ronniehere

Description

Is your feature request related to a problem? Please describe.

In the Gemini Live API, key real-time voice agent controls appear to be unavailable or are not exposed as first-class configuration. Specifically:

  • Speech rate is fixed or only indirectly controllable.
  • Session turn-taking behavior is not configurable, such as whether the agent should speak first at session start.
  • Interruption (barge-in) behavior is not configurable, such as whether users can interrupt the agent mid-utterance.

These controls are required for production voice agents and contact center deployments. Without them, the experience can feel unnatural, inaccessible, and inconsistent across languages, especially Arabic dialects and English. It also makes it hard to meet different business requirements for inbound versus outbound calls and compliance flows.

Describe the solution you would like

Expose explicit configuration options in the Live API for:

  • A speech speed setting for generated audio output.
  • A speak-first toggle to control whether the agent automatically initiates the conversation at session start.
  • An allow-interruptions toggle to control whether user speech can interrupt or stop the agent while it is speaking.

These settings should be configurable at session creation and updatable during an active session without requiring a restart.

Proposed API capability

A) Speech speed

  • Add a parameter to the Live API request configuration that sets the speech speed of generated audio output.
  • Allow updates mid-session.
  • Preserve pitch by default (time-stretching rather than pitch-shifting) unless pitch shifting is explicitly enabled.

Suggested parameter:

  • speech_rate: float

    • Default: 1.0
    • Range: 0.5 to 2.0
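To make the pitch-preservation requirement concrete: time-stretching at rate r shortens an utterance of d seconds to d / r while keeping its pitch, whereas plain resampling to the same speed would also shift pitch by 12 · log2(r) semitones (a full octave at 2.0). The Python sketch below shows the proposed range check and that arithmetic. The function and constant names are hypothetical and are not part of any existing Gemini SDK.

```python
import math

# Hypothetical bounds taken from this proposal; not an existing Gemini API.
SPEECH_RATE_MIN = 0.5
SPEECH_RATE_MAX = 2.0
SPEECH_RATE_DEFAULT = 1.0


def validate_speech_rate(rate: float) -> float:
    """Reject non-numeric or out-of-range values with an explicit error."""
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise TypeError(f"speech_rate must be a number, got {type(rate).__name__}")
    # NaN fails this comparison too, so it is rejected as well.
    if not SPEECH_RATE_MIN <= rate <= SPEECH_RATE_MAX:
        raise ValueError(
            f"speech_rate must be between {SPEECH_RATE_MIN} and {SPEECH_RATE_MAX}, got {rate}"
        )
    return float(rate)


def output_duration(seconds: float, rate: float) -> float:
    """Length of an utterance after time-stretching by `rate` (pitch unchanged)."""
    return seconds / validate_speech_rate(rate)


def naive_resample_pitch_shift(rate: float) -> float:
    """Pitch change in semitones if speed were changed by plain resampling instead."""
    return 12 * math.log2(validate_speech_rate(rate))
```

For example, a 3-second confirmation at speech_rate 1.5 would last 2 seconds at the same pitch, while naive resampling at 2.0 would raise the voice by 12 semitones, which is why pitch preservation is proposed as the default.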

B) Speak first toggle

  • speak_first: boolean

    • Default: false
    • When true, the agent produces an initial greeting immediately after session start without waiting for user audio.
    • Works for both inbound and outbound scenarios.

C) Allow interruptions toggle

  • allow_interruptions: boolean

    • Default: true
    • When true, user speech should barge in and interrupt agent playback, and the system should stop current agent audio output promptly.
    • When false, user speech should be ignored or buffered until the agent finishes speaking, depending on session mode.

Optional refinement:

  • interrupt_policy: enum

    • barge_in_stop_audio
    • barge_in_duck_audio
    • no_barge_in
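Taken together, the proposed parameters could be modeled as one session configuration object with the defaults listed above, validation that fails loudly, and a mid-session update that keeps unchanged fields. The Python sketch below is one way to express that. VoiceSessionConfig, InterruptPolicy, effective_policy, and update are hypothetical names, and the rule that an explicit interrupt_policy must agree with allow_interruptions is an assumption about how the two settings would interact.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class InterruptPolicy(Enum):
    # Values mirror the optional interrupt_policy refinement proposed above.
    BARGE_IN_STOP_AUDIO = "barge_in_stop_audio"
    BARGE_IN_DUCK_AUDIO = "barge_in_duck_audio"
    NO_BARGE_IN = "no_barge_in"


@dataclass(frozen=True)
class VoiceSessionConfig:
    """Hypothetical session settings; names and defaults follow this proposal."""
    speech_rate: float = 1.0
    speak_first: bool = False
    allow_interruptions: bool = True
    interrupt_policy: Optional[InterruptPolicy] = None

    def __post_init__(self) -> None:
        rate = self.speech_rate
        if isinstance(rate, bool) or not isinstance(rate, (int, float)):
            raise TypeError("speech_rate must be a number")
        if not 0.5 <= rate <= 2.0:
            raise ValueError(f"speech_rate must be in [0.5, 2.0], got {rate}")
        if not isinstance(self.speak_first, bool):
            raise TypeError("speak_first must be a bool")
        if not isinstance(self.allow_interruptions, bool):
            raise TypeError("allow_interruptions must be a bool")
        policy = self.interrupt_policy
        if policy is not None and not isinstance(policy, InterruptPolicy):
            raise TypeError("interrupt_policy must be an InterruptPolicy")
        # Assumed rule: reject contradictory settings instead of silently picking one.
        if policy is not None and (policy is InterruptPolicy.NO_BARGE_IN) == self.allow_interruptions:
            raise ValueError("interrupt_policy contradicts allow_interruptions")

    def effective_policy(self) -> InterruptPolicy:
        """Explicit policy if set, otherwise derived from allow_interruptions."""
        if self.interrupt_policy is not None:
            return self.interrupt_policy
        if self.allow_interruptions:
            return InterruptPolicy.BARGE_IN_STOP_AUDIO
        return InterruptPolicy.NO_BARGE_IN

    def update(self, **changes) -> "VoiceSessionConfig":
        """Mid-session update: returns a new, re-validated config."""
        return replace(self, **changes)
```

Because update returns a new object, an invalid change raises before anything is applied and the session keeps its previous settings, which is one way to satisfy the "no restart" criterion without leaving a session half-configured.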

Acceptance criteria

  • Parameters are documented in the Live API reference.
  • Behavior is deterministic and consistent across voices and languages, including Arabic.
  • Parameters can be updated during an active session without restarting the session.
  • Interruption behavior has low latency: when barge-in is enabled, agent audio stops quickly.
  • Invalid values produce clear errors, and defaults are clearly defined.
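The three interrupt_policy values imply different behavior at the moment the user starts talking over the agent. The Python sketch below models that decision on the client side, roughly the kind of custom stream-management logic that otherwise has to be built by hand. AgentPlayback, the 0.2 ducking gain, and the rule that no_barge_in buffers user audio until the agent's turn ends are all assumptions for illustration, not Gemini Live API behavior.

```python
from dataclasses import dataclass, field
from enum import Enum


class InterruptPolicy(Enum):
    BARGE_IN_STOP_AUDIO = "barge_in_stop_audio"
    BARGE_IN_DUCK_AUDIO = "barge_in_duck_audio"
    NO_BARGE_IN = "no_barge_in"


@dataclass
class AgentPlayback:
    """Hypothetical stand-in for whatever plays agent audio on the client."""
    policy: InterruptPolicy
    duck_gain: float = 0.2  # assumed ducking level (20% volume)
    speaking: bool = False
    gain: float = 1.0
    pending_user_audio: list = field(default_factory=list)  # held back (no_barge_in)
    forwarded: list = field(default_factory=list)  # user audio passed on to the model

    def start_agent_turn(self) -> None:
        self.speaking = True
        self.gain = 1.0

    def on_user_audio(self, chunk: bytes) -> None:
        if not self.speaking:
            self.forwarded.append(chunk)
            return
        if self.policy is InterruptPolicy.BARGE_IN_STOP_AUDIO:
            # Cut agent audio immediately, then pass the user's speech through.
            self.speaking = False
            self.forwarded.append(chunk)
        elif self.policy is InterruptPolicy.BARGE_IN_DUCK_AUDIO:
            # Keep the agent audible but quieter while the user talks.
            self.gain = self.duck_gain
            self.forwarded.append(chunk)
        else:
            # NO_BARGE_IN: hold user audio until the agent finishes its turn.
            self.pending_user_audio.append(chunk)

    def end_agent_turn(self) -> None:
        self.speaking = False
        self.gain = 1.0
        # Release any user audio that arrived while barge-in was disabled.
        self.forwarded.extend(self.pending_user_audio)
        self.pending_user_audio.clear()
```

The latency criterion corresponds to how quickly the stop branch takes effect; doing this on the client adds a network round trip before the model stops generating, which is part of why the request asks for barge-in to be handled by the service itself.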

Describe alternatives you have considered

  • Prompting the model to speak slower or faster, or to wait for user input, which is unreliable.
  • External audio time stretching for speech rate, which adds latency and can degrade quality.
  • Building custom interruption logic by managing audio streams externally, which is complex and brittle.
  • Switching to another TTS stack that has speech rate and barge-in controls, which reduces the value of using the Gemini Live API.

Additional context

These controls are foundational for real-time voice agents:

  • Speech speed improves clarity and accessibility, especially for medical instructions and confirmations.
  • Speak first is required for outbound calls and for agents that must lead the interaction.
  • Allowing interruptions is required for natural conversation, higher user satisfaction, and shorter calls.

Metadata

Labels

  • priority: p3 (Desirable enhancement or fix. May not be included in next release.)
  • type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)