Roei-Bracha
Contributor

This pull request introduces significant updates to the GeminiRealtimeConfig class and its associated methods within the gemini_v2v_python extension. The changes enhance audio processing, improve connection management, and update configuration defaults for better performance and reliability.
Together, they significantly reduce response time and improve the conversational experience.

Configuration Updates:

  • Updated the model field in GeminiRealtimeConfig to use a new model version (gemini-2.5-flash-preview-native-audio-dialog) for improved functionality.
  • Renamed audio_chunk_size to audio_buffer_threshold for clarity and updated its usage throughout the code.
  • Changed audio_len_threshold from 5120 to 1024 and made it configurable via audio_buffer_threshold.
  • Enabled transcribe_agent in the property.json file to allow transcription of agent responses.

Audio Processing Enhancements:

  • Added an audio_queue for asynchronous audio processing, enabling non-blocking operations and handling of audio data with improved efficiency.
  • Introduced timeout-based audio buffer flushing and queue management to prevent data loss during high-load scenarios.
  • Created a new _process_audio_queue method to process queued audio data in the background, optimizing real-time audio handling.
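
The queue-based design above can be illustrated with a minimal sketch. It is not the PR's code: only the names audio_queue and _process_audio_queue come from the description, while AudioPipeline, on_audio_frame, _send, demo, and the maxsize default are hypothetical stand-ins. It assumes frames are enqueued without blocking and drained by one background task.

```python
import asyncio


class AudioPipeline:
    """Hypothetical stand-in for the extension's audio path."""

    def __init__(self, maxsize=100):
        # Bounded queue so a slow consumer cannot grow memory without limit.
        self.audio_queue = asyncio.Queue(maxsize=maxsize)
        self.sent = []  # stands in for the realtime session

    def on_audio_frame(self, frame):
        """Enqueue without blocking; return False if the queue is full."""
        try:
            self.audio_queue.put_nowait(frame)
            return True
        except asyncio.QueueFull:
            return False

    async def _process_audio_queue(self):
        """Drain the queue in the background until cancelled."""
        while True:
            frame = await self.audio_queue.get()
            try:
                await self._send(frame)
            finally:
                self.audio_queue.task_done()

    async def _send(self, frame):
        await asyncio.sleep(0)  # placeholder for network I/O
        self.sent.append(frame)


async def demo():
    pipeline = AudioPipeline()
    worker = asyncio.create_task(pipeline._process_audio_queue())
    for i in range(3):
        pipeline.on_audio_frame(bytes([i]) * 4)
    await pipeline.audio_queue.join()  # wait until every frame is processed
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return pipeline.sent
```

In this sketch a full queue rejects the new frame and the caller decides what to do with it; how the PR handles that case is not stated in the description.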

Connection Management Improvements:

  • Replaced _loop with _connection_manager, which uses retries with exponential backoff and proper error handling for robust session management.
  • Added _run_session to streamline session initialization and task handling, ensuring better resource management and logging.
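
As a rough illustration of retrying with exponential backoff, the sketch below wraps a session coroutine in a retry loop. The function name connection_manager, its parameters, the delay values, and the injectable sleep are all assumptions made for this example; the PR's _connection_manager may differ in every detail.

```python
import asyncio


async def connection_manager(run_session, max_retries=5, base_delay=0.5,
                             max_delay=8.0, sleep=asyncio.sleep):
    """Run `run_session` until it returns cleanly, retrying on failure.

    Returns the number of failed attempts that preceded the clean run.
    """
    attempt = 0
    while True:
        try:
            await run_session()
            return attempt
        except asyncio.CancelledError:
            # Never retry on cancellation; let shutdown proceed.
            raise
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise
            # Delay doubles per failure: 0.5, 1.0, 2.0, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await sleep(delay)
```

Passing sleep as a parameter is only a convenience for testing the schedule without waiting; a production version would probably also log each failure and add jitter.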

Task Handling and Cleanup:

  • Introduced a tasks list to manage all asynchronous tasks, enabling clean cancellation during on_stop.
  • Enhanced on_stop to cancel all running tasks and clean up the session gracefully.
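
The task-tracking pattern can be sketched as follows. Only tasks and on_stop are named in the description; ExtensionSketch, _spawn, _worker, and the simplified on_start are hypothetical, and the real extension's lifecycle methods take framework arguments that are omitted here.

```python
import asyncio


class ExtensionSketch:
    """Hypothetical stand-in showing tracked tasks and clean shutdown."""

    def __init__(self):
        self.tasks = []

    def _spawn(self, coro):
        # Keep a reference to every task so on_stop can cancel it later.
        task = asyncio.create_task(coro)
        self.tasks.append(task)
        return task

    async def on_start(self):
        self._spawn(self._worker())
        self._spawn(self._worker())

    async def _worker(self):
        while True:
            await asyncio.sleep(3600)  # placeholder for long-running work

    async def on_stop(self):
        for task in self.tasks:
            task.cancel()
        # return_exceptions=True collects CancelledError instead of raising it.
        results = await asyncio.gather(*self.tasks, return_exceptions=True)
        self.tasks.clear()
        return results
```

Awaiting the cancelled tasks before returning matters: it gives each one a chance to run its cleanup code before the session is torn down.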

Transcription Updates:

  • Made _send_transcript asynchronous and added lower-priority task creation for input and output transcriptions.

@Roei-Bracha Roei-Bracha changed the title from "feat: Improve gemini performance" to "feat: improve gemini performance" on Jun 24, 2025
@Roei-Bracha
Contributor Author

@plutoless, can you please review? This improves the Gemini performance dramatically and allows using it in a production environment.


var (
logTag = slog.String("service", "HTTP_SERVER")
MAX_GEMINI_WORKER_COUNT = getMaxGeminiWorkerCount()
Contributor

what is this for?

Contributor Author

This change makes MAX_GEMINI_WORKER_COUNT configurable via an environment variable, which is essential for production deployments where performance tuning is critical. For example, I have rank 3, so I can run many more than 3 conversations at a time.
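
The server code under review is Go, and getMaxGeminiWorkerCount is not shown in this thread. Purely to illustrate the pattern in the same language as the extension, here is a Python sketch of reading a worker limit from the environment with a safe fallback. The environment variable name and the default of 3 are assumptions, not values confirmed by the PR.

```python
import os

DEFAULT_MAX_WORKERS = 3  # assumed default; not confirmed by the PR


def get_max_gemini_worker_count(env=None, name="MAX_GEMINI_WORKER_COUNT",
                                default=DEFAULT_MAX_WORKERS):
    """Return the worker limit from the environment, or the default.

    Falls back to the default when the variable is unset, not an integer,
    or not positive. The variable name is hypothetical.
    """
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default
```

Falling back instead of failing on a malformed value keeps the server bootable after a configuration typo, at the cost of silently ignoring it; logging the fallback would make that visible.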

and len(self.buff) > 0
):
await self._flush_audio_buffer()
ten_env.log_debug("Flushed audio buffer due to timeout")
Contributor

when is this needed?

Contributor Author

The timeout-based buffer flush is crucial for handling edge cases in real-time audio processing where audio data might get stuck in the buffer.

Example scenario:

User says "Hello" (a short phrase)
Audio data is 800 bytes, below the 1024 threshold
Without the timeout flush, this audio would never get processed
With the 500 ms timeout, it gets flushed automatically, ensuring responsiveness
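
The scenario above can be reproduced with a small synchronous sketch. The 1024 threshold and 500 ms timeout come from this thread; the AudioBuffer class, its method names, the injectable clock, and the choice to measure the timeout from the last flush are assumptions (the PR's version is asynchronous and its exact timing condition is not shown in the excerpt).

```python
import time


class AudioBuffer:
    """Hypothetical buffer that flushes on size or on timeout."""

    def __init__(self, threshold=1024, timeout_s=0.5, clock=time.monotonic):
        self.threshold = threshold
        self.timeout_s = timeout_s
        self.clock = clock
        self.buff = bytearray()
        self.last_flush = clock()
        self.flushed = []  # stands in for data sent to the session

    def push(self, data):
        """Append audio and flush as soon as the threshold is reached."""
        self.buff += data
        if len(self.buff) >= self.threshold:
            self._flush()

    def tick(self):
        """Call periodically; flush a partial buffer once the timeout elapses."""
        timed_out = self.clock() - self.last_flush > self.timeout_s
        if timed_out and len(self.buff) > 0:
            self._flush()
            return True
        return False

    def _flush(self):
        self.flushed.append(bytes(self.buff))
        self.buff.clear()
        self.last_flush = self.clock()
```

With a fake clock, pushing 800 bytes flushes nothing at first; a tick at 0.3 s still does nothing, and a tick at 0.6 s flushes the 800 bytes even though the threshold was never reached.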

@Roei-Bracha
Contributor Author

Roei-Bracha commented Jul 15, 2025

@plutoless Please let me know what you think. I addressed some of your comments and replied to the others.

@Roei-Bracha Roei-Bracha requested a review from plutoless July 27, 2025 06:38