feat: improve gemini performance #1017
base: main
Conversation
…oved performance and transcription handling
…lization to allow proactive audio
@plutoless, can you please review? This improves the Gemini performance dramatically and allows using it in a production environment.
var (
	logTag = slog.String("service", "HTTP_SERVER")
	MAX_GEMINI_WORKER_COUNT = getMaxGeminiWorkerCount()
what is this for?
This change makes MAX_GEMINI_WORKER_COUNT configurable via environment variables, which is essential for production deployments where performance tuning is critical. For example, my setup was capped at 3 workers, and with this change I can run far more than 3 conversations at a time.
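The helper behind this is small; the PR's version is in Go, but the idea can be sketched in Python. The function name mirrors the Go helper, while the default of 3 and the validation logic are assumptions for illustration, not taken from the diff:

```python
import os

# Assumed fallback when the variable is unset or invalid (not from the PR).
DEFAULT_GEMINI_WORKER_COUNT = 3

def get_max_gemini_worker_count() -> int:
    """Read MAX_GEMINI_WORKER_COUNT from the environment, with a safe default."""
    raw = os.environ.get("MAX_GEMINI_WORKER_COUNT", "")
    try:
        count = int(raw)
    except ValueError:
        return DEFAULT_GEMINI_WORKER_COUNT
    # Reject zero or negative values rather than disabling workers entirely.
    return count if count > 0 else DEFAULT_GEMINI_WORKER_COUNT
```

An operator can then tune concurrency per deployment, e.g. `MAX_GEMINI_WORKER_COUNT=16` in the service environment, without rebuilding the image.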
ai_agents/agents/ten_packages/extension/gemini_v2v_python/extension.py
    and len(self.buff) > 0
):
    await self._flush_audio_buffer()
    ten_env.log_debug("Flushed audio buffer due to timeout")
when is this needed?
The timeout-based buffer flush is crucial for handling edge cases in real-time audio processing where audio data might get stuck in the buffer.
Example scenario:
- The user says "Hello" (a short phrase).
- The audio data is 800 bytes, below the 1024-byte threshold.
- Without the timeout flush, this audio would never get processed.
- With the 500 ms timeout, it gets flushed automatically, ensuring responsiveness.
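The scenario above can be sketched as a minimal, self-contained version of the timeout flush. Only the 1024-byte threshold and the 500 ms timeout come from the discussion; the class, field names, and demo are illustrative assumptions:

```python
import asyncio
import contextlib

AUDIO_BUFFER_THRESHOLD = 1024  # bytes, per the PR default
FLUSH_TIMEOUT = 0.5            # seconds, per the discussion

class AudioBuffer:
    def __init__(self):
        self.buff = bytearray()
        self.flushed = []  # stands in for sending audio to the session

    async def _flush_audio_buffer(self):
        if self.buff:
            self.flushed.append(bytes(self.buff))
            self.buff.clear()

    async def on_audio(self, chunk: bytes):
        self.buff.extend(chunk)
        # Normal path: flush once enough audio has accumulated.
        if len(self.buff) >= AUDIO_BUFFER_THRESHOLD:
            await self._flush_audio_buffer()

    async def flush_loop(self):
        # Timeout path: periodically flush whatever is left so short
        # utterances (e.g. an 800-byte "Hello") are never stranded.
        while True:
            await asyncio.sleep(FLUSH_TIMEOUT)
            if len(self.buff) > 0:
                await self._flush_audio_buffer()

async def demo():
    buf = AudioBuffer()
    task = asyncio.create_task(buf.flush_loop())
    await buf.on_audio(b"\x00" * 800)  # below threshold: stays buffered
    await asyncio.sleep(0.6)           # timeout elapses: flushed anyway
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return buf.flushed

flushed = asyncio.run(demo())
```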
@plutoless Please let me know what you think; I resolved some of your comments and replied to others.
This pull request introduces significant updates to the GeminiRealtimeConfig class and its associated methods within the gemini_v2v_python extension. The changes enhance audio processing, improve connection management, and update configuration defaults for better performance and reliability, significantly reducing response time and improving the overall experience.

Configuration Updates:
- Updated the model field in GeminiRealtimeConfig to a new model version (gemini-2.5-flash-preview-native-audio-dialog) for improved functionality.
- Renamed audio_chunk_size to audio_buffer_threshold for clarity and updated its usage throughout the code.
- Reduced audio_len_threshold from 5120 to 1024 and made it configurable via audio_buffer_threshold.
- Added transcribe_agent to the property.json file to allow transcription of agent responses.

Audio Processing Enhancements:
- Introduced an audio_queue for asynchronous audio processing, enabling non-blocking operations and more efficient handling of audio data.
- Added a _process_audio_queue method to process queued audio data in the background, optimizing real-time audio handling.

Connection Management Improvements:
- Replaced _loop with _connection_manager, which uses retries with exponential backoff and proper error handling for robust session management.
- Added _run_session to streamline session initialization and task handling, ensuring better resource management and logging.

Task Handling and Cleanup:
- Introduced a tasks list to manage all asynchronous tasks, enabling clean cancellation during on_stop.
- Updated on_stop to cancel all running tasks and clean up the session gracefully.

Transcription Updates:
- Made _send_transcript asynchronous and added lower-priority task creation for input and output transcriptions.
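The audio_queue pattern described in the summary can be sketched as a standalone producer/consumer, assuming the common asyncio idiom rather than the extension's actual code (the class, the `sent` list, and the `None` sentinel are illustrative):

```python
import asyncio

class AudioPipeline:
    def __init__(self):
        self.audio_queue: asyncio.Queue = asyncio.Queue()
        self.sent = []  # stands in for forwarding frames to the session

    async def on_audio_frame(self, frame: bytes):
        # Producer side: enqueue without blocking the realtime callback.
        self.audio_queue.put_nowait(frame)

    async def _process_audio_queue(self):
        # Consumer side: drain the queue in the background; a None
        # sentinel tells the consumer to stop.
        while True:
            frame = await self.audio_queue.get()
            if frame is None:
                break
            self.sent.append(frame)

async def demo():
    p = AudioPipeline()
    consumer = asyncio.create_task(p._process_audio_queue())
    for i in range(3):
        await p.on_audio_frame(bytes([i]) * 4)
    await p.audio_queue.put(None)  # stop after draining
    await consumer
    return p.sent

sent = asyncio.run(demo())
```

The point of the split is that the audio callback stays O(1) while any slow downstream work happens in the background task.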
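The _connection_manager behavior (retries with exponential backoff) can likewise be sketched under stated assumptions: the retry count, base delay, and the `connect` callable are hypothetical, not taken from the diff:

```python
import asyncio

MAX_RETRIES = 5
BASE_DELAY = 0.01  # seconds; kept tiny so the demo runs quickly

async def _connection_manager(connect, max_retries=MAX_RETRIES):
    """Retry session setup with exponential backoff on transient errors."""
    delay = BASE_DELAY
    for attempt in range(1, max_retries + 1):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the error
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff

async def demo():
    attempts = 0

    async def flaky_connect():
        # Fails twice, then succeeds, simulating transient network errors.
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionError("transient failure")
        return "session"

    session = await _connection_manager(flaky_connect)
    return attempts, session

attempts, session = asyncio.run(demo())
```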
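Finally, the tasks-list cleanup from "Task Handling and Cleanup" can be sketched as follows; the `Extension` class and `spawn` helper are illustrative stand-ins, with only the `tasks` list and `on_stop` names coming from the summary:

```python
import asyncio
import contextlib

class Extension:
    def __init__(self):
        self.tasks: list[asyncio.Task] = []

    def spawn(self, coro):
        # Track every background task so on_stop can find it later.
        task = asyncio.create_task(coro)
        self.tasks.append(task)
        return task

    async def on_stop(self):
        # Cancel everything first, then await each task so cancellation
        # actually completes before shutdown proceeds.
        for task in self.tasks:
            task.cancel()
        for task in self.tasks:
            with contextlib.suppress(asyncio.CancelledError):
                await task
        self.tasks.clear()

async def demo():
    ext = Extension()
    for _ in range(3):
        ext.spawn(asyncio.sleep(3600))  # long-running background work
    await ext.on_stop()
    return len(ext.tasks)

remaining = asyncio.run(demo())
```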