def _resolve_provider_config(self, config: dict[str, Any]) -> dict[str, Any]:
    """Merge user config with defaults (user takes precedence)."""
    # Extract voice from provider-specific speech_config.voice_config.prebuilt_voice_config.voice_name if present
    provider_voice = None
Voice is passed in through the "audio" config.
0580606 to d469584
logger = logging.getLogger(__name__)

# Nova Sonic configuration constants
NOVA_INFERENCE_CONFIG = {"maxTokens": 1024, "topP": 0.9, "temperature": 0.7}
No need to explicitly provide defaults. Nova already has implicit defaults for these that we can rely on.
def _resolve_provider_config(self, config: dict[str, Any]) -> dict[str, Any]:
    """Merge user config with defaults (user takes precedence)."""
    # Extract voice from provider-specific audio.output.voice if present
    provider_voice = None
Voice is provided through the "audio" config, which is of type AudioConfig.
    "input_audio_format",
    "output_audio_format",
    "input_audio_transcription",
    "turn_detection",
- type always has to be realtime and is already set by us.
- instructions is set by us through system prompt.
- voice is set by us through "audio" config.
- tools is set by us through the passed-in tools param.
- input_audio_format, output_audio_format, input_audio_transcription, and turn_detection are not top-level configs, so setting them would lead to exceptions.
For more details on supported settings, see https://platform.openai.com/docs/api-reference/realtime-client-events/session/update#realtime_client_events-session-update-session.
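The exclusions listed above could be enforced with a small filter. This is only a minimal sketch, not the PR's implementation: the constant names and filter_session_overrides are hypothetical, and a blocklist is assumed here even though the real code may use an allowlist of supported settings.

```python
from typing import Any

# Keys the integration sets itself, per the review comment (hypothetical constant name).
MANAGED_SESSION_KEYS = {"type", "instructions", "voice", "tools"}

# Keys the review comment says are not top-level session settings.
NON_TOP_LEVEL_KEYS = {
    "input_audio_format",
    "output_audio_format",
    "input_audio_transcription",
    "turn_detection",
}


def filter_session_overrides(provider_config: dict[str, Any]) -> dict[str, Any]:
    """Drop keys that must not be passed through as top-level session settings."""
    blocked = MANAGED_SESSION_KEYS | NON_TOP_LEVEL_KEYS
    return {key: value for key, value in provider_config.items() if key not in blocked}
```

Any key outside the two sets passes through unchanged, so user overrides for other settings still reach the session.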
d469584 to 8d1461c
    "max_tokens": "maxTokens",
    "temperature": "temperature",
    "top_p": "topP",
}
Using a snake_case-to-camelCase mapping to promote consistency. We use snake_case everywhere else.
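One way such a mapping might be applied is sketched below. The three mapping entries come from the diff; the names NOVA_INFERENCE_KEY_MAP and to_nova_inference_config are hypothetical, and the choice to omit unset keys (so the provider's implicit defaults apply) is an assumption about the intended behavior.

```python
from typing import Any

# Entries taken from the diff; the constant name itself is hypothetical.
NOVA_INFERENCE_KEY_MAP = {
    "max_tokens": "maxTokens",
    "temperature": "temperature",
    "top_p": "topP",
}


def to_nova_inference_config(inference: dict[str, Any]) -> dict[str, Any]:
    """Translate snake_case inference settings to camelCase provider keys.

    Keys the caller did not set are left out, so no explicit defaults are sent.
    Keys missing from the mapping are ignored.
    """
    return {
        NOVA_INFERENCE_KEY_MAP[key]: value
        for key, value in inference.items()
        if key in NOVA_INFERENCE_KEY_MAP
    }
```

Callers keep writing snake_case keys, and the camelCase names stay confined to the provider boundary.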
    "input_rate": GEMINI_INPUT_SAMPLE_RATE,
    "output_rate": GEMINI_OUTPUT_SAMPLE_RATE,
    "channels": GEMINI_CHANNELS,
    "format": "pcm",
I think we should have a default voice here.
It does work without specifying a voice; I tested that on all the models. That said, we could remove the default voice setting from all configs, but I didn't want to make too many changes here.
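To make the trade-off concrete, here is a rough sketch of merging audio defaults with a user-supplied "audio" config when no default voice is set. The function name, the constant name, and the numeric values are placeholders chosen for illustration, not the project's actual constants.

```python
from typing import Any, Optional

# Placeholder defaults for illustration only; the real values live in the
# GEMINI_* constants, which are not shown in this thread.
DEFAULT_AUDIO_CONFIG = {
    "input_rate": 16000,
    "output_rate": 24000,
    "channels": 1,
    "format": "pcm",
}


def resolve_audio_config(user_audio: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Merge user audio settings over the defaults (user takes precedence).

    No default voice is injected: if the user omits "voice", the key is simply
    absent and the provider falls back to its own default.
    """
    return {**DEFAULT_AUDIO_CONFIG, **(user_audio or {})}
```

Adding a default voice would only require one more entry in the defaults dict, which is why the change is easy to defer.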
Description
Isolating the model inference configs to more easily extract them from provider_configs.
Testing
hatch run bidi:prepare: Updated unit tests