@doublefx

Summary

Adds a browser TTS voice selector with language filtering to the Speech settings, enabling users to choose from available system voices and filter by language.

Addresses #900

Features

Voice Selection UI

  • Dropdown selector showing all available browser TTS voices
  • Language filter to show only voices for selected language(s)
  • Real-time voice preview when selecting from dropdown
  • Automatic detection of voice language from voice name
  • Persistent voice selection (saved to localStorage)

Language Detection

  • Extracts language codes from voice names (e.g., "Google français" → "fr")
  • Supports multilingual voices (e.g., "Microsoft David - English (United States)" → "en-US")
  • Falls back to browser-reported language when available
  • Handles multiple language patterns and formats
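A minimal sketch of how name-based detection with a fallback could work. The lookup tables, the regular expressions, and the function name `detectVoiceLanguage` are assumptions for illustration; the PR's actual patterns are not shown here.

```javascript
// Hypothetical lookup tables; a real implementation would cover far more names.
const LANGUAGE_NAMES = new Map([
  ["english", "en"], ["french", "fr"], ["français", "fr"],
  ["german", "de"], ["deutsch", "de"], ["spanish", "es"], ["español", "es"],
]);
const REGION_NAMES = new Map([
  ["united states", "US"], ["united kingdom", "GB"], ["france", "FR"],
  ["germany", "DE"], ["spain", "ES"],
]);

/**
 * Guess a language code from a voice's name, falling back to voice.lang.
 * @param {{name?: string, lang?: string}} voice
 * @returns {string|null}
 */
function detectVoiceLanguage(voice) {
  const name = String(voice.name || "").normalize("NFC").toLowerCase();

  // Pattern 1: "... Language (Region)", e.g. "English (United States)".
  const match = name.match(/(\p{L}+)\s*\(([^)]+)\)/u);
  if (match && LANGUAGE_NAMES.has(match[1])) {
    const language = LANGUAGE_NAMES.get(match[1]);
    const region = REGION_NAMES.get(match[2].trim());
    return region ? `${language}-${region}` : language;
  }

  // Pattern 2: any known language word in the name, e.g. "Google français".
  for (const word of name.split(/[^\p{L}]+/u)) {
    if (LANGUAGE_NAMES.has(word)) return LANGUAGE_NAMES.get(word);
  }

  // Fallback: the language reported by the browser, if any.
  return voice.lang || null;
}
```

With these tables, `detectVoiceLanguage({ name: "Microsoft David - English (United States)" })` yields `"en-US"`, and a voice whose name contains no known language word falls back to its `lang` property.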

User Experience

  • Multi-select language filter with "All Languages" option
  • Voice preview on selection (speaks sample text in selected voice)
  • Smooth integration with existing Speech settings UI
  • Voice persists across sessions and page reloads

Technical Implementation

New Files

  • webui/components/settings/speech/voice-setting-store.js - Alpine.js store managing voice state
  • webui/components/settings/speech/voice.html - Voice selector UI component

Architecture

  • Uses Alpine.js for reactive state management
  • Integrates with existing speech-store.js for voice application
  • Follows Agent Zero's component patterns and styling
  • Error handling for missing voices, unavailable localStorage, and voices that have not loaded yet

Benefits

  • Multilingual support: Users can easily switch between language voices
  • Accessibility: Better control over TTS output for users with specific voice preferences
  • Code-switching: Particularly useful for technical content mixing languages (e.g., French voice speaking English technical terms)

Testing

  • ✅ Voice selection and persistence across reloads
  • ✅ Language filtering with multiple languages
  • ✅ Voice preview functionality
  • ✅ Integration with existing TTS playback
  • ✅ Error handling for edge cases

Screenshots

See issue #900 for UI screenshots and demonstrations.

Frederic Thomas and others added 5 commits January 13, 2026 23:31
Implemented a user-friendly voice selector for browser-based text-to-speech
with two-level selection (language → voice) and conditional visibility based
on Kokoro TTS toggle state.

Features:
- Two dropdown selectors: language filter and voice selection
- Voice preferences stored in localStorage (like microphone selector)
- Automatic migration from settings API to localStorage
- Language prioritization (common languages first)
- Conditional visibility (only shows when Kokoro TTS is disabled)
- Integration with speech-store via voiceSettingStore
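The two-level selection and language prioritization could be sketched roughly as follows. The priority list and the function names `listLanguages` and `voicesForLanguage` are hypothetical; only the `lang` property of each voice is assumed, as on `SpeechSynthesisVoice`.

```javascript
// Hypothetical priority order; the commit does not list the actual languages.
const PRIORITY_LANGUAGES = ["en", "fr", "es", "de"];

// Reduce a tag such as "en-US" or "en_US" to its base language code.
function baseLanguage(tag) {
  return String(tag || "").split(/[-_]/)[0].toLowerCase();
}

/** Unique base language codes, prioritized ones first, the rest alphabetical. */
function listLanguages(voices) {
  const codes = new Set(voices.map((voice) => baseLanguage(voice.lang)));
  const rank = (code) => {
    const index = PRIORITY_LANGUAGES.indexOf(code);
    return index === -1 ? PRIORITY_LANGUAGES.length : index;
  };
  return [...codes].sort((a, b) => rank(a) - rank(b) || a.localeCompare(b));
}

/** Voices for the second dropdown; an empty language means "all". */
function voicesForLanguage(voices, language) {
  if (!language) return voices.slice();
  const wanted = baseLanguage(language);
  return voices.filter((voice) => baseLanguage(voice.lang) === wanted);
}
```

The first dropdown would be populated from `listLanguages(voices)` and the second from `voicesForLanguage(voices, selectedLanguage)`.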

New Files:
- webui/components/settings/speech/voice-setting-store.js
  Alpine.js store managing voice selection, language filtering, and
  localStorage persistence
- webui/components/settings/speech/voice.html
  HTML component with language and voice dropdown selectors

Modified Files:
- python/helpers/settings.py
  * Added 'condition' field to SettingsField TypedDict for conditional visibility
  * Added tts_kokoro toggle and tts_browser_voice_section HTML component
  * Implemented conditional field rendering based on other field values
- webui/components/chat/speech/speech-store.js
  * Imported voiceSettingStore for direct integration
  * Removed tts_browser_voice property (now handled by voice store)
  * Modified speakWithBrowser() to call voiceSettingStore.getSelectedVoice()
- webui/index.html
  * Added conditional field visibility logic using x-show directive
  * Evaluates field.condition against section.fields values

Known Issues:
- Hardcoded conditional visibility logic (not scalable)
- Missing localStorage error handling (crashes in private browsing)
- Race condition in voice store initialization
- Inconsistent HTML structure vs microphone.html

Testing:
- Voice selection works and persists across page reloads
- Language filtering correctly updates voice dropdown
- Kokoro toggle correctly shows/hides voice selector
- Migration from settings to localStorage works for existing users

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

P0-1: Add localStorage error handling
- Wrap all localStorage.getItem() calls in try-catch blocks
- Wrap all localStorage.setItem() calls in try-catch blocks
- Add user notification via toast when save fails
- Graceful degradation in private browsing mode and quota exceeded scenarios
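A minimal sketch of the try-catch wrappers described above. The helper names `safeGetItem` and `safeSetItem`, the `notify` callback, and the message text are assumptions; the storage object is passed in so the helpers can be exercised outside a browser.

```javascript
/**
 * Read a key without throwing (private browsing, disabled storage).
 * @returns {string|null} the stored value, or null on any failure
 */
function safeGetItem(storage, key) {
  try {
    return storage.getItem(key);
  } catch (error) {
    console.warn(`[Voice Selector] Could not read "${key}":`, error);
    return null;
  }
}

/**
 * Write a key without throwing (quota exceeded, private browsing).
 * `notify` is a hypothetical hook standing in for the UI toast.
 * @returns {boolean} true if the value was saved
 */
function safeSetItem(storage, key, value, notify = () => {}) {
  try {
    storage.setItem(key, value);
    return true;
  } catch (error) {
    console.warn(`[Voice Selector] Could not save "${key}":`, error);
    notify("Could not save voice preference.");
    return false;
  }
}
```

In a browser the caller would pass `window.localStorage`. Reading that property can itself throw in some restricted contexts, so the lookup may also need a guard.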

P0-2: Add race condition guards
- Add initialization check in getSelectedVoice()
- Return null with warning if voices not loaded yet
- Prevent crashes when TTS triggered before voices load
- Safe fallback to browser default voice
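The initialization guard might look roughly like this. Only `getSelectedVoice()` is named in the commit; the factory `createVoiceStore`, the `initialized` flag, and the other property names are hypothetical.

```javascript
// Sketch of a store with an initialization guard (property names assumed).
function createVoiceStore() {
  return {
    initialized: false,
    voices: [],
    selectedVoiceName: null,

    // Called once the browser has reported its voice list.
    setVoices(voices) {
      this.voices = voices;
      this.initialized = true;
    },

    // Returns the selected voice, or null so the caller keeps the default.
    getSelectedVoice() {
      if (!this.initialized) {
        console.warn("[Voice Selector] Voices not loaded yet; using browser default");
        return null;
      }
      return this.voices.find((voice) => voice.name === this.selectedVoiceName) || null;
    },
  };
}
```

A caller such as `speakWithBrowser()` would then assign the voice only when one is returned, for example `if (voice) utterance.voice = voice;`, leaving the browser default in place otherwise.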

P0-3: Fix conditional visibility logic
- Replace hardcoded conditional logic with scalable helper method
- Add evaluateFieldCondition(condition, fields) to settings.js
- Support any field condition pattern (not just !tts_kokoro)
- Console warnings for conditions referencing unknown fields
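One possible shape for `evaluateFieldCondition(condition, fields)`, assuming conditions are either a field id or a negated field id (such as `!tts_kokoro`) and that each field is an object with `id` and `value` properties. Both the condition grammar and the choice to show a field when its condition references an unknown field are assumptions.

```javascript
/**
 * Decide whether a field should be visible.
 * @param {string|undefined} condition e.g. "tts_kokoro" or "!tts_kokoro"
 * @param {{id: string, value: unknown}[]} fields fields of the same section
 * @returns {boolean}
 */
function evaluateFieldCondition(condition, fields) {
  // No condition means the field is always visible.
  if (!condition) return true;

  const trimmed = condition.trim();
  const negated = trimmed.startsWith("!");
  const fieldId = negated ? trimmed.slice(1).trim() : trimmed;

  const field = fields.find((candidate) => candidate.id === fieldId);
  if (!field) {
    // Assumed policy: warn and keep the field visible rather than hide it.
    console.warn(`[Settings] Condition references unknown field "${fieldId}"`);
    return true;
  }

  const truthy = Boolean(field.value);
  return negated ? !truthy : truthy;
}
```

In the template this could back an `x-show` binding that passes `field.condition` and `section.fields`.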

All P0 fixes tested manually in UI:
- Private browsing mode compatibility
- Race condition handling on page load
- Conditional visibility toggle behavior
- Full workflow with persistence
- No regression in other Speech settings

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

P1-1: Fix HTML structure in voice.html
- Add full HTML document structure to match microphone.html
- Add <html>, <head>, <body> tags for proper CSS cascade
- Move script import to <head> section
- Add empty <style> section at end of <body>
- Ensures consistency with other Speech section components

P1-2: Standardize logging prefixes
- Add [Microphone Selector] prefix to microphone-setting-store.js
- Add [Speech Store] prefix to all speech-store.js console logs
- Maintain existing [Voice Selector] prefix in voice-setting-store.js
- Improves debugging experience and log filtering

All P1 fixes tested manually in UI:
- No visual regressions
- HTML structure renders correctly in DevTools
- Console logs have consistent prefixes across Speech components

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

P2-1: Fix localStorage key naming consistency
- Rename localStorage key: ttsSelectedVoice → voiceSelectedVoice
- Matches microphone selector pattern (microphoneSelectedDevice)
- Add automatic migration from old key to new key
- Delete old key after successful migration
- Log migration to console for debugging
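The key migration could be sketched as below. The two key names come from the commit message; the function name `migrateVoiceKey`, the choice not to overwrite an existing new key, and the injected storage parameter are assumptions.

```javascript
const OLD_VOICE_KEY = "ttsSelectedVoice";
const NEW_VOICE_KEY = "voiceSelectedVoice";

/**
 * Move the stored voice from the old key to the new one.
 * @param {{getItem: Function, setItem: Function, removeItem: Function}} storage
 * @returns {boolean} true if an old value was found and handled
 */
function migrateVoiceKey(storage) {
  try {
    const oldValue = storage.getItem(OLD_VOICE_KEY);
    if (oldValue === null) return false; // nothing to migrate

    // Do not overwrite a value already saved under the new key.
    if (storage.getItem(NEW_VOICE_KEY) === null) {
      storage.setItem(NEW_VOICE_KEY, oldValue);
    }

    // Remove the old key only after the write above has succeeded.
    storage.removeItem(OLD_VOICE_KEY);
    console.log(`[Voice Selector] Migrated ${OLD_VOICE_KEY} to ${NEW_VOICE_KEY}`);
    return true;
  } catch (error) {
    console.warn("[Voice Selector] Key migration failed:", error);
    return false;
  }
}
```

Because the old key is deleted on success, a second call finds nothing to migrate, which matches the "only once" behavior listed in the testing notes.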

P2-2: Improve migration error handling
- Add user notification when migration from settings fails
- Use toastFetchError() for non-blocking error toast
- Message: "Could not load voice preferences. Using defaults."
- Maintain existing console error logging

All P2 fixes tested manually in UI:
- Old key automatically migrates to new key
- Old key deleted after migration
- Voice selection persists correctly with new key
- Migration message appears only once (first load after upgrade)
- Fresh user scenario works correctly

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

P3-1: Add loading/error states to UI
- Add isLoading and hasError flags to voice-setting-store
- Display "Loading voices..." message during initialization
- Display "No voices available" message if browser has no voices
- Wrap dropdowns in conditional display based on loading state
- Improves UX for slow browsers or edge cases
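A rough sketch of how the two flags could drive the displayed message. `isLoading`, `hasError`, and the two message strings come from the commit; the factory `createVoiceLoadState`, the `finishLoading` method, and the `statusMessage` getter are hypothetical.

```javascript
// Sketch of loading/error state for the voice selector (method names assumed).
function createVoiceLoadState() {
  return {
    isLoading: true,
    hasError: false,
    voices: [],

    // Call once the voice list is final. In browsers, getVoices() can return
    // an empty list until the "voiceschanged" event fires, so an empty first
    // result should not be treated as final.
    finishLoading(voices) {
      this.voices = Array.isArray(voices) ? voices : [];
      this.hasError = this.voices.length === 0;
      this.isLoading = false;
    },

    // Text shown in place of the dropdowns; empty when they should render.
    get statusMessage() {
      if (this.isLoading) return "Loading voices...";
      if (this.hasError) return "No voices available";
      return "";
    },
  };
}
```

The dropdown wrapper would then be shown only while `statusMessage` is empty.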

P3-2: Add JSDoc type hints
- Add comprehensive JSDoc to voice-setting-store.js
- Add comprehensive JSDoc to microphone-setting-store.js
- Document all properties with type information
- Document all methods with descriptions
- Enables IDE autocomplete and better developer experience
- Type hints for SpeechSynthesisVoice, MediaDeviceInfo, etc.

All P3 fixes tested manually in UI:
- Loading state is too brief to observe because voices load quickly (expected)
- Voice selector works normally after loading
- IDE autocomplete works with JSDoc type hints
- No functional regressions

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>