- Replace Qwen 2.5 0.5B with Gemma 3 270M for faster inference
- Add stats display showing average STT, LLM, TTS times in ms
- Create useStats composable for timing metrics
- Update chat template to Gemma format
- Reduce total model download from ~380MB to ~210MB
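The commit adds a `useStats` composable that tracks average STT, LLM, and TTS times in milliseconds. A minimal, framework-free sketch of what such a helper might look like (the function names and shape here are assumptions, not the repository's actual implementation):

```typescript
// Hypothetical sketch of a timing-stats tracker like the useStats
// composable mentioned above; names and API shape are assumptions.
type Stage = "stt" | "llm" | "tts";

function createStats() {
  const samples: Record<Stage, number[]> = { stt: [], llm: [], tts: [] };

  // Record one measured duration (in ms) for a pipeline stage.
  function record(stage: Stage, ms: number): void {
    samples[stage].push(ms);
  }

  // Average duration in ms for a stage, rounded to the nearest integer;
  // returns 0 when no samples have been recorded yet.
  function average(stage: Stage): number {
    const xs = samples[stage];
    if (xs.length === 0) return 0;
    return Math.round(xs.reduce((a, b) => a + b, 0) / xs.length);
  }

  return { record, average };
}

const stats = createStats();
stats.record("llm", 120);
stats.record("llm", 180);
console.log(stats.average("llm")); // 150
```

In the real composable the per-stage averages would presumably be reactive state rendered in the stats display, but the averaging logic itself is framework-independent.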
README.md (4 additions, 4 deletions)
```diff
@@ -14,7 +14,7 @@ Ava uses a pipeline architecture with three WebAssembly-powered stages:
 1. **Speech Recognition** — Audio from the microphone is captured and processed by Whisper (tiny-en model) running in WASM. The `useWhisper` composable handles audio chunking and streams transcriptions every 2 seconds.
-2. **Language Model** — Transcribed text is passed to Qwen 0.5B via Wllama (llama.cpp WASM port). The `useConversation` composable orchestrates the flow, triggering inference when speech ends and streaming tokens back as they're generated.
+2. **Language Model** — Transcribed text is passed to Gemma 3 270M via Wllama (llama.cpp WASM port). The `useConversation` composable orchestrates the flow, triggering inference when speech ends and streaming tokens back as they're generated.
 3. **Speech Synthesis** — Generated text is split at sentence boundaries (`. ! ? ,`) and queued to the browser's native SpeechSynthesis API. This enables low-latency voice output that starts speaking before the full response is complete.
@@ -29,7 +29,7 @@ All processing happens client-side with zero network requests after initial mode
```
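The commit message also notes updating the chat template to Gemma format. Gemma models use `<start_of_turn>`/`<end_of_turn>` markers around each turn; a sketch of formatting a single user turn that way (an illustration of the general Gemma template, not the repo's actual code):

```typescript
// Format one user turn with Gemma-style turn markers, ending with an
// open model turn so the LLM continues from there. This mirrors the
// general Gemma chat template; the function name is hypothetical.
function toGemmaPrompt(userText: string): string {
  return (
    "<start_of_turn>user\n" +
    userText +
    "<end_of_turn>\n" +
    "<start_of_turn>model\n"
  );
}

console.log(toGemmaPrompt("Hello"));
```

In practice Wllama can apply a model's bundled chat template itself, so hand-building the prompt string like this is only one possible approach.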
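The speech-synthesis stage in the README splits generated text at the boundaries `. ! ? ,` before queuing chunks for voice output. A sketch of that splitting step (the splitting rule comes from the README; the function itself is an assumption, shown without the browser `speechSynthesis` call so it runs anywhere):

```typescript
// Split streaming LLM output at the sentence boundaries the README
// lists (. ! ? ,) so each chunk can be spoken as soon as it completes.
function splitForSpeech(text: string): string[] {
  return text
    .split(/(?<=[.!?,])\s+/) // keep the punctuation; split on the space after it
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

console.log(splitForSpeech("Hello there! How are you? Fine, thanks."));
// → ["Hello there!", "How are you?", "Fine,", "thanks."]
```

In the browser, each returned chunk would be wrapped in a `SpeechSynthesisUtterance` and passed to `speechSynthesis.speak(...)`, which is what lets playback begin before the full response has been generated.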