
STT-SERVICE


Built with the tools and technologies:

FastAPI, NumPy, Docker, Python


Table of Contents

  • Overview
  • Features
  • Project Structure
  • Project Index
  • Getting Started
  • Usage
  • Testing
  • License
  • Acknowledgments
  • Quick Start Guide
  • WebSocket Endpoints
  • Performance Targets


Overview

stt-service is a high-performance, real-time speech-to-text microservice built with Python and FastAPI. It produces partial transcriptions in roughly 300 ms and final transcriptions in roughly 600 ms, fast enough for conversational AI applications. The service features dual-mode processing, intelligent voice activity detection, and comprehensive monitoring for production deployment. Designed for seamless integration with conversational AI pipelines, it enables natural real-time interactions with support for interrupt handling (barge-in).
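
The voice activity detector decides when an utterance starts and ends, which drives both partial-result emission and barge-in handling. As an illustration of the idea only (the service's actual detector lives under app/core and is not reproduced here; the threshold and hangover values below are hypothetical), a minimal energy-based VAD sketch in Python:

import numpy as np

ENERGY_THRESHOLD = 0.01  # hypothetical RMS level treated as "speech"
HANGOVER_FRAMES = 10     # hypothetical silent-frame count that closes an utterance

def frame_is_speech(frame: np.ndarray) -> bool:
    """Return True if the frame's RMS energy exceeds the threshold."""
    rms = np.sqrt(np.mean(np.square(frame.astype(np.float64))))
    return rms > ENERGY_THRESHOLD

def detect_utterances(frames):
    """Yield (start_frame, end_frame) index pairs for a list of audio frames."""
    start, silence = None, 0
    for i, frame in enumerate(frames):
        if frame_is_speech(frame):
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= HANGOVER_FRAMES:  # utterance boundary reached
                yield start, i - silence
                start, silence = None, 0
    if start is not None:  # audio ended mid-utterance
        yield start, len(frames) - 1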


Features

  • Partial and Final Transcription: Real-time partial results (~300ms) with high-quality final transcriptions (~600ms)
  • Enhanced Voice Activity Detection: Smart triggering with utterance boundary detection and barge-in support
  • Dual Processing Modes: Optimized pipelines for speed (partial) vs. accuracy (final) with overlapping audio windows
  • WebSocket Real-time API: FastAPI-based server with concurrent connections and intelligent buffering (see the client sketch after this list)
  • Conversational AI Optimized: Purpose-built for STT→LLM→TTS pipelines with interrupt handling
  • GPU-Accelerated Processing: Whisper model integration with smart caching and resource management
  • Production Monitoring: Comprehensive metrics, health checks, and performance analysis tools
  • Backward Compatible: Legacy transcription API support for existing integrations
  • Docker & GPU Ready: Complete containerization with CUDA support and scaling configurations
  • Performance Testing: Built-in benchmarking tools for latency validation and optimization
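
As referenced above, here is a minimal WebSocket client sketch using the third-party websockets package. The JSON message schema ("type" and "text" fields) and the binary send format are assumptions made for illustration; app/client_examples/websocket_client_example.py is the authoritative client.

import asyncio
import json

import websockets  # pip install websockets

async def stream_audio(chunks):
    """Send audio chunks and print partial vs. final transcriptions.

    Assumes the server accepts binary audio frames and replies with JSON
    messages carrying "type" ("partial" or "final") and "text" fields,
    an illustrative schema rather than a documented contract.
    """
    async with websockets.connect("ws://localhost:8000/ws/transcribe") as ws:
        for chunk in chunks:
            await ws.send(chunk)  # raw audio bytes
            while True:
                try:
                    raw = await asyncio.wait_for(ws.recv(), timeout=0.05)
                except asyncio.TimeoutError:
                    break  # nothing pending; keep streaming
                msg = json.loads(raw)
                if msg.get("type") == "partial":
                    print(f"[partial] {msg.get('text', '')}")
                elif msg.get("type") == "final":
                    print(f"[final]   {msg.get('text', '')}")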

Project Structure

└── stt-service/
    ├── Dockerfile
    ├── Dockerfile.gpu
    ├── app
    │   ├── __init__.py
    │   ├── core
    │   │   ├── audio_stream_processor.py    # Enhanced dual-mode processing
    │   │   ├── websocket_server.py          # FastAPI WebSocket server
    │   │   ├── whisper_handler.py           # Partial/final transcription
    │   │   └── microphone_capture.py
    │   ├── client_examples
    │   │   └── websocket_client_example.py  # Enhanced client with partial/final
    │   ├── main.py                          # CLI with WebSocket mode
    │   ├── monitoring
    │   └── utils
    ├── assets
    │   └── harvard.wav
    ├── docs
    │   ├── MONITORING.md
    │   ├── WebSocket_STT_Architecture.md    # Complete architecture guide
    │   └── PRODUCTION_MONITORING.md
    ├── main.py                              # Service entry point
    ├── test_partial_final_performance.py    # Performance validation
    ├── requirements.txt
    ├── scripts
    │   └── test_endpoints.bat
    └── tests
        ├── __init__.py
        ├── test_error_handler.py
        ├── test_logging.py
        ├── test_monitoring_simple.py
        └── test_realtime.py

Project Index

STT-SERVICE/

__root__
  • requirements.txt: Python package dependencies for running stt-service.
  • Dockerfile: Production-ready container build for the microservice.
  • Dockerfile.gpu: GPU-enabled Dockerfile for building with CUDA support.
  • main.py: Service entry point.
  • test_partial_final_performance.py: Performance validation script for partial/final latency.

scripts
  • test_endpoints.bat: Windows batch script to exercise HTTP endpoints for quick local testing.

app
  • main.py: Application entrypoint and CLI for running the service.

app/core
  • whisper_handler.py: Model adapter that wraps Whisper (or compatible) transcription models for partial and final passes.
  • websocket_server.py: FastAPI WebSocket server implementing the real-time transcription flow and request handling.
  • audio_stream_processor.py: Dual-mode audio processing and preprocessing utilities (resampling, chunking, normalization).
  • microphone_capture.py: Helpers for capturing audio from a microphone or system input device.

app/monitoring
  • monitoring_service.py: Monitoring endpoints and routines used by health checks and probes.
  • service_monitor.py: Background routines that collect service metrics and write JSON test outputs.

app/utils
  • logger.py: Logging configuration and helpers used throughout the service.
  • config.py: Configuration helpers (environment variables, default settings).
  • error_handler.py: Centralized error-handling utilities and testable helpers.
  • connection_manager.py: Connection helpers for external dependencies (model backends, telemetry sinks).

Getting Started

Prerequisites

Before getting started with stt-service, ensure your runtime environment meets the following requirements:

  • Programming Language: Python
  • Package Manager: Pip
  • Container Runtime: Docker

Installation

Install stt-service using one of the following methods:

Build from source:

  1. Clone the stt-service repository:
❯ git clone https://github.com/Berkay2002/stt-service
  2. Navigate to the project directory:
❯ cd stt-service
  3. Install the project dependencies:

Using pip  

❯ pip install -r requirements.txt

Using docker  

❯ docker build -t berkay2002/stt-service .

Usage

WebSocket Real-time Server (Recommended)

# Start WebSocket server for real-time transcription
❯ python main.py websocket --port 8000

# With custom configuration
❯ python main.py websocket --host 0.0.0.0 --port 8080 --max-connections 100

File Processing Mode

# Process single audio file
❯ python main.py file input.wav

# With word-level timestamps
❯ python main.py file input.wav --timestamps
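
The --timestamps flag requests word-level timing. For a sense of what that timing data looks like underneath, here is a standalone faster-whisper sketch (it bypasses the service CLI entirely; the "base" model and CPU/int8 settings are illustrative choices, not the service's configured defaults):

from faster_whisper import WhisperModel  # pip install faster-whisper

model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("input.wav", word_timestamps=True)
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
    for word in segment.words:
        print(f"    {word.start:.2f}s  {word.word}")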

Microphone Recording Mode

# Record and transcribe from microphone
❯ python main.py microphone --timestamps

Configuration Management

# Show current configuration
❯ python main.py config --show

Docker Deployment

CPU Version:

❯ docker build -t stt-service .
❯ docker run -p 8000:8000 -p 9091:9091 stt-service

GPU Version:

❯ docker build -f Dockerfile.gpu -t stt-service-gpu .
❯ docker run --gpus all -p 8000:8000 -p 9091:9091 stt-service-gpu

Docker Compose with GPU:

❯ docker-compose -f docker-compose.gpu.yml up

Testing

Unit Tests

# Run all unit tests
❯ pytest

# Run specific test modules
❯ pytest tests/test_error_handler.py
❯ pytest tests/test_realtime.py

Performance Testing

# Test partial/final transcription performance
❯ python test_partial_final_performance.py

# Expected output:
# ✅ EXCELLENT: Partial latency < 300ms  
# ✅ EXCELLENT: Final latency < 600ms
# ✅ CONFIRMED: Partial transcription working
# ✅ CONFIRMED: Final transcription working

WebSocket Client Testing

# Interactive client with partial/final support
❯ python app/client_examples/websocket_client_example.py

# Choose option 1 for microphone testing
# Choose option 2 for test audio file
# Choose option 3 for connection stress test

API Endpoint Testing

# Test HTTP endpoints (Windows)
❯ scripts/test_endpoints.bat

# Check server health and statistics over HTTP
❯ curl http://localhost:8000/health
❯ curl http://localhost:8000/stats

License

This project is licensed under the MIT License.


Acknowledgments

Thanks to the open-source projects and communities that make this work possible:

  • OpenAI and the Whisper model authors — for the foundational speech-to-text models, and the faster-whisper project for optimized inference
  • FastAPI community — for the high-performance async web framework enabling real-time WebSocket processing
  • Python audio ecosystem — NumPy, soundfile, pyaudio, scipy for comprehensive audio processing capabilities
  • GPU acceleration libraries — PyTorch, CUDA toolkit for efficient real-time transcription processing
  • WebSocket and async communities — for enabling seamless real-time communication protocols

Quick Start Guide

  1. Install dependencies:

    pip install -r requirements.txt
  2. Start WebSocket server:

    python main.py websocket
  3. Test with performance validator:

    python test_partial_final_performance.py
  4. Try interactive client:

    python app/client_examples/websocket_client_example.py

WebSocket Endpoints

  • Transcription: ws://localhost:8000/ws/transcribe
  • Health Check: http://localhost:8000/health
  • Statistics: http://localhost:8000/stats
  • Monitoring: http://localhost:9091/health
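
A quick way to probe these endpoints from Python using only the standard library (responses are printed raw, since their exact schemas are not documented here):

import urllib.request

for url in (
    "http://localhost:8000/health",
    "http://localhost:8000/stats",
    "http://localhost:9091/health",  # monitoring port
):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            print(url, "->", resp.status, body)
    except OSError as exc:
        print(url, "-> unreachable:", exc)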

Performance Targets

  • Partial Results: <300ms average latency
  • Final Results: <600ms average latency
  • Concurrent Connections: 50+ streams
  • Real-time Factor: <0.3 on GPU (processing time / audio duration)
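
Real-time factor here is the standard ratio of processing time to audio duration; values below 1.0 mean faster than real time:

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF: processing time divided by audio duration."""
    return processing_seconds / audio_seconds

# Example: transcribing a 6.0 s clip in 1.8 s gives RTF 0.3,
# right at the target above.
print(real_time_factor(1.8, 6.0))  # 0.3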
