stt-service is a high-performance, real-time speech-to-text microservice built with Python and FastAPI. It provides partial and final transcription with sub-300ms partial-result latency for conversational AI applications. The service features dual-mode processing, intelligent voice activity detection, and comprehensive monitoring for production deployment. Designed for seamless integration with conversational AI pipelines, it enables natural real-time interaction with interrupt handling (barge-in) support.
- Partial and Final Transcription: Real-time partial results (~300ms) with high-quality final transcriptions (~600ms)
- Enhanced Voice Activity Detection: Smart triggering with utterance boundary detection and barge-in support
- Dual Processing Modes: Optimized pipelines for speed (partial) vs. accuracy (final) with overlapping audio windows (sketched after this list)
- WebSocket Real-time API: FastAPI-based server with concurrent connections and intelligent buffering
- Conversational AI Optimized: Purpose-built for STT→LLM→TTS pipelines with interrupt handling
- GPU-Accelerated Processing: Whisper model integration with smart caching and resource management
- Production Monitoring: Comprehensive metrics, health checks, and performance analysis tools
- Backward Compatible: Legacy transcription API support for existing integrations
- Docker & GPU Ready: Complete containerization with CUDA support and scaling configurations
- Performance Testing: Built-in benchmarking tools for latency validation and optimization
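The overlapping-window idea behind the dual processing modes can be pictured with a short, purely illustrative sketch. The window and hop sizes below are hypothetical and are not taken from audio_stream_processor.py: short, frequent windows feed the low-latency partial pass, while longer overlapping windows feed the higher-accuracy final pass.

```python
# Illustrative sketch only -- window/hop sizes are hypothetical, not the
# values used by stt-service's audio_stream_processor.py.
import numpy as np

SAMPLE_RATE = 16000  # Hz, typical input rate for Whisper-family models

def sliding_windows(pcm: np.ndarray, window_s: float, hop_s: float):
    """Yield overlapping audio windows from a mono PCM buffer."""
    window = int(window_s * SAMPLE_RATE)
    hop = int(hop_s * SAMPLE_RATE)
    for start in range(0, max(len(pcm) - window + 1, 1), hop):
        yield pcm[start:start + window]

audio = np.zeros(SAMPLE_RATE * 5, dtype=np.float32)  # 5 s of silence as a stand-in

# Fast partial pass: short windows, frequent hops -> low latency, lower accuracy.
partial_windows = list(sliding_windows(audio, window_s=1.0, hop_s=0.3))

# Accurate final pass: longer windows that overlap the partial ones.
final_windows = list(sliding_windows(audio, window_s=3.0, hop_s=1.5))

print(f"{len(partial_windows)} partial windows, {len(final_windows)} final windows")
```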
└── stt-service/
    ├── Dockerfile
    ├── Dockerfile.gpu
    ├── app
    │   ├── __init__.py
    │   ├── core
    │   │   ├── audio_stream_processor.py    # Enhanced dual-mode processing
    │   │   ├── websocket_server.py          # FastAPI WebSocket server
    │   │   ├── whisper_handler.py           # Partial/final transcription
    │   │   └── microphone_capture.py
    │   ├── client_examples
    │   │   └── websocket_client_example.py  # Enhanced client with partial/final
    │   ├── main.py                          # CLI with WebSocket mode
    │   ├── monitoring
    │   └── utils
    ├── assets
    │   └── harvard.wav
    ├── docs
    │   ├── MONITORING.md
    │   ├── WebSocket_STT_Architecture.md    # Complete architecture guide
    │   └── PRODUCTION_MONITORING.md
    ├── main.py                              # Service entry point
    ├── test_partial_final_performance.py    # Performance validation
    ├── requirements.txt
    ├── scripts
    │   └── test_endpoints.bat
    └── tests
        ├── __init__.py
        ├── test_error_handler.py
        ├── test_logging.py
        ├── test_monitoring_simple.py
        └── test_realtime.py
STT-SERVICE/
- `__root__`
  - `requirements.txt`: Python package dependencies for running stt-service.
  - `dockerfile-gpu.txt`: Optional GPU-enabled Dockerfile snippet for building with CUDA/GPU support.
  - `Dockerfile`: Production-ready container build for the microservice.
- `scripts`
  - `test_endpoints.bat`: Windows batch script to exercise HTTP endpoints for quick local testing.
- `app`
  - `main.py`: FastAPI application entrypoint and CLI for running the service.
  - `core`
    - `whisper_handler.py`: Model adapter that wraps Whisper (or compatible) transcription models.
    - `realtime_transcription.py`: Implements the real-time transcription flow and request-handling utilities.
    - `microphone_capture.py`: Helpers for capturing audio from a microphone or system input device.
    - `audio_processor.py`: Audio preprocessing utilities (resampling, chunking, normalization).
  - `monitoring`
    - `monitoring_service.py`: Provides monitoring endpoints and routines used by health checks and probes.
    - `service_monitor.py`: Background routines that collect service metrics and write JSON test outputs.
  - `utils`
    - `logger.py`: Logging configuration and helpers used throughout the service.
    - `config.py`: Configuration helpers (environment variables, default settings).
    - `error_handler.py`: Centralized error-handling utilities and testable helpers.
    - `connection_manager.py`: Network and dependency connection helpers (external model backends, telemetry sinks).
Before getting started with stt-service, ensure your runtime environment meets the following requirements:
- Programming Language: Python
- Package Manager: Pip
- Container Runtime: Docker
Install stt-service using one of the following methods:
Build from source:
- Clone the stt-service repository:
❯ git clone https://github.com/Berkay2002/stt-service
- Navigate to the project directory:
❯ cd stt-service
- Install the project dependencies:
❯ pip install -r requirements.txt
Using docker:
❯ docker build -t berkay2002/stt-service .
# Start WebSocket server for real-time transcription
❯ python main.py websocket --port 8000
# With custom configuration
❯ python main.py websocket --host 0.0.0.0 --port 8080 --max-connections 100
# Process single audio file
❯ python main.py file input.wav
# With word-level timestamps
❯ python main.py file input.wav --timestamps
# Record and transcribe from microphone
❯ python main.py microphone --timestamps
# Show current configuration
❯ python main.py config --show
CPU Version:
❯ docker build -t stt-service .
❯ docker run -p 8000:8000 -p 9091:9091 stt-service
GPU Version:
❯ docker build -f Dockerfile.gpu -t stt-service-gpu .
❯ docker run --gpus all -p 8000:8000 -p 9091:9091 stt-service-gpu
Docker Compose with GPU:
❯ docker-compose -f docker-compose.gpu.yml up
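docker-compose.gpu.yml is referenced above but not shown in this section. If you need to author one, a minimal sketch might look like the following; the service name, port mappings, and GPU reservation are assumptions to adapt, not the contents of the project's actual file.

```yaml
# Hypothetical docker-compose.gpu.yml sketch -- adjust to match the real file.
services:
  stt-service:
    build:
      context: .
      dockerfile: Dockerfile.gpu
    ports:
      - "8000:8000"   # WebSocket / HTTP API
      - "9091:9091"   # monitoring endpoint
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```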
# Run all unit tests
❯ pytest
# Run specific test modules
❯ pytest tests/test_error_handler.py
❯ pytest tests/test_realtime.py
# Test partial/final transcription performance
❯ python test_partial_final_performance.py
# Expected output:
# ✅ EXCELLENT: Partial latency < 300ms
# ✅ EXCELLENT: Final latency < 600ms
# ✅ CONFIRMED: Partial transcription working
# ✅ CONFIRMED: Final transcription working
# Interactive client with partial/final support
❯ python app/client_examples/websocket_client_example.py
# Choose option 1 for microphone testing
# Choose option 2 for test audio file
# Choose option 3 for connection stress test
# Test HTTP endpoints (Windows)
❯ scripts/test_endpoints.bat
# Test WebSocket health
❯ curl http://localhost:8000/health
❯ curl http://localhost:8000/stats
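The same checks can be scripted. Below is a minimal Python smoke test; it assumes only that the `requests` package is installed and that the endpoints return JSON, without presuming any particular response shape.

```python
# Minimal smoke test for the HTTP endpoints; requires `pip install requests`.
import requests

for url in ("http://localhost:8000/health", "http://localhost:8000/stats"):
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()  # fail loudly on non-2xx responses
    print(url, "->", resp.json())
```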
This project is licensed under the MIT License.
Thanks to the open-source projects and communities that make this work possible:
- OpenAI and the Whisper model authors — for the foundational speech-to-text models and faster-whisper optimizations
- FastAPI community — for the high-performance async web framework enabling real-time WebSocket processing
- Python audio ecosystem — NumPy, soundfile, pyaudio, scipy for comprehensive audio processing capabilities
- GPU acceleration libraries — PyTorch, CUDA toolkit for efficient real-time transcription processing
- WebSocket and async communities — for enabling seamless real-time communication protocols
- Install dependencies: `pip install -r requirements.txt`
- Start WebSocket server: `python main.py websocket`
- Test with performance validator: `python test_partial_final_performance.py`
- Try interactive client: `python app/client_examples/websocket_client_example.py`
- Transcription: ws://localhost:8000/ws/transcribe (see the client sketch below)
- Health Check: http://localhost:8000/health
- Statistics: http://localhost:8000/stats
- Monitoring: http://localhost:9091/health
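For the transcription endpoint, a bare-bones streaming client might look like the sketch below. It uses the third-party `websockets` package, and both the raw-PCM framing and the message schema (JSON with `type` and `text` fields) are assumptions; see app/client_examples/websocket_client_example.py for the authoritative protocol.

```python
# Hypothetical minimal client -- audio framing and message schema are
# assumptions, not the documented protocol. Requires `pip install websockets`.
import asyncio
import json

import websockets

async def stream_file(path: str) -> None:
    async with websockets.connect("ws://localhost:8000/ws/transcribe") as ws:
        with open(path, "rb") as f:
            while chunk := f.read(3200):  # ~100 ms of 16 kHz 16-bit mono PCM
                await ws.send(chunk)      # binary audio frame
                try:
                    # Drain any pending transcription messages without stalling.
                    while True:
                        raw = await asyncio.wait_for(ws.recv(), timeout=0.01)
                        msg = json.loads(raw)
                        print(msg.get("type"), ":", msg.get("text"))
                except asyncio.TimeoutError:
                    pass

asyncio.run(stream_file("assets/harvard.wav"))
```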
- Partial Results: <300ms average latency
- Final Results: <600ms average latency
- Concurrent Connections: 50+ streams
- Real-time Factor: <0.3 on GPU