stt.api is a minimal, backend-agnostic R client for OpenAI-compatible speech-to-text (STT) APIs, with optional local fallbacks.
It lets you transcribe audio in R without caring which backend actually performs the transcription.
- A unified interface for speech-to-text in R
- A way to switch easily between:
  - `{whisper}` (native R torch, local GPU/CPU)
  - OpenAI `/v1/audio/transcriptions` (cloud or local servers)
  - `{audio.whisper}` (whisper.cpp)
- Designed for scripting, Shiny apps, containers, and reproducible pipelines
- Not a Whisper reimplementation
- Not a model manager
- Not a GPU / CUDA helper
- Not an audio preprocessing toolkit
- Not a replacement for `{whisper}` or `{audio.whisper}`
```r
remotes::install_github("cornball-ai/stt.api")
```

Required dependencies are minimal:

- `curl`
- `jsonlite`
Optional backends:
- `{whisper}` (recommended, on CRAN)
- `{audio.whisper}` (whisper.cpp alternative)
- `{processx}` (Docker helpers)
```r
install.packages("whisper")
remotes::install_github("cornball-ai/stt.api")

library(stt.api)
res <- stt("speech.wav")
res$text
```

That's it. With `{whisper}` installed, `stt()` transcribes locally on GPU or CPU with no configuration needed.
stt.api also supports OpenAI-compatible APIs for cloud or container-based transcription:
```r
set_stt_base("http://localhost:4123")

# Optional, for hosted services like OpenAI
set_stt_key(Sys.getenv("OPENAI_API_KEY"))

res <- stt("speech.wav", backend = "openai")
```

This works with OpenAI, Whisper containers, LM Studio, OpenWebUI, AnythingLLM, or any server implementing `/v1/audio/transcriptions`.
When you call `stt()` without specifying a backend, it picks the first available backend in this order:
1. `{whisper}` (native R torch, if installed)
2. `{audio.whisper}` (whisper.cpp, if installed)
3. OpenAI-compatible API (if `stt.api_base` is set)
4. Error with guidance
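The fallback order could be sketched as follows. This is a hypothetical illustration of the selection logic, not the actual stt.api internals:

```r
# Hypothetical sketch of backend auto-selection; the real
# implementation in stt.api may differ.
pick_backend <- function() {
  if (requireNamespace("whisper", quietly = TRUE)) return("whisper")
  if (requireNamespace("audio.whisper", quietly = TRUE)) return("audio.whisper")
  if (!is.null(getOption("stt.api_base"))) return("api")
  stop("No transcription backend available.\n",
       "Install whisper, install audio.whisper, or set stt.api_base.")
}
```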
Regardless of backend, stt() always returns the same structure:
```
list(
  text = "Transcribed text",
  segments = NULL | data.frame(...),
  language = "en",
  backend = "api" | "whisper" | "audio.whisper",
  raw = <raw backend response>
)
```

This makes it easy to switch backends without changing downstream code.
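Because the shape is stable, downstream code can stay backend-agnostic. For example:

```r
res <- stt("speech.wav")
cat(res$text, "\n")

# Segments are optional (NULL for some backends), so guard before use
if (!is.null(res$segments)) {
  print(head(res$segments))
}
```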
```r
stt_health()
```

Returns:

```r
list(
  ok = TRUE,
  backend = "api",
  message = "OK"
)
```

Useful for Shiny apps and deployment checks.
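For example, a deployment script or Shiny app could fail fast at startup if no backend is reachable (a sketch; adapt to your app):

```r
# Fail fast during startup if transcription is not available
h <- stt_health()
if (!isTRUE(h$ok)) {
  stop("STT backend unavailable (", h$backend, "): ", h$message)
}
```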
Explicit backend choice:

```r
stt("speech.wav", backend = "openai")
stt("speech.wav", backend = "whisper")
stt("speech.wav", backend = "audio.whisper")
```

Automatic selection (default):

```r
stt("speech.wav")
```

stt.api targets the OpenAI-compatible STT spec:

```
POST /v1/audio/transcriptions
```
This is intentionally chosen because it is:
- Widely adopted
- Simple
- Supported by many local and hosted services
- Easy to proxy and containerize
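For reference, the same endpoint can be exercised directly with `curl`. The port and model name below are examples; hosted services additionally require an `Authorization` header:

```
curl http://localhost:4123/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@speech.wav \
  -F model=whisper-1
```

Any server that accepts this multipart request will work with stt.api.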
If you run Whisper or OpenAI-compatible STT in Docker, stt.api can optionally integrate via {processx}.
Example use cases:
- Starting a local Whisper container
- Checking container health
- Inspecting logs
Docker helpers are explicit and opt-in.
stt.api never starts containers automatically.
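A container can always be started manually with `{processx}` and pointed at via `set_stt_base()`. A minimal sketch, where the image name and port mapping are placeholders for whatever Whisper server image you use:

```r
# Start a local Whisper container in the background (image name is
# a placeholder), then point stt.api at it.
p <- processx::process$new(
  "docker",
  c("run", "--rm", "-d", "-p", "4123:4123", "my-whisper-image"),
  stdout = "|", stderr = "|"
)
p$wait()

set_stt_base("http://localhost:4123")
```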
```r
options(
  stt.api_base = NULL,
  stt.api_key = NULL,
  stt.timeout = 60
)
```

Setters:

```r
set_stt_base()
set_stt_key()
```

- No silent failures
- Clear messages when a backend is unavailable
- Actionable instructions when configuration is missing
Example:

```
Error in stt():
  No transcription backend available.
  Install whisper, install audio.whisper, or set stt.api_base.
```
stt.api is designed to pair cleanly with tts.api:
| Task | Package |
|---|---|
| Speech → Text | stt.api |
| Text → Speech | tts.api |
Both share:
- Minimal dependencies
- OpenAI-compatible API focus
- Backend-agnostic design
- Optional Docker support
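A round trip between the two packages might look like this. This assumes tts.api exposes a `tts(text, output)` helper mirroring `stt()`; check its documentation for the actual interface:

```r
# Hypothetical round trip: synthesize speech, then transcribe it back.
# tts() signature is assumed, not confirmed by this README.
library(tts.api)
library(stt.api)

tts("Hello from R", output = "hello.wav")
res <- stt("hello.wav")
cat(res$text, "\n")
```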
Installing and maintaining local Whisper backends can be difficult:
- CUDA / cuBLAS issues
- Compiler toolchains
- Platform differences
stt.api lets you decouple your R code from those concerns.
Your transcription code stays the same whether the backend is:
- Local
- Containerized
- Cloud-hosted
- GPU-accelerated
- CPU-only
MIT