daytonaio · codeaustral-oss · May 20, 2026
diff --git a/authors/codeaustral-oss.md b/authors/codeaustral-oss.md
@@ -0,0 +1,8 @@
+Author: CodeAustral OSS
+Title: OSS Engineering Studio
+Description: CodeAustral OSS builds small, reviewable open-source contributions with a focus on maintainer-friendly pull requests, reproducible verification, and practical developer tooling for modern engineering teams.
+Author Image:
+Author LinkedIn:
+Author Twitter:
+Company Name: CodeAustral LLC
+Company Description: Maintainer-friendly software engineering, developer tooling, and applied OSS workflows.
diff --git a/definitions/20260520_definition_mixed_language_transcription_workflow.md b/definitions/20260520_definition_mixed_language_transcription_workflow.md
@@ -0,0 +1,25 @@
+---
+title: 'Mixed-Language Transcription Workflow'
+description: 'A repeatable process for transcribing recordings that contain more than one spoken language.'
+date: 2026-05-20
+author: 'CodeAustral OSS'
+---
+
+# Mixed-Language Transcription Workflow
+
+## Definition
+
+A mixed-language transcription workflow is a repeatable process for preparing,
+transcribing, reviewing, and storing audio or video recordings that contain more
+than one spoken language. It combines clean input handling, provider selection,
+language-aware prompts, transcript review, and handoff artifacts so the final
+text can be trusted by engineering, support, research, or content teams.
+
+## Context and Usage
+
+Mixed-language workflows are useful when product demos, interviews, customer
+calls, incident reviews, or community recordings switch between languages. A
+good workflow keeps API keys out of source control, runs the same commands in a
+reproducible development environment, records which provider and settings were
+used, and includes a human review pass for names, acronyms, timestamps, and
+domain-specific terms.
diff --git a/guides/20260520_guide_soniox_sapat_mixed_language_transcription.md b/guides/20260520_guide_soniox_sapat_mixed_language_transcription.md
@@ -0,0 +1,313 @@
+---
+title: "Run Soniox transcription with Sapat in Daytona"
+description: "Build a reproducible Daytona workspace for mixed-language transcription using Sapat and Soniox Speech-to-Text."
+date: 2026-05-20
+author: "CodeAustral OSS"
+tags: ["daytona", "python", "transcription", "soniox"]
+---
+
+# Run Soniox transcription with Sapat in Daytona
+
+## Introduction
+
+Transcription gets messy when a recording is not a neat single-language demo.
+Customer calls, conference hallway interviews, product walkthroughs, and
+community recordings often move between English and another language, mention
+product names quickly, and include acronyms that a generic transcript can
+distort. A useful workflow should make the environment reproducible, keep API
+keys private, and leave a short trail showing how the transcript was produced.
+
+This guide shows how to run Sapat, a small Python video transcription tool, in a
+[Daytona workspace](../definitions/20240819_definition_daytona%20workspace.md)
+with a Soniox Speech-to-Text provider. The companion Sapat implementation adds a
+`--api soniox` option that uses the official Soniox Python SDK, sends the MP3
+created by Sapat to an asynchronous Soniox transcription job, waits for the
+result, writes the text file, and cleans up the remote transcription by default.
+
+![Soniox and Sapat transcription workflow](assets/20260520_soniox_sapat_mixed_language_workflow.svg)
+
+## TL;DR
+
+- Use Daytona to create a clean Python workspace for Sapat.
+- Configure Soniox through environment variables, not committed files.
+- Run Sapat with `--api soniox` after the provider branch is installed.
+- Keep a simple run manifest so transcripts are auditable later.
+- Review mixed-language transcripts for names, acronyms, and speaker context.
+
+## Prerequisites
+
+You will need:
+
+- Daytona installed and connected to your preferred IDE.
+- Python 3.10 or later in the workspace.
+- `ffmpeg`, because Sapat converts video files to MP3 before transcription.
+- A Soniox account and project API key.
+- A short `.mp4`, `.m4a`, `.wav`, or `.mp3` recording you are allowed to
+  process.
+
+Do not commit recordings, transcripts with private information, or `.env` files.
+Use a throwaway sample recording when you are testing the flow for the first
+time.
+
+## Step 1: Create the Daytona workspace
+
+Start from the Sapat repository. If the Soniox provider has already been merged
+upstream, create the workspace from the main repository:
+
+```bash
+daytona create https://github.com/nkkko/sapat --code
+```
+
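+While the provider pull request is under review, you can test the same workflow
+from the companion branch. Create the workspace from the fork, then switch
+branches from a terminal inside the workspace:
+
+```bash
+# On your machine: create the workspace from the fork
+daytona create https://github.com/codeaustral-oss/sapat --code
+# In the workspace terminal: check out the provider branch
+git switch codeaustral/soniox-provider
+```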
+
+Inside the workspace, install Sapat in editable mode:
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+python -m pip install -e .
+```
+
+Confirm the CLI exposes the Soniox provider:
+
+```bash
+sapat --help
+```
+
+The API option should include `soniox` alongside `openai`, `groq`, and `azure`.
+
+## Step 2: Configure Soniox without leaking secrets
+
+Create a local `.env` file. This file should stay outside version control.
+
+```bash
+cat > .env <<'EOF'
+SONIOX_API_KEY=replace-with-your-project-key
+SONIOX_MODEL=stt-async-v4
+SONIOX_DESTROY_AFTER_TRANSCRIPTION=true
+EOF
+```
+
+`SONIOX_API_KEY` authenticates requests to Soniox. `SONIOX_MODEL` selects the
+async speech-to-text model used for recorded files. The cleanup flag keeps the
+workflow tidy by deleting the remote transcription job and uploaded file after
+Sapat has pulled the transcript text.
+
+If you are sharing the repository with a team, commit only a `.env.example` with
+empty values. Never paste API keys into issues, pull requests, screenshots, or
+articles.
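+
+As a rough pre-flight check, the sketch below reads the `.env` file and reports
+whether each variable from the example above is set, still a placeholder, or
+missing, without ever printing a value. The variable names come from this
+guide; the helper names `parse_env` and `describe_config` are hypothetical and
+are not part of Sapat or the Soniox SDK.
+
+```python
+from pathlib import Path
+
+# Variable names and the placeholder come from the .env example in this guide.
+EXPECTED = ("SONIOX_API_KEY", "SONIOX_MODEL", "SONIOX_DESTROY_AFTER_TRANSCRIPTION")
+PLACEHOLDER = "replace-with-your-project-key"
+
+
+def parse_env(text: str) -> dict[str, str]:
+    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
+    values = {}
+    for line in text.splitlines():
+        line = line.strip()
+        if not line or line.startswith("#") or "=" not in line:
+            continue
+        key, _, value = line.partition("=")
+        values[key.strip()] = value.strip().strip("'\"")
+    return values
+
+
+def describe_config(values: dict[str, str]) -> dict[str, str]:
+    """Report each expected variable as set, placeholder, or missing."""
+    report = {}
+    for name in EXPECTED:
+        value = values.get(name, "")
+        if not value:
+            report[name] = "missing"
+        elif value == PLACEHOLDER:
+            report[name] = "placeholder"
+        else:
+            report[name] = "set"
+    return report
+
+
+if __name__ == "__main__":
+    env_file = Path(".env")
+    text = env_file.read_text() if env_file.exists() else ""
+    # Only statuses are printed, so the output is safe to paste into an issue.
+    for name, status in describe_config(parse_env(text)).items():
+        print(f"{name}: {status}")
+```
+
+Because the report contains statuses rather than values, it can be shared when
+asking for help without exposing the key.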
+
+## Step 3: Prepare a mixed-language sample
+
+Place your test recording under a local folder such as `samples/`:
+
+```bash
+mkdir -p samples runs
+cp ~/Downloads/customer-demo.mp4 samples/customer-demo.mp4
+```
+
+For a realistic mixed-language test, choose a recording that includes at least
+one language switch and a few domain terms. For example, a developer might say:
+"The webhook retry failed twice, entonces revisamos el payload, and then the
+queue recovered." This kind of sentence is a good stress test because it mixes
+language, product vocabulary, and operational context.
+
+Before uploading any recording to a transcription provider, confirm that you
+have permission to process it and that your data handling matches your team's
+policies.
+
+## Why use Soniox for this workflow?
+
+Sapat already supports OpenAI, Groq, and Azure OpenAI. Soniox is useful when you
+want a transcription provider that is designed around speech-to-text workflows,
+including asynchronous file transcription and automatic handling for recordings
+that may contain more than one language. In a team workflow, that means the
+developer running the transcript job can keep the operational steps simple:
+prepare the file, submit it, wait for the result, save the transcript, and clean
+up the remote job.
+
+The provider choice should still be deliberate. Use Soniox when the recording is
+speech-heavy, when language switching matters, or when the transcript will be
+reviewed and reused later. Use another provider when your team already has
+approved infrastructure, data residency requirements, or billing controls tied
+to that provider. The value of this guide is not that every recording must go
+through Soniox; it is that the workflow remains portable and auditable because
+Sapat keeps the command surface consistent.
+
+## Step 4: Run Sapat with Soniox
+
+Run the transcription command:
+
+```bash
+sapat samples/customer-demo.mp4 \
+  --quality M \
+  --language en \
+  --prompt "Product demo with English and Spanish technical vocabulary" \
+  --temperature 0.3 \
+  --api soniox
+```
+
+Sapat will convert the video to MP3 with `ffmpeg`, submit the audio to Soniox,
+wait for the async job to finish, write a `.txt` file next to the input, and
+remove the temporary MP3. With the cleanup flag enabled, the Soniox provider
+also destroys the remote job after retrieving the transcript.
+
+The result should appear as:
+
+```text
+samples/customer-demo.txt
+```
+
+Open the file and check the first pass. Do not expect any transcription provider
+to know every internal product name or speaker nickname. The goal of the first
+pass is to create a useful draft that can be reviewed quickly.
+
+## Step 5: Save a run manifest
+
+For repeatable work, keep a small run manifest alongside the transcript. This is
+especially useful when several people review recordings or when the transcript
+feeds a downstream search, support, or documentation workflow.
+
+```bash
+cat > runs/customer-demo-20260520.md <<'EOF'
+# customer-demo transcription run
+
+- Input file: samples/customer-demo.mp4
+- Output file: samples/customer-demo.txt
+- Workspace: Daytona
+- Tool: Sapat
+- Provider: Soniox
+- Model: stt-async-v4
+- Quality: M
+- Language hint: en
+- Prompt: Product demo with English and Spanish technical vocabulary
+- Review status: needs human review
+EOF
+```
+
+The manifest should not include API keys, private customer names, payout
+details, or local machine paths that are not useful to another reviewer.
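+
+One way to keep the command and the manifest from drifting apart is to derive
+both from a single settings object. The sketch below is illustrative only:
+`RunSettings` is a hypothetical helper, not part of Sapat, and it uses only the
+flags and manifest fields shown in this guide. The model is recorded in the
+manifest but is not a CLI flag here, because the guide sets it through
+`SONIOX_MODEL`.
+
+```python
+from dataclasses import dataclass
+import shlex
+
+
+@dataclass(frozen=True)
+class RunSettings:
+    """Hypothetical record of one transcription run; defaults mirror this guide."""
+
+    input_file: str
+    output_file: str
+    provider: str = "Soniox"
+    api: str = "soniox"
+    model: str = "stt-async-v4"  # set via SONIOX_MODEL, not a CLI flag
+    quality: str = "M"
+    language: str = "en"
+    prompt: str = "Product demo with English and Spanish technical vocabulary"
+    temperature: float = 0.3
+
+    def command(self) -> list[str]:
+        """Build the sapat argument list using the flags from Step 4."""
+        return [
+            "sapat", self.input_file,
+            "--quality", self.quality,
+            "--language", self.language,
+            "--prompt", self.prompt,
+            "--temperature", str(self.temperature),
+            "--api", self.api,
+        ]
+
+    def manifest(self, title: str) -> str:
+        """Render the Markdown manifest with the same fields as the heredoc."""
+        lines = [
+            f"# {title} transcription run",
+            "",
+            f"- Input file: {self.input_file}",
+            f"- Output file: {self.output_file}",
+            "- Workspace: Daytona",
+            "- Tool: Sapat",
+            f"- Provider: {self.provider}",
+            f"- Model: {self.model}",
+            f"- Quality: {self.quality}",
+            f"- Language hint: {self.language}",
+            f"- Prompt: {self.prompt}",
+            "- Review status: needs human review",
+        ]
+        return "\n".join(lines) + "\n"
+
+
+if __name__ == "__main__":
+    run = RunSettings("samples/customer-demo.mp4", "samples/customer-demo.txt")
+    print(shlex.join(run.command()))  # shell-quoted command for copy and paste
+    print(run.manifest("customer-demo"))
+```
+
+The object holds no credentials, so nothing secret can leak into the manifest
+through it.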
+
+## Step 6: Review the transcript
+
+A mixed-language transcript is not finished when the API returns text. Review it
+with a short checklist:
+
+- Product names, acronyms, and company-specific terms are spelled correctly.
+- Language switches are preserved instead of flattened into one language.
+- The transcript does not expose private information that should be redacted.
+- Action items, decisions, and dates are readable without replaying the audio.
+- Any uncertain words are marked for a second listener instead of guessed.
+
+If the transcript is going into documentation or a customer-facing artifact,
+create a cleaned copy rather than editing the raw transcript in place. Keep the
+raw output, the reviewed transcript, and the manifest separate.
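+
+To make the last checklist item easier to act on, a reviewer could list every
+line that still carries an uncertainty marker. This is a minimal sketch that
+assumes the team marks uncertain words with `[inaudible]`; the function name
+`find_uncertain_lines` is hypothetical.
+
+```python
+import re
+
+# Assumed marker convention; adjust the pattern if your team uses another one.
+MARKER = re.compile(r"\[inaudible\]", re.IGNORECASE)
+
+
+def find_uncertain_lines(transcript: str) -> list[tuple[int, str]]:
+    """Return (line_number, text) for each line containing a marker."""
+    return [
+        (number, line.strip())
+        for number, line in enumerate(transcript.splitlines(), start=1)
+        if MARKER.search(line)
+    ]
+
+
+if __name__ == "__main__":
+    sample = (
+        "The webhook retry failed twice,\n"
+        "entonces revisamos el [inaudible],\n"
+        "and then the queue recovered.\n"
+    )
+    for number, line in find_uncertain_lines(sample):
+        print(f"line {number}: {line}")
+```
+
+The line numbers give a second listener a short list of places to replay
+instead of the whole recording.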
+
+## Step 7: Package the reviewed output
+
+Once the transcript is reviewed, create a small folder for the final artifacts.
+This keeps raw text, edited text, and notes from being mixed together.
+
+```bash
+mkdir -p handoff/customer-demo
+cp samples/customer-demo.txt handoff/customer-demo/raw-transcript.txt
+cp runs/customer-demo-20260520.md handoff/customer-demo/run-manifest.md
+touch handoff/customer-demo/review-notes.md
+```
+
+Use `review-notes.md` for anything a future reader needs to know:
+
+```markdown
+# review notes
+
+- Speaker names normalized: "Gaby" -> "Gabriela".
+- Product term checked: "webhook replay" is correct.
+- Spanish section starts around the customer escalation discussion.
+- Two uncertain words remain marked with `[inaudible]`.
+```
+
+If the transcript will feed a retrieval system, documentation draft, or support
+handoff, create a second edited transcript rather than changing the raw output:
+
+```bash
+cp handoff/customer-demo/raw-transcript.txt \
+  handoff/customer-demo/reviewed-transcript.txt
+```
+
+That separation matters. Raw transcripts help you debug provider behavior,
+reviewed transcripts help teammates read the content, and manifests explain how
+to reproduce or compare future runs.
+
+## Step 8: Keep the workspace clean
+
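+After the handoff is complete, remove temporary files that should not remain in
+the repository. Sapat normally deletes its converted MP3, so this mainly catches
+leftovers, for example from an interrupted run. Skip the `rm` if your source
+recording is itself an `.mp3` stored in `samples/`: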
+
+```bash
+rm -f samples/*.mp3
+git status --short
+```
+
+You should see only the files you intentionally created for the guide or for
+your private handoff. If `git status` shows `.env`, raw recordings, transcripts
+with private data, or local cache files, update `.gitignore` or move those files
+outside the repository before committing anything.
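+
+The check can be partly automated. The sketch below scans `git status --short`
+output and flags `.env` plus the recording formats listed in the prerequisites.
+It is a hypothetical helper, not part of Sapat, and it cannot tell whether a
+transcript contains private data, so it supplements a manual look rather than
+replacing it.
+
+```python
+from pathlib import PurePosixPath
+
+# Recording formats from the prerequisites section of this guide.
+MEDIA_SUFFIXES = {".mp4", ".m4a", ".wav", ".mp3"}
+
+
+def risky_paths(status_output: str) -> list[str]:
+    """Return paths from `git status --short` output that should not be committed."""
+    flagged = []
+    for line in status_output.splitlines():
+        if len(line) < 4:
+            continue
+        # Short format: two status characters, a space, then the path.
+        # For renames ("old -> new"), keep the new path.
+        path = line[3:].split(" -> ")[-1].strip().strip('"')
+        pure = PurePosixPath(path)
+        if pure.name == ".env" or pure.suffix.lower() in MEDIA_SUFFIXES:
+            flagged.append(path)
+    return flagged
+
+
+if __name__ == "__main__":
+    sample = "?? .env\n M README.md\n?? samples/customer-demo.mp4\n?? .env.example\n"
+    for path in risky_paths(sample):
+        print(f"do not commit: {path}")
+```
+
+Git collapses untracked directories by default, so feed the function the output
+of `git status --short --untracked-files=all` if you want individual files
+inside new folders to be checked.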
+
+## Troubleshooting
+
+**Problem:** `sapat --help` does not show `soniox`.
+
+**Solution:** Confirm you are on the provider branch or a version of Sapat that
+includes the Soniox implementation. Reinstall editable mode with
+`python -m pip install -e .`.
+
+**Problem:** Soniox authentication fails.
+
+**Solution:** Check that `.env` exists in the workspace root, that
+`SONIOX_API_KEY` contains your real key rather than the placeholder, and that
+the variable is visible to the shell running Sapat. Do not paste the key into
+logs or GitHub comments when asking for help.
+
+**Problem:** The transcript misses product vocabulary.
+
+**Solution:** Use a more specific `--prompt`, keep the recording quality as high
+as practical, and add a human review pass for product names and acronyms.
+
+**Problem:** The recording is too noisy.
+
+**Solution:** Try `--quality H` so the MP3 conversion keeps more audio detail.
+A higher quality setting preserves what is in the source but does not remove
+background noise, so if the original is poor, run a short sample first instead
+of spending time on a large file.
+
+**Problem:** A teammate cannot reproduce your transcript.
+
+**Solution:** Compare the run manifest first. The usual differences are model
+name, prompt wording, input file, or whether the reviewer edited the raw output.
+If the input file is private and cannot be shared, keep a short synthetic sample
+that exercises the same language-switching pattern.
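+
+Comparing two manifests by eye gets tedious once there are several runs. As a
+sketch, assuming both manifests use the `- Key: value` bullet layout from Step
+5, the hypothetical helpers below report only the fields that differ.
+
+```python
+from typing import Optional
+
+
+def parse_manifest(text: str) -> dict[str, str]:
+    """Collect `- Key: value` bullets from a run manifest into a dict."""
+    fields = {}
+    for line in text.splitlines():
+        if line.startswith("- ") and ": " in line:
+            key, _, value = line[2:].partition(": ")
+            fields[key.strip()] = value.strip()
+    return fields
+
+
+def manifest_differences(
+    mine: str, theirs: str
+) -> dict[str, tuple[Optional[str], Optional[str]]]:
+    """Map each differing field to (my value, their value); None means absent."""
+    a, b = parse_manifest(mine), parse_manifest(theirs)
+    return {
+        key: (a.get(key), b.get(key))
+        for key in sorted(a.keys() | b.keys())
+        if a.get(key) != b.get(key)
+    }
+
+
+if __name__ == "__main__":
+    mine = "- Model: stt-async-v4\n- Quality: M\n- Language hint: en\n"
+    theirs = "- Model: stt-async-v4\n- Quality: H\n- Language hint: en\n"
+    for field, (left, right) in manifest_differences(mine, theirs).items():
+        print(f"{field}: {left!r} vs {right!r}")
+```
+
+An empty result means the recorded settings match, which points the
+investigation toward the input file or later manual edits.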
+
+## Conclusion
+
+With Daytona, Sapat, and Soniox, you can keep transcription work reproducible
+without turning it into a heavyweight application. The important pieces are the
+same every time: a clean workspace, private API-key handling, a provider command
+that can be repeated, and a review artifact that explains how the transcript was
+created.
+
+This pattern is especially useful for a
+[mixed-language transcription workflow](../definitions/20260520_definition_mixed_language_transcription_workflow.md),
+where the transcript is only valuable if it preserves both the spoken content
+and the engineering context around it.
+
+## References
+
+- [Sapat repository](https://github.com/nkkko/sapat)
+- [Companion Soniox provider PR](https://github.com/nibzard/sapat/pull/22)
+- [Soniox Speech-to-Text get started](https://soniox.com/docs/stt/get-started)
+- [Soniox Python SDK async transcription](https://soniox.com/docs/sdk/python-SDK/stt/async-transcription)
+- [Daytona documentation](https://www.daytona.io/docs)
diff --git a/guides/assets/20260520_soniox_sapat_mixed_language_workflow.svg b/guides/assets/20260520_soniox_sapat_mixed_language_workflow.svg