diff --git a/authors/assets/images/jalil-hadj-habib.svg b/authors/assets/images/jalil-hadj-habib.svg new file mode 100644 index 00000000..734d0001 --- /dev/null +++ b/authors/assets/images/jalil-hadj-habib.svg @@ -0,0 +1,6 @@ + + + + + JH + diff --git a/authors/jalil-hadj-habib.md b/authors/jalil-hadj-habib.md new file mode 100644 index 00000000..c58b8469 --- /dev/null +++ b/authors/jalil-hadj-habib.md @@ -0,0 +1,8 @@ +Author: Jalil Hadj Habib +Title: Full-Stack Developer +Description: Jalil Hadj Habib is a full-stack developer and information systems engineer focused on Laravel, Vue.js, React, TypeScript, Firebase, APIs, dashboards, and practical workflow tools for business software. +Company Name: +Company Description: +Author Image: /assets/images/jalil-hadj-habib.svg +Company Logo Dark: +Company Logo White: diff --git a/definitions/20260520_definition_transcript_regression_test.md b/definitions/20260520_definition_transcript_regression_test.md new file mode 100644 index 00000000..0d8e0d59 --- /dev/null +++ b/definitions/20260520_definition_transcript_regression_test.md @@ -0,0 +1,24 @@ +--- +title: "Transcript Regression Test" +description: "A repeatable quality check that compares new speech-to-text output against expected transcript phrases or fixtures." +date: 2026-05-20 +author: "Jalil Hadj Habib" +--- + +# Transcript Regression Test + +## Definition + +A transcript regression test is a repeatable quality check for speech-to-text +workflows. It runs known audio fixtures through a transcription pipeline and +compares the generated transcript against expected phrases, terms, or reference +text. + +## Context and Usage + +Transcript regression tests help teams detect output drift when they change a +transcription provider, model, prompt, audio conversion setting, or correction +step. 
They are especially useful when transcripts contain product names, +technical acronyms, invoice numbers, speaker names, or domain-specific terms +that must not disappear before the text is used in summaries, search indexes, or +downstream AI workflows. diff --git a/guides/20260520_guide_sapat_transcript_regression_tests.md b/guides/20260520_guide_sapat_transcript_regression_tests.md new file mode 100644 index 00000000..4f4b7031 --- /dev/null +++ b/guides/20260520_guide_sapat_transcript_regression_tests.md @@ -0,0 +1,379 @@ +--- +title: "Build Transcript Regression Tests With Sapat" +description: "Use Daytona and Sapat to create repeatable transcript smoke tests before running larger AI transcription batches." +date: 2026-05-20 +author: "Jalil Hadj Habib" +tags: ["daytona", "sapat", "transcription", "testing", "ai"] +--- + +# Build Transcript Regression Tests With Sapat + +## Introduction + +AI transcription tools are easy to run once. They are harder to trust every +week, across different providers, prompts, languages, and audio quality levels. +A provider can change a model behind the scenes. A prompt can improve punctuation +but damage product names. A low quality MP3 conversion can make a clear demo +sound like a support call from a moving train. + +That is why a small transcript regression harness is useful. Instead of sending +every recording through Sapat and discovering issues at the end of a batch, you +keep a few short audio fixtures, expected transcript snippets, and a simple +quality gate in a reproducible [Daytona workspace](). +The workflow gives AI engineers a fast answer to one practical question: +is this Sapat configuration still good enough to trust? + +In this guide, you will create a Daytona workspace for +[`nkkko/sapat`](https://github.com/nkkko/sapat), run Sapat against small fixture +files, record each run in a manifest, and compare the generated transcripts +against expected phrases. 
The goal is not to build a full speech recognition +benchmark. The goal is a lightweight [transcript regression test]() +that catches obvious quality drift before it reaches production notes, meeting +summaries, release packets, or RAG ingestion pipelines. + +## TL;DR + +- **Create a Daytona workspace** for Sapat so the transcription setup is repeatable. +- **Keep short fixture recordings** that cover names, acronyms, numbers, and noisy speech. +- **Run Sapat with one provider at a time** using `--api`, `--quality`, `--language`, `--prompt`, `--temperature`, and optional `--correct`. +- **Store a manifest** for every smoke run so outputs are comparable. +- **Fail early** when required phrases are missing from the generated transcript. + +## How the Harness Works + +The harness has four parts: + +1. fixture recordings that represent the content you care about; +2. expected snippets that must appear in the transcript; +3. a repeatable Sapat command for each provider or quality setting; +4. a checker script that compares transcript output with expectations. + +![Transcript regression harness flow](assets/images/20260520_sapat_transcript_regression_tests_flow.svg) + +This is deliberately smaller than a formal word error rate benchmark. Formal +benchmarks need aligned transcripts, stable audio corpora, and scoring rules. +For day-to-day engineering work, the first useful gate is simpler: + +- Did the provider keep the product name? +- Did it preserve the API acronym? +- Did it capture the invoice number? +- Did the correction pass introduce or remove critical words? +- Did the same fixture pass yesterday but fail after a prompt change? + +Sapat already gives you the important controls for this harness. The current CLI +accepts a file or directory input, uses `ffmpeg` to convert media to MP3, writes +a `.txt` sidecar beside the input file, and supports `--api openai`, `--api groq`, +or `--api azure`. 
It also exposes `--quality`, `--language`, `--prompt`, +`--temperature`, and `--correct`, which are exactly the knobs that tend to change +transcript output. + +## Step 1: Create the Daytona Workspace + +Install Daytona if it is not already available on your machine: + +```bash +curl -L https://download.daytona.io/daytona/install.sh | sudo bash +``` + +Create a workspace from the Sapat repository: + +```bash +daytona create https://github.com/nkkko/sapat --code +``` + +Inside the workspace terminal, confirm the project layout: + +```bash +ls +find src/sapat -maxdepth 3 -type f | sort +``` + +You should see the Sapat package, including the Click-based CLI in +`src/sapat/script.py` and provider implementations under `src/sapat/transcription`. +The important behavior for this guide is: + +- `sapat ` processes one video file or all `.mp4` files in a directory; +- `--api` chooses `openai`, `groq`, or `azure`; +- `--quality` chooses MP3 conversion quality: `L`, `M`, or `H`; +- `--correct` runs an LLM correction pass after transcription; +- the generated transcript is saved as a `.txt` file beside the source media. + +## Step 2: Install Dependencies and Configure Secrets + +Create a virtual environment and install dependencies: + +```bash +python -m venv .venv +source .venv/bin/activate +pip install -r requirements.txt +pip install -e . +``` + +Confirm the CLI is available: + +```bash +sapat --help +``` + +Then create a local `.env` file. Sapat supports Azure OpenAI, Groq, and OpenAI. +Use only the provider you plan to test first: + +```env +# Groq +GROQCLOUD_API_KEY=your_groq_key +GROQCLOUD_MODEL=whisper-large-v3-turbo +GROQCLOUD_API_ENDPOINT=https://api.groq.com/openai/v1/audio/transcriptions +GROQCLOUD_MODEL_NAME_CHAT=llama3-8b-8192 +``` + +Keep `.env` out of Git. Daytona gives you a reproducible workspace, but secrets +should still be local environment data, not repository content. 
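
Before spending API calls on a fixture, it can help to confirm that the `.env` file actually defines the provider settings. The sketch below is a minimal, hypothetical helper (the function names `parse_env_text` and `missing_keys` are not part of Sapat). It assumes the four Groq variable names from the example above and simple `KEY=value` lines; which of them Sapat strictly requires is not verified here.

```python
# Variable names copied from the Groq example in this guide.
# Assumption: all four are needed; adjust the tuple for OpenAI or Azure.
REQUIRED_GROQ_KEYS = (
    "GROQCLOUD_API_KEY",
    "GROQCLOUD_MODEL",
    "GROQCLOUD_API_ENDPOINT",
    "GROQCLOUD_MODEL_NAME_CHAT",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_GROQ_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

A typical call would read the file with `Path(".env").read_text(encoding="utf-8")`, pass the text to `parse_env_text`, and stop the run if `missing_keys` returns anything. The helper never prints the values themselves, so it is safe to use in shared logs.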
+ +## Step 3: Create Fixture and Expectation Files + +Create a small test area: + +```bash +mkdir -p transcript-tests/fixtures transcript-tests/expected transcript-tests/runs +``` + +Add two or three short `.mp4` files to `transcript-tests/fixtures`. Keep them +small: 10 to 45 seconds each is enough. Good fixtures cover the words that are +expensive to lose: + +| Fixture | What it should test | Example required phrases | +| --- | --- | --- | +| `api-demo.mp4` | acronyms and endpoint names | `Sapat`, `Groq`, `OpenAI`, `webhook` | +| `support-call.mp4` | noisy speech and numbers | `invoice 4729`, `Friday`, `refund` | +| `release-note.mp4` | product names and action items | `beta dashboard`, `migration`, `owner` | + +For each fixture, create a matching expectation file. Example: + +```bash +cat > transcript-tests/expected/api-demo.expected.txt <<'EOF' +Sapat +Groq +OpenAI +webhook +EOF +``` + +These files do not need to contain the full transcript. They contain terms that +must survive the transcription path. This keeps the gate maintainable when +providers produce slightly different punctuation or sentence breaks. + +## Step 4: Run a Smoke Transcription With Sapat + +Start with one fixture and one provider: + +```bash +sapat transcript-tests/fixtures/api-demo.mp4 \ + --api groq \ + --quality M \ + --language en \ + --prompt "Technical product demo with API names: Sapat, Groq, OpenAI, webhook." \ + --temperature 0.3 +``` + +Sapat converts the media to MP3, transcribes the audio, removes the temporary +MP3, and writes: + +```text +transcript-tests/fixtures/api-demo.txt +``` + +Read the transcript before automating anything: + +```bash +sed -n '1,120p' transcript-tests/fixtures/api-demo.txt +``` + +If the output is wildly wrong, adjust only one variable at a time. For example, +try `--quality H` before changing the prompt, or try `--temperature 0` before +turning on `--correct`. Regression tests are useful because they keep those +choices visible. 
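
Changing one variable at a time is easier when the flags live in a single config dict instead of being retyped on the command line. The following is a minimal sketch, not part of Sapat: `build_sapat_command` and `build_manifest` are hypothetical helpers that use only the flags shown in this guide. The manifest field names mirror the run manifest described in the next step, so the command and its record cannot drift apart.

```python
import json
import shlex
from pathlib import Path


def build_sapat_command(fixture: str, config: dict) -> list[str]:
    """Build the argument list for one Sapat run from a config dict."""
    command = [
        "sapat", fixture,
        "--api", config["provider"],
        "--quality", config["quality"],
        "--language", config["language"],
        "--prompt", config["prompt"],
        "--temperature", str(config["temperature"]),
    ]
    # --correct is a plain switch, so it is appended only when enabled.
    if config.get("correct"):
        command.append("--correct")
    return command


def build_manifest(fixture: str, config: dict) -> dict:
    """Describe the same run as a manifest record."""
    path = Path(fixture)
    return {
        "fixture": path.name,
        "provider": config["provider"],
        "quality": config["quality"],
        "language": config["language"],
        "temperature": config["temperature"],
        "correct": bool(config.get("correct", False)),
        "prompt": config["prompt"],
        # Sapat writes the transcript as a .txt sidecar beside the input.
        "output": path.with_suffix(".txt").name,
    }


config = {
    "provider": "groq",
    "quality": "M",
    "language": "en",
    "prompt": "Technical product demo with API names: Sapat, Groq, OpenAI, webhook.",
    "temperature": 0.3,
    "correct": False,
}
fixture = "transcript-tests/fixtures/api-demo.mp4"

# Print the command for review instead of running it.
print(shlex.join(build_sapat_command(fixture, config)))
print(json.dumps(build_manifest(fixture, config), indent=2))
```

To execute the run rather than print it, the list can be passed to `subprocess.run(command, check=True)`. Keeping the prompt as one list element avoids shell quoting problems with commas and colons.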
+ +## Step 5: Save a Run Manifest + +Create a small manifest after each test run: + +```bash +cat > transcript-tests/runs/$(date -u +%Y%m%dT%H%M%SZ)-groq-api-demo.json <<'EOF' +{ + "fixture": "api-demo.mp4", + "provider": "groq", + "quality": "M", + "language": "en", + "temperature": 0.3, + "correct": false, + "prompt": "Technical product demo with API names: Sapat, Groq, OpenAI, webhook.", + "output": "api-demo.txt" +} +EOF +``` + +The manifest is not complicated, but it matters. When a transcript changes, you +can see whether the provider, quality level, prompt, correction setting, or input +file changed with it. + +## Step 6: Add a Phrase Gate + +Create a simple checker: + +```bash +cat > transcript-tests/check_transcript.py <<'PY' +from pathlib import Path +import sys + +if len(sys.argv) != 3: + print("usage: check_transcript.py EXPECTED_FILE TRANSCRIPT_FILE") + raise SystemExit(2) + +expected_path = Path(sys.argv[1]) +transcript_path = Path(sys.argv[2]) + +expected = [ + line.strip().casefold() + for line in expected_path.read_text(encoding="utf-8").splitlines() + if line.strip() +] +transcript = transcript_path.read_text(encoding="utf-8").casefold() + +missing = [phrase for phrase in expected if phrase not in transcript] + +if missing: + print("Missing required transcript phrases:") + for phrase in missing: + print(f"- {phrase}") + raise SystemExit(1) + +print(f"PASS: {transcript_path.name} includes {len(expected)} required phrases") +PY +``` + +Run the gate: + +```bash +python transcript-tests/check_transcript.py \ + transcript-tests/expected/api-demo.expected.txt \ + transcript-tests/fixtures/api-demo.txt +``` + +This is a smoke test, not a final editor. It should fail loudly when important +words disappear and stay quiet when punctuation changes. 
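
The checker above matches exact substrings, so a multi-word phrase such as `invoice 4729` would be reported missing if the transcript reads `invoice, 4729` or breaks the line between the two words. One possible variant, shown here as a sketch rather than a replacement for the script above, normalizes both sides before comparing. The helpers `normalize` and `missing_phrases` are hypothetical names.

```python
import re


def normalize(text: str) -> str:
    """Casefold, replace punctuation with spaces, and collapse whitespace."""
    lowered = text.casefold()
    without_punctuation = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(without_punctuation.split())


def missing_phrases(expected_lines: list[str], transcript: str) -> list[str]:
    """Return the expected phrases that do not appear in the transcript."""
    haystack = normalize(transcript)
    missing = []
    for line in expected_lines:
        phrase = normalize(line)
        # Blank expectation lines are skipped, as in the original checker.
        if phrase and phrase not in haystack:
            missing.append(line.strip())
    return missing
```

The match is still a substring test, so `webhook` continues to pass when the transcript says `webhooks`. The tradeoff is that punctuation inside a term no longer matters, which may be too lenient for identifiers where a dot or hyphen is significant.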
+ +## Step 7: Compare Quality and Correction Settings + +Now run the same fixture with a second configuration: + +```bash +sapat transcript-tests/fixtures/api-demo.mp4 \ + --api groq \ + --quality H \ + --language en \ + --prompt "Technical product demo with API names: Sapat, Groq, OpenAI, webhook." \ + --temperature 0.3 \ + --correct +``` + +Run the same phrase gate again. If both `M` and `H` pass, keep `M` unless the +full transcript shows quality problems. If `--correct` changes a required phrase, +do not use it blindly for that content type. Correction can improve readability, +but it can also normalize technical names into more common words. + +For a larger comparison, use a table in your run notes: + +| Fixture | Provider | Quality | Correct | Phrase gate | Human note | +| --- | --- | --- | --- | --- | --- | +| `api-demo.mp4` | Groq | M | no | pass | Best cost-quality tradeoff | +| `api-demo.mp4` | Groq | H | no | pass | No visible improvement | +| `api-demo.mp4` | Groq | H | yes | fail | Corrected `Sapat` to `support` | + +This gives your team a simple decision log before they process a directory of +customer calls, podcast episodes, demos, or engineering meetings. + +## Step 8: Run a Directory Batch Only After the Gate Passes + +Once your fixtures pass, run Sapat on a directory: + +```bash +sapat transcript-tests/fixtures \ + --api groq \ + --quality M \ + --language en \ + --prompt "Technical conversations with product names, acronyms, and API terms." \ + --temperature 0.3 +``` + +Sapat processes every `.mp4` file in the directory. After the run, check each +expected file against its transcript: + +```bash +for expected in transcript-tests/expected/*.expected.txt; do + name="$(basename "$expected" .expected.txt)" + python transcript-tests/check_transcript.py \ + "$expected" \ + "transcript-tests/fixtures/$name.txt" +done +``` + +If the loop passes, you have enough confidence to process the larger source +folder using the same provider and settings. 
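
The shell loop above prints a result for every fixture, but unless the shell runs with `set -e`, its exit status reflects only the last fixture checked, so an automated job could miss an earlier failure. The sketch below is one way to aggregate the results in Python. It assumes the directory layout and `.expected.txt` naming used in this guide; `check_all` and `exit_code` are hypothetical helper names, and a missing transcript file is treated as a failure.

```python
from pathlib import Path

SUFFIX = ".expected.txt"


def check_all(expected_dir: Path, transcript_dir: Path) -> dict[str, list[str]]:
    """Map each failing fixture name to the phrases missing from its transcript."""
    failures: dict[str, list[str]] = {}
    for expected_path in sorted(expected_dir.glob(f"*{SUFFIX}")):
        name = expected_path.name[: -len(SUFFIX)]
        transcript_path = transcript_dir / f"{name}.txt"
        if not transcript_path.exists():
            failures[name] = ["<transcript file not found>"]
            continue
        phrases = [
            line.strip()
            for line in expected_path.read_text(encoding="utf-8").splitlines()
            if line.strip()
        ]
        transcript = transcript_path.read_text(encoding="utf-8").casefold()
        missing = [p for p in phrases if p.casefold() not in transcript]
        if missing:
            failures[name] = missing
    return failures


def exit_code(failures: dict[str, list[str]]) -> int:
    """Print one line per failing fixture and return 1 if any fixture failed."""
    for name, missing in failures.items():
        print(f"FAIL {name}: missing {', '.join(missing)}")
    return 1 if failures else 0
```

A wrapper script could end with `raise SystemExit(exit_code(check_all(Path("transcript-tests/expected"), Path("transcript-tests/fixtures"))))` so that any failing fixture blocks the larger batch.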
+ +## Common Issues and Troubleshooting + +**Problem:** `sapat` is not found after installation. + +**Solution:** activate the virtual environment again with +`source .venv/bin/activate`, or run `pip install -e .` inside the Daytona +workspace. + +**Problem:** `ffmpeg` is missing. + +**Solution:** install it in the workspace image or terminal: + +```bash +sudo apt-get update +sudo apt-get install -y ffmpeg +``` + +**Problem:** the transcript is empty or very short. + +**Solution:** verify that the source file has an audio track. Run: + +```bash +ffprobe transcript-tests/fixtures/api-demo.mp4 +``` + +**Problem:** provider terms are wrong even with a good recording. + +**Solution:** add those terms to the `--prompt` value and rerun the fixture. If +the issue only appears after `--correct`, disable correction or tighten the +correction prompt in the provider implementation before using it in production. + +**Problem:** a directory batch overwrote a previous `.txt` output. + +**Solution:** copy important transcripts into a timestamped run folder after each +batch. Sapat writes sidecar `.txt` files next to the media, so the harness should +treat those files as current outputs, not permanent archives. + +## Conclusion + +Sapat gives AI engineers a practical CLI for converting videos to MP3, +transcribing them through OpenAI, Groq, or Azure OpenAI, and saving sidecar text +files. Daytona gives the same team a repeatable place to run that workflow +without rebuilding the environment every time. + +The transcript regression harness connects those pieces. With a few fixtures, +expected phrases, run manifests, and a small checker script, you can catch drift +before it hits real content. That makes Sapat more useful for production-like +workflows where transcripts feed release notes, support summaries, knowledge +bases, incident reviews, or AI search. 
+ +## References + +- [Sapat repository](https://github.com/nkkko/sapat) +- [Daytona repository](https://github.com/daytonaio/daytona) +- [OpenAI audio transcription API](https://platform.openai.com/docs/guides/speech-to-text) +- [Groq speech-to-text documentation](https://console.groq.com/docs/speech-to-text) +- [Azure OpenAI audio documentation](https://learn.microsoft.com/azure/ai-services/openai/whisper-quickstart) diff --git a/guides/assets/images/20260520_sapat_transcript_regression_tests_flow.svg b/guides/assets/images/20260520_sapat_transcript_regression_tests_flow.svg new file mode 100644 index 00000000..946d4ee8 --- /dev/null +++ b/guides/assets/images/20260520_sapat_transcript_regression_tests_flow.svg @@ -0,0 +1,39 @@ + + + Sapat Transcript Regression Harness + A small quality gate before larger transcription batches in Daytona + + + Audio Fixtures + short MP4 files + known terms + + + Sapat Run + provider and flags + MP3 conversion + + + Transcript + sidecar .txt file + current output + + + Phrase Gate + expected snippets + pass or fail + + + + + + + Run Manifest + fixture, provider, quality, language, prompt, temperature, correction, output path + + + + + + +