daytonaio · mitre88 · May 24, 2026
diff --git a/authors/dr_alex_mitre.md b/authors/dr_alex_mitre.md
@@ -0,0 +1,8 @@
+Author: Dr Alex Mitre
+Title: Software Developer
+Description: Dr Alex Mitre builds web, Swift, JavaScript, and Python applications, with experience in public-sector software, education technology, and reproducible AI workflows.
+Author Image: [https://avatars.githubusercontent.com/u/30060514?v=4]
+Author LinkedIn:
+Author Twitter:
+Company Name: Independent
+Company Description: Independent software development and AI workflow automation.
diff --git a/definitions/20260524_definition_baidu_speech_recognition.md b/definitions/20260524_definition_baidu_speech_recognition.md
@@ -0,0 +1,25 @@
+---
+title: 'Baidu Speech Recognition'
+description:
+  'A Baidu Cloud speech-to-text service for converting short audio clips into
+  text through REST APIs and SDKs.'
+date: 2026-05-24
+author: 'Dr Alex Mitre'
+---
+
+# Baidu Speech Recognition
+
+## Definition
+
+Baidu Speech Recognition is a Baidu Cloud speech-to-text service that converts
+spoken audio into text. Its short speech recognition API accepts complete audio
+clips, uses a language model selected by `dev_pid`, and returns recognized text
+in a JSON response.
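+
+As a rough illustration of the JSON response mentioned above, the following
+Python sketch pulls the recognized text out of a response body. It assumes the
+response carries an integer `err_no` (0 on success), an `err_msg` string, and a
+`result` list of candidate strings; those field names reflect my reading of
+Baidu's REST documentation and should be checked against the official
+reference. `extract_transcript` is a hypothetical helper name.
+
+```python
+import json
+
+
+def extract_transcript(response_text: str) -> str:
+    """Return the first recognized candidate from a short speech response.
+
+    Assumed shape: {"err_no": 0, "err_msg": "...", "result": ["text", ...]}.
+    """
+    payload = json.loads(response_text)
+    err_no = payload.get("err_no")
+    if err_no != 0:
+        # A missing or non-zero err_no is treated as a failed request.
+        raise RuntimeError(f"Baidu error {err_no}: {payload.get('err_msg')}")
+    candidates = payload.get("result") or []
+    # An empty candidate list yields an empty transcript rather than an error.
+    return candidates[0] if candidates else ""
+
+
+# Illustrative response body, not captured from a real request.
+sample = '{"err_no": 0, "err_msg": "success.", "result": ["hello world"]}'
+print(extract_transcript(sample))
+```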
+
+## Context and Usage
+
+In developer workflows, Baidu Speech Recognition can be used as a transcription
+backend for short demo clips, command recordings, meeting snippets, and
+language-specific speech-to-text tests. A reproducible setup stores credentials
+in environment variables, normalizes audio before submission, and keeps the
+generated transcript with the source clip and validation notes.
diff --git a/guides/20260524_baidu_speech_sapat_daytona.md b/guides/20260524_baidu_speech_sapat_daytona.md
@@ -0,0 +1,186 @@
+---
+title: 'Run Baidu Speech Recognition with Sapat'
+description:
+  'Use Daytona and Sapat to transcribe short clips with Baidu Speech
+  Recognition from a reproducible workspace.'
+date: 2026-05-24
+author: 'Dr Alex Mitre'
+tags: ['ai', 'speech-to-text', 'daytona']
+---
+
+# Run Baidu Speech Recognition with Sapat
+
+## Introduction
+
+Sapat is a small command-line transcription tool that converts video files to
+MP3, sends the audio to a speech-to-text provider, and writes a sidecar text
+file. That makes it useful for demos, meeting clips, QA recordings, and short
+research notes where the transcript should be reproducible from one command.
+
+This guide shows how to run Sapat with a Baidu Speech Recognition provider from
+a Daytona workspace. The workflow keeps provider secrets in environment
+variables, uses Sapat's existing file and directory processing model, and ends
+with a review step so the transcript is checked before you use it downstream.
+
+## TL;DR
+
+- Use a Daytona workspace so the ffmpeg and Python setup is repeatable.
+- Install the Sapat branch that adds `--api baidu` support.
+- Store `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` in `.env`, not in Git.
+- Use short audio clips because Baidu's short speech API is designed for files
+  up to 60 seconds.
+- Verify the generated `.txt` output before using it in summaries, tickets, or
+  customer-facing notes.
+
+## Step 1: Create a Daytona workspace
+
+Create a workspace from the Sapat repository:
+
+```bash
+daytona create https://github.com/nkkko/sapat --code
+```
+
+Open the workspace terminal and confirm that Python and ffmpeg are available:
+
+```bash
+python --version
+ffmpeg -version
+```
+
+Sapat uses ffmpeg to extract audio from the input video before it calls the
+speech-to-text provider. Keeping that dependency inside the workspace makes the
+same command easier to rerun later.
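+
+If you want the same check in a script, for example at the top of a batch job,
+a small Python sketch could look like this. `missing_tools` is a hypothetical
+helper, not part of Sapat; it only reports whether the named executables are
+on `PATH`.
+
+```python
+import shutil
+
+
+def missing_tools(tools=("ffmpeg",)):
+    """Return the names in `tools` that cannot be found on PATH."""
+    return [name for name in tools if shutil.which(name) is None]
+
+
+missing = missing_tools()
+if missing:
+    print("missing required tools:", ", ".join(missing))
+else:
+    print("ffmpeg found at", shutil.which("ffmpeg"))
+```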
+
+## Step 2: Install the Baidu-enabled Sapat branch
+
+Install the companion provider implementation:
+
+```bash
+pip install \
+  git+https://github.com/mitre88/sapat.git@add-baidu-transcription-provider
+```
+
+Confirm that the new provider is available:
+
+```bash
+sapat --help
+```
+
+The `--api` option should include `baidu` alongside the existing OpenAI, Groq,
+and Azure choices.
+
+## Step 3: Configure Baidu credentials safely
+
+Create a local `.env` file in the workspace root:
+
+```bash
+cat > .env <<'EOF'
+BAIDU_API_KEY=replace_with_your_api_key
+BAIDU_SECRET_KEY=replace_with_your_secret_key
+BAIDU_DEV_PID=1737
+BAIDU_SAMPLE_RATE=16000
+EOF
+```
+
+Use `BAIDU_DEV_PID=1737` for English clips and `BAIDU_DEV_PID=1537` for
+Mandarin clips. Leave `.env` uncommitted. A good workspace habit is to check the
+Git state before and after each transcription run:
+
+```bash
+git status --short
+```
+
+If `.env` appears in the output, add it to `.gitignore` before continuing.
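+
+The settings above can also be checked before a run. The sketch below is a
+hypothetical pre-flight helper, not part of Sapat: it takes any mapping of
+environment values, reports missing credentials, and compares `BAIDU_DEV_PID`
+with the values this guide uses (1737 for English, 1537 for Mandarin). The
+`en` and `zh-CN` keys mirror the `--language` values used later in this guide.
+
+```python
+import os
+
+REQUIRED_KEYS = ("BAIDU_API_KEY", "BAIDU_SECRET_KEY")
+# dev_pid values taken from this guide, keyed by the --language value.
+DEV_PID_BY_LANGUAGE = {"en": 1737, "zh-CN": 1537}
+
+
+def missing_baidu_keys(env=os.environ):
+    """Return required variable names that are unset or blank in `env`."""
+    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
+
+
+def dev_pid_matches_language(env, language):
+    """Return True when BAIDU_DEV_PID equals the expected value for `language`."""
+    expected = DEV_PID_BY_LANGUAGE.get(language)
+    return expected is not None and env.get("BAIDU_DEV_PID") == str(expected)
+
+
+example_env = {
+    "BAIDU_API_KEY": "replace_with_your_api_key",
+    "BAIDU_SECRET_KEY": "",
+    "BAIDU_DEV_PID": "1737",
+}
+print(missing_baidu_keys(example_env))
+print(dev_pid_matches_language(example_env, "en"))
+```
+
+Pass `os.environ` once your workflow has loaded `.env`; the helper never prints
+the secret values themselves.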
+
+## Step 4: Prepare a short clip
+
+Baidu's short speech endpoint is meant for complete audio files of up to 60
+seconds. If the source video is longer, cut a smaller sample first:
+
+```bash
+ffmpeg -i long-demo.mp4 -t 45 -c copy baidu-demo-clip.mp4
+```
+
+For production workflows, split longer recordings into reviewed chunks and keep
+a manifest with the source file, clip window, language, and expected topic.
+That makes transcript review and reruns traceable.
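+
+One way to plan those chunks is to compute fixed clip windows and emit a
+manifest row for each. The Python sketch below is only an illustration: the
+function names and manifest keys are hypothetical, and fixed windows can cut
+through a sentence, so the resulting clips still need the review described
+above.
+
+```python
+import math
+
+
+def clip_windows(total_seconds, max_clip_seconds=45.0):
+    """Split [0, total_seconds] into consecutive windows of bounded length."""
+    if total_seconds <= 0 or max_clip_seconds <= 0:
+        raise ValueError("durations must be positive")
+    count = math.ceil(total_seconds / max_clip_seconds)
+    windows = []
+    for index in range(count):
+        start = index * max_clip_seconds
+        end = min(start + max_clip_seconds, total_seconds)
+        windows.append((start, end))
+    return windows
+
+
+def build_manifest(source, total_seconds, language, topic, max_clip_seconds=45.0):
+    """Return one manifest row per clip window."""
+    windows = clip_windows(total_seconds, max_clip_seconds)
+    return [
+        {
+            "source": source,
+            "clip": f"{number:03d}",
+            "start_seconds": start,
+            "end_seconds": end,
+            "language": language,
+            "expected_topic": topic,
+        }
+        for number, (start, end) in enumerate(windows, start=1)
+    ]
+
+
+for row in build_manifest("long-demo.mp4", 100, "en", "product demo"):
+    print(row)
+```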
+
+## Step 5: Run Sapat with Baidu
+
+Run Sapat on the prepared clip:
+
+```bash
+sapat baidu-demo-clip.mp4 --api baidu --language en --quality M
+```
+
+Sapat will create `baidu-demo-clip.txt` next to the input file. The Baidu
+provider converts audio to mono 16 kHz MP3 before sending the request, fetches a
+Baidu access token from the configured API key and secret, and then submits the
+base64-encoded audio to Baidu's short speech recognition endpoint.
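+
+The request flow described above can be sketched in Python. This is a minimal
+illustration, not Sapat's provider code: the endpoint URLs and JSON field names
+follow Baidu's public REST documentation as best I understand it, so check them
+against the short speech API reference. `token_request_url`,
+`recognition_body`, and the `sapat-demo` client id are hypothetical names, and
+`audio_format` is left to the caller.
+
+```python
+import base64
+import json
+from urllib.parse import urlencode
+
+# Assumed endpoints; verify against Baidu's documentation before use.
+TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"
+ASR_URL = "https://vop.baidu.com/server_api"
+
+
+def token_request_url(api_key, secret_key):
+    """Build the client-credentials URL used to request an access token."""
+    query = urlencode(
+        {
+            "grant_type": "client_credentials",
+            "client_id": api_key,
+            "client_secret": secret_key,
+        }
+    )
+    return f"{TOKEN_URL}?{query}"
+
+
+def recognition_body(audio, token, audio_format, dev_pid=1737, rate=16000,
+                     cuid="sapat-demo"):
+    """Build the JSON body that would be posted to ASR_URL."""
+    return json.dumps(
+        {
+            "format": audio_format,
+            "rate": rate,
+            "channel": 1,  # mono audio
+            "cuid": cuid,  # caller-chosen client identifier
+            "token": token,
+            "dev_pid": dev_pid,
+            "speech": base64.b64encode(audio).decode("ascii"),
+            "len": len(audio),  # byte length before base64 encoding
+        }
+    )
+
+
+body = json.loads(recognition_body(b"abc", "example-token", "wav"))
+print(sorted(body))
+```
+
+The token URL carries the API key and secret as query parameters, so keep it
+out of logs and run notes.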
+
+For Mandarin clips, set `BAIDU_DEV_PID=1537` in `.env` as noted in Step 3, then
+run:
+
+```bash
+sapat baidu-demo-clip.mp4 --api baidu --language zh-CN --quality M
+```
+
+## Step 6: Review and record the result
+
+Open the transcript and check it against the source clip:
+
+```bash
+sed -n '1,120p' baidu-demo-clip.txt
+```
+
+Record a small run note with the command, clip duration, language, provider, and
+review status. For example:
+
+```text
+source: baidu-demo-clip.mp4
+duration: 45 seconds
+provider: baidu
+language: en
+command: sapat baidu-demo-clip.mp4 --api baidu --language en --quality M
+review: checked for speaker names, product names, and missing sentences
+```
+
+That run note is useful when the transcript becomes input to a bug report,
+customer summary, release note, or retrieval dataset.
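+
+To keep run notes consistent across clips, they can be written by a small
+script. The Python sketch below is one possible approach: `write_run_note` and
+the `.run.txt` suffix are hypothetical choices, and the fields mirror the
+example note above.
+
+```python
+from pathlib import Path
+
+
+def write_run_note(transcript_path, *, source, duration_seconds, provider,
+                   language, command, review):
+    """Write a `key: value` run note next to the transcript; return its path."""
+    transcript = Path(transcript_path)
+    note_path = transcript.with_name(transcript.stem + ".run.txt")
+    lines = [
+        f"source: {source}",
+        f"duration: {duration_seconds} seconds",
+        f"provider: {provider}",
+        f"language: {language}",
+        f"command: {command}",
+        f"review: {review}",
+    ]
+    note_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
+    return note_path
+```
+
+Calling it with `transcript_path="baidu-demo-clip.txt"` writes
+`baidu-demo-clip.run.txt` in the same directory.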
+
+## Common Issues and Troubleshooting
+
+**Problem:** `BAIDU_API_KEY and BAIDU_SECRET_KEY must be set.`
+
+**Solution:** Confirm the `.env` file is in the workspace root and that the key
+names match exactly. If your workflow exports the variables from a shell
+profile rather than `.env`, restart the shell so the new values are loaded.
+
+**Problem:** The API returns an audio quality or format error.
+
+**Solution:** Keep clips short, use one audio channel, and let the Baidu
+provider convert the input to 16 kHz MP3. If the source file has unusual audio,
+normalize it first with ffmpeg and rerun Sapat.
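+
+A normalization pass of that kind could be scripted as follows. This Python
+sketch assumes that mono, 16 kHz output is what you want and that `ffmpeg` is
+on `PATH`; `normalize_command` and `normalize` are hypothetical helpers, and
+the file names are placeholders.
+
+```python
+import subprocess
+
+
+def normalize_command(source, target, sample_rate=16000):
+    """Return an ffmpeg argument list that writes mono audio at `sample_rate` Hz."""
+    return [
+        "ffmpeg",
+        "-y",  # overwrite the target if it already exists
+        "-i", source,
+        "-vn",  # drop any video stream
+        "-ac", "1",  # one audio channel
+        "-ar", str(sample_rate),  # output sample rate in Hz
+        target,
+    ]
+
+
+def normalize(source, target, sample_rate=16000):
+    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
+    subprocess.run(normalize_command(source, target, sample_rate), check=True)
+
+
+print(" ".join(normalize_command("unusual-audio.mp4", "normalized.mp3")))
+```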
+
+**Problem:** The transcript is empty or misses domain words.
+
+**Solution:** Verify that the `BAIDU_DEV_PID` matches the spoken language.
+Then rerun with a smaller clip and compare the result before processing a larger
+batch.
+
+## Conclusion
+
+You now have a reproducible Daytona workflow for short Baidu-backed
+transcription jobs in Sapat. The important parts are keeping credentials out of
+Git, using short source clips that fit Baidu's API model, validating the output
+before reuse, and keeping the command plus review notes with the generated
+transcript.
+
+## References
+
+- [Companion Sapat PR](https://github.com/nibzard/sapat/pull/47)
+- [Baidu short speech API](https://cloud.baidu.com/doc/SPEECH/s/Jlbxdezuf)
+- [Baidu Python SDK reference](https://cloud.baidu.com/doc/SPEECH/s/0lbxfnc9b)
+- [Sapat repository](https://github.com/nkkko/sapat)
+- [Daytona repository](https://github.com/daytonaio/daytona)
+
+![Baidu Sapat workflow](/assets/20260524_baidu_sapat_transcription_flow.svg)
diff --git a/guides/assets/20260524_baidu_sapat_transcription_flow.svg b/guides/assets/20260524_baidu_sapat_transcription_flow.svg