This yt-transcript repository provides a comprehensive setup guide for using AI tools on macOS/Linux for voice processing and language modeling. The toolchain includes:
- Whisper.cpp for voice-to-text
- CLI LLM for command-line language models
- Ollama for local LLM execution
- Cantonese-specific fine-tuned model
```
yt-transcript/
├── models/        # Whisper model files
├── audios/        # Processed audio files
├── transcripts/   # Generated transcript files
└── README.md
```
Homebrew - Package manager for macOS and Linux:

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Install the following applications via Homebrew:
```sh
brew install yt-dlp       # YouTube-DL fork (https://github.com/yt-dlp/yt-dlp)
brew install whisper-cpp  # Whisper.cpp implementation (https://github.com/ggml-org/whisper.cpp)
brew install ollama       # Open-source LLM platform (https://ollama.com)
brew install llm          # CLI LLM client (https://llm.datasette.io)
```

Clone the repository and create the working directories:

```sh
git clone https://github.com/kiuckhuang/yt-transcript.git
cd yt-transcript
mkdir -p {models,audios,transcripts}
```

Use `ollama pull` to download the model:

```sh
ollama pull qwen3:4b
```

Edit `~/Library/Application Support/io.datasette.llm/extra-openai-models.yaml` (change the path to match your own user profile):
```yaml
- model_id: qwen3:4b
  model_name: qwen3:4b
  aliases: ["qwen3_4b"]
  api_base: "http://localhost:11434/v1"
- model_id: qwen3-32b
  model_name: qwen3-32b
  aliases: ["qwen3_32b"]
  api_base: "http://192.168.1.8:8080/v1"
```
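Each entry's `api_base` tells the `llm` client which OpenAI-compatible endpoint to call; port 11434 is Ollama's default local API port. As a quick sanity check, you can list the endpoints registered in the file (shown here against a hypothetical copy written to `/tmp`, not your real config):

```sh
# Hypothetical sanity check: write a copy of the config via a heredoc,
# then extract the api_base of each registered model.
cat > /tmp/extra-openai-models.yaml <<'EOF'
- model_id: qwen3:4b
  model_name: qwen3:4b
  aliases: ["qwen3_4b"]
  api_base: "http://localhost:11434/v1"
- model_id: qwen3-32b
  model_name: qwen3-32b
  aliases: ["qwen3_32b"]
  api_base: "http://192.168.1.8:8080/v1"
EOF
# prints the two endpoints registered above
grep -o 'api_base: "[^"]*"' /tmp/extra-openai-models.yaml
```

If an alias such as `qwen3_4b` fails at inference time, a missing or unreachable `api_base` is the first thing to check.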
- Base Whisper models: download from https://huggingface.co/ggerganov/whisper.cpp
- Cantonese fine-tuned model: download from https://huggingface.co/kiuckhuang/whisper-large-v3-cantonese-ggml
```sh
curl -L -o models/whisper-large-v3-cantonese.bf16.bin 'https://huggingface.co/kiuckhuang/whisper-large-v3-cantonese-ggml/resolve/main/whisper-large-v3-cantonese.bf16.bin?download=true'
```

- Use `yt-dlp` to download audio content:

```sh
yt-dlp -f 'ba[acodec^=mp3]/ba/b' -x --audio-format mp3 -o audios/beyond_kol2025.mp3 "https://www.youtube.com/watch?v=9fLILe-SReU"
```

- Process audio files with `whisper-cpp` using appropriate models:

```sh
whisper-cli -m models/whisper-large-v3-cantonese.bf16.bin -l auto audios/beyond_kol2025.mp3 -olrc -fa -sns --output-file transcripts/beyond_kol2025
```

- Use `llm` with `ollama` for language model inference:
```sh
cat transcripts/beyond_kol2025.lrc | llm -m qwen3_4b -s "show with Traditional Hong Kong Chinese, list the items discuss in the video transcript, in point form, make summary /no_think"
```

- Google gemini-2.0-flash:

```sh
cat transcripts/beyond_kol2025.lrc | llm -m gemini-2.0-flash -s "show with Traditional Hong Kong Chinese, list the items discuss in the video transcript, in point form, make summary"
```

- Ensure all models are placed in the `models/` directory
- Check https://llm.datasette.io for CLI LLM configuration options
- Ollama models can be managed with `ollama pull <model-name>`
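The three steps above (download, transcribe, summarize) can be sketched as one helper function. This is a minimal sketch, assuming `yt-dlp`, `whisper-cli`, and `llm` are installed and the Cantonese model has been downloaded as shown earlier; `transcribe_yt` is a hypothetical name, not part of this repository:

```sh
#!/bin/bash
# Hypothetical helper: run the full pipeline for one video.
# Usage: transcribe_yt <youtube-url> <output-name>
transcribe_yt() {
  local url="$1" name="$2"
  # 1. Download the audio track as mp3 into audios/
  yt-dlp -f 'ba[acodec^=mp3]/ba/b' -x --audio-format mp3 \
    -o "audios/${name}.mp3" "$url" || return 1
  # 2. Transcribe with the Cantonese fine-tuned model to an .lrc file
  whisper-cli -m models/whisper-large-v3-cantonese.bf16.bin -l auto \
    "audios/${name}.mp3" -olrc -fa -sns \
    --output-file "transcripts/${name}" || return 1
  # 3. Summarize the transcript with the local LLM alias from the yaml config
  llm -m qwen3_4b -s "show with Traditional Hong Kong Chinese, list the items discuss in the video transcript, in point form, make summary /no_think" \
    < "transcripts/${name}.lrc"
}
```

For example, the earlier walkthrough would become `transcribe_yt "https://www.youtube.com/watch?v=9fLILe-SReU" beyond_kol2025`.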