
AI Toolchain Setup Guide

This yt-transcript repository provides a setup guide for using AI tools on macOS/Linux to transcribe YouTube videos and summarize them with language models. The toolchain includes:

  • Whisper.cpp for speech-to-text transcription
  • llm, a command-line client for language models
  • Ollama for running LLMs locally
  • A Cantonese-specific fine-tuned Whisper model

Project Structure

yt-transcript/
├── models/                # Whisper model files
├── audios/                # Processed audio files
└── transcripts/           # Generated transcripts
└── README.md

🛠 Installation

Prerequisites

Homebrew - Package manager for macOS and Linux

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Tools

Install the following applications via Homebrew:

brew install yt-dlp      # YouTube-DL fork (https://github.com/yt-dlp/yt-dlp)
brew install whisper-cpp # Whisper.cpp implementation (https://github.com/ggml-org/whisper.cpp)
brew install ollama      # Open-source LLM platform (https://ollama.com)
brew install llm         # CLI LLM client (https://llm.datasette.io)
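After installation, you can sanity-check that every required tool is on your PATH before continuing (a minimal sketch; note that the whisper-cpp formula installs its binary as whisper-cli):

```shell
# Verify each required command-line tool is installed and reachable.
for tool in yt-dlp whisper-cli ollama llm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool -> $(command -v "$tool")"
  else
    echo "missing: $tool" >&2
  fi
done
```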

Download the project

git clone https://github.com/kiuckhuang/yt-transcript.git
cd yt-transcript
mkdir -p {models,audios,transcripts}

Ollama model pull

Use ollama pull to download the model:

ollama pull qwen3:4b

LLM Model Config

Edit ~/Library/"Application Support"/io.datasette.llm/extra-openai-models.yaml (the default location on macOS; change the path to match your own user profile):

- model_id: qwen3:4b
  model_name: qwen3:4b
  aliases: ["qwen3_4b"]
  api_base: "http://localhost:11434/v1"
- model_id: qwen3-32b
  model_name: qwen3-32b
  aliases: ["qwen3_32b"]
  api_base: "http://192.168.1.8:8080/v1"
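Once the YAML is in place, you can confirm that llm sees the new entries and reach the local endpoint (a sketch assuming llm is installed and Ollama is already serving on port 11434, e.g. via ollama serve):

```shell
# List the models llm knows about; the entries from the YAML should appear.
llm models | grep qwen3

# Quick smoke test against the local Ollama endpoint.
llm -m qwen3_4b "Reply with the single word: ready"
```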

Non-local LLM models can also be used via API (e.g. Google Gemini). To avoid surprise billing, make sure you use only free-tier models.

Whisper Models

  1. Base Whisper Models
    Download from: https://huggingface.co/ggerganov/whisper.cpp

  2. Cantonese Fine-tuned Model
    Download from: https://huggingface.co/kiuckhuang/whisper-large-v3-cantonese-ggml

curl -L -o models/whisper-large-v3-cantonese.bf16.bin 'https://huggingface.co/kiuckhuang/whisper-large-v3-cantonese-ggml/resolve/main/whisper-large-v3-cantonese.bf16.bin?download=true'

Sample Usage

  1. Use yt-dlp to download the audio track
yt-dlp -f 'ba[acodec^=mp3]/ba/b' -x --audio-format mp3 -o audios/beyond_kol2025.mp3 "https://www.youtube.com/watch?v=9fLILe-SReU"
  2. Transcribe the audio with whisper-cpp using an appropriate model
whisper-cli -m models/whisper-large-v3-cantonese.bf16.bin -l auto audios/beyond_kol2025.mp3 -olrc -fa -sns --output-file transcripts/beyond_kol2025
  3. Summarize the transcript with llm
  • Ollama for local language model inference
cat transcripts/beyond_kol2025.lrc | llm -m qwen3_4b -s "show with Traditional Hong Kong Chinese, list the items discussed in the video transcript, in point form, make summary /no_think"
  • Google gemini-2.0-flash
cat transcripts/beyond_kol2025.lrc | llm -m gemini-2.0-flash -s "show with Traditional Hong Kong Chinese, list the items discussed in the video transcript, in point form, make summary"

📝 Notes

  • Ensure all models are placed in the models/ directory
  • Check https://llm.datasette.io for CLI LLM configuration options
  • Ollama models can be managed with ollama pull <model-name>
