
AI Toolchain Setup Guide

This yt-transcript repository provides a setup guide for using AI tools on macOS/Linux to transcribe YouTube videos and summarize them with language models. The toolchain includes:

  • Whisper.cpp for speech-to-text transcription
  • llm, a command-line client for language models
  • Ollama for running LLMs locally
  • A Cantonese-specific fine-tuned Whisper model

Project Structure

yt-transcript/
├── models/                # Whisper model files
├── audios/                # Processed audio files
└── transcripts/           # Generated transcripts
└── README.md

🛠 Installation

Prerequisites

Homebrew - Package manager for macOS and Linux

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Tools

Install the following applications via Homebrew:

brew install yt-dlp      # YouTube-DL fork (https://github.com/yt-dlp/yt-dlp)
brew install whisper-cpp # Whisper.cpp implementation (https://github.com/ggml-org/whisper.cpp)
brew install ollama      # Open-source LLM platform (https://ollama.com)
brew install llm         # CLI LLM client (https://llm.datasette.io)
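After installation, you can sanity-check that every required tool is on your PATH before continuing (a minimal sketch; note that the whisper-cpp formula installs its binary as whisper-cli):

```shell
# Verify each required command-line tool is installed and reachable.
for tool in yt-dlp whisper-cli ollama llm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool -> $(command -v "$tool")"
  else
    echo "missing: $tool" >&2
  fi
done
```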

Download the project

git clone https://github.com/kiuckhuang/yt-transcript.git
cd yt-transcript
mkdir -p {models,audios,transcripts}

Ollama model pull

Use ollama pull to download the model:

ollama pull qwen3:4b

LLM Model Config

Edit ~/Library/"Application Support"/io.datasette.llm/extra-openai-models.yaml (the default location on macOS; change the path to match your own user profile):

- model_id: qwen3:4b
  model_name: qwen3:4b
  aliases: ["qwen3_4b"]
  api_base: "http://localhost:11434/v1"
- model_id: qwen3-32b
  model_name: qwen3-32b
  aliases: ["qwen3_32b"]
  api_base: "http://192.168.1.8:8080/v1"
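Once the YAML is in place, you can confirm that llm sees the new entries and reach the local endpoint (a sketch assuming llm is installed and Ollama is already serving on port 11434, e.g. via ollama serve):

```shell
# List the models llm knows about; the entries from the YAML should appear.
llm models | grep qwen3

# Quick smoke test against the local Ollama endpoint.
llm -m qwen3_4b "Reply with the single word: ready"
```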

Non-local LLM models can also be used via API (e.g. Google Gemini). To avoid surprise billing, make sure you use only free-tier models.

Whisper Models

  1. Base Whisper Models
    Download from: https://huggingface.co/ggerganov/whisper.cpp

  2. Cantonese Fine-tuned Model
    Download from: https://huggingface.co/kiuckhuang/whisper-large-v3-cantonese-ggml

curl -L -o models/whisper-large-v3-cantonese.bf16.bin 'https://huggingface.co/kiuckhuang/whisper-large-v3-cantonese-ggml/resolve/main/whisper-large-v3-cantonese.bf16.bin?download=true'

Sample Usage

  1. Use yt-dlp to download the audio track
yt-dlp -f 'ba[acodec^=mp3]/ba/b' -x --audio-format mp3 -o audios/beyond_kol2025.mp3 "https://www.youtube.com/watch?v=9fLILe-SReU"
  2. Transcribe the audio with whisper-cpp using an appropriate model
whisper-cli -m models/whisper-large-v3-cantonese.bf16.bin -l auto audios/beyond_kol2025.mp3 -olrc -fa -sns --output-file transcripts/beyond_kol2025
  3. Summarize the transcript with llm
  • Ollama for local language model inference
cat transcripts/beyond_kol2025.lrc | llm -m qwen3_4b -s "show with Traditional Hong Kong Chinese, list the items discussed in the video transcript, in point form, make summary /no_think"
  • Google gemini-2.0-flash
cat transcripts/beyond_kol2025.lrc | llm -m gemini-2.0-flash -s "show with Traditional Hong Kong Chinese, list the items discussed in the video transcript, in point form, make summary"

📝 Notes

  • Ensure all models are placed in the models/ directory
  • Check https://llm.datasette.io for CLI LLM configuration options
  • Ollama models can be managed with ollama pull <model-name>
