I'm a hardcore Joe Rogan fan. Through this project, I want to watch interesting short tangents from past episodes of the JRE.
Steps:
- Get YT video link and download the video in 4k
- Get subtitles with timestamps
  - Provided by YouTube
  - Extracted using AI
- Prompt an LLM to find a short story within the whole video and return the exact uttered text as it appears in the subtitles.
- Use the timestamps to piece together the short story and then export the video
- Upload it to my new channel
- Better Algo to concatenate clips
- Add subtitle texts to the video
- Portrait mode - how to focus on the speaker - need computer vision?
- zoom in and out??
- Pass in an hour long podcast transcript??
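One possible take on the "better algo to concatenate clips" item: once the LLM's picked lines have been mapped back to subtitle timestamps, merge ranges that overlap or sit very close together, so the export has fewer, smoother cuts. This is only a sketch; `merge_segments` and the `max_gap` threshold are hypothetical names and values, not something already in the project.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_segments(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Sort segments by start time and merge any that overlap
    or are separated by at most max_gap seconds."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if end <= start:
            continue  # skip empty or inverted ranges
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous clip: extend it instead of cutting.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical timestamps for lines the LLM selected.
    picked = [(12.0, 15.5), (15.7, 20.0), (42.0, 47.0), (18.0, 21.0)]
    print(merge_segments(picked))
```

A larger `max_gap` keeps more of the surrounding conversation; a smaller one gives tighter but choppier clips.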
Immediate tasks:
- Do 4k video download - done
- Write code to split and combine video and audio
  - Install FFmpeg
  - Split and combine
- Automate using AI
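A minimal sketch of the split-and-combine step, assuming FFmpeg is installed and on the PATH. It cuts each clip with a re-encode (so the cut does not have to land on a keyframe) and joins the clips with FFmpeg's concat demuxer. The function names and file names are made up for illustration, and the codec choice (`libx264` / `aac`) is an assumption, not a project decision.

```python
import subprocess
from typing import List


def cut_command(src: str, start: float, end: float, dst: str) -> List[str]:
    """Build an ffmpeg command that re-encodes src[start:end] into dst."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek to the clip start (input option)
        "-i", src,
        "-t", f"{end - start:.3f}",   # clip duration in seconds
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]


def concat_list(clips: List[str]) -> str:
    """Text of a concat-demuxer list file, one clip per line.
    Assumes clip paths contain no single quotes."""
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_command(list_file: str, dst: str) -> List[str]:
    """Build an ffmpeg command that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", dst]


def run(cmd: List[str]) -> None:
    """Run a command and raise if ffmpeg exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands only; call run(...) to actually execute them.
    print(cut_command("episode.mp4", 12.0, 21.0, "clip0.mp4"))
    print(concat_list(["clip0.mp4", "clip1.mp4"]), end="")
    print(concat_command("clips.txt", "story.mp4"))
```

Because every clip is encoded with the same settings, the concat step can use `-c copy`; clips with mismatched codecs or resolutions would need a re-encode at that stage too.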
TODO - 15/1-19/1
- make prompts better; currently the stories aren't that great.
- Claude returns good results but is costly.
- zoom a bit
- modify prompt to return the title and description of the video; save the original video metadata
- extract subtitles with word-level timestamps and use them for animation
- host Whisper on GCP
- Add effects to subtitles, spread words across the whole timestamp
- Host on GCP - perpetual
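For "spread words across the whole timestamp", here is a rough sketch that works when only cue-level timestamps are available: each word gets a share of the cue's duration proportional to its length. This is a heuristic, not real word-level timing (which would have to come from a speech model), and `spread_words` is a hypothetical helper name.

```python
from typing import List, Tuple

TimedWord = Tuple[str, float, float]  # (word, start_seconds, end_seconds)


def spread_words(text: str, start: float, end: float) -> List[TimedWord]:
    """Distribute the words of one subtitle cue across [start, end],
    giving longer words proportionally more time."""
    words = text.split()
    if not words or end <= start:
        return []
    total_chars = sum(len(w) for w in words)
    duration = end - start
    timed: List[TimedWord] = []
    cursor = start
    for i, word in enumerate(words):
        share = duration * len(word) / total_chars
        # Pin the last word to the cue end to avoid float drift.
        word_end = end if i == len(words) - 1 else cursor + share
        timed.append((word, round(cursor, 3), round(word_end, 3)))
        cursor = word_end
    return timed


if __name__ == "__main__":
    for word, w_start, w_end in spread_words("that is wild", 10.0, 12.0):
        print(f"{w_start:6.2f} -> {w_end:6.2f}  {word}")
```

The per-word ranges can then drive a highlight or pop-in effect, one word at a time.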
TODO - 4/1
- Split and combine video and audio
- Publish to YouTube channel
- Add subtitles to video
- Download video -> Get Subs -> Process subtitles -> LLM prompt -> csv result -> split, combine
- handle the case where multiple lines of text are on screen at the same time - check the offsetting function
- handle the case of combining audio after merging frames
- short video aspect ratio
- focus on speaker
- make code modular - use classes ffs!!!
- Make subtitles bigger, and appear on multiple lines if long
- Make code run on multiple processors - done (beautiful work!)
- Add zoom-in, zoom-out etc features when transitioning between speakers
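For the short-video aspect ratio and "focus on speaker" items, the geometry part can be sketched without any computer vision: given the speaker's horizontal position in the frame (however that ends up being detected), compute a full-height 9:16 crop window centered on them and clamp it to the frame. `portrait_crop` and `crop_filter` are hypothetical helpers; `speaker_x` is assumed to come from some detector that is not shown here. The output string uses FFmpeg's `crop=w:h:x:y` filter syntax.

```python
from typing import Tuple


def portrait_crop(frame_w: int, frame_h: int, speaker_x: float,
                  aspect: Tuple[int, int] = (9, 16)) -> Tuple[int, int, int, int]:
    """Return (x, y, w, h) of a full-height crop with the given aspect ratio,
    centered on speaker_x and clamped to stay inside the frame."""
    crop_h = frame_h
    crop_w = int(frame_h * aspect[0] / aspect[1])
    crop_w -= crop_w % 2              # keep the width even for the encoder
    crop_w = min(crop_w, frame_w)
    x = int(round(speaker_x - crop_w / 2))
    x = max(0, min(x, frame_w - crop_w))  # clamp to the frame edges
    return x, 0, crop_w, crop_h


def crop_filter(box: Tuple[int, int, int, int]) -> str:
    """Format a crop box as an ffmpeg crop filter string."""
    x, y, w, h = box
    return f"crop={w}:{h}:{x}:{y}"


if __name__ == "__main__":
    # 4k frame, speaker roughly a quarter of the way in from the left.
    box = portrait_crop(3840, 2160, speaker_x=960)
    print(box, crop_filter(box))
```

Zooming in or out on a speaker change could reuse the same idea by shrinking or growing the crop box over a few frames and scaling the result back to the output size.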