Open
90 commits
a93f4d8
Add multi-series output paths and refactor processors
dam2452 Feb 9, 2026
a05c153
Hoist imports to top-level; adjust pylint formatting
dam2452 Feb 9, 2026
2804a73
Add safe resize and use in reference processor
dam2452 Feb 9, 2026
540336f
refaktor
dam2452 Feb 9, 2026
9cfce98
.
dam2452 Feb 9, 2026
e622921
Restructure preprocessor and add ES mappings
dam2452 Feb 10, 2026
3afc6c0
Lower scene min length and raise beam size
dam2452 Feb 10, 2026
0250bd2
Support per-series configs and selective pipelines
dam2452 Feb 11, 2026
fa146b2
Make FFmpegWrapper helpers private; fix typings
dam2452 Feb 11, 2026
023720b
Add interlacing detection and refactor scene code
dam2452 Feb 11, 2026
5ea6a7c
Add force_deinterlace and improve detection
dam2452 Feb 11, 2026
6dba0a5
Refactor pipeline and add search CLI
dam2452 Feb 11, 2026
1589b50
Switch to Qwen3-VL-Embedding & update descriptions
dam2452 Feb 11, 2026
de75fc8
Remove redundant docstrings and comments
dam2452 Feb 11, 2026
289e04d
Privatize helper methods & cleanup dead code
dam2452 Feb 11, 2026
a7b6193
Add dataclass fixer; refactor pipeline and configs
dam2452 Feb 11, 2026
6e245e0
Refactor BaseProcessor flow and defaults
dam2452 Feb 11, 2026
5b5599f
Restructure packages and update processors
dam2452 Feb 11, 2026
641bf2f
Move lib to services and add validation step
dam2452 Feb 11, 2026
84bcf48
Standardize step module names with _step suffix
dam2452 Feb 12, 2026
e3cd018
Add resolution analysis step and refactor CLI search
dam2452 Feb 12, 2026
5e8e75b
Support global pipeline steps; drop _executed flags
dam2452 Feb 12, 2026
deeb26b
Refactor resolution analysis and update step modules
dam2452 Feb 12, 2026
b7dc97e
Improve FFmpeg, interlace detection & transcode
dam2452 Feb 12, 2026
83ee831
Update kiepscy.json
dam2452 Feb 12, 2026
9043390
Refactor config, IO and search CLI
dam2452 Feb 12, 2026
2c8b165
Update transcoding_step.py
dam2452 Feb 12, 2026
078fa85
Refactor pipeline, executor, and CLI internals
dam2452 Feb 13, 2026
8663d2b
Refactor: static methods, typing and renames
dam2452 Feb 13, 2026
a163ee8
Add batch processing and model pool support
dam2452 Feb 13, 2026
1860252
Use attribute and fix DDGS import
dam2452 Feb 13, 2026
5924d49
Use config.video_bitrate_mbps property
dam2452 Feb 13, 2026
c677d6e
Add thread-safety and multi in-progress state
dam2452 Feb 13, 2026
98c7114
Refine bitrate scaling; lower parallel episodes
dam2452 Feb 13, 2026
e02b0fd
Increase default max_parallel_episodes to 3
dam2452 Feb 13, 2026
2f114ba
Parallelize video resolution scanning
dam2452 Feb 13, 2026
320a9d8
Refactor transcode bitrates & add resolution check
dam2452 Feb 14, 2026
4588e37
Add OutputDescriptor system and refactor steps
dam2452 Feb 15, 2026
56e1704
Set default frame export to 1 frame & 1080p
dam2452 Feb 16, 2026
937c218
Register object_detections earlier in pipeline
dam2452 Feb 16, 2026
aa146d6
Refactor PipelineStep flow and caching
dam2452 Feb 16, 2026
0ffaa8d
Add source_video_path and fix threadpool order
dam2452 Feb 16, 2026
b1e44f6
Add artifact registry and timestamp frames
dam2452 Feb 16, 2026
22f7653
Add state sync CLI and filesystem reconstructor
dam2452 Feb 16, 2026
2232aef
Make get_output_descriptors public
dam2452 Feb 16, 2026
61fe081
Make output subdir optional and snap to keyframes
dam2452 Feb 17, 2026
7cb1d7a
Refactor FFmpeg usage and bitrate config
dam2452 Feb 17, 2026
692385a
Improve ffmpeg logging and add batch info log
dam2452 Feb 17, 2026
b737e88
Silence ffmpeg and improve interlace logs
dam2452 Feb 17, 2026
de9dacf
Merge branch 'main' into Multi-Series-Support-&-Data-Isolation-(Prepr…
dam2452 Feb 17, 2026
bc15db4
Remove ES index mappings and add local types
dam2452 Feb 17, 2026
410c27a
Parallelize frame export, improve ffmpeg
dam2452 Feb 18, 2026
4a55ea1
Reduce default frames_per_scene to 1
dam2452 Feb 18, 2026
86915b9
Add segment filter steps and refactor transcription
dam2452 Feb 19, 2026
32bac73
Add image hasher device & hex hashes
dam2452 Feb 19, 2026
d72f841
Skip completed episode steps and load cache
dam2452 Feb 19, 2026
3753a3f
Add char ref processor and infra updates
dam2452 Feb 21, 2026
48b97da
Add embedding steps, face clusterer, and episode fallbacks
dam2452 Feb 22, 2026
8fa2611
Replace Qwen-VL with vLLM embedding backend
dam2452 Feb 24, 2026
ed5abe6
Use pooling runner and allow remote code in LLM
dam2452 Feb 24, 2026
3a4261c
Use embed() instead of encode() for embeddings
dam2452 Feb 24, 2026
0b00a3e
Split transcriptions, document outputs & archives
dam2452 Feb 27, 2026
9877625
Fix cv2 lint, hashing path, and segment_range
dam2452 Feb 27, 2026
5172ac0
Add deploy_to_nas script and package init
dam2452 Feb 27, 2026
fb9752d
Update deploy_to_nas.py
dam2452 Feb 28, 2026
e06b918
vLLM: switch model, adjust sampling, 256K context
dam2452 Mar 9, 2026
759c5e3
Add transcription import step and config
dam2452 Mar 9, 2026
1ac92d4
Preserve bitrate for same-res, improve logs
dam2452 Mar 9, 2026
92f8646
Update vLLM install, client and configs
dam2452 Mar 10, 2026
98ac7da
Skip character images; improve models & clustering
dam2452 Mar 11, 2026
7f32017
Use per-episode file layout & refactor validators
dam2452 Mar 12, 2026
b1ddf23
Add BaseTranscriptionStep and refactor steps
dam2452 Mar 13, 2026
3bf23be
Add sejm_demo series config
dam2452 Mar 13, 2026
b9f0ed9
Add pipeline_mode and lower missing-image error
dam2452 Mar 14, 2026
f8fdd0c
Add global-completion flag and exhausted marker handling
dam2452 Mar 15, 2026
03fa082
Update reference_downloader.py
dam2452 Mar 15, 2026
fd1611b
Add search_query_template to scraping config
dam2452 Mar 15, 2026
eee5f11
Use RapidAPI for Google Search; add SerpAPI
dam2452 Mar 15, 2026
69e21ab
Add search engines and lower min image size
dam2452 Mar 15, 2026
f26e38f
Refactor image search and reference downloader
dam2452 Mar 15, 2026
c4a8a46
Use browser-based DuckDuckGo, refactor image scraping
dam2452 Mar 16, 2026
2dc3ffa
Add chunked Whisper transcription for long audio
dam2452 Mar 17, 2026
07617a6
Remove output_data dirs from preprocessor Dockerfile
dam2452 Mar 18, 2026
9a519ee
Add script to split double-episode videos
dam2452 Mar 27, 2026
d770797
Add scribe comparison script and ElevenLabs tweaks
dam2452 Mar 27, 2026
04c8a56
Add series face clustering and cluster-based refs
dam2452 Mar 29, 2026
ef0afb9
Remove legacy face clustering step and validator
dam2452 Mar 29, 2026
439156f
Update config.py
dam2452 Mar 29, 2026
9512662
Face clustering: stricter filters, noise & labels
dam2452 Mar 30, 2026
e3132ef
Update cluster_folder_manager.py
dam2452 Apr 4, 2026
6 changes: 6 additions & 0 deletions .gitattributes
@@ -0,0 +1,6 @@
* text=auto

*.sh text eol=lf
*.py text eol=lf
Dockerfile text eol=lf
*.dockerignore text eol=lf
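The new `.gitattributes` forces LF endings for shell scripts, Python files, any `Dockerfile` and `*.dockerignore`, while `* text=auto` leaves every other file to Git's text autodetection. Below is a minimal sketch of which paths those four rules cover, plus a check that flags covered files whose content still contains CRLF. `requires_lf` and `find_lf_violations` are hypothetical helpers, not part of this PR. The matcher only mirrors Git for these simple slash-free patterns, which (as in `.gitignore`) match the file name at any directory depth.

```python
from __future__ import annotations

import fnmatch
from pathlib import PurePosixPath

# Patterns copied from the eol=lf rules in the .gitattributes diff.
LF_PATTERNS = ("*.sh", "*.py", "Dockerfile", "*.dockerignore")


def requires_lf(path: str) -> bool:
    """Return True if one of the eol=lf rules applies to this repo-relative path.

    A pattern without a slash matches the file name at any directory depth,
    so "Dockerfile" also covers "preprocessor/Dockerfile".
    """
    name = PurePosixPath(path).name
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in LF_PATTERNS)


def find_lf_violations(files: dict[str, bytes]) -> list[str]:
    """Return, sorted, the LF-only paths whose content still contains CRLF."""
    return sorted(
        path for path, data in files.items() if requires_lf(path) and b"\r\n" in data
    )


if __name__ == "__main__":
    # Illustrative sample contents, not files from this repository.
    sample = {
        "preprocessor/entrypoint.sh": b"#!/bin/sh\r\necho ok\r\n",
        "preprocessor/Dockerfile": b"FROM python:3.11\n",
        "README.md": b"notes\r\n",  # only covered by text=auto, so not flagged
    }
    print(find_lf_violations(sample))
```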
1 change: 1 addition & 0 deletions .gitignore
@@ -42,3 +42,4 @@ cookies.txt
test_episodes.json
/models
/tmp
+/preprocessor/scripts/scribe_compare
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -1,5 +1,5 @@
fail_fast: false
-exclude: '^(bot/RANCZO-WIDEO/|bot/RANCZO-TRANSKRYPCJE/)'
+exclude: '^(bot/RANCZO-WIDEO/|bot/RANCZO-TRANSKRYPCJE/|scripts/)'
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
@@ -37,6 +37,7 @@ repos:
- id: chmod
args: ["755"]
files: (.*scripts\/.*.py$|\.sh$)
+exclude: (^preprocessor/entrypoint\.sh$|^preprocessor/scripts/)
- id: remove-tabs
args: [--whitespaces-count, '4']
- repo: https://github.com/PyCQA/isort
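With these changes a path passes two filters before the `chmod` hook sees it: the top-level `exclude` (now also skipping a root-level `scripts/` directory) and the hook's own `files`/`exclude` pair (now skipping `preprocessor/entrypoint.sh` and `preprocessor/scripts/`). Below is a minimal sketch of that filtering, assuming pre-commit's documented behaviour of matching these patterns with Python's `re.search`. `chmod_applies` is a hypothetical helper, the example paths are illustrative, and the regexes are copied verbatim from the diff.

```python
import re

# Regexes copied from the .pre-commit-config.yaml diff (new versions).
GLOBAL_EXCLUDE = re.compile(r"^(bot/RANCZO-WIDEO/|bot/RANCZO-TRANSKRYPCJE/|scripts/)")
CHMOD_FILES = re.compile(r"(.*scripts\/.*.py$|\.sh$)")
CHMOD_EXCLUDE = re.compile(r"(^preprocessor/entrypoint\.sh$|^preprocessor/scripts/)")


def chmod_applies(path: str) -> bool:
    """Return True if the chmod hook would receive this repo-relative path."""
    if GLOBAL_EXCLUDE.search(path):
        return False  # dropped for every hook by the top-level exclude
    # The hook runs only on paths matching `files` and not matching its `exclude`.
    return bool(CHMOD_FILES.search(path)) and not CHMOD_EXCLUDE.search(path)


if __name__ == "__main__":
    for candidate in (
        "deploy.sh",
        "bot/scripts/run.py",
        "scripts/tool.py",
        "preprocessor/scripts/example.py",
        "preprocessor/entrypoint.sh",
    ):
        print(f"{candidate}: {chmod_applies(candidate)}")
```

Because the top-level pattern is anchored with `^`, it skips only a root-level `scripts/`; `preprocessor/scripts/` is kept away from the chmod hook by the hook-level `exclude` instead. The dot in `.*.py$` is unescaped, so that branch also matches any character followed by `py` (for example `bot/scripts/numpy`).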
1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
4.0.1
2 changes: 1 addition & 1 deletion bot/services/reindex/reindex_service.py
@@ -22,7 +22,7 @@
from bot.services.reindex.video_path_transformer import VideoPathTransformer
from bot.services.reindex.zip_extractor import ZipExtractor
from bot.settings import settings
-from preprocessor.search.elastic_manager import ElasticSearchManager
+from preprocessor.search.elastic_manager import ElasticSearchManager # pylint: disable=no-name-in-module
Collaborator
Suggested change
-from preprocessor.search.elastic_manager import ElasticSearchManager # pylint: disable=no-name-in-module
+from preprocessor.search.elastic_manager import ElasticSearchManager



@dataclass
13 changes: 8 additions & 5 deletions preprocessor/Dockerfile
@@ -38,12 +38,18 @@ RUN --mount=type=cache,target=/root/.cache/pip \
pip install --no-cache-dir --upgrade pip setuptools wheel \
&& pip install --no-cache-dir \
-r /app/requirements.txt \
vllm==0.13.0 \
--extra-index-url https://pypi.nvidia.com \
&& pip install --no-cache-dir --pre vllm \
--extra-index-url https://wheels.vllm.ai/nightly \
&& pip install --no-cache-dir \
git+https://github.com/huggingface/transformers.git@main \
&& pip uninstall -y flashinfer \
&& pip uninstall -y onnxruntime \
&& pip install --no-cache-dir \
onnxruntime-gpu==1.21.0 \
--extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ \
&& pip uninstall -y nvidia-cudnn-cu11 || true \
&& pip install --no-cache-dir --force-reinstall --no-deps nvidia-cudnn-cu12 \
&& pip uninstall -y nvidia-nccl-cu11 || true \
&& pip install --no-cache-dir --force-reinstall --no-deps nvidia-nccl-cu12

@@ -58,10 +64,7 @@ RUN mkdir -p \
/models/whisper \
/models/insightface \
/models/ultralytics \
-/models/emotion_model \
-/app/output_data/characters \
-/app/output_data/scraped_pages \
-/app/output_data/processing_metadata
+/models/emotion_model

COPY bot /app/bot
COPY preprocessor /app/preprocessor
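The `RUN` chain in the first Dockerfile hunk joins every step with `&&` but appends `|| true` to the two `nvidia-*-cu11` uninstalls. In a POSIX shell `&&` and `||` have equal precedence and group left to right, so each `|| true` applies to everything before it in the chain, not only to the neighbouring uninstall. A failed install earlier in the layer would therefore be swallowed, and the layer can still exit with status 0 if the later commands succeed. The sketch below reproduces both forms with `sh -c`, assuming the image keeps Docker's default `/bin/sh -c` shell for `RUN`. `run_chain` and the `echo` commands are illustrative placeholders, with `false` standing in for a failing `pip install`.

```python
from __future__ import annotations

import subprocess


def run_chain(chain: str) -> tuple[int, str]:
    """Run a POSIX shell AND-OR list the way a shell-form RUN does (sh -c)."""
    result = subprocess.run(["sh", "-c", chain], capture_output=True, text=True, check=False)
    return result.returncode, result.stdout


# "false" stands in for a pip install that fails earlier in the chain.
# && and || have equal precedence and group left to right, so this parses as
# ((false && echo installed) || true) && echo next-step: the failure is swallowed.
MASKED = "false && echo installed || true && echo next-step"

# Bracing the optional step scopes "|| true" to that step alone, so the
# earlier failure still stops the chain with a non-zero status.
SCOPED = "false && { echo uninstall-cu11 || true; } && echo next-step"

if __name__ == "__main__":
    print(run_chain(MASKED))
    print(run_chain(SCOPED))
```

One way to keep the tolerance local is the braced form used in `SCOPED`, e.g. `&& { pip uninstall -y nvidia-cudnn-cu11 || true; } \`; moving the optional uninstalls into their own `RUN` instruction would have the same effect.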