Description
Implement parallel processing for batch operations to significantly improve performance on large document sets.
Current behavior
- Documents processed sequentially
- Single-threaded
Proposed behavior
# Process with 4 parallel workers
ragctl batch ./documents --workers 4 --output ./chunks/
Expected improvements
- 3-5x speedup on multi-core systems
- Better CPU utilization
- Configurable worker count
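A minimal sketch of how the configurable worker count could be parsed, assuming an argparse-based entry point. The actual ragctl CLI framework is not stated in this issue, so `build_parser`, `positive_int`, and the `input_dir` argument name are hypothetical. The `-j` short form and the default of 1 come from the task list in this issue.

```python
import argparse


def positive_int(value: str) -> int:
    """Parse a worker count and reject values below 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("worker count must be at least 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for the batch command; names are illustrative only.
    parser = argparse.ArgumentParser(prog="ragctl batch")
    parser.add_argument("input_dir", help="directory of documents to process")
    parser.add_argument(
        "-j", "--workers",
        type=positive_int,
        default=1,  # default of 1 keeps today's sequential behavior
        help="number of parallel workers",
    )
    # Making --output optional is an assumption of this sketch.
    parser.add_argument("--output", default=None, help="directory for chunks")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Processing {args.input_dir} with {args.workers} worker(s)")
```

Defaulting to a single worker means existing invocations behave as before, and parallelism is opt-in.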
Tasks
- Add --workers/-j option (default: 1)
Technical considerations
- Use concurrent.futures.ProcessPoolExecutor for CPU-bound OCR
- Use ThreadPoolExecutor for I/O-bound operations
- Ensure thread-safe history writing
- Handle keyboard interrupt gracefully
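The considerations above could fit together roughly as follows. This is a sketch under stated assumptions, not ragctl's implementation: `HistoryWriter`, `process_document`, `run_batch`, and the JSON-lines history format are all hypothetical. The executor class is a parameter, so `ProcessPoolExecutor` can be passed in for CPU-bound OCR while the default `ThreadPoolExecutor` covers I/O-bound work.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


class HistoryWriter:
    """Append one JSON line per document; a lock serializes the writes."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()

    def record(self, document: str, status: str) -> None:
        line = json.dumps({"document": document, "status": status})
        # The lock protects the file if several threads call record().
        with self._lock:
            with self._path.open("a", encoding="utf-8") as handle:
                handle.write(line + "\n")


def process_document(path: str) -> str:
    """Placeholder for the real per-document work (hypothetical)."""
    return path.upper()


def run_batch(paths, workers, history, executor_cls=ThreadPoolExecutor):
    """Process paths in parallel and record each outcome in the history."""
    results = {}
    executor = executor_cls(max_workers=workers)
    try:
        futures = {executor.submit(process_document, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
                history.record(path, "ok")
            except Exception as exc:
                # One failing document should not abort the whole batch.
                history.record(path, f"failed: {exc}")
    except KeyboardInterrupt:
        # Cancel work that has not started; tasks already running
        # finish on their own. cancel_futures needs Python 3.9+.
        executor.shutdown(wait=False, cancel_futures=True)
        raise
    executor.shutdown(wait=True)
    return results
```

History is written from the calling thread as results arrive, so the same code works with a process pool, where a `threading.Lock` would not be shared between workers. With `ProcessPoolExecutor`, the worker function has to be importable at module level and the entry point guarded by `if __name__ == "__main__":`.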