Hi Constantin,
First of all — great work on PyQvd! It's the most well-known Python library for QVD files, and the API design is clean and well-documented.
I'm the author of qvdrs — a QVD library with the core engine written in Rust and Python bindings via PyO3 + Arrow zero-copy bridge (pip install qvdrs). I'd like to explore the possibility of collaboration.
Performance comparison
Benchmarks were run on the same machine with real QVD files, comparing PyQvd 2.3.2 and qvdrs 0.5.0:
Read
| File | Rows | Cols | PyQvd | qvdrs | Speedup |
|---|---|---|---|---|---|
| 11 KB | 12 | 4 | 0.013s | <0.001s | 29x |
| 62 KB | 125 | 45 | 0.012s | 0.001s | 11x |
| 2.3 MB | 21,523 | 10 | 0.214s | 0.011s | 20x |
| 35 MB | 1,695,048 | 7 | 5.96s | 0.26s | 23x |
| 480 MB | 11,994,296 | 4 | 64.5s | 2.1s | 31x |
| 560 MB | 5,458,618 | 24 | 65.2s | 3.9s | 17x |
| 1.7 GB | 87,617,047 | 8 | >10 min (killed) | 23.4s | >25x |
Write
| File | Rows | Cols | PyQvd | qvdrs | Speedup |
|---|---|---|---|---|---|
| 35 MB | 1,695,048 | 7 | 7.8s | 0.022s | 351x |
| 480 MB | 11,994,296 | 4 | 50.9s | 0.61s | 83x |
Features only in qvdrs
| Feature | qvdrs |
|---|---|
| Streaming EXISTS() filtered read: 1.7GB, 87.6M rows -> 20.4M rows x 3 cols | 9.0s |
| EXISTS() + save to QVD | 13.3s |
| Parquet <-> QVD conversion | yes |
| DuckDB native integration (register QVD as SQL tables) | yes |
| DataFusion SQL queries on QVD | yes |
| Arrow RecordBatch zero-copy (pandas, Polars, DuckDB) | yes |
| CLI tool (convert, inspect, head, filter) | yes |
| Binary-identical output to Qlik Sense (MD5 verified) | yes |
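For readers less familiar with Qlik: a single-argument EXISTS(Field) in a QVD load keeps only the rows whose Field value is already present in the data model. A QVD file stores each field's distinct values once in a symbol table and stores each row as bit-packed indices into those tables. That means a filter can test each distinct value once and then scan rows by comparing small integers. The sketch below illustrates that idea under stated assumptions: the symbol tables and index rows are taken as already decoded into plain Python structures, NULL handling is omitted, and `exists_filter` and the toy data are hypothetical. This is one plausible strategy, not qvdrs code, and not necessarily how qvdrs implements EXISTS().

```python
from typing import Any, Dict, Iterable, Iterator, List, Set, Tuple


def exists_filter(
    symbols: Dict[str, List[Any]],
    index_rows: Iterable[Dict[str, int]],
    key_field: str,
    existing: Set[Any],
    keep_fields: List[str],
) -> Iterator[Tuple[Any, ...]]:
    """Lazily yield rows whose `key_field` value is in `existing`.

    symbols:     field name -> symbol table (the field's distinct values)
    index_rows:  one dict per row, field name -> index into that field's
                 symbol table (assumed already unpacked from the QVD index table)
    keep_fields: columns to project into each output tuple
    """
    # Test each distinct key value once, instead of once per row.
    allowed = {i for i, value in enumerate(symbols[key_field]) if value in existing}
    for row in index_rows:
        # Per-row work is just an integer set lookup; rows are streamed, not buffered.
        if row[key_field] in allowed:
            yield tuple(symbols[f][row[f]] for f in keep_fields)


# Hypothetical toy data standing in for a decoded QVD.
symbols = {
    "CustomerID": ["C1", "C2", "C3"],
    "Amount": [10, 20],
    "Region": ["EU", "US"],
}
index_rows = [
    {"CustomerID": 0, "Amount": 0, "Region": 0},
    {"CustomerID": 1, "Amount": 1, "Region": 1},
    {"CustomerID": 2, "Amount": 0, "Region": 1},
    {"CustomerID": 0, "Amount": 1, "Region": 1},
]
# Values "already loaded", as EXISTS(CustomerID) would see them.
existing = {"C1", "C3"}

print(list(exists_filter(symbols, index_rows, "CustomerID", existing, ["CustomerID", "Amount"])))
# prints [('C1', 10), ('C3', 10), ('C1', 20)]
```

Because `exists_filter` is a generator, memory use depends on the symbol tables and whatever the caller keeps, not on the total row count. This is what makes a streaming filtered read of a multi-GB file feasible.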
What each project brings
- PyQvd: a mature, clean API with QvdTable (filter_by, join, sort, append, insert), 25 stars, an established user base, and good documentation on Read the Docs. It is pure Python, so it is easy to understand and debug.
- qvdrs: a Rust core (17-350x faster on the files of 2 MB and up in the benchmarks above), handling of multi-GB files, a streaming reader, an EXISTS() filter (2.5x faster than Qlik Sense), Parquet/Arrow/DuckDB/DataFusion integration, and binary-identical QVD output to Qlik Sense.
Development approach
The qvdrs codebase is developed with the help of Claude (Opus) — Anthropic's AI coding assistant. This significantly accelerates development: implementing new features, writing tests, debugging, and maintaining code quality. If you're open to collaboration, this is a powerful tool that can be used for joint development as well — it handles Rust, Python, and the QVD binary format equally well.
Proposal
A few possible directions:
- Contribute to qvdrs — your QVD format expertise and API design skills would be very valuable. We could adopt PyQvd's richer QvdTable API (filter_by, join, sort, etc.) for the Python bindings.
- qvdrs as an optional backend for PyQvd — keep PyQvd's API, but optionally use qvdrs for I/O (similar to how pandas uses pyarrow). Users get the familiar API with Rust performance.
- Joint development — combine efforts under one project.
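The optional-backend direction could follow the pattern pandas uses for `read_parquet(engine="auto")`, which tries pyarrow and falls back to fastparquet when pyarrow is unavailable. The sketch below assumes a hypothetical `engine` parameter on the PyQvd side and assumes the Rust package is importable as `qvdrs`. Neither `resolve_engine` nor such a parameter exists in either library today. The sketch only decides which backend to use; it deliberately calls no qvdrs function, since that API is not specified here.

```python
import importlib.util

VALID_ENGINES = ("auto", "python", "rust")


def resolve_engine(engine: str = "auto", rust_module: str = "qvdrs") -> str:
    """Return the backend to use: "rust" or "python" (hypothetical helper).

    engine="auto"   -> use the Rust backend if it is installed, else pure Python
    engine="rust"   -> require the Rust backend; raise ImportError if missing
    engine="python" -> always use the pure-Python implementation
    """
    if engine not in VALID_ENGINES:
        raise ValueError(f"engine must be one of {VALID_ENGINES}, got {engine!r}")
    if engine == "python":
        return "python"
    # find_spec checks that the module is installed without importing it yet.
    if importlib.util.find_spec(rust_module) is not None:
        return "rust"
    if engine == "rust":
        raise ImportError(f"engine='rust' requested but {rust_module!r} is not installed")
    return "python"


print(resolve_engine())          # "rust" if the backend is installed, else "python"
print(resolve_engine("python"))  # always "python"
```

With this split, the public QvdTable API stays the same. Read and write calls would dispatch on the returned string, and users who never install the Rust package see no change in behavior.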
No pressure at all — just reaching out since we're both solving the same problem. Happy to discuss.
Stanislav
https://github.com/bintocher/qvdrs