-
Notifications
You must be signed in to change notification settings - Fork 2
LibBlosc2: New codec #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #54 +/- ##
===========================================
- Coverage 98.24% 86.55% -11.69%
===========================================
Files 5 4 -1
Lines 456 186 -270
===========================================
- Hits 448 161 -287
- Misses 8 25 +17 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
If I understand correctly, the goal is to implement the HDF5 filter 32026 in https://github.com/silx-kit/hdf5plugin/blob/v5.1.0/src/PyTables/hdf5-blosc2/src/blosc2_filter.c According to my reading of https://github.com/Blosc/c-blosc2/blob/v2.17.1/README_EXTENSION_FILENAMES.rst, there are also .b2frame and .b2nd formats. Therefore, the format here should be called |
|
My goal is to implement a stand-alone blosc2 compressor/decompressor. I did not intend to connect it to HDF5, although that should be possible. The format I am implementing uses "super-chunks" which were introduced in blosc2. They allow compressing more than 2 GByte of data. Blosc2 still supports the compression methods used by blosc1 with their size limit. It would be possible to add support for this in |
|
The |
|
ping |
|
|
||
| # There's more unused/unchecked data | ||
| c[end-50] = 0x40 | ||
| # BROKEN @test_throws Blosc2DecodingError decode(Blosc2DecodeOptions(), c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is okay for a format not to checksum everything.
| # Finally, this corruption has an effect | ||
| c[end-100] = 0x40 | ||
| # Windows segfaults in this call with exit code 3221226356, | ||
| # indicating a heap corruption. That's clearly a bug in c-blosc2. | ||
| # It seems c-blosc2 does not checksum its compressed data. | ||
| if !Sys.iswindows() | ||
| @test_throws Blosc2DecodingError decode(Blosc2DecodeOptions(), c) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the file that is causing the segfault.
bad_file.txt
It would be good to see if this crashes https://github.com/Blosc/c-blosc2/blob/main/examples/decompress_file.c as well.
Until this is resolved, the documentation for this package should have warnings not to use the package with potentially invalid inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, decompress_file does segfault on this input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a bug in blosc2 or the example. This issue isn't with checksums, it is probably blosc2 missing a bounds check somewhere. Can you report this upstream?
|
Question about the status of this PR. Should it be merged as is and then the problems fixed in future PRs before an initial release? Do you want to continue working on this PR branch, or do you want to close the PR? |
|
I am still interested in this PR. I don't know your development routine, but it might be best to stay as PR until it is ready. What is missing? To my knowledge there are only a few minor items outstanding: Tests and warnings to users. There is a new version of Blosc2 available, I can check for the segfault with that version and otherwise report the problem upstream. |
|
Sounds good. The other thing missing is using the documented Schunk overhead instead of a guess (or manually creating the schunks in Julia if the C library may go over the proposed limit). |
This is a first stab at implementing a Blosc2 codec. I believe the implementation is correct. I am looking for feedback.