-
Notifications
You must be signed in to change notification settings - Fork 100
Fix: zstd decompression is truncated to the first frame in the stream #400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Ok. This particular .zstd file has multiple frames in it. async-compression can only handle one frame, which is wrong. A stream can have as many frames as it wants. See this code in compression-codecs: impl Decode for ZstdDecoder {
fn decode(...) -> Result<bool, ...) {
let status = self
.decoder
.get_mut()
.run_on_buffers(input.unwritten(), output.unwritten_mut())?;
input.advance(status.bytes_read);
output.advance(status.bytes_written);
Ok(status.remaining == 0)
}
}See https://docs.rs/zstd/latest/zstd/stream/raw/struct.Status.html --
Do not take my word for this being correct, I do not know, but if you do - Ok(status.remaining == 0)
+ Ok(status.remaining == 0 && status.bytes_read == 0)Then this test passes. |
0718e9d to
e5dd0d5
Compare
This fails with 131072 bytes read, which is coincidentally zstd's max block size.
e5dd0d5 to
3b6506a
Compare
|
The failing test |
|
Thanks, maybe we need to properly support multi-frame, like how gzip does it (IIRC gzip has a reset implementation) |
|
zstd does have a async-compression already has a Our decoder assumes that you know about if multiple frames, and cal |
|
Hmmm. I guess this test does pass with However, the reason I went so deep on this was I REALLY expected ZstdDecoder to do what (echo hello | zstd; echo goodbye | zstd) | zstd -d
hello
goodbye
(echo hello | gzip; echo goodbye | gzip) | gzip -d
hello
goodbyeio::copy(&mut ZstdDecoder::new(stdin), &mut stdout).unwrap();
"hello"Obviously that will be a breaking change so probably requires a version bump, but it may also be a "fixing change" by making the code work like a lot of people already assumed it did. |
|
I think we'd need better documentation for that. And optionally, we may have a |
This fails with 131072 bytes read, which is coincidentally zstd's max block size. Bug discovered in facebook/buck2#1103.
Clearly poll_read is returning "ready" with 0 bytes read when it should still be pending. I have not done a bisect to find when this bug comes from, but you could do it with a script to apply this commit as a patch at each bisect eval point.