

@cormacrelf

This fails with 131072 bytes read, which is coincidentally zstd's max block size. Bug discovered in facebook/buck2#1103.

Clearly `poll_read` is returning "ready" with 0 bytes read when it should still be pending. I have not bisected to find when this bug was introduced, but you could do so with a script that applies this commit as a patch at each bisect step.
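A read that completes with 0 bytes is how a reader signals end of stream, so a decoder that does this too early silently truncates the copy instead of failing. The sketch below models that synchronously with `std::io::Read`, whose `Ok(0)` plays the role of `poll_read` returning ready with no bytes. `FirstFrameOnly` is a hypothetical stand-in for the buggy decoder, not the real type.

```rust
use std::io::{self, Read};

/// Hypothetical reader modelling the bug: it yields the bytes of the first
/// frame and then returns `Ok(0)` (end of stream), even though later frames
/// are still waiting in `unread_frames`.
struct FirstFrameOnly {
    first_frame: Vec<u8>,
    unread_frames: Vec<u8>,
    pos: usize,
}

impl Read for FirstFrameOnly {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let left = &self.first_frame[self.pos..];
        let n = left.len().min(buf.len());
        buf[..n].copy_from_slice(&left[..n]);
        self.pos += n;
        // Once the first frame is exhausted, n == 0. Callers such as
        // `io::copy` take Ok(0) to mean EOF and stop without any error.
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let mut reader = FirstFrameOnly {
        first_frame: b"hello\n".to_vec(),
        unread_frames: b"goodbye\n".to_vec(),
        pos: 0,
    };
    let mut out = Vec::new();
    let copied = io::copy(&mut reader, &mut out)?;
    println!(
        "copied {} bytes; {} bytes never read",
        copied,
        reader.unread_frames.len()
    );
    Ok(())
}
```

The copy "succeeds" with only the first frame's bytes, which is why this kind of bug shows up as truncated output rather than as an error.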

@cormacrelf
Author

Ok. This particular .zstd file has multiple frames in it, and async-compression can only handle one frame, which is wrong: a stream can contain as many frames as it wants.

See this code in compression-codecs:

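```rust
impl Decode for ZstdDecoder {
    fn decode(...) -> Result<bool, ...> {
        // Run zstd over whatever input and output space is currently available.
        let status = self
            .decoder
            .get_mut()
            .run_on_buffers(input.unwritten(), output.unwritten_mut())?;
        input.advance(status.bytes_read);
        output.advance(status.bytes_written);
        // `remaining == 0` means the end of a *frame* (see the Status docs
        // below); that is what this method reports as done.
        Ok(status.remaining == 0)
    }
}
```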

See https://docs.rs/zstd/latest/zstd/stream/raw/struct.Status.html:

> remaining: usize
>
> Number of bytes expected for next input.
>
> If remaining = 0, then we are at the end of a frame.
>
> If remaining > 0, then it’s just a hint for how much there is still to read.

Do not take my word that this is correct (I do not know), but if you make this change:

```diff
-        Ok(status.remaining == 0)
+        Ok(status.remaining == 0 && status.bytes_read == 0)
```

then this test passes.

@cormacrelf cormacrelf changed the title Failing test - zstd bufread decoding is truncated when using tokio::io::copy Failing test - zstd decompression is truncated to the first frame in the stream Oct 30, 2025
@cormacrelf cormacrelf changed the title Failing test - zstd decompression is truncated to the first frame in the stream Fix: zstd decompression is truncated to the first frame in the stream Oct 30, 2025
@cormacrelf
Author

The failing test `zstd::futures::bufread::decompress::trailer` also assumes that zstd is single-frame only. Disabling it only for zstd is proving tricky: the macros are pretty involved, so hopefully I can leave this to someone more familiar with the codebase. You could also add a `ZstdDecoder::new_single_frame` method to restore the old behaviour, but again, routing that through the test macros is tricky.

@NobodyXu
Collaborator

Thanks. Maybe we need to properly support multi-frame streams, like gzip does (IIRC gzip has a reset implementation).

@NobodyXu
Collaborator

NobodyXu commented Nov 2, 2025

zstd does have a reinit call, though it is a no-op; that may be intentional, since no reinitialization is required.

async-compression already has a reinit that calls it, though. Maybe you just need to call reinit and continue? @cormacrelf

Our decoder assumes that you know whether there are multiple frames and, if so, that you call reinit and then keep decoding.

https://github.com/Nullus157/async-compression/blob/main/crates/compression-codecs/src/zstd/decoder.rs#L65
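The "decode a frame, then reinit and keep going if more frames follow" behaviour can be modelled without zstd at all. The sketch below uses a hypothetical toy format in which each frame is one length byte followed by that many payload bytes; `Status`, `frame`, `decode_frame` and `decode_stream` are illustrative names, not async-compression or zstd APIs. `frame_done` plays the role of zstd's `remaining == 0`, and the `multiple_members` flag decides whether a frame boundary ends the whole stream.

```rust
/// Result of one decode call, loosely modelled on zstd's `Status`.
struct Status {
    /// Input bytes consumed by this call.
    bytes_read: usize,
    /// True when a frame boundary was reached (zstd reports `remaining == 0`).
    frame_done: bool,
}

/// Builds one frame in the toy format: `[len: u8][len payload bytes]`.
/// Payloads must be at most 255 bytes.
fn frame(payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= u8::MAX as usize);
    let mut f = vec![payload.len() as u8];
    f.extend_from_slice(payload);
    f
}

/// Decodes one complete frame from the front of `input` into `output`.
fn decode_frame(input: &[u8], output: &mut Vec<u8>) -> Status {
    match input.split_first() {
        Some((&len, rest)) if rest.len() >= len as usize => {
            output.extend_from_slice(&rest[..len as usize]);
            Status { bytes_read: 1 + len as usize, frame_done: true }
        }
        // Incomplete frame: consume nothing.
        _ => Status { bytes_read: 0, frame_done: false },
    }
}

/// Decodes a stream of concatenated frames.
fn decode_stream(mut input: &[u8], multiple_members: bool) -> Vec<u8> {
    let mut output = Vec::new();
    while !input.is_empty() {
        let status = decode_frame(input, &mut output);
        input = &input[status.bytes_read..];
        if !status.frame_done {
            break; // truncated input; a real decoder would return an error here
        }
        if !multiple_members {
            break; // single-member mode: end of frame is treated as end of stream
        }
        // Multi-member mode: "reinit" and decode the next frame. This toy
        // decoder keeps no state between frames, so there is nothing to reset.
    }
    output
}

fn main() {
    // Two concatenated frames, like `(echo hello | zstd; echo goodbye | zstd)`.
    let stream = [frame(b"hello\n"), frame(b"goodbye\n")].concat();
    let single = decode_stream(&stream, false);
    let multi = decode_stream(&stream, true);
    println!("single member:    {:?}", String::from_utf8_lossy(&single));
    println!("multiple members: {:?}", String::from_utf8_lossy(&multi));
}
```

With `multiple_members` set to false only `"hello\n"` comes back, which is the truncation reported at the top of this thread; with it set to true both frames are decoded, matching `zstd -d`.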

@cormacrelf
Author

Hmmm. This test does pass with `decoder.multiple_members(true);`, so I guess I was holding it wrong. It's great that async-compression handles this, and it seems to handle it quite well. Sorry for bugging you so much about it and claiming it doesn't work!

However, the reason I went so deep on this was that I REALLY expected `ZstdDecoder` to do what `zstd -d` does without further configuration, so much so that I spent many hours eliminating a dozen other ways this could have failed. I cannot recommend strongly enough that you make `multiple_members(true)` the default. People who want to decompress one frame at a time know what they are doing and will go looking for an API to do that. People who do not even know that zstd has such a thing as multiple frames will just assume it works like `zstd -d`. The same goes for gzip: `gzip -d` works the same way.

```
$ (echo hello | zstd; echo goodbye | zstd) | zstd -d
hello
goodbye
$ (echo hello | gzip; echo goodbye | gzip) | gzip -d
hello
goodbye
```

```rust
io::copy(&mut ZstdDecoder::new(stdin), &mut stdout).unwrap();
// output: "hello"
```

Obviously that would be a breaking change and so probably requires a version bump, but it may also be a "fixing change", making the code work the way a lot of people already assume it does.

@NobodyXu
Collaborator

NobodyXu commented Nov 3, 2025

I think we'd need better documentation for that.

And optionally, we could add a `new_with_multiple_members` method.
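A minimal sketch of how such a constructor could sit next to the existing setter. `Decoder` here is a hypothetical wrapper that does no decoding and is not async-compression's `ZstdDecoder`; only the `multiple_members(true)` call shape comes from this thread, and `is_multiple_members` / `into_inner` are added purely for illustration.

```rust
use std::io::{self, Read};

/// Hypothetical decoder wrapper used only to sketch the constructor API;
/// it does not decode anything and is not async-compression's `ZstdDecoder`.
struct Decoder<R> {
    reader: R,
    multiple_members: bool,
}

impl<R: Read> Decoder<R> {
    /// Keeps today's default: stop after the first frame.
    fn new(reader: R) -> Self {
        Self { reader, multiple_members: false }
    }

    /// Proposed convenience constructor: decode every frame until EOF,
    /// the way `zstd -d` and `gzip -d` do.
    fn new_with_multiple_members(reader: R) -> Self {
        let mut decoder = Self::new(reader);
        decoder.multiple_members(true);
        decoder
    }

    /// Same call shape as `decoder.multiple_members(true);` in this thread.
    fn multiple_members(&mut self, enabled: bool) {
        self.multiple_members = enabled;
    }

    /// Hypothetical getter, only so the setting can be inspected.
    fn is_multiple_members(&self) -> bool {
        self.multiple_members
    }

    /// Hypothetical accessor returning the wrapped reader.
    fn into_inner(self) -> R {
        self.reader
    }
}

fn main() {
    let single = Decoder::new(io::empty());
    let multi = Decoder::new_with_multiple_members(io::empty());
    println!("new: {}", single.is_multiple_members());
    println!("new_with_multiple_members: {}", multi.is_multiple_members());
    let _inner = multi.into_inner();
}
```

A separate constructor keeps existing behaviour intact while making the multi-frame option easy to discover; flipping the default inside `new` instead would be the breaking change discussed above.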

