Fix: zstd decompression is truncated to the first frame in the stream #400

cormacrelf · 2025-10-30T00:11:39Z

This fails with 131072 bytes read, which is coincidentally zstd's max block size. Bug discovered in facebook/buck2#1103.

Clearly poll_read is returning "ready" with 0 bytes read when it should still be pending. I have not bisected to find where this bug was introduced, but you could do so with a script that applies this commit as a patch at each bisect step.

cormacrelf · 2025-10-30T01:04:36Z

Ok. This particular .zstd file has multiple frames in it. async-compression can only handle one frame, which is wrong. A stream can have as many frames as it wants.

See this code in compression-codecs:

impl Decode for ZstdDecoder {
    fn decode(...) -> Result<bool, ...> {
        let status = self
            .decoder
            .get_mut()
            .run_on_buffers(input.unwritten(), output.unwritten_mut())?;
        input.advance(status.bytes_read);
        output.advance(status.bytes_written);
        Ok(status.remaining == 0)
    }
}

See https://docs.rs/zstd/latest/zstd/stream/raw/struct.Status.html --

remaining: usize

Number of bytes expected for next input.

If remaining = 0, then we are at the end of a frame.

If remaining > 0, then it’s just a hint for how much there is still to read.

Don't take my word that this is correct (I don't know), but if you make this change:

-        Ok(status.remaining == 0)
+        Ok(status.remaining == 0 && status.bytes_read == 0)

Then this test passes.
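To see why the proposed change keeps decoding going, it helps to evaluate both checks on the same values. This is a std-only sketch: the Status struct below is a stand-in that only mirrors the three public fields of zstd::stream::raw::Status, and done_current / done_proposed are hypothetical helper names.

```rust
/// Std-only stand-in mirroring the public fields of `zstd::stream::raw::Status`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct Status {
    remaining: usize,
    bytes_read: usize,
    bytes_written: usize,
}

/// Current check: "done" as soon as a frame ends.
fn done_current(s: Status) -> bool {
    s.remaining == 0
}

/// Proposed check: "done" only if a frame ended AND this call consumed no
/// input, so a call that finishes one frame while reading input keeps the
/// caller looping into the next frame.
fn done_proposed(s: Status) -> bool {
    s.remaining == 0 && s.bytes_read == 0
}
```

A call that finishes a frame while consuming input (remaining == 0, bytes_read > 0) counts as "done" under the current check but not under the proposed one, so decode gets called again and the next frame is decoded. The flip side is that a decoder which no longer stops at a frame boundary will also try to consume whatever bytes follow the frame, which is presumably why the trailer test fails with this change.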


cormacrelf · 2025-10-30T07:49:46Z

The failing test zstd::futures::bufread::decompress::trailer is one that also assumes zstd is single-frame only. Disabling it only for zstd is proving tricky: the macros are pretty involved, so hopefully I can leave this to someone more familiar with the codebase. You could also add a ZstdDecoder::new_single_frame method to restore the old behaviour, but again, routing that through the test macros is tricky.

NobodyXu · 2025-10-30T10:53:59Z

Thanks. Maybe we need to properly support multi-frame streams, the way gzip does (IIRC gzip has a reset implementation).

NobodyXu · 2025-11-02T13:50:04Z

zstd does have a reinit call, though it is a no-op; that might be intentional, since no reinitialization is required.

async-compression already has a reinit that calls it, though. Maybe you just need to call reinit and continue? @cormacrelf

Our decoder assumes that you know whether there are multiple frames and, if so, that you call reinit and keep decoding.

https://github.com/Nullus157/async-compression/blob/main/crates/compression-codecs/src/zstd/decoder.rs#L65
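The reinit-and-continue approach can be illustrated without zstd at all. This is a minimal std-only sketch under stated assumptions: it uses a made-up frame format (one length byte, then that many payload bytes), and ToyDecoder and decode_stream are hypothetical names. The control flow is only loosely modeled on the behavior described in this thread, not copied from async-compression: decode reports the end of a frame, and the driver either stops there (single-frame behavior) or calls reinit and keeps going while input remains.

```rust
/// Hypothetical toy codec, NOT zstd: each frame is one length byte
/// followed by that many payload bytes.
struct ToyDecoder {
    /// Payload bytes still expected in the current frame; `None` until the
    /// length byte has been read.
    remaining: Option<usize>,
}

impl ToyDecoder {
    fn new() -> Self {
        ToyDecoder { remaining: None }
    }

    /// Shaped like `Decode::decode`: consume input, append output, and return
    /// `(bytes_read, frame_finished)`, where `frame_finished` plays the role
    /// of `status.remaining == 0`.
    fn decode(&mut self, input: &[u8], output: &mut Vec<u8>) -> (usize, bool) {
        let mut read = 0;
        if self.remaining.is_none() {
            match input.first() {
                Some(&len) => {
                    self.remaining = Some(len as usize);
                    read = 1;
                }
                None => return (0, false), // no header yet: need more input
            }
        }
        let want = self.remaining.unwrap_or(0);
        let take = want.min(input.len() - read);
        output.extend_from_slice(&input[read..read + take]);
        read += take;
        self.remaining = Some(want - take);
        (read, take == want)
    }

    /// Forget the finished frame so the next call parses a fresh header.
    fn reinit(&mut self) {
        self.remaining = None;
    }
}

/// Decode `data`, returning (decoded bytes, input bytes consumed).
/// With `multiple_members == false`, stop after the first frame; otherwise
/// call `reinit` at each frame boundary and continue while input remains.
fn decode_stream(data: &[u8], multiple_members: bool) -> (Vec<u8>, usize) {
    let mut dec = ToyDecoder::new();
    let mut out = Vec::new();
    let mut pos = 0;
    loop {
        let (read, done) = dec.decode(&data[pos..], &mut out);
        pos += read;
        if done {
            if multiple_members && pos < data.len() {
                dec.reinit(); // frame boundary: start parsing the next frame
                continue;
            }
            return (out, pos);
        }
        if read == 0 {
            // Input exhausted mid-frame; a real decoder would report an error.
            return (out, pos);
        }
    }
}
```

With multiple_members off, a "hello" frame followed by a "goodbye" frame yields only hello, which is the truncation reported above; with it on, both frames are decoded. In multi-frame mode anything after the last frame gets parsed as another frame header, so single-frame mode is still the right tool when a stream deliberately carries a trailer.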

cormacrelf · 2025-11-03T01:48:41Z

Hmmm. This test does pass with decoder.multiple_members(true). I guess I was holding it wrong. It's great that async-compression handles this, and seemingly quite well. Sorry for bugging you so much about it and claiming it doesn't work!

However, the reason I went so deep on this is that I REALLY expected ZstdDecoder to do what zstd -d does without further configuration, so much so that I spent many hours eliminating a dozen other ways this could have failed. I cannot recommend strenuously enough that you make multiple_members(true) the default. People who want to decompress one frame at a time know what they are doing and will go looking for an API to do that. People who don't even know that zstd has such a thing as multiple frames will just assume it works like zstd -d. The same goes for gzip: gzip -d behaves the same way.

(echo hello | zstd; echo goodbye | zstd) | zstd -d
hello
goodbye
(echo hello | gzip; echo goodbye | gzip) | gzip -d
hello
goodbye

io::copy(&mut ZstdDecoder::new(stdin), &mut stdout).unwrap();
// prints only "hello"

Obviously that would be a breaking change, so it probably requires a version bump, but it may also be a "fixing change", making the code work the way a lot of people already assume it does.

NobodyXu · 2025-11-03T07:09:25Z

I think we'd need better documentation for that.

And optionally, we could add a new_with_multiple_members method.
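If the default stays single-frame, a new_with_multiple_members constructor could be a thin wrapper over the existing setter. The sketch below is hypothetical: Decoder is a stand-in type that does no decoding and is not async-compression's ZstdDecoder; only the setter name multiple_members(bool) comes from this thread.

```rust
/// Hypothetical decoder wrapper used only to illustrate the constructor
/// proposal; it does not decode anything and is not async-compression's type.
pub struct Decoder<R> {
    reader: R,
    multiple_members: bool,
}

impl<R> Decoder<R> {
    /// Current default: stop after the first frame.
    pub fn new(reader: R) -> Self {
        Decoder { reader, multiple_members: false }
    }

    /// Proposed convenience constructor: keep decoding across frame
    /// boundaries, matching what `zstd -d` and `gzip -d` do.
    pub fn new_with_multiple_members(reader: R) -> Self {
        let mut decoder = Self::new(reader);
        decoder.multiple_members(true);
        decoder
    }

    /// Setter in the style of the existing `multiple_members(true)` call.
    pub fn multiple_members(&mut self, enabled: bool) {
        self.multiple_members = enabled;
    }

    pub fn is_multiple_members(&self) -> bool {
        self.multiple_members
    }

    pub fn get_ref(&self) -> &R {
        &self.reader
    }
}
```

Making multi-frame decoding the default instead, as suggested above, would amount to flipping the false in new, which is why it would be a breaking change.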

cormacrelf changed the title ~~Failing test - zstd bufread decoding is truncated when using tokio::io::copy~~ Failing test - zstd decompression is truncated to the first frame in the stream Oct 30, 2025

cormacrelf force-pushed the tokio-io-copy branch from 0718e9d to e5dd0d5 October 30, 2025 07:13

cormacrelf added 2 commits October 30, 2025 18:13

Failing test - cannot decode multi-frame zstd streams

f77e69a

This fails with 131072 bytes read, which is coincidentally zstd's max block size.

zstd: fix multi-frame stream decoding

3b6506a

cormacrelf force-pushed the tokio-io-copy branch from e5dd0d5 to 3b6506a October 30, 2025 07:14

cormacrelf changed the title ~~Failing test - zstd decompression is truncated to the first frame in the stream~~ Fix: zstd decompression is truncated to the first frame in the stream Oct 30, 2025
