DROID processed dataset on ModelScope: most *.tar.zst are not valid Zstandard #4

@ThomasLewandovski

Hi authors, thanks a lot for releasing the code and dataset links — this is a really solid and practical piece of work.

I downloaded hutchinsonian/droid_processed from ModelScope, but most files under droid_processed_data_tar/ look incorrect.

No matter which method I use, almost all droid_processed_data_XXXX.tar.zst files fail with:
    zstd: unsupported format
In the rare case where a file is recognized as Zstandard, it can still be truncated:
    Read error: premature end
I tried both download methods provided in the README:
Git LFS: git lfs clone https://www.modelscope.cn/hutchinsonian/droid_processed.git data/droid/_modelscope_droid_processed
ModelScope SDK / snapshot_download:

    from modelscope.hub.snapshot_download import snapshot_download
    snapshot_download(
        repo_id="hutchinsonian/droid_processed",
        repo_type="dataset",
        revision="master",
        local_dir="data/droid/_modelscope_droid_processed"
    )

Same result for both.

I checked the first 4 bytes of all *.tar.zst files. Zstandard should start with magic 28 b5 2f fd, but in my case:
Total shards: 188
ZSTD magic matched: 1
Magic mismatch: 187
So most files are named .tar.zst but don’t seem to be Zstandard files at all.
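
For reference, a check of this kind can be sketched roughly as follows. This is a minimal illustration, not the exact script I ran: the helper names `has_zstd_magic` and `count_magic` are made up for this example, and the directory in the usage comment assumes the `local_dir` from the download above plus the `droid_processed_data_tar/` folder. It only inspects the first 4 bytes, so it cannot detect a file that starts correctly but is truncated.

```python
from pathlib import Path

# Zstandard frame magic number as it appears on disk: 28 b5 2f fd
ZSTD_MAGIC = bytes.fromhex("28b52ffd")


def has_zstd_magic(path):
    """Return True if the file starts with the Zstandard frame magic."""
    with open(path, "rb") as f:
        return f.read(4) == ZSTD_MAGIC


def count_magic(shard_dir):
    """Count *.tar.zst files directly under shard_dir by magic match / mismatch."""
    shards = sorted(Path(shard_dir).glob("*.tar.zst"))
    matched = sum(has_zstd_magic(p) for p in shards)
    return {
        "total": len(shards),
        "matched": matched,
        "mismatch": len(shards) - matched,
    }


# Example usage (path is an assumption based on the download commands above):
# print(count_magic("data/droid/_modelscope_droid_processed/droid_processed_data_tar"))
```

A file that fails this check may be worth opening in a text editor; if it turns out to be a small text file rather than binary data, that would suggest the download returned something other than the actual shard.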

Is the dataset currently incomplete/corrupted on ModelScope?
If yes, could you re-upload/fix the tar shards, or share a verified alternative download (or checksums) that you know works?
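
If checksums were published, each shard could be verified locally with something like the sketch below. It assumes SHA-256 as the hash (no checksum format has been announced for this dataset), and `sha256_of` is a hypothetical helper name used only for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large shards do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string could then be compared against whatever reference value the authors provide for the same file.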

Thanks again — I really appreciate the open-source release.
