[Track] Additional output formats (ORC, Lance) #5

@mprammer

Add ORC and Lance as additional columnar outputs alongside the existing Parquet (canonical) and Vortex (optional) artifacts.

Per format

  • A new convert stage (or a single generalised convert stage that switches on format).
  • Extend sources.json with per-format convert.<fmt> and convert.<fmt>_skip_reason fields, mirroring the existing Vortex pair.
  • Update the validate_manifest invariants to cover the new fields.
  • Write outputs to outputs/v{n}/<slug>/<fmt>/<slug>.<ext>.
  • Regenerate docs/datasets.md and docs/snapshot.json.
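A minimal sketch of the output layout and the skip-reason invariant. It assumes each sources.json entry stores a nested "convert" object keyed by format. FORMAT_EXTENSIONS, output_path, convert_entry_errors and the "slug" field are hypothetical names, not existing project code, and the "exactly one of path or skip reason" rule is an assumed reading of "mirroring the Vortex pair".

```python
from pathlib import Path

# Hypothetical registry: output format -> file extension. Parquet and Vortex
# exist today; ORC and Lance are the additions tracked in this issue. The
# extensions are assumptions (the issue only writes <ext>).
FORMAT_EXTENSIONS = {
    "parquet": "parquet",
    "vortex": "vortex",
    "orc": "orc",
    "lance": "lance",  # a Lance output is a dataset directory, not one file
}


def output_path(version: int, slug: str, fmt: str, root: Path = Path("outputs")) -> Path:
    """Return outputs/v{n}/<slug>/<fmt>/<slug>.<ext> for one artifact."""
    if fmt not in FORMAT_EXTENSIONS:
        raise ValueError(f"unknown output format: {fmt!r}")
    return root / f"v{version}" / slug / fmt / f"{slug}.{FORMAT_EXTENSIONS[fmt]}"


def convert_entry_errors(source: dict, fmt: str) -> list[str]:
    """Hypothetical validate_manifest check for one source and one format.

    Assumes exactly one of convert.<fmt> (the artifact path) and
    convert.<fmt>_skip_reason is set, as assumed for the Vortex pair.
    """
    convert = source.get("convert", {})
    has_path = bool(convert.get(fmt))
    has_skip = bool(convert.get(f"{fmt}_skip_reason"))
    if has_path == has_skip:  # both set, or neither set
        slug = source.get("slug", "?")
        return [f"{slug}: set exactly one of convert.{fmt} or convert.{fmt}_skip_reason"]
    return []
```

Returning a list of messages (rather than raising on the first problem) lets a validator report every broken entry in one pass.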

Formats in scope

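  • ORC: Apache ORC. Recent PyArrow releases can also write ORC (pyarrow.orc.write_table), so an external writer (e.g. pyorc, or orc-tools) should only be needed if the pinned PyArrow version or platform build lacks ORC support.
  • Lance: Lance v2 via pylance.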
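A hedged sketch of the two writers, assuming the input is a pyarrow.Table, that the installed PyArrow ships pyarrow.orc.write_table, and that the pylance package (which provides the lance module) defaults to the Lance v2 file format; check both against the pinned versions. write_orc, write_lance, WRITERS and convert are hypothetical names, and the compression and write-mode choices are illustrative. Third-party imports live inside each writer so the dispatch table still loads where one format's library is missing.

```python
from pathlib import Path


def write_orc(table, path: Path) -> None:
    """Write a pyarrow.Table as a single ORC file via pyarrow.orc."""
    from pyarrow import orc  # lazy import; needs a PyArrow build with ORC
    path.parent.mkdir(parents=True, exist_ok=True)
    orc.write_table(table, str(path), compression="zstd")


def write_lance(table, path: Path) -> None:
    """Write a pyarrow.Table as a Lance dataset (a directory) via pylance."""
    import lance  # lazy import; provided by the pylance package
    path.parent.mkdir(parents=True, exist_ok=True)
    # "overwrite" lets a rerun of the stage replace the previous output.
    lance.write_dataset(table, str(path), mode="overwrite")


# Hypothetical dispatch table for a generalised convert stage.
WRITERS = {"orc": write_orc, "lance": write_lance}


def convert(table, fmt: str, path: Path) -> None:
    """Dispatch to the writer for fmt; unknown formats raise ValueError."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported output format: {fmt!r}") from None
    writer(table, path)


# Example (requires pyarrow and pylance):
#   import pyarrow.parquet as pq
#   table = pq.read_table("outputs/v1/iris/parquet/iris.parquet")
#   convert(table, "orc", Path("outputs/v1/iris/orc/iris.orc"))
```

Reading from the canonical Parquet artifact keeps every extra format derived from one source of truth. Because a Lance output is a directory, <slug>.lance will not be a single file like the Parquet, Vortex and ORC artifacts.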

Metadata

Labels

  • enhancement: New feature or request
  • tracking-issue: Shared implementation context for work likely to span multiple PRs.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests