[FEATURE]: Add Diagnostic Verification Reporting (HTML, Markdown, and detailed failure diagnostics) #87

@abhranshu

Description

Feature and its Use Cases

Currently, when a user runs verify_dataset.py, the verification report is limited to:

  • An ASCII table printed to the terminal showing PASS/FAIL per check
  • An optional raw JSON file via the --json flag

This works for quick checks, but it falls short in two critical areas:

Problem 1: No shareable, readable reports
The ASCII table only makes sense in a terminal. If a researcher wants to share verification results with a team, attach them to a paper, or post them in a GitHub issue/PR — there's no way to do it. The JSON output is machine-readable but not human-friendly.
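A minimal sketch of what the proposed Markdown output could look like. The `results` shape (a list of dicts with `check`, `passed`, `expected`, `actual` keys) is an assumption for illustration, not the actual internal structure of verify_dataset.py:

```python
# Hypothetical sketch: render verification results as a shareable
# Markdown report. The `results` shape is an assumption -- the real
# verify_dataset.py may represent checks differently.

def render_markdown_report(results):
    """Render a list of check results as a Markdown table."""
    lines = [
        "# Dataset Verification Report",
        "",
        "| Check | Status | Expected | Actual |",
        "|-------|--------|----------|--------|",
    ]
    for r in results:
        status = "✅ PASS" if r["passed"] else "❌ FAIL"
        lines.append(
            f"| `{r['check']}` | {status} | `{r['expected']}` | `{r['actual']}` |"
        )
    return "\n".join(lines) + "\n"

# Example data (hypothetical check names and truncated hashes)
results = [
    {"check": "raw_sha256", "passed": True,
     "expected": "ab12...", "actual": "ab12..."},
    {"check": "processed_sha256", "passed": False,
     "expected": "cd34...", "actual": "ef56..."},
]
print(render_markdown_report(results))
```

A Markdown table like this pastes cleanly into a GitHub issue or PR comment, and the same data could be fed through a template for the HTML variant.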

Problem 2: No diagnostic details when verification fails
When a check fails (e.g., processed_sha256 shows FAIL), the report gives only the expected vs. actual hash, with zero guidance on why it failed. The user is left guessing:

  • Did the environment change? Which packages differ?
  • Did the processed output change? Where exactly does it diverge?
  • Is it a Python version issue? A platform issue?
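One possible diagnostic for the first question is a package-version diff between the environment recorded at dataset-creation time and the current one. This is a sketch under assumptions: the `{name: version}` snapshot format and the example package pins are hypothetical, not the project's actual lockfile schema:

```python
# Hypothetical failure diagnostic: compare recorded vs. current package
# versions so a processed_sha256 mismatch comes with a likely cause.
# The snapshot format ({name: version}) is an assumption.

def diff_environment(recorded, current):
    """Return human-readable differences between two {name: version} maps."""
    problems = []
    for name, want in sorted(recorded.items()):
        have = current.get(name)
        if have is None:
            problems.append(f"missing package: {name} (recorded {want})")
        elif have != want:
            problems.append(f"version drift: {name} {want} -> {have}")
    for name in sorted(set(current) - set(recorded)):
        problems.append(f"extra package: {name} {current[name]}")
    return problems

# Example snapshots (hypothetical versions)
recorded = {"numpy": "1.26.4", "pandas": "2.2.1", "python": "3.11.8"}
current = {"numpy": "2.0.1", "pandas": "2.2.1"}
for line in diff_environment(recorded, current):
    print(line)
```

The same idea extends to platform and interpreter checks (e.g., comparing a recorded OS/Python version against `platform.system()` and `platform.python_version()`), so a FAIL row could be annotated with the most probable cause instead of bare hashes.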

Additional Context

No response

Code of Conduct

  • I have joined the Discord server and will post updates there
  • I have searched existing issues to avoid duplicates

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
