Benchmark NASA POWER Zarr backend options #79

Merged
peetmate merged 2 commits into main from codex/issue-54-nasa-power-zarr on Jun 23, 2026

@peetmate

Contributor

Summary

Add a reproducible benchmark harness and decision note for issue #54 to evaluate NASA POWER public Zarr stores against the current toolkit NASA POWER point-API backend.

Problem / Motivation

Issue #54 asked whether NASA POWER public Zarr stores could improve multi-site / long-timeseries runtime enough to justify backend changes.

This needed evidence first, not a speculative rewrite.

Scope

  • In scope:
    • add reproducible benchmark harness for API vs temporal Zarr vs spatial Zarr
    • record benchmark findings in issue note
    • commit compact summary CSV artifacts used in the recommendation
  • Out of scope:
    • changing production nasa_power backend
    • altering cache layout or toolkit source contracts
    • adding new runtime dependencies to stable package install

Implementation Notes

  • Added analysis/run_nasa_power_zarr_benchmark.py
    • benchmarks current API backend (api_point)
    • benchmarks temporal public Zarr (zarr_temporal)
    • benchmarks spatial public Zarr (zarr_spatial)
    • checks shared-variable value agreement vs API
    • records coverage gaps and timing summaries
  • Updated analysis/issues/issue_nasa_power_zarr_followup.md
    • converted earlier idea note into evidence-backed decision memo
    • documented chunk geometry, runtime results, coverage gap, and recommendation
  • Added summary CSV artifacts:
    • analysis/nasa_power_zarr_benchmark_summary_api_temporal.csv
    • analysis/nasa_power_zarr_benchmark_summary_api_temporal_sites10.csv
    • analysis/nasa_power_zarr_benchmark_summary_spatial_short10.csv
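
The timing side of a harness like this can be sketched roughly as follows. This is an illustrative outline only, not the code in analysis/run_nasa_power_zarr_benchmark.py: the function name `time_backends`, the row fields, and the stand-in workloads are assumptions. Only the backend labels (api_point, zarr_temporal, zarr_spatial) come from the notes above.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_backends(
    backends: Dict[str, Callable[[], object]],
    repeats: int = 3,
) -> List[dict]:
    """Run each backend callable `repeats` times and summarize wall-clock seconds.

    Hypothetical helper: the real harness may time and aggregate differently.
    """
    rows = []
    for label, fetch in backends.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch()  # one full fetch for the workload under test
            samples.append(time.perf_counter() - start)
        rows.append(
            {
                "backend": label,
                "repeats": repeats,
                "median_s": statistics.median(samples),
                "min_s": min(samples),
                "max_s": max(samples),
            }
        )
    return rows


if __name__ == "__main__":
    # Stand-in workloads; a real run would call the API or read a Zarr store.
    demo = {
        "api_point": lambda: sum(range(1_000)),
        "zarr_temporal": lambda: sum(range(2_000)),
        "zarr_spatial": lambda: sum(range(3_000)),
    }
    for row in time_backends(demo):
        print(row)
```

Reporting the median alongside min and max keeps a single slow network call from dominating the summary, which matters when every backend depends on a live remote service.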

Data / Access / Runtime Notes

  • Harness uses public NASA POWER S3 Zarr stores plus live NASA POWER API calls.
  • Harness currently requires local analysis extras (zarr, s3fs) to rerun.
  • Those extras are deliberately not promoted into stable package runtime by this PR because recommendation is still "no backend switch".
  • Raw JSON benchmark payloads were kept local and not committed; compact CSV summaries + note are committed.
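
Because the analysis extras are not part of the stable install, a rerun could fail early with a clear message instead of an import traceback partway through. A minimal sketch of such a pre-flight check, assuming a hypothetical helper named `missing_extras` (not something this PR is known to contain):

```python
import importlib.util
from typing import Iterable, List


def missing_extras(modules: Iterable[str] = ("zarr", "s3fs")) -> List[str]:
    """Return the top-level modules from `modules` that cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_extras()
    if missing:
        raise SystemExit(
            "Benchmark needs analysis extras that are not installed: "
            + ", ".join(missing)
        )
    print("Analysis extras available.")
```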

Testing

  • Commands run:
    • .venv/bin/python -m py_compile analysis/run_nasa_power_zarr_benchmark.py
    • .venv/bin/python -m pytest -q tests/test_packaging_metadata.py
    • live benchmark runs:
      • API vs temporal, 1/5 sites, short10/one_year/ten_year
      • API vs temporal, 10 sites, one_year/ten_year
      • spatial, 1/5 sites, short10
  • Key findings:
    • public Zarr stores matched API values on shared variables to within float-rounding differences (~3e-05 max abs diff in sample)
    • public Zarr stores lacked toolkit solar_radiation variable (ALLSKY_SFC_SW_DWN)
    • the runtime gap between temporal Zarr and the current API backend narrowed as workload grew, but temporal Zarr stayed slower at every tested size, up to 10 sites / 10 years
    • spatial Zarr was a poor fit for point time-series workloads
  • Not tested:
    • production backend swap, because evidence did not support making one
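
The first two findings (value agreement and the coverage gap) amount to a comparison that can be sketched as below. This is an assumed shape, not the harness's actual code: `compare_backends` and the dict-of-lists layout are hypothetical, and `T2M` is used only as an illustrative shared variable. `ALLSKY_SFC_SW_DWN` is the variable named in the findings above.

```python
import math
from typing import Dict, List, Sequence


def compare_backends(
    api: Dict[str, Sequence[float]],
    zarr: Dict[str, Sequence[float]],
    required: Sequence[str],
) -> dict:
    """Compare per-variable series from two backends.

    Returns the max absolute difference for each shared variable and the
    list of required variables that the Zarr side does not provide.
    """
    max_abs_diff: Dict[str, float] = {}
    for name in sorted(set(api) & set(zarr)):
        if len(api[name]) != len(zarr[name]):
            raise ValueError(f"length mismatch for {name}")
        # Skip positions where either backend reports a missing value (NaN).
        diffs: List[float] = [
            abs(a - z)
            for a, z in zip(api[name], zarr[name])
            if not (math.isnan(a) or math.isnan(z))
        ]
        max_abs_diff[name] = max(diffs) if diffs else float("nan")
    missing = [name for name in required if name not in zarr]
    return {"max_abs_diff": max_abs_diff, "missing_in_zarr": missing}


if __name__ == "__main__":
    # Made-up numbers purely to exercise the function.
    api_values = {"T2M": [10.0, 11.5], "ALLSKY_SFC_SW_DWN": [5.0, 6.0]}
    zarr_values = {"T2M": [10.25, 11.5]}
    print(compare_backends(api_values, zarr_values, required=list(api_values)))
```

Separating "values differ" from "variable absent" keeps the two findings distinct: a rounding-level difference is tolerable, whereas a missing required variable rules out a drop-in backend switch on its own.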

Reviewer Guidance

Review focus:

  • analysis/run_nasa_power_zarr_benchmark.py
  • analysis/issues/issue_nasa_power_zarr_followup.md
  • summary CSVs for consistency with memo conclusions

Main question:

  • does evidence support current recommendation to keep API backend as default and avoid premature Zarr migration?

Related Issues

Checklist

  • I reviewed my own diff for unrelated changes.
  • I used a dedicated branch for this work.
  • I added or updated tests where needed.
  • I documented any auth, data, cache, or runtime implications.
  • I included reviewer test steps or validation notes.
  • I linked the relevant issue(s).

@peetmate peetmate merged commit b376f4d into main Jun 23, 2026
2 checks passed
@peetmate peetmate deleted the codex/issue-54-nasa-power-zarr branch June 23, 2026 22:04

Development

Successfully merging this pull request may close these issues.

Evaluate NASA POWER Zarr backend for multi-site long-timeseries fetches

1 participant