Library overhaul #3

Merged
JavierLopatin merged 24 commits into master from library-overhaul
May 27, 2026

@JavierLopatin

Owner

No description provided.

- Add phenopy/__init__.py to register the `pheno` xarray accessor and
  expose the public API (was missing -> package was not importable)
- Fix broken relative imports (`from utils` -> `from .utils`)
- Fix latent bugs: raise('str') -> ValueError (x3), bare except,
  missing functools.reduce, scipy trapz -> trapezoid (trapz was removed
  in SciPy 1.14), guard optional KDEpy import, repair computeChunkSize
- Remove dead/unused imports across modules; drop phantom odc.ui dep;
  make folium/pyproj lazy (the [plot] extra)
- Add pyproject.toml (hatchling) with optional extras
  ([plot]/[dask]/[fit]/[kde]/[fast]/[test]/[docs])
- Add environment-dev.yml (conda dev env), .gitignore, and a pytest
  smoke test; remove the root __init__.py footgun

Verified end-to-end (import + PhenoShape->PhenoLSP + pytest) on both the
legacy stack and a modern one (numpy 2.4 / scipy 1.17 / xarray 2026).
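
Two of the fixes listed above (replacing `raise('str')` with a real exception, and guarding an optional import) follow a common pattern. The snippet below is only a minimal sketch of that pattern: `check_method` and `HAS_KDEPY` are hypothetical names used for illustration, not the library's code.

```python
# Guard an optional dependency so the module still imports without it.
try:
    import KDEpy  # optional; only the KDE code path needs it
    HAS_KDEPY = True
except ImportError:
    KDEpy = None
    HAS_KDEPY = False


def check_method(method):
    """Validate a method name (hypothetical helper, for illustration only)."""
    allowed = ("linear", "KDE")
    if method not in allowed:
        # `raise("bad method")` would itself fail with a TypeError, because
        # only BaseException subclasses can be raised; use ValueError instead.
        raise ValueError(f"unknown method {method!r}; expected one of {allowed}")
    if method == "KDE" and not HAS_KDEPY:
        raise ImportError("method='KDE' needs the optional KDEpy package")
    return method
```

With this shape, a missing optional package only surfaces when the caller actually asks for the code path that needs it.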
- Add phenopy/io.py with load_sample() for the bundled example datasets
- Add golden regression tests freezing the 16 LSP metrics on the SIF and
  ndvi samples (a safety net before the Phase 2 composable refactor)
- Make dask a core dependency (PhenoShape/PhenoLSP use chunk + map_blocks)
- Add type hints to the public API and a py.typed marker
- Lint and format the codebase with ruff (config in pyproject); exclude the
  user-maintained example notebook from tooling
- Add GitHub Actions CI: lint + test matrix (py3.10-3.12 on Linux/macOS)
  and a build + twine-check job
- Add MkDocs (Material + mkdocstrings) documentation scaffolding
- Rewrite the README (install, quickstart, features; fix the broken logo
  path and the "Python < 3.6" typo)

Verified: ruff clean, pytest green, wheel builds and passes twine check,
docs build. The wheel bundles only the small SIF/ndvi samples (kndvi and
figures are excluded).
- Add phenopy/reconstruction.py with a RECONSTRUCTORS registry and a
  get_reconstructor() dispatcher (unknown names fall back to scipy
  interp1d kinds, preserving the historical `interpolType` behavior)
- Reimplement Wave-1 reconstruction methods from primary literature:
  Savitzky-Golay, Whittaker (via whittaker-eilers), double-logistic
  (Beck 2006, Elmore 2012) and asymmetric Gaussian; the parametric fits
  fall back to NaN per pixel on non-convergence so they map safely
- Refactor utils._getPheno to dispatch through the registry; the default
  "linear" path stays exactly np.interp (golden tests unchanged), and an
  optional recon_params is threaded through the pipeline
- Move _KDE into reconstruction.py
- Add unit tests (recover a known double-logistic, SG denoises, fallback)
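
The registry-plus-dispatcher design described for `get_reconstructor()` can be sketched in a few lines. This is an illustrative reduction, not the module itself: the `register` decorator and `make_fallback` are assumptions, and the fallback here is a stub where the commit says the real code delegates to scipy interp1d kinds.

```python
RECONSTRUCTORS = {}


def register(name):
    """Decorator that adds a reconstruction function to the registry."""
    def decorator(func):
        RECONSTRUCTORS[name] = func
        return func
    return decorator


@register("linear")
def linear(x, y, xnew):
    """Piecewise-linear interpolation of (x, y) at xnew; x strictly increasing."""
    out = []
    for xq in xnew:
        if xq <= x[0]:
            out.append(y[0])
        elif xq >= x[-1]:
            out.append(y[-1])
        else:
            # index of the first knot strictly to the right of xq
            j = next(i for i, xi in enumerate(x) if xi > xq)
            t = (xq - x[j - 1]) / (x[j] - x[j - 1])
            out.append(y[j - 1] + t * (y[j] - y[j - 1]))
    return out


def make_fallback(kind):
    """Stub fallback (assumption): stands in for delegating to interp1d."""
    def fallback(x, y, xnew):
        raise NotImplementedError(f"would delegate to interp1d(kind={kind!r})")
    fallback.kind = kind
    return fallback


def get_reconstructor(name):
    """Return the registered function, or a fallback built from the name."""
    if name in RECONSTRUCTORS:
        return RECONSTRUCTORS[name]
    return make_fallback(name)
```

Keeping the fallback inside the dispatcher is what lets old `interpolType` strings keep working while new methods are added by registration alone.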
- Add phenopy/extraction.py with an EXTRACTORS registry and get_extractor():
  seasonal_median (the historical phentype=1), trs (relative-amplitude
  threshold), der (derivative extrema) and curvature (inflection)
- Refactor _getLSPmetrics2 to delegate SOS/EOS to the selected extractor;
  the default reproduces the median method exactly (golden unchanged)
- Expose `extraction`/`extract_params` on PhenoLSP (threaded via _parseLSP)
- Add unit tests (SOS<=POS<=EOS per method, TRS threshold, PhenoLSP runs
  with every extractor on the sample data)
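
To make the `trs` (relative-amplitude threshold) idea concrete, here is a small sketch under stated assumptions: the function name, the default threshold of 0.5, and the choice to pick sampled DOYs without interpolating between them are all illustrative, and the library's extractor may differ in each.

```python
def trs_sos_eos(doy, values, threshold=0.5):
    """Return (SOS, POS, EOS) from a relative-amplitude threshold.

    SOS is the first DOY at or before the peak where the curve reaches
    base + threshold * amplitude; EOS is the last such DOY at or after it.
    """
    base, peak = min(values), max(values)
    level = base + threshold * (peak - base)
    i_peak = values.index(peak)
    # indices of all samples at or above the threshold level
    above = [i for i, v in enumerate(values) if v >= level]
    sos = doy[min(i for i in above if i <= i_peak)]
    eos = doy[max(i for i in above if i >= i_peak)]
    return sos, doy[i_peak], eos
```

By construction the result satisfies SOS <= POS <= EOS, which is the invariant the unit tests above check for every method.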
- Thread recon_params through PhenoShape, and recon_params / extraction /
  extract_params through get_timeseries_metrics, so the interannual
  moving-window orchestrator can drive any reconstruction x extraction
  combination (e.g. dlog_beck x moving-window x trs)
- Add phenopy/trends.py: per-pixel Theil-Sen slope + Mann-Kendall p-value
  over a metric time series (apply_ufunc, dask-aware); Mann-Kendall is
  implemented inline so trends need only numpy + scipy
- Export trend, list_reconstructors, list_extractors from the package
- Tests: trend sign/significance/edge cases, and a get_timeseries_metrics
  smoke test confirming the extraction axis is threaded through
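
For readers unfamiliar with the two statistics named above, the following is a plain-Python sketch of a Theil-Sen slope and a Mann-Kendall p-value for a single series. It assumes evenly spaced time steps, at least two observations, and no tie correction in the variance; the function names are illustrative and the library's inline implementation may handle these cases differently.

```python
import math


def theil_sen_slope(y):
    """Median of all pairwise slopes (y[j] - y[i]) / (j - i), unit time step."""
    slopes = sorted(
        (y[j] - y[i]) / (j - i)
        for i in range(len(y))
        for j in range(i + 1, len(y))
    )
    mid = len(slopes) // 2
    if len(slopes) % 2:
        return slopes[mid]
    return 0.5 * (slopes[mid - 1] + slopes[mid])


def mann_kendall_p(y):
    """Two-sided Mann-Kendall p-value (normal approximation, no tie correction)."""
    n = len(y)
    # S counts concordant minus discordant pairs
    s = sum(
        (y[j] > y[i]) - (y[j] < y[i])
        for i in range(n)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Both functions need only the standard library, which mirrors the point made above about keeping the trend code free of extra dependencies.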
- Replace the map_blocks + template + computePheno-state machinery in
  PhenoShape/PhenoLSP with xarray.apply_ufunc(vectorize=True,
  dask="parallelized"). PhenoLSP/RMSE now read xnew from the 'doy'
  dimension instead of fragile accessor state
- Add a `chunks` argument to PhenoShape for out-of-core / chunked
  processing (the time axis is kept whole); remove the now-dead
  _getPheno2D / _parseLSP / _assemble helpers
- Verified numerically identical to the previous implementation (golden
  tests unchanged) and that the chunked Dask path matches eager exactly
- Add RMSE + Dask-equivalence tests, a benchmark (benchmarks/bench.py),
  and a performance docs page (chunking, schedulers, the GIL caveat)

Note: raw CPU speed-up for the pure-Python kernels needs a GIL-releasing
Numba kernel (the planned [fast] extra); today the Dask path's win is
memory / out-of-core scalability, not wall-clock on in-memory rasters.
- Add phenopy/_numba.py with an @njit(parallel=True) kernel that reconstructs
  a (time, y, x) cube via linear interpolation + moving average in compiled,
  GIL-released code (with a pure-Python no-op fallback when numba is absent)
- PhenoShape auto-uses it when eligible (linear method, in-memory, no NaNs,
  no custom params). The DOY sort index is computed in NumPy and passed in so
  the tie-ordering of duplicate DOYs matches the pure-Python path exactly
  (important for multi-year pooling, e.g. the SIF sample)
- ~11x faster linear PhenoShape on an 8500-pixel x 920-step raster, with
  identical results: the golden tests pass with the numba path active, and an
  explicit numba-vs-pure-Python equivalence test is added
- Update benchmarks/bench.py and the performance docs
- Add phenopy/anomaly.py with anomaly(): per-pixel deviation from the
  climatological PhenoShape, a standardised z-score, and an empirical RFD
  percentile (Reference Frequency Distribution, npphen-style). Values near 0
  are extreme negative anomalies, near 1 extreme positive, ~0.5 typical
- KDE-capable climatology (interpolType="KDE" gives the fully non-parametric
  reference); works with any reconstruction method otherwise
- Export phenopy.anomaly; add tests (detects an injected dry year) and wire
  trends + anomalies into the docs (API + quickstart)
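
The three anomaly outputs described above can be illustrated for a single observation against a reference sample. This is a sketch only: `anomaly_stats` is a hypothetical name, and the percentile here is a plain empirical rank, whereas the library's RFD estimate (especially with the KDE climatology) may be computed differently.

```python
import statistics


def anomaly_stats(observed, reference):
    """Deviation, z-score and empirical percentile of one observation.

    `reference` holds the values seen at the same day of year in other years.
    """
    mean = statistics.fmean(reference)
    std = statistics.stdev(reference)
    deviation = observed - mean
    zscore = deviation / std if std > 0 else float("nan")
    # Empirical percentile: share of reference values at or below the observation.
    percentile = sum(r <= observed for r in reference) / len(reference)
    return deviation, zscore, percentile
```

An observation below every reference value gets percentile 0 (extreme negative anomaly) and one above every reference value gets 1, matching the reading of the scale given above.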
- Add phenopy/season.py with n_seasons(phenoshape): per-pixel count of growing
  cycles using scipy.signal.find_peaks with an amplitude-relative prominence
  threshold, so noise isn't counted as a season. Handles the doy dim/coord like
  get_curvature and is dask-aware
- Export phenopy.n_seasons; add tests (bimodal=2, unimodal=1, flat=0, and the
  time-dim+doy-coord input) and wire it into the docs (API + quickstart)
- Add phenopy/uncertainty.py with uncertainty(): a per-pixel bootstrap that
  resamples the observations with replacement, recomputes the 16 LSP metrics
  for each replicate, and returns their standard deviation (a non-parametric
  estimate of each metric's sampling uncertainty)
- Centralise the 16-band order as utils.LSP_BANDS (single source of truth; the
  Pheno accessor now references it) -- golden output is unchanged
- Export phenopy.uncertainty; add a test and wire it into the docs
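
The bootstrap described for `uncertainty()` can be sketched for one pixel and one metric. The names `bootstrap_std` and `peak_value`, the replicate count, and the seeding are assumptions made for this example; the toy metric stands in for one of the LSP metrics.

```python
import random
import statistics


def bootstrap_std(doy, values, metric, n_boot=200, seed=0):
    """Standard deviation of `metric` over bootstrap resamples of the observations."""
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_boot):
        # draw n indices with replacement, keeping each (doy, value) pair together
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(metric([doy[i] for i in idx], [values[i] for i in idx]))
    return statistics.stdev(replicates)


def peak_value(doy, values):
    """Toy metric: the maximum observed value (stand-in for one LSP metric)."""
    return max(values)
```

Resampling observation pairs rather than values alone keeps each value attached to its day of year, so any metric that depends on timing sees a consistent series in every replicate.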
- Add examples/tutorial.ipynb: an English, end-to-end tutorial exercising every
  public function (load_sample/list_*; all 8 reconstruction methods; all 4
  extraction methods; PhenoShape/PhenoLSP/RMSE; get_curvature/
  classify_vector_numeric; n_seasons; reorder_southern_hemisphere;
  get_timeseries_metrics; trend; anomaly; uncertainty; chunking/Numba; and the
  PhenoPlot/plot_with_southern_doy/display_map helpers) on the bundled SIF/ndvi
  samples. Executed end-to-end with zero errors and plots embedded.
- A final section documents what the bundled data cannot cover and the data
  needed (large-scale/out-of-core, Northern-Hemisphere phenology, multi-cropping
  for NOS, cloud/QA-flagged data for robustness and the planned QA weighting,
  and an independent reference for anomaly skill).
- Add ipykernel + nbconvert to the dev env and the [docs] extra; link the
  tutorial from the README.
reorder_southern_hemisphere relabels `doy` as a day-of-season index (still
1-365, starting in July), so a plain plot keeps a 1-365 axis. The tutorial now
shows a before/after (calendar DOY with the austral peak split at New Year vs
the reordered, centred curve) and uses plot_with_southern_doy on the reordered
shape to display the real Southern-Hemisphere calendar DOY (...183, 365, 1, 182).
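
The relabelling between a day-of-season index and the calendar DOY is modular arithmetic. The sketch below assumes the season starts at calendar DOY 183, which is inferred from the sequence quoted above; the exact offset used by the library is not stated here, and both function names are hypothetical.

```python
def season_to_calendar_doy(season_day, start_doy=183, year_length=365):
    """Map a day-of-season index (1..year_length) back to calendar DOY."""
    return (season_day - 1 + start_doy - 1) % year_length + 1


def calendar_to_season_doy(doy, start_doy=183, year_length=365):
    """Inverse mapping: calendar DOY to day-of-season index."""
    return (doy - start_doy) % year_length + 1
```

Under that assumption, season days 1, 183, 184 and 365 map to calendar DOYs 183, 365, 1 and 182, so the austral peak around New Year sits in the middle of the axis instead of being split across its ends.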
… many years

- southern=True maps the x-axis to the Southern-Hemisphere day-of-year (austral
  season centred) and relabels ticks to the real SH calendar DOY
- the per-year legend is placed outside the plot, on the left, by default
- with many years (> many_years, default 8) the points use a continuous
  colormap (cmap, default "viridis") + a year colorbar instead of a large legend
- honour the nan_replace argument; save with bbox_inches="tight"
- add Agg-backend smoke tests; the tutorial's PhenoPlot cell now uses southern=True
_relabel_southern called set_xticks(get_xticks()); because the tick locator's
values overshoot the data, the view expanded to ~(-100, 400), so the curve
(day-of-season ~2-359) looked compressed and "not continuous" across the axis.
Now the ticks are filtered to the current view and xlim is restored, so the
curve fills a data-driven axis (~[-16, 377]) with correct SH calendar-DOY labels
(e.g. 183, 283, 18, 118).
Enrich the 14 markdown sections of examples/tutorial.ipynb so each function
documents what it does, its key parameters, and how to read its output:

- intro: add the 3 composable axes (reconstruction / extraction / temporal)
- PhenoLSP: table of the 16 LSP metrics + the 4 SOS/EOS extractors
- get_curvature: correct to a per-pixel scalar (summed 2nd differences),
  distinct from the curvature extractor
- n_seasons: prominence/distance/min_height and NOS>=2 interpretation
- PhenoShape, RMSE, get_timeseries_metrics, trend, anomaly, uncertainty,
  performance and plotting sections expanded

Markdown-only edits; executed code outputs are unchanged.
phenopy.qa.qa_to_weight decodes any xarray quality band into per-observation
weights in [0, 1], covering the four common QA patterns -- remap, good_classes,
bad_bits, max_value/min_value -- via a declarative dict, a built-in registry key
(MOD13Q1, LANDSAT_C2, S2_SCL), or a custom callable for exotic sensors. It is
sensor- and source-agnostic (works on EE, local rasters, NetCDF), so Earth Engine
is never a dependency; missing QA is treated as weight 0. Exported from the
package; 9 tests. These weights feed the upcoming weighted reconstruction.
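
Two of the QA patterns named above, `bad_bits` and `good_classes`, reduce to simple per-value rules. The functions below are a scalar sketch with hypothetical names and made-up bit positions and class codes; they do not reproduce the `qa_to_weight` signature or any real sensor's bit layout.

```python
def weight_from_bad_bits(qa, bad_bits):
    """Weight 0.0 if any flagged bit is set (or QA is missing), else 1.0."""
    if qa is None:
        return 0.0  # missing QA is treated as weight 0, as stated above
    mask = 0
    for bit in bad_bits:
        mask |= 1 << bit
    return 0.0 if qa & mask else 1.0


def weight_from_good_classes(qa, good_classes):
    """Weight 1.0 for class codes listed as good, else 0.0."""
    if qa is None:
        return 0.0
    return 1.0 if qa in good_classes else 0.0
```

For example, with hypothetical bad bits 1 and 3, a QA value of `0b01000` gets weight 0.0 while `0b00100` gets weight 1.0.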
- examples/earthengine_torres_del_paine.ipynb: pull cloud-contaminated NDVI plus
  the raw QA band from MODIS / Landsat / Sentinel-2 over Patagonia via xee, then
  decode QA with phenopy.qa. GEE is an example bridge only (gee_to_xarray), not a
  library dependency.
- pyproject: add optional [gee] extra (earthengine-api, xee, netCDF4).
- plotting: migrate display_map off the deprecated pyproj.transform/Proj API to
  Transformer.from_crs(always_xy=True); also handles projected CRS correctly.
- gitignore: ignore examples/data/ (locally cached EE downloads).
… + upper-envelope (wTSM/Chen)

Add a weights= path to axis-1 reconstruction so per-observation QA weights (e.g.
from phenopy.qa) drive the fit, plus an iterative upper-envelope method:

- PhenoShape gains weights= (a (time, y, x) DataArray), threaded through
  _getPheno0/_getPheno to the reconstructor via a second apply_ufunc input.
  weights=None keeps the unweighted path byte-identical (golden tests unchanged).
- whittaker: with weights, collapse duplicate (pooled) DOYs to a weighted mean and
  smooth the irregular series with per-observation weights.
- new "upper_envelope" reconstructor (Chen 2004 / TIMESAT wTSM): iteratively lifts
  below-fit samples to the fitted curve, tracking the noise-free upper envelope
  without needing a QA band.

Validated on a synthetic downward-contaminated series and on the real cloudy
Torres del Paine fixture; both methods recover the vegetation signal that clouds
otherwise drag down (e.g. Landsat peak NDVI 0.46 -> 0.93). 5 new tests.
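
The iterative lifting step behind the upper-envelope method can be sketched as follows. The smoother here is a plain centred moving average chosen for brevity, and the iteration count and window are arbitrary; none of these are taken from the reconstructor added in this commit, whose smoother and stopping rule are not stated above.

```python
def moving_average(y, half_window=1):
    """Centred moving average with the window truncated at the series edges."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def upper_envelope(y, n_iter=5, half_window=1):
    """Iteratively lift samples below the fit up to the fitted curve."""
    current = list(y)
    for _ in range(n_iter):
        fit = moving_average(current, half_window)
        # keep samples at or above the fit; replace lower ones with the fit
        current = [max(c, f) for c, f in zip(current, fit)]
    return moving_average(current, half_window)
```

Because samples are only ever raised, a cloud-induced dip is progressively filled in while clean high values are left alone, which is why no QA band is needed for this method.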
Now that weighted reconstruction is implemented, §6 compares unweighted vs
QA-weighted Whittaker vs upper-envelope (wTSM/Chen) on a cloudy pixel, §8 is a
recap, and §7 writes the cache to examples/data/ robustly whether the notebook's
CWD is the repo root (VS Code) or the examples/ folder (Jupyter Lab).
In RMSE(segment=True) the helper grid was chunked with computed_stack.chunks,
which is None for in-memory inputs, and calling .chunk(None) is deprecated. The
grid is now chunked only when the input is actually dask-backed, matching the
input's chunks via .chunksizes.
Extend the per-pixel LSP output from 16 to 18 metrics:
- trough: the curve's minimum (baseline) value, in VI units
- mos: middle of season, the DOY midpoint between SOS and EOS (NaN when LOS is NaN)

Both are appended to LSP_BANDS and the _getLSPmetrics2 array, so the existing 16
metrics keep their position and values; also wired into get_timeseries_metrics.
Golden fixtures regenerated (18 metrics). 55 tests pass, ruff clean.
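
As a quick illustration of the two new metrics, the sketch below computes them for one curve. `trough_and_mos` is a hypothetical helper, not the code in `_getLSPmetrics2`, and it assumes SOS and EOS are already available as numbers (possibly NaN).

```python
import math


def trough_and_mos(values, sos, eos):
    """Return (trough, mos) for one reconstructed curve."""
    trough = min(values)  # baseline value, in VI units
    los = eos - sos  # NaN propagates if SOS or EOS is NaN
    mos = float("nan") if math.isnan(los) else sos + los / 2.0
    return trough, mos
```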
Add trough/mos to the §3 metric table and re-execute the notebook so the
PhenoLSP output (and all downstream cells) reflect the 18-metric result.
…nce guide

- benchmarks/bench.py: add an extraction benchmark (PhenoLSP eager vs chunked +
  Dask 'processes'), alongside the existing linear-reconstruction Numba vs
  pure-Python comparison; both assert parallel == serial.
- docs/performance.md: a "which knob when" guide; document Dask scaling honestly
  (processes-scheduler start-up overhead -> modest gains on small in-memory
  rasters, real value is out-of-core), and the design note on why extraction has
  no Numba kernel (one reference implementation + Dask, rather than a second
  byte-identical copy).
'phenopy' is taken on PyPI by an unrelated clinical-phenotyping tool, so the
library is renamed to PhenoSensing (Phenology + remote Sensing):

- phenopy/ -> phenosensing/ (import phenosensing); the xarray accessor stays
  da.pheno
- accessor module phenopy.py -> accessor.py (removes the phenopy.phenopy shadow)
- update pyproject (name/paths/extras), README, docs, mkdocs, CI (--cov),
  environment-dev.yml (env name), benchmarks, every test, and both example
  notebooks (tutorial re-executed under the new name)
- stop tracking stray __pycache__ / .ipynb_checkpoints that had been committed

Behaviour unchanged: 55 tests pass, ruff clean, golden byte-identical.
@JavierLopatin JavierLopatin merged commit 9f54065 into master May 27, 2026
2 of 14 checks passed