Library overhaul by JavierLopatin · Pull Request #3 · JavierLopatin/PhenoSensing

JavierLopatin · 2026-05-27T00:47:25Z

No description provided.

- Add phenopy/__init__.py to register the `pheno` xarray accessor and expose the public API (was missing -> package was not importable)
- Fix broken relative imports (`from utils` -> `from .utils`)
- Fix latent bugs: raise('str')->ValueError (x3), bare except, missing functools.reduce, scipy trapz->trapezoid (removed in SciPy >=1.14), guard optional KDEpy import, repair computeChunkSize
- Remove dead/unused imports across modules; drop phantom odc.ui dep; make folium/pyproj lazy (the [plot] extra)
- Add pyproject.toml (hatchling) with optional extras ([plot]/[dask]/[fit]/[kde]/[fast]/[test]/[docs])
- Add environment-dev.yml (conda dev env), .gitignore, and a pytest smoke test; remove the root __init__.py footgun

Verified end-to-end (import + PhenoShape->PhenoLSP + pytest) on both the legacy stack and a modern one (numpy 2.4 / scipy 1.17 / xarray 2026).
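For context on the raise('str')->ValueError fix: in Python 3, raising a plain string does not produce the intended error message; it fails with a TypeError instead. The sketch below illustrates the difference. The function names are hypothetical and are not taken from the library.

```python
def check_old(value):
    """Buggy pattern: a plain string is not an exception class or instance."""
    if value < 0:
        raise ("value must be non-negative")  # fails with TypeError at runtime
    return value


def check_new(value):
    """Fixed pattern: raise a real exception type that carries the message."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return value


def error_type(func, value):
    """Return the name of the exception raised by func(value), or None."""
    try:
        func(value)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_type(check_old, -1))  # the buggy form surfaces as a TypeError
print(error_type(check_new, -1))  # the fixed form surfaces as a ValueError
```

Callers that catch ValueError only see the error after the fix, which is why the bug stays latent until the failing branch is actually hit.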

- Add phenopy/io.py with load_sample() for the bundled example datasets
- Add golden regression tests freezing the 16 LSP metrics on the SIF and ndvi samples (a safety net before the Phase 2 composable refactor)
- Make dask a core dependency (PhenoShape/PhenoLSP use chunk + map_blocks)
- Add type hints to the public API and a py.typed marker
- Lint and format the codebase with ruff (config in pyproject); exclude the user-maintained example notebook from tooling
- Add GitHub Actions CI: lint + test matrix (py3.10-3.12 on Linux/macOS) and a build + twine-check job
- Add MkDocs (Material + mkdocstrings) documentation scaffolding
- Rewrite the README (install, quickstart, features; fix the broken logo path and the "Python < 3.6" typo)

Verified: ruff clean, pytest green, wheel builds and passes twine check, docs build. The wheel bundles only the small SIF/ndvi samples (kndvi and figures are excluded).

- Add phenopy/reconstruction.py with a RECONSTRUCTORS registry and a get_reconstructor() dispatcher (unknown names fall back to scipy interp1d kinds, preserving the historical `interpolType` behavior)
- Reimplement Wave-1 reconstruction methods from primary literature: Savitzky-Golay, Whittaker (via whittaker-eilers), double-logistic (Beck 2006, Elmore 2012) and asymmetric Gaussian; the parametric fits fall back to NaN per pixel on non-convergence so they map safely
- Refactor utils._getPheno to dispatch through the registry; the default "linear" path stays exactly np.interp (golden tests unchanged), and an optional recon_params is threaded through the pipeline
- Move _KDE into reconstruction.py
- Add unit tests (recover a known double-logistic, SG denoises, fallback)
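The registry-plus-dispatcher pattern described above can be sketched roughly as follows. This is an illustration only: the `register` decorator, the function signatures, and the pure-Python `linear` stand-in are assumptions, and the library's actual code may be organised differently. The fallback imports SciPy lazily, so the registered path runs without it.

```python
from typing import Callable, Dict, List, Sequence

Reconstructor = Callable[[Sequence[float], Sequence[float], Sequence[float]], List[float]]

# Name -> reconstruction function (the name mirrors the PR text; contents are illustrative).
RECONSTRUCTORS: Dict[str, Reconstructor] = {}


def register(name: str):
    """Hypothetical decorator that adds a function to the registry."""
    def decorator(func: Reconstructor) -> Reconstructor:
        RECONSTRUCTORS[name] = func
        return func
    return decorator


@register("linear")
def linear(x, y, xnew):
    """Piecewise-linear interpolation, clamped at both ends; x must be strictly increasing."""
    out = []
    for xq in xnew:
        if xq <= x[0]:
            out.append(y[0])
        elif xq >= x[-1]:
            out.append(y[-1])
        else:
            for i in range(1, len(x)):
                if xq <= x[i]:
                    t = (xq - x[i - 1]) / (x[i] - x[i - 1])
                    out.append(y[i - 1] + t * (y[i] - y[i - 1]))
                    break
    return out


def get_reconstructor(name: str) -> Reconstructor:
    """Return a registered method, else treat the name as a scipy interp1d kind."""
    if name in RECONSTRUCTORS:
        return RECONSTRUCTORS[name]

    def interp1d_fallback(x, y, xnew):
        from scipy.interpolate import interp1d  # imported lazily, only when used
        f = interp1d(x, y, kind=name, bounds_error=False, fill_value="extrapolate")
        return f(xnew).tolist()

    return interp1d_fallback


print(get_reconstructor("linear")([0, 10], [0.0, 1.0], [0, 5, 10, 20]))
```

Keeping the fallback inside the dispatcher means old call sites that pass an interp1d kind such as "cubic" keep working without being listed in the registry.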

- Add phenopy/extraction.py with an EXTRACTORS registry and get_extractor(): seasonal_median (the historical phentype=1), trs (relative-amplitude threshold), der (derivative extrema) and curvature (inflection)
- Refactor _getLSPmetrics2 to delegate SOS/EOS to the selected extractor; the default reproduces the median method exactly (golden unchanged)
- Expose `extraction`/`extract_params` on PhenoLSP (threaded via _parseLSP)
- Add unit tests (SOS<=POS<=EOS per method, TRS threshold, PhenoLSP runs with every extractor on the sample data)
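To make the relative-amplitude threshold (trs) idea concrete, here is a minimal sketch under a common definition: SOS is the first DOY on the rising limb where the curve reaches min + trs * (max - min), and EOS is the last such DOY on the falling limb. The function name and the default of 0.5 are assumptions, not the library's API.

```python
def trs_sos_eos(doy, values, trs=0.5):
    """Return (SOS, POS, EOS) using a relative-amplitude threshold.

    Illustrative only: assumes a single-season curve sampled on `doy`.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return None, None, None  # flat curve: no season to extract
    threshold = vmin + trs * (vmax - vmin)
    peak = values.index(vmax)

    # SOS: first sample at or above the threshold before (or at) the peak.
    sos = None
    for i in range(peak + 1):
        if values[i] >= threshold:
            sos = doy[i]
            break

    # EOS: last sample at or above the threshold after (or at) the peak.
    eos = None
    for i in range(len(values) - 1, peak - 1, -1):
        if values[i] >= threshold:
            eos = doy[i]
            break

    return sos, doy[peak], eos


doy = [1, 50, 100, 150, 200, 250, 300, 350]
values = [0.0, 0.2, 0.6, 1.0, 0.8, 0.3, 0.1, 0.0]
print(trs_sos_eos(doy, values, trs=0.5))
```

A lower threshold widens the season, which is the behaviour the SOS<=POS<=EOS ordering test guards for every method.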

- Thread recon_params through PhenoShape, and recon_params / extraction / extract_params through get_timeseries_metrics, so the interannual moving-window orchestrator can drive any reconstruction x extraction combination (e.g. dlog_beck x moving-window x trs)
- Add phenopy/trends.py: per-pixel Theil-Sen slope + Mann-Kendall p-value over a metric time series (apply_ufunc, dask-aware); Mann-Kendall is implemented inline so trends need only numpy + scipy
- Export trend, list_reconstructors, list_extractors from the package
- Tests: trend sign/significance/edge cases, and a get_timeseries_metrics smoke test confirming the extraction axis is threaded through

- Replace the map_blocks + template + computePheno-state machinery in PhenoShape/PhenoLSP with xarray.apply_ufunc(vectorize=True, dask="parallelized"). PhenoLSP/RMSE now read xnew from the 'doy' dimension instead of fragile accessor state
- Add a `chunks` argument to PhenoShape for out-of-core / chunked processing (the time axis is kept whole); remove the now-dead _getPheno2D / _parseLSP / _assemble helpers
- Verified numerically identical to the previous implementation (golden tests unchanged) and that the chunked Dask path matches eager exactly
- Add RMSE + Dask-equivalence tests, a benchmark (benchmarks/bench.py), and a performance docs page (chunking, schedulers, the GIL caveat)

Note: raw CPU speed-up for the pure-Python kernels needs a GIL-releasing Numba kernel (the planned [fast] extra); today the Dask path's win is memory / out-of-core scalability, not wall-clock on in-memory rasters.

- Add phenopy/_numba.py with an @njit(parallel=True) kernel that reconstructs a (time, y, x) cube via linear interpolation + moving average in compiled, GIL-released code (with a pure-Python no-op fallback when numba is absent)
- PhenoShape auto-uses it when eligible (linear method, in-memory, no NaNs, no custom params). The DOY sort index is computed in NumPy and passed in so the tie-ordering of duplicate DOYs matches the pure-Python path exactly (important for multi-year pooling, e.g. the SIF sample)
- ~11x faster linear PhenoShape on an 8500-pixel x 920-step raster, with identical results: the golden tests pass with the numba path active, and an explicit numba-vs-pure-Python equivalence test is added
- Update benchmarks/bench.py and the performance docs
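The tie-ordering point matters because pooling several years produces repeated DOYs, and two sort routines can order those duplicates differently. The sketch below shows the idea with a stable argsort in plain Python. It is an illustration only; the PR does not state which sort the library requests from NumPy (in NumPy, a stable order can be requested with `np.argsort(doy, kind="stable")`).

```python
def stable_argsort(doy):
    """Indices that sort `doy`, keeping the original order among equal values.

    Python's sorted() is guaranteed stable, so duplicates stay in input order.
    """
    return sorted(range(len(doy)), key=doy.__getitem__)


# Two pooled years observed on DOY 10 and 200, plus one extra observation on DOY 100.
doy = [200, 10, 200, 10, 100]
values = [0.9, 0.2, 0.8, 0.3, 0.6]

order = stable_argsort(doy)
print(order)                       # duplicates keep their input order
print([values[i] for i in order])  # values reordered consistently with the index
```

Computing the index once and passing it to both code paths removes the sort itself as a source of numerical differences.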

- Add phenopy/anomaly.py with anomaly(): per-pixel deviation from the climatological PhenoShape, a standardised z-score, and an empirical RFD percentile (Reference Frequency Distribution, npphen-style). Values near 0 are extreme negative anomalies, near 1 extreme positive, ~0.5 typical
- KDE-capable climatology (interpolType="KDE" gives the fully non-parametric reference); works with any reconstruction method otherwise
- Export phenopy.anomaly; add tests (detects an injected dry year) and wire trends + anomalies into the docs (API + quickstart)
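The three anomaly measures can be illustrated for a single pixel and a single day of year. This is a simplified sketch: it takes the reference values for that DOY directly, uses the plain mean as the climatology, and defines the percentile as the fraction of reference values at or below the observation. The function name and these choices are assumptions, not the library's implementation.

```python
from statistics import mean, stdev


def anomaly_stats(reference, observed):
    """Return (deviation, z_score, empirical_percentile) for one observation.

    `reference` holds the values seen on the same DOY in the reference years.
    """
    climatology = mean(reference)
    deviation = observed - climatology
    sd = stdev(reference)
    z_score = deviation / sd if sd > 0 else float("nan")
    # Empirical percentile: share of reference values not exceeding the observation.
    percentile = sum(r <= observed for r in reference) / len(reference)
    return deviation, z_score, percentile


reference = [0.60, 0.62, 0.58, 0.61, 0.59]  # synthetic normal years
print(anomaly_stats(reference, 0.30))       # synthetic dry year: percentile at the low end
print(anomaly_stats(reference, 0.90))       # unusually green: percentile at the high end
```

The percentile is bounded in [0, 1] and does not assume a normal distribution, which is why it complements the z-score.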

- Add phenopy/season.py with n_seasons(phenoshape): per-pixel count of growing cycles using scipy.signal.find_peaks with an amplitude-relative prominence threshold, so noise isn't counted as a season. Handles the doy dim/coord like get_curvature and is dask-aware
- Export phenopy.n_seasons; add tests (bimodal=2, unimodal=1, flat=0, and the time-dim+doy-coord input) and wire it into the docs (API + quickstart)

- Add phenopy/uncertainty.py with uncertainty(): a per-pixel bootstrap that resamples the observations with replacement, recomputes the 16 LSP metrics for each replicate, and returns their standard deviation (a non-parametric estimate of each metric's sampling uncertainty)
- Centralise the 16-band order as utils.LSP_BANDS (single source of truth; the Pheno accessor now references it) -- golden output is unchanged
- Export phenopy.uncertainty; add a test and wire it into the docs
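The bootstrap described above can be sketched for one pixel and one metric. The example resamples (doy, value) observations with replacement and reports the standard deviation of a toy metric, the DOY of the maximum. The function names, the replicate count, and the metric are illustrative assumptions; the library recomputes the full LSP metric set per replicate.

```python
import random
from statistics import stdev


def peak_doy(sample):
    """Toy metric: DOY of the highest observed value."""
    return max(sample, key=lambda pair: pair[1])[0]


def bootstrap_sd(observations, metric, n_boot=200, seed=0):
    """Standard deviation of `metric` over bootstrap replicates."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(observations)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        sample = [observations[rng.randrange(n)] for _ in range(n)]
        replicates.append(metric(sample))
    return stdev(replicates)


observations = [(30, 0.2), (90, 0.5), (150, 0.9), (180, 0.85), (240, 0.5), (300, 0.2)]
print(bootstrap_sd(observations, peak_doy))
```

A metric that is stable under resampling yields a small standard deviation, while one that hinges on a single observation yields a large one.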

- Add examples/tutorial.ipynb: an English, end-to-end tutorial exercising every public function (load_sample/list_*; all 8 reconstruction methods; all 4 extraction methods; PhenoShape/PhenoLSP/RMSE; get_curvature/classify_vector_numeric; n_seasons; reorder_southern_hemisphere; get_timeseries_metrics; trend; anomaly; uncertainty; chunking/Numba; and the PhenoPlot/plot_with_southern_doy/display_map helpers) on the bundled SIF/ndvi samples. Executed end-to-end with zero errors and plots embedded.
- A final section documents what the bundled data cannot cover and the data needed (large-scale/out-of-core, Northern-Hemisphere phenology, multi-cropping for NOS, cloud/QA-flagged data for robustness and the planned QA weighting, and an independent reference for anomaly skill).
- Add ipykernel + nbconvert to the dev env and the [docs] extra; link the tutorial from the README.

reorder_southern_hemisphere relabels `doy` as a day-of-season index (still 1-365, starting in July), so a plain plot keeps a 1-365 axis. The tutorial now shows a before/after (calendar DOY with the austral peak split at New Year vs the reordered, centred curve) and uses plot_with_southern_doy on the reordered shape to display the real Southern-Hemisphere calendar DOY (...183, 365, 1, 182).
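The relabelling can be pictured as a modular shift between a day-of-season index and the calendar DOY. The sketch below assumes that day-of-season 1 corresponds to calendar DOY 183, which is read off the label sequence quoted above; the library's exact offset is not stated here and may differ by a day. Both function names are hypothetical.

```python
def season_to_calendar_doy(season_day, start_doy=183, year_length=365):
    """Map a day-of-season index (1..365, starting in July) to calendar DOY."""
    return (season_day - 1 + start_doy - 1) % year_length + 1


def calendar_to_season_day(doy, start_doy=183, year_length=365):
    """Inverse mapping: calendar DOY to day-of-season index."""
    return (doy - start_doy) % year_length + 1


# The austral season starts at DOY 183, wraps at New Year, and ends at DOY 182.
print([season_to_calendar_doy(s) for s in (1, 183, 184, 365)])
```

Because the shift is a bijection on 1..365, plotting on the day-of-season axis and relabelling the ticks loses no information.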

… many years

- southern=True maps the x-axis to the Southern-Hemisphere day-of-year (austral season centred) and relabels ticks to the real SH calendar DOY
- the per-year legend is placed outside the plot, on the left, by default
- with many years (> many_years, default 8) the points use a continuous colormap (cmap, default "viridis") + a year colorbar instead of a large legend
- honour the nan_replace argument; save with bbox_inches="tight"
- add Agg-backend smoke tests; the tutorial's PhenoPlot cell now uses southern=True

_relabel_southern called set_xticks(get_xticks()); because the tick locator's values overshoot the data, the view expanded to ~(-100, 400), so the curve (day-of-season ~2-359) looked compressed and "not continuous" across the axis. Now the ticks are filtered to the current view and xlim is restored, so the curve fills a data-driven axis (~[-16, 377]) with correct SH calendar-DOY labels (e.g. 183, 283, 18, 118).

Enrich the 14 markdown sections of examples/tutorial.ipynb so each function documents what it does, its key parameters, and how to read its output:

- intro: add the 3 composable axes (reconstruction / extraction / temporal)
- PhenoLSP: table of the 16 LSP metrics + the 4 SOS/EOS extractors
- get_curvature: correct to a per-pixel scalar (summed 2nd differences), distinct from the curvature extractor
- n_seasons: prominence/distance/min_height and NOS>=2 interpretation
- PhenoShape, RMSE, get_timeseries_metrics, trend, anomaly, uncertainty, performance and plotting sections expanded

Markdown-only edits; executed code outputs are unchanged.

phenopy.qa.qa_to_weight decodes any xarray quality band into per-observation weights in [0, 1], covering the four common QA patterns -- remap, good_classes, bad_bits, max_value/min_value -- via a declarative dict, a built-in registry key (MOD13Q1, LANDSAT_C2, S2_SCL), or a custom callable for exotic sensors. It is sensor- and source-agnostic (works on EE, local rasters, NetCDF), so Earth Engine is never a dependency; missing QA is treated as weight 0. Exported from the package; 9 tests. These weights feed the upcoming weighted reconstruction.
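As an illustration of the bad_bits pattern, the sketch below maps an integer QA value to a weight of 0 when any flagged bit is set and 1 otherwise, with missing QA treated as weight 0 as described above. The function name and the bit positions are hypothetical and do not correspond to a real sensor's QA layout or to the library's signature.

```python
def bad_bits_to_weight(qa_value, bad_bits):
    """Return 0.0 if any bit in `bad_bits` is set in `qa_value`, else 1.0.

    A missing QA value (None) is treated as weight 0.0.
    """
    if qa_value is None:
        return 0.0
    mask = 0
    for bit in bad_bits:
        mask |= 1 << bit  # build a bitmask from the flagged bit positions
    return 0.0 if qa_value & mask else 1.0


# Hypothetical layout: bit 1 = cloud, bit 3 = cloud shadow.
bad_bits = (1, 3)
for qa in (0b0000, 0b0010, 0b0100, 0b1000, None):
    print(qa, bad_bits_to_weight(qa, bad_bits))
```

The same mask logic applies elementwise to a whole QA array, which is what makes a declarative description of the bad bits sufficient.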

- examples/earthengine_torres_del_paine.ipynb: pull cloud-contaminated NDVI plus the raw QA band from MODIS / Landsat / Sentinel-2 over Patagonia via xee, then decode QA with phenopy.qa. GEE is an example bridge only (gee_to_xarray), not a library dependency.
- pyproject: add optional [gee] extra (earthengine-api, xee, netCDF4).
- plotting: migrate display_map off the deprecated pyproj.transform/Proj API to Transformer.from_crs(always_xy=True); also handles projected CRS correctly.
- gitignore: ignore examples/data/ (locally cached EE downloads).

… + upper-envelope (wTSM/Chen)

Add a weights= path to axis-1 reconstruction so per-observation QA weights (e.g. from phenopy.qa) drive the fit, plus an iterative upper-envelope method:

- PhenoShape gains weights= (a (time, y, x) DataArray), threaded through _getPheno0/_getPheno to the reconstructor via a second apply_ufunc input. weights=None keeps the unweighted path byte-identical (golden tests unchanged).
- whittaker: with weights, collapse duplicate (pooled) DOYs to a weighted mean and smooth the irregular series with per-observation weights.
- new "upper_envelope" reconstructor (Chen 2004 / TIMESAT wTSM): iteratively lifts below-fit samples to the fitted curve, tracking the noise-free upper envelope without needing a QA band.

Validated on a synthetic downward-contaminated series and the real cloudy Torres del Paine fixture, where both recover the vegetation signal clouds otherwise drag down (e.g. Landsat peak NDVI 0.46 -> 0.93). 5 new tests.
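The upper-envelope iteration can be sketched in a few lines. To stay dependency-free, this example swaps in a simple moving average as the smoother, which is not what the library or Chen (2004) uses; only the lift-and-refit loop is the point. The function names and the iteration count are assumptions.

```python
def moving_average(values, half_window=1):
    """Centred moving average with a shrinking window at the edges (stand-in smoother)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def upper_envelope(values, n_iter=5, half_window=1):
    """Iteratively replace below-fit samples with the fit, then refit."""
    fit = moving_average(values, half_window)
    for _ in range(n_iter):
        # Keep observations at or above the fit; lift the rest to the fitted curve.
        lifted = [max(v, f) for v, f in zip(values, fit)]
        fit = moving_average(lifted, half_window)
    return fit


series = [0.8, 0.8, 0.2, 0.8, 0.8]  # synthetic series with one cloud-like dip
plain = moving_average(series)
envelope = upper_envelope(series)
print(plain[2], envelope[2])  # the envelope fit sits closer to the clean level at the dip
```

Because contamination by cloud biases vegetation indices downward, lifting only the below-fit samples pushes the fit toward the clean signal without a QA band.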

Now that weighted reconstruction is implemented, §6 compares unweighted vs QA-weighted Whittaker vs upper-envelope (wTSM/Chen) on a cloudy pixel, §8 is a recap, and §7 writes the cache to examples/data/ robustly whether the notebook's CWD is the repo root (VS Code) or the examples/ folder (Jupyter Lab).

In RMSE(segment=True) the helper grid was chunked with computed_stack.chunks, which is None for in-memory inputs -> .chunk(None) is deprecated. Chunk it only when the input is actually dask-backed, matching it via .chunksizes.

Extend the per-pixel LSP output from 16 to 18 metrics:

- trough: the curve's minimum (baseline) value, in VI units
- mos: middle of season, the DOY midpoint between SOS and EOS (NaN when LOS is)

Both are appended to LSP_BANDS and the _getLSPmetrics2 array, so the existing 16 metrics keep their position and values; also wired into get_timeseries_metrics. Golden fixtures regenerated (18 metrics). 55 tests pass, ruff clean.
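A minimal sketch of the two new metrics for one reconstructed curve follows. It assumes LOS is computed as EOS - SOS with SOS before EOS on the same axis, which is not spelled out above, and the function name is hypothetical.

```python
import math


def trough_and_mos(values, sos, eos):
    """Return (trough, mos) for one curve.

    trough: minimum of the curve, in VI units.
    mos: DOY midway between SOS and EOS, NaN when the season length is NaN.
    """
    trough = min(values)
    los = eos - sos  # NaN propagates if either date is NaN
    mos = math.nan if math.isnan(los) else sos + los / 2.0
    return trough, mos


curve = [0.2, 0.5, 0.9, 0.4, 0.1]
print(trough_and_mos(curve, 100.0, 250.0))
print(trough_and_mos(curve, math.nan, 250.0))  # undefined season gives NaN for mos
```

Appending the new values after the existing ones is what lets older code keep indexing the first 16 bands unchanged.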

Add trough/mos to the §3 metric table and re-execute the notebook so the PhenoLSP output (and all downstream cells) reflect the 18-metric result.

…nce guide

- benchmarks/bench.py: add an extraction benchmark (PhenoLSP eager vs chunked + Dask 'processes'), alongside the existing linear-reconstruction Numba vs pure-Python comparison; both assert parallel == serial.
- docs/performance.md: a "which knob when" guide; document Dask scaling honestly (processes-scheduler start-up overhead -> modest gains on small in-memory rasters, real value is out-of-core), and the design note on why extraction has no Numba kernel (one reference implementation + Dask, rather than a second byte-identical copy).

'phenopy' is taken on PyPI by an unrelated clinical-phenotyping tool, so the library is renamed to PhenoSensing (Phenology + remote Sensing):

- phenopy/ -> phenosensing/ (import phenosensing); the xarray accessor stays da.pheno
- accessor module phenopy.py -> accessor.py (removes the phenopy.phenopy shadow)
- update pyproject (name/paths/extras), README, docs, mkdocs, CI (--cov), environment-dev.yml (env name), benchmarks, every test, and both example notebooks (tutorial re-executed under the new name)
- stop tracking stray __pycache__ / .ipynb_checkpoints that had been committed

Behaviour unchanged: 55 tests pass, ruff clean, golden byte-identical.

JavierLopatin added 24 commits May 25, 2026 18:38

fix(rmse): avoid deprecated .chunk(None) in segmented RMSE

0ec8809

docs(tutorial): refresh for the 18 LSP metrics (trough, mos)

646c2e0

JavierLopatin merged commit 9f54065 into master May 27, 2026
2 of 14 checks passed

JavierLopatin merged 24 commits into master from library-overhaul
