Library overhaul #3
Merged
- Add phenopy/__init__.py to register the `pheno` xarray accessor and
expose the public API (was missing -> package was not importable)
- Fix broken relative imports (`from utils` -> `from .utils`)
- Fix latent bugs: raise('str') -> raise ValueError (x3), a bare except,
missing functools.reduce import, scipy trapz -> trapezoid (trapz was
removed in SciPy >= 1.14), guard optional KDEpy import, repair
computeChunkSize
- Remove dead/unused imports across modules; drop phantom odc.ui dep;
make folium/pyproj lazy (the [plot] extra)
- Add pyproject.toml (hatchling) with optional extras
([plot]/[dask]/[fit]/[kde]/[fast]/[test]/[docs])
- Add environment-dev.yml (conda dev env), .gitignore, and a pytest
smoke test; remove the root __init__.py footgun
Verified end-to-end (import + PhenoShape->PhenoLSP + pytest) on both the
legacy stack and a modern one (numpy 2.4 / scipy 1.17 / xarray 2026).
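The compatibility fixes above follow a common pattern: prefer the modern import, fall back on older stacks, and keep optional dependencies from breaking `import`. A minimal sketch of that pattern, not the package's actual code; `integrate_curve` and `product` are hypothetical helpers, and KDEpy's `FFTKDE` is only used to illustrate the optional-import guard.

```python
import functools
import operator

import numpy as np

# Newer SciPy provides trapezoid; trapz was removed in SciPy 1.14.
# Import the new name first and only fall back on older stacks.
try:
    from scipy.integrate import trapezoid
except ImportError:
    try:
        from scipy.integrate import trapz as trapezoid
    except ImportError:
        def trapezoid(y, x=None, dx=1.0):
            """Pure-NumPy trapezoidal rule, used when SciPy is unavailable."""
            y = np.asarray(y, dtype=float)
            if x is None:
                return float(dx * (y[1:] + y[:-1]).sum() / 2.0)
            x = np.asarray(x, dtype=float)
            return float((np.diff(x) * (y[1:] + y[:-1]) / 2.0).sum())

# Optional dependency: a missing package must not break the whole import.
try:
    from KDEpy import FFTKDE  # only needed for KDE reconstruction ([kde] extra)
except ImportError:
    FFTKDE = None


def integrate_curve(values, doy=None):
    """Hypothetical helper: area under a 1-D phenology curve."""
    values = np.asarray(values, dtype=float)
    if values.ndim != 1:
        # `raise('...')` would itself fail with a TypeError in Python 3
        raise ValueError(f"expected a 1-D curve, got shape {values.shape}")
    return float(trapezoid(values, x=doy))


def product(factors):
    """reduce is not a builtin in Python 3; it must come from functools."""
    return functools.reduce(operator.mul, factors, 1)
```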
- Add phenopy/io.py with load_sample() for the bundled example datasets
- Add golden regression tests freezing the 16 LSP metrics on the SIF and ndvi samples (a safety net before the Phase 2 composable refactor)
- Make dask a core dependency (PhenoShape/PhenoLSP use chunk + map_blocks)
- Add type hints to the public API and a py.typed marker
- Lint and format the codebase with ruff (config in pyproject); exclude the user-maintained example notebook from tooling
- Add GitHub Actions CI: lint + test matrix (py3.10-3.12 on Linux/macOS) and a build + twine-check job
- Add MkDocs (Material + mkdocstrings) documentation scaffolding
- Rewrite the README (install, quickstart, features; fix the broken logo path and the "Python < 3.6" typo)
Verified: ruff clean, pytest green, wheel builds and passes twine check, docs build. The wheel bundles only the small SIF/ndvi samples (kndvi and figures are excluded).
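Golden regression tests freeze known-good metric values so that a refactor must reproduce them exactly. A minimal sketch of such a check, assuming the frozen metrics are stored as a `.npy` array; `check_golden` is a hypothetical helper, not the project's test code.

```python
from pathlib import Path

import numpy as np


def check_golden(metrics, golden_path, update=False):
    """Compare computed metrics against a frozen .npy fixture.

    NaNs in the same positions compare equal; any other difference fails,
    so a refactor has to reproduce the frozen values exactly.
    """
    golden_path = Path(golden_path)
    metrics = np.asarray(metrics, dtype=float)
    if update:
        # Regenerate deliberately, e.g. after adding new metrics.
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        np.save(golden_path, metrics)
        return
    expected = np.load(golden_path)  # a missing fixture raises, never auto-creates
    np.testing.assert_array_equal(metrics, expected)
```

Regenerating a fixture is an explicit `update=True` call, so an accidental change in values fails loudly instead of silently rewriting the fixture.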
- Add phenopy/reconstruction.py with a RECONSTRUCTORS registry and a get_reconstructor() dispatcher (unknown names fall back to scipy interp1d kinds, preserving the historical `interpolType` behavior)
- Reimplement Wave-1 reconstruction methods from primary literature: Savitzky-Golay, Whittaker (via whittaker-eilers), double-logistic (Beck 2006, Elmore 2012) and asymmetric Gaussian; the parametric fits fall back to NaN per pixel on non-convergence so they map safely
- Refactor utils._getPheno to dispatch through the registry; the default "linear" path stays exactly np.interp (golden tests unchanged), and an optional recon_params is threaded through the pipeline
- Move _KDE into reconstruction.py
- Add unit tests (recover a known double-logistic, SG denoises, fallback)
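A name-to-function registry with a fallback keeps dispatch in one place. The sketch below assumes a reconstructor signature `f(x_obs, y_obs, x_new, **params)`; `RECONSTRUCTORS`, `get_reconstructor` and `list_reconstructors` echo the names above, but these bodies are illustrative only. The scipy `interp1d` fallback is imported lazily, so the default linear path needs only NumPy.

```python
from typing import Callable, Dict

import numpy as np

# Illustrative registry: name -> f(x_obs, y_obs, x_new, **params) -> y_new
RECONSTRUCTORS: Dict[str, Callable[..., np.ndarray]] = {}


def register(name):
    """Decorator that adds a reconstructor to the registry under `name`."""
    def deco(fn):
        RECONSTRUCTORS[name] = fn
        return fn
    return deco


@register("linear")
def _linear(x, y, xnew, **params):
    # Default path: plain np.interp over observations sorted by DOY.
    order = np.argsort(x, kind="stable")
    return np.interp(xnew, x[order], y[order])


def _interp1d_fallback(kind):
    def fn(x, y, xnew, **params):
        from scipy.interpolate import interp1d  # lazy: only for legacy kinds
        return interp1d(x, y, kind=kind, **params)(xnew)
    return fn


def get_reconstructor(name):
    """Registered method if known, else treat `name` as an interp1d kind."""
    try:
        return RECONSTRUCTORS[name]
    except KeyError:
        return _interp1d_fallback(name)


def list_reconstructors():
    return sorted(RECONSTRUCTORS)
```

New methods only need a `@register("...")` decorator; the dispatcher and any callers stay unchanged.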
- Add phenopy/extraction.py with an EXTRACTORS registry and get_extractor(): seasonal_median (the historical phentype=1), trs (relative-amplitude threshold), der (derivative extrema) and curvature (inflection)
- Refactor _getLSPmetrics2 to delegate SOS/EOS to the selected extractor; the default reproduces the median method exactly (golden unchanged)
- Expose `extraction`/`extract_params` on PhenoLSP (threaded via _parseLSP)
- Add unit tests (SOS<=POS<=EOS per method, TRS threshold, PhenoLSP runs with every extractor on the sample data)
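A relative-amplitude threshold (TRS) extractor can be sketched in a few lines. This assumes a smoothed curve sampled on a DOY grid and returns grid samples (no sub-sample interpolation); the 0.5 default threshold and the function name are assumptions, not the library's `trs` parameters.

```python
import numpy as np


def trs_sos_eos(doy, curve, threshold=0.5):
    """TRS sketch: SOS/EOS bound the contiguous run of samples around the
    peak where curve >= trough + threshold * amplitude.

    Returns (sos, pos, eos); SOS/EOS are NaN for flat or all-NaN curves.
    """
    doy = np.asarray(doy, dtype=float)
    curve = np.asarray(curve, dtype=float)
    if not np.isfinite(curve).any():
        return np.nan, np.nan, np.nan
    ipk = int(np.nanargmax(curve))
    lo, hi = np.nanmin(curve), curve[ipk]
    if hi == lo:
        return np.nan, float(doy[ipk]), np.nan
    above = curve >= lo + threshold * (hi - lo)  # NaN compares False
    # Last below-threshold sample before the peak, first one after it
    below_left = np.flatnonzero(~above[: ipk + 1])
    below_right = np.flatnonzero(~above[ipk:])
    i_sos = below_left[-1] + 1 if below_left.size else 0
    i_eos = ipk + below_right[0] - 1 if below_right.size else curve.size - 1
    return float(doy[i_sos]), float(doy[ipk]), float(doy[i_eos])
```

Taking the contiguous above-threshold run around the peak, rather than the first crossing anywhere in the year, keeps an earlier minor bump from being reported as SOS.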
- Thread recon_params through PhenoShape, and recon_params / extraction / extract_params through get_timeseries_metrics, so the interannual moving-window orchestrator can drive any reconstruction x extraction combination (e.g. dlog_beck x moving-window x trs)
- Add phenopy/trends.py: per-pixel Theil-Sen slope + Mann-Kendall p-value over a metric time series (apply_ufunc, dask-aware); Mann-Kendall is implemented inline so trends need only numpy + scipy
- Export trend, list_reconstructors, list_extractors from the package
- Tests: trend sign/significance/edge cases, and a get_timeseries_metrics smoke test confirming the extraction axis is threaded through
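Theil-Sen and Mann-Kendall both work on all pairs of observations, which keeps an inline implementation short. A per-pixel sketch, assuming no tied values (it uses the no-ties variance of S) and a two-sided normal approximation computed with `math.erfc`, so this version needs only NumPy; the function name is hypothetical.

```python
import math

import numpy as np


def theil_sen_mann_kendall(y, t=None):
    """Theil-Sen slope and two-sided Mann-Kendall p-value for one pixel.

    Uses Var(S) = n(n-1)(2n+5)/18 (no ties) with a continuity correction.
    Returns (nan, nan) when fewer than 3 finite values remain.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float) if t is None else np.asarray(t, dtype=float)
    ok = np.isfinite(y) & np.isfinite(t)
    y, t = y[ok], t[ok]
    n = y.size
    if n < 3:
        return np.nan, np.nan
    i, j = np.triu_indices(n, k=1)  # every pair i < j
    dt = t[j] - t[i]
    dy = y[j] - y[i]
    nonzero = dt != 0
    slope = float(np.median(dy[nonzero] / dt[nonzero]))
    # S = concordant minus discordant pairs, ordered by time
    s = float(np.sum(np.sign(dy) * np.sign(dt)))
    sd = math.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    z = (s - 1) / sd if s > 0 else (s + 1) / sd if s < 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # = 2 * (1 - Phi(|z|))
    return slope, p
```

Wrapping a 1-D function like this with `xarray.apply_ufunc(..., vectorize=True)` is one way to apply it to every pixel of a metric cube.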
- Replace the map_blocks + template + computePheno-state machinery in PhenoShape/PhenoLSP with xarray.apply_ufunc(vectorize=True, dask="parallelized"). PhenoLSP/RMSE now read xnew from the 'doy' dimension instead of fragile accessor state
- Add a `chunks` argument to PhenoShape for out-of-core / chunked processing (the time axis is kept whole); remove the now-dead _getPheno2D / _parseLSP / _assemble helpers
- Verified numerically identical to the previous implementation (golden tests unchanged) and that the chunked Dask path matches eager exactly
- Add RMSE + Dask-equivalence tests, a benchmark (benchmarks/bench.py), and a performance docs page (chunking, schedulers, the GIL caveat)
Note: raw CPU speed-up for the pure-Python kernels needs a GIL-releasing Numba kernel (the planned [fast] extra); today the Dask path's win is memory / out-of-core scalability, not wall-clock on in-memory rasters.
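The `apply_ufunc` pattern moves the per-pixel loop into xarray: a 1-D kernel maps the `time` core dimension to a new `doy` dimension, and the same call runs eagerly or across Dask chunks. A sketch assuming linear interpolation only; `_linear_1d` and `pheno_shape_sketch` are hypothetical stand-ins for PhenoShape's internals.

```python
import numpy as np
import xarray as xr


def _linear_1d(values, doy, xnew):
    # One pixel: stable sort by DOY so duplicate DOYs keep their order.
    order = np.argsort(doy, kind="stable")
    return np.interp(xnew, doy[order], values[order])


def pheno_shape_sketch(da, xnew):
    """(time, y, x) DataArray -> (y, x, doy) reconstructed curves."""
    doy = da["time"].dt.dayofyear.values.astype(float)
    out = xr.apply_ufunc(
        _linear_1d,
        da,
        kwargs={"doy": doy, "xnew": xnew},
        input_core_dims=[["time"]],   # consumed by the kernel
        output_core_dims=[["doy"]],   # new dimension produced by the kernel
        vectorize=True,               # loop the 1-D kernel over all pixels
        dask="parallelized",          # map over chunks when da is dask-backed
        output_dtypes=[float],
        dask_gufunc_kwargs={"output_sizes": {"doy": len(xnew)}},
    )
    return out.assign_coords(doy=xnew)
```

With `dask="parallelized"`, a core dimension such as `time` must sit in a single chunk, which is why the time axis is kept whole when chunking.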
- Add phenopy/_numba.py with an @njit(parallel=True) kernel that reconstructs a (time, y, x) cube via linear interpolation + moving average in compiled, GIL-released code (with a pure-Python no-op fallback when numba is absent)
- PhenoShape auto-uses it when eligible (linear method, in-memory, no NaNs, no custom params). The DOY sort index is computed in NumPy and passed in so the tie-ordering of duplicate DOYs matches the pure-Python path exactly (important for multi-year pooling, e.g. the SIF sample)
- ~11x faster linear PhenoShape on an 8500-pixel x 920-step raster, with identical results: the golden tests pass with the numba path active, and an explicit numba-vs-pure-Python equivalence test is added
- Update benchmarks/bench.py and the performance docs
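The Numba path pairs a compiled kernel with a no-op fallback so the same source runs with or without numba installed. A sketch assuming a centred moving average that shrinks at the series edges (the library's edge handling may differ); `linear_ma_cube` is a hypothetical name. The sort index is computed outside the kernel with a stable NumPy argsort, mirroring the tie-ordering point above.

```python
import numpy as np

try:
    from numba import njit, prange
    HAVE_NUMBA = True
except ImportError:  # same source, just not compiled
    HAVE_NUMBA = False
    prange = range

    def njit(*args, **kwargs):
        # No-op stand-in supporting both @njit and @njit(parallel=True)
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda fn: fn


@njit(parallel=True)
def linear_ma_cube(cube, order, doy, xnew, window):
    """Linear interpolation + centred moving average for every pixel of a
    (time, y, x) cube; `order` is a precomputed DOY sort index."""
    nt, ny, nx = cube.shape
    nnew = xnew.shape[0]
    half = window // 2
    xs = doy[order]
    out = np.empty((nnew, ny, nx))
    for j in prange(ny):  # rows run on parallel threads when compiled
        ys = np.empty(nt)  # allocated per iteration, so private to each thread
        for i in range(nx):
            for k in range(nt):
                ys[k] = cube[order[k], j, i]
            interp = np.interp(xnew, xs, ys)
            for k in range(nnew):
                lo = max(0, k - half)
                hi = min(nnew, k + half + 1)
                acc = 0.0
                for m in range(lo, hi):
                    acc += interp[m]
                out[k, j, i] = acc / (hi - lo)
    return out
```

Usage: `order = np.argsort(doy, kind="stable")` is computed once in NumPy and passed in, so both the compiled and the fallback path see identical tie-ordering.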
- Add phenopy/anomaly.py with anomaly(): per-pixel deviation from the climatological PhenoShape, a standardised z-score, and an empirical RFD percentile (Reference Frequency Distribution, npphen-style). Values near 0 are extreme negative anomalies, near 1 extreme positive, ~0.5 typical
- KDE-capable climatology (interpolType="KDE" gives the fully non-parametric reference); works with any reconstruction method otherwise
- Export phenopy.anomaly; add tests (detects an injected dry year) and wire trends + anomalies into the docs (API + quickstart)
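The three anomaly outputs can be sketched for one pixel from a stack of reference years. This assumes the climatology is the per-DOY mean of the reference years (the library builds it from PhenoShape, optionally KDE-based) and uses a simple rank-based percentile rather than npphen's exact RFD procedure; `anomaly_scores` is a hypothetical name.

```python
import numpy as np


def anomaly_scores(reference, observed):
    """Per-DOY anomaly of one year against reference years.

    reference: (n_years, n_doy) values that define the climatology
    observed:  (n_doy,) values for the year being assessed
    Returns (deviation, z, rfd); rfd is the fraction of reference values
    <= the observation (0 = below all years, 1 = above all years).
    """
    reference = np.asarray(reference, dtype=float)
    observed = np.asarray(observed, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        clim = np.nanmean(reference, axis=0)            # climatological curve
        spread = np.nanstd(reference, axis=0, ddof=1)
        deviation = observed - clim
        z = deviation / spread
        n_valid = np.sum(np.isfinite(reference), axis=0)
        rfd = np.sum(reference <= observed, axis=0) / n_valid  # NaN compares False
    return deviation, z, rfd
```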
- Add phenopy/season.py with n_seasons(phenoshape): per-pixel count of growing cycles using scipy.signal.find_peaks with an amplitude-relative prominence threshold, so noise isn't counted as a season. Handles the doy dim/coord like get_curvature and is dask-aware
- Export phenopy.n_seasons; add tests (bimodal=2, unimodal=1, flat=0, and the time-dim+doy-coord input) and wire it into the docs (API + quickstart)
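Counting seasons with `scipy.signal.find_peaks` comes down to choosing a prominence relative to the curve's amplitude. A sketch for one curve; the 0.2 relative-prominence default and the function name are assumptions, not the defaults of `n_seasons`.

```python
import numpy as np
from scipy.signal import find_peaks


def n_seasons_1d(curve, rel_prominence=0.2, distance=None):
    """Count growing cycles in one smoothed phenology curve."""
    curve = np.asarray(curve, dtype=float)
    if not np.isfinite(curve).any():
        return 0
    amplitude = np.nanmax(curve) - np.nanmin(curve)
    if amplitude == 0:
        return 0  # flat curve: no season
    filled = np.where(np.isfinite(curve), curve, np.nanmin(curve))
    # Only peaks standing out by a fraction of the amplitude count as seasons,
    # so small noise wiggles are ignored.
    peaks, _ = find_peaks(filled, prominence=rel_prominence * amplitude,
                          distance=distance)
    return int(peaks.size)
```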
- Add phenopy/uncertainty.py with uncertainty(): a per-pixel bootstrap that resamples the observations with replacement, recomputes the 16 LSP metrics for each replicate, and returns their standard deviation (a non-parametric estimate of each metric's sampling uncertainty)
- Centralise the 16-band order as utils.LSP_BANDS (single source of truth; the Pheno accessor now references it); golden output is unchanged
- Export phenopy.uncertainty; add a test and wire it into the docs
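The bootstrap loop itself is short. A single-pixel sketch, where `metric_fn` is a hypothetical stand-in for the LSP computation (reconstruction plus extraction) that returns a 1-D array of metrics.

```python
import numpy as np


def bootstrap_std(doy, values, metric_fn, n_boot=200, seed=0):
    """Bootstrap standard deviation of per-pixel phenology metrics.

    Resamples the (doy, value) observations with replacement, recomputes
    the metrics on each replicate and returns their per-metric std.
    """
    rng = np.random.default_rng(seed)
    doy = np.asarray(doy, dtype=float)
    values = np.asarray(values, dtype=float)
    n = values.size
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample observations with replacement
        reps.append(metric_fn(doy[idx], values[idx]))
    return np.nanstd(np.asarray(reps, dtype=float), axis=0, ddof=1)
```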
- Add examples/tutorial.ipynb: an English, end-to-end tutorial exercising every public function (load_sample/list_*; all 8 reconstruction methods; all 4 extraction methods; PhenoShape/PhenoLSP/RMSE; get_curvature/classify_vector_numeric; n_seasons; reorder_southern_hemisphere; get_timeseries_metrics; trend; anomaly; uncertainty; chunking/Numba; and the PhenoPlot/plot_with_southern_doy/display_map helpers) on the bundled SIF/ndvi samples. Executed end-to-end with zero errors and plots embedded.
- A final section documents what the bundled data cannot cover and the data needed (large-scale/out-of-core, Northern-Hemisphere phenology, multi-cropping for NOS, cloud/QA-flagged data for robustness and the planned QA weighting, and an independent reference for anomaly skill).
- Add ipykernel + nbconvert to the dev env and the [docs] extra; link the tutorial from the README.
reorder_southern_hemisphere relabels `doy` as a day-of-season index (still 1-365, starting in July), so a plain plot keeps a 1-365 axis. The tutorial now shows a before/after (calendar DOY with the austral peak split at New Year vs the reordered, centred curve) and uses plot_with_southern_doy on the reordered shape to display the real Southern-Hemisphere calendar DOY (...183, 365, 1, 182).
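The relabelling is a modular shift of the DOY axis. A sketch assuming day-of-season 1 corresponds to calendar DOY 183, as the label sequence above implies, and a 365-day year (leap days ignored); both function names are hypothetical.

```python
import numpy as np


def calendar_to_season(doy):
    """Calendar DOY (1-365) -> day-of-season index (1-365) starting at DOY 183."""
    return (np.asarray(doy) - 183) % 365 + 1


def season_to_calendar(day_of_season):
    """Inverse mapping, e.g. to label plot ticks with real SH calendar DOY."""
    return (np.asarray(day_of_season) + 181) % 365 + 1
```

The two functions are exact inverses on 1-365, so a curve can be reordered for plotting and its ticks relabelled without losing any samples.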
… many years
- southern=True maps the x-axis to the Southern-Hemisphere day-of-year (austral season centred) and relabels ticks to the real SH calendar DOY
- the per-year legend is placed outside the plot, on the left, by default
- with many years (> many_years, default 8) the points use a continuous colormap (cmap, default "viridis") + a year colorbar instead of a large legend
- honour the nan_replace argument; save with bbox_inches="tight"
- add Agg-backend smoke tests; the tutorial's PhenoPlot cell now uses southern=True
_relabel_southern called set_xticks(get_xticks()); because the tick locator's values overshoot the data, the view expanded to ~(-100, 400), so the curve (day-of-season ~2-359) looked compressed and "not continuous" across the axis. Now the ticks are filtered to the current view and xlim is restored, so the curve fills a data-driven axis (~[-16, 377]) with correct SH calendar-DOY labels (e.g. 183, 283, 18, 118).
Enrich the 14 markdown sections of examples/tutorial.ipynb so each function documents what it does, its key parameters, and how to read its output:
- intro: add the 3 composable axes (reconstruction / extraction / temporal)
- PhenoLSP: table of the 16 LSP metrics + the 4 SOS/EOS extractors
- get_curvature: correct to a per-pixel scalar (summed 2nd differences), distinct from the curvature extractor
- n_seasons: prominence/distance/min_height and NOS>=2 interpretation
- PhenoShape, RMSE, get_timeseries_metrics, trend, anomaly, uncertainty, performance and plotting sections expanded
Markdown-only edits; executed code outputs are unchanged.
phenopy.qa.qa_to_weight decodes any xarray quality band into per-observation weights in [0, 1], covering the four common QA patterns -- remap, good_classes, bad_bits, max_value/min_value -- via a declarative dict, a built-in registry key (MOD13Q1, LANDSAT_C2, S2_SCL), or a custom callable for exotic sensors. It is sensor- and source-agnostic (works on EE, local rasters, NetCDF), so Earth Engine is never a dependency; missing QA is treated as weight 0. Exported from the package; 9 tests. These weights feed the upcoming weighted reconstruction.
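The four QA patterns reduce to a few vectorised NumPy operations. A sketch under assumptions: the spec keys mirror the names above, but their exact semantics here (for example, that unlisted `remap` values get weight 0, and the order in which rules combine) are illustrative, and `qa_to_weight_sketch` is not the real `qa_to_weight`.

```python
import numpy as np


def qa_to_weight_sketch(qa, spec):
    """Decode a QA array into weights in [0, 1] from a declarative dict.

    Keys (all optional, combined multiplicatively):
      remap        {qa_value: weight}; unlisted values get weight 0
      good_classes class codes that get weight 1 (all others 0)
      bad_bits     bit positions; any set bit gives weight 0
      max_value / min_value  inclusive bounds on the QA value itself
    Missing QA (NaN) always maps to weight 0.
    """
    qa = np.asarray(qa, dtype=float)
    valid = np.isfinite(qa)
    codes = np.where(valid, qa, 0).astype(np.int64)
    w = np.where(valid, 1.0, 0.0)
    if "remap" in spec:
        mapped = np.zeros_like(w)
        for value, weight in spec["remap"].items():
            mapped[codes == value] = weight
        w = w * mapped
    if "good_classes" in spec:
        w = w * np.isin(codes, list(spec["good_classes"]))
    if "bad_bits" in spec:
        mask = 0
        for bit in spec["bad_bits"]:
            mask |= 1 << bit
        w = w * ((codes & mask) == 0)
    if "max_value" in spec:
        w = w * (qa <= spec["max_value"])
    if "min_value" in spec:
        w = w * (qa >= spec["min_value"])
    return np.clip(w, 0.0, 1.0)
```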
- examples/earthengine_torres_del_paine.ipynb: pull cloud-contaminated NDVI plus the raw QA band from MODIS / Landsat / Sentinel-2 over Patagonia via xee, then decode QA with phenopy.qa. GEE is an example bridge only (gee_to_xarray), not a library dependency.
- pyproject: add optional [gee] extra (earthengine-api, xee, netCDF4).
- plotting: migrate display_map off the deprecated pyproj.transform/Proj API to Transformer.from_crs(always_xy=True); also handles projected CRS correctly.
- gitignore: ignore examples/data/ (locally cached EE downloads).
… + upper-envelope (wTSM/Chen)
Add a weights= path to axis-1 reconstruction so per-observation QA weights (e.g. from phenopy.qa) drive the fit, plus an iterative upper-envelope method:
- PhenoShape gains weights= (a (time, y, x) DataArray), threaded through _getPheno0/_getPheno to the reconstructor via a second apply_ufunc input. weights=None keeps the unweighted path byte-identical (golden tests unchanged).
- whittaker: with weights, collapse duplicate (pooled) DOYs to a weighted mean and smooth the irregular series with per-observation weights.
- new "upper_envelope" reconstructor (Chen 2004 / TIMESAT wTSM): iteratively lifts below-fit samples to the fitted curve, tracking the noise-free upper envelope without needing a QA band.
Validated on a synthetic downward-contaminated series and on the real cloudy Torres del Paine fixture, where both methods recover the vegetation signal that clouds otherwise drag down (e.g. Landsat peak NDVI 0.46 -> 0.93). 5 new tests.
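Both ideas fit in a few lines of NumPy: a weighted Whittaker smoother solves (W + lam * D'D) z = W y, where D is the second-difference operator, and the upper envelope repeatedly replaces below-fit samples by the fit. A sketch assuming an evenly spaced series with no duplicate DOYs and a dense solve; it uses Whittaker as the base smoother, whereas Chen et al. (2004) used Savitzky-Golay, and neither function is the library's implementation.

```python
import numpy as np


def whittaker(y, lam=10.0, weights=None):
    """Weighted Whittaker smoother with 2nd-order differences (dense solve).

    Solves (W + lam * D'D) z = W y; a zero weight ignores an observation.
    Fine for one pixel's series; long series would use sparse matrices.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    d = np.diff(np.eye(n), n=2, axis=0)  # (n-2, n) second-difference operator
    a = np.diag(w) + lam * d.T @ d
    return np.linalg.solve(a, w * y)


def upper_envelope(y, lam=10.0, n_iter=10, weights=None):
    """Lift samples below the current fit up to the fit and refit, so the
    smooth converges toward the upper envelope of the series."""
    work = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        fit = whittaker(work, lam=lam, weights=weights)
        work = np.maximum(work, fit)  # only downward outliers move
    return whittaker(work, lam=lam, weights=weights)
```

On a curve where every fifth sample is dragged down by clouds, both the weighted fit (zero weight on flagged samples) and the envelope land much closer to the true peak than the plain fit.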
Now that weighted reconstruction is implemented, §6 compares unweighted vs QA-weighted Whittaker vs upper-envelope (wTSM/Chen) on a cloudy pixel, §8 is a recap, and §7 writes the cache to examples/data/ robustly whether the notebook's CWD is the repo root (VS Code) or the examples/ folder (Jupyter Lab).
In RMSE(segment=True) the helper grid was chunked with computed_stack.chunks, which is None for in-memory inputs, and calling .chunk(None) is deprecated. The grid is now chunked only when the input is actually dask-backed, matching its chunking via .chunksizes.
Extend the per-pixel LSP output from 16 to 18 metrics:
- trough: the curve's minimum (baseline) value, in VI units
- mos: middle of season, the DOY midpoint between SOS and EOS (NaN when LOS is NaN)
Both are appended to LSP_BANDS and the _getLSPmetrics2 array, so the existing 16 metrics keep their position and values; both are also wired into get_timeseries_metrics. Golden fixtures regenerated (18 metrics). 55 tests pass, ruff clean.
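Both new metrics are simple functions of the curve and of SOS/EOS. A sketch assuming LOS = EOS - SOS with no season wrapping past New Year; `trough_and_mos` is a hypothetical helper.

```python
import numpy as np


def trough_and_mos(curve, sos, eos):
    """trough: minimum of the reconstructed curve (VI units).
    mos: DOY midpoint between SOS and EOS; NaN whenever LOS is NaN."""
    curve = np.asarray(curve, dtype=float)
    trough = float(np.nanmin(curve)) if np.isfinite(curve).any() else np.nan
    los = eos - sos
    mos = sos + los / 2.0 if np.isfinite(los) else np.nan
    return trough, mos
```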
Add trough/mos to the §3 metric table and re-execute the notebook so the PhenoLSP output (and all downstream cells) reflect the 18-metric result.
…nce guide
- benchmarks/bench.py: add an extraction benchmark (PhenoLSP eager vs chunked + Dask 'processes'), alongside the existing linear-reconstruction Numba vs pure-Python comparison; both assert parallel == serial.
- docs/performance.md: a "which knob when" guide; document Dask scaling honestly (processes-scheduler start-up overhead -> modest gains on small in-memory rasters, real value is out-of-core), and the design note on why extraction has no Numba kernel (one reference implementation + Dask, rather than a second byte-identical copy).
'phenopy' is taken on PyPI by an unrelated clinical-phenotyping tool, so the library is renamed to PhenoSensing (Phenology + remote Sensing):
- phenopy/ -> phenosensing/ (import phenosensing); the xarray accessor stays da.pheno
- accessor module phenopy.py -> accessor.py (removes the phenopy.phenopy shadow)
- update pyproject (name/paths/extras), README, docs, mkdocs, CI (--cov), environment-dev.yml (env name), benchmarks, every test, and both example notebooks (tutorial re-executed under the new name)
- stop tracking stray __pycache__ / .ipynb_checkpoints that had been committed
Behaviour unchanged: 55 tests pass, ruff clean, golden byte-identical.