1 change: 1 addition & 0 deletions .claude/sweep-metadata-state.csv
@@ -2,6 +2,7 @@ module,last_inspected,issue,severity_max,categories_found,notes
aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings."
classify,2026-06-25,3508,MEDIUM,4;5,"Audited 2026-06-25 (agent-a5f16f6137723fc77 worktree, branch deep-sweep-metadata-classify-2026-06-25). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. All 10 public classifiers (binary/reclassify/quantile/natural_breaks/equal_interval/std_mean/head_tail_breaks/percentiles/maximum_breaks/box_plot) re-emit name=, dims=agg.dims, coords=agg.coords, attrs=agg.attrs at the xr.DataArray constructor, so Cat 1 attrs (res/crs/transform/nodatavals), Cat 2 coords (values+dtype), Cat 3 dims, and .name all preserved and identical across the 4 backends. NEW MEDIUM finding #3508 (Cat 4 + Cat 5): binary() output dtype differed by backend -- _cpu_binary allocated dtype=data.dtype so numpy/dask+numpy returned the input dtype while _run_cupy_binary used dtype='f4'. The docstring documents float32 and every other classifier emits float32 via _bin/_cpu_bin; binary was the only outlier, and for integer input the numpy path returned an integer dtype that can't hold the NaN sentinel. The _cpu_binary float32 fix + verify_dtype=True backend tests + a float64/float32/int32 dtype test landed on main via the duplicate accuracy-sweep PR #3514; PR #3513 was then rebased onto that and is now scoped to the remaining piece: _run_dask_cupy_binary passed an untyped meta=cupy.array(()) (float64) so the lazy dask+cupy array advertised float64 while computing float32 -- the same advertised-vs-computed mismatch class as aspect #2682 / focal #3217. #3513 types the meta as cupy.array((), dtype='f4') and asserts the lazy dtype in test_binary_dask_cupy. Full classify suite passes, GPU paths run live. The sibling classifiers' dask+cupy helpers (_run_dask_cupy_bin and friends) share the same untyped meta and likely the same latent lazy-dtype mismatch (out of scope, follow-up). Cat 4 nodatavals-vs-NaN is the library-wide attrs=agg.attrs convention, not classify-specific (documented, not fixed). No CRITICAL/HIGH/LOW findings."
contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)."
+convolution,2026-07-02,3618,MEDIUM,4;5,"convolution_2d preserves name/attrs(crs,res,transform,nodatavals,_FillValue)/coords/dims on all 4 backends; MEDIUM: dask backends declared float64 via meta=np.array(()) while numpy/cupy return float32 for int/float32 input (declared dtype also != computed float32 chunks); fixed by passing promoted dtype into meta; issue #3618"
corridor,2026-06-22,3446,HIGH,1;5,"Audited 2026-06-22 (agent-a8b2674b815bdfa3f worktree, branch deep-sweep-metadata-corridor-2026-06-22). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for least_cost_corridor across single/threshold/relative/unreachable/pairwise paths. Cat 2 coords (x/y values + float64 dtype) and Cat 3 dims (y,x) preserved on every backend: they flow through cost_distance (coords=raster.coords, dims=raster.dims) and survive xarray's binary intersection. NEW HIGH finding #3446 (Cat 1 + Cat 5): the corridor is cd_a + cd_b where each cost-distance surface carries its SOURCE raster's attrs (cost_distance copies attrs from the source, not friction). xarray's default keep_attrs on binary + keeps only attrs present-and-equal in both operands, so when the source masks are plain marker rasters with no geo-attrs (the common case) the corridor came back with attrs=={} even though the friction surface that defines the grid had res/crs/transform/nodatavals; a downstream slope/clip on the corridor silently lost cellsize/CRS. Secondary Cat 5: .name was None whenever the two sources had different names (cost_distance renames each surface to its source .name; summing differently-named arrays drops the name). Fix (PR on this branch): non-precomputed path re-emits friction.attrs + friction.name on every output via new _apply_geo_metadata helper (single, threshold, all-NaN-unreachable, and pairwise-Dataset paths); precomputed path left on the existing source-derived behaviour since there is no friction to draw from. Only .attrs/.name set -- data values, coords, dims, dtype untouched, dask stays lazy (no compute). 10 new tests (test_corridor_inherits_friction_geo_attrs x4 backends, test_corridor_threshold_keeps_geo_attrs x4 backends, test_corridor_unreachable_keeps_geo_attrs, test_pairwise_inherits_friction_geo_attrs, test_precomputed_keeps_source_attrs_not_friction). Full corridor suite 43 passed. Cat 4 N/A: NaN-as-nodata is the library convention; corridor never reads attrs['nodatavals'] for masking. No CRITICAL/MEDIUM/LOW findings."
cost_distance,2026-06-15,3344,MEDIUM,5,"Audited 2026-06-15 (agent-ad0b84e7f7b212360 worktree, branch deep-sweep-metadata-cost_distance-2026-06-15). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end with a rich attrs set (res/crs/transform/nodatavals/_FillValue/units). Cat 1 attrs, Cat 2 coords (values + float64 dtype), and Cat 3 dims (y,x) all preserved and identical across the 4 backends -- public cost_distance() wraps with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #3344 (Cat 5): the dask+numpy and dask+cupy backends leaked the internal dask graph name (_trim-<hash> from map_overlap, asarray-<hash> from the dask+cupy convert-back path) into result.name while numpy/cupy returned None; .name was a nondeterministic per-run token that breaks .to_dataset() variable keys and any name-keyed pipeline. Same .name-leak class as proximity #2723 and zonal #2611. Fix (PR #3349 on this branch): return result.rename(raster.name) -- a constructor name= kwarg does not override a named dask array, and name=None is treated as infer-from-data, so .rename() is required. supports_dataset path unaffected (keys by var_name, verified live). New parametrized regression test test_result_name_matches_input over 4 backends x {None, named}; full cost_distance suite 63 passed (post-merge with origin/main). LOW (documented, not fixed): output float32 uses NaN as the unreachable sentinel but input nodatavals/_FillValue (e.g. -9999) are carried through verbatim, so a downstream reader masks a value that never appears -- this is the library-wide attrs=raster.attrs convention shared by proximity/slope/aspect/focal, not a cost_distance-specific bug, so fixing it in isolation would diverge this module from every peer. No CRITICAL/HIGH findings."
focal,2026-06-10,3217,MEDIUM,4;5,"Re-audited 2026-06-10 (agent-ad0d55a894c6abc60 worktree, branch deep-sweep-metadata-focal-2026-06-10). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for mean, apply, focal_stats, hotspots. Cats 1-3 clean: attrs (res/crs/nodatavals/_FillValue/unit), coords (values, dtype, coord attrs), dims, .name, 3D per-band path, and hotspots unit=% all preserved and identical across the 4 backends. NEW MEDIUM finding #3217 (Cat 4 + Cat 5): (a) mean() hardcoded float32 on the GPU paths (_mean_cupy cupy.asarray(dtype=float32), _mean_dask_cupy astype(float32)) while numpy/dask+numpy returned float64 (mean() casts astype(float) before dispatch), so float64 input silently lost precision on cupy/dask+cupy; dask+cupy also advertised float64 (untyped meta) but computed float32. (b) apply()/focal_stats() dask paths passed untyped meta (np.array(()) / cupy.array(())) to map_overlap, so for float32/int input the lazy DataArray advertised float64 but computed the promoted float32 (#2805 typed the chunk fns but not the meta). Same class as aspect #2682 and proximity #2723. Fix: the mean() GPU dtype half landed on main first via duplicate issue #3214/PR #3221 (_promote_float contract: float dtypes preserved, ints->float32, GPU bit-exact vs CPU in float64); PR #3226 (branch deep-sweep-metadata-focal-2026-06-10-01) types every map_overlap meta with data.dtype and aligns tests to the _promote_float contract; 25 new parametrized regression tests (4 backends x 3 dtypes mean; dask backends x 3 dtypes apply/focal_stats; exact CPU/GPU parity). Full focal suite 258 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings."
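Several rows above describe the same class of finding: a dask `meta` built without a dtype declares float64 while the chunk function returns float32. The sketch below reproduces only the NumPy half of that mismatch. `chunk_fn` is a hypothetical stand-in for a float32 chunk function, not code from the repository, and no dask graph is built here.

```python
import numpy as np

# An empty array created without a dtype falls back to NumPy's default, float64.
untyped_meta = np.array(())


def chunk_fn(block):
    """Hypothetical chunk function that computes in float32."""
    return block.astype(np.float32) * 2


computed = chunk_fn(np.arange(4, dtype=np.int32))

advertised_dtype = untyped_meta.dtype  # what an untyped meta would declare
computed_dtype = computed.dtype        # what the chunk function really returns

print(advertised_dtype, computed_dtype)
```

If the untyped array is used as the lazy array's `meta`, the two dtypes disagree until the meta is given an explicit dtype.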
10 changes: 6 additions & 4 deletions xrspatial/convolution.py
@@ -390,14 +390,15 @@ def _convolve_2d_numpy_boundary(data, kernel, boundary='nan'):


 def _convolve_2d_dask_numpy(data, kernel, boundary='nan'):
-    data = data.astype(_promote_float(data.dtype))
+    fdtype = _promote_float(data.dtype)
+    data = data.astype(fdtype)
     pad_h = kernel.shape[0] // 2
     pad_w = kernel.shape[1] // 2
     _func = partial(_convolve_2d_numpy, kernel=kernel)
     out = data.map_overlap(_func,
                            depth=(pad_h, pad_w),
                            boundary=_boundary_to_dask(boundary),
-                           meta=np.array(()),
+                           meta=np.array((), dtype=fdtype),
                            **_dask_task_name_kwargs('xrspatial.convolve_2d'))
     return out

@@ -465,14 +466,15 @@ def _convolve_2d_cupy(data, kernel, boundary='nan'):


 def _convolve_2d_dask_cupy(data, kernel, boundary='nan'):
-    data = data.astype(_promote_float(data.dtype))
+    fdtype = _promote_float(data.dtype)
+    data = data.astype(fdtype)
     pad_h = kernel.shape[0] // 2
     pad_w = kernel.shape[1] // 2
     _func = partial(_convolve_2d_cupy, kernel=kernel)
     out = data.map_overlap(_func,
                            depth=(pad_h, pad_w),
                            boundary=_boundary_to_dask(boundary, is_cupy=True),
-                           meta=cupy.array(()),
+                           meta=cupy.array((), dtype=fdtype),
                            **_dask_task_name_kwargs('xrspatial.convolve_2d'))
     return out
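Both hunks apply the same pattern: compute the promoted dtype once, then use it for the cast and for the `meta`. A minimal NumPy-only sketch of that pattern follows. `promote_float_sketch` and `typed_meta_sketch` are hypothetical names; the promotion rule (floating dtypes preserved, everything else mapped to float32) is an assumption based on the `_promote_float` contract quoted in the focal row of the CSV, not a copy of the real helper.

```python
import numpy as np


def promote_float_sketch(dtype):
    """Hypothetical stand-in for _promote_float (assumed rule, see lead-in)."""
    dtype = np.dtype(dtype)
    # Assumption: floating dtypes pass through, all other dtypes become float32.
    if np.issubdtype(dtype, np.floating):
        return dtype
    return np.dtype(np.float32)


def typed_meta_sketch(data):
    """Cast the data and build an empty meta from one shared dtype."""
    fdtype = promote_float_sketch(data.dtype)
    promoted = data.astype(fdtype)
    meta = np.array((), dtype=fdtype)  # the meta now carries the promoted dtype
    return promoted, meta


for dt in (np.int32, np.float32, np.float64):
    promoted, meta = typed_meta_sketch(np.arange(4, dtype=dt))
    # The declared (meta) dtype and the data dtype can no longer drift apart.
    assert promoted.dtype == meta.dtype
```

Deriving both values from a single `fdtype` variable means a later change to the promotion rule cannot reintroduce the mismatch in only one place.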

16 changes: 15 additions & 1 deletion xrspatial/tests/test_convolution.py
@@ -1,13 +1,27 @@
import dask.array as da
import numpy as np
import pytest
import xarray as xr

from xrspatial.convolution import circle_kernel, convolve_2d, custom_kernel


KERNEL = circle_kernel(1, 1, 1)


@pytest.mark.parametrize("dtype", [np.int32, np.float32, np.float64])
def test_convolve_2d_dask_dtype_matches_numpy(dtype):
    # The dask backend used to advertise float64 via ``meta=np.array(())``
    # even when the eager numpy backend promoted an int32/float32 raster to
    # float32, so ``result.dtype`` disagreed across backends and the lazy
    # dtype did not match the computed chunks. The declared dask dtype must
    # equal both the eager dtype and the actually-computed chunk dtype.
    data = np.arange(64, dtype=dtype).reshape(8, 8)
    eager = convolve_2d(data, KERNEL)
    lazy = convolve_2d(da.from_array(data, chunks=(4, 4)), KERNEL)
    assert lazy.dtype == eager.dtype
    assert lazy.compute().dtype == eager.dtype


def test_convolve_2d_rejects_boolean_dtype():
    # Boolean DataArrays used to crash deep inside numba with a
    # cryptic TypingError; _validate_raster should reject up front.