diff --git a/.claude/sweep-metadata-state.csv b/.claude/sweep-metadata-state.csv
index 3704566f7..75aa8f71b 100644
--- a/.claude/sweep-metadata-state.csv
+++ b/.claude/sweep-metadata-state.csv
@@ -2,6 +2,7 @@ module,last_inspected,issue,severity_max,categories_found,notes
 aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings."
classify,2026-06-25,3508,MEDIUM,4;5,"Audited 2026-06-25 (agent-a5f16f6137723fc77 worktree, branch deep-sweep-metadata-classify-2026-06-25). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. All 10 public classifiers (binary/reclassify/quantile/natural_breaks/equal_interval/std_mean/head_tail_breaks/percentiles/maximum_breaks/box_plot) re-emit name=, dims=agg.dims, coords=agg.coords, attrs=agg.attrs at the xr.DataArray constructor, so Cat 1 attrs (res/crs/transform/nodatavals), Cat 2 coords (values+dtype), Cat 3 dims, and .name all preserved and identical across the 4 backends. NEW MEDIUM finding #3508 (Cat 4 + Cat 5): binary() output dtype differed by backend -- _cpu_binary allocated dtype=data.dtype so numpy/dask+numpy returned the input dtype while _run_cupy_binary used dtype='f4'. The docstring documents float32 and every other classifier emits float32 via _bin/_cpu_bin; binary was the only outlier, and for integer input the numpy path returned an integer dtype that can't hold the NaN sentinel. The _cpu_binary float32 fix + verify_dtype=True backend tests + a float64/float32/int32 dtype test landed on main via the duplicate accuracy-sweep PR #3514; PR #3513 was then rebased onto that and is now scoped to the remaining piece: _run_dask_cupy_binary passed an untyped meta=cupy.array(()) (float64) so the lazy dask+cupy array advertised float64 while computing float32 -- the same advertised-vs-computed mismatch class as aspect #2682 / focal #3217. #3513 types the meta as cupy.array((), dtype='f4') and asserts the lazy dtype in test_binary_dask_cupy. Full classify suite passes, GPU paths run live. The sibling classifiers' dask+cupy helpers (_run_dask_cupy_bin and friends) share the same untyped meta and likely the same latent lazy-dtype mismatch (out of scope, follow-up). Cat 4 nodatavals-vs-NaN is the library-wide attrs=agg.attrs convention, not classify-specific (documented, not fixed). No CRITICAL/HIGH/LOW findings." 
contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)." 
+convolution,2026-07-02,3618,MEDIUM,4;5,"convolution_2d preserves name/attrs(crs,res,transform,nodatavals,_FillValue)/coords/dims on all 4 backends; MEDIUM: dask backends declared float64 via meta=np.array(()) while numpy/cupy return float32 for int/float32 input (declared dtype also != computed float32 chunks); fixed by passing promoted dtype into meta; issue #3618"
 corridor,2026-06-22,3446,HIGH,1;5,"Audited 2026-06-22 (agent-a8b2674b815bdfa3f worktree, branch deep-sweep-metadata-corridor-2026-06-22). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for least_cost_corridor across single/threshold/relative/unreachable/pairwise paths. Cat 2 coords (x/y values + float64 dtype) and Cat 3 dims (y,x) preserved on every backend: they flow through cost_distance (coords=raster.coords, dims=raster.dims) and survive xarray's binary intersection. NEW HIGH finding #3446 (Cat 1 + Cat 5): the corridor is cd_a + cd_b where each cost-distance surface carries its SOURCE raster's attrs (cost_distance copies attrs from the source, not friction). xarray's default keep_attrs on binary + keeps only attrs present-and-equal in both operands, so when the source masks are plain marker rasters with no geo-attrs (the common case) the corridor came back with attrs=={} even though the friction surface that defines the grid had res/crs/transform/nodatavals; a downstream slope/clip on the corridor silently lost cellsize/CRS. Secondary Cat 5: .name was None whenever the two sources had different names (cost_distance renames each surface to its source .name; summing differently-named arrays drops the name). Fix (PR on this branch): non-precomputed path re-emits friction.attrs + friction.name on every output via new _apply_geo_metadata helper (single, threshold, all-NaN-unreachable, and pairwise-Dataset paths); precomputed path left on the existing source-derived behaviour since there is no friction to draw from. Only .attrs/.name set -- data values, coords, dims, dtype untouched, dask stays lazy (no compute). 10 new tests (test_corridor_inherits_friction_geo_attrs x4 backends, test_corridor_threshold_keeps_geo_attrs x4 backends, test_corridor_unreachable_keeps_geo_attrs, test_pairwise_inherits_friction_geo_attrs, test_precomputed_keeps_source_attrs_not_friction). Full corridor suite 43 passed. Cat 4 N/A: NaN-as-nodata is the library convention; corridor never reads attrs['nodatavals'] for masking. No CRITICAL/MEDIUM/LOW findings."
 cost_distance,2026-06-15,3344,MEDIUM,5,"Audited 2026-06-15 (agent-ad0b84e7f7b212360 worktree, branch deep-sweep-metadata-cost_distance-2026-06-15). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end with a rich attrs set (res/crs/transform/nodatavals/_FillValue/units). Cat 1 attrs, Cat 2 coords (values + float64 dtype), and Cat 3 dims (y,x) all preserved and identical across the 4 backends -- public cost_distance() wraps with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #3344 (Cat 5): the dask+numpy and dask+cupy backends leaked the internal dask graph name (_trim- from map_overlap, asarray- from the dask+cupy convert-back path) into result.name while numpy/cupy returned None; .name was a nondeterministic per-run token that breaks .to_dataset() variable keys and any name-keyed pipeline. Same .name-leak class as proximity #2723 and zonal #2611. Fix (PR #3349 on this branch): return result.rename(raster.name) -- a constructor name= kwarg does not override a named dask array, and name=None is treated as infer-from-data, so .rename() is required. supports_dataset path unaffected (keys by var_name, verified live). New parametrized regression test test_result_name_matches_input over 4 backends x {None, named}; full cost_distance suite 63 passed (post-merge with origin/main). LOW (documented, not fixed): output float32 uses NaN as the unreachable sentinel but input nodatavals/_FillValue (e.g. -9999) are carried through verbatim, so a downstream reader masks a value that never appears -- this is the library-wide attrs=raster.attrs convention shared by proximity/slope/aspect/focal, not a cost_distance-specific bug, so fixing it in isolation would diverge this module from every peer. No CRITICAL/HIGH findings."
 focal,2026-06-10,3217,MEDIUM,4;5,"Re-audited 2026-06-10 (agent-ad0d55a894c6abc60 worktree, branch deep-sweep-metadata-focal-2026-06-10). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for mean, apply, focal_stats, hotspots. Cats 1-3 clean: attrs (res/crs/nodatavals/_FillValue/unit), coords (values, dtype, coord attrs), dims, .name, 3D per-band path, and hotspots unit=% all preserved and identical across the 4 backends. NEW MEDIUM finding #3217 (Cat 4 + Cat 5): (a) mean() hardcoded float32 on the GPU paths (_mean_cupy cupy.asarray(dtype=float32), _mean_dask_cupy astype(float32)) while numpy/dask+numpy returned float64 (mean() casts astype(float) before dispatch), so float64 input silently lost precision on cupy/dask+cupy; dask+cupy also advertised float64 (untyped meta) but computed float32. (b) apply()/focal_stats() dask paths passed untyped meta (np.array(()) / cupy.array(())) to map_overlap, so for float32/int input the lazy DataArray advertised float64 but computed the promoted float32 (#2805 typed the chunk fns but not the meta). Same class as aspect #2682 and proximity #2723. Fix: the mean() GPU dtype half landed on main first via duplicate issue #3214/PR #3221 (_promote_float contract: float dtypes preserved, ints->float32, GPU bit-exact vs CPU in float64); PR #3226 (branch deep-sweep-metadata-focal-2026-06-10-01) types every map_overlap meta with data.dtype and aligns tests to the _promote_float contract; 25 new parametrized regression tests (4 backends x 3 dtypes mean; dask backends x 3 dtypes apply/focal_stats; exact CPU/GPU parity). Full focal suite 258 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings."
diff --git a/xrspatial/convolution.py b/xrspatial/convolution.py
index b1b08bfb7..f14bc9be3 100644
--- a/xrspatial/convolution.py
+++ b/xrspatial/convolution.py
@@ -390,14 +390,15 @@ def _convolve_2d_numpy_boundary(data, kernel, boundary='nan'):
 
 
 def _convolve_2d_dask_numpy(data, kernel, boundary='nan'):
-    data = data.astype(_promote_float(data.dtype))
+    fdtype = _promote_float(data.dtype)
+    data = data.astype(fdtype)
     pad_h = kernel.shape[0] // 2
     pad_w = kernel.shape[1] // 2
     _func = partial(_convolve_2d_numpy, kernel=kernel)
     out = data.map_overlap(_func,
                            depth=(pad_h, pad_w),
                            boundary=_boundary_to_dask(boundary),
-                           meta=np.array(()),
+                           meta=np.array((), dtype=fdtype),
                            **_dask_task_name_kwargs('xrspatial.convolve_2d'))
     return out
 
@@ -465,14 +466,15 @@ def _convolve_2d_cupy(data, kernel, boundary='nan'):
 
 
 def _convolve_2d_dask_cupy(data, kernel, boundary='nan'):
-    data = data.astype(_promote_float(data.dtype))
+    fdtype = _promote_float(data.dtype)
+    data = data.astype(fdtype)
     pad_h = kernel.shape[0] // 2
     pad_w = kernel.shape[1] // 2
     _func = partial(_convolve_2d_cupy, kernel=kernel)
     out = data.map_overlap(_func,
                            depth=(pad_h, pad_w),
                            boundary=_boundary_to_dask(boundary, is_cupy=True),
-                           meta=cupy.array(()),
+                           meta=cupy.array((), dtype=fdtype),
                            **_dask_task_name_kwargs('xrspatial.convolve_2d'))
     return out
 
diff --git a/xrspatial/tests/test_convolution.py b/xrspatial/tests/test_convolution.py
index dff4af4a7..e6d226f02 100644
--- a/xrspatial/tests/test_convolution.py
+++ b/xrspatial/tests/test_convolution.py
@@ -1,13 +1,27 @@
+import dask.array as da
 import numpy as np
 import pytest
 import xarray as xr
 
 from xrspatial.convolution import circle_kernel, convolve_2d, custom_kernel
 
-
 KERNEL = circle_kernel(1, 1, 1)
 
 
+@pytest.mark.parametrize("dtype", [np.int32, np.float32, np.float64])
+def test_convolve_2d_dask_dtype_matches_numpy(dtype):
+    # The dask backend used to advertise float64 via ``meta=np.array(())``
+    # even when the eager numpy backend promoted an int32/float32 raster to
+    # float32, so ``result.dtype`` disagreed across backends and the lazy
+    # dtype did not match the computed chunks. The declared dask dtype must
+    # equal both the eager dtype and the actually-computed chunk dtype.
+    data = np.arange(64, dtype=dtype).reshape(8, 8)
+    eager = convolve_2d(data, KERNEL)
+    lazy = convolve_2d(da.from_array(data, chunks=(4, 4)), KERNEL)
+    assert lazy.dtype == eager.dtype
+    assert lazy.compute().dtype == eager.dtype
+
+
 def test_convolve_2d_rejects_boolean_dtype():
     # Boolean DataArrays used to crash deep inside numba with a
     # cryptic TypingError; _validate_raster should reject up front.