@blackencino commented Feb 11, 2026

Part of #458 — stacked on #463

Review note: This PR is stacked on PR1 (#463). The diff includes PR1's changes since both target main.
After PR1 merges, this PR will be rebased and the diff will show only PR2 changes.

Summary

  • Ported 6 additional ops to dispatch_table + new dispatch tools (14 dispatch tables total counting fwd+bwd):
  • ActiveVoxelsInBoundsMask — forEachActiveVoxel, device-only dispatch table
    • CoarseIjkForFineGrid — forEachActiveVoxel, device-only dispatch table (gains CPU support)
    • CoordsInGrid — dispatch::for_each + jagged_in (device x int_stype x contiguity)
    • IjkToInvIndex — dispatch::for_each + jagged_in (device x int_stype x contiguity)
    • DownsampleGridAvgPool (fwd+bwd) — forEachActiveVoxel over coarse grid with dual-grid accessor (device x float_stype x contiguity)
    • GridEdgeNetwork — forEachActiveVoxel, device-only, multi-output tensors
  • Removed FVDB_DISPATCH_KERNEL_DEVICE from 9 call sites:
    • GridBatch.cpp (3): coordsInGrid, ijkToInvIndex, gridEdgeNetwork
    • GridBatchImpl.cu (1): activeVoxelsInBoundsMask
    • AvgPoolGrid.cpp (2): forward + backward
    • BuildCoarseGridFromFine.cu (2): CUDA + PrivateUse1 specializations
    • BuildGridForConv.cu (1): conv ijk shortcut path
  • All op headers are now type-erased (no template <torch::DeviceType>)
  • All precondition checks now live inside the ops; the GridBatch calls to the ops are thin wrappers.

New patterns established

  • Device-only dispatch tables (1 axis) — ActiveVoxelsInBoundsMask, CoarseIjkForFineGrid, GridEdgeNetwork
  • forEachJaggedElementChannel* proven unnecessary — CoordsInGrid and IjkToInvIndex use for_each + jagged_in directly (old code used forEachJaggedElementChannelCUDA/CPU with numChannels=1)
  • Dual-grid voxel iteration — DownsampleGridAvgPool iterates the coarse grid, reads the fine grid via a separate accessor (mirrors UpsampleGridNearest)
  • Multi-output from forEachActiveVoxel — GridEdgeNetwork writes to 4 output tensors per voxel

Iteration pattern notes

  • DownsampleGridAvgPool replaces per-(voxel, channel) GPU thread parallelism with a per-voxel sequential channel loop (matching UpsampleGridNearest). This trades per-channel parallelism for fewer redundant NanoVDB tree traversals in the pooling window. See comments in the op.
  • ActiveVoxelsInBoundsMask loses a leaf-level hasOverlap(leaf.bbox()) early-out that the old code had. The old check was per-thread (not cooperative), and forEachActiveVoxel does not expose the leaf reference. Documented in comments; a future forEachLeaf-based variant could restore it.
  • All other ops have identical GPU memory access patterns to the old code.

Test plan

  • All gtests pass
  • Python tests pass
  • No old macros (FVDB_DISPATCH_KERNEL, AT_DISPATCH_V2) in any new/modified code

@blackencino blackencino requested a review from a team as a code owner February 11, 2026 07:42
Signed-off-by: Christopher Horvath <[email protected]>