Dispatch preprocessing: port 6 more ops (bounds mask, coarse ijk, coords, inv index, avg pool, edge network) #464
Part of #458 — stacked on #463
Summary
- dispatch_table + new dispatch tools (14 dispatch tables total, counting fwd+bwd):
  - ActiveVoxelsInBoundsMask: forEachActiveVoxel, device-only dispatch table
  - CoarseIjkForFineGrid: forEachActiveVoxel, device-only dispatch table (gains CPU support)
  - CoordsInGrid: dispatch::for_each + jagged_in (device x int_stype x contiguity)
  - IjkToInvIndex: dispatch::for_each + jagged_in (device x int_stype x contiguity)
  - DownsampleGridAvgPool (fwd+bwd): forEachActiveVoxel over coarse grid with dual-grid accessor (device x float_stype x contiguity)
  - GridEdgeNetwork: forEachActiveVoxel, device-only, multi-output tensors
- FVDB_DISPATCH_KERNEL_DEVICE from 9 call sites:
  - GridBatch.cpp (3): coordsInGrid, ijkToInvIndex, gridEdgeNetwork
  - GridBatchImpl.cu (1): activeVoxelsInBoundsMask
  - AvgPoolGrid.cpp (2): forward + backward
  - BuildCoarseGridFromFine.cu (2): CUDA + PrivateUse1 specializations
  - BuildGridForConv.cu (1): conv ijk shortcut path
- template <torch::DeviceType>) GridBatch calls to ops are thin.

New patterns established
- ActiveVoxelsInBoundsMask, CoarseIjkForFineGrid, GridEdgeNetwork
- forEachJaggedElementChannel* proven unnecessary: CoordsInGrid and IjkToInvIndex use for_each + jagged_in directly (old code used forEachJaggedElementChannelCUDA/CPU with numChannels=1)
- DownsampleGridAvgPool iterates the coarse grid and reads the fine grid via a separate accessor (mirrors UpsampleGridNearest)
- forEachActiveVoxel: GridEdgeNetwork writes to 4 output tensors per voxel

Iteration pattern notes
- DownsampleGridAvgPool replaces per-(voxel, channel) GPU thread parallelism with a per-voxel sequential channel loop (matching UpsampleGridNearest). This trades per-channel parallelism for fewer redundant NanoVDB tree traversals in the pooling window. See comments in the op.
- ActiveVoxelsInBoundsMask loses a leaf-level hasOverlap(leaf.bbox()) early-out that the old code had. The old check was per-thread (not cooperative), and forEachActiveVoxel does not expose the leaf reference. Documented in comments; a future forEachLeaf-based variant could restore it.

Test plan
- FVDB_DISPATCH_KERNEL, AT_DISPATCH_V2) in any new/modified code
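
Illustration for the DownsampleGridAvgPool iteration note: a minimal host-side sketch of why a per-voxel sequential channel loop needs fewer lookups than per-(voxel, channel) work items. Everything here is hypothetical and is not the code in this PR: `ToyFineGrid` is a hash map standing in for the NanoVDB accessor, `avgPoolVoxel` and `avgPoolVoxelChannel` are illustrative names, coordinates are assumed to be non-negative and below 2^20, and dividing by the full window size (rather than the number of active fine voxels) is an assumption of the sketch.

```cpp
#include <array>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a sparse fine grid: maps ijk to a row index in a
// (numVoxels x numChannels) row-major feature buffer. Not the NanoVDB API.
struct ToyFineGrid {
    std::unordered_map<long long, int> index;
    mutable long lookups = 0; // counts simulated "tree traversals"

    // Packs ijk into one key; assumes 0 <= i, j, k < 2^20.
    static long long key(int i, int j, int k) {
        return (static_cast<long long>(i) << 40) |
               (static_cast<long long>(j) << 20) |
               static_cast<long long>(k);
    }
    void insert(int i, int j, int k, int row) { index[key(i, j, k)] = row; }

    // Returns the feature row of an active voxel, or -1 if inactive.
    int find(int i, int j, int k) const {
        ++lookups;
        auto it = index.find(key(i, j, k));
        return it == index.end() ? -1 : it->second;
    }
};

// Per-voxel pattern: one lookup per fine voxel in the pooling window, then a
// sequential loop over channels that reuses the looked-up row.
void avgPoolVoxel(const ToyFineGrid &fine, const std::vector<float> &fineData,
                  int numChannels, int factor, std::array<int, 3> coarseIjk,
                  float *out) {
    assert(numChannels > 0 && factor > 0);
    for (int c = 0; c < numChannels; ++c) out[c] = 0.0f;
    for (int di = 0; di < factor; ++di)
        for (int dj = 0; dj < factor; ++dj)
            for (int dk = 0; dk < factor; ++dk) {
                const int row = fine.find(coarseIjk[0] * factor + di,
                                          coarseIjk[1] * factor + dj,
                                          coarseIjk[2] * factor + dk);
                if (row < 0) continue; // inactive fine voxel contributes zero
                for (int c = 0; c < numChannels; ++c)
                    out[c] += fineData[row * numChannels + c];
            }
    const float inv = 1.0f / static_cast<float>(factor * factor * factor);
    for (int c = 0; c < numChannels; ++c) out[c] *= inv;
}

// Per-(voxel, channel) pattern: each work item walks the whole window itself,
// so the same lookups are repeated once per channel.
float avgPoolVoxelChannel(const ToyFineGrid &fine,
                          const std::vector<float> &fineData, int numChannels,
                          int factor, std::array<int, 3> coarseIjk, int c) {
    assert(c >= 0 && c < numChannels && factor > 0);
    float sum = 0.0f;
    for (int di = 0; di < factor; ++di)
        for (int dj = 0; dj < factor; ++dj)
            for (int dk = 0; dk < factor; ++dk) {
                const int row = fine.find(coarseIjk[0] * factor + di,
                                          coarseIjk[1] * factor + dj,
                                          coarseIjk[2] * factor + dk);
                if (row >= 0) sum += fineData[row * numChannels + c];
            }
    return sum / static_cast<float>(factor * factor * factor);
}
```

With a pooling factor of 2, the per-voxel version performs 8 lookups per coarse voxel regardless of channel count, while the per-(voxel, channel) version performs 8 per channel. The sketch only counts lookups; it says nothing about how the two patterns compare in actual GPU throughput.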