Torchvision objective API PoC #6108

mdabek-nvidia · 2025-12-04T10:52:16Z

Category:

New feature

Description:

Torchvision objective API.
This proof of concept is currently limited to a few selected operators and to composing them into a single pipeline.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

Checklist

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: N/A

JIRA TASK: DALI-4309

Implemented: - Compose, - Resize, - CenterCrop Signed-off-by: Marek Dabek <[email protected]>

Signed-off-by: Marek Dabek <[email protected]>

…oTensor Signed-off-by: Marek Dabek <[email protected]>

- nvcc for CUDA 12.4 and newer uses a different allocator than the default one from glibc and this clashes with asan. In the asan build we install plugins that are compiled on the fly. We cannot turn off sanitizers as DALI is imported and the python signatures are generated on the fly. This makes sure the extra env variable that is passed only to the stub generation process during video plugin installation test so we can preload asan only for this part and not affect the nvcc invocations. Signed-off-by: Janusz Lisiecki <[email protected]>

Signed-off-by: Michał Zientkiewicz <[email protected]>

…y checks in Tensor and TensorList python bindings (NVIDIA#6054) * Add layout dimensionality checks in Tensor and TensorList python bindings. * Fix layouts in backend_impl tests. * Fix layouts in FW iterator test. ----- Signed-off-by: Michal Zientkiewicz <[email protected]>

NVIDIA#6050) Signed-off-by: Joaquin Anton Guirao <[email protected]>

…IDIA#6058) - assumes that in the absence of a keyframe in the video, the first frame can be treated as such. - when reusing the old index, the number of frames in the video is not set according to the index but inferred from the video, which may not match the index build (as it can skip frames with negative timestamps). Signed-off-by: Janusz Lisiecki <[email protected]>

The Mode documentation main landing page Signed-off-by: Marek Dabek <[email protected]>

Signed-off-by: Michał Zientkiewicz <[email protected]>

This PR brings two classes that are foundational for the Dynamic Mode: Tensor and Batch.
Both classes can wrap either an actual object, backed by Tensor[List]<CPU|GPU> or InvocationResult or results of indexing or composition.
A batch can be:
* TensorList<CPU|GPU>
* a list of Tensor objects
A tensor can be:
* Tensor<CPU|GPU>
* a Batch and an index in the batch
* a TensorSlice object
There are also utility functions, tensor, as_tensor and batch, as_batch. The ones without as_ construct a tensor or batch, respectively, and perform a copy. The functions with as_ prefix avoid the copy, if possible.
Batch access: as of this PR, Batch deliberately doesn't include __getitem__ or __len__. It can be iterated to yield tensors, but it cannot be indexed. To access the list of tensors, use .tensors property; to apply batched slicing, use .slice property. __getitem__ will be added later and will implement either tensor access or batched slicing, which remains to be decided.
------
Signed-off-by: Michal Zientkiewicz <[email protected]>
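
The "iterable but not indexable" design described in this commit message can be illustrated with a small, framework-free sketch. `ToyBatch` below is a hypothetical stand-in written for illustration only; it is not the DALI `Batch` class and it stores plain Python lists instead of tensors.

```python
from typing import Iterator, List, Sequence


class ToyBatch:
    """Hypothetical stand-in: iterable, but deliberately not indexable or sized."""

    def __init__(self, samples: Sequence[list]):
        self._samples = list(samples)

    def __iter__(self) -> Iterator[list]:
        # Iteration yields the individual samples one by one.
        return iter(self._samples)

    @property
    def tensors(self) -> List[list]:
        # Explicit access to the underlying list of samples.
        return list(self._samples)


batch = ToyBatch([[1, 2], [3, 4, 5]])
iterated = [sample for sample in batch]
first = batch.tensors[0]

# No __getitem__ is defined, so subscripting the batch itself fails.
try:
    batch[0]
    indexable = True
except TypeError:
    indexable = False
```

Leaving out `__getitem__` keeps the meaning of `batch[i]` open until the choice between per-sample access and batched slicing is made, while `.tensors` still gives an unambiguous way to reach individual samples.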

- Updates tensor shapes to include channel dimension (e.g., [3,5,6] -> [3,5,6,3]) - Changes layout specification from NHWC to HWC for non-batched tensors - Adjusts test data shapes to match 3D tensor expectations Signed-off-by: Janusz Lisiecki <[email protected]>

* Update deps 25/10 * Stopped conda build for Python 3.9 which is no longer maintained by conda Signed-off-by: Rafal Banas <[email protected]> Signed-off-by: Kamil Tokarski <[email protected]> Signed-off-by: Janusz Lisiecki <[email protected]> Co-authored-by: Kamil Tokarski <[email protected]> Co-authored-by: Janusz Lisiecki <[email protected]>

Signed-off-by: Michał Zientkiewicz <[email protected]>

Signed-off-by: Kamil Tokarski <[email protected]>

…IA#6060)
Operator generator:
- generate operator classes, constructors and __call__ functions
- generate operator functions for stateless operators
Tests:
- arithmetic op tests
- slicing tests
- conversion tests
- RN50 pipeline
Device:
- accept torch device and torch-style device names (cuda == gpu)
- operators now accept "gpu" device for "mixed" ops
Fixes:
- temporarily remove invocation result cache (leaky and not necessary without more sophisticated keys / CSE)
- fix copying tensors to host (backend impl)
---------
Signed-off-by: Michal Zientkiewicz <[email protected]>

- Replaces CUDA_VERSION with CUDA_SUBVERSION=13.0.96-1 - Pins CUDA package to specific version (13.0.96-1) - update 2 Signed-off-by: Janusz Lisiecki <[email protected]>

* Rename DALI2 to dynamic * import dynamic as ndd --------- Signed-off-by: Michał Zientkiewicz <[email protected]>

* Add math functions and tests. * Move arithm op arguments to GPU if there's any GPU argument. * Forbid mixing of DALI CPU and GPU tensors/batches in arithmetic ops and math functions. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

- Dynamic mode augmentation gallery notebook - Adds Dynamic Mode section to the examples section of the documentation - Sphinx 8.x Compatibility Signed-off-by: Joaquin Anton Guirao <[email protected]>

* Update to FFmpeg 8.0 Signed-off-by: Janusz Lisiecki <[email protected]> Signed-off-by: Kamil Tokarski <[email protected]> Co-authored-by: Kamil Tokarski <[email protected]>

Signed-off-by: Janusz Lisiecki <[email protected]>

…A#6070) Signed-off-by: Michal Zientkiewicz <[email protected]>

* Fix memory order scoping * Adjust lambda captures * Remove the usage of std::is_pod. * Warning suppression. * Fix incorrect usage of enums. * Adjust compiler version in jupyter-conda test --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

Add bounding box rotation as a single op with options including how they should expand with the rotate, whether the image canvas was fixed when the image was rotated, and the box format (layout and normalization). This op prunes boxes and labels if keep_size=True and the box is truncated to a fraction below remove_threshold. --------- Signed-off-by: Bryce Ferenczi <[email protected]>

) Signed-off-by: Michał Zientkiewicz <[email protected]>

… pool. (NVIDIA#6072) * Add a busy list to CUDAStreamPool. Don't return busy streams from thread pool. * Extend stream pool tests. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

… bbox_rotate tests. (NVIDIA#6073) * Fix usage of std::optional in bbox rotate. Fix conversion to numpy in bbox_rotate tests. --------- Signed-off-by: Michał Zientkiewicz <[email protected]>

…ream 0. (NVIDIA#6071) * Use proper stream in TensorList and Tensor copy. * Fix usages of DynamicScratchpad. * Set non-host stream when setting last input for repeat-last inputs. * Fix C API tests. * Refactor copy stream/device selection Signed-off-by: Michał Zientkiewicz <[email protected]> * Review: copy_to_external. * WAR device selection for H2D and D2H copy and copy with special stream. Known issues: D2D copy across devices doesn't work with buffers allocated with VMM API, regardless of permissions used in `cuMemSetAccess`. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

…5988) * Document executor flags StreamPolicy and OperatorConcurrency --------- Signed-off-by: Michał Zientkiewicz <[email protected]>

- updates installation instructions to include --no-build-isolation flag for all nvidia-dali-tf-plugin pip install commands. This flag is necessary because pip's default build isolation prevents the installation process from detecting the installed TensorFlow version. - adds warnings to TensorFlow and Video plugin - adds --no-build-isolation to tests Signed-off-by: Janusz Lisiecki <[email protected]>

* Add name generation for dynamic API. * Limit the number of spelled-out arguments. * Adjust argument type name for dynamic API (use Tensor/Batch instead of TensorList) * Add docstrings for dynamic mode ops and functions. * Fix __call__ documentation. Update tests to look for mention of graph. Signed-off-by: Michal Zientkiewicz <[email protected]>

Signed-off-by: Michal Zientkiewicz <[email protected]>

* Remove texture-based video processing in NvDecoder - Removes TextureObject class and related texture management code - Replaces texture-based convert_frame with direct VideoColorSpaceConversion - Removes textures_ member and get_textures method - Unifies the video reader with the experimental one by removing texture usage which don't provide much gain in this use case - fixes the usage of libswscale to avoid intermediate quantization in ycbcr->rgb video conversion Signed-off-by: Janusz Lisiecki <[email protected]> Signed-off-by: Michal Zientkiewicz <[email protected]> Co-authored-by: Michal Zientkiewicz <[email protected]>

… Device and Readers. (NVIDIA#6080) * Fill gaps in dynamic mode Tensor, Batch, Device, EvalContext and Readers documentation. * Fix typos (and likewise typos find elsewhere in the code) * Hide private functions (and remove their vestigial usages) Signed-off-by: Michal Zientkiewicz <[email protected]>

Make sure that there's at least one block descriptor for each sample in the batch. Signed-off-by: Michal Zientkiewicz <[email protected]>

* Add Philox32x4_10 generator for CPU. * Test vs cuRAND --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

…VIDIA#6087) Add support for passing random state to random operators in Dynamic Mode through a new _random_state argument that is hidden from the Python API and documentation.
Changes:
- Add OpSchema::AddRandomStateArg() function that creates a hidden optional argument accepting 1D uint32 tensor inputs
- The argument name starts with underscore (_random_state) and is explicitly marked as hidden to prevent it from appearing in documentation
- Apply AddRandomStateArg() to all random operators except readers:
  * random operators: Choice, Uniform, Normal, CoinFlip, BatchPermutation
  * noise operators: Gaussian, SaltAndPepper, Shot
  * segmentation operators: RandomMaskPixel, RandomObjectBBox
  * image operators: Jitter, RandomBBoxCrop, ROIRandomCrop
  * RandomCropAttr base schema
- Add comprehensive Python tests verifying:
  * Argument is hidden from GetArgumentNames()
  * Argument is configured as tensor argument
  * Operators accept uint32 tensor inputs
  * Reader operators don't have the argument
The _random_state argument enables the executor to pass RNG state to operators in Dynamic Mode without exposing it in the public Python API.
Signed-off-by: Joaquin Anton Guirao <[email protected]>

…#6091) - Replace hardcoded clang 19 path with find command to locate __clang_cuda_runtime_wrapper.h across different clang versions. - This makes the Dockerfile compatible with various clang installations in manylinux builds. Signed-off-by: Janusz Lisiecki <[email protected]>

This PR adds some random distributions usable on both host and device.
The distributions added in this PR:
* uniform real (float and double)
* uniform int (including up to full integer range)
* uniform real over unit range [0..1]
* normal standard (Box-Muller transform)
* normal (scaled normal standard)
* Bernoulli
---------
Signed-off-by: Michal Zientkiewicz <[email protected]>
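
For readers unfamiliar with the transform named in this commit message, the following is a minimal Python sketch of Box-Muller sampling, of a scaled normal built on top of it, and of a Bernoulli draw. It uses Python's `random.Random` as the uniform source, which is an assumption made for illustration; the function names are hypothetical and the code is not taken from the DALI sources.

```python
import math
import random
from typing import Tuple


def standard_normal_pair(rng: random.Random) -> Tuple[float, float]:
    """Box-Muller: map two uniform samples to two independent N(0, 1) samples."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal(rng: random.Random, mean: float, stddev: float) -> float:
    """Scaled standard normal: mean + stddev * z with z ~ N(0, 1)."""
    z, _ = standard_normal_pair(rng)
    return mean + stddev * z


def bernoulli(rng: random.Random, p: float) -> bool:
    """Return True with probability p."""
    return rng.random() < p


rng = random.Random(1234)
samples = [normal(rng, mean=5.0, stddev=2.0) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / len(samples)
```

With enough samples, `sample_mean` and `sample_var` should land close to the requested mean and the square of the requested standard deviation.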

Operator's keyword arguments are converted to their types as advertised in OpSchema. Add tests that verify that the necessary conversions take place and the unnecessary conversions are skipped. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

Generate documentation section for the Dynamic Mode. Signed-off-by: Szymon Karpiński <[email protected]>

The distribution carries only the mean value and the actual distribution is selected a evaluation time. The results are not consistent between CPU and GPU. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

…VIDIA#6084) * Remove the name "dynamic executor" from the docs and error messages * Remove redundant notice from getting_started notebook * Unify peek_image_shape and experimental.peek_image_shape docs --------- Signed-off-by: Michał Szołucha <[email protected]>

Signed-off-by: Marek Dabek <[email protected]>

dali/python/nvidia/dali/experimental/torchvision/v2/compose.py

+        # We need to enable conditionals by default
+        try:
+            Pipeline.push_current(self)
+            self._conditionals_enabled = True


dali/python/nvidia/dali/experimental/torchvision/v2/compose.py

+        try:
+            Pipeline.push_current(self)
+            self._conditionals_enabled = True
+            self._condition_stack = _conditionals._ConditionStack()


dali/python/nvidia/dali/experimental/torchvision/v2/pad.py

dali/test/python/torchvision/test_tv_centercrop.py

+def test_invalid_type(size):
+    with assert_raises(TypeError):
+        td = Compose([CenterCrop(size=size)])
+        del td


dali/test/python/torchvision/test_tv_centercrop.py

+def test_value_error(size):
+    with assert_raises(ValueError):
+        td = Compose([CenterCrop(size=size)])
+        del td


dali/python/nvidia/dali/experimental/torchvision/v2/resize.py

+    def _infer_effective_size(
+        self,
+        size: Optional[Union[int, Sequence[int]]],
+        max_size: Optional[int] = None,
+    ) -> Tuple[int, int]:
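
The body of `_infer_effective_size` is not shown in this excerpt. As a point of reference only, the sketch below follows torchvision's documented `size` / `max_size` semantics for resize; that the PR mirrors them exactly is an assumption. The function name `effective_size` and the explicit `orig_hw` parameter are hypothetical, and the `size=None` case allowed by the signature above is not handled.

```python
from typing import Optional, Sequence, Tuple, Union


def effective_size(
    orig_hw: Tuple[int, int],
    size: Union[int, Sequence[int]],
    max_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the output (height, width) for a torchvision-style resize."""
    h, w = orig_hw
    if isinstance(size, int):
        size = [size]
    if len(size) == 2:
        # An explicit (h, w) pair is used as-is; max_size does not apply.
        if max_size is not None:
            raise ValueError("max_size requires size to be a single value")
        return int(size[0]), int(size[1])
    if len(size) != 1:
        raise ValueError("size must be an int or a sequence of 1 or 2 ints")

    # A single value sets the shorter edge; the longer edge keeps the aspect ratio.
    short, long = (w, h) if w <= h else (h, w)
    new_short = size[0]
    new_long = int(new_short * long / short)
    if max_size is not None:
        if max_size <= new_short:
            raise ValueError("max_size must be greater than the requested size")
        if new_long > max_size:
            # Cap the longer edge and shrink the shorter edge proportionally.
            new_short, new_long = int(max_size * new_short / new_long), max_size
    return (new_long, new_short) if w <= h else (new_short, new_long)


plain = effective_size((480, 640), 240)
capped = effective_size((480, 640), 240, max_size=300)
```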


dali/python/nvidia/dali/experimental/torchvision/v2/tensor.py

+    [DEPRECATED but used]
+    """
+
+    def __init__(self): ...


dali/test/python/torchvision/test_tv_resize.py

+        or (not isinstance(resize, int))
+    ):
+        with assert_raises(ValueError):
+            td = Compose(


greptile-apps · 2025-12-04T10:55:19Z

Greptile Overview

Greptile Summary

This PR introduces a torchvision-compatible API for DALI, implementing common image transforms (Resize, CenterCrop, RandomHorizontalFlip, RandomVerticalFlip, Pad, ToTensor) that can be composed using a Compose class that extends nvidia.dali.Pipeline.

Key changes:

Added new nvidia.dali.experimental.torchvision module with v2-style transforms
Compose inherits from Pipeline and enables chaining transforms with conditional support via autograph
Supports both PIL Image and torch.Tensor inputs, with automatic conversion back to input type
Modified _conditionals.py to include the new torchvision module in autograph conversion
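
The chaining behaviour summarized above can be illustrated with a minimal, framework-free sketch. It is not the DALI implementation: `ToyCompose`, `center_crop` and `hflip` are hypothetical stand-ins that work on nested Python lists rather than DALI data nodes, and the sketch only shows how a list of transforms is applied in order to one input.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image type


def center_crop(size: int) -> Callable[[Image], Image]:
    """Return a transform that keeps the central size x size window."""
    def apply(img: Image) -> Image:
        h, w = len(img), len(img[0])
        if size > h or size > w:
            # Toy simplification: reject crops larger than the image.
            raise ValueError("crop size exceeds image size")
        top, left = (h - size) // 2, (w - size) // 2
        return [row[left:left + size] for row in img[top:top + size]]
    return apply


def hflip(img: Image) -> Image:
    """Mirror every row left to right."""
    return [row[::-1] for row in img]


class ToyCompose:
    """Apply a sequence of transforms in order, feeding each output to the next."""

    def __init__(self, transforms: Sequence[Callable[[Image], Image]]):
        self.transforms = list(transforms)

    def __call__(self, img: Image) -> Image:
        for transform in self.transforms:
            img = transform(img)
        return img


pipeline = ToyCompose([center_crop(2), hflip])
image = [[r * 4 + c for c in range(4)] for r in range(4)]
result = pipeline(image)
```

In the PR itself the transforms build DALI graph nodes inside a `Pipeline` subclass instead of transforming data eagerly; the ordering logic is the part this sketch has in common with it.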

Issues found:

Syntax error in compose.py:167 - mode="RGB" incorrectly passed to append() instead of Image.fromarray()
Ignored device parameter in flips.py:29 - hardcoded to "cpu"
Ignored device parameter in pad.py:294 - hardcoded to "cpu"
Unreachable code in pad.py:131-132 - early return prevents concatenation
Debug print statement left in resize.py:128
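
The first issue in the list can be reproduced without PIL. The sketch below uses a hypothetical `fromarray` stand-in (not `PIL.Image.fromarray`) to show why a keyword argument that lands on `list.append()` is accepted by the parser and fails only when the line runs, with a `TypeError`.

```python
def fromarray(array, mode=None):
    """Hypothetical stand-in for an image constructor that accepts `mode`."""
    return {"data": array, "mode": mode}


frames = [[0, 1], [2, 3]]

# Misplaced parenthesis: `mode` is handed to list.append(), which rejects keywords.
broken = []
error = None
try:
    for frame in frames:
        broken.append(fromarray(frame), mode="RGB")
except TypeError as exc:
    error = exc

# Intended call: `mode` belongs to the constructor.
fixed = []
for frame in frames:
    fixed.append(fromarray(frame, mode="RGB"))
```

Because the failure sits on the batched-PIL code path, it would only surface in a test that actually feeds a batch of PIL images through `Compose`.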

Confidence Score: 2/5

Contains a syntax error that will cause runtime failures and several logic bugs affecting GPU execution paths
The PR has a clear syntax error in compose.py line 167 that will cause a TypeError when processing batched PIL images. Additionally, the device parameter is ignored in multiple places (flips.py, pad.py), preventing GPU execution even when explicitly requested. There's also unreachable code in pad.py suggesting incomplete implementation.
compose.py (syntax error), flips.py (device param ignored), pad.py (device param ignored + unreachable code)

Important Files Changed

File Analysis

Filename	Score	Overview
dali/python/nvidia/dali/experimental/torchvision/v2/compose.py	2/5	Core pipeline composition class with syntax error in Image.fromarray() call at line 167 - mode="RGB" passed to append() instead of fromarray()
dali/python/nvidia/dali/experimental/torchvision/v2/flips.py	3/5	Random flip implementations with device parameter being ignored - always hardcoded to "cpu" at line 29
dali/python/nvidia/dali/experimental/torchvision/v2/pad.py	3/5	Padding implementations with device parameter ignored at line 294 and unreachable code after early return at line 131
dali/python/nvidia/dali/experimental/torchvision/v2/resize.py	3/5	Resize implementation with debug print statement left at line 128 that should be removed
dali/python/nvidia/dali/experimental/torchvision/v2/centercrop.py	5/5	Center crop implementation with proper input validation and DALI fn.crop usage
dali/python/nvidia/dali/experimental/torchvision/v2/tensor.py	4/5	ToTensor implementation - simple conversion dividing by 255.0
dali/python/nvidia/dali/experimental/torchvision/__init__.py	5/5	Package exports for torchvision-compatible transforms
dali/python/nvidia/dali/_conditionals.py	5/5	Added torchvision module to autograph convert_modules list

Sequence Diagram

sequenceDiagram
    participant User
    participant Compose
    participant Pipeline
    participant Transform
    participant ExternalSource
    participant DALIOps

    User->>Compose: __call__(data_input)
    Compose->>Compose: Convert PIL/Tensor to torch.Tensor
    Compose->>Pipeline: build()
    Compose->>ExternalSource: feed_input(data)
    Compose->>Pipeline: run()
    
    loop For each transform in op_list
        Pipeline->>Transform: __call__(input_node)
        Transform->>DALIOps: fn.crop/fn.flip/fn.resize/fn.pad
        DALIOps-->>Transform: output_node
        Transform-->>Pipeline: output_node
    end
    
    Pipeline-->>Compose: output TensorList
    Compose->>Compose: to_torch_tensor(output)
    Compose->>Compose: Convert back to PIL if needed
    Compose-->>User: output (PIL/Tensor)

greptile-apps

13 files reviewed, 5 comments


dali/python/nvidia/dali/experimental/torchvision/v2/compose.py

dali/python/nvidia/dali/experimental/torchvision/v2/flips.py

dali/python/nvidia/dali/experimental/torchvision/v2/pad.py

dali/python/nvidia/dali/experimental/torchvision/v2/resize.py

Signed-off-by: Marek Dabek <[email protected]>

mdabek-nvidia and others added 30 commits October 2, 2025 19:56

Torchvision API initiali commit

0126df4

Implemented: - Compose, - Resize, - CenterCrop Signed-off-by: Marek Dabek <[email protected]>

Merge branch 'NVIDIA:main' into torchvision_api_poc

65ca209

Padding with different modes

0ff9c2f

Signed-off-by: Marek Dabek <[email protected]>

Automatic type translation in Compose operator and first attempt to T…

331006d

…oTensor Signed-off-by: Marek Dabek <[email protected]>

Fixed resize size calculation

d1784cf

Fixed resize size calculation

8994368

Fix standalone op output streams. (NVIDIA#6055)

2a81559

Signed-off-by: Michał Zientkiewicz <[email protected]>

Reduce minimum throughput for experimental decoder in TL1_decoder_perf (

fb920c6

NVIDIA#6050) Signed-off-by: Joaquin Anton Guirao <[email protected]>

DALI Dynamic docs main page (NVIDIA#6052)

a7c1d2c

The Mode documentation main landing page Signed-off-by: Marek Dabek <[email protected]>

Remove CMake from acknowledgements. (NVIDIA#6020)

b9a72f1

Signed-off-by: Michał Zientkiewicz <[email protected]>

Fix conversion of pinned tensors to DLPack. (NVIDIA#6061)

0bfd842

Signed-off-by: Michał Zientkiewicz <[email protected]>

Update VERSION to 1.53.0dev

e0ea559

Signed-off-by: Kamil Tokarski <[email protected]>

Move to CUDA 13.0 U2 (NVIDIA#6063)

0f6df0d

- Replaces CUDA_VERSION with CUDA_SUBVERSION=13.0.96-1 - Pins CUDA package to specific version (13.0.96-1) - update 2 Signed-off-by: Janusz Lisiecki <[email protected]>

Rename DALI2 to dynamic (NVIDIA#6064)

7c27767

* Rename DALI2 to dynamic * import dynamic as ndd --------- Signed-off-by: Michał Zientkiewicz <[email protected]>

Dynamic mode: add augmentation gallery (NVIDIA#6057)

fbf93ff

- Dynamic mode augmentation gallery notebook - Adds Dynamic Mode section to the examples section of the documentation - Sphinx 8.x Compatibility Signed-off-by: Joaquin Anton Guirao <[email protected]>

Update to FFmpeg 8.0 (NVIDIA#6067)

dcccb63

* Update to FFmpeg 8.0 Signed-off-by: Janusz Lisiecki <[email protected]> Signed-off-by: Kamil Tokarski <[email protected]> Co-authored-by: Kamil Tokarski <[email protected]>

Fix warnings reported by flake8 (NVIDIA#6059)

ae94333

Signed-off-by: Janusz Lisiecki <[email protected]>

Fix stream ordering in Tensor::Copy and Tensor(List)GPU.as_cpu (NVIDI…

b29e7ec

…A#6070) Signed-off-by: Michal Zientkiewicz <[email protected]>

Switch to C++20 (NVIDIA#5962)

b54ee4d

* Fix memory order scoping * Adjust lambda captures * Remove the usage of std::is_pod. * Warning suppression. * Fix incorrect usage of enums. * Adjust compiler version in jupyter-conda test --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

Make semaphore implementation selectable. Default is POSIX. (NVIDIA#6074

a5946fe

) Signed-off-by: Michał Zientkiewicz <[email protected]>

Add a busy list to CUDAStreamPool. Don't return busy streams from the…

f96253d

… pool. (NVIDIA#6072) * Add a busy list to CUDAStreamPool. Don't return busy streams from thread pool. * Extend stream pool tests. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

mzient and others added 22 commits November 27, 2025 12:08

Fix usage of std::optional in bbox rotate. Fix conversion to numpy in…

d33c365

… bbox_rotate tests. (NVIDIA#6073) * Fix usage of std::optional in bbox rotate. Fix conversion to numpy in bbox_rotate tests. --------- Signed-off-by: Michał Zientkiewicz <[email protected]>

Document executor flags StreamPolicy and OperatorConcurrency (NVIDIA#…

ef3ed09

…5988) * Document executor flags StreamPolicy and OperatorConcurrency --------- Signed-off-by: Michał Zientkiewicz <[email protected]>

Coveriity check 2025.11. (NVIDIA#6081)

e930871

Signed-off-by: Michal Zientkiewicz <[email protected]>

Fix large batch handling in GPU random ops. (NVIDIA#6082)

fd7213a

Make sure that there's at least one block descriptor for each sample in the batch. Signed-off-by: Michal Zientkiewicz <[email protected]>

Add Philox32x4_10 generator for CPU. (NVIDIA#6088)

88d5cc2

* Add Philox32x4_10 generator for CPU. * Test vs cuRAND --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

Generate Dynamic Mode documentation (NVIDIA#6085)

9007f8e

Generate documentation section for the Dynamic Mode. Signed-off-by: Szymon Karpiński <[email protected]>

Poisson distribution using cuRAND and STL (NVIDIA#6096)

b713f96

The distribution carries only the mean value and the actual distribution is selected a evaluation time. The results are not consistent between CPU and GPU. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>

Merge branch 'main' into torchvision_api_poc

e8fa437

Merge branch 'NVIDIA:main' into torchvision_api_poc

647546c

Unit test adjustment and fixes

510de1b

Signed-off-by: Marek Dabek <[email protected]>

Input types handling and unit tests adjustment

6318a31

Signed-off-by: Marek Dabek <[email protected]>

Merge branch 'NVIDIA:main' into torchvision_api_poc

94aed99

mdabek-nvidia marked this pull request as draft December 4, 2025 10:52

github-advanced-security bot found potential problems Dec 4, 2025


greptile-apps bot reviewed Dec 4, 2025


Greptile fixes

df45283

Signed-off-by: Marek Dabek <[email protected]>

mdabek-nvidia force-pushed the torchvision_api_poc branch from b5d4df5 to df45283 Compare December 4, 2025 11:37


9 participants
