Torchvision objective API PoC #6108
base: main
Implemented: - Compose, - Resize, - CenterCrop Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
…oTensor Signed-off-by: Marek Dabek <[email protected]>
- nvcc for CUDA 12.4 and newer uses a different allocator than the default one from glibc, and this clashes with ASan. In the ASan build we install plugins that are compiled on the fly. We cannot turn off sanitizers because DALI is imported and the Python signatures are generated on the fly. This change passes an extra environment variable only to the stub generation process during the video plugin installation test, so that ASan is preloaded only for that step and the nvcc invocations are not affected. Signed-off-by: Janusz Lisiecki <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
…y checks in Tensor and TensorList python bindings (NVIDIA#6054) * Add layout dimensionality checks in Tensor and TensorList python bindings. * Fix layouts in backend_impl tests. * Fix layouts in FW iterator test. ----- Signed-off-by: Michal Zientkiewicz <[email protected]>
NVIDIA#6050) Signed-off-by: Joaquin Anton Guirao <[email protected]>
…IDIA#6058) - assumes that in the absence of a keyframe in the video, the first frame can be treated as such. - when reusing the old index, the number of frames in the video is not set according to the index but inferred from the video, which may not match the index build (as it can skip frames with negative timestamps). Signed-off-by: Janusz Lisiecki <[email protected]>
The Mode documentation main landing page Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
This PR brings two classes that are foundational for the Dynamic Mode: Tensor and Batch. Both classes can wrap either an actual object, backed by Tensor[List]<CPU|GPU> or InvocationResult or results of indexing or composition. A batch can be: * TensorList<CPU|GPU> * a list of Tensor objects A tensor can be: * Tensor<CPU|GPU> * a Batch and an index in the batch * a TensorSlice object There are also utility functions, tensor, as_tensor and batch, as_batch. The ones without as_ construct a tensor or batch, respectively, and perform a copy. The functions with as_ prefix avoid the copy, if possible. Batch access: as of this PR, Batch deliberately doesn't include __getitem__ or __len__. It can be iterated to yield tensors, but it cannot be indexed. To access the list of tensors, use .tensors property; to apply batched slicing, use .slice property. __getitem__ will be added later and will implement either tensor access or batched slicing, which remains to be decided. ------ Signed-off-by: Michal Zientkiewicz <[email protected]>
- Updates tensor shapes to include channel dimension (e.g., [3,5,6] -> [3,5,6,3]) - Changes layout specification from NHWC to HWC for non-batched tensors - Adjusts test data shapes to match 3D tensor expectations Signed-off-by: Janusz Lisiecki <[email protected]>
* Update deps 25/10 * Stopped conda build for Python 3.9 which is no longer maintained by conda Signed-off-by: Rafal Banas <[email protected]> Signed-off-by: Kamil Tokarski <[email protected]> Signed-off-by: Janusz Lisiecki <[email protected]> Co-authored-by: Kamil Tokarski <[email protected]> Co-authored-by: Janusz Lisiecki <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
…IA#6060) Operator generator: - generate operator classes, constructors and __call__ functions - generate operator functions for stateless operators Tests: - arithmetic op tests - slicing tests - conversion tests - RN50 pipeline Device: - accept torch device and torch-style device names (cuda == gpu) - operators now accept "gpu" device for "mixed" ops Fixes: - temporarily remove invocation result cache (leaky and not necessary without more sophisticated keys / CSE) - fix copying tensors to host (backend impl) --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
- Replaces CUDA_VERSION with CUDA_SUBVERSION=13.0.96-1 - Pins CUDA package to specific version (13.0.96-1) - update 2 Signed-off-by: Janusz Lisiecki <[email protected]>
* Rename DALI2 to dynamic * import dynamic as ndd --------- Signed-off-by: Michał Zientkiewicz <[email protected]>
* Add math functions and tests. * Move arithm op arguments to GPU if there's any GPU argument. * Forbid mixing of DALI CPU and GPU tensors/batches in arithmetic ops and math functions. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
- Dynamic mode augmentation gallery notebook - Adds Dynamic Mode section to the examples section of the documentation - Sphinx 8.x Compatibility Signed-off-by: Joaquin Anton Guirao <[email protected]>
* Update to FFmpeg 8.0 Signed-off-by: Janusz Lisiecki <[email protected]> Signed-off-by: Kamil Tokarski <[email protected]> Co-authored-by: Kamil Tokarski <[email protected]>
Signed-off-by: Janusz Lisiecki <[email protected]>
…A#6070) Signed-off-by: Michal Zientkiewicz <[email protected]>
* Fix memory order scoping * Adjust lambda captures * Remove the usage of std::is_pod. * Warning suppression. * Fix incorrect usage of enums. * Adjust compiler version in jupyter-conda test --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
Add bounding box rotation as a single op with options including how they should expand with the rotate, whether the image canvas was fixed when the image was rotated, and the box format (layout and normalization). This op prunes boxes and labels if keep_size=True and the box is truncated to a fraction below remove_threshold. --------- Signed-off-by: Bryce Ferenczi <[email protected]>
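As a rough illustration of the geometry behind this commit, the sketch below rotates the four corners of one box about a pivot and takes the axis-aligned hull of the result. The name `rotate_box_hull`, the `(x0, y0, x1, y1)` absolute-coordinate layout and the `pivot` argument are assumptions made for this example. It does not reproduce the operator's options, its pruning by `remove_threshold`, or its actual implementation.

```python
import math


def rotate_box_hull(box, angle_deg, pivot=(0.0, 0.0)):
    """Rotate an axis-aligned box about `pivot` and return the enclosing
    axis-aligned box of the rotated corners.

    Hypothetical helper: `box` is (x0, y0, x1, y1) in absolute coordinates.
    Positive angles rotate counter-clockwise in a y-up coordinate system.
    """
    x0, y0, x1, y1 = box
    px, py = pivot
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    xs, ys = [], []
    # Rotate each corner, then keep only the extremes of the rotated points.
    for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1)):
        dx, dy = x - px, y - py
        xs.append(px + c * dx - s * dy)
        ys.append(py + s * dx + c * dy)
    return (min(xs), min(ys), max(xs), max(ys))
```

Taking the hull of the rotated corners is only one possible expansion policy; the commit message mentions that the operator offers several.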
) Signed-off-by: Michał Zientkiewicz <[email protected]>
… pool. (NVIDIA#6072) * Add a busy list to CUDAStreamPool. Don't return busy streams from thread pool. * Extend stream pool tests. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
… bbox_rotate tests. (NVIDIA#6073) * Fix usage of std::optional in bbox rotate. Fix conversion to numpy in bbox_rotate tests. --------- Signed-off-by: Michał Zientkiewicz <[email protected]>
…ream 0. (NVIDIA#6071) * Use proper stream in TensorList and Tensor copy. * Fix usages of DynamicScratchpad. * Set non-host stream when setting last input for repeat-last inputs. * Fix C API tests. * Refactor copy stream/device selection Signed-off-by: Michał Zientkiewicz <[email protected]> * Review: copy_to_external. * WAR device selection for H2D and D2H copy and copy with special stream. Known issues: D2D copy across devices doesn't work with buffers allocated with VMM API, regardless of permissions used in `cuMemSetAccess`. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
…5988) * Document executor flags StreamPolicy and OperatorConcurrency --------- Signed-off-by: Michał Zientkiewicz <[email protected]>
- updates installation instructions to include --no-build-isolation flag for all nvidia-dali-tf-plugin pip install commands. This flag is necessary because pip's default build isolation prevents the installation process from detecting the installed TensorFlow version. - adds warnings to TensorFlow and Video plugin - adds --no-build-isolation to tests Signed-off-by: Janusz Lisiecki <[email protected]>
* Add name generation for dynamic API. * Limit the number of spelled-out arguments. * Adjust argument type name for dynamic API (use Tensor/Batch instead of TensorList) * Add docstrings for dynamic mode ops and functions. * Fix __call__ documentation. Update tests to look for mention of graph. Signed-off-by: Michal Zientkiewicz <[email protected]>
Signed-off-by: Michal Zientkiewicz <[email protected]>
* Remove texture-based video processing in NvDecoder - Removes TextureObject class and related texture management code - Replaces texture-based convert_frame with direct VideoColorSpaceConversion - Removes textures_ member and get_textures method - Unifies the video reader with the experimental one by removing texture usage, which doesn't provide much gain in this use case - Fixes the usage of libswscale to avoid intermediate quantization in ycbcr->rgb video conversion Signed-off-by: Janusz Lisiecki <[email protected]> Signed-off-by: Michal Zientkiewicz <[email protected]> Co-authored-by: Michal Zientkiewicz <[email protected]>
… Device and Readers. (NVIDIA#6080) * Fill gaps in dynamic mode Tensor, Batch, Device, EvalContext and Readers documentation. * Fix typos (and likewise typos found elsewhere in the code) * Hide private functions (and remove their vestigial usages) Signed-off-by: Michal Zientkiewicz <[email protected]>
Make sure that there's at least one block descriptor for each sample in the batch. Signed-off-by: Michal Zientkiewicz <[email protected]>
* Add Philox32x4_10 generator for CPU. * Test vs cuRAND --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
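For readers unfamiliar with counter-based generators, here is a minimal pure-Python sketch of the publicly documented Philox4x32 algorithm with 10 rounds. It assumes that the generator named in this commit follows that standard scheme (the same family as cuRAND's Philox generator); the function name and the tuple-based interface are illustrative and are not taken from the DALI sources.

```python
MASK32 = 0xFFFFFFFF
# Round multipliers and key increments ("Weyl" constants) of Philox4x32.
M0, M1 = 0xD2511F53, 0xCD9E8D57
W0, W1 = 0x9E3779B9, 0xBB67AE85


def philox4x32_10(counter, key):
    """Return four 32-bit outputs for a 128-bit counter and a 64-bit key.

    `counter` is a tuple of four 32-bit words, `key` a tuple of two.
    """
    c0, c1, c2, c3 = counter
    k0, k1 = key
    for _ in range(10):
        # Two 32x32 -> 64-bit products; the high and low halves are mixed
        # with the remaining counter words and the round key.
        p0, p1 = M0 * c0, M1 * c2
        c0, c1, c2, c3 = (
            (p1 >> 32) ^ c1 ^ k0,
            p1 & MASK32,
            (p0 >> 32) ^ c3 ^ k1,
            p0 & MASK32,
        )
        # Advance the round key for the next round.
        k0, k1 = (k0 + W0) & MASK32, (k1 + W1) & MASK32
    return c0, c1, c2, c3
```

Because the output depends only on the counter and the key, any element of the stream can be computed independently, which is what makes this family convenient for parallel CPU and GPU code.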
…VIDIA#6087) Add support for passing random state to random operators in Dynamic Mode through a new _random_state argument that is hidden from the Python API and documentation. Changes: - Add OpSchema::AddRandomStateArg() function that creates a hidden optional argument accepting 1D uint32 tensor inputs - The argument name starts with underscore (_random_state) and is explicitly marked as hidden to prevent it from appearing in documentation - Apply AddRandomStateArg() to all random operators except readers: * random operators: Choice, Uniform, Normal, CoinFlip, BatchPermutation * noise operators: Gaussian, SaltAndPepper, Shot * segmentation operators: RandomMaskPixel, RandomObjectBBox * image operators: Jitter, RandomBBoxCrop, ROIRandomCrop * RandomCropAttr base schema - Add comprehensive Python tests verifying: * Argument is hidden from GetArgumentNames() * Argument is configured as tensor argument * Operators accept uint32 tensor inputs * Reader operators don't have the argument The _random_state argument enables the executor to pass RNG state to operators in Dynamic Mode without exposing it in the public Python API. Signed-off-by: Joaquin Anton Guirao <[email protected]>
…#6091) - Replace hardcoded clang 19 path with find command to locate __clang_cuda_runtime_wrapper.h across different clang versions. - This makes the Dockerfile compatible with various clang installations in manylinux builds. Signed-off-by: Janusz Lisiecki <[email protected]>
This PR adds some random distributions usable on both host and device. The distributions added in this PR: * uniform real (float and double) * uniform int (including up to full integer range) * uniform real over unit range [0..1] * normal standard (Box-Muller transform) * normal (scaled normal standard) * Bernoulli --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
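The Box-Muller transform mentioned above can be sketched in a few lines of Python. This is a textbook version for illustration only: the function names are made up for this example, and Python's `random.Random` stands in for whatever generator the actual implementation uses.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniforms, u1 in (0, 1] and u2 in [0, 1), to two
    independent standard normal samples."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


def normal(mean, stddev, rng):
    """Scaled standard normal: mean + stddev * z."""
    # 1 - random() lies in (0, 1], which keeps log() away from zero.
    z, _ = box_muller(1.0 - rng.random(), rng.random())
    return mean + stddev * z


rng = random.Random(1234)
samples = [normal(5.0, 2.0, rng) for _ in range(10)]
```

The second output of `box_muller` is discarded here for simplicity; an implementation that cares about throughput would typically use both values.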
Operator's keyword arguments are converted to their types as advertised in OpSchema. Add tests that verify that the necessary conversions take place and the unnecessary conversions are skipped. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
Generate documentation section for the Dynamic Mode. Signed-off-by: Szymon Karpiński <[email protected]>
The distribution carries only the mean value and the actual distribution is selected at evaluation time. The results are not consistent between CPU and GPU. --------- Signed-off-by: Michal Zientkiewicz <[email protected]>
…VIDIA#6084) * Remove the name "dynamic executor" from the docs and error messages * Remove redundant notice from getting_started notebook * Unify peek_image_shape and experimental.peek_image_shape docs --------- Signed-off-by: Michał Szołucha <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
# We need to enable conditionals by default
try:
    Pipeline.push_current(self)
    self._conditionals_enabled = True
Check warning
Code scanning / CodeQL
Overwriting attribute in super-class or sub-class Warning
Pipeline
try:
    Pipeline.push_current(self)
    self._conditionals_enabled = True
    self._condition_stack = _conditionals._ConditionStack()
Check warning
Code scanning / CodeQL
Overwriting attribute in super-class or sub-class Warning
Pipeline
def test_invalid_type(size):
    with assert_raises(TypeError):
        td = Compose([CenterCrop(size=size)])
        del td
Check warning
Code scanning / CodeQL
Unnecessary delete statement in function Warning test
td
test_invalid_type
def test_value_error(size):
    with assert_raises(ValueError):
        td = Compose([CenterCrop(size=size)])
        del td
Check warning
Code scanning / CodeQL
Unnecessary delete statement in function Warning test
td
test_value_error
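One way to address the two "Unnecessary delete statement" warnings above is to avoid binding the result at all, since the constructor call is expected to raise. The sketch below shows the pattern with the standard-library `unittest` module. The `CenterCrop` class here is a simplified stand-in with made-up validation rules, not the class from this PR, and the `Compose([...])` wrapper from the original tests is omitted.

```python
import unittest


class CenterCrop:
    """Stand-in for the transform under test (hypothetical validation)."""

    def __init__(self, size):
        if isinstance(size, bool) or not isinstance(size, (int, tuple, list)):
            raise TypeError(
                f"size must be an int or a sequence, got {type(size).__name__}"
            )
        dims = (size,) if isinstance(size, int) else tuple(size)
        if len(dims) not in (1, 2) or any(d <= 0 for d in dims):
            raise ValueError(f"invalid crop size: {size!r}")
        self.size = dims


class CenterCropArgumentTest(unittest.TestCase):
    def test_invalid_type(self):
        for size in ("10", 2.5, None):
            with self.subTest(size=size), self.assertRaises(TypeError):
                CenterCrop(size=size)  # nothing is bound, so no `del` is needed

    def test_value_error(self):
        for size in (0, -1, (1, 2, 3)):
            with self.subTest(size=size), self.assertRaises(ValueError):
                CenterCrop(size=size)
```

The same change also removes the "Unused local variable" note reported for the similar test further down, because no local name is introduced.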
def _infer_effective_size(
    self,
    size: Optional[Union[int, Sequence[int]]],
    max_size: Optional[int] = None,
) -> Tuple[int, int]:
Check notice
Code scanning / CodeQL
Explicit returns mixed with implicit (fall through) returns Note
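The note above is usually resolved by making every branch either return a value or raise. The following sketch shows that shape for a size-inference helper. It is a hypothetical free function, not the method from this PR: the extra `orig_size` parameter and the resizing rules (an `int` or one-element `size` sets the shorter edge, `max_size` caps the longer edge, a two-element `size` is taken as `(height, width)`) are assumptions loosely modelled on torchvision's documented `Resize` behaviour.

```python
from typing import Optional, Sequence, Tuple, Union


def infer_effective_size(
    orig_size: Tuple[int, int],
    size: Optional[Union[int, Sequence[int]]],
    max_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the target (height, width); every path returns or raises."""
    orig_h, orig_w = orig_size
    if size is None:
        if max_size is None:
            raise ValueError("either size or max_size must be given")
        # Only the longer edge is constrained.
        scale = max_size / max(orig_h, orig_w)
        return round(orig_h * scale), round(orig_w * scale)
    if isinstance(size, int):
        size = (size,)
    if len(size) == 2:
        if max_size is not None:
            raise ValueError("max_size cannot be combined with (height, width)")
        return int(size[0]), int(size[1])
    if len(size) != 1:
        raise ValueError(f"size must have 1 or 2 elements, got {len(size)}")
    # Match the shorter edge, then cap the longer edge if requested.
    scale = size[0] / min(orig_h, orig_w)
    if max_size is not None and max(orig_h, orig_w) * scale > max_size:
        scale = max_size / max(orig_h, orig_w)
    return round(orig_h * scale), round(orig_w * scale)
```

Raising on unsupported inputs rather than falling through keeps the declared `Tuple[int, int]` return type honest.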
[DEPRECATED but used]
"""

def __init__(self): ...
Check notice
Code scanning / CodeQL
Statement has no effect Note
    or (not isinstance(resize, int))
):
    with assert_raises(ValueError):
        td = Compose(
Check notice
Code scanning / CodeQL
Unused local variable Note test
Greptile Overview
Greptile Summary
This PR introduces a torchvision-compatible API for DALI, implementing common image transforms.
Key changes:
Issues found:
Confidence Score: 2/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant User
participant Compose
participant Pipeline
participant Transform
participant ExternalSource
participant DALIOps
User->>Compose: __call__(data_input)
Compose->>Compose: Convert PIL/Tensor to torch.Tensor
Compose->>Pipeline: build()
Compose->>ExternalSource: feed_input(data)
Compose->>Pipeline: run()
loop For each transform in op_list
Pipeline->>Transform: __call__(input_node)
Transform->>DALIOps: fn.crop/fn.flip/fn.resize/fn.pad
DALIOps-->>Transform: output_node
Transform-->>Pipeline: output_node
end
Pipeline-->>Compose: output TensorList
Compose->>Compose: to_torch_tensor(output)
Compose->>Compose: Convert back to PIL if needed
Compose-->>User: output (PIL/Tensor)
13 files reviewed, 5 comments
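The call flow in the sequence diagram above (the user calls `Compose`, each transform in `op_list` is applied in order, and the result is handed back) can be illustrated with a small pure-Python sketch. Everything here is a simplified stand-in: images are nested lists, the transforms run eagerly, and there is no DALI pipeline, external source, or PIL/tensor conversion as in the PoC. `HorizontalFlip` is a made-up name for this example.

```python
class CenterCrop:
    """Crop the central (height, width) window of a 2D nested-list image."""

    def __init__(self, size):
        self.size = (size, size) if isinstance(size, int) else tuple(size)

    def __call__(self, image):
        crop_h, crop_w = self.size
        h, w = len(image), len(image[0])
        if crop_h > h or crop_w > w:
            raise ValueError("crop is larger than the image in this sketch")
        top, left = (h - crop_h) // 2, (w - crop_w) // 2
        return [row[left:left + crop_w] for row in image[top:top + crop_h]]


class HorizontalFlip:
    """Mirror each row of the image."""

    def __call__(self, image):
        return [row[::-1] for row in image]


class Compose:
    """Apply the transforms in `op_list` one after another."""

    def __init__(self, op_list):
        self.op_list = list(op_list)

    def __call__(self, data):
        for op in self.op_list:
            data = op(data)
        return data


# A 4x4 image holding the values 0..15, row by row.
image = [[4 * r + c for c in range(4)] for r in range(4)]
pipeline = Compose([CenterCrop(2), HorizontalFlip()])
result = pipeline(image)
```

In the PoC, according to the diagram, the loop over `op_list` builds a graph of DALI operators inside a pipeline instead of transforming the data directly; the sketch only mirrors the order of calls.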
Signed-off-by: Marek Dabek <[email protected]>
b5d4df5 to df45283
Category:
New feature
Description:
Torchvision objective API.
This proof of concept is currently limited to a few selected operators and to composing them into a single pipeline.
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4309