Conversation
Summary of ChangesHello @Ohm-Rishabh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the profiling capabilities of the codebase by integrating NVTX range markers into key computational graphs and the training loop. This allows developers to gain deeper insights into performance bottlenecks using NVIDIA's profiling tools. Additionally, the changes include updates to a training test script to leverage these new profiling features and minor cleanups within the training pipeline for improved stability and data logging. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces NVTX range markers throughout the codebase to improve profiling capabilities with NVIDIA Nsight Systems. The changes are well-implemented, adding a new nvtx_range utility and applying it to key performance-critical sections in various models and layers. This will provide valuable, fine-grained insights into GPU execution.
My review includes a few suggestions to improve the portability of example and test scripts by removing hardcoded GPU device IDs. I also noted some leftover debug code and a significant out-of-scope change (removal of audio muxing logic) that would be better handled in a separate pull request to maintain clarity and focus.
I am having trouble creating individual review comments. Click here to see my feedback.
examples/training/finetune/wan_t2v_1.3B/crush_smol/preprocess_wan_data_t2v.sh (4)
Hardcoding CUDA_VISIBLE_DEVICES makes this example script difficult to run on different machines where GPU 2 might not be available or suitable. It would be more robust to either expect the user to set this environment variable before running the script or to parameterize it.
# Set the CUDA_VISIBLE_DEVICES environment variable to select a specific GPU, e.g.:
# export CUDA_VISIBLE_DEVICES=0
fastvideo/tests/training/Vanilla/mfu_calculation.py (36)
Hardcoding CUDA_VISIBLE_DEVICES in a test script reduces its portability and can cause failures on systems where the specified GPUs (5, 6) are not available. It's better to let the user or the test environment configure this. Please remove this line and let the environment (e.g., a CI runner or the user's shell) control which GPUs are visible to the script.
fastvideo/training/training_pipeline.py (835)
This commented-out return statement appears to be a leftover from a debugging session. It should be removed to keep the code clean.
fastvideo/training/training_pipeline.py (951-1042)
The removal of the _mux_audio static method and its related logic is a significant change. While this might be a valid cleanup, it seems unrelated to the main purpose of this pull request, which is to add NVTX tracers. Including unrelated changes makes the PR harder to review and understand. It would be better to submit this change in a separate PR with a descriptive title and explanation.
This is the pr for adding nvtx range markers in the codebase.
To get trace: python fastvideo/tests/training/Vanilla/mfu_calculation.py --profile
To add custom range: with nvtx_range("range name"): ....code block....