CI: nightly-cuda-core + nightly-numba-cuda-mlir + PTDS/nvshmem fixes by leofang · Pull Request #2283 · NVIDIA/cuda-python

leofang · 2026-07-01T04:04:17Z

Summary

- Add nightly-cuda-core: released cuda-core from PyPI against main-built pathfinder + bindings. Fills the "core released x bindings main" quadrant discussed in EPIC: Setup cuda-python integration tests #1955 (comment).
- Add nightly-numba-cuda-mlir: MLIR-backend companion to nightly-numba-cuda.
- Both fetch the release tag's test suite from git, because neither wheel ships test_*.py.
- Fix the CUDA_PYTHON_PER_THREAD_DEFAULT_STREAM → CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM typo in ci/test-matrix.yml (see CI: Test CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=1 #971 (comment)).
- Widen the nvidia-nvshmem-cu{12,13} exclusion from !=3.7.0 to <3.7 in cuda_pathfinder/pyproject.toml (3.7.x breaks main).

Refs #971, #1955.

nightly-cuda-core: test the released cuda-core from PyPI against main-built pathfinder and cuda-bindings, catching the "core released × bindings main" gap documented in issue NVIDIA#1955. Runs on linux-64 (a100) and win-64 (a100 MCDM).

nightly-numba-cuda-mlir: MLIR-backend companion to nightly-numba-cuda. Installs main pathfinder+bindings+core plus numba-cuda-mlir from PyPI, runs numba-cuda-mlir's own test suite from the matching git tag. Linux amd64/arm64 x CUDA 12.9.1 / 13.3.0.

Both modes fetch the released version's tests from git tags because the respective wheels do not ship test_*.py files. Includes tag-not-found fallback (log warning + exit 0) to avoid red-lining the nightly on a freshly-cut PyPI release that hasn't been pushed to git yet.

The two ENV overrides intended to exercise the per-thread default stream code path were misspelled (missing the CUDA_ segment), so the env var was silently ignored and the PTDS coverage added in NVIDIA#1972 had no effect. Rename to the correct CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM. Refs NVIDIA#971.
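The failure mode is easy to reproduce in isolation: a reader that looks up one exact variable name never sees a value stored under a near-miss name, and nothing complains. The sketch below is illustrative only. `ptds_enabled` and `unknown_overrides` are hypothetical helpers, not cuda-python's actual lookup code, and the `"1"` convention is an assumption taken from the `=1` in the issue title.

```python
CORRECT = "CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM"
MISSPELLED = "CUDA_PYTHON_PER_THREAD_DEFAULT_STREAM"  # missing the CUDA_ segment


def ptds_enabled(env):
    """Hypothetical reader: only the exact variable name is consulted."""
    return env.get(CORRECT, "0") == "1"


def unknown_overrides(env, known=(CORRECT,), prefix="CUDA_PYTHON_"):
    """Return prefixed variables that no reader knows about, so typos surface."""
    return sorted(k for k in env if k.startswith(prefix) and k not in known)


before_fix = {MISSPELLED: "1"}  # what the test matrix used to set
after_fix = {CORRECT: "1"}      # what it sets after the rename

print("PTDS before fix:", ptds_enabled(before_fix))
print("PTDS after fix: ", ptds_enabled(after_fix))
print("Unrecognized overrides before fix:", unknown_overrides(before_fix))
```

A check along the lines of `unknown_overrides` is one possible way to turn a silent typo into a visible CI failure; the PR itself only renames the variable.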

….7.0) nvidia-nvshmem-cu{12,13} 3.7.x breaks the main branch, not only 3.7.0. Widen the exclusion from the single-version !=3.7.0 to a <3.7 upper bound so 3.7.x and above are avoided until we can move forward.
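To make the difference between the two constraints concrete, here is a simplified comparison. It assumes plain dotted-integer versions and ignores the rest of the PEP 440 rules (pre-releases, local versions), so it is a sketch rather than a replacement for a real specifier parser. The version numbers other than 3.7.0 are illustrative, not actual nvshmem releases.

```python
def parse(version, width=3):
    """Turn '3.7' into (3, 7, 0); assumes dotted integers only."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def allowed_by_old_pin(version):
    """Old constraint: !=3.7.0 (excludes exactly one release)."""
    return parse(version) != parse("3.7.0")


def allowed_by_new_pin(version):
    """New constraint: <3.7 (excludes 3.7.0 and everything after it)."""
    return parse(version) < parse("3.7")


for candidate in ["3.6.5", "3.7.0", "3.7.1", "3.8.0"]:
    print(candidate, allowed_by_old_pin(candidate), allowed_by_new_pin(candidate))
```

The old pin lets a later 3.7.x patch release through, which is the gap this commit closes.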

copy-pr-bot · 2026-07-01T04:04:21Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Drop the linux-aarch64 rows and instead add win-64 coverage with the same CUDA 12.9.1 / 13.3.0 pair. Switch all four rows from GPU l4 to rtxpro6000. Windows rows use DRIVER_MODE MCDM, matching the existing rtxpro6000 CUDA 13.3.0 patterns.

Temporarily add push trigger to ci-nightly.yml for testing. Remove before merging.

leofang · 2026-07-01T04:17:46Z

/ok to test a0ccd19

…r tests

The initial approach used git inside the ubuntu:24.04 container to fetch the released version's test suite, but git is not installed on that container (install_unix_deps only pulls in jq/wget/g++/etc.) and its absence made the run steps silently skip via the tag-not-fetchable fallback. On Windows, git archive of just the cuda_core subtree also hit a dangling-symlink extraction failure (cuda_core/.git_archival.txt).

Refactor to:

- run-tests: just install wheels and expose the resolved release version (CUDA_CORE_RELEASED_VER / NUMBA_CUDA_MLIR_VER) and cuda-core test-group name via GITHUB_ENV. No more git operations.
- test-wheel-{linux,windows}.yml: add an actions/checkout step per mode that pulls the matching release tag into a subdirectory (cuda-core-released / numba-cuda-mlir-released), then the follow-up test step installs that tag's test dep-group and runs pytest.

For numba-cuda-mlir also pass --ignore=tests/benchmarks --ignore=tests/doc_examples to pytest: those directories import the `numba` package at module top and would fail collection, which is cuSIMT's expected behavior (see NVIDIA/numba-cuda-mlir#136; cuSIMT intentionally does not depend on numba).
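The GITHUB_ENV handoff works by appending `NAME=value` lines to the file whose path GitHub Actions stores in `$GITHUB_ENV`; later steps in the same job then see those variables. The PR does this from the run-tests shell script. The Python version below is only a sketch of the same mechanism, with `export_to_github_env` as a hypothetical helper and a temporary file standing in for the real `$GITHUB_ENV`.

```python
import os
import tempfile


def export_to_github_env(name, value, env_file=None):
    """Append NAME=value to the file GitHub Actions reads between steps."""
    env_file = env_file or os.environ["GITHUB_ENV"]
    if "\n" in value:
        # Multi-line values need the delimiter syntax, which this sketch skips.
        raise ValueError("multi-line values are not supported here")
    with open(env_file, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


# Local demonstration: a temporary file plays the role of $GITHUB_ENV.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    demo_path = tmp.name

export_to_github_env("CUDA_CORE_RELEASED_VER", "1.0.1", demo_path)
export_to_github_env("NUMBA_CUDA_MLIR_VER", "0.4.0", demo_path)

with open(demo_path, encoding="utf-8") as fh:
    demo_lines = fh.read().splitlines()
os.remove(demo_path)

print(demo_lines)
```

Passing only the resolved version through the environment keeps the install step free of git, which is what lets the checkout happen in the workflow via actions/checkout instead.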

github-actions · 2026-07-01T04:36:56Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-2283/
https://nvidia.github.io/cuda-python/pr-preview/pr-2283/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-2283/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-2283/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

leofang · 2026-07-01T04:39:43Z

/ok to test 9490bd3

Two nightly failure fixups after the first green iteration:

nightly-numba-cuda-mlir: numba-cuda-mlir 0.4.0 has an inverted guard that registers an overload of np.row_stack on NumPy 2.x, and NumPy 2.5 removed that name entirely, so test collection fails with "AttributeError: module 'numpy' has no attribute 'row_stack'". Cap numpy to <2.5. See NVIDIA/numba-cuda-mlir#154.

nightly-cuda-core: released cuda-core v1.0.1's test suite uses a parametrize argvalues pattern that pytest 9.1 rejects ("in parametrize the number of names (1)... must be equal to the number of values (3)"). The main-side fix was NVIDIA#2212 but it has not shipped in a cuda-core release yet. Cap pytest to <9.1 for the released-cuda-core test run only.
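The row_stack failure can be illustrated without NumPy installed. The sketch below is a hypothetical reconstruction of what an inverted version guard looks like, not numba-cuda-mlir's actual code: `register_inverted`, `register_safe`, and the `SimpleNamespace` stand-ins for the NumPy module are all assumptions made for the example.

```python
from types import SimpleNamespace


def register_inverted(np_like, numpy_major):
    """Guard keyed on the major version: touches row_stack on every 2.x."""
    registered = ["vstack"]
    if numpy_major >= 2:
        _ = np_like.row_stack  # raises AttributeError once the name is removed
        registered.append("row_stack")
    return registered


def register_safe(np_like):
    """Guard keyed on the attribute itself, so removal is tolerated."""
    registered = ["vstack"]
    if hasattr(np_like, "row_stack"):
        registered.append("row_stack")
    return registered


# Stand-ins: a 2.x module that still has row_stack, and one where it is gone.
numpy_with_row_stack = SimpleNamespace(vstack=list, row_stack=list)
numpy_without_row_stack = SimpleNamespace(vstack=list)

print(register_inverted(numpy_with_row_stack, 2))
try:
    register_inverted(numpy_without_row_stack, 2)
    inverted_failed = False
except AttributeError:
    inverted_failed = True
print("inverted guard fails after removal:", inverted_failed)
print(register_safe(numpy_without_row_stack))
```

Because the failure happens at import time in the released package, the nightly can only work around it by capping numpy until a fixed release ships.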

leofang · 2026-07-01T04:55:02Z

/ok to test f3770d9

…htly-numba-cuda-mlir

Applied only in the affected nightly-* pytest invocations; the released source trees under test are unmodified.

nightly-numba-cuda-mlir (all 10 tests deselected are from cuSIMT):

* CudaArraySetting::{test_no_sync_default_stream, test_no_sync_supplied_stream, test_sync}
  TestCudaArrayInterface::{test_consume_no_sync, test_consume_sync, test_launch_no_sync, test_launch_sync, test_launch_sync_two_streams, test_fortran_contiguous}
  Serial-pytest contamination of numba_cuda_mlir.cuda.cudadrv from an xfailed test in test_nrt_comprehensive.py. Upstream CI runs with `pytest -n auto --dist loadscope`, which isolates the offending side effect in a separate xdist worker; our nightly runs serially and hits the pollution. See NVIDIA/numba-cuda-mlir#135.
* TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn
  Subprocess-invokes `cuobjdump`, which isn't on PATH in the base ubuntu:24.04 container. Filed as an upstream skip-guard bug.

nightly-cuda-core (3 tests deselected are pre-existing v1.0.1 issues):

* test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]
  Expected drift: main cuda-bindings adds NvlinkVersion.VERSION_6_0 which v1.0.1's wrapper mapping predates. This mode intentionally pairs released core with main bindings, so this coverage-style test will stay red here until a cuda-core release catches up.
* test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive
  Environment-dependent test: expects rlcompleter to crash without the tab-completion patch, but on Windows MCDM the pre-patch behavior is clean. Passes on Linux, fails on Windows MCDM.
* test_memory.py::test_non_managed_resources_report_not_managed[pinned]
  Same underlying "Failed to allocate memory from pool" error that v1.0.1 already xfails in the sibling test_pinned_memory_resource_initialization (TODO(#9999)). cuda-python main has since fixed the parametrized case to route through _allocate_pinned_buffer_or_xfail(), but that fix hasn't shipped in a cuda-core release yet.

leofang · 2026-07-01T06:39:49Z

/ok to test 7476a9f

Previously applied the same list on both Linux and Windows workflows, which over-deselected: some tests only fail on one platform because the underlying issues (serial-pytest test-order in mlir, MCDM-only behavior in cuda-core) are platform-specific. Now:

nightly-numba-cuda-mlir

- linux-64: TestCudaArrayInterface::{test_consume_no_sync, test_consume_sync, test_launch_no_sync, test_launch_sync, test_launch_sync_two_streams, test_fortran_contiguous} + TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn.
- win-64: CudaArraySetting::{test_no_sync_default_stream, test_no_sync_supplied_stream, test_sync} + TestCudaArrayInterface::test_fortran_contiguous.

Test-order contamination in numba-cuda-mlir#135 surfaces different tests depending on collection order (linux-64 vs win-64 exercise different subsets), so the per-platform lists differ. cuobjdump-based TestLinkerDumpAssembly only fires on Linux because the ubuntu:24.04 container's PATH lacks cuobjdump; Windows runners ship it with the local CTK.

nightly-cuda-core

- linux-64: test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion].
- win-64: NvlinkVersion (same as Linux) + test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive + test_memory.py::test_non_managed_resources_report_not_managed[pinned].

rlcompleter and pinned mempool tests only fail on Windows MCDM. NvlinkVersion fails on both (expected drift for the mode).

Each deselect is now wrapped in a bash conditional keyed on the installed release version. When a newer numba-cuda-mlir or cuda-core release ships with the referenced fix, the nightly picks it up automatically, the guard evaluates false, and the deselect drops, so the tests run against the new release. If they still fail we hear about it loudly rather than silently masking a regression.

Current guards:

- numba-cuda-mlir NVIDIA#135 tests + cuobjdump TestLinkerDumpAssembly: applied when installed numba-cuda-mlir version <= 0.4.0.
- cuda-core NvlinkVersion / rlcompleter opt-out / pinned mempool: applied when installed cuda-core version <= 1.0.1.

Structure keeps one conditional block per (mode, platform) with a comment above each deselect explaining the tracking issue.
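The guard logic can be sketched as a small lookup keyed on (mode, platform) plus a version comparison. The PR implements this as bash conditionals in the workflow files; the Python below is an illustrative equivalent under stated assumptions. `DESELECTS` and `deselect_args` are hypothetical names, only the cuda-core lists are shown, the test IDs are copied as written in the commit message (the real node IDs may carry a directory prefix), and `1.0.2` is an invented future version.

```python
def parse(version):
    """Turn '1.0.1' into (1, 0, 1); assumes dotted integers only."""
    return tuple(int(p) for p in version.split("."))


NVLINK = "test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]"

# (mode, platform) -> (last affected release, test IDs to deselect)
DESELECTS = {
    ("nightly-cuda-core", "linux-64"): ("1.0.1", [NVLINK]),
    ("nightly-cuda-core", "win-64"): ("1.0.1", [
        NVLINK,
        "test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive",
        "test_memory.py::test_non_managed_resources_report_not_managed[pinned]",
    ]),
}


def deselect_args(mode, platform, installed_version):
    """Build pytest --deselect flags, dropping them once a newer release is installed."""
    entry = DESELECTS.get((mode, platform))
    if entry is None:
        return []
    last_affected, tests = entry
    if parse(installed_version) > parse(last_affected):
        return []  # guard is false: run the tests and let failures be visible
    return [f"--deselect={test}" for test in tests]


print(deselect_args("nightly-cuda-core", "win-64", "1.0.1"))
print(deselect_args("nightly-cuda-core", "win-64", "1.0.2"))
```

Keying the guard on the last affected release means no follow-up PR is needed when a fix ships: the deselects simply stop applying.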

leofang · 2026-07-01T07:00:01Z

/ok to test 01ac84e

leofang added 3 commits July 1, 2026 03:53

cuda_pathfinder: pin nvshmem to <3.7 (was previously excluding only 3…

2a42aa7


github-actions Bot added the CI/CD (CI/CD infrastructure) and cuda.pathfinder (Everything related to the cuda.pathfinder module) labels Jul 1, 2026

This was linked to issues Jul 1, 2026

CI: Test CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=1 #971

Open

EPIC: Setup cuda-python integration tests #1955

Open

leofang added 2 commits July 1, 2026 04:15

Temporarily add push trigger to ci-nightly.yml for testing

a0ccd19

Remove before merging.

leofang added the bug (Something isn't working) and enhancement (Any code-related improvements) labels Jul 1, 2026

leofang added 2 commits July 1, 2026 06:48

leofang wants to merge 10 commits into NVIDIA:main from leofang:leofang/nightly-cuda-core-and-mlir
