CI: nightly-cuda-core + nightly-numba-cuda-mlir + PTDS/nvshmem fixes #2283
Draft
leofang wants to merge 10 commits into
nightly-cuda-core: test the released cuda-core from PyPI against main-built pathfinder and cuda-bindings, catching the "core released × bindings main" gap documented in issue NVIDIA#1955. Runs on linux-64 (a100) and win-64 (a100 MCDM).
nightly-numba-cuda-mlir: MLIR-backend companion to nightly-numba-cuda. Installs main pathfinder+bindings+core plus numba-cuda-mlir from PyPI, runs numba-cuda-mlir's own test suite from the matching git tag. Linux amd64/arm64 x CUDA 12.9.1 / 13.3.0.
Both modes fetch the released version's tests from git tags because the respective wheels do not ship test_*.py files. Includes tag-not-found fallback (log warning + exit 0) to avoid red-lining the nightly on a freshly-cut PyPI release that hasn't been pushed to git yet.
The two ENV overrides intended to exercise the per-thread default stream code path were misspelled (missing the CUDA_ segment), so the env var was silently ignored and the PTDS coverage added in NVIDIA#1972 had no effect. Rename to the correct CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM. Refs NVIDIA#971.
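A minimal sketch of why the misspelling went unnoticed. `ptds_enabled` and `ptds_status` are hypothetical stand-ins for whatever code reads the setting; only the two variable names and the value `1` come from this PR.

```shell
# Hypothetical stand-in for a consumer of the setting: it looks up only the
# correctly spelled name, so any other spelling is invisible to it.
ptds_enabled() {
    [ "${CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM:-0}" = "1" ]
}

# Report "on" or "off" for the current environment.
ptds_status() {
    if ptds_enabled; then echo on; else echo off; fi
}

# Misspelled override (missing the CUDA_ segment): no error, and no effect.
(
    unset CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM
    export CUDA_PYTHON_PER_THREAD_DEFAULT_STREAM=1
    ptds_status
)

# Correct spelling: the consumer sees it.
(
    export CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=1
    ptds_status
)
```

Nothing validates environment variable names, so a wrong name cannot fail on its own; the only symptom is that the intended code path never runs.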
….7.0)
nvidia-nvshmem-cu{12,13} 3.7.x breaks the main branch, not only 3.7.0. Widen the exclusion from an exact-version bump to <3.7 so 3.7.x and above are avoided until we can move forward.
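The difference between the two pins can be sketched as follows. This is an approximation only: it uses GNU `sort -V` rather than real PEP 440 ordering (pre-releases would differ), the helper names are hypothetical, and 3.6.5 and 3.7.1 are illustrative version numbers rather than ones named in this PR.

```shell
# ver_lt A B: succeeds when A sorts strictly before B under GNU `sort -V`.
# This approximates, but is not, PEP 440 ordering.
ver_lt() {
    [ "$1" != "$2" ] && printf '%s\n%s\n' "$1" "$2" | sort -V -C
}

# Models the previous exact-version exclusion (!=3.7.0).
old_pin_allows() { [ "$1" != "3.7.0" ]; }

# Models the widened exclusion (<3.7).
new_pin_allows() { ver_lt "$1" "3.7"; }

for v in 3.6.5 3.7.0 3.7.1; do
    old=no; new=no
    if old_pin_allows "$v"; then old=yes; fi
    if new_pin_allows "$v"; then new=yes; fi
    echo "$v old=$old new=$new"
done
```

The illustrative 3.7.1 passes the old exclusion but not the new one, which is the case the widened pin is meant to close.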
This was
linked to
issues
Jul 1, 2026
Drop the linux-aarch64 rows and instead add win-64 coverage with the same CUDA 12.9.1 / 13.3.0 pair. Switch all four rows from GPU l4 to rtxpro6000. Windows rows use DRIVER_MODE MCDM, matching the existing rtxpro6000 CUDA 13.3.0 patterns.
Remove before merging.
/ok to test a0ccd19
…r tests
The initial approach used git inside the ubuntu:24.04 container to fetch
the released version's test suite, but git is not installed on that
container (install_unix_deps only pulls in jq/wget/g++/etc.) and its
absence made the run steps silently skip via the tag-not-fetchable
fallback. On Windows, git archive of just the cuda_core subtree also hit
a dangling-symlink extraction failure (cuda_core/.git_archival.txt).
Refactor to:
- run-tests: just install wheels and expose the resolved release version
(CUDA_CORE_RELEASED_VER / NUMBA_CUDA_MLIR_VER) and cuda-core test-group
name via GITHUB_ENV. No more git operations.
- test-wheel-{linux,windows}.yml: add an actions/checkout step per mode
that pulls the matching release tag into a subdirectory
(cuda-core-released / numba-cuda-mlir-released), then the follow-up
test step installs that tag's test dep-group and runs pytest.
For numba-cuda-mlir also pass --ignore=tests/benchmarks
--ignore=tests/doc_examples to pytest: those directories import the
`numba` package at module top and would fail collection, which is
cuSIMT's expected behavior (see NVIDIA/numba-cuda-mlir#136 — cuSIMT
intentionally does not depend on numba).
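The hand-off from run-tests to the later workflow steps might look roughly like this. Appending `NAME=value` lines to the file named by `$GITHUB_ENV` is the standard GitHub Actions mechanism; the helper name `expose_env` is hypothetical, and the version strings are placeholders for whatever the install step actually resolved.

```shell
# expose_env NAME VALUE: append NAME=VALUE to the file named by $GITHUB_ENV,
# which GitHub Actions reads to set variables for later steps in the same job.
expose_env() {
    printf '%s=%s\n' "$1" "$2" >> "$GITHUB_ENV"
}

# Outside of Actions, point GITHUB_ENV at a scratch file so the sketch runs.
: "${GITHUB_ENV:=$(mktemp)}"

# Placeholder versions; the real step would use the resolved release versions.
expose_env CUDA_CORE_RELEASED_VER "1.0.1"
expose_env NUMBA_CUDA_MLIR_VER "0.4.0"

cat "$GITHUB_ENV"
```

A later actions/checkout step can then read these variables to pick the release tag, which keeps all git work out of the container that lacks git.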
/ok to test 9490bd3
Two nightly failure fixups after the first green iteration:
nightly-numba-cuda-mlir: numba-cuda-mlir 0.4.0 has an inverted guard that registers an overload of np.row_stack on NumPy 2.x, and NumPy 2.5 removed that name entirely, so test collection fails with "AttributeError: module 'numpy' has no attribute 'row_stack'". Cap numpy to <2.5. See NVIDIA/numba-cuda-mlir#154.
nightly-cuda-core: released cuda-core v1.0.1's test suite uses a parametrize argvalues pattern that pytest 9.1 rejects ("in parametrize the number of names (1)... must be equal to the number of values (3)"). The main-side fix was NVIDIA#2212 but it has not shipped in a cuda-core release yet. Cap pytest to <9.1 for the released-cuda-core test run only.
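One way to keep the two caps scoped to their own modes is a small lookup, sketched below. The function name `extra_pins` and this layout are assumptions for illustration; only the mode names and the two caps come from the message above.

```shell
# extra_pins MODE: print the extra version caps for a nightly mode, one
# requirement per line. Function name and layout are illustrative only.
extra_pins() {
    case "$1" in
        # np.row_stack no longer exists in NumPy 2.5.
        nightly-numba-cuda-mlir) echo 'numpy<2.5' ;;
        # pytest 9.1 rejects the parametrize pattern used by v1.0.1's tests.
        nightly-cuda-core)       echo 'pytest<9.1' ;;
        # Any other mode gets no extra caps.
        *) ;;
    esac
}

for mode in nightly-cuda-core nightly-numba-cuda-mlir; do
    echo "$mode: $(extra_pins "$mode")"
done
```

When such a requirement is passed to `pip install`, it needs to stay quoted so the shell does not read `<` as a redirection.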
/ok to test f3770d9
…htly-numba-cuda-mlir
Applied only in the affected nightly-* pytest invocations; the released
source trees under test are unmodified.
nightly-numba-cuda-mlir (all 10 tests deselected are from cuSIMT):
* CudaArraySetting::{test_no_sync_default_stream, test_no_sync_supplied_stream, test_sync}
TestCudaArrayInterface::{test_consume_no_sync, test_consume_sync,
test_launch_no_sync, test_launch_sync,
test_launch_sync_two_streams, test_fortran_contiguous}
Serial-pytest contamination of numba_cuda_mlir.cuda.cudadrv from an
xfailed test in test_nrt_comprehensive.py. Upstream CI runs with
`pytest -n auto --dist loadscope`, which isolates the offending
side effect in a separate xdist worker; our nightly runs serially
and hits the pollution. See NVIDIA/numba-cuda-mlir#135.
* TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn
Subprocess-invokes `cuobjdump`, which isn't on PATH in the base
ubuntu:24.04 container. Filed as an upstream skip-guard bug.
nightly-cuda-core (3 tests deselected are pre-existing v1.0.1 issues):
* test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]
Expected drift: main cuda-bindings adds NvlinkVersion.VERSION_6_0
which v1.0.1's wrapper mapping predates. This mode intentionally
pairs released core with main bindings, so this coverage-style
test will stay red here until a cuda-core release catches up.
* test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive
Environment-dependent test: expects rlcompleter to crash without
the tab-completion patch, but on Windows MCDM the pre-patch
behavior is clean. Passes on Linux, fails on Windows MCDM.
* test_memory.py::test_non_managed_resources_report_not_managed[pinned]
Same underlying "Failed to allocate memory from pool" error that
v1.0.1 already xfails in the sibling test_pinned_memory_resource_initialization
(TODO(#9999)). cuda-python main has since fixed the parametrized
case to route through _allocate_pinned_buffer_or_xfail(), but that
fix hasn't shipped in a cuda-core release yet.
/ok to test 7476a9f
Previously applied the same list on both Linux and Windows workflows,
which over-deselected — some tests only fail on one platform because
the underlying issues (serial-pytest test-order in mlir, MCDM-only
behavior in cuda-core) are platform-specific.
Now:
nightly-numba-cuda-mlir
linux-64: TestCudaArrayInterface::{test_consume_no_sync,
test_consume_sync, test_launch_no_sync, test_launch_sync,
test_launch_sync_two_streams, test_fortran_contiguous}
+ TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn.
win-64: CudaArraySetting::{test_no_sync_default_stream,
test_no_sync_supplied_stream, test_sync}
+ TestCudaArrayInterface::test_fortran_contiguous.
Test-order contamination in numba-cuda-mlir#135 surfaces different
tests depending on collection order (linux-64 vs win-64 exercise
different subsets), so the per-platform lists differ. cuobjdump-based
TestLinkerDumpAssembly only fires on Linux because the ubuntu:24.04
container's PATH lacks cuobjdump; Windows runners ship it with the
local CTK.
nightly-cuda-core
linux-64: test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion].
win-64: NvlinkVersion (same as Linux)
+ test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive
+ test_memory.py::test_non_managed_resources_report_not_managed[pinned].
rlcompleter and pinned mempool tests only fail on Windows MCDM.
NvlinkVersion fails on both (expected drift for the mode).
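The per-platform split for nightly-cuda-core can be sketched as a function that emits pytest `--deselect` options. The function name is hypothetical, and the node IDs are copied as written above, so they may need a directory prefix in the real workflow.

```shell
# cuda_core_deselects PLATFORM: print one pytest --deselect option per line
# for the nightly-cuda-core mode. The function name is hypothetical.
cuda_core_deselects() {
    # Fails on both platforms (expected drift for this mode).
    echo '--deselect=test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]'
    case "$1" in
        win-64)
            # These two only fail on Windows MCDM.
            echo '--deselect=test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive'
            echo '--deselect=test_memory.py::test_non_managed_resources_report_not_managed[pinned]'
            ;;
    esac
}

for platform in linux-64 win-64; do
    echo "== $platform"
    cuda_core_deselects "$platform"
done
```

Keeping the shared entry outside the `case` means a test that fails everywhere is listed once, while platform-only failures stay visible as such.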
Each deselect is now wrapped in a bash conditional keyed on the installed release version. When a newer numba-cuda-mlir or cuda-core release ships with the referenced fix, the nightly picks it up automatically, the guard evaluates false, and the deselect drops, so the tests run against the new release. If they still fail we hear about it loudly rather than silently masking a regression.
Current guards:
- numba-cuda-mlir NVIDIA#135 tests + cuobjdump TestLinkerDumpAssembly: applied when installed numba-cuda-mlir version <= 0.4.0.
- cuda-core NvlinkVersion / rlcompleter opt-out / pinned mempool: applied when installed cuda-core version <= 1.0.1.
Structure keeps one conditional block per (mode, platform) with a comment above each deselect explaining the tracking issue.
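A version-keyed guard of this kind might look like the sketch below. It is not the workflow's actual code: `ver_le` is a hypothetical helper built on GNU `sort -V` (which does not follow PEP 440 for pre-releases), and the default version is a placeholder for the value the install step exposes.

```shell
# ver_le A B: succeeds when A sorts at or before B under GNU `sort -V`.
ver_le() {
    printf '%s\n%s\n' "$1" "$2" | sort -V -C
}

# Placeholder default; in CI this would come from the installed release.
CUDA_CORE_RELEASED_VER="${CUDA_CORE_RELEASED_VER:-1.0.1}"

DESELECT=""
# Expected drift: the released wrapper predates NvlinkVersion.VERSION_6_0
# in main cuda-bindings. Dropped automatically once a newer release installs.
if ver_le "$CUDA_CORE_RELEASED_VER" "1.0.1"; then
    DESELECT="--deselect=test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]"
fi

echo "pytest extra args: ${DESELECT:-<none>}"
```

Because the guard compares against the last known-bad release rather than pinning the package, nothing needs editing when a fixed release appears; the comparison simply stops matching.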
/ok to test 01ac84e
Summary
- `nightly-cuda-core`: released cuda-core from PyPI against main-built pathfinder + bindings. Fills the "core released x bindings main" quadrant discussed here (EPIC: Setup cuda-python integration tests #1955 (comment)).
- `nightly-numba-cuda-mlir`: MLIR-backend companion to `nightly-numba-cuda`.
- Both modes take their tests from the matching release tag, because the wheels do not ship `test_*.py`.
- `CUDA_PYTHON_PER_THREAD_DEFAULT_STREAM` → `CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM` typo in `ci/test-matrix.yml` (CI: Test `CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=1` #971 (comment)).
- `nvidia-nvshmem-cu{12,13}` pin from `!=3.7.0` to `<3.7` in `cuda_pathfinder/pyproject.toml` (3.7.x breaks main).
Refs #971, #1955.