
CI: nightly-cuda-core + nightly-numba-cuda-mlir + PTDS/nvshmem fixes#2283

Draft
leofang wants to merge 10 commits into NVIDIA:main from leofang:leofang/nightly-cuda-core-and-mlir

Conversation

@leofang

@leofang leofang commented Jul 1, 2026

Member

Summary

  • Add nightly-cuda-core: released cuda-core from PyPI against main-built pathfinder + bindings. Fills the "core released x bindings main" quadrant discussed here (EPIC: Setup cuda-python integration tests #1955 (comment)).
  • Add nightly-numba-cuda-mlir: MLIR-backend companion to nightly-numba-cuda.
  • Both fetch the release tag's test suite from git — neither wheel ships test_*.py.
  • Fix CUDA_PYTHON_PER_THREAD_DEFAULT_STREAM → CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM typo in ci/test-matrix.yml (CI: Test CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=1 #971 (comment)).
  • Widen nvidia-nvshmem-cu{12,13} pin from !=3.7.0 to <3.7 in cuda_pathfinder/pyproject.toml (3.7.x breaks main).

Refs #971, #1955.

leofang added 3 commits July 1, 2026 03:53
nightly-cuda-core: test the released cuda-core from PyPI against
main-built pathfinder and cuda-bindings, catching the "core released ×
bindings main" gap documented in issue NVIDIA#1955. Runs on linux-64 (a100)
and win-64 (a100 MCDM).

nightly-numba-cuda-mlir: MLIR-backend companion to nightly-numba-cuda.
Installs main pathfinder+bindings+core plus numba-cuda-mlir from PyPI,
runs numba-cuda-mlir's own test suite from the matching git tag.
Linux amd64/arm64 x CUDA 12.9.1 / 13.3.0.

Both modes fetch the released version's tests from git tags because
the respective wheels do not ship test_*.py files. Includes
tag-not-found fallback (log warning + exit 0) to avoid red-lining the
nightly on a freshly-cut PyPI release that hasn't been pushed to git
yet.

The two env-var overrides intended to exercise the per-thread default
stream (PTDS) code path were misspelled (missing the CUDA_ segment), so
the env var was silently ignored and the PTDS coverage added in
NVIDIA#1972 had no effect. Rename to the correct
CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM.

Refs NVIDIA#971.
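
To illustrate why the misspelling went unnoticed: an optional environment variable is usually read with a fallback default, so a value set under a different name changes nothing and raises no error. A minimal sketch, assuming a hypothetical reader `ptds_enabled` (this is not cuda-python's actual lookup code; only the variable names come from the commit message):

```python
import os

# Correct name, as given in the commit message.
PTDS_VAR = "CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM"


def ptds_enabled(environ=os.environ):
    """Hypothetical reader: PTDS is on only when the variable equals '1'."""
    return environ.get(PTDS_VAR, "0") == "1"


# Misspelled name (missing the CUDA_ segment): the lookup falls back to "0".
misspelled = {"CUDA_PYTHON_PER_THREAD_DEFAULT_STREAM": "1"}
# Correct name: the override takes effect.
correct = {PTDS_VAR: "1"}

print(ptds_enabled(misspelled))  # False
print(ptds_enabled(correct))     # True
```

Because the misspelled case is indistinguishable from "variable not set", the CI rows ran without error while exercising only the default-stream path.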
….7.0)

nvidia-nvshmem-cu{12,13} 3.7.x breaks the main branch, not only 3.7.0. Widen the pin from the exact-version exclusion (!=3.7.0) to an upper bound (<3.7) so that 3.7.x and above are avoided until we can move forward.
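
The difference between the two pins can be sketched with a simplified comparison. This assumes plain dotted numeric releases and is not a full PEP 440 implementation (real installers handle pre-releases, wildcards, and more); the candidate version strings below are illustrative only, not a list of actual nvshmem releases.

```python
def parse(version, width=3):
    """Turn a plain dotted release such as '3.7' into a zero-padded int tuple."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def allowed_by_not_equal(version, excluded="3.7.0"):
    """Old pin (!=3.7.0): rejects only that one exact release."""
    return parse(version) != parse(excluded)


def allowed_by_upper_bound(version, bound="3.7"):
    """New pin (<3.7): rejects 3.7.0 and every later release."""
    return parse(version) < parse(bound)


for candidate in ("3.6.5", "3.7.0", "3.7.1"):
    print(candidate, allowed_by_not_equal(candidate), allowed_by_upper_bound(candidate))
```

Under the old pin a hypothetical 3.7.1 would still be installable, which is the gap the commit closes.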
@copy-pr-bot

copy-pr-bot Bot commented Jul 1, 2026

Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions Bot added the CI/CD and cuda.pathfinder labels Jul 1, 2026
leofang added 2 commits July 1, 2026 04:15
Drop the linux-aarch64 rows and instead add win-64 coverage with the
same CUDA 12.9.1 / 13.3.0 pair. Switch all four rows from GPU l4 to
rtxpro6000. Windows rows use DRIVER_MODE MCDM, matching the existing
rtxpro6000 CUDA 13.3.0 patterns.
leofang added the bug and enhancement labels Jul 1, 2026
@leofang

leofang commented Jul 1, 2026

Member Author

/ok to test a0ccd19

…r tests

The initial approach used git inside the ubuntu:24.04 container to fetch
the released version's test suite, but git is not installed on that
container (install_unix_deps only pulls in jq/wget/g++/etc.) and its
absence made the run steps silently skip via the tag-not-fetchable
fallback. On Windows, git archive of just the cuda_core subtree also hit
a dangling-symlink extraction failure (cuda_core/.git_archival.txt).

Refactor to:

- run-tests: just install wheels and expose the resolved release version
  (CUDA_CORE_RELEASED_VER / NUMBA_CUDA_MLIR_VER) and cuda-core test-group
  name via GITHUB_ENV. No more git operations.
- test-wheel-{linux,windows}.yml: add an actions/checkout step per mode
  that pulls the matching release tag into a subdirectory
  (cuda-core-released / numba-cuda-mlir-released), then the follow-up
  test step installs that tag's test dep-group and runs pytest.

For numba-cuda-mlir also pass --ignore=tests/benchmarks
--ignore=tests/doc_examples to pytest: those directories import the
`numba` package at module top and would fail collection, which is
cuSIMT's expected behavior (see NVIDIA/numba-cuda-mlir#136 — cuSIMT
intentionally does not depend on numba).
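
The hand-off between the install step and the later checkout step relies on the GitHub Actions convention of appending `NAME=value` lines to the file named by `GITHUB_ENV`. The PR does this in its shell-based run-tests script; the following is only a Python sketch of the same idea, with a hypothetical helper `export_release_version` and a stubbed version lookup so that it runs outside CI.

```python
import os
import tempfile
from importlib import metadata


def export_release_version(dist_name, env_name, github_env_path, lookup=metadata.version):
    """Append NAME=value to the GITHUB_ENV file so later workflow steps can read it."""
    version = lookup(dist_name)
    with open(github_env_path, "a", encoding="utf-8") as fh:
        fh.write(f"{env_name}={version}\n")
    return version


# Demo outside CI: a temp file stands in for $GITHUB_ENV and the lookup is
# stubbed, so nothing has to be installed for the sketch to run.
with tempfile.TemporaryDirectory() as tmp:
    env_file = os.path.join(tmp, "github_env")
    export_release_version(
        "cuda-core", "CUDA_CORE_RELEASED_VER", env_file, lookup=lambda name: "1.0.1"
    )
    with open(env_file, encoding="utf-8") as fh:
        contents = fh.read()

print(contents, end="")  # CUDA_CORE_RELEASED_VER=1.0.1
```

In a real workflow step the path would come from `os.environ["GITHUB_ENV"]` and the default `metadata.version` lookup would report whatever wheel was just installed.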
@github-actions

github-actions Bot commented Jul 1, 2026


@leofang

leofang commented Jul 1, 2026

Member Author

/ok to test 9490bd3

Two nightly failure fixups after the first green iteration:

nightly-numba-cuda-mlir: numba-cuda-mlir 0.4.0 has an inverted guard
that registers an overload of np.row_stack on NumPy 2.x, and NumPy 2.5
removed that name entirely, so test collection fails with
"AttributeError: module 'numpy' has no attribute 'row_stack'". Cap
numpy to <2.5. See NVIDIA/numba-cuda-mlir#154.

nightly-cuda-core: released cuda-core v1.0.1's test suite uses a
parametrize argvalues pattern that pytest 9.1 rejects
("in parametrize the number of names (1)... must be equal to the
number of values (3)"). The main-side fix was NVIDIA#2212 but it has not
shipped in a cuda-core release yet. Cap pytest to <9.1 for the
released-cuda-core test run only.
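
For the np.row_stack failure described above, the following sketch shows how an inverted version guard can turn a removed attribute into an import-time error, and how feature detection avoids it. Everything here is hypothetical: the stand-in namespaces replace the real numpy module, and `register_inverted` is an illustration of the failure mode, not the actual numba-cuda-mlir code.

```python
from types import SimpleNamespace

# Stand-ins for the numpy module; version numbers follow the commit message.
numpy_with_alias = SimpleNamespace(__version__="2.4.0", row_stack=lambda arrays: arrays)
numpy_without_alias = SimpleNamespace(__version__="2.5.0")


def register_inverted(np, registry):
    """Inverted guard: touches np.row_stack on NumPy 2.x instead of skipping it."""
    major = int(np.__version__.split(".")[0])
    if major >= 2:
        registry.append(np.row_stack)  # AttributeError once the name is removed


def register_by_feature(np, registry):
    """Feature detection: register only when the attribute actually exists."""
    func = getattr(np, "row_stack", None)
    if func is not None:
        registry.append(func)


try:
    register_inverted(numpy_without_alias, [])
    inverted_failed = False
except AttributeError:
    inverted_failed = True

safe_registry = []
register_by_feature(numpy_without_alias, safe_registry)
print(inverted_failed, safe_registry)  # True []
```

Since the registration runs when the module is imported, the error surfaces during pytest collection rather than in any single test, which is why capping numpy is the only workaround available from the CI side.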
@leofang

leofang commented Jul 1, 2026

Member Author

/ok to test f3770d9

…htly-numba-cuda-mlir

Applied only in the affected nightly-* pytest invocations; the released
source trees under test are unmodified.

nightly-numba-cuda-mlir (all 10 tests deselected are from cuSIMT):

  * CudaArraySetting::{test_no_sync_default_stream, test_no_sync_supplied_stream, test_sync}
    TestCudaArrayInterface::{test_consume_no_sync, test_consume_sync,
                             test_launch_no_sync, test_launch_sync,
                             test_launch_sync_two_streams, test_fortran_contiguous}
      Serial-pytest contamination of numba_cuda_mlir.cuda.cudadrv from an
      xfailed test in test_nrt_comprehensive.py. Upstream CI runs with
      `pytest -n auto --dist loadscope`, which isolates the offending
      side effect in a separate xdist worker; our nightly runs serially
      and hits the pollution. See NVIDIA/numba-cuda-mlir#135.
  * TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn
      Subprocess-invokes `cuobjdump`, which isn't on PATH in the base
      ubuntu:24.04 container. Filed as an upstream skip-guard bug.

nightly-cuda-core (3 tests deselected are pre-existing v1.0.1 issues):

  * test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]
      Expected drift: main cuda-bindings adds NvlinkVersion.VERSION_6_0
      which v1.0.1's wrapper mapping predates. This mode intentionally
      pairs released core with main bindings, so this coverage-style
      test will stay red here until a cuda-core release catches up.
  * test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive
      Environment-dependent test: expects rlcompleter to crash without
      the tab-completion patch, but on Windows MCDM the pre-patch
      behavior is clean. Passes on Linux, fails on Windows MCDM.
  * test_memory.py::test_non_managed_resources_report_not_managed[pinned]
      Same underlying "Failed to allocate memory from pool" error that
      v1.0.1 already xfails in the sibling test_pinned_memory_resource_initialization
      (TODO(#9999)). cuda-python main has since fixed the parametrized
      case to route through _allocate_pinned_buffer_or_xfail(), but that
      fix hasn't shipped in a cuda-core release yet.
@leofang

leofang commented Jul 1, 2026

Member Author

/ok to test 7476a9f

leofang added 2 commits July 1, 2026 06:48
Previously applied the same list on both Linux and Windows workflows,
which over-deselected — some tests only fail on one platform because
the underlying issues (serial-pytest test-order in mlir, MCDM-only
behavior in cuda-core) are platform-specific.

Now:

nightly-numba-cuda-mlir
  linux-64: TestCudaArrayInterface::{test_consume_no_sync,
    test_consume_sync, test_launch_no_sync, test_launch_sync,
    test_launch_sync_two_streams, test_fortran_contiguous}
    + TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn.
  win-64: CudaArraySetting::{test_no_sync_default_stream,
    test_no_sync_supplied_stream, test_sync}
    + TestCudaArrayInterface::test_fortran_contiguous.

Test-order contamination in numba-cuda-mlir#135 surfaces different
tests depending on collection order (linux-64 vs win-64 exercise
different subsets), so the per-platform lists differ. The cuobjdump-based
TestLinkerDumpAssembly test fails only on Linux because the ubuntu:24.04
container's PATH lacks cuobjdump; Windows runners ship it with the
local CTK.

nightly-cuda-core
  linux-64: test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion].
  win-64: NvlinkVersion (same as Linux)
    + test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive
    + test_memory.py::test_non_managed_resources_report_not_managed[pinned].

rlcompleter and pinned mempool tests only fail on Windows MCDM.
NvlinkVersion fails on both (expected drift for the mode).

Each deselect is now wrapped in a bash conditional keyed on the
installed release version. When a newer numba-cuda-mlir or cuda-core
release ships with the referenced fix, the nightly picks it up
automatically, the guard evaluates false, and the deselect drops — so
the tests run against the new release. If they still fail we hear
about it loudly rather than silently masking a regression.

Current guards:
- numba-cuda-mlir NVIDIA#135 tests + cuobjdump TestLinkerDumpAssembly:
  applied when installed numba-cuda-mlir version <= 0.4.0.
- cuda-core NvlinkVersion / rlcompleter opt-out / pinned mempool:
  applied when installed cuda-core version <= 1.0.1.

Structure keeps one conditional block per (mode, platform) with a
comment above each deselect explaining the tracking issue.
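
The guard logic above can be sketched as a lookup keyed on (mode, platform) that returns pytest `--deselect` arguments only while the installed release is at or below the last affected version. The PR implements this as bash conditionals in the workflows; this Python version is an illustrative equivalent, the table covers only the cuda-core mode, the helper names are hypothetical, and the test IDs are abbreviated as in the commit message rather than full node IDs.

```python
def parse(version):
    """Plain dotted numeric releases only, e.g. '1.0.1' -> (1, 0, 1)."""
    return tuple(int(p) for p in version.split("."))


NVLINK = "test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]"

# (mode, platform) -> (last affected release, test IDs to deselect)
DESELECTS = {
    ("nightly-cuda-core", "linux-64"): ("1.0.1", [NVLINK]),
    ("nightly-cuda-core", "win-64"): ("1.0.1", [
        NVLINK,
        "test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive",
        "test_memory.py::test_non_managed_resources_report_not_managed[pinned]",
    ]),
}


def deselect_args(mode, platform, installed):
    """Return --deselect flags, or nothing once a newer release is installed."""
    last_affected, tests = DESELECTS.get((mode, platform), (None, []))
    if last_affected is None or parse(installed) > parse(last_affected):
        return []  # guard is false: the tests run and any failure is visible
    return [f"--deselect={test}" for test in tests]


print(len(deselect_args("nightly-cuda-core", "win-64", "1.0.1")))  # 3
print(deselect_args("nightly-cuda-core", "win-64", "1.0.2"))       # []
```

Keying the guard on the installed version rather than on a date or a manual toggle means no follow-up PR is needed to re-enable the tests when a fixed release ships.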
@leofang

leofang commented Jul 1, 2026

Member Author

/ok to test 01ac84e


Labels

bug: Something isn't working
CI/CD: CI/CD infrastructure
cuda.pathfinder: Everything related to the cuda.pathfinder module
enhancement: Any code-related improvements

Development

Successfully merging this pull request may close these issues.

EPIC: Setup cuda-python integration tests
CI: Test CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM=1

1 participant