
Prevent dynamic kernel construction during pending async work#4443

Open
shreyaskommuri wants to merge 3 commits into NVIDIA:main from shreyaskommuri:fix-make-kernel-async-guard

Conversation

@shreyaskommuri shreyaskommuri commented May 3, 2026

Summary

Fixes the minimum failure mode from #4359 by guarding `cudaq.make_kernel()` while async `sample_async` or `observe_async` work is pending. Instead of allowing unsafe concurrent access to the process-wide MLIR context and risking a segfault, CUDA-Q now raises an actionable `RuntimeError`.

This does not make dynamic kernel construction fully thread-safe; it documents and enforces the current constraint.

Changes

`python/cudaq/kernel/utils.py`

  • Adds `_active_async_work_count` (protected by `threading.Lock`) and three helpers: `register_async_work`, `unregister_async_work`, `check_no_active_async_work`.

`python/cudaq/kernel/kernel_builder.py`

  • `make_kernel()` calls `check_no_active_async_work()` before touching the global MLIR context.
  • Adds a constraint note to the docstring.

`python/cudaq/runtime/sample.py`

  • `AsyncSampleResult` now accepts `async_work_registered` to avoid double-counting when constructed from `sample_async`.
  • `get()` unregisters async work in a `finally` block (correct even when `impl.get()` throws).
  • `__del__` unregisters async work when the result is discarded without calling `get()`, preventing the counter from leaking and permanently blocking `make_kernel()`.
  • `sample_async` registers before the async launch and unregisters on exception.

`python/cudaq/runtime/observe.py`

  • Adds `_AsyncObserveResult` wrapping the C++ future with the same register/unregister lifecycle as `AsyncSampleResult`, including `__del__`.
  • `observe_async` now returns `_AsyncObserveResult` instead of the raw C++ future.

`docs/sphinx/using/examples/multi_gpu_workflows.rst`

  • Adds a `.. note::` documenting the build-before-dispatch constraint for `make_kernel`.
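
The note might read along these lines (wording is illustrative, not the exact text added to `multi_gpu_workflows.rst`):

```rst
.. note::

   Dynamic kernels share a process-wide MLIR context. Build all kernels
   with ``cudaq.make_kernel()`` before dispatching ``sample_async`` or
   ``observe_async``; constructing a kernel while async work is pending
   raises a ``RuntimeError``.
```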

Testing

  • `python3 -m py_compile` across all changed Python files — passes.
  • `git diff --check` — no whitespace errors.
  • Two new unit tests in `python/tests/builder/test_kernel_builder.py`:
    • `test_make_kernel_rejects_pending_async_work` — verifies `make_kernel` raises while an `AsyncSampleResult` is live, then succeeds after `get()`.
    • `test_make_kernel_allowed_after_gc_of_async_result` — verifies `make_kernel` succeeds after the result is dropped (`del`) without calling `get()`, exercising the `__del__` path.

Not tested

  • Targeted pytest run, because the local Python 3.14 environment does not have `pytest` installed.
  • GPU/mqpu reproducer, because this host is macOS CPU-only.

Guard cudaq.make_kernel while async sample or observe futures may still access the process-wide MLIR context, so users get an actionable error instead of a segfault.

Constraint: Python dynamic kernels share a process-wide MLIR context that is unsafe to mutate while async execution is pending.

Rejected: making MLIR kernel construction fully thread-safe in this patch; that is too broad for the issue's minimum accepted behavior and likely crosses C++/MLIR ownership boundaries.

Confidence: medium

Scope-risk: moderate

Directive: Keep dynamic kernel construction outside async dispatch loops unless the MLIR context ownership model is made thread-safe.

Tested: python3 -m py_compile python/cudaq/kernel/utils.py python/cudaq/kernel/kernel_builder.py python/cudaq/runtime/sample.py python/cudaq/runtime/observe.py python/tests/builder/test_kernel_builder.py; git diff --check

Not-tested: pytest target because local Python 3.14 environment does not have pytest installed; GPU/mqpu reproducer not run on this macOS CPU-only host.

Co-authored-by: OmX <omx@oh-my-codex.dev>
copy-pr-bot Bot commented May 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shreyaskommuri (Author)

Follow-up fixes on top of the original guard

After reviewing the initial implementation, I found two correctness bugs and one missing test case. These are now fixed in the same commit.


Bug 1 — `AsyncSampleResult.__del__` leaked the async-work count

Problem: If a caller discards an `AsyncSampleResult` without ever calling `.get()` (e.g. fire-and-forget, or an exception skips the call), `_active_async_work_count` stays > 0 permanently. Every subsequent `cudaq.make_kernel()` call in that process raises `RuntimeError`, which is worse than the segfault the guard was meant to prevent.

Fix: `__del__` now calls `unregister_async_work()` when `_async_work_registered` is still `True`. `unregister_async_work` already guards against going below zero, so a double-decrement is impossible.


Bug 2 — `_AsyncObserveResult` had no `__del__`

Problem: Same permanent-blockage issue on the observe path. The new `_AsyncObserveResult` class had `get()` correctly unregistering via `finally`, but nothing handled the GC path.

Fix: added `__del__` with the same pattern as the sample fix.


New test — GC path

Added `test_make_kernel_allowed_after_gc_of_async_result` to `test_kernel_builder.py`. The existing `test_make_kernel_rejects_pending_async_work` only exercised the `.get()` path; this new test uses `del future` to exercise `__del__` directly.

AsyncSampleResult.__del__ and the new _AsyncObserveResult.__del__ now
call unregister_async_work() when _async_work_registered is still True,
covering the case where the caller drops the future without calling
get().  Without this, _active_async_work_count stays > 0 permanently
and every subsequent cudaq.make_kernel() raises RuntimeError.

Also adds test_make_kernel_allowed_after_gc_of_async_result to exercise
the __del__ path directly, and removes the redundant blank line before
the literalinclude directive in multi_gpu_workflows.rst.

Signed-off-by: shreyaskommuri <shreyaskommuri@gmail.com>
