Prevent dynamic kernel construction during pending async work #4443
shreyaskommuri wants to merge 3 commits into NVIDIA:main
Guard `cudaq.make_kernel` while async sample or observe futures may still access the process-wide MLIR context, so users get an actionable error instead of a segfault.

- Constraint: Python dynamic kernels share a process-wide MLIR context that is unsafe to mutate while async execution is pending.
- Rejected: making MLIR kernel construction fully thread-safe in this patch; that is too broad for the issue's minimum accepted behavior and likely crosses C++/MLIR ownership boundaries.
- Confidence: medium. Scope risk: moderate.
- Directive: keep dynamic kernel construction outside async dispatch loops unless the MLIR context ownership model is made thread-safe.
- Tested: `python3 -m py_compile python/cudaq/kernel/utils.py python/cudaq/kernel/kernel_builder.py python/cudaq/runtime/sample.py python/cudaq/runtime/observe.py python/tests/builder/test_kernel_builder.py`; `git diff --check`.
- Not tested: the pytest target, because the local Python 3.14 environment does not have pytest installed; the GPU/mqpu reproducer was not run on this macOS CPU-only host.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Follow-up fixes on top of the original guard

After reviewing the initial implementation, I found two correctness bugs and one missing test case. These are now fixed in the same commit.

Bug 1: `AsyncSampleResult.__del__` leaked the async-work count

Problem: if a caller discards an `AsyncSampleResult` without ever calling `.get()` (e.g. fire-and-forget, or an exception skips the call), `_active_async_work_count` stays > 0 permanently. Every subsequent `cudaq.make_kernel()` call in that process raises `RuntimeError`, which is worse than the segfault the guard was meant to prevent.

Fix: `__del__` now calls `unregister_async_work()` when `_async_work_registered` is still `True`. `unregister_async_work` already guards against going below zero, so a double decrement is impossible.

Bug 2: `_AsyncObserveResult` had no `__del__`

Problem: the same permanent-blockage issue existed on the observe path. The new `_AsyncObserveResult` class had `get()` correctly unregistering via `finally`, but nothing handled the GC path.

Fix: added `__del__` with the same pattern as the sample fix.

New test: the GC path

Added `test_make_kernel_allowed_after_gc_of_async_result` to `test_kernel_builder.py`. The existing `test_make_kernel_rejects_pending_async_work` only exercised the `.get()` path; the new test uses `del future` to exercise `__del__` directly.
AsyncSampleResult.__del__ and the new _AsyncObserveResult.__del__ now call unregister_async_work() when _async_work_registered is still True, covering the case where the caller drops the future without calling get(). Without this, _active_async_work_count stays > 0 permanently and every subsequent cudaq.make_kernel() raises RuntimeError. Also adds test_make_kernel_allowed_after_gc_of_async_result to exercise the __del__ path directly, and removes the redundant blank line before the literalinclude directive in multi_gpu_workflows.rst. Signed-off-by: shreyaskommuri <shreyaskommuri@gmail.com>
Summary
Fixes the minimum failure mode from #4359 by guarding `cudaq.make_kernel()` while async `sample_async` or `observe_async` work is pending. Instead of allowing unsafe concurrent access to the process-wide MLIR context and risking a segfault, CUDA-Q now raises an actionable `RuntimeError`.
This does not make dynamic kernel construction fully thread-safe; it documents and enforces the current constraint.
Changes
- `python/cudaq/kernel/utils.py`
- `python/cudaq/kernel/kernel_builder.py` (guards `cudaq.make_kernel()` while async work is pending)
- `python/cudaq/runtime/sample.py` (`AsyncSampleResult` registers pending work and unregisters in `get()` and `__del__`)
- `python/cudaq/runtime/observe.py` (same pattern via `_AsyncObserveResult`)
- `docs/sphinx/using/examples/multi_gpu_workflows.rst` (removes a redundant blank line before a `literalinclude` directive)
Testing
- `python3 -m py_compile` over the changed Python files and `git diff --check`: pass.
- pytest target: not run; the local Python 3.14 environment does not have pytest installed.
- GPU/mqpu reproducer: not run; this is a macOS CPU-only host.
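For reference, the shape of the new GC-path test can be mirrored with a self-contained fake. The real test lives in `test_kernel_builder.py` and drives the actual cudaq API; `FakeAsyncResult`, `pending`, and `guarded_make_kernel` below are stand-ins invented for this sketch:

```python
import gc

# Self-contained stand-ins; the real test uses cudaq futures and
# cudaq.make_kernel. FakeAsyncResult mimics the __del__ fix above.
pending = {"count": 0}

class FakeAsyncResult:
    def __init__(self):
        pending["count"] += 1
        self._async_work_registered = True

    def __del__(self):
        # Release the slot exactly once, even on the GC path.
        if self._async_work_registered:
            self._async_work_registered = False
            pending["count"] = max(0, pending["count"] - 1)

def guarded_make_kernel():
    # Mirrors the guard: refuse construction while async work is pending.
    if pending["count"] > 0:
        raise RuntimeError("pending async work")
    return "kernel"

def test_make_kernel_allowed_after_gc_of_async_result():
    future = FakeAsyncResult()
    del future            # drop without calling get(), as in the new test
    gc.collect()
    assert guarded_make_kernel() == "kernel"
```

Run under pytest, this passes only if `__del__` actually decrements the counter; without the fix, `guarded_make_kernel()` would raise.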