
Updated sample semantics #4418

Open
anpaz wants to merge 4 commits into NVIDIA:main from anpaz:issue-4153


Collaborator

@anpaz anpaz commented Apr 30, 2026

Summary

Fixes #4153 by updating cudaq::sample / cudaq.sample measurement result semantics so sampled bitstrings follow user measurement order when measurements are present; if the kernel includes no measurements, the current behavior of returning a bitstring based on the allocation order remains.

What Changed

  • Updated default sample semantics:
    • Kernels with no measurements use implicit final sampling over allocated qubits.
    • Kernels with measurements normally return __global__ bitstrings in measurement/program order.
    • Terminal mz / mx / my measurements that already follow allocation order remain allocation-order compatible.
    • explicit_measurements=False is rejected when it would change returned bitstrings.
  • Added conservative MLIR measurement analysis and metadata to record when a kernel requires explicit measurement-order semantics.
  • Updated Python sample / sample_async default explicit_measurements to auto mode via None.
  • Kept C++ sample_options::explicit_measurements as a bool, but documented it as a deprecated compatibility option.
  • Preserved named measurement registers in sample_result where available for compatibility.
  • Updated docs to describe the result contract in terms of outcome semantics rather than implementation fast paths.
  • Moved the QIR Base Profile measurement-order verifier into shared verifier code so both runtime and cudaq-translate paths can use it.
  • Updated affected tests and added coverage for measurement-order defaults, allocation-order-compatible measurements, rejected legacy requests, no-measurement kernels, Python sync/async, and C++ target tests.
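
The ordering rules in the list above can be illustrated with a small toy model. This is only a sketch in plain Python, not the CUDA-Q implementation: the helper names (`allocation_order_bits`, `measurement_order_bits`, `is_allocation_order_compatible`, `sample_bits`) are hypothetical, and the compatibility check here assumes the simplest case, where every allocated qubit is measured once, in allocation order, at the end of the kernel.

```python
from typing import Dict, Sequence


def allocation_order_bits(outcomes: Dict[int, int]) -> str:
    """Bitstring ordered by qubit allocation index (implicit final sampling)."""
    return "".join(str(outcomes[q]) for q in sorted(outcomes))


def measurement_order_bits(measured: Sequence[int], outcomes: Dict[int, int]) -> str:
    """Bitstring ordered by the sequence in which the kernel measured qubits."""
    return "".join(str(outcomes[q]) for q in measured)


def is_allocation_order_compatible(measured: Sequence[int], num_qubits: int) -> bool:
    """Toy check: measuring qubits 0..n-1 in order gives the same bitstring
    under both conventions (assumption: all measurements are terminal)."""
    return list(measured) == list(range(num_qubits))


def sample_bits(measured: Sequence[int], outcomes: Dict[int, int]) -> str:
    """No measurements: allocation order. Otherwise: measurement order."""
    if not measured:
        return allocation_order_bits(outcomes)
    return measurement_order_bits(measured, outcomes)


# One hypothetical shot on three qubits: q0 = 1, q1 = 0, q2 = 0.
shot = {0: 1, 1: 0, 2: 0}
print(sample_bits([], shot))         # kernel without measurements
print(sample_bits([2, 0, 1], shot))  # kernel measuring q2, then q0, then q1
print(is_allocation_order_compatible([0, 1, 2], 3))
```

In this model a kernel that measures `q2, q0, q1` yields a different bitstring than allocation-order sampling of the same shot, which is the case where a legacy allocation-order request would change the returned outcomes.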

Note: explicit_measurements is now deprecated as a user-facing semantic switch. Users should rely on the default sample behavior: kernels with measurements return results in measurement order when that affects the bitstring, while kernels without measurements use implicit final allocation-order sampling. The flag remains mainly as a compatibility/backend capability mechanism: explicit_measurements=False requests legacy allocation-order behavior and is only accepted when it does not change the returned outcomes, while targets that cannot support measurement-order sampling can reject kernels that require it.
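
One way to read the note above is as a three-state decision on the flag. The sketch below is an illustration in plain Python under stated assumptions, not the code in this PR: `resolve_explicit_measurements`, `requires_measurement_order`, and `target_supports_measurement_order` are hypothetical names, and the exception types are chosen only for the example.

```python
from typing import Optional


def resolve_explicit_measurements(
    requested: Optional[bool],
    requires_measurement_order: bool,
    target_supports_measurement_order: bool = True,
) -> bool:
    """Return True when the run should use measurement-order sampling.

    requested=None  -> auto mode: follow what the kernel requires.
    requested=False -> legacy allocation order, accepted only if harmless.
    requested=True  -> measurement order explicitly requested.
    """
    # A legacy request is rejected when it would change the bitstrings.
    if requested is False and requires_measurement_order:
        raise ValueError(
            "explicit_measurements=False would change the returned bitstrings"
        )
    use_measurement_order = (
        requires_measurement_order if requested is None else requested
    )
    # A target without measurement-order support may reject such runs.
    if use_measurement_order and not target_supports_measurement_order:
        raise RuntimeError("target cannot sample in measurement order")
    return use_measurement_order


print(resolve_explicit_measurements(None, requires_measurement_order=True))
print(resolve_explicit_measurements(None, requires_measurement_order=False))
print(resolve_explicit_measurements(False, requires_measurement_order=False))
```

The point of the sketch is that `None` never raises for a supported target, while `False` is only a no-op confirmation of allocation order and never a way to override the kernel's measurement order.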

anpaz added 2 commits April 29, 2026 16:43
Signed-off-by: Andres Paz <andresp@nvidia.com>
Signed-off-by: Andres Paz <andresp@nvidia.com>
@anpaz anpaz marked this pull request as draft April 30, 2026 04:07
@anpaz anpaz requested a review from khalatepradnya April 30, 2026 16:10
github-actions Bot pushed a commit that referenced this pull request Apr 30, 2026
@github-actions

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

Collaborator

@schweitzpgi schweitzpgi left a comment

In general, the compiler makes no guarantee to maintain qubits or maintain their relative order. The correct approach is to tag measurement operations with identifiers (StringAttr) and post-process them as needed.
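
A rough sketch of what tag-based post-processing could look like is shown below. It is plain Python and purely illustrative: the tag names (`m_a`, `m_b`, `m_c`), the per-shot record layout, and the helper names are assumptions made for the example, not an existing CUDA-Q or MLIR API.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def assemble_bitstring(record: Dict[str, int], order: Sequence[str]) -> str:
    """Build one bitstring by looking up tagged results in the desired order."""
    return "".join(str(record[tag]) for tag in order)


def histogram(records: Iterable[Dict[str, int]], order: Sequence[str]) -> Counter:
    """Count bitstrings over all shots, independent of qubit placement."""
    return Counter(assemble_bitstring(r, order) for r in records)


# Hypothetical per-shot records keyed by measurement tag. The order of the
# keys does not matter, so reordering qubits in the compiler is harmless.
shots = [
    {"m_c": 0, "m_a": 1, "m_b": 1},
    {"m_a": 1, "m_b": 1, "m_c": 0},
    {"m_b": 0, "m_c": 1, "m_a": 0},
]

print(histogram(shots, order=["m_a", "m_b", "m_c"]))
```

Because each result is addressed by its tag, the caller picks the output order after the fact, and no pass has to preserve qubit identity or relative position.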

/// useful after transformations such as measurement expansion, loop unrolling,
/// and allocation combining, which can expose a more precise measurement shape
/// than was available earlier in the pipeline.
void addQuakeMetadataRefresh(mlir::OpPassManager &pm);
Collaborator

We already have several "metadata" passes. Do we need yet another one?

pm.addNestedPass<func::FuncOp>(cudaq::opt::createQuakeAddDeallocs());
pm.addNestedPass<func::FuncOp>(cudaq::opt::createQuakeAddMetadata());
pm.addPass(cudaq::opt::createQuakePropagateMetadata());
cudaq::opt::addQuakeMetadataRefresh(pm);
Collaborator

It appears "no" is the answer to my question.

* the terms of the Apache License 2.0 which accompanies this distribution. *
******************************************************************************/

#pragma once
Collaborator

This appears to be in the wrong place.

Is it a pure analysis? It appears to be. Why is it entirely implemented in a header file? Why not provide an API? It looks to be used from both the runtime and the new "refresh metadata" pass. Why?

@anpaz anpaz marked this pull request as ready for review May 1, 2026 03:52
Collaborator

@khalatepradnya khalatepradnya left a comment

Thanks for working on this. I see that the default sample semantics have been updated so measurement order is preserved automatically when needed, and that seems like the right user-facing direction.

One thing I am still not fully understanding: does this PR address the original performance concern in #4153? My reading is that kernels requiring explicit measurement-order semantics will still use explicitMeasurements internally, and for non-Stim local simulators that do not support buffered explicit sampling, that path still appears to execute one shot at a time. So the default result would now be semantically correct, but the explicit-measurements performance issue may still remain for targets like nvidia / other non-Stim local simulators.

Am I missing something?
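
To make the cost difference concrete, here is a toy comparison of the two execution styles. It is a plain-Python sketch, not CUDA-Q code: `run_circuit_once`, `sample_per_shot`, and `sample_batched` are hypothetical stand-ins, and the two-outcome distribution is an assumed example of a circuit whose final distribution can be computed once.

```python
import random
from collections import Counter
from typing import Tuple


def run_circuit_once(rng: random.Random) -> str:
    """Stand-in for one full simulation that yields a single bitstring."""
    bit = rng.randint(0, 1)
    return f"{bit}{bit}"  # two perfectly correlated bits


def sample_per_shot(shots: int, rng: random.Random) -> Tuple[Counter, int]:
    """One simulation per shot: the cost grows linearly with the shot count."""
    counts: Counter = Counter()
    simulations = 0
    for _ in range(shots):
        counts[run_circuit_once(rng)] += 1
        simulations += 1
    return counts, simulations


def sample_batched(shots: int, rng: random.Random) -> Tuple[Counter, int]:
    """Simulate once, then draw all shots from the final distribution."""
    distribution = {"00": 0.5, "11": 0.5}  # assumed result of one simulation
    simulations = 1
    drawn = rng.choices(
        list(distribution), weights=list(distribution.values()), k=shots
    )
    return Counter(drawn), simulations


rng = random.Random(0)
_, per_shot_cost = sample_per_shot(1000, rng)
_, batched_cost = sample_batched(1000, rng)
print(per_shot_cost, batched_cost)
```

Both functions return the same kind of histogram, but the per-shot path repeats the expensive step once per shot, which is the behavior the question above is asking about.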

@khalatepradnya khalatepradnya added the "breaking change" label (Change breaks backwards compatibility) May 1, 2026
@github-actions

github-actions Bot commented May 5, 2026

CI Summary — ❌ failed

Run #25346790486 · trigger push · ✅ 5 · ⏩ 7 · ❌ 1 · ⛔ 0

❌ Failed or cancelled
Job Result Link
build_and_test ❌ failure view
Top-level jobs (13)
Job Result
binaries ⏩ skipped
build_and_test ❌ failure
config_devdeps ✅ success
config_source_build ⏩ skipped
config_wheeldeps ✅ success
devdeps ✅ success
docker_image ⏩ skipped
gen_code_coverage ⏩ skipped
metadata ✅ success
python_metapackages ⏩ skipped
python_wheels ⏩ skipped
source_build ⏩ skipped
wheeldeps ✅ success
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
Job
binaries
config_source_build
docker_image
gen_code_coverage
python_metapackages
python_wheels
source_build
All sub-jobs (50) — every matrix leg, with links
Job Status Link
Build and test (amd64, clang16, openmpi) / Dev environment (Debug) ❌ failure view
Build and test (amd64, clang16, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc11, openmpi) / Dev environment (Debug) ❌ failure view
Build and test (amd64, gcc11, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ❌ failure view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ✅ success view
Build and test (arm64, clang16, openmpi) / Dev environment (Debug) ❌ failure view
Build and test (arm64, clang16, openmpi) / Dev environment (Python) ✅ success view
CI Summary ❔ in_progress view
Configure build (devdeps) ✅ success view
Configure build (source_build) ⏩ skipped view
Configure build (wheeldeps) ✅ success view
Create CUDA Quantum installer ⏩ skipped view
Create Docker images ⏩ skipped view
Create Python metapackages ⏩ skipped view
Create Python wheels ⏩ skipped view
Gen code coverage ⏩ skipped view
Load dependencies (amd64, clang16) / Caching ✅ success view
Load dependencies (amd64, clang16) / Finalize ✅ success view
Load dependencies (amd64, clang16) / Metadata ✅ success view
Load dependencies (amd64, gcc11) / Caching ✅ success view
Load dependencies (amd64, gcc11) / Finalize ✅ success view
Load dependencies (amd64, gcc11) / Metadata ✅ success view
Load dependencies (amd64, gcc12) / Caching ✅ success view
Load dependencies (amd64, gcc12) / Finalize ✅ success view
Load dependencies (amd64, gcc12) / Metadata ✅ success view
Load dependencies (arm64, clang16) / Caching ✅ success view
Load dependencies (arm64, clang16) / Finalize ✅ success view
Load dependencies (arm64, clang16) / Metadata ✅ success view
Load dependencies (arm64, gcc11) / Caching ✅ success view
Load dependencies (arm64, gcc11) / Finalize ✅ success view
Load dependencies (arm64, gcc11) / Metadata ✅ success view
Load dependencies (arm64, gcc12) / Caching ✅ success view
Load dependencies (arm64, gcc12) / Finalize ✅ success view
Load dependencies (arm64, gcc12) / Metadata ✅ success view
Load source build cache ⏩ skipped view
Load wheel dependencies (amd64, 12.6) / Caching ✅ success view
Load wheel dependencies (amd64, 12.6) / Finalize ✅ success view
Load wheel dependencies (amd64, 12.6) / Metadata ✅ success view
Load wheel dependencies (amd64, 13.0) / Caching ✅ success view
Load wheel dependencies (amd64, 13.0) / Finalize ✅ success view
Load wheel dependencies (amd64, 13.0) / Metadata ✅ success view
Load wheel dependencies (arm64, 12.6) / Caching ✅ success view
Load wheel dependencies (arm64, 12.6) / Finalize ✅ success view
Load wheel dependencies (arm64, 12.6) / Metadata ✅ success view
Load wheel dependencies (arm64, 13.0) / Caching ✅ success view
Load wheel dependencies (arm64, 13.0) / Finalize ✅ success view
Load wheel dependencies (arm64, 13.0) / Metadata ✅ success view
Prepare cache clean-up ✅ success view
Retrieve PR info ✅ success view
⚠️ Required checks (4/8) — 4 missing — declared in .github/required-checks.yml for push
Required check Status Link
Build and test (amd64, clang16, openmpi) / Dev environment (Debug) ❌ failure view
Build and test (amd64, clang16, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc11, openmpi) / Dev environment (Debug) ❌ failure view
Build and test (amd64, gcc11, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ❌ failure view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ✅ success view
Build and test (arm64, clang16, openmpi) / Dev environment (Debug) ❌ failure view
Build and test (arm64, clang16, openmpi) / Dev environment (Python) ✅ success view

Labels

breaking change: Change breaks backwards compatibility

Development

Successfully merging this pull request may close these issues.

Explicit Measurements QPU Compatability

3 participants