
Additional tracing and resource estimation target improvements#4432

Open
taalexander wants to merge 8 commits into NVIDIA:main from taalexander:feature/benchmarking-improvements

@taalexander
Collaborator

Follow-ups as noted by @schweitzpgi and @boschmitt.

Signed-off-by: Thomas Alexander <talexander@nvidia.com>
taalexander and others added 6 commits May 1, 2026 10:47
@github-actions

github-actions Bot commented May 1, 2026

CI Summary — ✅ passed

Run #25229541107 · trigger push · ✅ 6 · ⏩ 7 · ❌ 0 · ⛔ 0

Top-level jobs (13)

| Job | Result |
|---|---|
| binaries | ⏩ skipped |
| build_and_test | ✅ success |
| config_devdeps | ✅ success |
| config_source_build | ⏩ skipped |
| config_wheeldeps | ✅ success |
| devdeps | ✅ success |
| docker_image | ⏩ skipped |
| gen_code_coverage | ⏩ skipped |
| metadata | ✅ success |
| python_metapackages | ⏩ skipped |
| python_wheels | ⏩ skipped |
| source_build | ⏩ skipped |
| wheeldeps | ✅ success |
⏩ Skipped jobs (7): intentionally skipped on PR builds; run on merge_group / workflow_dispatch

| Job |
|---|
| binaries |
| config_source_build |
| docker_image |
| gen_code_coverage |
| python_metapackages |
| python_wheels |
| source_build |
All sub-jobs (50): every matrix leg

| Job | Status |
|---|---|
| Build and test (amd64, clang16, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, clang16, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc11, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, gcc11, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (arm64, clang16, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (arm64, clang16, openmpi) / Dev environment (Python) | ✅ success |
| CI Summary | ❔ in_progress |
| Configure build (devdeps) | ✅ success |
| Configure build (source_build) | ⏩ skipped |
| Configure build (wheeldeps) | ✅ success |
| Create CUDA Quantum installer | ⏩ skipped |
| Create Docker images | ⏩ skipped |
| Create Python metapackages | ⏩ skipped |
| Create Python wheels | ⏩ skipped |
| Gen code coverage | ⏩ skipped |
| Load dependencies (amd64, clang16) / Caching | ✅ success |
| Load dependencies (amd64, clang16) / Finalize | ✅ success |
| Load dependencies (amd64, clang16) / Metadata | ✅ success |
| Load dependencies (amd64, gcc11) / Caching | ✅ success |
| Load dependencies (amd64, gcc11) / Finalize | ✅ success |
| Load dependencies (amd64, gcc11) / Metadata | ✅ success |
| Load dependencies (amd64, gcc12) / Caching | ✅ success |
| Load dependencies (amd64, gcc12) / Finalize | ✅ success |
| Load dependencies (amd64, gcc12) / Metadata | ✅ success |
| Load dependencies (arm64, clang16) / Caching | ✅ success |
| Load dependencies (arm64, clang16) / Finalize | ✅ success |
| Load dependencies (arm64, clang16) / Metadata | ✅ success |
| Load dependencies (arm64, gcc11) / Caching | ✅ success |
| Load dependencies (arm64, gcc11) / Finalize | ✅ success |
| Load dependencies (arm64, gcc11) / Metadata | ✅ success |
| Load dependencies (arm64, gcc12) / Caching | ✅ success |
| Load dependencies (arm64, gcc12) / Finalize | ✅ success |
| Load dependencies (arm64, gcc12) / Metadata | ✅ success |
| Load source build cache | ⏩ skipped |
| Load wheel dependencies (amd64, 12.6) / Caching | ✅ success |
| Load wheel dependencies (amd64, 12.6) / Finalize | ✅ success |
| Load wheel dependencies (amd64, 12.6) / Metadata | ✅ success |
| Load wheel dependencies (amd64, 13.0) / Caching | ✅ success |
| Load wheel dependencies (amd64, 13.0) / Finalize | ✅ success |
| Load wheel dependencies (amd64, 13.0) / Metadata | ✅ success |
| Load wheel dependencies (arm64, 12.6) / Caching | ✅ success |
| Load wheel dependencies (arm64, 12.6) / Finalize | ✅ success |
| Load wheel dependencies (arm64, 12.6) / Metadata | ✅ success |
| Load wheel dependencies (arm64, 13.0) / Caching | ✅ success |
| Load wheel dependencies (arm64, 13.0) / Finalize | ✅ success |
| Load wheel dependencies (arm64, 13.0) / Metadata | ✅ success |
| Prepare cache clean-up | ❔ in_progress |
| Retrieve PR info | ✅ success |
✅ Required checks (8/8): declared in .github/required-checks.yml for push

| Required check | Status |
|---|---|
| Build and test (amd64, clang16, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, clang16, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc11, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, gcc11, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (arm64, clang16, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (arm64, clang16, openmpi) / Dev environment (Python) | ✅ success |

jit-low-level-pipeline: "func.func(apply-control-negations,canonicalize,cse),symbol-dce"
# Target high-level stage: materialize registered custom-op matrices and
# synthesize them to gates before target lowering.
jit-high-level-pipeline: "get-concrete-matrix,unitary-synthesis"
Collaborator


Before this pipeline runs, we always run

  pm.addNestedPass<func::FuncOp>(cudaq::opt::createQuakeAddDeallocs());
  pm.addNestedPass<func::FuncOp>(cudaq::opt::createQuakeAddMetadata());
  pm.addPass(cudaq::opt::createQuakePropagateMetadata());
  pm.addNestedPass<func::FuncOp>(cudaq::opt::createUnwindLowering());
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  pm.addNestedPass<func::FuncOp>(cudaq::opt::createClassicalMemToReg());
  cudaq::opt::createClassicalOptimizationPipeline(pm, std::nullopt,
                                                  {options.allowEarlyExit});
  pm.addPass(cudaq::opt::createGlobalizeArrayValues());
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  pm.addPass(cudaq::opt::createUnitarySynthesis());
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  pm.addPass(cudaq::opt::createApplySpecialization(
      {.constantPropagation = options.applyConstProp}));
  cudaq::opt::addAggressiveInlining(pm);

as well as some other passes.

Collaborator


So unitary-synthesis is already part of that pipeline, and GetConcreteMatrix really only matters if you're using user-defined custom ops.

# Target mid-level stage: inline synthesized helper calls, decompose to a CX
# routing basis, prepare wire metadata, optionally route, then lower routed
# two-qubit work to CZ. With no device argument, routing is bypassed.
jit-mid-level-pipeline: "apply-op-specialization,aggressive-inlining,decomposition{basis=h,rx,ry,rz,x,x(1)},func.func(add-dealloc,combine-quantum-alloc,canonicalize,factor-quantum-alloc,memtoreg),add-wireset,func.func(assign-wire-indices),qubit-mapping{device=%DEVICE:bypass%},func.func(delay-measurements,regtomem),decomposition{basis=h,rx,ry,rz,x,z(1)}"
Collaborator


Before this pipeline, we'll run

  cudaq::opt::createClassicalOptimizationPipeline(pm);
  cudaq::opt::addDecomposition(pm, {std::string("U3ToRotations")});
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  pm.addNestedPass<func::FuncOp>(cudaq::opt::createMultiControlDecomposition());

The mid-level pipeline is where we would normally put custom (per-target) gate-set mappings, routing, etc., so this is good. Do you really want delay-measurements, though? That's sort of a weird hack pass for the IQM target.
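The mid-level string packs nine top-level stages into one line, which makes ordering questions such as where `delay-measurements` runs hard to read off directly. As a review aid, here is a small standalone C++17 sketch that splits a textual pass pipeline at top-level commas, ignoring commas inside `{...}` option lists and `(...)` nested pipelines. It is plain string handling, not MLIR's pipeline parser; `splitTopLevel` and `kMidLevelPipeline` are hypothetical names introduced for this illustration, and the helper assumes balanced brackets.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The jit-mid-level-pipeline string from the target config under review,
// split across adjacent literals only for readability.
const std::string kMidLevelPipeline =
    "apply-op-specialization,aggressive-inlining,"
    "decomposition{basis=h,rx,ry,rz,x,x(1)},"
    "func.func(add-dealloc,combine-quantum-alloc,canonicalize,"
    "factor-quantum-alloc,memtoreg),"
    "add-wireset,func.func(assign-wire-indices),"
    "qubit-mapping{device=%DEVICE:bypass%},"
    "func.func(delay-measurements,regtomem),"
    "decomposition{basis=h,rx,ry,rz,x,z(1)}";

// Hypothetical review helper: split a pipeline string into its top-level
// stages. Commas nested inside `{...}` or `(...)` are kept in the stage.
std::vector<std::string> splitTopLevel(const std::string &pipeline) {
  std::vector<std::string> stages;
  std::string current;
  int depth = 0;  // nesting level across both bracket kinds
  for (char c : pipeline) {
    if (c == '{' || c == '(') {
      ++depth;
    } else if (c == '}' || c == ')') {
      --depth;
      assert(depth >= 0 && "unbalanced closing bracket");
    }
    if (c == ',' && depth == 0) {
      stages.push_back(current);  // a top-level comma ends the stage
      current.clear();
    } else {
      current += c;
    }
  }
  if (!current.empty())
    stages.push_back(current);
  return stages;
}
```

Calling `splitTopLevel(kMidLevelPipeline)` yields nine stages, with `qubit-mapping{device=%DEVICE:bypass%}` seventh and `func.func(delay-measurements,regtomem)` eighth, which makes it easy to see that `delay-measurements` runs on each function after the routing stage.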

Collaborator


Here, apply-op-specialization and inlining should already have been done in the high-level pipeline. If we're running them again here and they actually transform the code in some way, it would be good to have an example so we can understand what's wrong.

# Target low-level cleanup stage: remove wire-set symbols left by routing prep.
jit-low-level-pipeline: "symbol-dce"
Collaborator


And before this pipeline string, we run

  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  pm.addNestedPass<func::FuncOp>(createCSEPass());
  pm.addPass(createSymbolDCEPass());

So, here we're re-running symbol-dce.
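This kind of overlap can be checked mechanically. The C++17 sketch below uses hypothetical helpers (not CUDA-Q or MLIR code): it flattens a pipeline string to its leaf pass names, dropping `{...}` option lists and descending into nested pipelines such as `func.func(...)`, and reports the names that also appear in the fixed preamble quoted above, written here with the standard MLIR pass arguments `canonicalize`, `cse`, and `symbol-dce`. It compares names only, so it deliberately ignores that the preamble runs `canonicalize` and `cse` nested on `func.func`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Split at commas not nested inside `{...}` or `(...)`; assumes balanced brackets.
std::vector<std::string> splitTopLevel(const std::string &pipeline) {
  std::vector<std::string> stages;
  std::string current;
  int depth = 0;
  for (char c : pipeline) {
    if (c == '{' || c == '(')
      ++depth;
    else if (c == '}' || c == ')')
      --depth;
    if (c == ',' && depth == 0) {
      stages.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  if (!current.empty())
    stages.push_back(current);
  assert(depth == 0 && "unbalanced brackets");
  return stages;
}

// Append leaf pass names: strip `{...}` option lists and recurse into
// nested pipelines written as `anchor(...)`, e.g. `func.func(...)`.
void collectLeafPasses(const std::string &pipeline,
                       std::vector<std::string> &out) {
  for (const std::string &stage : splitTopLevel(pipeline)) {
    const auto open = stage.find_first_of("({");
    if (open == std::string::npos)
      out.push_back(stage);  // bare pass name
    else if (stage[open] == '(')
      collectLeafPasses(stage.substr(open + 1, stage.size() - open - 2), out);
    else
      out.push_back(stage.substr(0, open));  // pass name without its options
  }
}

// Return, in pipeline order, the target-specified passes whose names already
// appear in the fixed preamble. Op anchoring is ignored.
std::vector<std::string> alreadyRun(const std::vector<std::string> &preamble,
                                    const std::string &targetPipeline) {
  const std::set<std::string> seen(preamble.begin(), preamble.end());
  std::vector<std::string> leaves;
  collectLeafPasses(targetPipeline, leaves);
  std::vector<std::string> repeated;
  for (const std::string &name : leaves)
    if (seen.count(name) != 0)
      repeated.push_back(name);
  return repeated;
}
```

With the preamble `{"canonicalize", "cse", "symbol-dce"}`, the `func.func(apply-control-negations,canonicalize,cse),symbol-dce` string flags `canonicalize`, `cse`, and `symbol-dce`, while the trimmed `symbol-dce` string flags only `symbol-dce`.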

Unfortunately, there is a caveat: instead of simply applying the pipelines as constructed, the launch code can mutate the string-specified pipelines in ad hoc ways, which is a bad idea.

Collaborator


There is another user-defined pipeline, called post-codegen-passes, that can be specified to run after this low-level one. I think we can ignore that one and just use the low-level pipeline for our purposes here.


2 participants