Additional tracing and resource estimation target improvements #4432
taalexander wants to merge 8 commits into NVIDIA:main from
Conversation
Signed-off-by: Thomas Alexander <talexander@nvidia.com>
CI Summary — ✅ passed (Run #25229541107)
✅ Required checks (8/8) — declared in
| Required check | Status |
|---|---|
| Build and test (amd64, clang16, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, clang16, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc11, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, gcc11, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (arm64, clang16, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (arm64, clang16, openmpi) / Dev environment (Python) | ✅ success |
```yaml
jit-low-level-pipeline: "func.func(apply-control-negations,canonicalize,cse),symbol-dce"
# Target high-level stage: materialize registered custom-op matrices and
# synthesize them to gates before target lowering.
jit-high-level-pipeline: "get-concrete-matrix,unitary-synthesis"
```
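These `jit-*-pipeline` values use MLIR's textual pass-pipeline syntax: a comma-separated list of entries, where an entry like `func.func(...)` nests its passes under every function. As a rough illustration of that structure (a plain C++ sketch, not MLIR's actual parser), top-level entries can be separated by tracking bracket depth so nested commas are not split on:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a textual pass pipeline into its top-level entries, ignoring
// commas nested inside (...) or {...} groups. For example,
//   "func.func(apply-control-negations,canonicalize,cse),symbol-dce"
// decomposes into exactly two top-level entries.
std::vector<std::string> splitTopLevel(const std::string &pipeline) {
  std::vector<std::string> entries;
  std::string current;
  int depth = 0;
  for (char c : pipeline) {
    if (c == '(' || c == '{')
      ++depth;
    else if (c == ')' || c == '}')
      --depth;
    if (c == ',' && depth == 0) {
      entries.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  if (!current.empty())
    entries.push_back(current);
  return entries;
}
```

This is only to show how the strings above are shaped; the real parsing is done by MLIR when the pipeline string is handed to the pass manager.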
Before this pipeline is run, we should always run

```cpp
pm.addNestedPass<func::FuncOp>(cudaq::opt::createQuakeAddDeallocs());
pm.addNestedPass<func::FuncOp>(cudaq::opt::createQuakeAddMetadata());
pm.addPass(cudaq::opt::createQuakePropagateMetadata());
pm.addNestedPass<func::FuncOp>(cudaq::opt::createUnwindLowering());
pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
pm.addNestedPass<func::FuncOp>(cudaq::opt::createClassicalMemToReg());
cudaq::opt::createClassicalOptimizationPipeline(pm, std::nullopt,
                                                {options.allowEarlyExit});
pm.addPass(cudaq::opt::createGlobalizeArrayValues());
pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
pm.addPass(cudaq::opt::createUnitarySynthesis());
pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
pm.addPass(cudaq::opt::createApplySpecialization(
    {.constantPropagation = options.applyConstProp}));
cudaq::opt::addAggressiveInlining(pm);
```

as well as some other passes.
So unitary-synthesis is already part of that pipeline, and GetConcreteMatrix really only matters if you're using user-defined custom ops.
```yaml
# Target mid-level stage: inline synthesized helper calls, decompose to a CX
# routing basis, prepare wire metadata, optionally route, then lower routed
# two-qubit work to CZ. With no device argument, routing is bypassed.
jit-mid-level-pipeline: "apply-op-specialization,aggressive-inlining,decomposition{basis=h,rx,ry,rz,x,x(1)},func.func(add-dealloc,combine-quantum-alloc,canonicalize,factor-quantum-alloc,memtoreg),add-wireset,func.func(assign-wire-indices),qubit-mapping{device=%DEVICE:bypass%},func.func(delay-measurements,regtomem),decomposition{basis=h,rx,ry,rz,x,z(1)}"
```
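Two bits of syntax in this string are worth noting: braces attach options to a pass (e.g. `decomposition{basis=h,rx,ry,rz,x,x(1)}`), and `%DEVICE:bypass%` appears to be a placeholder that the target machinery substitutes, falling back to `bypass` (no routing) when no device argument is supplied. A hypothetical sketch of that substitution, purely for illustration (this is not the actual CUDA-Q substitution code, and the `%KEY:default%` semantics are an assumption from the comment above):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: expand %KEY:default% placeholders in a pipeline
// string. %DEVICE:bypass% becomes the supplied device argument if one was
// given, or "bypass" otherwise. Illustration only, not CUDA-Q's code.
std::string expandPlaceholders(std::string pipeline,
                               const std::map<std::string, std::string> &args) {
  size_t pos;
  while ((pos = pipeline.find('%')) != std::string::npos) {
    size_t end = pipeline.find('%', pos + 1);
    if (end == std::string::npos)
      break; // unmatched '%': leave the rest untouched
    std::string body = pipeline.substr(pos + 1, end - pos - 1);
    size_t colon = body.find(':');
    std::string key = body.substr(0, colon);
    std::string fallback =
        colon == std::string::npos ? "" : body.substr(colon + 1);
    auto it = args.find(key);
    pipeline.replace(pos, end - pos + 1,
                     it != args.end() ? it->second : fallback);
  }
  return pipeline;
}
```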
Before this pipeline, we'll run

```cpp
cudaq::opt::createClassicalOptimizationPipeline(pm);
cudaq::opt::addDecomposition(pm, {std::string("U3ToRotations")});
pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
pm.addNestedPass<func::FuncOp>(cudaq::opt::createMultiControlDecomposition());
```

The mid-level pipeline is where we normally would put custom (per-target) gate-set mappings, routings, etc. So this is good. Do you really want delay-measurements? That's sort of a weird hack pass for the IQM target.
Here, apply op specialization and inlining should have already been done in the high-level pipeline. If we're doing them here again and they are actually transforming code in some way, it would be good to have an example and understand what's wrong.
```yaml
# Target low-level cleanup stage: remove wire-set symbols left by routing prep.
jit-low-level-pipeline: "symbol-dce"
```
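For context, symbol-dce deletes symbols (functions, wire sets, etc.) that are no longer referenced from any live symbol, and it must iterate: deleting one dead symbol can remove the last reference to another. A minimal sketch of that mark-and-sweep idea in plain C++ (not MLIR's actual SymbolDCE implementation; names like `wireset_0` are made up):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Sketch of dead-symbol elimination: mark everything reachable from the
// externally visible roots, then report the rest as dead (deletable).
std::set<std::string>
symbolDCE(std::map<std::string, std::vector<std::string>> refs, // symbol -> referenced symbols
          const std::set<std::string> &roots) {                 // externally visible symbols
  std::set<std::string> live;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string sym = worklist.back();
    worklist.pop_back();
    if (!live.insert(sym).second)
      continue; // already marked live
    for (const auto &target : refs[sym])
      worklist.push_back(target);
  }
  std::set<std::string> dead;
  for (const auto &entry : refs)
    if (!live.count(entry.first))
      dead.insert(entry.first);
  return dead;
}
```

In the pipeline above, the wire-set symbols introduced by add-wireset become exactly this kind of unreferenced leftover once routing is done, which is why symbol-dce runs at the end.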
And before this pipeline string

```cpp
pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
pm.addNestedPass<func::FuncOp>(createCSEPass());
pm.addPass(createSymbolDCEPass());
```

So, here we're re-running symbol-dce.
Unfortunately, there is a caveat: instead of simply applying the pipeline as constructed, the string-based pipelines can be mutated by the launch code in ad hoc ways, which is a bad idea.
There is another user-defined pipeline that can be specified to run after this low-level one called post-codegen-passes. I think we can ignore that one and just use low-level for our purposes here.
Follow-ups as noted by @schweitzpgi and @boschmitt.