refactor(bench): extract pegainfer-bench — shared kernel-report harness + rollup #192
Merged
…ainfer-bench

Every model crate's kernel-report tooling re-implemented the same CUDA-event timing loop, LatencyStats, KernelCall accessors, and device-alloc helpers. Lift them into a new `pegainfer-bench` crate and migrate the two existing consumers.

- kimi-k2 `kernel_report.rs`: drop the local harness, re-export the shared types so the report bins are untouched (803 -> 663 lines). Non-optional dep because the harness is an always-compiled lib module.
- qwen3-4b `qwen3_model_report.rs`: adopt the shared LatencyStats/accessors; keep its cold-L2 `measure_loop` and attr-filtered `bench_key` local because their measurement semantics deliberately differ from kimi's warm-cache / raw-key versions (1079 -> 981 lines). Optional dep gated on `kernel-report` since the harness lives in the report bin.

Net -65 source lines; the real win is one canonical harness instead of two copies that had already silently diverged (warm vs cold-L2 timing).

Verified: pegainfer-bench checks model-agnostic (Kimi kernels disabled); kimi + qwen3 report bins compile under `kernel-report`; kimi 18 / qwen3 17 host unit tests pass; no-feature qwen3 build does not pull pegainfer-bench.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
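The warm vs cold-L2 divergence described in this commit can be illustrated with a minimal host-only sketch. It is not the code from `pegainfer-bench`: it swaps CUDA events for `std::time::Instant` so it runs without a GPU, and the `LatencyStats` fields, the `from_samples` constructor, and the `before_each` hook are all assumptions made for illustration.

```rust
use std::time::Instant;

/// Summary statistics over per-iteration latencies, in microseconds.
/// Field names are hypothetical; the real `LatencyStats` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct LatencyStats {
    pub iters: usize,
    pub min_us: f64,
    pub mean_us: f64,
    pub p50_us: f64,
    pub max_us: f64,
}

impl LatencyStats {
    /// Builds stats from raw samples; returns `None` for an empty slice.
    pub fn from_samples(samples: &[f64]) -> Option<Self> {
        if samples.is_empty() {
            return None;
        }
        let mut sorted = samples.to_vec();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let n = sorted.len();
        let mean = sorted.iter().sum::<f64>() / n as f64;
        // Lower median: element at index (n - 1) / 2 of the sorted samples.
        let p50 = sorted[(n - 1) / 2];
        Some(Self {
            iters: n,
            min_us: sorted[0],
            mean_us: mean,
            p50_us: p50,
            max_us: sorted[n - 1],
        })
    }
}

/// Times `kernel` for `iters` iterations after `warmup` untimed runs.
/// `before_each` runs outside the timed region: pass a no-op for warm-cache
/// timing, or a cache-evicting closure to approximate cold-L2 timing.
pub fn measure_loop<K, B>(
    warmup: usize,
    iters: usize,
    mut before_each: B,
    mut kernel: K,
) -> Option<LatencyStats>
where
    K: FnMut(),
    B: FnMut(),
{
    for _ in 0..warmup {
        kernel();
    }
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        before_each();
        // Host wall-clock stand-in for a CUDA event pair around the launch.
        let start = Instant::now();
        kernel();
        samples.push(start.elapsed().as_secs_f64() * 1e6);
    }
    LatencyStats::from_samples(&samples)
}
```

Under these assumptions, a warm-cache caller would pass `|| {}` as `before_each`, while a cold-L2 caller would pass a closure that overwrites a scratch buffer larger than the cache. Keeping that hook outside the timed region is what lets one loop serve both measurement semantics.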
The two model_report bins each carried byte-identical rollup leaf types and math (RollupRow, CallSiteRow, the per-call accumulator, and the row projection) — only their schedule walks genuinely diverge (qwen3's no-op all-reduce + bail-on-missing vs kimi's missing-provider accounting). Lift the shared pieces into pegainfer-bench so the report-row JSON schema has one source of truth and can't drift between models; each bin keeps its own loop. Field order is preserved, so the emitted JSON is unchanged. Net -97 source lines. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
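As a rough picture of the split this commit describes (shared accumulator and row projection, per-model schedule walk), here is a minimal sketch. The struct fields, the `RollupAccumulator` name, and the share-of-total column are assumptions for illustration; the real `RollupRow` and `CallSiteRow` schemas are not shown in this PR text.

```rust
use std::collections::BTreeMap;

/// One measured call site in a model's schedule (hypothetical layout).
#[derive(Debug, Clone)]
pub struct CallSite {
    pub kernel: String,
    pub calls_per_step: u64,
    pub mean_us: f64,
}

/// One aggregated report row per kernel (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
pub struct RollupRow {
    pub kernel: String,
    pub calls: u64,
    pub total_us: f64,
    pub share_pct: f64,
}

/// Shared per-call accumulator: each model's schedule walk feeds it call
/// sites, and the projection to rows is identical for every model.
#[derive(Default)]
pub struct RollupAccumulator {
    per_kernel: BTreeMap<String, (u64, f64)>,
    total_us: f64,
}

impl RollupAccumulator {
    /// Adds one call site, weighting its mean latency by its call count.
    pub fn add(&mut self, site: &CallSite) {
        let cost = site.calls_per_step as f64 * site.mean_us;
        let entry = self
            .per_kernel
            .entry(site.kernel.clone())
            .or_insert((0, 0.0));
        entry.0 += site.calls_per_step;
        entry.1 += cost;
        self.total_us += cost;
    }

    /// Projects the accumulated totals into rows, sorted by kernel name.
    pub fn rows(&self) -> Vec<RollupRow> {
        self.per_kernel
            .iter()
            .map(|(kernel, &(calls, total_us))| RollupRow {
                kernel: kernel.clone(),
                calls,
                total_us,
                share_pct: if self.total_us > 0.0 {
                    100.0 * total_us / self.total_us
                } else {
                    0.0
                },
            })
            .collect()
    }
}
```

In this sketch the model-specific code is only the loop that decides which `CallSite` values to pass to `add` (for example, skipping a no-op all-reduce or counting missing providers); the row layout lives in one place, which is the property the commit is after.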
Close the long-standing follow-up ("split common report logic into a
reusable bench-core crate") and document why only the model-agnostic
harness + rollup were shared while the attention-specific regression
framework deliberately stays in qwen3_kernel_report.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
Code Review
This pull request extracts model-agnostic kernel benchmarking and reporting utilities into a new shared crate, pegainfer-bench. Common components such as the CUDA-event timing loop, latency statistics, tensor accessors, and report rollup structures are removed from the pegainfer-kimi-k2 and pegainfer-qwen3-4b crates and replaced with references to the new shared library, significantly reducing code duplication. Documentation has also been updated to reflect these architectural changes. No review comments were provided, and I have no additional feedback to offer.
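For readers unfamiliar with the accessor helpers the summary mentions, the sketch below shows one plausible shape for them. The `KernelCall` layout (string-keyed axis and attribute maps), the `new` constructor, and the `String` error type are assumptions; only the accessor names `axis` and `attr_usize` come from the PR description.

```rust
use std::collections::BTreeMap;

/// A recorded kernel invocation (hypothetical layout for illustration).
#[derive(Debug, Clone, Default)]
pub struct KernelCall {
    pub name: String,
    pub axes: BTreeMap<String, usize>,
    pub attrs: BTreeMap<String, String>,
}

impl KernelCall {
    /// Creates an empty call record for the named kernel.
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            ..Self::default()
        }
    }

    /// Looks up a named axis extent, reporting which kernel lacked it.
    pub fn axis(&self, key: &str) -> Result<usize, String> {
        self.axes
            .get(key)
            .copied()
            .ok_or_else(|| format!("{}: missing axis `{}`", self.name, key))
    }

    /// Looks up a string attribute and parses it as `usize`.
    pub fn attr_usize(&self, key: &str) -> Result<usize, String> {
        let raw = self
            .attrs
            .get(key)
            .ok_or_else(|| format!("{}: missing attr `{}`", self.name, key))?;
        raw.parse::<usize>()
            .map_err(|e| format!("{}: attr `{}` is not a usize: {}", self.name, key, e))
    }
}
```

Centralizing lookups like these means every provider fails with the same error wording when a schedule entry is malformed, instead of each model crate formatting its own.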
What

The qwen3 and Kimi-K2 model-report bins independently carried the same kernel-benchmarking machinery. This branch lifts the genuinely model-agnostic parts into a new `pegainfer-bench` crate and points both bins at it, in two disciplined steps:

- (`3a235ab`) CUDA-event timing loop (`measure_loop`), latency stats (`LatencyStats`), and `KernelCall` accessors (`axis`/`input`/`output`/`attr_usize`, `zero_matrix`/`zero_weight`). kimi `kernel_report.rs` re-exports the shared types; qwen3's `qwen3_model_report` imports them. Each crate keeps its own `measure_*` providers and its own (deliberately divergent) `measure_loop`/`bench_key`.
- (`1befa1d`) The report-row layer both bins duplicated byte-for-byte: `RollupRow`, `CallSiteRow`, the per-call accumulator, and the row projection. Field order is preserved, so the emitted report JSON is unchanged. Each bin keeps its own schedule walk (qwen3's no-op all-reduce + bail-on-missing vs kimi's missing-provider accounting), which genuinely diverge.

Plus a docs follow-up (`fa317e8`) closing the long-standing "split common report logic into a reusable bench-core crate" item in `kernel-op-reports.md`, and removal of an orphaned untracked `pegainfer-core/src/forward_ctx.rs` (never compiled, no references).

What was deliberately NOT shared

The qwen3 regression framework (TOML manifest, `KernelSnapshot`, git/hardware/build provenance, `RegressionThresholds`, compare/compose, CUPTI, cold-L2) stays in `qwen3_kernel_report`. Its schema is attention-domain-specific (`CaseShape` carries head/page dims, `CaseParams` carries chunk/cta_tile_q) and has no second consumer, so moving it would be premature abstraction. When a second model needs kernel-level regression gating, lift the generic primitives then.

Entropy

10 files changed, +342 −469, for a net −127 tracked lines (−386 including the orphaned file). The real win is single-sourcing: the timing harness and the report-row JSON schema each now have one definition that can't drift between models.

Verification

- `pegainfer-bench` and the qwen3 + kimi report bins all compile clean (`--features kernel-report`).
- `cargo test --release --lib` (the one workspace failure, `kvbm-logical`'s `#[should_panic]` lock-DAG test, is in an unrelated crate and is `debug_assertions`-gated, so it is pre-existing under `--release`).

🤖 Generated with Claude Code