refactor(bench): extract pegainfer-bench — shared kernel-report harness + rollup by xiaguan · Pull Request #192 · xiaguan/pegainfer

xiaguan · 2026-05-29T05:57:50Z

What

The qwen3 and Kimi-K2 model-report bins independently carried the same kernel-benchmarking machinery. This branch lifts the genuinely model-agnostic parts into a new pegainfer-bench crate and points both bins at it, in two disciplined steps:

Harness (3a235ab): the CUDA-event timing loop (measure_loop), latency stats (LatencyStats), and KernelCall accessors (axis/input/output/attr_usize, zero_matrix/zero_weight). kimi's kernel_report.rs re-exports the shared types; qwen3's qwen3_model_report imports them. Each crate keeps its own measure_* providers and its own (deliberately divergent) measure_loop/bench_key.
Rollup (1befa1d): the report-row layer both bins duplicated byte-for-byte: RollupRow, CallSiteRow, the per-call accumulator, and the row projection. Field order is preserved, so the emitted report JSON is unchanged. Each bin keeps its own schedule walk, because the walks genuinely diverge (qwen3's no-op all-reduce + bail-on-missing vs kimi's missing-provider accounting).
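For readers who have not seen the harness, a latency-stats type of the kind described in the Harness step might look roughly like the sketch below. The field names, the `from_samples` constructor, and the nearest-rank percentile rule are all assumptions made for illustration; the real `LatencyStats` in pegainfer-bench is not shown in this PR text and may differ.

```rust
/// Hypothetical summary of per-iteration latencies, in microseconds.
/// Field names are illustrative, not the real pegainfer-bench layout.
#[derive(Debug, Clone, PartialEq)]
pub struct LatencyStats {
    pub samples: usize,
    pub min_us: f64,
    pub mean_us: f64,
    pub p50_us: f64,
    pub p99_us: f64,
    pub max_us: f64,
}

impl LatencyStats {
    /// Builds stats from raw samples; returns `None` for an empty slice.
    pub fn from_samples(samples_us: &[f64]) -> Option<Self> {
        if samples_us.is_empty() {
            return None;
        }
        let mut sorted = samples_us.to_vec();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let n = sorted.len();
        // Nearest-rank percentile: the smallest sample such that at least
        // p percent of the data is at or below it.
        let pct = |p: f64| {
            let rank = ((p / 100.0) * n as f64).ceil() as usize;
            sorted[rank.clamp(1, n) - 1]
        };
        Some(Self {
            samples: n,
            min_us: sorted[0],
            mean_us: sorted.iter().sum::<f64>() / n as f64,
            p50_us: pct(50.0),
            p99_us: pct(99.0),
            max_us: sorted[n - 1],
        })
    }
}
```

Keeping one such type in a shared crate means every model's report computes percentiles the same way, which is the single-sourcing benefit this PR is after.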

Plus a docs follow-up (fa317e8) closing the long-standing "split common report logic into a reusable bench-core crate" item in kernel-op-reports.md, and removal of an orphaned untracked pegainfer-core/src/forward_ctx.rs (never compiled, no references).

What was deliberately NOT shared

The qwen3 regression framework (TOML manifest, KernelSnapshot, git/hardware/build provenance, RegressionThresholds, compare/compose, CUPTI, cold-L2) stays in qwen3_kernel_report. Its schema is attention-domain-specific (CaseShape carries head/page dims, CaseParams carries chunk/cta_tile_q) and has no second consumer — moving it would be premature abstraction. When a second model needs kernel-level regression gating, lift the generic primitives then.

Entropy

10 files changed, +342 −469 → net −127 tracked lines (−386 including the orphaned file). The real win is single-sourcing: the timing harness and the report-row JSON schema each now have one definition that can't drift between models.

Verification

pegainfer-bench and the qwen3 and kimi report bins all compile cleanly (--features kernel-report).
All touched crate libs pass cargo test --release --lib. The one workspace failure, kvbm-logical's #[should_panic] lock-DAG test, is in an unrelated crate and is debug_assertions-gated, so it already failed under --release before this change.
Adversarial review confirmed JSON byte-identity (field order), preserved per-bin semantics, and correct visibility.
GPU end-to-end report run not executed here: this host has no model weights.
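The `--release` failure mentioned above follows from how `debug_assert!` works: it is compiled out when `debug_assertions` is off, so a `#[should_panic]` test that relies on it never panics and is reported as failed. The sketch below shows the mechanism with a hypothetical `acquire_in_order` function; it is not kvbm-logical's actual code, and the `cfg` gate on the test is one common convention rather than something this PR applies.

```rust
/// Hypothetical invariant check: ascending lock order is enforced only
/// when `debug_assertions` is enabled (the default for debug builds).
pub fn acquire_in_order(held: u32, next: u32) -> u32 {
    // `debug_assert!` expands to nothing when `debug_assertions` is off,
    // which is the default for `--release`.
    debug_assert!(next > held, "lock order violation: {next} after {held}");
    next
}

#[cfg(test)]
mod lock_order_tests {
    use super::*;

    // Without the `cfg` gate, this test would fail under
    // `cargo test --release`: nothing panics, so `#[should_panic]`
    // reports a failure.
    #[test]
    #[cfg(debug_assertions)]
    #[should_panic(expected = "lock order violation")]
    fn out_of_order_acquire_panics() {
        acquire_in_order(2, 1);
    }
}
```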

🤖 Generated with Claude Code

…ainfer-bench

Every model crate's kernel-report tooling re-implemented the same CUDA-event timing loop, LatencyStats, KernelCall accessors, and device-alloc helpers. Lift them into a new `pegainfer-bench` crate and migrate the two existing consumers.

- kimi-k2 `kernel_report.rs`: drop the local harness, re-export the shared types so the report bins are untouched (803 -> 663 lines). Non-optional dep because the harness is an always-compiled lib module.
- qwen3-4b `qwen3_model_report.rs`: adopt the shared LatencyStats/accessors; keep its cold-L2 `measure_loop` and attr-filtered `bench_key` local because their measurement semantics deliberately differ from kimi's warm-cache / raw-key versions (1079 -> 981 lines). Optional dep gated on `kernel-report` since the harness lives in the report bin.

Net -65 source lines; the real win is one canonical harness instead of two copies that had already silently diverged (warm vs cold-L2 timing).

Verified: pegainfer-bench checks model-agnostic (Kimi kernels disabled); kimi + qwen3 report bins compile under `kernel-report`; kimi 18 / qwen3 17 host unit tests pass; no-feature qwen3 build does not pull pegainfer-bench.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
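To make the warm vs cold-L2 distinction concrete, here is a minimal host-side sketch of a timing loop whose only difference between the two modes is an untimed hook run before each iteration. It uses `std::time::Instant` as a stand-in for CUDA events, and the name `measure_loop_sketch` and its parameters are hypothetical; the real `measure_loop` implementations are not reproduced in this PR text.

```rust
use std::time::Instant;

/// Times `iters` runs of `kernel` after `warmup` untimed runs and returns
/// one latency sample per timed run, in microseconds.
///
/// `before_each` runs outside the timed region. Passing a no-op yields
/// warm-cache numbers; passing a closure that evicts the cache (for
/// example by overwriting a large scratch buffer) yields cold numbers.
/// Host-side `Instant` is an assumption standing in for CUDA events.
pub fn measure_loop_sketch(
    warmup: usize,
    iters: usize,
    mut before_each: impl FnMut(),
    mut kernel: impl FnMut(),
) -> Vec<f64> {
    for _ in 0..warmup {
        kernel();
    }
    let mut samples_us = Vec::with_capacity(iters);
    for _ in 0..iters {
        before_each(); // untimed: cache flush or no-op
        let start = Instant::now();
        kernel();
        samples_us.push(start.elapsed().as_secs_f64() * 1e6);
    }
    samples_us
}
```

Two copies of such a loop that differ only in the hook produce numbers that are not comparable, which is why the divergence is worth keeping explicit and local rather than hidden in duplicated code.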

The two model_report bins each carried byte-identical rollup leaf types and math (RollupRow, CallSiteRow, the per-call accumulator, and the row projection); only their schedule walks genuinely diverge (qwen3's no-op all-reduce + bail-on-missing vs kimi's missing-provider accounting). Lift the shared pieces into pegainfer-bench so the report-row JSON schema has one source of truth and can't drift between models; each bin keeps its own loop. Field order is preserved, so the emitted JSON is unchanged.

Net -97 source lines.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
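As an illustration of the rollup layer described in this commit, the sketch below pairs a per-call accumulator with a row projection and a hand-written JSON emitter whose field order follows the struct declaration. Every field name and method here is an assumption; the real `RollupRow` and accumulator in pegainfer-bench are not shown in this PR text. If the real types derive serde's `Serialize`, struct fields are likewise emitted in declaration order, which is why preserving field order keeps the JSON byte-identical.

```rust
use std::collections::HashMap;

/// Hypothetical report row; the real `RollupRow` fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct RollupRow {
    pub kernel: String,
    pub calls: u64,
    pub total_us: f64,
    pub mean_us: f64,
}

impl RollupRow {
    /// Emits fields in declaration order so the output bytes stay stable.
    /// Kernel names are assumed to need no JSON escaping in this sketch.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"kernel\":\"{}\",\"calls\":{},\"total_us\":{:.3},\"mean_us\":{:.3}}}",
            self.kernel, self.calls, self.total_us, self.mean_us
        )
    }
}

/// Per-call accumulator keyed by kernel name, preserving first-seen order.
#[derive(Default)]
pub struct RollupAccumulator {
    index: HashMap<String, usize>,
    totals: Vec<(String, u64, f64)>,
}

impl RollupAccumulator {
    /// Adds one measured call to the running totals for `kernel`.
    pub fn record(&mut self, kernel: &str, latency_us: f64) {
        let existing = self.index.get(kernel).copied();
        let slot = match existing {
            Some(i) => i,
            None => {
                self.totals.push((kernel.to_string(), 0, 0.0));
                let i = self.totals.len() - 1;
                self.index.insert(kernel.to_string(), i);
                i
            }
        };
        self.totals[slot].1 += 1;
        self.totals[slot].2 += latency_us;
    }

    /// Projects the accumulated totals into report rows.
    pub fn rows(&self) -> Vec<RollupRow> {
        self.totals
            .iter()
            .map(|(kernel, calls, total_us)| RollupRow {
                kernel: kernel.clone(),
                calls: *calls,
                total_us: *total_us,
                mean_us: *total_us / *calls as f64,
            })
            .collect()
    }
}
```

In this shape, each bin's schedule walk would only decide which calls to `record`; the row layout and the arithmetic live in one place.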

Close the long-standing follow-up ("split common report logic into a reusable bench-core crate") and document why only the model-agnostic harness + rollup were shared while the attention-specific regression framework deliberately stays in qwen3_kernel_report.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request extracts model-agnostic kernel benchmarking and reporting utilities into a new shared crate, pegainfer-bench. Common components such as the CUDA-event timing loop, latency statistics, tensor accessors, and report rollup structures are removed from the pegainfer-kimi-k2 and pegainfer-qwen3-4b crates and replaced with references to the new shared library, significantly reducing code duplication. Documentation has also been updated to reflect these architectural changes. No review comments were provided, and I have no additional feedback to offer.

xiaguan and others added 3 commits May 29, 2026 12:02

xiaguan merged commit 17aee17 into main May 29, 2026
1 check passed

xiaguan deleted the refactor/extract-pegainfer-bench branch May 29, 2026 05:59

gemini-code-assist Bot reviewed May 29, 2026
