refactor(bench): extract pegainfer-bench — shared kernel-report harness + rollup #192

Merged
xiaguan merged 3 commits into main from refactor/extract-pegainfer-bench on May 29, 2026

xiaguan (Owner) commented May 29, 2026

What

The qwen3 and Kimi-K2 model-report bins independently carried the same kernel-benchmarking machinery. This branch lifts the genuinely model-agnostic parts into a new pegainfer-bench crate and points both bins at it, in two disciplined steps:

  1. Harness (3a235ab) — CUDA-event timing loop (measure_loop), latency stats (LatencyStats), and KernelCall accessors (axis/input/output/attr_usize, zero_matrix/zero_weight). kimi kernel_report.rs re-exports the shared types; qwen3's qwen3_model_report imports them. Each crate keeps its own measure_* providers and its own (deliberately divergent) measure_loop/bench_key.
  2. Rollup (1befa1d) — the report-row layer both bins duplicated byte-for-byte: RollupRow, CallSiteRow, the per-call accumulator, and the row projection. Field order is preserved, so the emitted report JSON is unchanged. Each bin keeps its own schedule walk (qwen3's no-op all-reduce + bail-on-missing vs kimi's missing-provider accounting), which genuinely diverge.
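
As a rough illustration of the harness in step 1, the sketch below pairs a timing loop with a latency summary. It is not the code in pegainfer-bench: the `LatencyStats` field names, the `from_samples` constructor, and the `measure_loop` signature are assumptions made for this example, and host-side `std::time::Instant` stands in for the CUDA start/stop event pair so the snippet builds without a GPU.

```rust
use std::time::Instant;

/// Summary of per-iteration latencies in microseconds.
/// Field names are hypothetical, not the crate's actual layout.
#[derive(Debug, Clone, PartialEq)]
pub struct LatencyStats {
    pub iters: usize,
    pub mean_us: f64,
    pub min_us: f64,
    pub p50_us: f64,
    pub max_us: f64,
}

impl LatencyStats {
    /// Build stats from raw samples; returns `None` for an empty slice.
    pub fn from_samples(samples: &[f64]) -> Option<Self> {
        if samples.is_empty() {
            return None;
        }
        let mut sorted = samples.to_vec();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let n = sorted.len();
        let mean_us = sorted.iter().sum::<f64>() / n as f64;
        Some(Self {
            iters: n,
            mean_us,
            min_us: sorted[0],
            p50_us: sorted[n / 2], // upper median when n is even
            max_us: sorted[n - 1],
        })
    }
}

/// Run `launch` `warmup` times untimed, then `iters` times timed.
/// Assumption: a real harness would record a CUDA event before and after
/// each launch and read the elapsed time between them; `Instant` is a
/// host-side stand-in for that pair.
pub fn measure_loop<F: FnMut()>(
    warmup: usize,
    iters: usize,
    mut launch: F,
) -> Option<LatencyStats> {
    for _ in 0..warmup {
        launch();
    }
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        launch();
        samples.push(start.elapsed().as_secs_f64() * 1e6);
    }
    LatencyStats::from_samples(&samples)
}
```

A caller would pass a closure that launches one kernel, for example `measure_loop(10, 100, || launch_kernel())` with a hypothetical `launch_kernel`. A cold-cache variant like the one qwen3 keeps locally would need extra work before each timed launch, which is one reason a single shared loop may not fit both bins.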

Plus a docs follow-up (fa317e8) closing the long-standing "split common report logic into a reusable bench-core crate" item in kernel-op-reports.md, and removal of an orphaned untracked pegainfer-core/src/forward_ctx.rs (never compiled, no references).

What was deliberately NOT shared

The qwen3 regression framework (TOML manifest, KernelSnapshot, git/hardware/build provenance, RegressionThresholds, compare/compose, CUPTI, cold-L2) stays in qwen3_kernel_report. Its schema is attention-domain-specific (CaseShape carries head/page dims, CaseParams carries chunk/cta_tile_q) and has no second consumer — moving it would be premature abstraction. When a second model needs kernel-level regression gating, lift the generic primitives then.

Entropy

10 files changed, +342 −469; net −127 tracked lines (−386 including the orphaned file). The real win is single-sourcing: the timing harness and the report-row JSON schema each now have one definition that can't drift between models.

Verification

  • pegainfer-bench, qwen3 + kimi report bins all compile clean (--features kernel-report).
  • All touched crate libs pass cargo test --release --lib. The one workspace failure, kvbm-logical's #[should_panic] lock-DAG test, is in an unrelated crate; the test depends on a debug_assertions-gated check, so it already failed under --release before this branch.
  • Adversarial review confirmed JSON byte-identity (field order), preserved per-bin semantics, and correct visibility.
  • GPU end-to-end report run not executed here: this host has no model weights.

🤖 Generated with Claude Code

xiaguan and others added 3 commits May 29, 2026 12:02
…ainfer-bench

Every model crate's kernel-report tooling re-implemented the same CUDA-event
timing loop, LatencyStats, KernelCall accessors, and device-alloc helpers. Lift
them into a new `pegainfer-bench` crate and migrate the two existing consumers.

- kimi-k2 `kernel_report.rs`: drop the local harness, re-export the shared types
  so the report bins are untouched (803 -> 663 lines). Non-optional dep because
  the harness is an always-compiled lib module.
- qwen3-4b `qwen3_model_report.rs`: adopt the shared LatencyStats/accessors;
  keep its cold-L2 `measure_loop` and attr-filtered `bench_key` local because
  their measurement semantics deliberately differ from kimi's warm-cache /
  raw-key versions (1079 -> 981 lines). Optional dep gated on `kernel-report`
  since the harness lives in the report bin.

Net -65 source lines; the real win is one canonical harness instead of two
copies that had already silently diverged (warm vs cold-L2 timing).

Verified: pegainfer-bench checks model-agnostic (Kimi kernels disabled); kimi +
qwen3 report bins compile under `kernel-report`; kimi 18 / qwen3 17 host unit
tests pass; no-feature qwen3 build does not pull pegainfer-bench.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The two model_report bins each carried byte-identical rollup leaf types and
math (RollupRow, CallSiteRow, the per-call accumulator, and the row
projection) — only their schedule walks genuinely diverge (qwen3's no-op
all-reduce + bail-on-missing vs kimi's missing-provider accounting). Lift the
shared pieces into pegainfer-bench so the report-row JSON schema has one
source of truth and can't drift between models; each bin keeps its own loop.

Field order is preserved, so the emitted JSON is unchanged. Net -97 source
lines.
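
A minimal sketch of the accumulate-then-project shape described above, under stated assumptions: the `RollupRow` fields, the `RollupAccumulator` type, and the hand-written `to_json` are hypothetical and only illustrate why a single row definition pins the JSON field order. The real crate's fields and serializer are not shown in this PR text.

```rust
use std::collections::BTreeMap;

/// One aggregated report row per kernel. Field names and order are
/// hypothetical; the point is that the order is defined in one place.
#[derive(Debug, Clone, PartialEq)]
pub struct RollupRow {
    pub kernel: String,
    pub calls: u64,
    pub total_us: f64,
    pub mean_us: f64,
}

impl RollupRow {
    /// Hand-written serializer so the field order is explicit.
    /// No string escaping: assumes plain ASCII kernel names.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"kernel\":\"{}\",\"calls\":{},\"total_us\":{:.3},\"mean_us\":{:.3}}}",
            self.kernel, self.calls, self.total_us, self.mean_us
        )
    }
}

/// Per-call accumulator. Each bin's schedule walk feeds it one call at a
/// time, however that walk decides which calls to include.
#[derive(Debug, Default)]
pub struct RollupAccumulator {
    per_kernel: BTreeMap<String, (u64, f64)>, // kernel -> (calls, total_us)
}

impl RollupAccumulator {
    /// Record one measured call for `kernel`.
    pub fn record(&mut self, kernel: &str, latency_us: f64) {
        let entry = self
            .per_kernel
            .entry(kernel.to_string())
            .or_insert((0, 0.0));
        entry.0 += 1;
        entry.1 += latency_us;
    }

    /// Project accumulated totals into rows, sorted by kernel name.
    pub fn rows(&self) -> Vec<RollupRow> {
        self.per_kernel
            .iter()
            .map(|(kernel, &(calls, total_us))| RollupRow {
                kernel: kernel.clone(),
                calls,
                total_us,
                mean_us: total_us / calls as f64, // calls >= 1 once recorded
            })
            .collect()
    }
}
```

In this shape the two bins could differ freely in how they walk the schedule and which calls they pass to `record`, while the emitted rows still come from one struct and one serializer.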

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Close the long-standing follow-up ("split common report logic into a
reusable bench-core crate") and document why only the model-agnostic
harness + rollup were shared while the attention-specific regression
framework deliberately stays in qwen3_kernel_report.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

xiaguan merged commit 17aee17 into main on May 29, 2026
1 check passed
xiaguan deleted the refactor/extract-pegainfer-bench branch May 29, 2026 05:59

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request extracts model-agnostic kernel benchmarking and reporting utilities into a new shared crate, pegainfer-bench. Common components such as the CUDA-event timing loop, latency statistics, tensor accessors, and report rollup structures are removed from the pegainfer-kimi-k2 and pegainfer-qwen3-4b crates and replaced with references to the new shared library, significantly reducing code duplication. Documentation has also been updated to reflect these architectural changes. No review comments were provided, and I have no additional feedback to offer.
