feat(log): add --log-request-stats per-request statistics logging by LorrinWWW · Pull Request #512 · lightseekorg/tokenspeed

LorrinWWW · 2026-06-24T22:28:34Z

Summary

Adds an opt-in --log-request-stats flag that emits one structured RequestStats(...) line per request when it finishes or aborts, for latency/throughput debugging. Each line reports the request id, status/finish-reason, token counts (prompt/cache/output) and cache-hit rate, a host-side timing breakdown (queue, prefill, TTFT, total, preemption time/count), decode throughput, and spec-decode acceptance length/rate. All values are derived from timestamps and counters already available on the host — scheduled time is stamped pre-forward in the event loop and the rest post-forward in the output processor — so no GPU sync is introduced and there is no engine slowdown; when the flag is off, requests carry a shared no-op tracker so the hot path stays guard-free. The stats data model (tracker + summary) lives in a dedicated request_stats.py module, and the feature is covered by unit tests plus an end-to-end check.

Test Plan

Adds an opt-in per-request performance log emitted when a request finishes or aborts, for latency/throughput debugging. Replaces the bare "Req: <rid> Finish! Accept_num_tokens_avg" line with a Python-object repr: RequestStats(rid=..., status='finished', reason='stop', prompt_tokens=..., cache_tokens=..., output_tokens=..., cache_hit_rate=..., queue_ms=..., prefill_ms=..., ttft_ms=..., total_ms=..., preempt_ms=..., preempt_count=..., decode_tps=..., acc_len=..., acc_rate=..., recv_ts=..., commit_ts=..., finish_ts=...) Every value is derived from host-side timestamps/counters already available in the scheduler process (received via the request's created_time; scheduled/ prefill-done/first-token/finish stamped in post_process_forward_op; counts and spec acceptance from RequestState). No new GPU sync is introduced, so it adds no engine slowdown. Preemption (count + time a request's decode was delayed by others' prefill) is tracked host-side, best-effort, from forward_op. Gated by --log-request-stats (default off). http_status_code is intentionally omitted: it is only known at the HTTP/SMG gateway, not in the engine. Tests cover the disabled (zero-overhead) path, the derived-field formatting, the aborted + spec-decode case, and end-to-end timestamp recording through post_process_forward_op. Docs: configuration/server.md Observability section. Signed-off-by: Jue Wang <zjuwangjue@gmail.com>

# Conflicts: # test/runtime/test_generation_output_processor.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3cb8ed467

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-27T02:47:23Z

+        if self.log_request_stats:
+            state.stats = RequestStatsTracker()


Ensure admission-time aborts get stats trackers

When --log-request-stats is enabled, a real tracker is only installed in register(), but _process_new_requests() sends invalid-session or grammar-aborted requests through publish_finished_at_admission() without calling register() (event_loop.py:1021). Those requests therefore keep NOOP_STATS and the admission-finish path never logs a RequestStats(...) line, so the new flag silently omits these aborts despite promising one line per finished/aborted request.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b53855920

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-27T05:17:12Z

+            acc_len = rs.accept_draft_tokens or 0.0
+            acc_rate = (
+                round(max(0.0, acc_len - 1.0) / spec_num_tokens, 4)


Correct spec-decode acceptance math

When speculative decoding is enabled, rs.accept_draft_tokens is the extras-per-verify value; the repo documents this in test/runtime/models/test_mm25_perf.py as needing + 1 to get true acceptance length. This code logs that value directly as acc_len and then subtracts one more for acc_rate, so a request averaging 3 accepted drafts with draft width 4 is reported as acc_len=3.0, acc_rate=0.5 instead of an acceptance length of 4.0 and a draft acceptance rate of 0.75, making the new per-request stats misleading for spec-decode tuning.

Useful? React with 👍 / 👎.

LorrinWWW added 4 commits June 24, 2026 04:11

format

2cab10f

refactor

3d2441e

refactor

f47041d

LorrinWWW requested a review from gongwei-130 June 24, 2026 22:51

LorrinWWW and others added 5 commits June 24, 2026 18:51

Merge branch 'main' into jue/req-stats

9564653

fuse two log lines.

4fb0157

dp log

33725e7

add request_stats module (extracted from generation_output_processor)

f6acb8c

Merge remote-tracking branch 'origin/main' into jue/req-stats

a3cb8ed

# Conflicts: # test/runtime/test_generation_output_processor.py

jaywme marked this pull request as ready for review June 27, 2026 02:43

jaywme requested a review from a team as a code owner June 27, 2026 02:43

jaywme self-requested a review June 27, 2026 02:44

chatgpt-codex-connector Bot reviewed Jun 27, 2026

View reviewed changes

Merge branch 'main' into jue/req-stats

4b53855

chatgpt-codex-connector Bot reviewed Jun 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(log): add --log-request-stats per-request statistics logging#512

feat(log): add --log-request-stats per-request statistics logging#512
LorrinWWW wants to merge 10 commits into
mainfrom
jue/req-stats

LorrinWWW commented Jun 24, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		if self.log_request_stats:
		state.stats = RequestStatsTracker()

Uh oh!

Conversation

LorrinWWW commented Jun 24, 2026

Summary

Test Plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants