
feat(serve): export Prometheus metrics via /metrics endpoint #553

Open
ilyaters wants to merge 3 commits into lightseekorg:main from ilyaters:main



@ilyaters


Problem

The ts serve orchestrator (SMG path) collected runtime metrics via EngineMetrics/RequestMetrics in the engine subprocess but never exposed them to users. The build_server() function in http_server.py lacked the add_prometheus_middleware() call, causing the /metrics endpoint on the control server (port + 1) to return 404 or empty responses.

This made --enable-metrics a no-op for the standard ts serve launch path: metrics were collected into PROMETHEUS_MULTIPROC_DIR, but no HTTP endpoint exposed them for scraping.

Solution

  • http_server.py: Extended build_server() and start() with an enable_metrics parameter. When enabled, mounts the /metrics Prometheus endpoint via add_prometheus_middleware() + enable_func_timer(). Added _metrics_route_mounted() helper for idempotent mounting.

  • serve_smg.py: Wired enable_metrics through run_smg() → _start_control_server() → build_server(). When enabled, run_smg() calls set_prometheus_multiproc_dir() before spawning the engine subprocess, so the engine and control server share the same multiprocess directory. This allows MultiProcessCollector to aggregate the engine's tokenspeed:* metrics. Added _engine_args_enable_metrics(), which detects --enable-metrics in the engine argv so that run_smg_from_args() can pass the setting through.

  • Tests: Added unit tests for _engine_args_enable_metrics() flag parsing (bare flag, =true/1/yes, =false/0, absent, mixed with other flags) and for /metrics route mounting behavior (absent by default, mounted when enabled, skipped when disabled).

  • Docs: Updated docs/configuration/server.md Observability section with a Prometheus Metrics subsection documenting the /metrics endpoint on the control port, --control-port parameter, and a prometheus.yml scrape config example.

Verification

tokenspeed serve <model> --port 8000 --enable-metrics --tensor-parallel-size 1

# Runtime metrics (tokenspeed:*) available on control port = serve port + 1
curl http://localhost:8001/metrics

Expected output includes tokenspeed:* metrics covering request latency, token throughput, and engine performance counters.
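To check the scrape programmatically instead of eyeballing curl output, a small stdlib-only sketch like the following can list the metric names exposed under a prefix. The URL mirrors the curl example above. The default prefix "tokenspeed" deliberately matches both tokenspeed: and tokenspeed_ names, and the sample metric names used in any test are hypothetical.

```python
import urllib.request
from typing import List


def metric_names(exposition: str, prefix: str = "tokenspeed") -> List[str]:
    """Return distinct sample names in Prometheus text format that start with `prefix`."""
    names: List[str] = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # A sample name ends at '{' (labels follow) or at whitespace (value follows).
        name = line.split("{", 1)[0].split()[0]
        if name.startswith(prefix) and name not in names:
            names.append(name)
    return names


def scrape(url: str = "http://localhost:8001/metrics", timeout: float = 5.0) -> str:
    """Fetch the raw exposition text from the control server's /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server from the Verification step running, `print(metric_names(scrape()))` should print a non-empty list; an empty list suggests the engine and control server are not sharing PROMETHEUS_MULTIPROC_DIR.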

Breaking Changes

None. The /metrics endpoint remains disabled by default; existing behavior is preserved unless --enable-metrics is explicitly passed.


Related: # (add issue number if applicable)

ilyaters added 2 commits June 29, 2026 12:04
…path

The ts serve orchestrator (SMG path) collected runtime metrics via

Signed-off-by: Ilya Tarasevichia <newilya8@gmail.com>
ilyaters requested a review from a team as a code owner June 29, 2026 11:34

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da6f7b2840


Comment on lines +615 to +618
if token == "--enable-metrics":
    return True
if token.startswith("--enable-metrics="):
    return token.split("=", 1)[1].lower() in ("1", "true", "yes")


P2: Normalize metrics flag values before detection

When this helper is reached from run_smg_from_args, split_argv() has already normalized --enable-metrics=false or --enable-metrics=true into ['--enable-metrics', 'false'/'true'], so the bare-token branch returns True before the value can be inspected. That makes the newly documented/tested false form enable the control /metrics setup anyway, and the child engine then receives a stray value for a store_true flag. Either reject equals/value forms in the splitter or detect metrics from the raw argv before normalization.
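The failure mode and the "detect from the raw argv" fix can be sketched as below. Both functions are hypothetical: `normalize` only imitates the split_argv._normalize() behavior described in this thread (--flag=value becomes ['--flag', 'value']), and `detect_enable_metrics` mirrors the quoted snippet's logic rather than reproducing the PR's final helper.

```python
from typing import List, Sequence

_TRUE_VALUES = ("1", "true", "yes")


def normalize(argv: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for split_argv._normalize(): '--flag=value' -> ['--flag', 'value']."""
    out: List[str] = []
    for token in argv:
        if token.startswith("--") and "=" in token:
            flag, value = token.split("=", 1)
            out.extend([flag, value])
        else:
            out.append(token)
    return out


def detect_enable_metrics(raw_argv: Sequence[str]) -> bool:
    """Detect --enable-metrics; must run on the raw argv, before normalize()."""
    for token in raw_argv:
        if token == "--enable-metrics":
            return True
        if token.startswith("--enable-metrics="):
            return token.split("=", 1)[1].lower() in _TRUE_VALUES
    return False
```

Run on the raw argv, `detect_enable_metrics(["--enable-metrics=false"])` is False. Run on `normalize(...)` of the same argv it becomes True, because the bare-token branch matches before the "false" value is ever inspected, which is the bug the review describes.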


… normalization

split_argv._normalize() rewrites --flag=value into ['--flag', 'value'],

Signed-off-by: Ilya Tarasevichia <newilya8@gmail.com>

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 04d4099408


Comment on lines +490 to +493
if enable_metrics:
    from tokenspeed.runtime.utils.common import set_prometheus_multiproc_dir

    set_prometheus_multiproc_dir()


P2: Avoid leaking PROMETHEUS_MULTIPROC_DIR to smg

When ts serve --enable-metrics is used, this call mutates the parent os.environ before both children are launched; spawn_gateway() is invoked later without a custom env, so the SMG gateway inherits the same runtime/control multiprocess directory even though it has its own Prometheus exporter. In deployments scraping both endpoints, prometheus_client switches behavior based on this env var, so gateway metrics can be written into the runtime scrape directory or disappear from the gateway exporter; scope this env to the engine/control path or scrub it for the gateway subprocess.
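One way to scope the variable, sketched under assumptions: rather than mutating the parent's os.environ, build an explicit environment for each child and pass it via subprocess.Popen's `env` argument. The helper names below are hypothetical, and `make_multiproc_dir` only approximates what set_prometheus_multiproc_dir() might do; its real behavior is not shown in this PR.

```python
import os
import tempfile
from typing import Dict, Mapping

ENV_VAR = "PROMETHEUS_MULTIPROC_DIR"


def make_multiproc_dir() -> str:
    """Hypothetical: create a fresh directory for multiprocess metric files."""
    return tempfile.mkdtemp(prefix="prometheus_multiproc_")


def engine_env(base: Mapping[str, str], multiproc_dir: str) -> Dict[str, str]:
    """Environment for the engine/control path: shares the multiprocess dir."""
    env = dict(base)  # copy, so the parent's environment is never mutated
    env[ENV_VAR] = multiproc_dir
    return env


def gateway_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Environment for the SMG gateway: scrubbed so its own exporter stays in single-process mode."""
    env = dict(base)
    env.pop(ENV_VAR, None)
    return env
```

A launcher would then use something like `subprocess.Popen(engine_cmd, env=engine_env(os.environ, d))` and `subprocess.Popen(gateway_cmd, env=gateway_env(os.environ))`, where `engine_cmd` and `gateway_cmd` are placeholders. Because the control server runs in the parent process, the parent still needs the directory path when it builds its MultiProcessCollector.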
