feat(serve): export Prometheus metrics via /metrics endpoint#553
feat(serve): export Prometheus metrics via /metrics endpoint#553ilyaters wants to merge 3 commits into
Conversation
…path The ts serve orchestrator (SMG path) collected runtime metrics via Signed-off-by: Ilya Tarasevichia <newilya8@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da6f7b2840
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if token == "--enable-metrics": | ||
| return True | ||
| if token.startswith("--enable-metrics="): | ||
| return token.split("=", 1)[1].lower() in ("1", "true", "yes") |
There was a problem hiding this comment.
Normalize metrics flag values before detection
When this helper is reached from run_smg_from_args, split_argv() has already normalized --enable-metrics=false or --enable-metrics=true into ['--enable-metrics', 'false'/'true'], so the bare-token branch returns True before the value can be inspected. That makes the newly documented/tested false form enable the control /metrics setup anyway, and the child engine then receives a stray value for a store_true flag. Either reject equals/value forms in the splitter or detect metrics from the raw argv before normalization.
Useful? React with 👍 / 👎.
… normalization split_argv._normalize() rewrites --flag=value into ['--flag', 'value'], Signed-off-by: Ilya Tarasevichia <newilya8@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04d4099408
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if enable_metrics: | ||
| from tokenspeed.runtime.utils.common import set_prometheus_multiproc_dir | ||
|
|
||
| set_prometheus_multiproc_dir() |
There was a problem hiding this comment.
Avoid leaking PROMETHEUS_MULTIPROC_DIR to smg
When ts serve --enable-metrics is used, this call mutates the parent os.environ before both children are launched; spawn_gateway() is invoked later without a custom env, so the SMG gateway inherits the same runtime/control multiprocess directory even though it has its own Prometheus exporter. In deployments scraping both endpoints, prometheus_client switches behavior based on this env var, so gateway metrics can be written into the runtime scrape directory or disappear from the gateway exporter; scope this env to the engine/control path or scrub it for the gateway subprocess.
Useful? React with 👍 / 👎.
Problem
The
ts serveorchestrator (SMG path) collected runtime metrics viaEngineMetrics/RequestMetricsin the engine subprocess but never exposed them to users. Thebuild_server()function inhttp_server.pylacked theadd_prometheus_middleware()call, causing the/metricsendpoint on the control server (port + 1) to return 404 or empty responses.This made
--enable-metricsa no-op for the standardts servelaunch path — metrics were collected intoPROMETHEUS_MULTIPROC_DIRbut had no HTTP endpoint for scraping.Solution
http_server.py: Extendedbuild_server()andstart()with anenable_metricsparameter. When enabled, mounts the/metricsPrometheus endpoint viaadd_prometheus_middleware()+enable_func_timer(). Added_metrics_route_mounted()helper for idempotent mounting.serve_smg.py: Wiredenable_metricsthroughrun_smg()→_start_control_server()→build_server(). When enabled, callsset_prometheus_multiproc_dir()before spawning the engine subprocess, ensuring the engine and control server share the same multiprocess directory. This allowsMultiProcessCollectorto aggregate the engine'stokenspeed:*metrics. Added_engine_args_enable_metrics()to detect--enable-metricsin the engine argv and pass it throughrun_smg_from_args().Tests: Added unit tests for
_engine_args_enable_metrics()flag parsing (bare flag,=true/1/yes,=false/0, absent, mixed with other flags) and for/metricsroute mounting behavior (absent by default, mounted when enabled, skipped when disabled).Docs: Updated
docs/configuration/server.mdObservability section with a Prometheus Metrics subsection documenting the/metricsendpoint on the control port,--control-portparameter, and aprometheus.ymlscrape config example.Verification
Expected output includes
tokenspeed_*metrics for request latency, token throughput, and engine performance counters.Breaking Changes
None. The
/metricsendpoint remains disabled by default; existing behavior is preserved unless--enable-metricsis explicitly passed.Related: # (add issue number if applicable)