fix: propagate trace context end-to-end for agent Services #1297
syn-zhu wants to merge 3 commits into kagent-dev:main
Pull request overview
This PR updates the Agent manifest translation so that the Kubernetes Service created for each Agent CR sets an explicit appProtocol, enabling AgentGateway’s A2A plugin to discover and route directly to agent Services (preserving HTTP headers for distributed tracing).
Changes:
- Set `spec.ports[0].appProtocol: kgateway.dev/a2a` on the per-Agent Service port.
Hey there, thanks for the PR, this is a great idea! You will need to update the goldens as well as sign your commits for us to merge this.
Thanks! Just saw this now, but it looks like the issue was already fixed by https://github.com/opspawn/kagent/commit/d9f2a3a45a26ec11dd4a9a4cf18e4374a374b03f :) Gonna just close this PR, thanks!
Oops @EItanya, I realized the commit I linked wasn't actually a merged commit, but rather a branch. I've updated and reopened my PR to address the things you mentioned. Please lmk if there's anything else!
Force-pushed from 13799c1 to c2c5902.
Force-pushed from c2c5902 to 256bbda.
Force-pushed from ffe2d68 to f965911.
Hi @syn-zhu, could you please test this locally on your end first? As it's Claude-generated code, a brief manual validation (e.g. posting before/after screenshots from a tracing tool) would be the minimum step to ensure it's ready for contribution.
End-to-End Test Results: Trace Propagation

Tested on a live EKS cluster running kagent v0.7.13 with Langfuse (OTLP-backed) as the trace backend. Each test sends an A2A …

Setup

Stage 0: Baseline (upstream v0.7.13, no patches)

Problem: Agent uses the default …

Stage 1: Commit 1 only (Python SDK: W3C propagator + AioHttpClientInstrumentor)

Sent request directly to agent pod (bypassing controller) to isolate the Python SDK changes.

Result: …

Stage 2: Both commits (Python SDK + Go controller trace propagation)

A2A request sent through the controller (the production path).

Before (upstream controller, patched agent image)

Problem: The upstream controller strips …

After (patched controller + patched agent)

Result: The patched controller extracts …

Summary
Force-pushed from f965911 to 3d2c566.
Updated.
Force-pushed from 3d2c566 to 32027e9.
krisztianfekete left a comment
This looks mostly good, but can you please look at the two comments I've just added?
```python
logging.info("Enabling tracing")
# Set up W3C TraceContext propagator so incoming traceparent headers
# are extracted and outgoing requests carry them forward.
set_global_textmap(CompositeHTTPPropagator([TraceContextTextMapPropagator()]))
```
Are you sure this is necessary? If it is (but I don't think it is), we should at least preserve the existing propagators that this overrides.
Good catch, no, it isn't necessary anymore. Addressed in the most recent commit.
These are unrelated to auth. Can we move these into internal/ or somewhere tracing-specific?
Addressed in the most recent commits!
Force-pushed from ae82ed1 to c234441.
Two changes to enable end-to-end W3C TraceContext propagation:

1. Add AppProtocol "kgateway.dev/a2a" to agent Service port so AgentGateway can discover agent Services directly via kgateway protocol matching, rather than proxying through the controller. Update all golden test outputs to include the new appProtocol field.
2. Set up W3C TraceContext propagator in the Python agent SDK tracing configuration so agent pods correctly extract incoming traceparent headers and propagate them on outgoing requests.

Fixes kagent-dev#1295

Signed-off-by: Simon Zhu <simon.zhu@mongodb.com>
…t pods

The A2A server deserializes incoming HTTP requests into JSON-RPC params, discarding the original HTTP headers. When the controller forwards requests to agent pods via the A2A client, trace context headers (traceparent, tracestate) are lost, breaking distributed tracing.

Fix: capture W3C trace context headers from the incoming request into the Go context in the A2A auth middleware, then inject them into outgoing requests in the A2ARequestHandler. This closes the gap between the A2A server (which strips headers) and the A2A client (which constructs new HTTP requests).

Also update the agent_with_passthrough golden test (added in kagent-dev#1327) to include the appProtocol field.

Signed-off-by: Simon Zhu <simon.zhu@mongodb.com>
- Remove unnecessary set_global_textmap override in Python tracing setup; the OTEL SDK already configures TraceContext + W3CBaggage propagators by default.
- Move trace header context utilities (TraceHeadersFrom/To) from go/core/pkg/auth into go/core/internal/tracecontext, since they are unrelated to auth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Simon Zhu <simon.zhu@mongodb.com>
Force-pushed from c234441 to 5fd64e0.
```go
Name:        "http",
Port:        dep.Port,
TargetPort:  intstr.FromInt(int(dep.Port)),
AppProtocol: ptr.To("kgateway.dev/a2a"),
```
Whilst I'm completely for integrating with other projects, should this really be a hard-coded value that is applied to all Agents (regardless of whether users actually make use of KGateway/AgentGateway)?
What happens when this value changes? There seems to be a bit of a split going on between kgateway and agentgateway, so what would happen if this changes from `kgateway.dev/a2a` to `agentgateway.dev/a2a`? And what happens if I want to use a different appProtocol for whatever reason? Or, what happens if I only want some agents (not all) to be picked up via the A2A plugin?
Maybe a better approach here would be to allow users to configure this via the Agent CRD instead?
> Maybe a better approach here would be to allow users to configure this via the Agent CRD instead?
Completely agree, and was just thinking to make this change myself. Will update the PR
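One way the Agent CRD suggestion could look, sketched under assumptions: `AgentServiceSpec.AppProtocol` is a hypothetical field (this PR adds nothing to the CRD), `ServicePort` is a trimmed stand-in for `corev1.ServicePort`, and the controller-wide default would come from something like a flag or Helm value. A nil field falls back to the default, an explicit empty string leaves `appProtocol` unset, and any other value overrides it for that Agent.

```go
package main

import "fmt"

// DefaultA2AAppProtocol is the value this PR currently hard-codes.
const DefaultA2AAppProtocol = "kgateway.dev/a2a"

// AgentServiceSpec is a HYPOTHETICAL addition to the Agent CRD spec.
type AgentServiceSpec struct {
	// AppProtocol: nil = use the controller default,
	// "" = leave appProtocol unset, anything else = use as-is.
	AppProtocol *string
}

// ServicePort is a trimmed-down stand-in for corev1.ServicePort.
type ServicePort struct {
	Name        string
	Port        int32
	AppProtocol *string
}

// buildServicePort resolves the appProtocol for one Agent's Service port.
// defaultProto would come from controller config; "" disables it globally.
func buildServicePort(port int32, spec *AgentServiceSpec, defaultProto string) ServicePort {
	sp := ServicePort{Name: "http", Port: port}
	proto := defaultProto
	if spec != nil && spec.AppProtocol != nil {
		proto = *spec.AppProtocol // per-Agent override, possibly ""
	}
	if proto != "" {
		sp.AppProtocol = &proto
	}
	return sp
}

func main() {
	sp := buildServicePort(8080, nil, DefaultA2AAppProtocol)
	fmt.Println(sp.Name, sp.Port, *sp.AppProtocol)
}
```

With the default held in controller config, a rename such as `kgateway.dev/a2a` to `agentgateway.dev/a2a` becomes a config change rather than a code change, while the per-Agent override lets users opt individual agents in or out of the A2A plugin.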
```python
logging.info("Created new TracerProvider")

HTTPXClientInstrumentor().instrument()
_instrument_aiohttp_client()
```
What value do we get out of this? Outbound LLM calls are already instrumented from what I can tell - see openai.chat span for example.
See the Stage 0 vs Stage 1 comparison in my earlier comment: #1297 (comment)
The problem is correlation with the incoming request: if the incoming request to the agent had trace headers, they currently don't get propagated, so the outbound calls start a completely disjoint trace.
Adding this change makes it possible to trace long, complex agent request flows (e.g. a user calls Agent 1, which calls Agent 2 and Agent 3, which each call an MCP server, etc.) with a single root trace ID.
Does this make sense?
Having said that, I actually wanna triple-check this earlier comment: a48fbcf#r2859471141
Need to make sure that change which I removed is really not needed. Might need to add it back.
…opagation (#1433)

This PR adds OpenTelemetry distributed tracing to the kagent controller API, fixes trace context propagation across A2A agent calls, and cleans up noise in the existing Python agent tracing.

Fixes #1295, essentially replacing a chunk of #1297 (it does not address being able to set the `appProtocol` on the `Agent` `Service`; that is a separate concern IMO).

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Summary
Three fixes to enable end-to-end W3C TraceContext propagation across the controller→agent boundary:
1. AppProtocol on agent Services: Set `appProtocol: kgateway.dev/a2a` on the Service port created for each Agent CR so AgentGateway's A2A plugin can discover agent Services directly via protocol matching, rather than proxying through the kagent controller (which drops HTTP headers including `traceparent`).
2. W3C TraceContext propagator in Python SDK: Configure the W3C TraceContext propagator in the `kagent-core` tracing setup so agent pods correctly extract incoming `traceparent` headers and propagate them on outgoing requests.
3. Trace header propagation in Go controller: The A2A server deserializes incoming HTTP requests into JSON-RPC params, discarding the original HTTP headers. When the controller forwards requests to agent pods via the A2A client, `traceparent`/`tracestate` are lost. Fix: capture W3C trace context headers from the incoming request into the Go context in the A2A auth middleware (`A2AAuthenticator.Wrap`), then inject them into outgoing requests in `A2ARequestHandler`.

All golden test outputs have been updated to include the new `appProtocol` field, including `agent_with_passthrough` (added in #1327).

Incorporates changes from https://github.com/opspawn/kagent/commit/d9f2a3a45a26ec11dd4a9a4cf18e4374a374b03f.
Test plan
- `testdata/outputs/*.json` files include `appProtocol: "kgateway.dev/a2a"` on Service ports
- `go test ./internal/httpserver/auth/...` passes
- `kubectl get svc <agent> -o jsonpath='{.spec.ports[0].appProtocol}'` returns `kgateway.dev/a2a`
- … `traceparent` header through gateway → controller → agent pod, verify trace ID is preserved end-to-end

Closes #1295
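For the last test-plan item, assuming you have captured the `traceparent` value observed at each hop (for example via a debug echo endpoint or access logs, neither of which is part of this PR), a small checker can confirm the trace-id never changed. This is a simplified validator for version-00 headers only; `traceIDFromTraceparent` and `sameTrace` are hypothetical helper names.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// traceIDFromTraceparent validates a version-00 W3C traceparent value and
// returns its 32-hex-character trace-id.
func traceIDFromTraceparent(v string) (string, error) {
	parts := strings.Split(strings.TrimSpace(v), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", fmt.Errorf("unsupported traceparent %q", v)
	}
	traceID, parentID, flags := parts[1], parts[2], parts[3]
	if len(traceID) != 32 || len(parentID) != 16 || len(flags) != 2 {
		return "", fmt.Errorf("bad field lengths in %q", v)
	}
	for _, f := range []string{traceID, parentID, flags} {
		// The spec requires lowercase hex; DecodeString alone accepts uppercase.
		if _, err := hex.DecodeString(f); err != nil || f != strings.ToLower(f) {
			return "", fmt.Errorf("non-lowercase-hex field in %q", v)
		}
	}
	if traceID == strings.Repeat("0", 32) || parentID == strings.Repeat("0", 16) {
		return "", fmt.Errorf("all-zero id in %q", v)
	}
	return traceID, nil
}

// sameTrace reports whether every hop's traceparent carries the same trace-id.
func sameTrace(hops ...string) (bool, error) {
	var first string
	for i, h := range hops {
		id, err := traceIDFromTraceparent(h)
		if err != nil {
			return false, fmt.Errorf("hop %d: %w", i, err)
		}
		if i == 0 {
			first = id
		} else if id != first {
			return false, nil
		}
	}
	return len(hops) > 0, nil
}

func main() {
	same, err := sameTrace(
		"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01", // seen at gateway
		"00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01", // seen at agent pod
	)
	fmt.Println(same, err)
}
```

The parent-id is expected to differ between hops; only the trace-id has to match.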
🤖 Generated with Claude Code