telemetry: populate response_flags label in workload, service and connection metrics#1757
telemetry: populate response_flags label in workload, service and connection metrics#1757aeron-gh wants to merge 1 commit into
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue where the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. The metrics were silent and bare, / With flags that were missing an air. / Now 'UF' and a dash, / In a quick, simple flash, / Make debugging a much easier affair. Footnotes
|
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds Istio-compatible responseFlags reporting for TCP connection metrics and documents Kmesh runtime debugging interfaces.
Changes:
- Populate
responseFlagsin workload/service/connection metric label builders based on connection success. - Update telemetry tests to include connection
successand assertresponseFlagsuses"-"when no flags. - Add a new debugging guide covering admin endpoints, logs, dumps, monitoring, and profiling.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/controller/telemetry/metric.go | Adds buildResponseFlags and wires responseFlags into metric label construction. |
| pkg/controller/telemetry/metric_test.go | Updates expectations for response flags and adds unit test coverage for buildResponseFlags. |
| docs/en/kmesh_debugging.md | Introduces a debugging guide for Kmesh admin endpoints and operational workflows. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| | Task | Command | HTTP endpoint (`localhost:15200`) | | ||
| | --- | --- | --- | | ||
| | Daemon logs | `kubectl logs -n kmesh-system <kmesh-daemon-pod>` | — | | ||
| | List / set log level | `kmeshctl log <kmesh-daemon-pod> [--set <logger>:<level>]` | `/debug/loggers` | | ||
| | Config dump | `kmeshctl dump <kmesh-daemon-pod> <mode>` | `/debug/config_dump/{dual-engine,kernel-native}` | | ||
| | eBPF map dump | — | `/debug/config_dump/bpf/{dual-engine,kernel-native}` | |
| // buildResponseFlags maps connection outcome to an Istio-compatible response_flags value. | ||
| // "UF" (upstream connection failure) is set when the TCP handshake did not succeed; | ||
| // "-" indicates a successfully established connection. | ||
| func buildResponseFlags(reqMetric *requestMetric) string { | ||
| if reqMetric.success != connection_success { | ||
| return "UF" | ||
| } | ||
| return "-" | ||
| } |
| for _, tt := range tests { | ||
| t.Run(tt.name, func(t *testing.T) { | ||
| got := buildResponseFlags(&tt.metric) | ||
| assert.Equal(t, tt.want, got) | ||
| }) |
There was a problem hiding this comment.
Code Review
This pull request adds a comprehensive debugging guide for Kmesh and implements mapping of connection outcomes to Istio-compatible response_flags ('UF' or '-') for workload, service, and connection metrics, along with corresponding unit tests. The review feedback recommends adding a nil check in the newly introduced buildResponseFlags function to prevent potential nil pointer dereferences.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func buildResponseFlags(reqMetric *requestMetric) string { | ||
| if reqMetric.success != connection_success { | ||
| return "UF" | ||
| } | ||
| return "-" | ||
| } |
There was a problem hiding this comment.
To adhere to defensive programming practices and prevent potential nil pointer dereferences, consider adding a nil check for reqMetric before accessing its fields.
func buildResponseFlags(reqMetric *requestMetric) string {
if reqMetric == nil {
return "-"
}
if reqMetric.success != connection_success {
return "UF"
}
return "-"
}The responseFlags field was declared in all three metric label structs (workload, service and connection) but was never set, so every metric reported response_flags="-" regardless of the connection outcome. The field carried a "responseFlags is not used for now" TODO that was never followed up on. Add buildResponseFlags() which maps the connection success field from the eBPF ringbuf event to an Istio-compatible flag: "-" when the TCP handshake completed successfully, and "UF" (upstream connection failure) when it did not. Wire it into buildWorkloadMetric, buildServiceMetric and buildConnectionMetric, and add test coverage for both paths. Signed-off-by: Abhishek Agarwal <agab0323@gmail.com>
9448bb2 to
a9b7824
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Heads-up for reviewers on a possible overlap: while working on this I noticed #1624 also touches The two target different outputs:
So they're largely complementary (accesslog vs. metrics), but there's clearly conceptual overlap. I kept this change self-contained so it can merge on its own. If you'd prefer the metric label to read from #1624's eBPF |
What this PR does
Closes #1756
The
response_flagslabel on the three TCP metric label structs (workloadMetricLabels,serviceMetricLabels,connectionMetricLabels) was declared but never populated. This meant every metric emitted by the telemetry controller had an emptyresponse_flagsvalue, which then got converted to"-"bystruct2map()regardless of whether the connection succeeded or failed. The field was marked with a// TODO: responseFlags is not used for nowcomment that was never followed up on.This PR removes those TODOs and properly derives the flag from the
successfield that eBPF already provides in the ring buffer event:"-"when the connection was established successfully"UF"(upstream connection failure) when the TCP handshake didn't succeed"UF"is the Istio-standardresponse_flagsvalue for upstream connection failures, matching what Envoy emits. This makes Kmesh's metrics consistent with what users expect when building dashboards or alerts based onresponse_flags.Changes
buildResponseFlags()helper that mapsrequestMetric.successto an Istio-compatible flag stringbuildWorkloadMetric(),buildServiceMetric(), andbuildConnectionMetric()right afterrequestProtocolis setrequestProtocolfield in the three label structs (it was narrower thanresponseFlags, causing visual misalignment)success: connection_successin test data (all 8 cases) and updated expectedresponseFlagsfrom""to"-"TestBuildResponseFlagswith both the success and failure pathsTest plan
go test ./pkg/controller/telemetry/... -vpasses locally"UF"matches the flag Envoy uses for upstream TCP failures