telemetry: populate response_flags label in workload, service and connection metrics #1757

Open
aeron-gh wants to merge 1 commit into kmesh-net:main from aeron-gh:fix/populate-response-flags-metric

@aeron-gh aeron-gh commented Jun 14, 2026

What this PR does

Closes #1756

The response_flags label on the three TCP metric label structs (workloadMetricLabels, serviceMetricLabels, connectionMetricLabels) was declared but never populated. This meant every metric emitted by the telemetry controller had an empty response_flags value, which then got converted to "-" by struct2map() regardless of whether the connection succeeded or failed. The field was marked with a // TODO: responseFlags is not used for now comment that was never followed up on.
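The effect described above can be reproduced with a small sketch. The `labels` struct and `toLabelMap` function here are hypothetical, simplified stand-ins for the Kmesh label structs and `struct2map()`, not the project's actual code; they only assume that empty string fields are rewritten to "-" during conversion.

```go
package main

import (
	"fmt"
	"reflect"
)

// labels is a hypothetical, trimmed-down stand-in for a metric label struct.
type labels struct {
	requestProtocol string
	responseFlags   string
}

// toLabelMap is an illustrative stand-in for struct2map(): it copies every
// string field into a map and substitutes "-" for empty values.
func toLabelMap(v interface{}) map[string]string {
	out := make(map[string]string)
	val := reflect.ValueOf(v)
	typ := val.Type()
	for i := 0; i < val.NumField(); i++ {
		s := val.Field(i).String()
		if s == "" {
			s = "-"
		}
		out[typ.Field(i).Name] = s
	}
	return out
}

func main() {
	// responseFlags is never assigned, as in the code before this PR.
	m := toLabelMap(labels{requestProtocol: "tcp"})
	fmt.Println(m["responseFlags"]) // prints "-"
}
```

Because the unset field and a genuine "no flags" value both end up as "-", a failed connection was indistinguishable from a successful one in the emitted label.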

This PR removes those TODOs and properly derives the flag from the success field that eBPF already provides in the ring buffer event:

  • "-" when the connection was established successfully
  • "UF" (upstream connection failure) when the TCP handshake didn't succeed

"UF" is the Istio-standard response_flags value for upstream connection failures, matching what Envoy emits. This makes Kmesh's metrics consistent with what users expect when building dashboards or alerts based on response_flags.

Changes

  • Added buildResponseFlags() helper that maps requestMetric.success to an Istio-compatible flag string
  • Called it in buildWorkloadMetric(), buildServiceMetric(), and buildConnectionMetric() right after requestProtocol is set
  • Fixed alignment of requestProtocol field in the three label structs (it was narrower than responseFlags, causing visual misalignment)
  • Updated existing tests to set success: connection_success in test data (all 8 cases) and updated expected responseFlags from "" to "-"
  • Added TestBuildResponseFlags with both the success and failure paths
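As a rough illustration of the wiring listed above, the following sketch sets the flag right after the protocol in a label builder. The types are simplified assumptions, `buildWorkloadLabels` is a hypothetical name, and the numeric value given to `connection_success` is assumed for illustration; only `buildResponseFlags` and its "-"/"UF" mapping come from the PR itself.

```go
package main

import "fmt"

// connection_success mirrors the identifier used in the PR; the value 1 is
// an assumption made for this sketch.
const connection_success = uint32(1)

// requestMetric is a simplified stand-in holding only the field used here.
type requestMetric struct {
	success uint32
}

// workloadMetricLabels is reduced to the two fields relevant to this change.
type workloadMetricLabels struct {
	requestProtocol string
	responseFlags   string
}

// buildResponseFlags maps the connection outcome to an Istio-style flag.
func buildResponseFlags(m *requestMetric) string {
	if m.success != connection_success {
		return "UF"
	}
	return "-"
}

// buildWorkloadLabels (hypothetical name) shows where the helper is called:
// immediately after requestProtocol is set.
func buildWorkloadLabels(m *requestMetric) workloadMetricLabels {
	l := workloadMetricLabels{}
	l.requestProtocol = "tcp"
	l.responseFlags = buildResponseFlags(m)
	return l
}

func main() {
	ok := buildWorkloadLabels(&requestMetric{success: connection_success})
	failed := buildWorkloadLabels(&requestMetric{success: 0})
	fmt.Println(ok.responseFlags, failed.responseFlags)
}
```

The service and connection builders would follow the same pattern, so all three label sets agree on the flag for a given event.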

Test plan

  • go test ./pkg/controller/telemetry/... -v passes locally
  • Checked that "UF" matches the flag Envoy uses for upstream TCP failures

Copilot AI review requested due to automatic review settings June 14, 2026 20:28
@kmesh-bot kmesh-bot requested review from YaoZengzeng and kwb0523 June 14, 2026 20:28
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where the response_flags label in Kmesh metrics was not being populated. By leveraging existing eBPF event data, the changes ensure that metrics now correctly reflect connection outcomes, aligning Kmesh's telemetry with Istio standards. Additionally, the PR includes a new debugging guide to assist users in troubleshooting the Kmesh daemon.

Highlights

  • Telemetry Metrics Improvement: Implemented population of the response_flags label in workload, service, and connection metrics, replacing the previous placeholder TODOs.
  • Logic Implementation: Added a buildResponseFlags helper function that derives the flag from the success field provided by eBPF, mapping successful connections to '-' and upstream failures to 'UF'.
  • Test Coverage: Updated existing tests to include success status in test data and added a new unit test, TestBuildResponseFlags, to verify the flag mapping logic.
  • Documentation: Added a new documentation file docs/en/kmesh_debugging.md providing a comprehensive guide on debugging the Kmesh daemon.

The metrics were silent and bare, / With flags that were missing an air. / Now 'UF' and a dash, / In a quick, simple flash, / Make debugging a much easier affair.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds Istio-compatible responseFlags reporting for TCP connection metrics and documents Kmesh runtime debugging interfaces.

Changes:

  • Populate responseFlags in workload/service/connection metric label builders based on connection success.
  • Update telemetry tests to include connection success and assert that responseFlags is "-" when no flag applies.
  • Add a new debugging guide covering admin endpoints, logs, dumps, monitoring, and profiling.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/controller/telemetry/metric.go | Adds buildResponseFlags and wires responseFlags into metric label construction. |
| pkg/controller/telemetry/metric_test.go | Updates expectations for response flags and adds unit test coverage for buildResponseFlags. |
| docs/en/kmesh_debugging.md | Introduces a debugging guide for Kmesh admin endpoints and operational workflows. |

Comment thread docs/en/kmesh_debugging.md Outdated
Comment on lines +178 to +183
| Task | Command | HTTP endpoint (`localhost:15200`) |
| --- | --- | --- |
| Daemon logs | `kubectl logs -n kmesh-system <kmesh-daemon-pod>` | — |
| List / set log level | `kmeshctl log <kmesh-daemon-pod> [--set <logger>:<level>]` | `/debug/loggers` |
| Config dump | `kmeshctl dump <kmesh-daemon-pod> <mode>` | `/debug/config_dump/{dual-engine,kernel-native}` |
| eBPF map dump | — | `/debug/config_dump/bpf/{dual-engine,kernel-native}` |
Comment on lines +837 to +845
// buildResponseFlags maps connection outcome to an Istio-compatible response_flags value.
// "UF" (upstream connection failure) is set when the TCP handshake did not succeed;
// "-" indicates a successfully established connection.
func buildResponseFlags(reqMetric *requestMetric) string {
	if reqMetric.success != connection_success {
		return "UF"
	}
	return "-"
}
Comment on lines +1820 to +1824
for _, tt := range tests {
	t.Run(tt.name, func(t *testing.T) {
		got := buildResponseFlags(&tt.metric)
		assert.Equal(t, tt.want, got)
	})

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds a comprehensive debugging guide for Kmesh and implements mapping of connection outcomes to Istio-compatible response_flags ('UF' or '-') for workload, service, and connection metrics, along with corresponding unit tests. The review feedback recommends adding a nil check in the newly introduced buildResponseFlags function to prevent potential nil pointer dereferences.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +840 to +845
func buildResponseFlags(reqMetric *requestMetric) string {
	if reqMetric.success != connection_success {
		return "UF"
	}
	return "-"
}

Priority: high

To adhere to defensive programming practices and prevent potential nil pointer dereferences, consider adding a nil check for reqMetric before accessing its fields.

func buildResponseFlags(reqMetric *requestMetric) string {
	if reqMetric == nil {
		return "-"
	}
	if reqMetric.success != connection_success {
		return "UF"
	}
	return "-"
}

The responseFlags field was declared in all three metric label structs
(workload, service and connection) but was never set, so every metric
reported response_flags="-" regardless of the connection outcome. The
field carried a "responseFlags is not used for now" TODO that was never
followed up on.

Add buildResponseFlags() which maps the connection success field from
the eBPF ringbuf event to an Istio-compatible flag: "-" when the TCP
handshake completed successfully, and "UF" (upstream connection failure)
when it did not. Wire it into buildWorkloadMetric, buildServiceMetric
and buildConnectionMetric, and add test coverage for both paths.

Signed-off-by: Abhishek Agarwal <agab0323@gmail.com>
@kmesh-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot added size/M and removed size/L labels Jun 14, 2026
@aeron-gh

Heads-up for reviewers on a possible overlap: while working on this I noticed #1624 also touches response_flags, so I wanted to surface it up front rather than have it come up in review.

The two target different outputs:

  • #1624 (feat: support Egress Gateway REGISTER_ONLY mode and enhance observability) adds a real response_flags field to the eBPF stats and surfaces it in the accesslog (via a flag map, currently REGISTER_ONLY), as part of the egress-gateway work.
  • This PR populates the response_flags Prometheus metric label on the workload/service/connection metrics, derived from the existing connection success field (- on a successful handshake, UF on a failed one).

So they're largely complementary (accesslog vs. metrics), but there's clearly conceptual overlap. I kept this change self-contained so it can merge on its own. If you'd prefer the metric label to read from #1624's eBPF response_flags once that lands instead of deriving it from success, I'm glad to rework it that way — whatever fits the direction you want.
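One way the rework mentioned above might look is sketched below. This is purely illustrative: the `ebpfResponseFlags` field and the `responseFlagsFor` function are hypothetical, the string representation of the eBPF flag is an assumption (the source only says #1624 uses a flag map), and the value of `connection_success` is assumed. The idea is to prefer a flag reported by eBPF when one is present and fall back to deriving it from success otherwise.

```go
package main

import "fmt"

// connection_success mirrors the identifier used in the PR; the value 1 is
// an assumption made for this sketch.
const connection_success = uint32(1)

// requestMetric is a simplified stand-in. ebpfResponseFlags is hypothetical
// and does not exist in this PR; it models a flag that eBPF might report
// once #1624 lands. An empty string means "nothing reported".
type requestMetric struct {
	success           uint32
	ebpfResponseFlags string
}

// responseFlagsFor (hypothetical) prefers the eBPF-provided flag and falls
// back to the success-derived mapping used in this PR.
func responseFlagsFor(m *requestMetric) string {
	if m == nil {
		return "-"
	}
	if m.ebpfResponseFlags != "" {
		return m.ebpfResponseFlags
	}
	if m.success != connection_success {
		return "UF"
	}
	return "-"
}

func main() {
	fmt.Println(responseFlagsFor(&requestMetric{success: connection_success}))
	fmt.Println(responseFlagsFor(&requestMetric{success: 0}))
	fmt.Println(responseFlagsFor(&requestMetric{success: 0, ebpfResponseFlags: "UH"}))
}
```

With a fallback of this kind, the metric label would keep its current behaviour until an eBPF-side flag is available, so the two changes could land in either order.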


Development

Successfully merging this pull request may close these issues.

feat(telemetry): populate response_flags metric label in workload/service/connection metrics

3 participants