
@kaarolch kaarolch commented Nov 13, 2025

Summary

When using the tag_cardinality_limit transform, it's difficult to identify which specific metrics and tag keys are hitting the configured value limit. The tag_value_limit_exceeded_total metric only provides a count of exceeded events without context about which metric or tag was blocked, making it challenging to debug and monitor cardinality issues.
More in #20084

This PR adds metric_name and tag_key labels to the tag_value_limit_exceeded_total metric, which makes it possible to:

  • Identify which specific metrics are hitting the limit
  • Identify which tag keys are causing the limit to be exceeded
  • Create more targeted alerts and dashboards
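To illustrate what the extra labels buy, here is a minimal, std-only sketch of a labeled counter. It is a model for discussion, not the code in this PR: the `LimitExceededCounter` type and its methods are hypothetical names, and the real change goes through Vector's `counter!` macro.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a labeled counter such as
/// `tag_value_limit_exceeded_total`. Each distinct
/// (metric_name, tag_key) pair becomes its own series.
#[derive(Default)]
struct LimitExceededCounter {
    series: HashMap<(String, String), u64>,
}

impl LimitExceededCounter {
    /// Record one rejected tag value for the given metric and tag key.
    fn increment(&mut self, metric_name: &str, tag_key: &str) {
        *self
            .series
            .entry((metric_name.to_string(), tag_key.to_string()))
            .or_insert(0) += 1;
    }

    /// Sum over all series: the only number the unlabeled counter exposed.
    fn total(&self) -> u64 {
        self.series.values().sum()
    }

    /// Count for one specific metric and tag key (0 if never incremented).
    fn get(&self, metric_name: &str, tag_key: &str) -> u64 {
        self.series
            .get(&(metric_name.to_string(), tag_key.to_string()))
            .copied()
            .unwrap_or(0)
    }
}
```

With only `total()` available, an alert can say that the limit was exceeded somewhere; with `get()`, it can name the metric and the tag key responsible.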

Vector configuration

sources:
  vector_metrics:
    type: "internal_metrics" # required
    scrape_interval_secs: 10 
  statsd:
    type: statsd
    address: "0.0.0.0:8128"
    mode: "udp"

transforms:
  metrics_cardinality_limit:
    type: tag_cardinality_limit
    inputs:
      - statsd
    limit_exceeded_action: drop_tag
    mode: exact
    value_limit: 10
sinks:
  drop_bh:
    type: blackhole
    inputs:
      - metrics_cardinality_limit
    print_interval_secs: 0
    buffer:
      type: memory
      when_full: drop_newest
      max_events: 100
  console_debug:
    type: console
    inputs:
      - vector_metrics
    encoding:
      codec: json
    buffer:
      max_events: 10000
      type: memory
      when_full: drop_newest
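For readers unfamiliar with the transform, the behaviour configured above (mode: exact, limit_exceeded_action: drop_tag) can be modelled roughly as follows. This is a simplified sketch that assumes accepted values are tracked in one set per tag key; `ExactLimiter` is a hypothetical name and the code is not taken from Vector's implementation.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Simplified model of `mode: exact` with `limit_exceeded_action: drop_tag`.
struct ExactLimiter {
    value_limit: usize,
    /// Accepted values, tracked per tag key (an assumption of this sketch).
    seen: HashMap<String, HashSet<String>>,
}

impl ExactLimiter {
    fn new(value_limit: usize) -> Self {
        Self { value_limit, seen: HashMap::new() }
    }

    /// Returns the tags that survive. A tag is kept if its value was
    /// already accepted, or if the key still has room for a new value.
    fn apply(&mut self, tags: BTreeMap<String, String>) -> BTreeMap<String, String> {
        let mut kept = BTreeMap::new();
        for (key, value) in tags {
            let values = self.seen.entry(key.clone()).or_default();
            if values.contains(&value) || values.len() < self.value_limit {
                values.insert(value.clone());
                kept.insert(key, value);
            }
            // Otherwise the tag is dropped; this is the point where
            // `tag_value_limit_exceeded_total` would be incremented.
        }
        kept
    }
}
```

Under this model, with value_limit: 10 the eleventh distinct value of a tag key is the first one to be dropped, while values accepted earlier keep passing through.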

How did you test this PR?

Built locally following docs/DEVELOPING.md and started with the config from the previous section:

target/debug/vector --config ../vector.yaml
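The PR does not show how the test traffic was generated. One way to reproduce it, sketched here under assumptions, is to send gauges with more than ten distinct pod_name values to the statsd source over UDP. The helper names and the `pod-N` value pattern are hypothetical; the line format assumes the DogStatsD-style `|#key:value` tag extension.

```rust
use std::io;
use std::net::UdpSocket;

/// Build one statsd gauge line per distinct tag value, e.g.
/// `high_cardinality_metric_gauge_1:1|g|#pod_name:pod-0`.
fn statsd_lines(metric: &str, tag_key: &str, distinct: usize) -> Vec<String> {
    (0..distinct)
        .map(|i| format!("{metric}:1|g|#{tag_key}:pod-{i}"))
        .collect()
}

/// Send each line as its own UDP datagram to the statsd source.
fn send_lines(addr: &str, lines: &[String]) -> io::Result<()> {
    // Bind to an ephemeral local port; UDP needs no connection setup.
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    for line in lines {
        socket.send_to(line.as_bytes(), addr)?;
    }
    Ok(())
}
```

Calling `send_lines("127.0.0.1:8128", &statsd_lines("high_cardinality_metric_gauge_1", "pod_name", 12))` against the config above should push the pod_name tag past value_limit: 10 and make the counter appear in the console sink.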

Result:

{"name":"tag_value_limit_exceeded_total","namespace":"vector","tags":{"component_id":"metrics_cardinality_limit","component_kind":"transform","component_type":"tag_cardinality_limit","host":"xxxx","metric_name":"high_cardinality_metric_gauge_1","tag_key":"pod_name"},"timestamp":"2025-11-13T20:10:30.055509Z","kind":"absolute","counter":{"value":7470.0}}

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@github-actions github-actions bot added the "domain: external docs" label Nov 13, 2025
@kaarolch kaarolch marked this pull request as ready for review November 13, 2025 20:15
@kaarolch kaarolch requested review from a team as code owners November 13, 2025 20:15
Contributor

@domalessi domalessi left a comment

A few suggestions but looks good

@kaarolch
Author

By the way, should we add more labels? This PR introduces two new tags to an internal metric for the tag_cardinality_limit transform.

counter!("tag_value_limit_exceeded_total").increment(1);
counter!(
"tag_value_limit_exceeded_total",
"metric_name" => self.metric_name.to_string(),
Member

Changes look reasonable. We could have very high cardinality here, though, since every (metric_name, tag_key) pair becomes its own series. I wonder whether we need to gate this behind a config option.
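One possible shape for such a gate, sketched with hypothetical names: `TelemetryConfig` and `include_metric_and_tag_labels` are not existing Vector options, and nothing in this thread settles on a design.

```rust
/// Hypothetical setting; neither the name nor any default comes from the PR.
#[derive(Clone, Copy)]
struct TelemetryConfig {
    include_metric_and_tag_labels: bool,
}

/// Labels to attach to one `tag_value_limit_exceeded_total` increment.
/// With the gate off, the counter stays a single unlabeled series.
fn limit_exceeded_labels(
    config: TelemetryConfig,
    metric_name: &str,
    tag_key: &str,
) -> Vec<(&'static str, String)> {
    if config.include_metric_and_tag_labels {
        vec![
            ("metric_name", metric_name.to_string()),
            ("tag_key", tag_key.to_string()),
        ]
    } else {
        Vec::new()
    }
}
```

The trade-off is the usual one: the labeled form is far easier to debug with, but the number of series it creates grows with the number of offending metric and tag key combinations.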
