Skip to content

filter_kubernetes: add kube_meta_cache_refresh_interval option#12035

Open
Abdessamade03 wants to merge 2 commits into
fluent:masterfrom
Abdessamade03:filter-kubernetes-cache-refresh-interval
Open

filter_kubernetes: add kube_meta_cache_refresh_interval option#12035
Abdessamade03 wants to merge 2 commits into
fluent:masterfrom
Abdessamade03:filter-kubernetes-cache-refresh-interval

Conversation

@Abdessamade03

@Abdessamade03 Abdessamade03 commented Jul 2, 2026

Copy link
Copy Markdown

Summary

The Kubernetes filter caches pod metadata (keyed by namespace + pod name) to keep the request rate against the API server low. With the default kube_meta_cache_ttl of 0 the cache has no time-based expiry, so for long-lived pods with a stable name (such as StatefulSet members) labels and annotations that change in place are not reflected until the pod restarts or the entry is evicted.

This adds an opt-in kube_meta_cache_refresh_interval (default 0, disabled, so no behavior change). When set, a scheduled timer periodically re-fetches metadata for the pods currently in the cache and updates the entries in place.

Design notes (per the discussion in #11961):

  • Node-scoped by nature. The cache only holds pods this instance is logging, so the refresh is inherently bounded to node-local pods; the added request load stays proportional to what the node already does.
  • No new mislabeling window. Enrichment is applied inline per record, so updating an entry never relabels records already past the filter; only later records use the new values. This is the same eventual-consistency the existing kube_meta_cache_ttl expiry already has.
  • Bounded tracking. Pods evicted from the cache are dropped from the refresh set on the next cycle, so the tracking list stays a subset of the cache.
  • Works with both the API server and Kubelet (use_kubelet) fetch paths, reusing the existing metadata fetch code.

Fixes #11961


Testing

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found
Example configuration
[FILTER]
    name                              kubernetes
    match                             kube.*
    kube_meta_cache_refresh_interval  60
Debug log (interval=1s)
[info]  [filter:kubernetes:kubernetes.0] background pod metadata refresh enabled (interval=1s)
[debug] [filter:kubernetes:kubernetes.0] refreshed metadata for 1 cached pod(s)
Valgrind (runtime test, in-process)
==21251== HEAP SUMMARY:
==21251==     in use at exit: 0 bytes in 0 blocks
==21251==   total heap usage: 3,602 allocs, 3,602 frees, 1,696,266 bytes allocated
==21251== All heap blocks were freed -- no leaks are possible
==21251== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

A runtime test (kube_meta_cache_refresh in tests/runtime/filter_kubernetes.c) drives the full cache, track, refresh and cleanup cycle, and the entire flb-rt-filter_kubernetes suite passes with no regressions.

Packaging: [N/A]

Documentation

  • Documentation required for this feature

The caching behavior and the StatefulSet caveat were documented in fluent/fluent-bit-docs#2611 (merged). A follow-up docs PR will cover the new kube_meta_cache_refresh_interval option.

Backporting

  • Backport to latest stable release (targeting master; happy to backport if desired)

Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Added optional background refresh for cached Kubernetes pod metadata.
    • Introduced kube_meta_cache_refresh_interval to control refresh frequency.
    • Long-lived pods can now reflect updated labels and annotations without restarting.
  • Bug Fixes

    • Improved cache update behavior to avoid unintended evictions during metadata refresh.
    • Enhanced cleanup to stop background refresh and release tracked metadata more reliably.
  • Tests

    • Added an end-to-end runtime test covering metadata cache refresh behavior over time.

The Kubernetes filter caches pod metadata keyed by namespace and pod name
to keep the request rate against the API server low. With the default
kube_meta_cache_ttl of 0 the cache has no time-based expiry, so for
long-lived pods with a stable name (such as StatefulSet members) labels
and annotations that change in place are not reflected until the pod
restarts or the entry is evicted.

Add an opt-in kube_meta_cache_refresh_interval option (default 0,
disabled). When set, a scheduled timer periodically re-fetches metadata
for the pods currently in the cache and updates the entries in place.

The cache only ever holds pods this instance is logging, so the refresh
is naturally bounded to node-local pods. Enrichment is applied per
record, so updating an entry never relabels records already past the
filter, matching the eventual-consistency of the existing TTL expiry
path. Pods evicted from the cache are dropped from the refresh set on the
next cycle. Both the API server and Kubelet fetch paths are supported.

Signed-off-by: Abdessamade Abarchihi <135852701+Abdessamade03@users.noreply.github.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b6ec55477

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +2162 to +2164
flb_hash_table_add(ctx->hash_table,
rp->meta.cache_key, rp->meta.cache_key_len,
buf, size);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid evicting cache entries during refresh

When the pod metadata cache is full, this refresh path can evict unrelated entries even though it is only replacing an entry that was just confirmed to exist. flb_hash_table_add() checks capacity and evicts before it checks whether the key is a replacement (src/flb_hash_table.c:414-431), so with the default random-eviction cache a refresh cycle over 256 tracked pods can randomly drop other cached pods and shrink/churn the cache, causing avoidable API/kubelet misses for active pods. Use a replacement path that does not run capacity eviction for existing keys.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74ecc2. Reordered flb_hash_table_add() so a replacement is handled before the capacity check, so refresh no longer evicts unrelated entries. The hashtable internal suite (including the eviction tests) and the full filter_kubernetes suite pass, and valgrind is clean.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9c1d954-36a6-4eff-be40-c5c6ad8d319a

📥 Commits

Reviewing files that changed from the base of the PR and between 4b6ec55 and b74ecc2.

📒 Files selected for processing (1)
  • src/flb_hash_table.c

📝 Walkthrough

Walkthrough

This PR adds optional periodic refresh of cached Kubernetes pod metadata and changes hash table insertion so existing keys are replaced before eviction logic runs.

Changes

Background pod metadata refresh

Layer / File(s) Summary
Refresh configuration and context state
plugins/filter_kubernetes/kube_conf.h, plugins/filter_kubernetes/kube_meta.h, plugins/filter_kubernetes/kubernetes.c
Adds the refresh interval field, refresh timer and pod list state, the timer forward declaration, and the config option that enables background metadata refresh.
Refresh tracker and timer implementation
plugins/filter_kubernetes/kube_meta.c
Adds tracked pod copies, the periodic refresh callback, and start/destroy functions that re-fetch and update cached pod metadata in place.
Filter lifecycle wiring
plugins/filter_kubernetes/kube_conf.c, plugins/filter_kubernetes/kubernetes.c
Initializes refresh state on context creation, starts refresh lazily from the filter callback, and destroys refresh state during teardown.
Runtime refresh test
tests/runtime/filter_kubernetes.c
Adds and registers an end-to-end runtime test that exercises the refresh interval and checks for updated enrichment output.

Hash table replacement ordering

Layer / File(s) Summary
Replacement before eviction
src/flb_hash_table.c
Moves existing-entry replacement ahead of capacity checks so updates return immediately without triggering eviction paths.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant cb_kube_filter
  participant flb_kube_meta_refresh_start
  participant SchedulerTimer
  participant kube_meta_refresh_callback
  participant get_and_merge_pod_meta
  participant PodMetaCache

  cb_kube_filter->>flb_kube_meta_refresh_start: start timer if interval > 0 and no timer running
  flb_kube_meta_refresh_start->>SchedulerTimer: create periodic timer
  SchedulerTimer->>kube_meta_refresh_callback: invoke on interval
  kube_meta_refresh_callback->>PodMetaCache: check pod still cached
  kube_meta_refresh_callback->>get_and_merge_pod_meta: re-fetch pod metadata
  get_and_merge_pod_meta-->>kube_meta_refresh_callback: return updated metadata
  kube_meta_refresh_callback->>PodMetaCache: update cache entry in place
Loading

Suggested labels: backport to v4.2.x

Suggested reviewers: edsiper

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title concisely describes the main change: adding the kube_meta_cache_refresh_interval option.
Linked Issues check ✅ Passed The PR adds background pod-metadata refresh so updated labels and pod fields are reloaded without restarting Fluent Bit.
Out of Scope Changes check ✅ Passed The hash table change appears to support safe in-place cache replacement for the new refresh flow, so no clear out-of-scope changes stand out.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
tests/runtime/filter_kubernetes.c (2)

1032-1038: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Docstring overclaims "cleanup" coverage.

The comment states this test validates the "track -> timer -> refresh -> cleanup path," but nothing in the test asserts cleanup/eviction of tracked entries from refresh_pods. Consider tightening the comment to match what's actually exercised.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/runtime/filter_kubernetes.c` around lines 1032 - 1038, The test
docstring in filter_kubernetes.c overstates what is covered by the refresh_pods
path: it mentions cleanup, but the test only verifies tracking, timer-driven
refresh, and continued enrichment. Tighten the comment near the
kube_meta_cache_refresh_interval scenario so it matches the actual assertions
and removes the cleanup/eviction claim unless the test explicitly checks it.

1032-1128: 📐 Maintainability & Code Quality | 🔵 Trivial | 🏗️ Heavy lift

Test doesn't actually validate a metadata change was refreshed.

Both pushes use the identical tag-derived metadata (kube.default.refreshpod.con), so the assertions only confirm that enrichment continues to work after the timer fires and the cache entry is replaced — they don't prove the content of the cache entry was actually re-fetched/updated. A regression where the refresh timer silently no-ops (e.g. skips the actual re-fetch call but still leaves the old cache entry intact) would still pass this test.

Consider asserting on something that would differ between the tracked/stale copy and the refreshed one (e.g., a counter of fetch invocations, or a field value that changes between calls) to meaningfully cover the "in-place update" behavior described in the PR.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/runtime/filter_kubernetes.c` around lines 1032 - 1128, The refresh test
in flb_test_kube_meta_cache_refresh only proves enrichment still happens after
the timer, but it does not verify that the cached metadata was actually
refreshed in place. Update the test to assert a value that changes across
refreshes, or to observe a fetch/refresh counter tied to the kubernetes filter’s
metadata refresh path, so the test can distinguish a stale cached copy from a
re-fetched entry. Use the existing refresh_enriched/cb_refresh_enriched flow as
the location to add the stronger assertion.
plugins/filter_kubernetes/kube_meta.c (1)

2572-2574: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Track pod for refresh even when the cache insert failed.

kube_meta_refresh_track(ctx, meta) is called unconditionally after the if (id >= 0) { ... } block, so it also fires when flb_hash_table_add fails (id < 0) and the entry was never actually cached. It's self-correcting (pruned on the next refresh tick since the key won't be found), but it does needless work (deep copy + list insert) and briefly tracks a pod that isn't cached.

🧹 Suggested fix: only track on successful cache insert
         id = flb_hash_table_add(ctx->hash_table,
                                 meta->cache_key, meta->cache_key_len,
                                 tmp_hash_meta_buf, hash_meta_size);
         if (id >= 0) {
             /*
              * Release the original buffer created on extract_pod_meta() as a new
              * copy has been generated into the hash table, then re-set
              * the outgoing buffer and size.
              */
             flb_free(tmp_hash_meta_buf);
             flb_hash_table_get_by_id(ctx->hash_table, id, meta->cache_key,
                                      &hash_meta_buf, &hash_meta_size);
+
+            /* Track this pod for background metadata refresh (if enabled) */
+            kube_meta_refresh_track(ctx, meta);
         }
-
-        /* Track this pod for background metadata refresh (if enabled) */
-        kube_meta_refresh_track(ctx, meta);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/filter_kubernetes/kube_meta.c` around lines 2572 - 2574, The refresh
tracking in kube_meta.c is happening even when the cache insert fails, so update
the logic around kube_meta_refresh_track(ctx, meta) to run only after a
successful flb_hash_table_add in the successful id >= 0 path. Keep the tracking
tied to the same control flow that confirms the pod was actually cached, using
kube_meta_refresh_track and the existing cache insert block to locate the
change.
plugins/filter_kubernetes/kubernetes.c (1)

622-629: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Repeated retry/log-spam if refresh timer creation fails.

If flb_kube_meta_refresh_start fails once (scheduler unavailable, timer creation error), ctx->refresh_timer stays NULL, so this block re-attempts and re-logs an error on every subsequent cb_kube_filter invocation (i.e., every flush) instead of failing once and staying disabled.

🛡️ Suggested fix: track failure to avoid repeated retries
-    if (ctx->kube_meta_cache_refresh_interval > 0 && ctx->refresh_timer == NULL) {
-        flb_kube_meta_refresh_start(ctx);
-    }
+    if (ctx->kube_meta_cache_refresh_interval > 0 && ctx->refresh_timer == NULL &&
+        ctx->refresh_start_failed == FLB_FALSE) {
+        if (flb_kube_meta_refresh_start(ctx) == -1) {
+            ctx->refresh_start_failed = FLB_TRUE;
+        }
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/filter_kubernetes/kubernetes.c` around lines 622 - 629,
`cb_kube_filter` repeatedly calls `flb_kube_meta_refresh_start` whenever
`ctx->refresh_timer` stays NULL, which causes retry/log spam after a startup
failure. Add a failure/disabled flag in the Kubernetes filter context and update
the `cb_kube_filter` guard so it only attempts `flb_kube_meta_refresh_start`
once; if the start fails, mark refresh as permanently disabled and skip future
retries/logs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@plugins/filter_kubernetes/kube_meta.c`:
- Around line 2572-2574: The refresh tracking in kube_meta.c is happening even
when the cache insert fails, so update the logic around
kube_meta_refresh_track(ctx, meta) to run only after a successful
flb_hash_table_add in the successful id >= 0 path. Keep the tracking tied to the
same control flow that confirms the pod was actually cached, using
kube_meta_refresh_track and the existing cache insert block to locate the
change.

In `@plugins/filter_kubernetes/kubernetes.c`:
- Around line 622-629: `cb_kube_filter` repeatedly calls
`flb_kube_meta_refresh_start` whenever `ctx->refresh_timer` stays NULL, which
causes retry/log spam after a startup failure. Add a failure/disabled flag in
the Kubernetes filter context and update the `cb_kube_filter` guard so it only
attempts `flb_kube_meta_refresh_start` once; if the start fails, mark refresh as
permanently disabled and skip future retries/logs.

In `@tests/runtime/filter_kubernetes.c`:
- Around line 1032-1038: The test docstring in filter_kubernetes.c overstates
what is covered by the refresh_pods path: it mentions cleanup, but the test only
verifies tracking, timer-driven refresh, and continued enrichment. Tighten the
comment near the kube_meta_cache_refresh_interval scenario so it matches the
actual assertions and removes the cleanup/eviction claim unless the test
explicitly checks it.
- Around line 1032-1128: The refresh test in flb_test_kube_meta_cache_refresh
only proves enrichment still happens after the timer, but it does not verify
that the cached metadata was actually refreshed in place. Update the test to
assert a value that changes across refreshes, or to observe a fetch/refresh
counter tied to the kubernetes filter’s metadata refresh path, so the test can
distinguish a stale cached copy from a re-fetched entry. Use the existing
refresh_enriched/cb_refresh_enriched flow as the location to add the stronger
assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 06b044ba-2688-410e-999f-200b381a7e66

📥 Commits

Reviewing files that changed from the base of the PR and between 54235a2 and 4b6ec55.

📒 Files selected for processing (6)
  • plugins/filter_kubernetes/kube_conf.c
  • plugins/filter_kubernetes/kube_conf.h
  • plugins/filter_kubernetes/kube_meta.c
  • plugins/filter_kubernetes/kube_meta.h
  • plugins/filter_kubernetes/kubernetes.c
  • tests/runtime/filter_kubernetes.c

flb_hash_table_add() ran the capacity check and eviction before checking
whether the key already existed, so replacing the value of an existing
key on a full table would evict an unrelated entry even though the entry
count does not change. Check for a replacement first and only run the
capacity and eviction path when inserting a genuinely new entry.

Signed-off-by: Abdessamade Abarchihi <135852701+Abdessamade03@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fluent-bit not updating pod metadata when pod restarts/updates

1 participant