Conversation

@mirko-lazarevic
Contributor

@mirko-lazarevic mirko-lazarevic commented Dec 1, 2025

This fix ensures that when the buffer is
flushed, the record will have proper timestamp
and metadata instead of just the "log" field.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes

    • Multiline processing now registers first-line context earlier for regex, ends-with, and equality modes so metadata is captured before buffering/concatenation; metadata is no longer packed when the content was truncated.
  • Tests

    • Added regression tests verifying full metadata (time, stream, file, log) is preserved across multiline flushes, including truncation scenarios.

@coderabbitai

coderabbitai bot commented Dec 1, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The multiline parser now registers the stream-group context earlier for first-line maps in FLB_ML_REGEX, FLB_ML_ENDSWITH, and FLB_ML_EQ when the group's buffer is empty, and it suppresses metadata packing when content was truncated.

Changes

Cohort / File(s) | Change Summary
Multiline context initialization (src/multiline/flb_ml.c) | Added conditional calls to flb_ml_register_context(stream_group, tm, full_map) when stream_group->mp_sbuf.size == 0 in FLB_ML_REGEX, FLB_ML_ENDSWITH, and FLB_ML_EQ branches; in REGEX path, registration is placed after truncation logic. Also changed metadata packing condition from processed && metadata != NULL to !truncated && processed && metadata != NULL.
Multiline metadata regression tests (tests/internal/multiline.c) | Added tests for issue 10576: metadata_result helper, flush_callback_metadata_check, append_log_with_metadata, test_issue_10576, test_issue_truncation_10576, and registered them in TEST_LIST to validate metadata preservation and truncation behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review placement and conditions for flb_ml_register_context() to ensure correct ordering across branches.
  • Verify REGEX truncation handling and that registration occurs only when intended.
  • Validate the metadata packing condition change doesn't drop metadata in other flows.
  • Inspect new tests and callbacks for correctness and potential flakiness.

Poem

🐇 I nibble lines and tuck a map inside,
When fragments meet, I hold the stride,
If edges fray or truncation nips,
I still remember all the bits,
A rabbit keeps the trail with pride.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'multiline: ensure context is registered for REGEX type' directly addresses the primary technical change—ensuring context registration for REGEX multiline mode—which is the core fix for metadata preservation in flushed records.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 272 to 274
if (stream_group->mp_sbuf.size == 0) {
    flb_ml_register_context(stream_group, tm, full_map);
}

[P1] Avoid registering context after truncated regex flush

In package_content the new mp_sbuf.size == 0 check registers a context even when flb_ml_rule_process returned FLB_MULTILINE_TRUNCATED, which already flushed the group, cleared mp_sbuf, and reset rule_to_state. That leaves a packed map in an otherwise empty group; the next flush or start-state match will unpack the stale map (or emit it as a standalone record) and the subsequent multiline message will carry metadata/timestamps from the previous, truncated line instead of the new one. This affects regex parsers when the buffer limit forces truncation, producing misattributed or duplicate log records.

Collaborator

@mirko-lazarevic can you resolve this either by indicating it is not relevant or fixing?

Contributor Author

@patrick-stephens I'm not 100% sure about this. Could you please ask the person who worked on it previously to confirm whether it's relevant?

Contributor

The most straightforward / proper way to fix this is:

if (ret == FLB_MULTILINE_TRUNCATED) {
    truncated = FLB_TRUE;
}

if (!truncated && stream_group->mp_sbuf.size == 0) {
    flb_ml_register_context(stream_group, tm, full_map);
}

and in another line:

if (!truncated && processed && metadata != NULL) {
    msgpack_pack_object(&stream_group->mp_md_pck, *metadata);
}

Reasons:

  • When TRUNCATED is returned,

    • the entire lifecycle has already been completed inside rule_process → flush_stream_group,
    • so it is not appropriate for package_content() to start a new group implicitly at that moment.
  • When the next multiline start-state line arrives,
    the context should be initialized by flb_ml_rule_process(), not by package_content().
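
The guard described above can be sketched in isolation. This is a minimal, self-contained model, not the Fluent Bit implementation: `toy_group`, `toy_finish_line`, and the `ML_*` constants are hypothetical stand-ins for `stream_group`, the tail of `package_content()`, and `FLB_MULTILINE_TRUNCATED`. It assumes, as stated in the review comment, that a truncated result means the group was already flushed and reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the Fluent Bit definitions. */
enum ml_status { ML_OK = 0, ML_TRUNCATED = 1 };

struct toy_group {
    size_t buffered;       /* plays the role of mp_sbuf.size */
    int    has_context;    /* 1 once a first-line context is registered */
    int    metadata_count; /* metadata objects packed for this group */
};

/*
 * Models the steps that follow rule processing for one line.
 * `ret` is the status the rule step returned; on ML_TRUNCATED the group
 * is assumed to have been flushed and reset already.
 */
void toy_finish_line(struct toy_group *g, enum ml_status ret,
                     int processed, int has_metadata)
{
    int truncated = (ret == ML_TRUNCATED);

    assert(g != NULL);

    /* Register a first-line context only for a live, empty group. */
    if (!truncated && g->buffered == 0) {
        g->has_context = 1;
    }

    /* Pack metadata only when the line was kept. */
    if (!truncated && processed && has_metadata) {
        g->metadata_count++;
    }
}
```

Calling `toy_finish_line` with `ML_TRUNCATED` on an empty group leaves both `has_context` and `metadata_count` at zero, so nothing stale is carried into the next multiline message; a following call with `ML_OK` registers the context and packs one metadata object.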

Contributor Author

@cosmo0920 When you say "in another line", do you mean the line immediately below within the same if block? I'm asking because this logic already exists in lines 349-351.

Should I simply add the (!truncated) condition to that existing if block?

Apologies if this is a basic question - I don’t yet have the full context of all the multiline options and logic.

Contributor

@cosmo0920 cosmo0920 Dec 3, 2025

Should I simply add the (!truncated) condition to that existing if block?

        if (ret == FLB_MULTILINE_TRUNCATED) {
            truncated = FLB_TRUE;
        }

        if (stream_group->mp_sbuf.size == 0) {
            flb_ml_register_context(stream_group, tm, full_map);
        }

should be

        if (ret == FLB_MULTILINE_TRUNCATED) {
            truncated = FLB_TRUE;
        }

        if (!truncated && stream_group->mp_sbuf.size == 0) {
            flb_ml_register_context(stream_group, tm, full_map);
        }

And yes:

    if (processed && metadata != NULL) {
        msgpack_pack_object(&stream_group->mp_md_pck, *metadata);
    }

should be

    if (!truncated && processed && metadata != NULL) {
        msgpack_pack_object(&stream_group->mp_md_pck, *metadata);
    }

Your understanding is correct.

Contributor Author

@cosmo0920 Thanks. I made the changes accordingly. I also added a unit test that should cover this specific case.

@mirko-lazarevic
Contributor Author

This pull request should address issue #10576.

For the Fluent Bit configuration example and the steps to reproduce the issue, see #10576.

Output after the fix:

Fluent Bit v4.2.1
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF graduated project under the Fluent organization
* https://fluentbit.io

______ _                  _    ______ _ _             ___   _____
|  ___| |                | |   | ___ (_) |           /   | / __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| | `' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| |   / /
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |_./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)_____/

             Fluent Bit v4.2 – Direct Routes Ahead
         Celebrating 10 Years of Open, Fluent Innovation!

[2025/12/01 12:30:43.528267000] [ info] [fluent bit] version=4.2.1, commit=10ebd3a354, pid=6123
[2025/12/01 12:30:43.528771000] [ info] [storage] ver=1.5.4, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/12/01 12:30:43.528993000] [ info] [simd    ] disabled
[2025/12/01 12:30:43.528998000] [ info] [cmetrics] version=1.0.5
[2025/12/01 12:30:43.529349000] [ info] [ctraces ] version=0.6.6
[2025/12/01 12:30:43.529578000] [ info] [input:tail:tail.0] initializing
[2025/12/01 12:30:43.529585000] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2025/12/01 12:30:43.530012000] [ info] [input:tail:tail.0] multiline core started
[2025/12/01 12:30:43.530308000] [ info] [input:tail:tail.0] thread instance initialized
[2025/12/01 12:30:43.530546000] [ info] [filter:multiline:ml-detect] created emitter: emitter_for_ml-detect
[2025/12/01 12:30:43.530591000] [ info] [input:emitter:emitter_for_ml-detect] initializing
[2025/12/01 12:30:43.530596000] [ info] [input:emitter:emitter_for_ml-detect] storage_strategy='memory' (memory only)
[2025/12/01 12:30:43.530916000] [ info] [output:stdout:stdout.0] worker #0 started
[2025/12/01 12:30:43.531683000] [ info] [http_server] listen iface=0.0.0.0 tcp_port=8081
[2025/12/01 12:30:43.531917000] [ info] [sp] stream processor started
[2025/12/01 12:30:43.532206000] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2


[2025/12/01 12:30:49.787352000] [ info] [filter:multiline:ml-detect] created new multiline stream for tail.0_kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588649.296478000, {}], {"time"=>"2025-12-01T12:30:49.296478+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Mon Dec  1 11:30:49 UTC 2025 Likely to fail", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588654.298018000, {}], {"time"=>"2025-12-01T12:30:54.298018+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Mon Dec  1 11:30:54 UTC 2025 Likely to fail", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588659.299245000, {}], {"time"=>"2025-12-01T12:30:59.299245+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-12-01T11:30:59+00:00 should be ok", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588667.512873000, {}], {"time"=>"2025-12-01T12:31:07.512873+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Mon Dec  1 11:31:07 UTC 2025 Likely to fail", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588672.513383999, {}], {"time"=>"2025-12-01T12:31:12.513384+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Mon Dec  1 11:31:12 UTC 2025 Likely to fail", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]

@patrick-stephens
Collaborator

@mirko-lazarevic maybe tweak the commit slightly as having ml: in there is redundant and confusing.

Can you add some unit tests as well? I really like to see those as next time the code is refactored/updated it will prevent a similar problem.

@patrick-stephens
Collaborator

The CIFuzz failure is down to something else so can be ignored: #11227

This fix ensures that when the buffer is
flushed, the record will have proper timestamp
and metadata instead of just the "log" field.

Signed-off-by: Mirko Lazarevic <[email protected]>
@mirko-lazarevic
Contributor Author

@patrick-stephens

@mirko-lazarevic maybe tweak the commit slightly as having ml: in there is redundant and confusing.

I saw exactly the same commit message from one of the maintainers, which is why I did the same. Anyway, I removed ml:.

I'll see if I can add some unit tests, although my knowledge in this area is limited.

Signed-off-by: Mirko Lazarevic <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
tests/internal/multiline.c (5)

1663-1720: Strengthen metadata validation beyond field-count == 4

The callback correctly verifies that no record is left with a single log-only field and gives useful debug output. Right now, “full metadata” is inferred purely from field_count == 4, which would still pass if those four fields were not actually time, stream, log, and file.

If you want this test to be a tighter regression guard for #10576, consider also asserting the presence of those specific keys in the map (e.g. track booleans for each key and require all four to be seen before incrementing records_with_full_metadata). This would still be low-cost here and better documents the expected contract.
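
A key-presence check of this kind might look like the sketch below. It is written without msgpack so that it stands alone: `has_required_keys` and its string-array input are hypothetical, and in the real callback the key names would come from iterating the msgpack map. Only the four key names (time, stream, log, file) are taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 when every required key appears among the record's keys,
 * regardless of how many other keys the record carries.
 */
int has_required_keys(const char *const *keys, size_t key_count)
{
    static const char *const required[] = { "time", "stream", "log", "file" };
    const size_t required_count = sizeof(required) / sizeof(required[0]);
    size_t seen = 0;
    size_t i;
    size_t j;

    assert(keys != NULL || key_count == 0);

    for (i = 0; i < required_count; i++) {
        for (j = 0; j < key_count; j++) {
            if (strcmp(keys[j], required[i]) == 0) {
                seen++;
                break; /* count each required key at most once */
            }
        }
    }

    return seen == required_count;
}
```

Checking names rather than the count also tolerates extra fields: the stdout output posted earlier in this conversation shows records with five keys (including `_p`), which a strict count of four would reject even though all the expected metadata is present.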


1722-1791: Minor: avoid extra pack/unpack or clarify intent in helper

append_log_to_multiline_processor is functionally correct and nicely encapsulates the construction of a “full metadata” record. For a test helper, going through pack → unpack to recover the map is fine, but it’s a bit indirect.

Two optional tweaks you might consider:

  • Add a brief comment explaining that you intentionally reuse the existing [timestamp, map] pattern from other tests and then pull out map = &root.via.array.ptr[1] to feed flb_ml_append_object, so readers don’t have to reverse‑engineer it.
  • Or, if you prefer, build the msgpack_object map directly (mirroring how other tests feed flb_ml_append_object) and skip the unpack step to reduce noise.

No functional changes required; this is just for clarity and maintainability.


1838-1848: Fix parameter comments for flb_ml_parser_create to match existing usage

Here the call to flb_ml_parser_create passes two NULL arguments commented as key_group and key_pattern, but earlier in this file (e.g. the elastic test) the documented order is:

/* key_content, key_pattern, key_group, parser ctx, parser name */

The arguments are all NULL so behavior is unaffected, but the swapped comments are misleading. It’d be good to align them with the existing pattern, e.g.:

    mlp = flb_ml_parser_create(config,
                               "parser_10576",  /* name      */
                               FLB_ML_REGEX,    /* type      */
                               NULL,            /* match_str */
                               FLB_FALSE,       /* negate    */
                               1000,            /* flush_ms  */
                               "log",           /* key_content */
                               NULL,            /* key_pattern */
                               NULL,            /* key_group */
                               NULL,            /* parser ctx */
                               NULL);           /* parser name */

1809-1912: Issue/test naming consistency for easier traceability

The test is labeled and commented as covering issue 10576, and TEST_LIST registers it under "issue_10576", but the function name is test_issue_10567_metadata_preservation.

This doesn’t affect behavior, but it’s slightly confusing when grepping by issue number. Consider renaming the function to test_issue_10576_metadata_preservation (and adjusting the prototype/usage) so the function name, comment header, and test label all refer to the same issue id.


1649-1655: Optional: trim unused fields in struct metadata_result or use them

metadata_result currently defines key and total_expected, but the new metadata callback logic never reads them. That’s not harmful, but it adds a bit of noise.

Either:

  • Remove these fields from struct metadata_result and the corresponding assignments in test_issue_10567_metadata_preservation, or
  • Use total_expected in a final assertion and/or key when scanning the map (e.g. explicitly validating that the log field is present), which would also slightly strengthen the test.

Purely cosmetic, but would tighten up the helper struct.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23a2e18 and 727dfad.

📒 Files selected for processing (1)
  • tests/internal/multiline.c (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
🔇 Additional comments (1)
tests/internal/multiline.c (1)

1931-1936: LGTM: good to have a dedicated regression test registered

Registering "issue_10576" in TEST_LIST and wiring it to the new metadata preservation test completes the regression guard for this bug. Looks consistent with the existing issue tests.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
tests/internal/multiline.c (4)

1646-1655: Unused fields in struct metadata_result make the helper harder to read

total_expected and key are never used by flush_callback_metadata_check or the test assertions; only the counters and current_record matter. This makes it less obvious what the test actually relies on.

Consider either wiring total_expected into a final assertion, or dropping unused fields from this struct so the intent is clearer.


1663-1720: Make the metadata check resilient to extra fields instead of requiring exactly 4

Right now, you treat “full metadata” as field_count == 4 and increment records_with_log_only for anything else, including maps with more than 4 fields. That will cause this regression test to start failing if we ever add more metadata keys (e.g. tag, container_id) even though the underlying bug hasn’t regressed.

Given the comment “all lines should have multiple fields (time, stream, log, file, etc.)”, a more robust condition would be something like:

  • Treat field_count == 1 (only log) as failure, and
  • Treat field_count > 1 as success, or
  • Explicitly check for the presence of time, stream, log, and file keys regardless of map size.

For example:

-    /* 4 fields: time, stream, log, file*/
-    if (field_count == 4) {
-        res->records_with_full_metadata++;
-    } else {
-        res->records_with_log_only++;
-        fprintf(stdout, "  WARNING: Record has only 'log' field (missing metadata)\n");
-    }
+    /* Consider "log-only" as a single-field map; anything else has metadata */
+    if (field_count == 1) {
+        res->records_with_log_only++;
+        fprintf(stdout, "  WARNING: Record has only 'log' field (missing metadata)\n");
+    }
+    else {
+        res->records_with_full_metadata++;
+    }

You could further strengthen this by actually checking for the expected key names when iterating the map.


1809-1912: Align test naming with the referenced issue and consider asserting on key names

Function name and expectations:

  • The function is named test_issue_10567_metadata_preservation but the comments, parser name (parser_10576), and TEST_LIST entry refer to issue 10576. That mismatch is confusing when grepping for the issue ID.
  • The final assertions only check that no record was classified as “log only” and that counts match, but they don’t assert that the flushed records actually contain time, stream, log, and file keys — which is what the bug was about.

Suggestions:

  1. Rename the function to match the issue ID and the test list entry:
-static void test_issue_10567_metadata_preservation()
+static void test_issue_10576_metadata_preservation()

(and update the prototype in the TEST_LIST entry accordingly).

  2. Optionally, make the test stronger by having flush_callback_metadata_check verify the presence of those four key names rather than relying solely on field count.

1935-1935: Keep TEST_LIST entry consistent with the test function name

Once you rename the test function to use 10576, this entry should be updated to stay in sync with the symbol name:

-    { "issue_10576"   , test_issue_10567_metadata_preservation },
+    { "issue_10576"   , test_issue_10576_metadata_preservation },

This keeps the usual “issue NNNN ↔ test_issue_NNNN” convention used elsewhere in this file.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 727dfad and 3488f02.

📒 Files selected for processing (1)
  • tests/internal/multiline.c (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/internal/multiline.c (4)
src/multiline/flb_ml.c (4)
  • flb_ml_append_object (764-863)
  • flb_ml_create (876-928)
  • flb_ml_flush_pending_now (143-149)
  • flb_ml_destroy (989-1014)
src/multiline/flb_ml_parser.c (4)
  • flb_ml_parser_create (200-224)
  • flb_ml_parser_init (131-141)
  • flb_ml_parser_instance_create (261-312)
  • flb_ml_parser_instance_set (315-340)
src/multiline/flb_ml_rule.c (1)
  • flb_ml_rule_create (48-115)
src/multiline/flb_ml_stream.c (1)
  • flb_ml_stream_create (223-276)

Signed-off-by: Mirko Lazarevic <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/internal/multiline.c (2)

1646-1720: Metadata tracking callback is sound; optional stronger assertions

The metadata_result struct and flush_callback_metadata_check correctly validate non-zero timestamps, count fields, and track records with/without metadata, matching the test’s intent. If you want the regression to be stricter, you could additionally assert that expected keys like "time", "stream", "log", and "file" are present instead of relying solely on field_count > 1, but that’s optional for this test.


1795-1909: Issue 10576 test wiring and assertions align well with the regression being guarded

The test_issue_10576 setup (custom parser_10576, regex rules, multiline context, and stream using flush_callback_metadata_check) accurately exercises the “flush pending lines preserve metadata” path and asserts that every flushed record carries more than just log. The control flow, resource cleanup, and use of metadata_result counters all look correct. As a tiny nit, the parameter comments in the flb_ml_parser_create call list key_group before key_pattern, whereas other call sites comment them in signature order; aligning that would avoid confusion if these ever become non‑NULL.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3488f02 and 914f918.

📒 Files selected for processing (1)
  • tests/internal/multiline.c (2 hunks)
🔇 Additional comments (2)
tests/internal/multiline.c (2)

1722-1792: Helper for feeding pre‑packed records into multiline engine looks correct

append_log_to_multiline_processor cleanly constructs the [timestamp, map] msgpack record, includes all expected fields, unpacks safely with error handling, and enforces flb_ml_append_object returning FLB_MULTILINE_OK before returning. Buffer lifetimes (msgpack_unpacked_destroy / msgpack_sbuffer_destroy) are handled correctly, so this helper is safe to use in the test.


1932-1933: New test is properly registered in TEST_LIST

Registering "issue_10576" in TEST_LIST ensures the new regression test is executed alongside existing multiline tests; no issues here.

Contributor

@cosmo0920 cosmo0920 left a comment

I found that this patch has an issue when truncation occurs.
I have commented below with a straightforward solution for that case.

Comment on lines 272 to 274
if (stream_group->mp_sbuf.size == 0) {
    flb_ml_register_context(stream_group, tm, full_map);
}

The most straightforward / proper way to fix this is:

if (ret == FLB_MULTILINE_TRUNCATED) {
    truncated = FLB_TRUE;
}

if (!truncated && stream_group->mp_sbuf.size == 0) {
    flb_ml_register_context(stream_group, tm, full_map);
}

and in another line:

if (!truncated && processed && metadata != NULL) {
    msgpack_pack_object(&stream_group->mp_md_pck, *metadata);
}

Reasons:

  • When TRUNCATED is returned,

    • the entire lifecycle has already been completed inside rule_process → flush_stream_group,
    • so it is not appropriate for package_content() to start a new group implicitly at that moment.
  • When the next multiline start-state line arrives,
    the context should be initialized by flb_ml_rule_process(), not by package_content().

Addresses PR comments and adds corresponding unit tests

Signed-off-by: Mirko Lazarevic <[email protected]>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/internal/multiline.c (3)

1752-1754: Fix typo in comment.

Minor documentation issue.

-/*
- * Helper funciton to append log with custom stream/file metadata
- */
+/*
+ * Helper function to append log with custom stream/file metadata
+ */

1855-1865: Parameter comments are swapped.

Comparing with test_parser_elastic (lines 1006-1008), the actual signature order is key_content, key_pattern, key_group. The comments here incorrectly label them. Functionally harmless since both are NULL, but misleading for future maintainers.

     mlp = flb_ml_parser_create(config,
                                "parser_10576",  /* name      */
                                FLB_ML_REGEX,    /* type      */
                                NULL,            /* match_str */
                                FLB_FALSE,       /* negate */
                                1000,            /* flush_ms */
                                "log",           /* key_content */
-                               NULL,            /* key_group */
-                               NULL,            /* key_pattern */
+                               NULL,            /* key_pattern */
+                               NULL,            /* key_group */
                                NULL,            /* parser */
                                NULL);           /* parser_name */

1969-1981: Same parameter comment swap as above.

     mlp = flb_ml_parser_create(config,
                                "truncation_parser_10576",   /* name      */
                                FLB_ML_REGEX,                /* type      */
                                NULL,                        /* match_str */
                                FLB_FALSE,                   /* negate */
                                1000,                        /* flush_ms */
                                "log",                       /* key_content */
-                               NULL,                        /* key_group */
-                               NULL,                        /* key_pattern */
+                               NULL,                        /* key_pattern */
+                               NULL,                        /* key_group */
                                NULL,                        /* parser */
                                NULL);                       /* parser_name */
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 914f918 and ec6417e.

📒 Files selected for processing (2)
  • src/multiline/flb_ml.c (2 hunks)
  • tests/internal/multiline.c (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • src/multiline/flb_ml.c
🔇 Additional comments (9)
src/multiline/flb_ml.c (4)

268-276: Context registration correctly guarded against truncation.

The !truncated condition ensures that after a FLB_MULTILINE_TRUNCATED return from flb_ml_rule_process, no stale context is registered on an already-flushed group. This aligns with the fix discussed in the prior review thread.


291-293: Context registration for ENDSWITH looks correct.

This branch doesn't invoke flb_ml_rule_process, so truncation cannot occur here. Registering context when the buffer is empty is appropriate for capturing first-line metadata.


325-327: Context registration for EQ looks correct.

Same rationale as ENDSWITH—no truncation path exists in this branch, so registering context when the buffer is empty is safe.


349-351: Metadata packing correctly suppressed after truncation.

Adding !truncated prevents packing stale metadata into a group that was already flushed during truncation, avoiding misattributed records.

tests/internal/multiline.c (5)

1652-1663: Test helper structure looks appropriate for the test scope.

The fixed-size arrays (10 elements) are sufficient for the test cases which use at most 6 lines.
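
As a rough illustration of the fixed-size bookkeeping described here, the sketch below records per-record metadata into bounded arrays and refuses writes once they are full. The struct layout, field names, and `toy_record` are assumptions made for this example; the actual test helper in tests/internal/multiline.c may be organized differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_RECORDS 10   /* mirrors the 10-element arrays noted above */

/* Hypothetical result tracker; field names are illustrative only. */
struct toy_metadata_result {
    int count;                            /* records stored so far */
    int missing;                          /* records lacking stream or file */
    char stream[TOY_MAX_RECORDS][32];
    char file[TOY_MAX_RECORDS][64];
};

/* Store one flushed record. A NULL field means the metadata was absent.
 * Returns 0 on success, or -1 when the arrays are already full. */
static int toy_record(struct toy_metadata_result *r,
                      const char *stream, const char *file)
{
    assert(r != NULL);

    if (r->count >= TOY_MAX_RECORDS) {
        return -1;   /* never write past the fixed-size arrays */
    }
    if (stream == NULL || file == NULL) {
        r->missing++;
    }
    snprintf(r->stream[r->count], sizeof(r->stream[0]), "%s",
             stream ? stream : "");
    snprintf(r->file[r->count], sizeof(r->file[0]), "%s",
             file ? file : "");
    r->count++;
    return 0;
}
```

A test built this way can assert that `missing` is zero after the flush and compare the stored `stream` and `file` values per record, while the bounds check keeps a longer input from overrunning the arrays.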


1671-1750: Metadata verification callback is well-structured.

The callback properly unpacks records, checks for expected metadata fields, and tracks results for assertion.


1826-1923: Comprehensive test for metadata preservation.

The test correctly validates that all flushed records retain their original metadata fields (stream, file) even after line-by-line flushing. The assertions verify zero records with missing metadata.


1941-2065: Truncation test validates correct per-record metadata isolation.

This test confirms that after truncation, a new start-state line correctly receives its own metadata rather than inheriting stale context from the truncated group. The explicit metadata value comparisons (stdout/stderr, app1.log/app2.log) are excellent for catching regressions.


2087-2088: Test list entries added correctly.
