multiline: ensure context is registered for REGEX type #11231
base: master
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The multiline parser now registers the stream-group context earlier for first-line maps in FLB_ML_REGEX, FLB_ML_ENDSWITH, and FLB_ML_EQ when the group's buffer is empty, and it suppresses metadata packing when content was truncated.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
src/multiline/flb_ml.c
Outdated
    if (stream_group->mp_sbuf.size == 0) {
        flb_ml_register_context(stream_group, tm, full_map);
    }
Avoid registering context after truncated regex flush
In package_content the new mp_sbuf.size == 0 check registers a context even when flb_ml_rule_process returned FLB_MULTILINE_TRUNCATED, which already flushed the group, cleared mp_sbuf, and reset rule_to_state. That leaves a packed map in an otherwise empty group; the next flush or start-state match will unpack the stale map (or emit it as a standalone record) and the subsequent multiline message will carry metadata/timestamps from the previous, truncated line instead of the new one. This affects regex parsers when the buffer limit forces truncation, producing misattributed or duplicate log records.
@mirko-lazarevic can you resolve this either by indicating it is not relevant or fixing?
@patrick-stephens I'm not 100% sure about this. Could you please ask the person who worked on it previously to confirm whether it's relevant?
The most straightforward / proper way to fix this is:

    if (ret == FLB_MULTILINE_TRUNCATED) {
        truncated = FLB_TRUE;
    }
    if (!truncated && stream_group->mp_sbuf.size == 0) {
        flb_ml_register_context(stream_group, tm, full_map);
    }

and in another line:

    if (!truncated && processed && metadata != NULL) {
        msgpack_pack_object(&stream_group->mp_md_pck, *metadata);
    }

Reasons:
- When TRUNCATED is returned, the entire lifecycle has already been completed inside rule_process → flush_stream_group, so it is not appropriate for package_content() to start a new group implicitly at that moment.
- When the next multiline start-state line arrives, the context should be initialized by flb_ml_rule_process(), not by package_content().
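
The effect of the truncation guard can be illustrated with a small toy model. This is only a sketch under stated assumptions: toy_group, toy_rule_process, toy_register_context, toy_package_content, and the use_guard flag are hypothetical stand-ins, not Fluent Bit APIs, and the model assumes that a truncation resets the whole group before the status is returned, as described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real return codes (not Fluent Bit's). */
enum ml_status { ML_OK = 0, ML_TRUNCATED = 1 };

/* Toy model of a stream group; field names are illustrative only. */
struct toy_group {
    size_t ctx_size;  /* models stream_group->mp_sbuf.size (packed context) */
    size_t buffered;  /* models the bytes of content buffered so far */
    int    md_count;  /* models the metadata objects packed so far */
};

/* Models registering the first-line context: the context buffer becomes non-empty. */
static void toy_register_context(struct toy_group *g)
{
    assert(g != NULL);
    g->ctx_size = 1;
}

/*
 * Models rule processing. When the buffer limit would be exceeded, the group
 * is flushed and fully reset before ML_TRUNCATED is returned (assumption
 * taken from the review discussion).
 */
static enum ml_status toy_rule_process(struct toy_group *g, size_t len, size_t limit)
{
    assert(g != NULL);
    if (g->buffered + len > limit) {
        g->ctx_size = 0;
        g->buffered = 0;
        g->md_count = 0;
        return ML_TRUNCATED;
    }
    g->buffered += len;
    return ML_OK;
}

/*
 * Models the REGEX branch of package_content(). With use_guard == true the
 * context registration and metadata packing are skipped after a truncation;
 * with use_guard == false they run unconditionally (the pre-fix behavior).
 */
enum ml_status toy_package_content(struct toy_group *g, size_t len,
                                   size_t limit, bool use_guard)
{
    bool truncated = false;
    enum ml_status ret;

    assert(g != NULL);
    ret = toy_rule_process(g, len, limit);
    if (ret == ML_TRUNCATED) {
        truncated = true;
    }
    if ((!use_guard || !truncated) && g->ctx_size == 0) {
        toy_register_context(g);
    }
    if (!use_guard || !truncated) {
        g->md_count++;
    }
    return ret;
}
```

In this model, feeding two 5-byte lines with a limit of 8 truncates on the second line. With the guard enabled the group stays empty afterwards; without it, a context and one metadata entry are left behind in a group that was already flushed, which corresponds to the stale state the review warns about.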
@cosmo0920 When you say "in another line", do you mean the line immediately below within the same if block? I'm asking because this logic already exists in lines 349-351.
Should I simply add the (!truncated) condition to that existing if block?
Apologies if this is a basic question; I don’t yet have the full context of all the multiline options and logic.
> Should I simply add the (!truncated) condition to that existing if block?

    if (ret == FLB_MULTILINE_TRUNCATED) {
        truncated = FLB_TRUE;
    }
    if (stream_group->mp_sbuf.size == 0) {
        flb_ml_register_context(stream_group, tm, full_map);
    }

should be

    if (ret == FLB_MULTILINE_TRUNCATED) {
        truncated = FLB_TRUE;
    }
    if (!truncated && stream_group->mp_sbuf.size == 0) {
        flb_ml_register_context(stream_group, tm, full_map);
    }

And yes:

    if (processed && metadata != NULL) {
        msgpack_pack_object(&stream_group->mp_md_pck, *metadata);
    }

should be

    if (!truncated && processed && metadata != NULL) {
        msgpack_pack_object(&stream_group->mp_md_pck, *metadata);
    }

Your understanding is correct.
@cosmo0920 Thanks. I made the changes accordingly. I also added a unit test that should cover this specific case.
This pull request should address issue #10576. For the Fluent Bit configuration example and the steps to reproduce the issue, see #10576. Output after the fix:

@mirko-lazarevic maybe tweak the commit slightly as having Can you add some unit tests as well? I really like to see those, as the next time the code is refactored/updated they will prevent a similar problem.

The CIFuzz failure is down to something else so can be ignored: #11227
This fix ensures that when the buffer is flushed, the record will have proper timestamp and metadata instead of just the "log" field. Signed-off-by: Mirko Lazarevic <[email protected]>
a398968 to 23a2e18
I saw exactly the same commit message from one of the maintainers; that's why I did the same. Anyway, I removed I'll see if I can add some unit tests, although my knowledge in this area is limited.
Signed-off-by: Mirko Lazarevic <[email protected]>
1470a9e to 3488f02
Actionable comments posted: 0
🧹 Nitpick comments (5)
tests/internal/multiline.c (5)
1663-1720: Strengthen metadata validation beyond field-count == 4

The callback correctly verifies that no record is left with a single log-only field and gives useful debug output. Right now, “full metadata” is inferred purely from field_count == 4, which would still pass if those four fields were not actually time, stream, log, and file.

If you want this test to be a tighter regression guard for #10576, consider also asserting the presence of those specific keys in the map (e.g. track booleans for each key and require all four to be seen before incrementing records_with_full_metadata). This would still be low-cost here and better documents the expected contract.
1722-1791: Minor: avoid extra pack/unpack or clarify intent in helper

append_log_to_multiline_processor is functionally correct and nicely encapsulates the construction of a “full metadata” record. For a test helper, going through pack → unpack to recover the map is fine, but it’s a bit indirect.

Two optional tweaks you might consider:
- Add a brief comment explaining that you intentionally reuse the existing [timestamp, map] pattern from other tests and then pull out map = &root.via.array.ptr[1] to feed flb_ml_append_object, so readers don’t have to reverse-engineer it.
- Or, if you prefer, build the msgpack_object map directly (mirroring how other tests feed flb_ml_append_object) and skip the unpack step to reduce noise.

No functional changes required; this is just for clarity and maintainability.
1838-1848: Fix parameter comments for flb_ml_parser_create to match existing usage

Here the call to flb_ml_parser_create passes two NULL arguments commented as key_group and key_pattern, but earlier in this file (e.g. the elastic test) the documented order is:

    /* key_content, key_pattern, key_group, parser ctx, parser name */

The arguments are all NULL so behavior is unaffected, but the swapped comments are misleading. It’d be good to align them with the existing pattern, e.g.:

    mlp = flb_ml_parser_create(config,
                               "parser_10576",   /* name */
                               FLB_ML_REGEX,     /* type */
                               NULL,             /* match_str */
                               FLB_FALSE,        /* negate */
                               1000,             /* flush_ms */
                               "log",            /* key_content */
                               NULL,             /* key_pattern */
                               NULL,             /* key_group */
                               NULL,             /* parser ctx */
                               NULL);            /* parser name */
1809-1912: Issue/test naming consistency for easier traceability

The test is labeled and commented as covering issue 10576, and TEST_LIST registers it under "issue_10576", but the function name is test_issue_10567_metadata_preservation.

This doesn’t affect behavior, but it’s slightly confusing when grepping by issue number. Consider renaming the function to test_issue_10576_metadata_preservation (and adjusting the prototype/usage) so the function name, comment header, and test label all refer to the same issue id.
1649-1655: Optional: trim unused fields in struct metadata_result or use them

metadata_result currently defines key and total_expected, but the new metadata callback logic never reads them. That’s not harmful, but it adds a bit of noise.

Either:
- Remove these fields from struct metadata_result and the corresponding assignments in test_issue_10567_metadata_preservation, or
- Use total_expected in a final assertion and/or key when scanning the map (e.g. explicitly validating that the log field is present), which would also slightly strengthen the test.

Purely cosmetic, but would tighten up the helper struct.
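
The key-presence check suggested in the first nitpick (1663-1720) could look roughly like the sketch below. It is not the test's actual code: has_required_keys is a hypothetical helper, and it takes plain C strings as a stand-in for the key names that the real callback would read from the unpacked msgpack map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true when every name in `required` appears somewhere in `keys`.
 * `keys` stands in for the key strings of one flushed record; extra keys
 * are allowed, so the check does not depend on an exact field count.
 */
bool has_required_keys(const char *const *keys, size_t n_keys,
                       const char *const *required, size_t n_required)
{
    size_t i;
    size_t j;

    assert(keys != NULL || n_keys == 0);
    assert(required != NULL || n_required == 0);

    for (i = 0; i < n_required; i++) {
        bool found = false;

        for (j = 0; j < n_keys; j++) {
            if (strcmp(required[i], keys[j]) == 0) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;   /* at least one expected key is missing */
        }
    }
    return true;
}
```

A callback could then count a record as having full metadata only when this returns true for time, stream, log, and file, which keeps the test valid even if further metadata keys are added later.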
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/internal/multiline.c(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
🔇 Additional comments (1)
tests/internal/multiline.c (1)
1931-1936: LGTM: good to have a dedicated regression test registered

Registering "issue_10576" in TEST_LIST and wiring it to the new metadata preservation test completes the regression guard for this bug. Looks consistent with the existing issue tests.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/internal/multiline.c (4)
1646-1655: Unused fields in struct metadata_result make the helper harder to read

total_expected and key are never used by flush_callback_metadata_check or the test assertions; only the counters and current_record matter. This makes it less obvious what the test actually relies on.

Consider either wiring total_expected into a final assertion, or dropping unused fields from this struct so the intent is clearer.
1663-1720: Make the metadata check resilient to extra fields instead of requiring exactly 4

Right now, you treat “full metadata” as field_count == 4 and increment records_with_log_only for anything else, including maps with more than 4 fields. That will cause this regression test to start failing if we ever add more metadata keys (e.g. tag, container_id) even though the underlying bug hasn’t regressed.

Given the comment “all lines should have multiple fields (time, stream, log, file, etc.)”, a more robust condition would be something like:
- Treat field_count == 1 (only log) as failure, and
- Treat field_count > 1 as success, or
- Explicitly check for the presence of time, stream, log, and file keys regardless of map size.

For example:

    -    /* 4 fields: time, stream, log, file*/
    -    if (field_count == 4) {
    -        res->records_with_full_metadata++;
    -    } else {
    -        res->records_with_log_only++;
    -        fprintf(stdout, "  WARNING: Record has only 'log' field (missing metadata)\n");
    -    }
    +    /* Consider "log-only" as a single-field map; anything else has metadata */
    +    if (field_count == 1) {
    +        res->records_with_log_only++;
    +        fprintf(stdout, "  WARNING: Record has only 'log' field (missing metadata)\n");
    +    }
    +    else {
    +        res->records_with_full_metadata++;
    +    }

You could further strengthen this by actually checking for the expected key names when iterating the map.
1809-1912: Align test naming with the referenced issue and consider asserting on key names

Function name and expectations:
- The function is named test_issue_10567_metadata_preservation but the comments, parser name (parser_10576), and TEST_LIST entry refer to issue 10576. That mismatch is confusing when grepping for the issue ID.
- The final assertions only check that no record was classified as “log only” and that counts match, but they don’t assert that the flushed records actually contain time, stream, log, and file keys, which is what the bug was about.

Suggestions:
- Rename the function to match the issue ID and the test list entry:

    -static void test_issue_10567_metadata_preservation()
    +static void test_issue_10576_metadata_preservation()

  (and update the prototype in the TEST_LIST entry accordingly).
- Optionally, make the test stronger by having flush_callback_metadata_check verify the presence of those four key names rather than relying solely on field count.
1935-1935: Keep TEST_LIST entry consistent with the test function name

Once you rename the test function to use 10576, this entry should be updated to stay in sync with the symbol name:

    -    { "issue_10576" , test_issue_10567_metadata_preservation },
    +    { "issue_10576" , test_issue_10576_metadata_preservation },

This keeps the usual “issue NNNN ↔ test_issue_NNNN” convention used elsewhere in this file.
📒 Files selected for processing (1)
tests/internal/multiline.c(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/internal/multiline.c (4)
src/multiline/flb_ml.c (4)
- flb_ml_append_object (764-863)
- flb_ml_create (876-928)
- flb_ml_flush_pending_now (143-149)
- flb_ml_destroy (989-1014)
src/multiline/flb_ml_parser.c (4)
- flb_ml_parser_create (200-224)
- flb_ml_parser_init (131-141)
- flb_ml_parser_instance_create (261-312)
- flb_ml_parser_instance_set (315-340)
src/multiline/flb_ml_rule.c (1)
- flb_ml_rule_create (48-115)
src/multiline/flb_ml_stream.c (1)
- flb_ml_stream_create (223-276)
Signed-off-by: Mirko Lazarevic <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/internal/multiline.c (2)
1646-1720: Metadata tracking callback is sound; optional stronger assertions

The metadata_result struct and flush_callback_metadata_check correctly validate non-zero timestamps, count fields, and track records with/without metadata, matching the test’s intent. If you want the regression to be stricter, you could additionally assert that expected keys like "time", "stream", "log", and "file" are present instead of relying solely on field_count > 1, but that’s optional for this test.
1795-1909: Issue 10576 test wiring and assertions align well with the regression being guarded

The test_issue_10576 setup (custom parser_10576, regex rules, multiline context, and stream using flush_callback_metadata_check) accurately exercises the “flush pending lines preserve metadata” path and asserts that every flushed record carries more than just log. The control flow, resource cleanup, and use of metadata_result counters all look correct. As a tiny nit, the parameter comments in the flb_ml_parser_create call list key_group before key_pattern, whereas other call sites comment them in signature order; aligning that would avoid confusion if these ever become non-NULL.
📒 Files selected for processing (1)
tests/internal/multiline.c(2 hunks)
🔇 Additional comments (2)
tests/internal/multiline.c (2)
1722-1792: Helper for feeding pre-packed records into multiline engine looks correct

append_log_to_multiline_processor cleanly constructs the [timestamp, map] msgpack record, includes all expected fields, unpacks safely with error handling, and enforces flb_ml_append_object returning FLB_MULTILINE_OK before returning. Buffer lifetimes (msgpack_unpacked_destroy / msgpack_sbuffer_destroy) are handled correctly, so this helper is safe to use in the test.
1932-1933: New test is properly registered in TEST_LIST

Registering "issue_10576" in TEST_LIST ensures the new regression test is executed alongside existing multiline tests; no issues here.
cosmo0920
left a comment
I found that this patch has an issue when truncation happens.
I commented with a straightforward solution for that case.
Addresses PR comments and adds corresponding unit tests Signed-off-by: Mirko Lazarevic <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/internal/multiline.c (3)
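1752-1754: Fix typo in comment.

Minor documentation issue.

    -/*
    - * Helper funciton to append log with custom stream/file metadata
    - */
    +/*
    + * Helper function to append log with custom stream/file metadata
    + */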
-/* - * Helper funciton to append log with custom stream/file metadata - */ +/* + * Helper function to append log with custom stream/file metadata + */
1855-1865: Parameter comments are swapped.

Comparing with test_parser_elastic (lines 1006-1008), the actual signature order is key_content, key_pattern, key_group. The comments here incorrectly label them. Functionally harmless since both are NULL, but misleading for future maintainers.

     mlp = flb_ml_parser_create(config,
                                "parser_10576",   /* name */
                                FLB_ML_REGEX,     /* type */
                                NULL,             /* match_str */
                                FLB_FALSE,        /* negate */
                                1000,             /* flush_ms */
                                "log",            /* key_content */
    -                           NULL,             /* key_group */
    -                           NULL,             /* key_pattern */
    +                           NULL,             /* key_pattern */
    +                           NULL,             /* key_group */
                                NULL,             /* parser */
                                NULL);            /* parser_name */
1969-1981: Same parameter comment swap as above.

     mlp = flb_ml_parser_create(config,
                                "truncation_parser_10576", /* name */
                                FLB_ML_REGEX,     /* type */
                                NULL,             /* match_str */
                                FLB_FALSE,        /* negate */
                                1000,             /* flush_ms */
                                "log",            /* key_content */
    -                           NULL,             /* key_group */
    -                           NULL,             /* key_pattern */
    +                           NULL,             /* key_pattern */
    +                           NULL,             /* key_group */
                                NULL,             /* parser */
                                NULL);            /* parser_name */
📒 Files selected for processing (2)
- src/multiline/flb_ml.c (2 hunks)
- tests/internal/multiline.c (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
src/multiline/flb_ml.c
🔇 Additional comments (9)
src/multiline/flb_ml.c (4)
268-276: Context registration correctly guarded against truncation.

The !truncated condition ensures that after a FLB_MULTILINE_TRUNCATED return from flb_ml_rule_process, no stale context is registered on an already-flushed group. This aligns with the fix discussed in the prior review thread.

291-293: Context registration for ENDSWITH looks correct.

This branch doesn't invoke flb_ml_rule_process, so truncation cannot occur here. Registering context when the buffer is empty is appropriate for capturing first-line metadata.

325-327: Context registration for EQ looks correct.

Same rationale as ENDSWITH: no truncation path exists in this branch, so registering context when the buffer is empty is safe.

349-351: Metadata packing correctly suppressed after truncation.

Adding !truncated prevents packing stale metadata into a group that was already flushed during truncation, avoiding misattributed records.

tests/internal/multiline.c (5)

1652-1663: Test helper structure looks appropriate for the test scope.

The fixed-size arrays (10 elements) are sufficient for the test cases which use at most 6 lines.

1671-1750: Metadata verification callback is well-structured.

The callback properly unpacks records, checks for expected metadata fields, and tracks results for assertion.

1826-1923: Comprehensive test for metadata preservation.

The test correctly validates that all flushed records retain their original metadata fields (stream, file) even after line-by-line flushing. The assertions verify zero records with missing metadata.

1941-2065: Truncation test validates correct per-record metadata isolation.

This test confirms that after truncation, a new start-state line correctly receives its own metadata rather than inheriting stale context from the truncated group. The explicit metadata value comparisons (stdout/stderr, app1.log/app2.log) are excellent for catching regressions.

2087-2088: Test list entries added correctly.
This fix ensures that when the buffer is
flushed, the record will have proper timestamp
and metadata instead of just the "log" field.
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.