in_tail: add Windows UTF-8 path encoding mode#12020

Open
cosmo0920 wants to merge 7 commits into master from cosmo0920-cover-multibyte-encoded-file-tailing

cosmo0920 (Contributor) commented Jun 30, 2026

Summary
This adds an opt-in Windows path encoding mode for the tail input:

windows.path_encoding: utf-8

By default, Windows keeps the existing ANSI-code-page behavior (ansi) for compatibility with CP932 and other legacy Windows locales. When utf-8 is selected, in_tail treats configured paths as UTF-8, converts them to UTF-16, and uses Windows wide-character file APIs for discovery, open/stat, exclude matching, and final path resolution.
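To make the conversion step concrete, here is a minimal Python sketch of what "treat the configured path as UTF-8 and convert it to UTF-16" means at the level of code units. It is an illustration only, not code from this PR (whose helpers are written in C and are not shown on this page); the function name `utf8_path_to_utf16_units` and the sample path are hypothetical.

```python
def utf8_path_to_utf16_units(path_bytes: bytes) -> list:
    """Decode a UTF-8 path and return its UTF-16 code units.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError instead of
    # silently mapping to a different file name.
    text = path_bytes.decode("utf-8")
    raw = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# Sample path: C:\logs\ followed by two katakana letters (U+30ED, U+30B0)
# and ".log". The path itself is made up for this example.
path = "C:\\logs\\\u30ed\u30b0.log".encode("utf-8")
units = utf8_path_to_utf16_units(path)
```

Characters outside the Basic Multilingual Plane come back as two code units (a surrogate pair), which is why UTF-16 length and character count can differ. Bytes that are not valid UTF-8, such as a name encoded in a legacy code page, raise `UnicodeDecodeError` in this sketch.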

Implementation

  • Added windows.path_encoding with default ansi.
  • Added Windows UTF-8/UTF-16 conversion helpers.
  • Added UTF-8-aware Windows file helpers for open/stat/lstat.
  • Routed UTF-8 mode through:
    • FindFirstFileW / FindNextFileW
    • PathMatchSpecW
    • CreateFileW
    • GetFinalPathNameByHandleW
  • Kept Fluent Bit internal path strings as UTF-8 so path_key and DB filename comparison remain consistent.

Tests

  • Added Windows integration coverage for a Unicode directory/file path using windows.path_encoding: utf-8.
  • Marked Windows-incompatible in_tail integration scenarios as skipped where they depend on POSIX rotation, symlink, permission, or inode behavior.
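The review comments further down refer to a `skip_on_windows` decorator, but its definition does not appear on this page. One possible shape, sketched with the standard library only, is shown below; the definition, the reason string, and the example function are assumptions, and a pytest-based suite might well use `pytest.mark.skipif` instead.

```python
import sys
import unittest

# Assumed definition; the PR's actual decorator is not shown on this page.
skip_on_windows = unittest.skipIf(
    sys.platform == "win32",
    "depends on POSIX rotation, symlink, permission, or inode behavior",
)


@skip_on_windows
def example_posix_only_scenario():
    # Placeholder body standing in for a POSIX-only in_tail scenario.
    return "ran"
```

On Windows, calling the decorated function raises `unittest.SkipTest`; on other platforms the function runs unchanged.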

Verification

  • cmake --build build --target flb-plugin-in_tail: passed
  • cmake --build build --target fluent-bit-bin: passed
  • tests\integration\.venv\Scripts\python.exe -m pytest tests\integration\scenarios\in_tail\tests\test_in_tail_001.py -q: 15 passed, 15 skipped
  • Valgrind strict run was attempted but blocked on Windows because valgrind is not available.

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Added Windows path encoding support for the tail input via windows.path_encoding (default ansi, with utf-8/utf8 support).
    • Enhanced Windows file discovery, exclusion matching, and globbing to correctly work with UTF-8 paths.
  • Bug Fixes

    • Improved Windows file open/stat/lstat, rotation detection, and final path resolution for non-ASCII filenames.
    • Updated Windows-tail tests to cover Unicode filenames and validate recorded absolute paths; POSIX-dependent scenarios are skipped on Windows.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
cosmo0920 requested a review from edsiper as a code owner June 30, 2026 09:18
cosmo0920 changed the title from "in_tail: add Windows UTF-8 path encoding mode`" to "in_tail: add Windows UTF-8 path encoding mode" Jun 30, 2026

coderabbitai Bot commented Jun 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ae6748cf-481b-4d85-9e9c-dce252fe9fa5

📥 Commits

Reviewing files that changed from the base of the PR and between 94eb294 and 0a50efa.

📒 Files selected for processing (6)
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_scan_win32.c
  • plugins/in_tail/win32/interface.h
  • plugins/in_tail/win32/io.c
  • plugins/in_tail/win32/stat.c
  • tests/integration/scenarios/in_tail/tests/test_in_tail_001.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • plugins/in_tail/win32/io.c
  • plugins/in_tail/win32/interface.h
  • tests/integration/scenarios/in_tail/tests/test_in_tail_001.py
  • plugins/in_tail/win32/stat.c
  • plugins/in_tail/tail_scan_win32.c
  • plugins/in_tail/tail_file.c

📝 Walkthrough

Adds a windows.path_encoding option to Tail for Windows, with UTF-8-aware Win32 path conversion and file operations. Tail file scanning, rotation, and fs-stat now dispatch through encoding-aware helpers, and integration tests cover Unicode filenames while skipping POSIX-only scenarios on Windows.

Changes

Windows UTF-8 Path Encoding for in_tail

| Layer | File(s) | Summary |
| --- | --- | --- |
| Config constants, struct field, and plugin option | plugins/in_tail/tail_config.h, plugins/in_tail/tail_config.c, plugins/in_tail/tail.c | Defines Windows path encoding constants, adds windows_path_encoding to struct flb_tail_config, and registers windows.path_encoding with default ansi, parsing, and error cleanup. |
| Win32 UTF-8 conversion and stat/open/path helpers | plugins/in_tail/win32/interface.h, plugins/in_tail/win32/path.c, plugins/in_tail/win32/io.c, plugins/in_tail/win32/stat.c, plugins/in_tail/CMakeLists.txt | Adds UTF-8/wide conversion helpers, UTF-8 full-path resolution, UTF-8 open, and UTF-8 stat/lstat/symlink helpers, with interface declarations and MSVC build wiring. |
| Dispatch wrappers and file lifecycle in tail_file.c | plugins/in_tail/tail_file.c | Adds encoding-aware file wrappers and updates append, rotation, and resolved-path naming logic to use the Windows UTF-8 path flow when configured. |
| fs_stat and Win32 scan UTF-8 integration | plugins/in_tail/tail_fs_stat.c, plugins/in_tail/tail_scan_win32.c | Adds a path-aware stat helper for backend fs_stat data and rewrites Windows scanning, exclusion matching, file registration, and globbing to follow the configured path encoding. |
| Integration tests: UTF-8 coverage and POSIX-only guards | tests/integration/scenarios/in_tail/tests/test_in_tail_001.py | Adds a Windows UTF-8 discovery test and skips POSIX-semantics in-tail scenarios on Windows. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • fujimotos
  • koleini
  • edsiper

Poem

🐇 I hop through paths both wide and bright,
UTF-8 trails in Windows light.
A little wchar_t, a careful glide,
And Unicode logs come scampering inside!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.93% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding an opt-in Windows UTF-8 path encoding mode for in_tail. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94eb294748

Comment thread plugins/in_tail/tail_scan_win32.c Outdated

coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/in_tail/tail_file.c`:
- Around line 2369-2371: The wide-path prefix stripping in tail_file.c is
copying too many wchar_t elements from wide_buf after detecting the "\\\\?\\"
prefix, which can read past the allocated PATH_MAX buffer. Update the memmove in
the wide-path handling block to copy only the remaining string including the
null terminator (from wide_buf + 4 into wide_buf), using the correct element
count derived from len so the source range stays within bounds.

In `@plugins/in_tail/win32/io.c`:
- Around line 56-79: win32_open_utf8 currently returns -1 without setting errno
on failure, so it should mirror the error handling used by win32_stat_utf8 and
win32_lstat_utf8. Update win32_open_utf8 to translate a NULL result from
win32_utf8_to_wide into EINVAL, and to call propagate_last_error_to_errno() when
CreateFileW returns INVALID_HANDLE_VALUE before returning -1. Keep the
successful _open_osfhandle path unchanged.

In `@tests/integration/scenarios/in_tail/tests/test_in_tail_001.py`:
- Line 955: Remove the `@skip_on_windows` decorator from the two affected in_tail
tests so they run on Windows as well. Specifically, update
test_in_tail_db_schema_upgrade_is_automatic and
test_in_tail_ignore_older_skips_stale_files in test_in_tail_001.py to no longer
be skipped, since their bodies exercise the stat/discovery path and do not rely
on Windows-incompatible rename, symlink, or permission behavior.
- Around line 354-360: The UTF-8 Windows path test in test_in_tail_001 still
uses characters that CP932 can encode, so it does not uniquely exercise the utf8
path handling. Update the setup in this scenario to use a log directory and/or
filename with characters outside the legacy ANSI/CP932 repertoire, or add a
separate negative case that confirms discovery fails under ansi; keep the
changes localized to the existing path-encoding test setup.
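
The last finding above turns on whether the test's file name can be represented in the legacy code page at all. A quick way to check candidate names is sketched below. It uses Python's `cp932` codec as a stand-in for the Windows ANSI code page on a Japanese locale, which is an assumption (the codec and the OS conversion tables may differ in edge cases). The helper name `representable_in` and the sample names are hypothetical and are not the names used in the PR's test.

```python
def representable_in(name: str, codec: str = "cp932") -> bool:
    """Return True if every character of `name` can be encoded in `codec`."""
    try:
        name.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical candidate file names, written with escapes for clarity.
candidates = {
    "katakana": "\u30ed\u30b0.log",   # Japanese katakana, representable in CP932
    "bengali": "\u09b2\u0997.log",    # Bengali letters, outside CP932
    "emoji": "\U0001F4C4.log",        # outside the BMP, outside CP932
}

# Names that only the utf-8 path mode could address on a CP932 system.
utf8_only = [label for label, name in candidates.items()
             if not representable_in(name)]
```

A name that lands in `utf8_only` would distinguish the two modes; one that does not could pass under either mode and so would not isolate the new code path.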

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 91efb3a8-2b8e-48a9-a36b-9d1cc9acef85

📥 Commits

Reviewing files that changed from the base of the PR and between b81c252 and 94eb294.

📒 Files selected for processing (12)
  • plugins/in_tail/CMakeLists.txt
  • plugins/in_tail/tail.c
  • plugins/in_tail/tail_config.c
  • plugins/in_tail/tail_config.h
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail_scan_win32.c
  • plugins/in_tail/win32/interface.h
  • plugins/in_tail/win32/io.c
  • plugins/in_tail/win32/path.c
  • plugins/in_tail/win32/stat.c
  • tests/integration/scenarios/in_tail/tests/test_in_tail_001.py

Comment thread plugins/in_tail/tail_file.c Outdated
Comment thread plugins/in_tail/win32/io.c
Comment thread tests/integration/scenarios/in_tail/tests/test_in_tail_001.py Outdated
Comment thread tests/integration/scenarios/in_tail/tests/test_in_tail_001.py
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
… path

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>