out_s3: preserve $INDEX sequence state across restarts#11988
Conversation
The S3 output stores the $INDEX sequence counter in a plain file under the metadata stream directory (sequence/index_metadata/seq_index_<id>). This file is not a Chunk I/O chunk, so the metadata stream has zero registered fstore files. On (graceful) shutdown, flb_fstore_destroy() recursively deletes any stream that has no files, which removes the metadata stream and the persisted $INDEX value. On the next start the index is missing and resets to 0, so subsequent uploads reuse object keys (events_0, events_1, ...) and overwrite previously uploaded objects. This contradicts the documented behavior that $INDEX is saved in store_dir and continues from its last value after a restart on the same disk. Detach the metadata stream reference in s3_store_exit() before the store is destroyed (without deleting the directory), so the persisted index survives a restart and can be recovered by init_seq_index(). Signed-off-by: Damon P. Cortesi <d.lifehacker@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughIn Changes$INDEX metadata stream preservation on shutdown
Estimated code review effort🎯 2 (Simple) | ⏱️ ~5 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
PR follow-up: the exit-only version of this fix is not sufficient. End-to-end testing showed that the I am updating the proposed approach to make the This should cover both classes of deletion: graceful-exit fstore destruction and any startup/load cleanup path that prunes empty streams. |
There was a problem hiding this comment.
Looks good to me but I have several points of concerns for long term fixes:
The patch is a valid targeted mitigation for the reported shutdown-loss path. Today, $INDEX metadata is persisted as a plain file under the S3 sequence stream, but that file is invisible to fstore’s chunk accounting. On graceful shutdown, fstore treats the sequence stream as empty and deletes it, which removes the saved index and causes restart reuse of keys. Detaching ctx->stream_metadata before flb_fstore_destroy() directly prevents that deletion while preserving the existing recovery path in init_seq_index().
The important review caveat is scope. This fixes the immediate single-instance $INDEX case and stops the most dangerous behavior: losing metadata during normal shutdown and overwriting objects after restart. It is not a complete redesign of $INDEX persistence. In particular, it does not make index writes crash-atomic, and by itself it may not protect shared store_dir/<bucket> setups where another S3 output that does not use $INDEX can still see and delete the sequence stream during its own shutdown.
So I would not block the patch as invalid. I would review it as a bleeding-stop fix that matches the documented behavior better than current code. But I would ask for either:
- a follow-up patch that creates/protects the
sequencestream for every S3 instance using the same fstore root, or - a clearly documented limitation that this patch only protects the S3 instance that owns
ctx->stream_metadata.
But it's an optional fix on this PR. Let's document as TODOs.
A stronger long-term fix would persist $INDEX as first-class fstore-visible metadata, or at least make the raw index file update atomic with temp-file plus rename. But that is a larger correctness improvement, not required to justify the current shutdown metadata-preservation patch.
Summary
Fixes #11987.
The S3 output persists the
$INDEXsequence counter as a plain file under the metadata stream directory (<store_dir>/<bucket>/sequence/index_metadata/seq_index_<id>). Because this file is not a Chunk I/O chunk, thesequencefstore stream has zero registered files, andflb_fstore_destroy()deletes any stream with zero files on shutdown — recursively removing the directory and the persisted$INDEX. After a graceful restart the index resets to0and uploads overwrite previously-written objects, contradicting the documented behavior that$INDEXcontinues from its last value whenstore_diris on a persistent disk.This detaches the metadata stream reference in
s3_store_exit()(without deleting the directory) before the store is destroyed, so the index file survives on disk andinit_seq_index()recovers it on the next start.Root cause
$INDEXis written as a raw file viainit_seq_index()/write_seq_index()inplugins/out_s3/s3.c, inside thesequencefstore stream, under a nestedindex_metadata/subdirectory.cio_scan_stream_files()(lib/chunkio/src/cio_scan.c) only registersDT_REGentries directly under a stream dir, so the nestedindex_metadata/is ignored and thesequencestream has 0 files.flb_fstore_destroy()(src/flb_fstore.c) deletes any stream withfiles == 0(flb_fstore_stream_destroy(stream, FLB_TRUE)→cio_stream_delete()→ recursive directory delete).s3_store_exit()(plugins/out_s3/s3_store.c) callsflb_fstore_destroy()on shutdown, so the metadata stream and the persisted$INDEXare removed on every graceful exit.Fix
flb_fstore_stream_destroy(stream, FLB_FALSE)removes the fstore reference fromfs->streamswithout deleting the on-disk directory; the underlyingcio_streamis still freed by the subsequentcio_destroy(). Only thesequence(metadata) stream is affected —multipart_upload_metadatauses real Chunk I/O chunks, so it is unaffected by the empty-stream cleanup.Testing
store_diron a persistent volume, a marker file placed insidesequence/and theseq_index_<id>file are both gone after a SIGTERM restart (thesequence/dir returns with a fresh inode), while a marker placed one level above on the same volume survives — confirming the volume is healthy and Fluent Bit is deleting its own metadata stream. NoSuccessfully recovered indexlog appears, and$INDEXresets to 0.init_seq_index()takes theread_seq_index()recovery path.ctx->stream_metadata(plugins/out_s3/s3.h),flb_fstore_stream_destroy(struct flb_fstore_stream *, int delete)(include/fluent-bit/flb_fstore.h).Notes
4.2.x,5.0.x, andmaster(the$INDEX+ metadata-stream design is unchanged across these).init_seq_index()state file.Summary by CodeRabbit