Skip to content

Conversation

@ANtutov
Copy link

@ANtutov ANtutov commented Dec 8, 2025

Make custody backfill restart semantics match BackFillSync. Instead of immediately restarting after a failure, track failed attempts with a new failed_sync flag and only allow restart once a new fully-synced peer joins and sets restart_failed_sync. This prevents tight restart loops on the same bad peers and uses fully_synced_peer_joined from SyncManager to drive intentional retries.

@ANtutov ANtutov requested a review from jxs as a code owner December 8, 2025 11:33
@cla-assistant
Copy link

cla-assistant bot commented Dec 8, 2025

CLA assistant check
All committers have signed the CLA.

@cla-assistant
Copy link

cla-assistant bot commented Dec 8, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines 820 to 823
// Check if we need to restart custody backfill sync due to a cgc change.
if self.restart_if_required() {
return Ok(ProcessResult::Successful);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reasoning for deleting this?

@eserilev
Copy link
Member

eserilev commented Dec 9, 2025

we probably want to keep restart_failed_sync and actually use it correctly, instead of deleting it

@eserilev eserilev added waiting-on-author The reviewer has suggested changes and awaits thier implementation. syncing labels Dec 9, 2025
// inform the backfill sync that a new synced peer has joined us.
if new_state.is_synced() {
self.backfill_sync.fully_synced_peer_joined();
self.custody_backfill_sync.fully_synced_peer_joined();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we want to remove this, i think we want to actually get restart_failed_sync to work appropriately

@ANtutov ANtutov force-pushed the fix/custody-backfill-dead-restart-flag branch from bf01ec8 to a84a1cb Compare December 9, 2025 22:10
@ANtutov
Copy link
Author

ANtutov commented Dec 9, 2025

Also squashed tp reduce noise in commits

@eserilev
Copy link
Member

please read my comment above. We probably want to keep restard_failed_sync and fix its usage instead of deleting it

@ANtutov
Copy link
Author

ANtutov commented Dec 10, 2025

please read my comment above. We probably want to keep restard_failed_sync and fix its usage instead of deleting it

Done. I kept restart_failed_sync and wired it up like in BackFillSync, added a failed_sync flag so custody backfill only restarts after a failure once a new fully-synced peer joins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

syncing waiting-on-author The reviewer has suggested changes and awaits thier implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants