feat: Remove total timeout from signature and add test#566
feat: Remove total timeout from signature and add test#566ChaoticTempest wants to merge 8 commits intodevelopfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR removes the total timeout mechanism from signature request processing and implements an exponential backoff retry strategy. Instead of failing signature requests after a total timeout period, each SignRequest now tracks its retry attempts and applies increasing delays between retries.
Key Changes:
- Replaced total timeout logic with per-request retry attempt tracking
- Added exponential backoff mechanism with configurable base and max delays (1s to 60s by default)
- New integration tests verify retry behavior with message dropping
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
integration-tests/tests/cases/mpc.rs |
Adds two comprehensive tests for signature retry behavior and an unused helper function for proposer selection |
integration-tests/src/mpc_fixture/mod.rs |
Exports the new SignatureDropper test utility |
integration-tests/src/mpc_fixture/message_filters.rs |
Introduces SignatureDropper helper for selectively dropping signature messages in tests |
integration-tests/src/mpc_fixture/fixture_tasks.rs |
Updates message filtering to support multiple filters and improves logging |
integration-tests/src/mpc_fixture/fixture_interface.rs |
Adds task tracking and shutdown capability to fixture nodes |
integration-tests/src/mpc_fixture/builder.rs |
Changes filter storage from single to vector and tracks spawned task handles |
chain-signatures/node/src/util.rs |
Removes the now-unused is_elapsed_longer_than_timeout function |
chain-signatures/node/src/protocol/signature.rs |
Core implementation of retry attempt tracking, exponential backoff logic, and removal of total timeout checks |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
jakmeier
left a comment
There was a problem hiding this comment.
Great stuff!
I see you removed the exponential backoff. Makes sense, I don't think we need, as discussed on Monday. What I thought of later is that we could also make the timeout itself an exponential backoff. That is, wait longer and longer between retries, without actually removing it. Probably not needed either, but I thought I mention it in case we come around and add a backoff mechanism later.
Also, nice use of the component testing setup! Let me know if you have any thoughts after using it. I have some ideas what I want to improve but I'd really like your input before I put too much else into it.
| let mut should_filter = true; | ||
| for filter in filters.iter_mut() { | ||
| if !filter(&send_message) { | ||
| tracing::info!(?from, ?to, log_msg, "Dropping a message because it didn't pass the test's filter"); | ||
| should_filter = false; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if !should_filter { |
There was a problem hiding this comment.
Code clarity:
- Isn't the naming of
should_filterthe wrong way around? (In my mind: To filter = not sending the message. The code only sends the message ifshould_filter = true) - Up to you which way you like it better but you could avoid the entire
should_filterboolean by using a loop label and a directcontinue 'outerin place of the break.
There was a problem hiding this comment.
yep, corrected it
There was a problem hiding this comment.
Ah, yes the code here locally now makes more sense. But you also inverted the meaning of the filter in general, which wasn't what I meant. I'd really prefer staying with the semantics of the standard library filter.
Can we keep it as filter returning true means it passes the filter? The local variable should_filter here can stay as it is, just needs inverted assignment. Or maybe should_drop or should_send would be clearer naming.
| let log = network.output.msg_log.lock().await; | ||
| log.iter() | ||
| .filter(|entry| entry.starts_with("Signature")) | ||
| .count() |
There was a problem hiding this comment.
side note: Should we change how logs are represented? I just wanted some dumb way of keeping a log of what happened, without coupling the testing closely with internal implementation types. Curious to hear what you think after you have written your first tests with this approach.
There was a problem hiding this comment.
yeah I think we need more robust typing or better convenience helpers for this in the long term. It's going to be hard to follow what this is doing
There was a problem hiding this comment.
Added a bullet point to possibly improve this quarter in #583.
|
|
||
| let request = self.organize_request(stable, participants, indexed, 0); | ||
| let mut request = self.organize_request(stable, participants, indexed, 0); | ||
| if request.indexed.timestamp_sign_queue.is_none() { |
There was a problem hiding this comment.
Why is this field optional in the first place? Can we make it non optional and ensure timestamp is added at the indexer level?
Removes total timeout from signature requests, where we have a exponential backoff for old requests that couldn't be fulfilled.
Each
SignRequestinside theSignQueuenow tracks attempts such that we can increase the delay on the next time they need to be processed.