Trp debug #901

Merged
scarmuega merged 19 commits into main from trp-debug
Feb 16, 2026

Conversation

@scarmuega scarmuega (Member) commented Feb 15, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added configurable mempool backend (in-memory or persistent storage)
    • Introduced transaction status tracking with lifecycle stages
    • Added RPC methods: trp.checkStatus, trp.dumpLogs, trp.peekPending, trp.peekInflight for mempool inspection
    • Implemented finalized transaction log with pagination
  • Improvements

    • Enhanced transaction state management and confirmation tracking

coderabbitai bot commented Feb 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a trait-based mempool with in-memory (EphemeralMempool) and Redb-backed implementations, wires mempool config and store into adapters and startup, updates callers to the new backend API, and exposes TRP RPC endpoints to inspect and page mempool state and finalized logs.
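
For orientation, here is a minimal sketch of the trait shape this walkthrough describes. Method names come from the summary above, but the exact signatures and types are assumptions, not the crate's actual API:

// Hypothetical shape of MempoolStore; the real definitions live in
// crates/core/src/mempool.rs and may differ.
use tokio::sync::broadcast;

pub type TxHash = [u8; 32];

#[derive(Clone, Debug)]
pub enum MempoolTxStage {
    Pending,
    Propagated,
    Acknowledged,
    Confirmed,
    Finalized,
    Unknown,
}

#[derive(Clone, Debug)]
pub struct MempoolTx {
    pub hash: TxHash,
    pub payload: Vec<u8>, // EraCbor in the real crate
    pub stage: MempoolTxStage,
    pub confirmations: u64,
}

#[derive(Clone, Debug)]
pub struct TxStatus {
    pub stage: MempoolTxStage,
    pub confirmations: u64,
}

#[derive(Clone, Debug)]
pub struct MempoolPage {
    pub items: Vec<MempoolTx>,
    pub next_cursor: Option<u64>,
}

#[derive(Debug)]
pub enum MempoolError {
    DuplicateTx,
    Internal(String),
}

pub trait MempoolStore {
    fn receive(&self, tx: MempoolTx) -> Result<(), MempoolError>;
    fn peek_pending(&self, limit: usize) -> Vec<MempoolTx>;
    fn peek_inflight(&self, limit: usize) -> Vec<MempoolTx>;
    fn mark_inflight(&self, hashes: &[TxHash]);
    fn mark_acknowledged(&self, hashes: &[TxHash]);
    fn check_status(&self, hash: &TxHash) -> TxStatus;
    fn dump_finalized(&self, cursor: u64, limit: usize) -> MempoolPage;
    fn subscribe(&self) -> broadcast::Receiver<MempoolTx>;
}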

Changes

  • Workspace manifests (crates/core/Cargo.toml, crates/redb3/Cargo.toml, crates/testing/Cargo.toml): Added workspace dependencies and testing dependency entries; manifest-only updates.
  • Core API & types (crates/core/src/mempool.rs, crates/core/src/lib.rs, crates/core/src/config.rs): Redesigned MempoolTx (stages, confirmations, timestamps), added TxStatus/MempoolPage/MempoolError, introduced the MempoolStore trait, added mempool store config and mempool_path, removed older Mempool/MempoolState trait usages.
  • In-memory backend (crates/core/src/builtin/mempool.rs, crates/core/src/builtin/mod.rs): New EphemeralMempool and EphemeralMempoolStream implementing MempoolStore with Arc state, broadcast events, finalized log, pagination, and full lifecycle transitions.
  • Redb persistent backend (crates/redb3/src/mempool.rs, crates/redb3/src/lib.rs): New RedbMempool and RedbMempoolStream implementing MempoolStore with persistent tables (pending/inflight/finalized_log), transactional moves, serialization, pagination, stream subscribe, and tests.
  • Storage adapter & wiring (src/adapters/storage.rs, src/adapters/mod.rs): Added MempoolBackend/MempoolStreamBackend enums, forwarding MempoolStore impls, and open_mempool_store startup wiring; Stores now include mempool, and DomainAdapter switched to MempoolBackend.
  • Consumers & runtime wiring (src/sync/submit.rs, src/sync/emulator.rs, src/bin/dolos/common.rs, src/serve/grpc/submit.rs): Switched consumers to MempoolBackend, replaced request/ack flows with peek_pending/mark_inflight/mark_acknowledged, added propagated_hashes tracking, and updated watcher/waiter stage sourcing to use tx.stage/check_status().
  • TRP RPC surface (crates/trp/src/lib.rs, crates/trp/src/methods.rs, crates/trp/src/error.rs): Added RPC endpoints trp.checkStatus, trp.dumpLogs, trp.peekPending, and trp.peekInflight with request/response types and metrics; added InvalidParams error mapping.
  • Testing & mocks (crates/testing/src/mempool.rs, crates/testing/src/harness/cardano.rs, crates/testing/src/toy_domain.rs, crates/testing/Cargo.toml): Expanded mock/harness to implement the MempoolStore trait as inert/no-op where appropriate; removed older apply/check_stage/pending APIs; adjusted harness drain/run logic; added test deps.
  • Error surface & small fixes (src/prelude.rs, crates/minibf/src/routes/tx/submit/mod.rs, tests/..., tests/epoch_pots/*): Added MempoolError to the root Error, mapped DuplicateTx to 409 CONFLICT, small test fixes, a CSV predicate call fix, and minor annotations.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TRP as TRP RPC
    participant Domain
    participant Mempool as MempoolStore
    participant Storage as Backend

    Client->>TRP: trp.submit(tx)
    TRP->>Domain: submit(tx)
    Domain->>Mempool: receive(tx)
    Mempool->>Storage: persist Pending
    Mempool-->>Domain: emit MempoolEvent
    Domain-->>Client: ack

    Client->>TRP: trp.peekPending()
    TRP->>Mempool: peek_pending(limit)
    Mempool->>Storage: read Pending
    Storage-->>Mempool: pending list
    Mempool-->>TRP: PeekPendingResponse

    Note over Mempool,Storage: lifecycle: Pending → Propagated → Acknowledged → Confirmed → Finalized

    Domain->>Mempool: mark_inflight(hashes)
    Mempool->>Storage: move Pending→Inflight (Propagated)
    Mempool-->>Domain: MempoolEvent

    Domain->>Mempool: confirm(point, seen, unseen)
    Mempool->>Storage: update confirmations / rollback unseen
    Mempool-->>Domain: MempoolEvent

    Client->>TRP: trp.dumpLogs(cursor,limit)
    TRP->>Mempool: dump_finalized(cursor,limit)
    Mempool->>Storage: paginate Finalized
    Storage-->>Mempool: entries
    Mempool-->>TRP: DumpLogsResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through crates with a curious squeak,
New stages and stores make the mempool speak,
Ephemeral hops, Redb burrows deep,
Streams sing events while RPCs peep,
A carrot for logs — may commits be quick! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 13.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): the title 'Trp debug' is vague and does not clearly convey the main purpose of this substantial PR, which implements a complete mempool storage abstraction with ephemeral and Redb backends, new TRP RPC endpoints, and significant refactoring across the codebase. Resolution: consider a more descriptive title such as 'Add mempool storage backends and TRP debug endpoints' or 'Implement MempoolStore trait with ephemeral and persistent backends'.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.
  • Merge Conflict Detection (✅ Passed): no merge conflicts detected when merging into main.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@scarmuega scarmuega marked this pull request as ready for review February 15, 2026 15:17
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/sync/submit.rs (1)

46-67: ⚠️ Potential issue | 🟠 Major

Transactions marked inflight before peer confirmation — risk of orphaned txs on send failure.

mark_inflight (line 54) moves transactions out of the pending queue before reply_tx_ids (lines 59–64) succeeds. If the peer send fails and the worker restarts (via or_restart), the new Worker begins with an empty propagated_hashes (line 144), while the txs remain in the inflight/Propagated state in the mempool. Since they're no longer pending, they won't be re-propagated. They can only recover if chain-sync observes them on-chain or rolls them back; otherwise they remain stuck indefinitely.

Consider either:

  1. Moving mark_inflight to after the successful peer reply, or
  2. Adding a recovery mechanism that re-queues orphaned inflight txs back to pending on worker restart (a rough sketch follows).
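
A rough sketch of option 2, assuming a hypothetical requeue_pending helper on the store (nothing in this review confirms such a method exists):

// Sketch only: MempoolStoreLike, recover_orphaned_inflight, and
// requeue_pending are illustrative stand-ins, not the reviewed API.
struct Tx {
    hash: [u8; 32],
}

trait MempoolStoreLike {
    fn peek_inflight(&self, limit: usize) -> Vec<Tx>;
    fn requeue_pending(&self, hash: &[u8; 32]); // hypothetical recovery hook
}

struct Worker<M> {
    mempool: M,
}

impl<M: MempoolStoreLike> Worker<M> {
    /// Run once on worker (re)start, before entering the submit loop.
    fn recover_orphaned_inflight(&self) {
        // Anything still inflight from a previous run was moved out of
        // pending but may never have reached a peer; push it back so the
        // normal peek_pending -> mark_inflight path re-propagates it.
        for tx in self.mempool.peek_inflight(usize::MAX) {
            self.mempool.requeue_pending(&tx.hash);
        }
    }
}
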
crates/core/src/mempool.rs (1)

257-301: ⚠️ Potential issue | 🟠 Major

exclude_inflight_stxis misleadingly scans only pending transactions, not inflight — inflight UTxOs are invisible during validation.

All three helper functions (scan_mempool_utxos, exclude_inflight_stxis, select_mempool_utxos) call mempool.peek_pending(usize::MAX) only. When transactions move to inflight via mark_inflight, they are removed from the pending queue and added to the inflight table. This creates two issues:

  • Double-spend risk: An incoming transaction can spend inputs already locked by inflight txs, since exclude_inflight_stxis doesn't check inflight state.
  • Broken chaining: An incoming transaction cannot chain from outputs produced by inflight txs.

The function name exclude_inflight_stxis is particularly misleading — it actually scans and excludes pending inputs, not inflight ones. The debug message at line 284 ("checking inflight tx") contradicts the implementation.

These functions should also scan peek_inflight when gathering UTxOs for validation, or the function name should be clarified to reflect that only pending transactions are considered.

🤖 Fix all issues with AI agents
In `@crates/core/src/builtin/mempool.rs`:
- Around line 91-103: The receive method on EphemeralMempool adds incoming
MempoolTxs unconditionally, causing duplicates; update EphemeralMempool::receive
to first read or write-lock self.state and check whether tx.hash already exists
in any of state.pending, state.proposed, or state.committed (or whichever state
collections track existing txs) and return Ok(()) early if found; only push to
state.pending, call self.notify(tx) and self.log_state(&state) when the tx is
new. Use the existing state variable, the pending field, and the receive method
to locate where to add this guard.
- Around line 20-26: MempoolState.finalized_log currently grows unbounded;
change the data structure or add pruning so old entries are dropped—e.g.,
replace finalized_log: Vec<MempoolTx> or keep it but enforce a max size constant
(MAX_FINALIZED_LOG_LEN) and trim older entries when appending in finalize()
and/or during housekeeping(): after pushing new MempoolTx, if
finalized_log.len() > MAX_FINALIZED_LOG_LEN remove oldest entries (or switch to
VecDeque and pop_front) to ensure the finalized_log size remains bounded and
memory usage is capped.
- Around line 247-275: check_status currently only looks in acknowledged,
inflight, and pending so finalized transactions moved by finalize() into
finalized_log return Unknown; update check_status to additionally check
state.finalized_log (e.g., state.finalized_log.get(tx_hash) or iter().find(...)
depending on its structure) before returning Unknown and, if found, return a
TxStatus with stage: MempoolTxStage::Finalized and the appropriate confirmations
and confirmed_at taken from the finalized_log entry; keep the existing
inflight/pending/acknowledged checks and use finalized_log as the final
fallback.

In `@crates/redb3/src/mempool.rs`:
- Around line 120-133: The current into_mempool_tx implementation uses
copy_from_slice and try_into().unwrap(), which can panic on malformed lengths;
change these to safe, non-panicking conversions: for the hash field, replace
hash_bytes.copy_from_slice(&self.hash) with a length check / try_into pattern
(e.g., if let Ok(arr) = self.hash.as_slice().try_into() { TxHash::from(arr) }
else { /* fallback: default hash or return Err */ }) and for confirmed_at (and
the other sites that call ChainPoint::from_bytes) replace
ChainPoint::from_bytes(b[..].try_into().unwrap()) with a safe match on
b[..].try_into() and only call ChainPoint::from_bytes when Ok(arr), otherwise
map to None or propagate an error; update the function signature to return
Result<MempoolTx, Error> if you prefer failing fast instead of using defaults,
and apply the same safe pattern to the other occurrences of
ChainPoint::from_bytes in the codebase.
- Around line 650-692: The implementation of check_status currently only queries
INFLIGHT_TABLE and PENDING_TABLE and returns MempoolTxStage::Unknown for
finalized transactions; update the check_status method to also open and query
FINALIZED_LOG_TABLE using the existing read transaction rx, lookup the tx_hash
key, deserialize the finalized entry (e.g., via the same pattern used for
InflightRecord or the appropriate FinalizedRecord type) and return its TxStatus
(mapping to MempoolTxStage::Finalized) when found; use the same error-tolerant
open_table/get logic as for INFLIGHT_TABLE and PENDING_TABLE so check_status
returns the finalized status instead of Unknown.

In `@crates/trp/src/methods.rs`:
- Around line 264-270: The code uses limit + 1 when calling mempool.peek_pending
(and similarly in trp_peek_inflight), which can overflow when params.limit is
usize::MAX; change the arithmetic to safe operations (e.g., compute let
peek_count = limit.saturating_add(1) or cap limit to a reasonable max before
adding) and pass peek_count to mempool.peek_pending; apply the same fix in
trp_peek_inflight to avoid panic/wraparound and preserve the has_more check
using peek_count instead of limit + 1.
🧹 Nitpick comments (10)
src/sync/emulator.rs (1)

52-77: Back-to-back mark_inflight → mark_acknowledged with no error handling.

In the emulator path, the peek_pending → mark_inflight → mark_acknowledged sequence runs without checking return values or propagating errors. This works for the emulator since there's no real network propagation, but silently ignoring potential failures could mask issues during development/debugging.

Consider at minimum logging if the mark operations affect fewer transactions than expected.

crates/trp/src/lib.rs (1)

83-145: New RPC registrations follow the established pattern consistently.

The four new endpoints (trp.checkStatus, trp.dumpLogs, trp.peekPending, trp.peekInflight) all mirror the existing trp.resolve/trp.submit registration pattern including metrics tracking and error handling.

The repetitive boilerplate across all six registrations could be extracted into a helper macro or closure, but this is optional given the existing codebase style.
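
If that extraction were ever wanted, something along these lines could work. The register_async_method shape and the metrics call are assumptions mirroring the description above, not the crate's actual API:

// Hypothetical helper; a macro body is only type-checked at expansion,
// so the exact params/context types are left to the call site.
macro_rules! register_trp_method {
    ($module:expr, $name:literal, $handler:path) => {
        $module.register_async_method($name, |params, context, _ext| async move {
            let _timer = context.metrics.start_timer($name); // assumed metrics hook
            $handler(params, &context).await
        })?;
    };
}

// register_trp_method!(module, "trp.checkStatus", trp_check_status);
// register_trp_method!(module, "trp.peekPending", trp_peek_pending);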

crates/core/src/lib.rs (1)

617-619: Hardcoded finalization threshold — consider making it configurable.

MEMPOOL_FINALIZATION_THRESHOLD is hardcoded to 10 confirmations. The other housekeeping parameters (max_history for WAL and archive) are config-driven. Consider adding this to MempoolStoreConfig for operational flexibility.
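
A minimal sketch of what that could look like, assuming MempoolStoreConfig is deserialized with serde; the field name and default helper are invented here:

use serde::Deserialize;

#[derive(Deserialize)]
pub struct MempoolStoreConfig {
    // ...existing backend/path fields elided...

    /// Confirmations required before a tx moves to the finalized log.
    #[serde(default = "default_finalization_threshold")]
    pub finalization_threshold: u64,
}

fn default_finalization_threshold() -> u64 {
    10 // matches the current hardcoded MEMPOOL_FINALIZATION_THRESHOLD
}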

src/sync/submit.rs (1)

80-106: Potential hot-loop with 10-second sleep as fallback.

The schedule_unfulfilled method sleeps 10 seconds when no pending txs are available (line 98). The TODO on lines 96–97 already notes the need to watch the mempool for changes. With the new subscribe() method on MempoolStore, this could be improved by awaiting the mempool event stream instead of polling.

Would you like me to draft an implementation that uses the mempool's subscribe() stream to wake up when new txs arrive?
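
As a rough illustration of that direction, assuming subscribe() hands back a tokio broadcast receiver (the event type here is a stand-in):

use std::time::Duration;
use tokio::sync::broadcast;

#[derive(Clone)]
pub struct MempoolEvent; // stand-in for the real event type

/// Wait until the mempool signals activity, keeping the old 10-second tick
/// as a fallback so a lagged or missed event can't stall the worker forever.
pub async fn wait_for_mempool_activity(events: &mut broadcast::Receiver<MempoolEvent>) {
    let _ = tokio::time::timeout(Duration::from_secs(10), events.recv()).await;
}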

crates/core/src/builtin/mempool.rs (2)

219-245: Redundant cloning in finalize — both finalized and event_tx are identical.

After acknowledged.remove(&hash), the owned tx is cloned twice to produce finalized and event_tx, both with stage set to Finalized. One clone can be eliminated:

Proposed simplification
         for hash in to_finalize {
             if let Some(tx) = state.acknowledged.remove(&hash) {
-                let mut finalized = tx.clone();
+                let mut finalized = tx;
                 finalized.stage = MempoolTxStage::Finalized;
-                state.finalized_log.push(finalized);
-                let mut event_tx = tx.clone();
-                event_tx.stage = MempoolTxStage::Finalized;
-                info!(tx.hash = %tx.hash, "tx finalized");
-                self.notify(event_tx);
+                info!(tx.hash = %finalized.hash, "tx finalized");
+                self.notify(finalized.clone());
+                state.finalized_log.push(finalized);
             }
         }

42-47: Broadcast channel capacity of 16 may be tight under load.

The broadcast channel is created with a capacity of 16 (line 44). During bursts (e.g., a block confirmation touching many acknowledged txs), multiple events are emitted in a single confirm() or finalize() call. Slow subscribers will receive Lagged errors, mapped to MempoolError::Internal. Consider making this configurable or using a larger default.

crates/trp/src/methods.rs (2)

139-176: No upper bound on the number of hashes in trp_check_status.

A client can submit an arbitrarily large hashes array, each triggering a check_status call (which in the Redb backend performs a read transaction with table scans). Consider adding a reasonable cap (e.g., 100) to prevent abuse.

Proposed cap on input size
     let params: CheckStatusParams = params.parse()?;
 
+    if params.hashes.len() > 100 {
+        return Err(Error::InvalidParams("too many hashes (max 100)".into()));
+    }
+
     let mempool = context.domain.mempool();

202-235: trp_dump_logs has no upper bound on limit.

A client could request limit: 999999999, causing a large allocation and heavy I/O in dump_finalized. Consider capping similarly to the peek endpoints.

Proposed cap
     let cursor = params.cursor.unwrap_or(0);
-    let limit = params.limit.unwrap_or(50);
+    let limit = params.limit.unwrap_or(50).min(1000);
     let include_payload = params.include_payload.unwrap_or(false);
crates/redb3/src/mempool.rs (2)

112-151: unwrap() on CBOR encode/decode may panic on corrupted persistent data.

serialize() and deserialize() in both FinalizedLogEntry and InflightRecord use unwrap(). For an in-memory-only store this is fine, but since RedbMempool is a persistent store, corrupted data on disk would cause a panic instead of a graceful error.

The same pattern appears in peek_pending (line 461) and mark_inflight (line 499).

Consider returning Result from deserialize and propagating errors, or at minimum logging and skipping corrupted entries.
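
A sketch of the log-and-skip option, assuming the record types derive minicbor::Decode (the helper name is illustrative):

// Decode one stored record, logging and skipping instead of panicking.
fn decode_or_skip<T>(raw: &[u8]) -> Option<T>
where
    T: for<'b> minicbor::Decode<'b, ()>,
{
    match minicbor::decode(raw) {
        Ok(value) => Some(value),
        Err(err) => {
            tracing::warn!(%err, "skipping corrupted mempool entry");
            None
        }
    }
}

// e.g. in peek_pending: entries.filter_map(|(_, v)| decode_or_skip(v.value()))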


368-399: with_write_tx silently swallows write errors.

When the closure or commit fails, the error is logged at warn level and the function returns without notifying the caller. Since the lifecycle methods (mark_inflight, mark_acknowledged, confirm, finalize) return (), there's no way to propagate the failure. This means state transitions can silently fail — e.g., a mark_inflight could fail to persist, leaving txs stuck in pending without any external indication.

This is a deliberate trade-off given the () return type, but worth documenting or revisiting if reliability is a concern.
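
If reliability becomes a concern, the method could be made fallible along these lines; the types beyond redb's are stand-ins, and the trait methods would need to grow Result returns as well:

use redb::{Database, WriteTransaction};

#[derive(Clone)]
struct MempoolTx; // stand-in

#[derive(Debug)]
struct MempoolError(String); // stand-in

struct RedbMempool {
    db: Database,
}

impl RedbMempool {
    fn with_write_tx<F>(&self, f: F) -> Result<Vec<MempoolTx>, MempoolError>
    where
        F: FnOnce(&WriteTransaction) -> Result<Vec<MempoolTx>, MempoolError>,
    {
        let wx = self
            .db
            .begin_write()
            .map_err(|e| MempoolError(e.to_string()))?;
        let events = f(&wx)?;
        wx.commit().map_err(|e| MempoolError(e.to_string()))?;
        // Callers now see (and can retry on) failures instead of a warn! log;
        // events are only surfaced once the state change is durable.
        Ok(events)
    }
}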

Comment on lines 120 to 133
fn into_mempool_tx(self) -> MempoolTx {
    let mut hash_bytes = [0u8; 32];
    hash_bytes.copy_from_slice(&self.hash);
    MempoolTx {
        hash: TxHash::from(hash_bytes),
        payload: self.payload.unwrap_or(EraCbor(0, vec![])),
        stage: MempoolTxStage::Finalized,
        confirmations: self.confirmations,
        confirmed_at: self.confirmed_at.map(|b| {
            ChainPoint::from_bytes(b[..].try_into().unwrap())
        }),
        report: None,
    }
}

⚠️ Potential issue | 🟡 Minor

try_into().unwrap() on confirmed_at bytes can panic if length is unexpected.

Lines 129, 185, and 211 all do ChainPoint::from_bytes(b[..].try_into().unwrap()). If a stored confirmed_at blob has an unexpected length (e.g., due to corruption or a schema change), this panics. Similarly, line 122 does hash_bytes.copy_from_slice(&self.hash) which panics if self.hash.len() != 32.
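
A non-panicking alternative for both spots, using the std slice-to-array TryFrom; whether to fall back to None or surface an error is a design choice:

/// Convert a stored blob into a fixed-size array, failing cleanly on a
/// length mismatch instead of panicking.
fn fixed_bytes<const N: usize>(raw: &[u8]) -> Option<[u8; N]> {
    raw.try_into().ok()
}

// hash:         fixed_bytes::<32>(&self.hash).map(TxHash::from)
// confirmed_at: self.confirmed_at.as_deref()
//                   .and_then(fixed_bytes)   // N inferred from from_bytes
//                   .map(ChainPoint::from_bytes)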

🤖 Prompt for AI Agents
In `@crates/redb3/src/mempool.rs` around lines 120 - 133, The current
into_mempool_tx implementation uses copy_from_slice and try_into().unwrap(),
which can panic on malformed lengths; change these to safe, non-panicking
conversions: for the hash field, replace hash_bytes.copy_from_slice(&self.hash)
with a length check / try_into pattern (e.g., if let Ok(arr) =
self.hash.as_slice().try_into() { TxHash::from(arr) } else { /* fallback:
default hash or return Err */ }) and for confirmed_at (and the other sites that
call ChainPoint::from_bytes) replace
ChainPoint::from_bytes(b[..].try_into().unwrap()) with a safe match on
b[..].try_into() and only call ChainPoint::from_bytes when Ok(arr), otherwise
map to None or propagate an error; update the function signature to return
Result<MempoolTx, Error> if you prefer failing fast instead of using defaults,
and apply the same safe pattern to the other occurrences of
ChainPoint::from_bytes in the codebase.

Comment on lines +264 to +270
    let limit = params.limit.unwrap_or(50);
    let include_payload = params.include_payload.unwrap_or(false);

    let mempool = context.domain.mempool();
    let peeked = mempool.peek_pending(limit + 1);

    let has_more = peeked.len() > limit;

⚠️ Potential issue | 🟠 Major

Integer overflow: limit + 1 can wrap around when limit == usize::MAX.

If a client sends a very large limit value, limit + 1 on line 268 will overflow (panic in debug, wrap to 0 in release). The same issue exists in trp_peek_inflight at line 320. Use saturating_add or cap the limit.

Proposed fix using saturating_add and a cap
     let limit = params.limit.unwrap_or(50);
+    let limit = limit.min(1000); // cap to a reasonable max
     let include_payload = params.include_payload.unwrap_or(false);
 
     let mempool = context.domain.mempool();
-    let peeked = mempool.peek_pending(limit + 1);
+    let peeked = mempool.peek_pending(limit.saturating_add(1));

Apply the same pattern to trp_peek_inflight.

🤖 Prompt for AI Agents
In `@crates/trp/src/methods.rs` around lines 264 - 270, The code uses limit + 1
when calling mempool.peek_pending (and similarly in trp_peek_inflight), which
can overflow when params.limit is usize::MAX; change the arithmetic to safe
operations (e.g., compute let peek_count = limit.saturating_add(1) or cap limit
to a reasonable max before adding) and pass peek_count to mempool.peek_pending;
apply the same fix in trp_peek_inflight to avoid panic/wraparound and preserve
the has_more check using peek_count instead of limit + 1.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/core/src/mempool.rs (1)

262-307: ⚠️ Potential issue | 🟠 Major

Mempool-aware UTxO scanning only considers pending transactions — inflight/acknowledged txs become invisible.

scan_mempool_utxos, exclude_inflight_stxis, and select_mempool_utxos all call mempool.peek_pending(usize::MAX) exclusively (lines 268, 291, 312). Once a transaction transitions from Pending to Propagated/Acknowledged/Confirmed via mark_inflight, its produced UTxOs and consumed inputs will no longer be accounted for by MempoolAwareUtxoStore.

This means a second mempool transaction that depends on outputs from a first (now-inflight) transaction will fail UTxO resolution. Consider also scanning inflight txs via peek_inflight.

#!/bin/bash
# Check if peek_inflight is used anywhere in the UTxO-aware store or related code
rg -n 'peek_inflight' --type=rust -C3
🤖 Fix all issues with AI agents
In `@crates/core/src/builtin/mempool.rs`:
- Around line 288-308: dump_finalized currently treats cursor as a positional
index into the VecDeque finalized_log which breaks when finalized_log is pruned;
change the ephemeral mempool to use a monotonic sequence number as the cursor:
add a monotonically increasing sequence counter on the mempool state and attach
the sequence (e.g., seq: u64) to each finalized item (MempoolTx or a small
wrapper stored in finalized_log), increment the counter in the finalize path
(where items are pushed and pruning via MAX_FINALIZED_LOG happens), and modify
dump_finalized to select items by seq >= cursor (not by index), return items in
seq order up to limit, and set next_cursor to the sequence after the last
returned item (or None if done) so cursors remain stable across pruning;
references: dump_finalized, finalized_log, MempoolPage, MempoolTx,
MAX_FINALIZED_LOG, RedbMempool.
- Around line 191-223: In confirm (EphemeralMempool::confirm) handle unseen_txs
rollbacks by moving the tx out of state.acknowledged and into state.pending so
it will be re-submitted: for each tx_hash in unseen_txs, take the entry from
state.acknowledged (get_mut or remove), set its stage to
MempoolTxStage::Pending, reset confirmations/confirmed_at as done, insert it
into state.pending (using the same key/TxHash), notify with the RolledBack stage
as before, and ensure the acknowledged entry is removed so the in-memory
behavior matches RedbMempool and the trait contract.

In `@crates/testing/src/harness/cardano.rs`:
- Around line 256-264: The code holds the write lock from
self.domain.write_chain() in variable chain and then calls
self.drain_with_callback() while that guard is still held, causing a deadlock
because drain_with_callback() re-acquires the same RwLock; fix by dropping the
write guard before calling drain_with_callback(): check can_receive_block()
while the guard exists, then explicitly drop(chain) (or close that scope) prior
to calling self.drain_with_callback(&mut on_work) and after the callback
finishes re-acquire the write lock (e.g., call self.domain.write_chain() again)
to call receive_block(block)? on the fresh guard.
🧹 Nitpick comments (4)
tests/epoch_pots/main.rs (2)

87-105: SeedConfig appears to be entirely unused — consider removing it instead of suppressing the warning.

config.seeds is never accessed in run_epoch_pots_test (only config.snapshots is used at line 418). If this struct is being kept for future use, the #[allow(dead_code)] annotations are fine, but removing unused code is generally preferable to silencing warnings.


404-415: Suppressing clippy::too_many_arguments is acceptable for test code.

If this function's parameter list grows further, consider grouping the ground-truth CSV strings into a small struct (e.g., GroundTruth { epochs, pparams, eras, delegation, stake, rewards }), which would also make the macro-generated call sites cleaner.

crates/redb3/src/mempool.rs (2)

401-425: receive_inner performs a full table scan for duplicate detection — O(n) per receive.

Lines 408–414 iterate every entry in PENDING_TABLE to check for a duplicate hash. Since the pending table key is [seq ++ hash] (ordered by sequence), there's no efficient hash-based lookup. Under high submission rates, this linear scan could become a bottleneck.

Consider maintaining a secondary hash index (e.g., a separate PENDING_HASH_TABLE mapping hash → seq) or also checking the INFLIGHT_TABLE (which is keyed by hash) before scanning pending.


368-399: with_write_tx silently swallows all errors — callers have no indication of failure.

mark_inflight, mark_acknowledged, confirm, and finalize all use with_write_tx, which logs a warning and returns without propagating errors. Since the trait methods return (), there's no way to signal failure to the caller. A failed confirm or finalize, for instance, would silently leave the mempool in an inconsistent state.

This is a design constraint from the trait, but worth noting: if a write fails, the in-memory broadcast events are also skipped (lines 396–398), so subscribers won't be notified — but the caller also won't know to retry. Consider adding metrics or a health flag for operational visibility.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (4)
crates/core/src/builtin/mempool.rs (1)

96-109: Duplicate check only inspects pending, not inflight or acknowledged.

If a transaction was already promoted to inflight/acknowledged, a second receive call with the same hash would succeed and create a duplicate in pending. The Redb backend has the same limitation (only checks pending table), so this is at least consistent, but worth noting for the overall design.

crates/core/src/mempool.rs (1)

286-331: peek_pending(usize::MAX) may be expensive with a large Redb-backed mempool.

scan_mempool_utxos, exclude_inflight_stxis, and select_mempool_utxos all call mempool.peek_pending(usize::MAX), which loads every pending tx into memory. For the Redb backend, this means a full table scan and deserialization of all entries. This is acceptable for a small mempool, but worth keeping in mind if the pending queue grows large.

crates/redb3/src/mempool.rs (2)

383-414: with_write_tx silently swallows errors — operations may silently fail.

If the write transaction or commit fails, the method logs a warning and returns without propagating the error. This means mark_inflight, mark_acknowledged, confirm, and finalize can silently drop updates. This is a deliberate "best effort" design, but callers have no way to detect or retry failures.

Consider whether at least confirm and finalize should propagate errors, since silent data loss in these paths could lead to transactions being stuck in an incorrect state indefinitely.


416-440: receive_inner scans the full pending table for duplicate detection.

The duplicate check iterates every entry in PENDING_TABLE (line 423) to compare hashes. With the composite PendingKey layout (seq + hash), there's no direct hash-based lookup. For a small mempool this is fine, but it becomes O(n) per receive call. If the pending queue grows large, consider a secondary index or a separate hash-set table for O(1) dedup.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@crates/redb3/src/mempool.rs`:
- Around line 725-756: with_write_tx currently swallows all errors (begin_write,
closure f, and wx.commit) by logging a warning and returning (), which makes
callers like mark_inflight, mark_acknowledged, confirm, and finalize unaware of
failures; change with_write_tx to return a Result<Vec<MempoolTx>,
RedbMempoolError> (or Result<(), RedbMempoolError> depending on caller
expectations), propagate the underlying errors from self.db.begin_write(), the
closure f(&wx), and wx.commit() instead of only warn!-logging, and update
callers (mark_inflight, mark_acknowledged, confirm, finalize) to handle or
propagate the Result so failures aren’t silently ignored.
- Around line 758-767: The receive_inner method currently only checks
PendingTable::contains and can insert a tx that already exists in InflightTable;
update receive_inner to also query InflightTable::contains(&wx, &tx.hash) (using
the same write transaction) and return MempoolError::DuplicateTx if present
before calling PendingTable::insert, ensuring no duplicate appears in both
tables and preserving the existing wx.commit() and self.notify(tx) flow.
🧹 Nitpick comments (4)
crates/redb3/src/mempool.rs (4)

445-468: PendingTable::contains and contains_hash do a full table scan to find a hash.

Because the pending table is keyed by (seq, tx_hash), every hash lookup requires iterating all entries — O(n) per call. This is called on every receive and will degrade as the pending queue grows. Consider adding a secondary index table (hash → seq) or restructuring the key to allow direct hash lookups.
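
A sketch of that secondary index, maintained inside the same write transaction (the table name and error plumbing are illustrative):

use redb::{ReadableTable, TableDefinition, WriteTransaction};

// hash -> seq; lets receive/check_status do point lookups instead of scans.
const PENDING_HASH_IDX: TableDefinition<&[u8], u64> =
    TableDefinition::new("pending_hash_idx");

/// Returns Ok(false) without inserting when the hash is already pending,
/// so the caller can map that to MempoolError::DuplicateTx. The index must
/// also be cleared when entries move out of pending (omitted here).
fn try_index_pending(
    wx: &WriteTransaction,
    hash: &[u8; 32],
    seq: u64,
) -> Result<bool, redb::Error> {
    let mut idx = wx.open_table(PENDING_HASH_IDX)?;
    if idx.get(hash.as_slice())?.is_some() {
        return Ok(false);
    }
    idx.insert(hash.as_slice(), seq)?;
    Ok(true)
}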


698-699: Broadcast channel capacity of 16 may be too small for burst scenarios.

If mempool operations produce events faster than subscribers consume them (e.g., a batch finalize of many transactions), the BroadcastStream will return Lagged errors and subscribers will miss events. Consider making the capacity configurable or using a larger default.


843-874: Confirm logic collects all inflight entries on every call.

InflightTable::collect_all(wx) (line 848) reads every inflight record into memory on each confirm call, even though only the seen_txs and unseen_txs sets are relevant. For a large inflight table, this is wasteful. Consider iterating only the hashes present in seen_txs and unseen_txs, and separately handling the "stale" mark for remaining entries.


876-895: finalize also collects all inflight entries into memory.

Same pattern as confirmcollect_all loads everything. For large inflight tables, consider iterating in-place or only selecting confirmed entries.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
crates/redb3/src/mempool.rs (3)

445-468: PendingTable::contains / contains_hash perform O(n) full-table scans.

Because DbPendingKey orders by sequence number first, there is no efficient index on the tx hash. Both contains (called on every receive) and contains_hash (called on every check_status) iterate the entire pending table. Under sustained mempool load this will become a bottleneck.

Consider adding a secondary lookup table (e.g., TableDefinition<DbTxHash, ()>) that maps hash → existence, enabling O(log n) duplicate checks while keeping the seq-ordered table for FIFO iteration.


876-895: finalize clones the payload unnecessarily.

record.to_mempool_tx(hash) on line 883 clones the payload, then record.into_finalized_entry(hash) on line 884 consumes the original. You can avoid the extra allocation by consuming the record first and building the event MempoolTx from the FinalizedEntry fields (or by splitting differently).

♻️ Sketch
 for (hash, record) in entries {
     if record.is_finalizable(threshold) {
-        let mut tx = record.to_mempool_tx(hash);
-        let log_entry = record.into_finalized_entry(hash);
+        let log_entry = record.into_finalized_entry(hash);
+        let mut tx = log_entry.into_mempool_tx();
+        // into_mempool_tx already sets stage = Finalized
         InflightTable::remove(wx, &hash)?;
-        FinalizedTable::append(wx, log_entry)?;
-        tx.stage = MempoolTxStage::Finalized;
+        FinalizedTable::append(wx, log_entry)?;  // ← but log_entry is consumed above

This requires either cloning log_entry (cheaper if the payload is stored by ref) or extracting the event fields before appending. One clean option: have into_finalized_entry build the MempoolTx event directly and return it alongside the log entry.


698-699: Broadcast channel capacity of 16 may be too small under load.

If consumers are slow, BroadcastStream receivers will get Lagged errors and miss events. With high tx throughput (e.g., a burst of confirmations or finalizations), 16 slots can fill quickly. Consider making this configurable via RedbMempoolConfig or increasing the default.
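
One low-effort variant: name the capacity and size it for a worst-case burst, then wire it into RedbMempoolConfig later with the usual serde(default) pattern. The constant and its sizing here are assumptions:

use tokio::sync::broadcast;

/// Event channel capacity, sized so a burst of per-tx events from a single
/// block confirmation doesn't immediately lag slow subscribers.
const EVENT_CHANNEL_CAPACITY: usize = 256;

fn make_event_channel<T: Clone>() -> (broadcast::Sender<T>, broadcast::Receiver<T>) {
    broadcast::channel(EVENT_CHANNEL_CAPACITY)
}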

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/core/src/builtin/mempool.rs`:
- Around line 1-6: The PR must run and pass workspace lint/build/test checks;
run the specified commands (cargo clippy --workspace --all-targets
--all-features, cargo build --workspace --all-targets --all-features, cargo test
--workspace --all-features) and fix any clippy warnings, build errors, or
failing tests introduced by changes (start by checking the crate containing
builtin/mempool.rs and any touched modules), ensuring the workspace compiles
cleanly and tests pass before merging.
🧹 Nitpick comments (3)
skills/redb-patterns-and-practices/SKILL.md (3)

19-62: Add guidance on data validation to prevent read panics.

The example implementations use unwrap() in from_bytes and as_bytes (lines 29, 56, 59), which is unavoidable given redb's trait API. However, this means corrupted or invalid data will panic on read. Consider adding a subsection noting:

  • Data should be validated before writes
  • Invalid CBOR or mismatched array lengths will panic on deserialization
  • Callers of table.insert() should ensure domain invariants hold
📝 Suggested addition after line 62
 }
+
+### Validation considerations
+
+Since `redb::Value::from_bytes` does not return `Result`, any decode/parse failure will panic. To prevent read-time panics:
+- Validate domain constraints before calling `table.insert()`.
+- Ensure CBOR-encoded values are well-formed before writes.
+- For fixed-width keys, ensure byte slices match the expected length.

3-3: Clarify "implementation-agnostic" claim.

The statement "These are implementation-agnostic" is potentially misleading. These patterns are specific to redb and not applicable to other storage backends like fjall. Consider rephrasing to: "These apply to all redb-backed storage modules within this crate" or similar.

Based on learnings, storage implementations should maintain consistency between redb3 and fjall backends, but this document only addresses redb3 patterns.

📝 Proposed clarification
-Rules and patterns extracted from `crates/redb3/` (wal, state, archive, mempool modules). These are **implementation-agnostic** — they apply whenever adding or modifying redb-backed storage in this crate.
+Rules and patterns extracted from `crates/redb3/` (wal, state, archive, mempool modules). These patterns apply to all modules within this crate that use redb-backed storage (wal, state, archive, mempool, etc.).

265-278: Consider symbol-based references for maintainability.

The reference table uses hardcoded line ranges (e.g., "~64-131") which will drift as the codebase evolves. While the "~" prefix signals approximate ranges, these references require manual upkeep. Consider alternatively referencing by symbol/struct name (e.g., "DbChainPoint impl") to reduce maintenance burden, or accept this as a living-document trade-off.

Comment on lines +1 to +6
//! Built-in in-memory mempool implementation.
//!
//! A basic FIFO mempool backed by in-memory data structures and a
//! broadcast channel for event notifications. Suitable for single-node
//! deployments and development/testing.

⚠️ Potential issue | 🟡 Minor

Run required workspace checks (clippy/build/test).

Please run the required Rust workspace checks before merging.

#!/bin/bash
cargo clippy --workspace --all-targets --all-features
cargo build --workspace --all-targets --all-features
cargo test --workspace --all-features

As per coding guidelines: “Run cargo clippy --workspace --all-targets --all-features and resolve all clippy warnings before committing changes. Ensure the project builds without warnings by running cargo build --workspace --all-targets --all-features. Run cargo test --workspace --all-features to verify functionality of all changes”.

🤖 Prompt for AI Agents
In `@crates/core/src/builtin/mempool.rs` around lines 1 - 6, The PR must run and
pass workspace lint/build/test checks; run the specified commands (cargo clippy
--workspace --all-targets --all-features, cargo build --workspace --all-targets
--all-features, cargo test --workspace --all-features) and fix any clippy
warnings, build errors, or failing tests introduced by changes (start by
checking the crate containing builtin/mempool.rs and any touched modules),
ensuring the workspace compiles cleanly and tests pass before merging.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/core/src/mempool.rs (1)

298-362: ⚠️ Potential issue | 🟠 Major

Include inflight txs in mempool-aware UTxO scans.

scan_mempool_utxos, exclude_inflight_stxis, and select_mempool_utxos only iterate peek_pending(). Once a tx moves to Propagated/Acknowledged, its produced UTxOs and spent inputs are ignored, which can cause invalid validation of chained txs. Consider iterating over both pending and inflight/acknowledged.

🐛 Proposed fix (scan both pending and inflight)
-    for mtx in mempool.peek_pending(usize::MAX) {
+    for mtx in mempool
+        .peek_pending(usize::MAX)
+        .into_iter()
+        .chain(mempool.peek_inflight(usize::MAX))
+    {

Apply the same pattern in:

  • exclude_inflight_stxis (Line 321)
  • select_mempool_utxos (Line 342)
🤖 Fix all issues with AI agents
In `@crates/redb3/src/mempool.rs`:
- Around line 205-222: The from_bytes implementations (e.g., fn from_bytes for
DbEraCbor, InflightRecord, FinalizedEntry) currently call
minicbor::decode(data).unwrap() which can panic on malformed CBOR; replace the
unwrap with error handling: attempt minicbor::decode(data) and on Err either
return a safe default/sentinel instance (implement Default for the affected
wrapper types if needed) or propagate/log the decode error via your error
wrapper so the DB read doesn't panic. Ensure you update the matching type
constructors (DbEraCbor(...), InflightRecord(...), FinalizedEntry(...)) to
accept the fallback/default value or return a Result if you choose propagation,
and add logging of the corruption detail when handling the error.
🧹 Nitpick comments (2)
crates/redb3/src/mempool.rs (1)

736-747: Broadcast channel capacity of 16 may be insufficient under load.

The broadcast channel created at line 738 has a capacity of 16. During high-throughput scenarios (e.g., many transactions being processed), slow subscribers could cause Lagged errors. Consider making this configurable or using a larger default.

src/adapters/storage.rs (1)

106-119: unwrap_or_default() may silently produce empty path.

At line 112, config.storage.mempool_path().unwrap_or_default() returns an empty PathBuf if the path resolution fails. This could lead to attempting to create a database at the current directory rather than failing explicitly. Other store openers (e.g., open_wal_store at line 75) have the same pattern, so this appears intentional, but consider whether explicit failure would be safer for persistent stores.

Comment on lines +205 to +222
    fn from_bytes<'a>(data: &'a [u8]) -> Self::SelfType<'a>
    where
        Self: 'a,
    {
        Self(minicbor::decode(data).unwrap())
    }

    fn as_bytes<'a, 'b: 'a>(value: &'a Self::SelfType<'b>) -> Self::AsBytes<'a>
    where
        Self: 'b,
    {
        minicbor::to_vec(&value.0).unwrap()
    }

    fn type_name() -> redb::TypeName {
        redb::TypeName::new("mempool_era_cbor")
    }
}

⚠️ Potential issue | 🟡 Minor

minicbor::decode().unwrap() can panic on malformed CBOR.

The from_bytes implementations for DbEraCbor, InflightRecord, and FinalizedEntry (lines 209, 268, 316) use minicbor::decode(data).unwrap() which will panic if the stored data is corrupted or incompatible with the current schema.

Consider returning a default/sentinel value or propagating the error through a wrapper that logs the corruption.

🤖 Prompt for AI Agents
In `@crates/redb3/src/mempool.rs` around lines 205 - 222, The from_bytes
implementations (e.g., fn from_bytes for DbEraCbor, InflightRecord,
FinalizedEntry) currently call minicbor::decode(data).unwrap() which can panic
on malformed CBOR; replace the unwrap with error handling: attempt
minicbor::decode(data) and on Err either return a safe default/sentinel instance
(implement Default for the affected wrapper types if needed) or propagate/log
the decode error via your error wrapper so the DB read doesn't panic. Ensure you
update the matching type constructors (DbEraCbor(...), InflightRecord(...),
FinalizedEntry(...)) to accept the fallback/default value or return a Result if
you choose propagation, and add logging of the corruption detail when handling
the error.

@scarmuega scarmuega merged commit 6137e12 into main Feb 16, 2026
12 checks passed