[E5] Live mempool tracking — Phase 0/1 testing scaffold (Alchemy WS + MEV-Share SSE + pending-tx decoder + post-state sim) #117

@0xfandom

Context

Aether's current detection pipeline reacts to confirmed Sync / Swap events — i.e. it sees pool moves only after the block lands. To compete in live MEV (backrun-class arbitrage), we need to see pending victim swaps before the next block, simulate the post-tx state, and detect arbs against that state.

This issue is the free-tier testing scaffold that validates the entire plumbing stack — pending-tx subscription, calldata decoding, post-state simulation, and metrics — without paying for a private mempool feed yet. Once the scaffold proves out (we observe pending DEX swaps in real time and can run Bellman-Ford on the simulated post-state), a separate issue will scope the paid-feed integration (Chainbound Fiber / bloXroute / self-hosted Reth sentries — see the standalone deep-dive report for that).

What this issue ships

A log-only, zero-execution-risk pipeline:

Alchemy WS pending stream ──┐
                            ├─► PendingTxDecoder ─► PostStateSim ─► Detector ─► aether_pending_arb_candidates_total
Flashbots MEV-Share SSE ────┘                                                    (count only — no bundle build, no submission)

No backrun bundle is constructed. No bundle is submitted. The output of this issue is metrics + structured logs that prove every stage of the pipeline works, so the paid-feed swap is a one-line config change later.

Scope

Phase A — Pending-tx ingestion (Rust)

  • New crates/ingestion/src/mempool.rs with a PendingTxStream trait + an AlchemyPendingStream impl.
  • Subscribe via eth_subscribe with alchemy_pendingTransactions and toAddress filtered to UniV2 Router02, UniV3 SwapRouter, UniV3 SwapRouter02, SushiSwap Router, Curve registry routers, Balancer Vault, 1inch AggregationRouter.
  • tokio::sync::broadcast channel pending_tx_tx so multiple sources (Alchemy now, a future Fiber stream or sentry later) can fan in and multiple consumers (starting with the decoder) can fan out without re-subscribing.
  • Reuses the existing multi-node node_pool.rs reconnect / health-state-machine logic.
  • Per-source dedup keyed on tx hash.
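
A minimal, dependency-free sketch of two Phase A pieces: the eth_subscribe request body for Alchemy's filtered pending-tx stream, and a bounded dedup keyed on tx hash. The names subscribe_request and SeenTxs, the capacity-based eviction, and the hand-built JSON are assumptions for illustration; the real mempool.rs would presumably use serde and the existing node_pool.rs connection handling.

```rust
use std::collections::{HashSet, VecDeque};

/// 32-byte transaction hash.
type TxHash = [u8; 32];

/// Builds the JSON-RPC request for Alchemy's filtered pending-tx subscription.
/// `router_addresses` are 0x-prefixed hex strings supplied by the caller.
fn subscribe_request(id: u64, router_addresses: &[&str]) -> String {
    let to: Vec<String> = router_addresses
        .iter()
        .map(|a| format!("\"{}\"", a))
        .collect();
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"eth_subscribe\",\"params\":[\"alchemy_pendingTransactions\",{{\"toAddress\":[{}],\"hashesOnly\":false}}]}}",
        id,
        to.join(",")
    )
}

/// Bounded first-seen filter (hypothetical): remembers the last `capacity` hashes.
struct SeenTxs {
    capacity: usize,
    set: HashSet<TxHash>,
    order: VecDeque<TxHash>,
}

impl SeenTxs {
    fn new(capacity: usize) -> Self {
        Self { capacity, set: HashSet::new(), order: VecDeque::new() }
    }

    /// Returns true the first time a hash is seen, false for a duplicate.
    fn insert(&mut self, hash: TxHash) -> bool {
        if !self.set.insert(hash) {
            return false;
        }
        self.order.push_back(hash);
        // Evict the oldest hash once the window is full so memory stays bounded.
        if self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.set.remove(&oldest);
            }
        }
        true
    }
}
```

For per-source dedup, one SeenTxs instance would be kept per stream, and only hashes for which insert returns true would be forwarded to pending_tx_tx.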

Phase B — Calldata decoder (Rust)

  • New crates/pools/src/router_decoder.rs with alloy::sol! ABIs for the swap selectors of the 7 routers listed in Phase A.
  • Decode each pending tx → (pool_address, token_in, token_out, amount_in, deadline).
  • Drop tx if pool is not in the registry; emit aether_pending_dex_tx_total{router, pool, decoded}.
  • Emit a decode_failure counter so we can see the long tail of unsupported router shapes (1inch v6 multi-step, Balancer batch swaps, etc.).
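
To make the decode step concrete, here is a hand-rolled sketch for a single router shape, UniV2's swapExactTokensForTokens (selector 0x38ed1739). The issue proposes alloy::sol! for this; the manual ABI walk below is only a dependency-free illustration. DecodedSwap, DecodeError, decode_swap and example_calldata are hypothetical names, and pool_address resolution (token pair to registry lookup) is left out.

```rust
use std::convert::TryFrom;

/// Selector of swapExactTokensForTokens(uint256,uint256,address[],address,uint256).
const SWAP_EXACT_TOKENS_FOR_TOKENS: [u8; 4] = [0x38, 0xed, 0x17, 0x39];

type Address = [u8; 20];

#[derive(Debug, PartialEq)]
struct DecodedSwap {
    token_in: Address,
    token_out: Address,
    amount_in: u128,
    deadline: u64,
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    UnknownSelector,
    Truncated,
    Overflow,
    BadPath,
}

/// Returns the 32-byte ABI word starting at byte `offset` of the argument area.
fn word_at(args: &[u8], offset: usize) -> Result<&[u8], DecodeError> {
    let end = offset.checked_add(32).ok_or(DecodeError::Truncated)?;
    args.get(offset..end).ok_or(DecodeError::Truncated)
}

/// Reads a uint256 word as u128, rejecting values that do not fit.
fn as_u128(word: &[u8]) -> Result<u128, DecodeError> {
    if word[..16].iter().any(|b| *b != 0) {
        return Err(DecodeError::Overflow);
    }
    let mut buf = [0u8; 16];
    buf.copy_from_slice(&word[16..]);
    Ok(u128::from_be_bytes(buf))
}

/// Takes the low 20 bytes of a word as an address.
fn as_address(word: &[u8]) -> Address {
    let mut addr = [0u8; 20];
    addr.copy_from_slice(&word[12..]);
    addr
}

fn decode_swap(calldata: &[u8]) -> Result<DecodedSwap, DecodeError> {
    if calldata.len() < 4 {
        return Err(DecodeError::Truncated);
    }
    let (selector, args) = calldata.split_at(4);
    if selector != &SWAP_EXACT_TOKENS_FOR_TOKENS[..] {
        return Err(DecodeError::UnknownSelector);
    }
    // Head layout: amountIn, amountOutMin, offset(path), to, deadline.
    let amount_in = as_u128(word_at(args, 0)?)?;
    let path_offset = usize::try_from(as_u128(word_at(args, 64)?)?).map_err(|_| DecodeError::Overflow)?;
    let deadline = u64::try_from(as_u128(word_at(args, 128)?)?).map_err(|_| DecodeError::Overflow)?;
    // Tail: path length followed by one word per token address.
    let len = usize::try_from(as_u128(word_at(args, path_offset)?)?).map_err(|_| DecodeError::Overflow)?;
    if len < 2 {
        return Err(DecodeError::BadPath);
    }
    let first = path_offset.checked_add(32).ok_or(DecodeError::Truncated)?;
    let last = len.checked_mul(32).and_then(|n| n.checked_add(path_offset)).ok_or(DecodeError::Truncated)?;
    Ok(DecodedSwap {
        token_in: as_address(word_at(args, first)?),
        token_out: as_address(word_at(args, last)?),
        amount_in,
        deadline,
    })
}

/// Builds calldata for a two-token path; exists only to exercise the decoder.
fn example_calldata(amount_in: u128, token_in: Address, token_out: Address, deadline: u64) -> Vec<u8> {
    let u = |v: u128| { let mut w = [0u8; 32]; w[16..].copy_from_slice(&v.to_be_bytes()); w };
    let a = |x: Address| { let mut w = [0u8; 32]; w[12..].copy_from_slice(&x); w };
    let mut data = SWAP_EXACT_TOKENS_FOR_TOKENS.to_vec();
    // Five head words (160 bytes), then the path: length 2 and two addresses.
    for w in [u(amount_in), u(0), u(160), a([0x99; 20]), u(u128::from(deadline)), u(2), a(token_in), a(token_out)] {
        data.extend_from_slice(&w);
    }
    data
}
```

Each Err variant maps naturally onto the decode_failure counter, and an UnknownSelector result is where the long tail of unsupported router shapes would show up. A multi-hop path touches several pools, so the single pool_address in the tuple above would need to become a list for those swaps.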

Phase C — Post-state simulation (Rust)

  • Extend crates/simulator with simulate_pending_then_detect:
    1. Fork latest block via existing revm CacheDB + EthersDB path.
    2. Apply the pending tx in the forked EVM.
    3. Read post-state pool reserves for every affected pool.
    4. Run Bellman-Ford on the post-state subgraph (reuse the existing BellmanFord from crates/detector).
  • Emit aether_pending_arb_candidates_total{router, profit_bucket} and structured log MEMPOOL ARB CANDIDATE with arb_id, victim_tx_hash, hops, gross_profit_wei, sim_us.
  • Strictly log-only. Do NOT publish to the gRPC arb stream — the Go executor must remain unaware in this phase to avoid accidental live submission.
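
The four steps above can be sketched without revm to show why the post-state matters. This sketch swaps the forked-EVM step for closed-form UniV2 constant-product math and uses f64 reserves for brevity (real code would use U256 and the existing BellmanFord from crates/detector). Pool, FEE, apply_swap_0_for_1, edges and has_negative_cycle are hypothetical names, and the edge weights use marginal rates, which ignore the arb's own price impact.

```rust
/// Constant-product pool between two token indices (illustrative only).
#[derive(Clone, Copy, Debug)]
struct Pool {
    token0: usize,
    token1: usize,
    reserve0: f64,
    reserve1: f64,
}

/// UniV2-style 0.3% swap fee, expressed as the fraction of input that trades.
const FEE: f64 = 0.997;

/// Post-state reserves after a victim sells `amount_in` of token0 into the pool.
fn apply_swap_0_for_1(p: Pool, amount_in: f64) -> Pool {
    let in_with_fee = amount_in * FEE;
    let out = in_with_fee * p.reserve1 / (p.reserve0 + in_with_fee);
    Pool { reserve0: p.reserve0 + amount_in, reserve1: p.reserve1 - out, ..p }
}

/// Directed edges weighted by -ln(marginal rate after fee).
/// A cycle whose weights sum below zero multiplies to a rate above 1, i.e. an arb.
fn edges(pools: &[Pool]) -> Vec<(usize, usize, f64)> {
    let mut out = Vec::new();
    for p in pools {
        out.push((p.token0, p.token1, -(FEE * p.reserve1 / p.reserve0).ln()));
        out.push((p.token1, p.token0, -(FEE * p.reserve0 / p.reserve1).ln()));
    }
    out
}

/// Bellman-Ford negative-cycle check over `n` tokens.
fn has_negative_cycle(n: usize, edges: &[(usize, usize, f64)]) -> bool {
    const EPS: f64 = 1e-12; // tolerance for floating-point noise
    // Starting every node at 0 is equivalent to a virtual source wired to all nodes.
    let mut dist = vec![0.0f64; n];
    for _ in 0..n.saturating_sub(1) {
        let mut changed = false;
        for &(u, v, w) in edges {
            if dist[u] + w < dist[v] - EPS {
                dist[v] = dist[u] + w;
                changed = true;
            }
        }
        if !changed {
            return false; // converged early, so no negative cycle exists
        }
    }
    // Any edge that still relaxes after n-1 rounds lies on or behind a negative cycle.
    edges.iter().any(|&(u, v, w)| dist[u] + w < dist[v] - EPS)
}
```

With three balanced pools (WETH/USDC, USDC/DAI, WETH/DAI) the check finds nothing, because every cycle pays the fee three times. After a victim sells 100 WETH into a 1,000 WETH / 2,000,000 USDC pool, WETH is cheap there relative to the WETH/DAI pool and the same check reports a cycle. That candidate only exists in the simulated post-state, which is what the confirmed-event pipeline cannot see.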

Phase D — MEV-Share SSE consumer (Go)

  • New cmd/monitor/mev_share.go consuming https://mev-share.flashbots.net SSE.
  • Decode hints (tx_hash, function_selector, optional calldata, optional logs).
  • Emit aether_mev_share_hints_total{has_calldata, has_logs}.
  • Cross-check: when a hint and an Alchemy pending-tx point at the same tx_hash, log first-seen latency delta into aether_mempool_first_seen_delta_ms{source} so we have data on which signal is faster, per source.
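
The cross-check in the last bullet reduces to a small piece of bookkeeping. Phase D itself is Go; the sketch below is written in Rust only to keep the examples in this issue in one language, and the logic ports directly. FirstSeen, Source and observe are hypothetical names, and labelling the delta with the faster source is an assumption about what the {source} label should carry.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Source {
    Alchemy,
    MevShare,
}

/// Remembers the first sighting of each tx hash until the other source reports it.
struct FirstSeen {
    first: HashMap<[u8; 32], (Source, u64)>,
}

impl FirstSeen {
    fn new() -> Self {
        Self { first: HashMap::new() }
    }

    /// Records a sighting at `at_ms` (any monotonic millisecond clock).
    /// Returns `Some((faster_source, delta_ms))` once both sources have reported the hash.
    fn observe(&mut self, hash: [u8; 32], source: Source, at_ms: u64) -> Option<(Source, u64)> {
        match self.first.get(&hash).copied() {
            // First sighting from any source: remember it and wait for the other one.
            None => {
                self.first.insert(hash, (source, at_ms));
                None
            }
            // Repeat from the same source: not a cross-source pair, ignore.
            Some((first_source, _)) if first_source == source => None,
            // The other source caught up: emit the lag and forget the hash.
            Some((first_source, first_ms)) => {
                self.first.remove(&hash);
                Some((first_source, at_ms.saturating_sub(first_ms)))
            }
        }
    }
}
```

The returned delta is what would be observed into aether_mempool_first_seen_delta_ms. Hashes seen by only one source stay in the map in this sketch, so a real consumer would also need time-based eviction.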

Phase E — Metrics + Grafana panel

  • New panel on the existing observability dashboard (or a sub-dashboard Mempool — testing) showing:
    • rate(aether_pending_dex_tx_total[1m]) per router
    • rate(aether_pending_arb_candidates_total[1m]) per profit bucket
    • rate(aether_mev_share_hints_total[1m])
    • First-seen latency histogram (Alchemy vs MEV-Share)
    • Decoder failure rate

Acceptance Criteria

  • aether-rust subscribes to Alchemy alchemy_pendingTransactions WS at startup when MEMPOOL_TRACKING=1 is set; binary boots identically when unset (zero behaviour change for current users)
  • At least 95% of pending txs against the 7 supported routers decode cleanly (verified over a 1-hour staging run)
  • simulate_pending_then_detect produces a non-zero aether_pending_arb_candidates_total over a 1-hour run (proves the post-state path works)
  • cmd/monitor emits MEV-Share hint counts and first-seen latency deltas vs Alchemy
  • Grafana panel renders all five charts with live data
  • No bundle is constructed or submitted by the Go executor in any code path activated by this issue
  • cargo test --workspace --release, cargo clippy --workspace --all-targets -- -D warnings, go test ./... -race -count=1 all clean
  • Public-facing docs / README updated with the MEMPOOL_TRACKING=1 env-var contract
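
The first criterion (opt-in via MEMPOOL_TRACKING=1, no behaviour change when unset) could be gated as in this sketch. The function names and the strict "exactly 1" interpretation are assumptions; the issue does not say whether values such as "true" should also enable the pipeline.

```rust
use std::env;

/// Interprets the raw MEMPOOL_TRACKING value.
/// Assumption: only the exact string "1" enables the pipeline; anything else keeps it off.
fn tracking_enabled(raw: Option<&str>) -> bool {
    raw == Some("1")
}

/// Reads the flag from the process environment; unset or non-UTF-8 counts as off.
fn tracking_enabled_from_env() -> bool {
    let raw = env::var("MEMPOOL_TRACKING").ok();
    tracking_enabled(raw.as_deref())
}
```

Keeping the parsing in a pure function makes the "boots identically when unset" criterion easy to cover in a unit test without mutating the process environment.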

Out of scope (separate follow-up issues)

  • Paid mempool feeds: Chainbound Fiber gRPC, bloXroute Mempool Tx, Merkle searcher API. Each gets its own issue scoped against the PendingTxStream trait introduced in Phase A.
  • Self-hosted Reth sentry mesh at NY5 / AMS / SGP. Infra workstream, separate budget approval.
  • Backrun bundle construction: [victim_raw_tx, arb_tx, tip_tx] with revertingTxHashes, plus signing-flow updates in cmd/executor/bundle.go. Tracked separately to keep this scaffold log-only.
  • MEV Blocker / CoW DAO searcher bid integration — orderflow auction surface, separate Go-side workstream.
  • JIT-LP detection for UniV3 — different code path entirely.
  • Sandwich-class strategies — explicit team policy decision required before any code lands.

Acceptance metrics that justify moving to paid feeds

When this issue is merged and we have a week of staging data, the team should review:

  1. Coverage gap: pending txs we observe vs the public mempool tx-hash dump (mempool-dumpster). Anything <80% means partial-view risk and justifies Fiber / sentries.
  2. Decode hit rate: <90% across the 7 routers means the decoder needs more shapes before paid feeds add value.
  3. Arb-candidate rate: pending-arb candidates / minute. <0.1/min means the simulation path is too slow or the detection threshold is wrong; fix before scaling cost.
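  4. First-seen latency vs MEV-Share: median first-seen delta of the Alchemy WS stream relative to MEV-Share for the same tx hash, taken from aether_mempool_first_seen_delta_ms. A median above 150 ms is the line at which paid feeds pay for themselves.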
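
One way to read the four thresholds together is as a small decision function. This is an interpretation, not something the issue specifies: it assumes items 2 and 3 are blockers to fix before spending on feeds, and items 1 and 4 are the signals that justify paid feeds. WeekStats, Verdict and review are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    FixScaffoldFirst,
    PaidFeedsJustified,
    FreeTierSufficient,
}

/// One week of staging data, reduced to the four review metrics.
struct WeekStats {
    /// Observed pending txs / mempool-dumpster tx hashes, in 0.0..=1.0.
    coverage: f64,
    /// Cleanly decoded txs / all txs to the 7 routers, in 0.0..=1.0.
    decode_hit_rate: f64,
    /// Pending-arb candidates per minute.
    candidates_per_min: f64,
    /// Median Alchemy first-seen delta relative to MEV-Share, in milliseconds.
    median_first_seen_delta_ms: f64,
}

fn review(s: &WeekStats) -> Verdict {
    // Items 2 and 3: decoder or simulation problems that a paid feed would not fix.
    if s.decode_hit_rate < 0.90 || s.candidates_per_min < 0.1 {
        return Verdict::FixScaffoldFirst;
    }
    // Items 1 and 4: partial view or slow first-seen on the free tier.
    if s.coverage < 0.80 || s.median_first_seen_delta_ms > 150.0 {
        return Verdict::PaidFeedsJustified;
    }
    Verdict::FreeTierSufficient
}
```

For example, a week with 70% coverage but a 97% decode rate and 1 candidate/min would come out as PaidFeedsJustified under this reading.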

Related

  • Mempool source comparison report (in-flight, separate doc)
  • crates/ingestion/src/node_pool.rs — existing multi-node pool we extend
  • crates/simulator/src/lib.rs — fork-mode simulator we add the new entry point to
  • CLAUDE.md "Hot Path" section — the new pipeline targets the same <15ms end-to-end budget

Epoch

E5 — live MEV detection. Unblocks paid-feed integration, backrun bundle construction, and every downstream live-execution feature.
