Skip to content

feat(mempool): live tracking — subscribe + decode + analytical V2 post-state sim#118

Merged
Pablosinyores merged 20 commits into
developfrom
feat/mempool-tracking-scaffold
May 15, 2026
Merged

feat(mempool): live tracking — subscribe + decode + analytical V2 post-state sim#118
Pablosinyores merged 20 commits into
developfrom
feat/mempool-tracking-scaffold

Conversation

@0xfandom
Copy link
Copy Markdown
Collaborator

@0xfandom 0xfandom commented Apr 30, 2026

Summary

  • Lands [E5] live mempool tracking — Phase A → E end to end ([E5] Live mempool tracking — Phase 0/1 testing scaffold (Alchemy WS + MEV-Share SSE + pending-tx decoder + post-state sim) #117). Pipeline subscribes to Alchemy alchemy_pendingTransactions, decodes router calldata, runs analytical V2 post-state simulation through Bellman-Ford, and emits Prometheus metrics. Strictly log-only — no executor wiring, no bundle build, no submission.
  • MEMPOOL_TRACKING=1 gates the entire path; binaries boot identically when unset.
  • Co-runs Flashbots MEV-Share SSE consumer in the Go monitor; counts hint volume and calldata/log enrichment for cross-source coverage analysis.
  • Validated on live mainnet: 13 V2 swaps decoded, 2 ran sim end-to-end (no_profitable_cycle — expected on the 3-pool WETH-USDC registry), 4899 MEV-Share hints, 0 errors, p99 detection latency < 0.5 ms.

Phase breakdown

Phase Crate / file What
A — Pending-tx ingestion crates/ingestion/src/mempool.rs MempoolSource trait + AlchemyMempool impl. WS subscribe with toAddress filter on 6 routers (UniV2, UniV3 SwapRouter / SwapRouter02, Sushi, Curve registry, Balancer Vault). Per-source dedup, lock-free broadcast channel. Reuses node_pool.rs reconnect FSM.
B — Calldata decoder crates/pools/src/router_decoder.rs alloy::sol! ABIs for V2 router (swapExactTokensForTokens + 5 fee-on-transfer variants), V3 single-hop (exactInputSingle, exactOutputSingle). Decodes to (token_in, token_out, amount_in). Long-tail unsupported selectors emit aether_pending_decode_errors_total{reason}.
C-lite crates/grpc-server/src/mempool_pipeline.rs Decode loop, metric emission, structured logs.
C-full (Path A — analytical V2) same file Full post-state cycle scan without revm. Predict V2 reserves with constant-product math (dy = dx*0.997*y / (x + dx*0.997), x' = x + dx, y' = y - dy). Clone graph, mutate the 2 affected edges, run Bellman-Ford detect_from_affected, bucket profit (<10bps / 10-50 / 50-200 / >200bps), emit aether_pending_arb_candidates_total{router, profit_bucket}. Skip reasons: protocol_unsupported, token_*_unknown, pool_not_registered, graph_edge_missing, reserves_zero, no_profitable_cycle. Path B (full revm fork sim for V3/Curve/Balancer victims) is a follow-up issue — analytical covers V2/Sushi which dominate retail mempool volume at ~50 µs per tx vs 5–20 ms for revm.
D — MEV-Share consumer cmd/monitor/mev_share.go SSE stream from https://mev-share.flashbots.net. Decodes hints (tx_hash, optional calldata, optional logs). Emits aether_mev_share_hints_total, _with_calldata_total, _with_logs_total, _errors_total.
E — Metrics + Grafana deploy/docker/grafana/dashboards/mempool.json 9-panel dashboard: pending DEX rate (1m), decode success %, hints/s, hints-with-logs %, decode failures by reason, arb candidates rate, arb candidates by profit bucket, sim skipped by reason, MEV-Share stream health.

Files Changed (Phase C-full slice — e227e73)

File Change
crates/grpc-server/src/main.rs Build SimContext (pool registry + token index + snapshot manager + BF detector), pass to spawn_mempool_pipeline. Logs sim=true at startup when registry is wired.
crates/grpc-server/src/mempool_pipeline.rs New SimContext struct, try_post_state_scan function, predict_v2_post_state math helper, profit_bucket mapper. 7 unit tests covering the math edges (zero reserves, zero input, fee_factor zero, normal V2 swap, profit bucket boundaries).
crates/grpc-server/src/metrics.rs aether_pending_arb_candidates_total{router, profit_bucket} IntCounterVec, aether_pending_arb_sim_skipped_total{reason} IntCounterVec, inc_pending_arb_candidates and inc_pending_arb_sim_skipped helper methods.
deploy/docker/grafana/dashboards/mempool.json 3 new panels for Phase C-full signal: arb candidates 1m rate stat, arb candidates by profit bucket time series, sim skipped by reason time series.

Acceptance Criteria (#117)

  • MEMPOOL_TRACKING=1 gates the path end-to-end; unset = identical boot
  • Alchemy alchemy_pendingTransactions WS subscription with router filter
  • V2 / Sushi calldata decode (95%+ within supported routers; long-tail selectors logged)
  • Post-state cycle scan emits aether_pending_arb_candidates_total (analytical Path A; verified end-to-end)
  • MEV-Share SSE consumer emits hint counters + calldata/log enrichment %
  • Grafana dashboard renders 9 charts including arb candidates and sim skipped
  • Zero executor submission — no bundle build, no eth_sendBundle call from this path
  • cargo build --workspace --release clean
  • cargo clippy --workspace --all-targets --release -- -D warnings clean
  • cargo test --workspace --release — 436 passed, 0 failed
  • go build ./..., go vet ./... clean
  • go test ./... -race -count=1 — all packages ok
  • forge test — 59 passed, 2 skipped, 0 failed
  • docker compose config validates
  • Live mainnet smoke: pipeline produced signal end-to-end with zero errors over a 5-min sample

Deferred to follow-up issues (out of scope per #117 itself):

  • 95% decode rate gate — needs UniV3 multicall (universal-router) + Curve + 1inch decoders
  • Non-zero candidates over a 1-hour run — needs pool registry expansion (current config/pools.toml ships 3 pools; a non-trivial fraction of pending V2 swaps skip on token_*_unknown against this narrow set, exactly as expected)
  • First-seen latency delta histogram (Alchemy ↔ MEV-Share)
  • MEMPOOL_TRACKING README + docs site update
  • Path B revm fork sim for V3/Curve/Balancer victims
  • Backrun bundle construction — explicitly out of scope per [E5] Live mempool tracking — Phase 0/1 testing scaffold (Alchemy WS + MEV-Share SSE + pending-tx decoder + post-state sim) #117 ("keep this scaffold log-only")

Live mainnet evidence (5-min sample, ETH mainnet)

mempool decoded:        13 UniswapV2 swaps
sim ran end-to-end:     2 (graph clone → V2 prediction → BF scan → no_profitable_cycle)
sim skipped:            11 (token_in_unknown / token_out_unknown — 3-pool registry too narrow)
decode errors:          1 (unknown_selector long-tail)
mev-share hints:        4899
mev-share with calldata: 1799 (37%)
mev-share with logs:    1655 (34%)
mev-share errors:       0
engine detection p99:   < 0.5 ms

Two end-to-end sim runs prove the pipeline plumbs Alchemy → decode → token_index → pool_registry → snapshot → V2 math → graph mutation → Bellman-Ford → metric emission against live mainnet traffic. Zero candidates is the correct signal on a registry of 3 WETH-USDC pools — that pair is the most-arbed on Ethereum and retail-sized swaps don't crack the 30 bps fee on UniV2 vs Sushi vs UniV3.

Test plan

  • cargo build --workspace --releaseFinished release profile [optimized] target(s) in 9.61s
  • cargo clippy --workspace --all-targets --release -- -D warnings — clean
  • cargo test --workspace --release — 436 passed, 0 failed, 0 ignored
  • go build ./... — clean
  • go vet ./... — clean
  • go test ./... -race -count=1executor, monitor, pooldiscovery, internal/config, internal/risk all ok
  • forge test — 59 passed, 2 skipped (4 suites, 159 ms)
  • docker compose -f deploy/docker/docker-compose.yml config — valid
  • Live mainnet 5-min run: rust + monitor + Prometheus + Grafana — pipeline produced the signal table above with 0 errors

Operator usage

set -a && source .env && set +a
MEMPOOL_TRACKING=1 \
  MEMPOOL_WS_URL="wss://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}" \
  RUST_METRICS_PORT=9092 \
  RUST_LOG=info,aether_grpc_server::mempool_pipeline=info \
  ./target/release/aether-rust

MEMPOOL_TRACKING=1 METRICS_PORT=9094 ./bin/aether-monitor

Sample after a soak window:

curl -s http://localhost:9092/metrics | grep -E "^aether_pending_(dex_tx|decode_errors|arb_candidates|arb_sim_skipped)_total"
curl -s http://localhost:9094/metrics | grep "^aether_mev_share"

Grafana: http://localhost:3000/d/aether-mempool/aether-mempool-testing-scaffold.

Closes #117

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
aether Ready Ready Preview, Comment May 14, 2026 2:58pm
aether-63xv Ready Ready Preview, Comment May 14, 2026 2:58pm

@0xfandom 0xfandom requested a review from Pablosinyores April 30, 2026 12:44
@0xfandom 0xfandom changed the title feat(mempool): live tracking — subscribe + decode + MEV-Share SSE (in progress) feat(mempool): live tracking — subscribe + decode + analytical V2 post-state sim May 1, 2026
@0xfandom 0xfandom marked this pull request as ready for review May 1, 2026 14:06
Copy link
Copy Markdown
Owner

@Pablosinyores Pablosinyores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — feat(mempool): live tracking — subscribe + decode + analytical V2 post-state sim

Solid scaffold and the live mainnet evidence (13 swaps decoded, 2 ran end-to-end through decode → V2 prediction → BF → metric over 5 min, 0 errors) is convincing. Verified locally: cargo build --workspace --release clean, decoder unit tests 7/7 pass, mempool_pipeline math tests pass, go build ./... + go vet ./... clean, monitor tests pass. CI all green.

The flag-gate is real: MEMPOOL_TRACKING unset → no goroutines spawned, no broadcast subscribers, no metric movement. Hot path on main is unchanged.

Blocking (CRITICAL)

crates/pools/src/router_decoder.rs:325 — SushiSwap pending txs are silently tagged as Protocol::UniswapV2

decode_v2_call unconditionally sets protocol: Protocol::UniswapV2 regardless of which router address was called. The comment ("SushiSwap callers rely on metric label, not this") points at a re-tagging path that does not exist anywhere in the diff. Trace:

  1. SushiSwap Router02 (0xd9e1cE17f2641f24aE83637ab66a2cca9C378B9F, listed in crates/ingestion/src/mempool.rs:247) shares the V2 ABI, so all Sushi calldata is decoded by try_uni_v2_family (lines 209-307) → decode_v2_callProtocol::UniswapV2.
  2. crates/grpc-server/src/mempool_pipeline.rs:167-174 maps swap.protocol to target_protocol. The Protocol::SushiSwap => ProtocolType::SushiSwap arm at line 169 is dead code — the decoder never produces it.
  3. crates/grpc-server/src/mempool_pipeline.rs:189-194 filters the registry on m.protocol == target_protocol. Sushi pools (config/pools.toml protocol = "sushiswap"ProtocolType::SushiSwap) never match, so the sim either picks the wrong UniV2 pool for the same pair (wrong reserves, wrong simulation) or skips with pool_not_registered.

Impact:

  • aether_pending_dex_tx_total{protocol="uniswap_v2"} is inflated by Sushi traffic; {protocol="sushiswap"} is permanently zero.
  • aether_pending_arb_candidates_total never counts Sushi-victim candidates.
  • The PR body's acceptance criterion "Path A — analytical V2 covers V2/Sushi" is false. The "13 V2 swaps decoded" in the live evidence may include Sushi txs misattributed.
  • The Sushi router test missing from router_decoder.rs:468+ is what would have caught this — see suggestion below.

Fix sketch: thread the router address through to decode_v2_call and dispatch on it (Sushi Router02 → Protocol::SushiSwap, UniV2 Router02 → Protocol::UniswapV2), or split try_uni_v2_family into per-router entry points that pass the right Protocol constant.

Should Fix Before Merge (WARNING)

  • deploy/docker/prometheus.yml:28-29 vs cmd/monitor/metrics.go:146-147 — port mismatch for host-mode monitor. Prometheus scrapes aether-host-monitor at host.docker.internal:9094, but the Go binary defaults METRICS_PORT=9090. There is no compose service for aether-monitor, so an operator running go run ./cmd/monitor/... without setting METRICS_PORT=9094 will serve metrics on 9090 (also colliding with the aether-go executor). All aether_mev_share_* metrics will be invisible in Grafana. Fix: change the default to 9094, or document METRICS_PORT=9094 as required for the host-mode monitor in .env.example.

  • crates/grpc-server/src/main.rs:148-152 — mempool detector shares the main loop's 3 ms detection budget. Both detectors run on the same tokio runtime with no CPU separation; under high V2 mempool activity, the per-event graph clone (mempool_pipeline.rs:231 — ~3.8 MB at the 5000-pool target × multiple txs/sec) plus a concurrent BF scan can erode the main loop's <3 ms p99 detection latency target through cache contention. Acceptable for the scaffold (low mempool rate, opt-in), but track as a promotion gate: either give the mempool detector a separate larger budget (~10 ms, advisory) or pin it to a non-core CPU before un-gating.

  • crates/grpc-server/src/mempool_pipeline.rs:72-78Lagged(n) count is logged but not metered. When the pipeline can't keep up, events are silently dropped without a counter. Add metrics.inc_pending_decode_errors("lagged") (or a dedicated counter) incremented by n so the Grafana dashboard can detect when the pipeline is falling behind. Combined with the synchronous handle_event and the broadcast capacity sized for the per-event work, this is the only signal of pipeline overload.

  • crates/grpc-server/src/mempool_pipeline.rs:190registry.values().find(...) is O(N) per decoded swap. At the 5000-pool target this is 5000 iterations per V2 pending tx, on top of the graph clone. Index by (token0, token1, protocol) for O(1) lookup. Not blocking at current rates, but compounds with the other scaling concerns when this gets un-gated.

  • cmd/monitor/mev_share.go:72,110-163 — no inactivity timeout on the SSE stream. Timeout: 0 is correct for SSE (a non-zero client timeout would kill the stream), but there is no read-deadline either. Flashbots emits :ping every 15 s, so 5 min without any line is a wedged connection. Wrap each streamOnce in context.WithTimeout(ctx, 5*time.Minute) and reset on each successful scanner.Scan(). Today the goroutine can stall forever without reconnecting; metric goes flat, but no auto-recovery.

  • crates/ingestion/src/mempool.rs:240-250 — Curve Router is in the filter set but the decoder explicitly punts on Curve. Every Curve Router pending tx will pass the toAddress filter, hit the decoder, fail with UnknownSelector, and inflate aether_pending_decode_errors_total{reason="unknown_selector"} with expected failures. Either drop the Curve Router from default_router_addresses() until the decoder supports it, or add a "known_unsupported" reason label so dashboards can distinguish expected vs unexpected failures.

  • .env.example does not flag alchemy_pendingTransactions as Alchemy-specific. crates/grpc-server/src/main.rs:120-122 falls back from MEMPOOL_WS_URL to ETH_RPC_URL, so an operator pointing ETH_RPC_URL at Reth IPC, QuickNode, or Infura plus turning on MEMPOOL_TRACKING=1 gets a silent subscribe-failure loop with warn-level logs and no counter for the failure. Add a clear "Alchemy required" comment in .env.example, plus a startup error!-level log or a Prometheus counter for subscription connect failures so silent breakage is visible.

Nice to Have (SUGGESTION)

  • crates/pools/src/router_decoder.rs:468-596 — add a SushiSwap router test. A unit test that hands decode_pending the Sushi Router02 address with V2-shaped calldata and asserts decoded.protocol == Protocol::SushiSwap would have caught the CRITICAL above and will guard the fix.

  • crates/grpc-server/src/metrics.rsaether_pending_dex_tx_total{router=…} uses raw router address as a label. The router filter is bounded today (6 addresses), but if default_router_addresses() ever loads from config or a discovery feed, this becomes unbounded label cardinality. Map to a stable enum label (uni_v2_router02, sushi_router02, …) before un-gating.

  • crates/ingestion/src/mempool.rs:223-225 — comment says "Linear bounded backoff" but the code is backoff = (backoff * 2).min(...) (exponential). Fix the comment.

  • crates/grpc-server/src/mempool_pipeline.rs:222-223 / 285 / 307-321 — three doc-vs-code drifts in the V2 math helpers.

    • u256_to_f64_saturating doc claims "f64 mantissa is enough for token amount magnitudes seen on-chain (up to ~2^53 ≈ 9e15)". 1 ETH = 10^18 wei = 2^59.8, so the bound is wrong as stated. The math is still fine (relative error stays well below the BF detection epsilon), but rewrite the comment so a future reader doesn't trust the wrong bound.
    • post_out.max(1.0) clamps to 1 wei for 18-decimal tokens (price graph stores reserves in raw token units as f64). Note the unit in the comment so the next reader doesn't think it means "1 token".
    • // SushiSwap callers rely on metric label, not this at decoder line 325 should be deleted (or fixed) once the protocol-tagging bug above is addressed.
  • crates/grpc-server/src/main.rs:149-150EngineConfig::default() is called twice. Bind once and reuse.

  • crates/ingestion/src/mempool.rs:90-102MempoolSource trait has only one impl. Premature abstraction by the letter of CLAUDE.md, but the docstring lists planned impls (Chainbound Fiber, bloXroute, Reth IPC) and the trait surface is minimal. Defensible for the scaffold; just don't add more methods to it before a second impl arrives.

  • deploy/docker/grafana/dashboards/mempool.json panel queries lack {job="…"} filters. Today the metric names are disjoint between processes (Rust aether_pending_* vs Go aether_mev_share_*), so no collision. Defensive: add the filter so future metric additions can't double-count.

What's Good

  • Flag gate genuinely is a no-op when unset: crates/grpc-server/src/main.rs:118, cmd/monitor/metrics.go:183 are the only entry points; both flag-gated.
  • No eth_sendBundle / submission code anywhere in the new files. Log-only claim verified.
  • MEMPOOL_TRACKING truthy parsing is bit-for-bit symmetric across Rust (mempool.rs:47-55) and Go (mev_share.go:198-205): 1|true|yes|on, case-insensitive.
  • predict_v2_post_state correctly implements UniV2 constant-product: dy = dx*fee_factor*y / (x + dx*fee_factor), x' = x + dx (full pre-fee amount), y' = y - dy. Matches on-chain.
  • BellmanFord::detect_from_affected seeded only on the two affected vertices is the correct optimization for a single-swap post-state change.
  • MEV-Share Go test includes an httptest SSE server validating end-to-end stream parsing including :ping keepalives and partial events.
  • ProtocolType enum stays aligned across Rust (types.rs:7), proto, and Solidity — this PR doesn't touch the cross-layer contract.
  • Decoder failure labels (too_short, unknown_selector, abi_decode, empty_path) are bounded cardinality.
  • Fee-on-transfer V2 variants are decoded — meme-token routing is the dominant V2 mempool shape, and the non-FOT-only set of selectors would have produced near-zero hit rate.

Verdict

REQUEST CHANGES — gated on the CRITICAL Sushi protocol-tagging bug. The WARNINGs around the Prometheus port, missing lagged metric, SSE inactivity timeout, and Alchemy constraint should land in the same revision since they all surface real silent-failure modes the dashboard won't catch.

0xfandom added a commit that referenced this pull request May 5, 2026
UniswapV2 Router02 and SushiSwap Router02 share the same ABI byte-for-
byte; the only signal the decoder has is the 'to' address. The previous
implementation hard-coded Protocol::UniswapV2 for every V2-shape decode,
so every Sushi pending tx was misattributed: the registry pool lookup
hunted in the UniV2 pool set, missed every Sushi victim, and inflated
the uniswap_v2 metric label while the sushiswap label never moved.

Adds a small SUSHISWAP_ROUTERS const table and a router_to_v2_protocol
helper that decode_v2_call now uses to dispatch on router address.
Mainnet Sushi Router02 (d9e1cE17…378B9F) is the seed entry; future
Sushi-flavoured routers (SushiX, RouteProcessor) only need to be added
to the const list.

Two new tests pin both halves of the dispatch:
- decode_sushiswap_router_tagged_as_sushi_not_uni_v2
- decode_uni_v2_router_still_tagged_as_uni_v2

Removes the orphaned 'SushiSwap callers rely on metric label' comment.

Refs PR #118 review (critical / blocking).
0xfandom added a commit that referenced this pull request May 5, 2026
The aether-host-monitor scrape job pointed at host.docker.internal:9094,
but cmd/monitor/metrics.go defaults METRICS_PORT to 9090. Without an
explicit METRICS_PORT=9094 on the host process, MEV-Share and mempool
metrics never reached Prometheus and dashboards stayed empty.

Aligns the scrape port with the binary default. The in-compose
'aether-go' job at port 9090 was already correct; only the host-side
mirror was wrong.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
The previous post-state scan ran registry.values().find(...) per pending
swap, which is O(N) on the full pool registry. At 5000+ pools this
single linear scan dominates the per-event budget and starves the main
detection loop's 3 ms p99 target.

Adds a (token0_canonical, token1_canonical, protocol) → Vec<PoolMetadata>
index to SimContext, cached behind a Mutex<Option<(registry_ptr,
Arc<PairIndex>)>>. The index is rebuilt only when ArcSwap reports a new
underlying Arc (pointer comparison), so steady-state cost per lookup is
one Mutex acquire + one HashMap probe — independent of pool count.

Cache invalidation rides on the existing pool_registry Arc identity
rather than introducing a separate generation counter, so register_pool
needs no changes.

Replaces the public struct-literal constructor with SimContext::new so
the cache initialises consistently (fixes the only call site in main.rs
to use the helper, and de-duplicates EngineConfig::default() calls
while in there).

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Adds aether_pending_pipeline_lagged_total counter, incremented by the n
returned from broadcast::RecvError::Lagged(n) so dashboards see exactly
how many pending-tx events were dropped, not how many lag events fired.

Without this, sustained pipeline overload only showed up in tracing
logs; an alert on monotonic non-zero growth of this counter is the
trigger to widen the broadcast channel or shed mempool sources.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
The MEV-Share SSE consumer set http.Client.Timeout=0 (correct for a
streaming endpoint, since per-request deadlines would force reconnect
every minute regardless of stream health). Combined with no read
deadline on the body, a half-open TCP socket — NAT idle timeout, broker
crash, network partition — would leave the consumer parked on a
permanently silent connection. Metrics would simply stop moving and
the existing 'mev-share stream connected' log line would be the last
signal until process restart.

Adds an idleTimeoutReader that wraps resp.Body, runs a time.Timer
keyed off every successful Read, and cancels the request context when
no bytes arrive for sseReadIdleTimeout (60 s). Flashbots' :ping every
15 s gives 4 intervals of slack before forcing reconnect, so legitimate
quiet windows do not flap. The outer Run loop already has exponential
backoff and counts the resulting error in aether_mev_share_errors_total.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Curve routers were already in the default address filter so the firehose
stays representative of real router traffic, but the decoder cannot yet
parse exchange / exchange_multiple — every such pending tx therefore
fell through to UnknownSelector and inflated the unknown_selector
counter, masking genuine selector gaps in other protocols.

Adds CURVE_ROUTERS const list, is_unsupported_curve_router predicate,
and a CurveUnsupported(Address) DecodeError variant. decode_pending
short-circuits on Curve before selector dispatch, mempool_pipeline
maps the error to a curve_unsupported metric reason. Adding a Curve
router (e.g. Curve Router NG) needs only a const append.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
alchemy_pendingTransactions is an Alchemy-proprietary subscribe method.
Reth, QuickNode, Infura and self-hosted Geth complete the WS upgrade
but never deliver events, so MEMPOOL_TRACKING=1 against any of those
silently produces zero metrics with no obvious failure mode.

.env.example calls this out explicitly with the list of incompatible
providers and a pointer to the MempoolSource trait for adding
non-Alchemy backends. AlchemyMempool::new now runs a hostname heuristic
on the WS URL and logs a structured warning when no Alchemy marker is
present, so non-Alchemy bring-ups fail loudly instead of silently.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
The post-state scan clones the price graph (~3.8 MB at current pool
counts) and runs Bellman-Ford on the affected vertices. Running it
inline on the broadcast recv loop ate the same tokio worker threads
the main detection loop uses, sharing the engine's 3 ms p99 budget
with mempool work.

Dispatches the scan onto tokio::task::spawn_blocking so it runs on
the disjoint blocking pool. The recv loop returns to next event
immediately; mempool sim contention with main detection is gone.
Steady-state allocation grows by one Arc clone of metrics + sim_ctx
+ DecodedSwap per pending swap (cheap; everything inside is Arc).

handle_event signature switches to &Arc<...> so the inner closure
can clone without dereferencing through a borrow.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
- Reconnect comment in AlchemyMempool::run was 'Linear bounded backoff'
  but the code is exponential (backoff *= 2). Fix to match.
- Document u256_to_f64_saturating's precision contract: 53-bit mantissa,
  exact up to 2^53, saturating to f64::MAX for adversarial inputs that
  would otherwise produce +inf in the V2 math kernel.
- is_none_or replaces map_or(true, ...) on the pair_index cache check
  (clippy suggestion under -D warnings).
- Mempool Grafana dashboard injects a job=~ filter into every metric
  selector so each panel is bounded to the producing service:
  aether_pending_* → aether-rust|aether-host-rust,
  aether_mev_share_* → aether-go|aether-host-monitor.
  Defensive cardinality control if the same metric ever leaks from
  another scrape job.

Refs PR #118 review (nits).
@0xfandom
Copy link
Copy Markdown
Collaborator Author

0xfandom commented May 5, 2026

Re-requesting review. All 1 critical + 7 should-fix + 7 nits from the previous round addressed across 9 atomic commits (e227e73..cb9e5f4):

Critical

  • 8da115e fix(decoder): tag SushiSwap router as Protocol::SushiSwapSUSHISWAP_ROUTERS const + router_to_v2_protocol dispatch in decode_v2_call. Two new tests pin both halves (decode_sushiswap_router_tagged_as_sushi_not_uni_v2, decode_uni_v2_router_still_tagged_as_uni_v2).

Should-fix

  • 0dba13f fix(observability): host-monitor scrape port 9094 → 9090 (matches cmd/monitor/metrics.go default).
  • 2a18dfa perf(mempool): O(1) cached pair index in SimContext; rebuild keyed off pool_registry Arc pointer identity, so steady-state cost is one Mutex acquire + one HashMap probe.
  • a19d2f2 feat(mempool): aether_pending_pipeline_lagged_total incremented by n from RecvError::Lagged(n) so the metric reflects events dropped, not lag events fired.
  • 87e37ea fix(monitor): 60s SSE read-idle deadline via idleTimeoutReader wrapping resp.Body; cancels the request context on silence and surfaces the cause distinctly to the outer reconnect loop.
  • ad2d753 fix(decoder): CurveUnsupported(Address) variant + is_unsupported_curve_router short-circuit; mempool_pipeline maps to a curve_unsupported reason so genuine unknown_selector gaps stay visible.
  • cb22d25 docs(mempool): .env.example documents the Alchemy-only requirement; warn_if_non_alchemy_endpoint heuristic logs at startup if WS host has no Alchemy marker.
  • 19c0773 perf(mempool): post-state scan dispatched via tokio::task::spawn_blocking so the 3.8 MB graph clone + Bellman-Ford run on the blocking pool, not on the main runtime workers shared with the engine's 3 ms p99 detection budget.

Nits (cb9e5f4)

  • mempool.rs reconnect comment "Linear bounded backoff" → "Exponential bounded backoff" (matches backoff * 2).
  • u256_to_f64_saturating documents the 53-bit mantissa precision contract.
  • is_none_or replaces map_or(true, ...) (clippy under -D warnings).
  • EngineConfig::default() deduped at the SimContext call site (rolled into 2a18dfa).
  • Mempool Grafana dashboard injects {job=~"aether-rust|aether-host-rust"} / {job=~"aether-go|aether-host-monitor"} into every selector for defensive cardinality control.
  • Orphaned "SushiSwap callers rely on metric label" comment removed (rolled into 8da115e).

Local: cargo clippy --workspace --all-targets -- -D warnings clean, 439 Rust tests pass, go test ./... clean.

@0xfandom 0xfandom requested a review from Pablosinyores May 5, 2026 07:55
@0xfandom 0xfandom changed the base branch from main to develop May 5, 2026 11:01
0xfandom added a commit that referenced this pull request May 5, 2026
UniswapV2 Router02 and SushiSwap Router02 share the same ABI byte-for-
byte; the only signal the decoder has is the 'to' address. The previous
implementation hard-coded Protocol::UniswapV2 for every V2-shape decode,
so every Sushi pending tx was misattributed: the registry pool lookup
hunted in the UniV2 pool set, missed every Sushi victim, and inflated
the uniswap_v2 metric label while the sushiswap label never moved.

Adds a small SUSHISWAP_ROUTERS const table and a router_to_v2_protocol
helper that decode_v2_call now uses to dispatch on router address.
Mainnet Sushi Router02 (d9e1cE17…378B9F) is the seed entry; future
Sushi-flavoured routers (SushiX, RouteProcessor) only need to be added
to the const list.

Two new tests pin both halves of the dispatch:
- decode_sushiswap_router_tagged_as_sushi_not_uni_v2
- decode_uni_v2_router_still_tagged_as_uni_v2

Removes the orphaned 'SushiSwap callers rely on metric label' comment.

Refs PR #118 review (critical / blocking).
@0xfandom 0xfandom force-pushed the feat/mempool-tracking-scaffold branch from cb9e5f4 to f0ab7b2 Compare May 5, 2026 11:14
0xfandom added a commit that referenced this pull request May 5, 2026
The aether-host-monitor scrape job pointed at host.docker.internal:9094,
but cmd/monitor/metrics.go defaults METRICS_PORT to 9090. Without an
explicit METRICS_PORT=9094 on the host process, MEV-Share and mempool
metrics never reached Prometheus and dashboards stayed empty.

Aligns the scrape port with the binary default. The in-compose
'aether-go' job at port 9090 was already correct; only the host-side
mirror was wrong.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
The previous post-state scan ran registry.values().find(...) per pending
swap, which is O(N) on the full pool registry. At 5000+ pools this
single linear scan dominates the per-event budget and starves the main
detection loop's 3 ms p99 target.

Adds a (token0_canonical, token1_canonical, protocol) → Vec<PoolMetadata>
index to SimContext, cached behind a Mutex<Option<(registry_ptr,
Arc<PairIndex>)>>. The index is rebuilt only when ArcSwap reports a new
underlying Arc (pointer comparison), so steady-state cost per lookup is
one Mutex acquire + one HashMap probe — independent of pool count.

Cache invalidation rides on the existing pool_registry Arc identity
rather than introducing a separate generation counter, so register_pool
needs no changes.

Replaces the public struct-literal constructor with SimContext::new so
the cache initialises consistently (fixes the only call site in main.rs
to use the helper, and de-duplicates EngineConfig::default() calls
while in there).

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Adds aether_pending_pipeline_lagged_total counter, incremented by the n
returned from broadcast::RecvError::Lagged(n) so dashboards see exactly
how many pending-tx events were dropped, not how many lag events fired.

Without this, sustained pipeline overload only showed up in tracing
logs; an alert on monotonic non-zero growth of this counter is the
trigger to widen the broadcast channel or shed mempool sources.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
The MEV-Share SSE consumer set http.Client.Timeout=0 (correct for a
streaming endpoint, since per-request deadlines would force reconnect
every minute regardless of stream health). Combined with no read
deadline on the body, a half-open TCP socket — NAT idle timeout, broker
crash, network partition — would leave the consumer parked on a
permanently silent connection. Metrics would simply stop moving and
the existing 'mev-share stream connected' log line would be the last
signal until process restart.

Adds an idleTimeoutReader that wraps resp.Body, runs a time.Timer
keyed off every successful Read, and cancels the request context when
no bytes arrive for sseReadIdleTimeout (60 s). Flashbots' :ping every
15 s gives 4 intervals of slack before forcing reconnect, so legitimate
quiet windows do not flap. The outer Run loop already has exponential
backoff and counts the resulting error in aether_mev_share_errors_total.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Curve routers were already in the default address filter so the firehose
stays representative of real router traffic, but the decoder cannot yet
parse exchange / exchange_multiple — every such pending tx therefore
fell through to UnknownSelector and inflated the unknown_selector
counter, masking genuine selector gaps in other protocols.

Adds CURVE_ROUTERS const list, is_unsupported_curve_router predicate,
and a CurveUnsupported(Address) DecodeError variant. decode_pending
short-circuits on Curve before selector dispatch, mempool_pipeline
maps the error to a curve_unsupported metric reason. Adding a Curve
router (e.g. Curve Router NG) needs only a const append.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
alchemy_pendingTransactions is an Alchemy-proprietary subscribe method.
Reth, QuickNode, Infura and self-hosted Geth complete the WS upgrade
but never deliver events, so MEMPOOL_TRACKING=1 against any of those
silently produces zero metrics with no obvious failure mode.

.env.example calls this out explicitly with the list of incompatible
providers and a pointer to the MempoolSource trait for adding
non-Alchemy backends. AlchemyMempool::new now runs a hostname heuristic
on the WS URL and logs a structured warning when no Alchemy marker is
present, so non-Alchemy bring-ups fail loudly instead of silently.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 5, 2026
The post-state scan clones the price graph (~3.8 MB at current pool
counts) and runs Bellman-Ford on the affected vertices. Running it
inline on the broadcast recv loop ate the same tokio worker threads
the main detection loop uses, sharing the engine's 3 ms p99 budget
with mempool work.

Dispatches the scan onto tokio::task::spawn_blocking so it runs on
the disjoint blocking pool. The recv loop returns to next event
immediately; mempool sim contention with main detection is gone.
Steady-state allocation grows by one Arc clone of metrics + sim_ctx
+ DecodedSwap per pending swap (cheap; everything inside is Arc).

handle_event signature switches to &Arc<...> so the inner closure
can clone without dereferencing through a borrow.

Refs PR #118 review.
@0xfandom 0xfandom force-pushed the feat/mempool-tracking-scaffold branch from 39964be to 88636c3 Compare May 8, 2026 12:26
0xfandom added a commit that referenced this pull request May 8, 2026
alchemy_pendingTransactions is an Alchemy-proprietary subscribe method.
Reth, QuickNode, Infura and self-hosted Geth complete the WS upgrade
but never deliver events, so MEMPOOL_TRACKING=1 against any of those
silently produces zero metrics with no obvious failure mode.

.env.example calls this out explicitly with the list of incompatible
providers and a pointer to the MempoolSource trait for adding
non-Alchemy backends. AlchemyMempool::new now runs a hostname heuristic
on the WS URL and logs a structured warning when no Alchemy marker is
present, so non-Alchemy bring-ups fail loudly instead of silently.

Refs PR #118 review.
0xfandom added a commit that referenced this pull request May 8, 2026
The post-state scan clones the price graph (~3.8 MB at current pool
counts) and runs Bellman-Ford on the affected vertices. Running it
inline on the broadcast recv loop ate the same tokio worker threads
the main detection loop uses, sharing the engine's 3 ms p99 budget
with mempool work.

Dispatches the scan onto tokio::task::spawn_blocking so it runs on
the disjoint blocking pool. The recv loop returns to next event
immediately; mempool sim contention with main detection is gone.
Steady-state allocation grows by one Arc clone of metrics + sim_ctx
+ DecodedSwap per pending swap (cheap; everything inside is Arc).

handle_event signature switches to &Arc<...> so the inner closure
can clone without dereferencing through a borrow.

Refs PR #118 review.
0xfandom and others added 20 commits May 12, 2026 19:10
Adds Alchemy alchemy_pendingTransactions WS source, router calldata
decoder (UniV2/V3/Sushi/Balancer), Go-side MEV-Share SSE consumer,
metrics + Grafana panel. Log-only behind MEMPOOL_TRACKING flag.
Live mempool sample showed UniV2 Router02 traffic dominated by
swapExact*SupportingFeeOnTransferTokens — non-FOT decoder alone
gave near-zero coverage on real traffic.
Adds host scrape jobs so Prometheus picks up aether-rust + monitor
metrics when the binaries run on the host while only the obs stack
runs in compose. Compose-internal targets stay intact.
Decoded V2/Sushi swaps now drive a constant-product post-state
prediction over a clone of the live price graph; Bellman-Ford runs
on the affected vertices and surfaces profitable cycles via
aether_pending_arb_candidates_total{router, profit_bucket}.
UniswapV2 Router02 and SushiSwap Router02 share the same ABI byte-for-
byte; the only signal the decoder has is the 'to' address. The previous
implementation hard-coded Protocol::UniswapV2 for every V2-shape decode,
so every Sushi pending tx was misattributed: the registry pool lookup
hunted in the UniV2 pool set, missed every Sushi victim, and inflated
the uniswap_v2 metric label while the sushiswap label never moved.

Adds a small SUSHISWAP_ROUTERS const table and a router_to_v2_protocol
helper that decode_v2_call now uses to dispatch on router address.
Mainnet Sushi Router02 (d9e1cE17…378B9F) is the seed entry; future
Sushi-flavoured routers (SushiX, RouteProcessor) only need to be added
to the const list.

Two new tests pin both halves of the dispatch:
- decode_sushiswap_router_tagged_as_sushi_not_uni_v2
- decode_uni_v2_router_still_tagged_as_uni_v2

Removes the orphaned 'SushiSwap callers rely on metric label' comment.

Refs PR #118 review (critical / blocking).
The aether-host-monitor scrape job pointed at host.docker.internal:9094,
but cmd/monitor/metrics.go defaults METRICS_PORT to 9090. Without an
explicit METRICS_PORT=9094 on the host process, MEV-Share and mempool
metrics never reached Prometheus and dashboards stayed empty.

Aligns the scrape port with the binary default. The in-compose
'aether-go' job at port 9090 was already correct; only the host-side
mirror was wrong.

Refs PR #118 review.
The previous post-state scan ran registry.values().find(...) per pending
swap, which is O(N) on the full pool registry. At 5000+ pools this
single linear scan dominates the per-event budget and starves the main
detection loop's 3 ms p99 target.

Adds a (token0_canonical, token1_canonical, protocol) → Vec<PoolMetadata>
index to SimContext, cached behind a Mutex<Option<(registry_ptr,
Arc<PairIndex>)>>. The index is rebuilt only when ArcSwap reports a new
underlying Arc (pointer comparison), so steady-state cost per lookup is
one Mutex acquire + one HashMap probe — independent of pool count.

Cache invalidation rides on the existing pool_registry Arc identity
rather than introducing a separate generation counter, so register_pool
needs no changes.

Replaces the public struct-literal constructor with SimContext::new so
the cache initialises consistently (fixes the only call site in main.rs
to use the helper, and de-duplicates EngineConfig::default() calls
while in there).

Refs PR #118 review.
Adds aether_pending_pipeline_lagged_total counter, incremented by the n
returned from broadcast::RecvError::Lagged(n) so dashboards see exactly
how many pending-tx events were dropped, not how many lag events fired.

Without this, sustained pipeline overload only showed up in tracing
logs; an alert on monotonic non-zero growth of this counter is the
trigger to widen the broadcast channel or shed mempool sources.

Refs PR #118 review.
The MEV-Share SSE consumer set http.Client.Timeout=0 (correct for a
streaming endpoint, since per-request deadlines would force reconnect
every minute regardless of stream health). Combined with no read
deadline on the body, a half-open TCP socket — NAT idle timeout, broker
crash, network partition — would leave the consumer parked on a
permanently silent connection. Metrics would simply stop moving and
the existing 'mev-share stream connected' log line would be the last
signal until process restart.

Adds an idleTimeoutReader that wraps resp.Body, runs a time.Timer
keyed off every successful Read, and cancels the request context when
no bytes arrive for sseReadIdleTimeout (60 s). Flashbots' :ping every
15 s gives 4 intervals of slack before forcing reconnect, so legitimate
quiet windows do not flap. The outer Run loop already has exponential
backoff and counts the resulting error in aether_mev_share_errors_total.

Refs PR #118 review.
Curve routers were already in the default address filter so the firehose
stays representative of real router traffic, but the decoder cannot yet
parse exchange / exchange_multiple — every such pending tx therefore
fell through to UnknownSelector and inflated the unknown_selector
counter, masking genuine selector gaps in other protocols.

Adds CURVE_ROUTERS const list, is_unsupported_curve_router predicate,
and a CurveUnsupported(Address) DecodeError variant. decode_pending
short-circuits on Curve before selector dispatch, mempool_pipeline
maps the error to a curve_unsupported metric reason. Adding a Curve
router (e.g. Curve Router NG) needs only a const append.

Refs PR #118 review.
alchemy_pendingTransactions is an Alchemy-proprietary subscribe method.
Reth, QuickNode, Infura and self-hosted Geth complete the WS upgrade
but never deliver events, so MEMPOOL_TRACKING=1 against any of those
silently produces zero metrics with no obvious failure mode.

.env.example calls this out explicitly with the list of incompatible
providers and a pointer to the MempoolSource trait for adding
non-Alchemy backends. AlchemyMempool::new now runs a hostname heuristic
on the WS URL and logs a structured warning when no Alchemy marker is
present, so non-Alchemy bring-ups fail loudly instead of silently.

Refs PR #118 review.
The post-state scan clones the price graph (~3.8 MB at current pool
counts) and runs Bellman-Ford on the affected vertices. Running it
inline on the broadcast recv loop ate the same tokio worker threads
the main detection loop uses, sharing the engine's 3 ms p99 budget
with mempool work.

Dispatches the scan onto tokio::task::spawn_blocking so it runs on
the disjoint blocking pool. The recv loop returns to next event
immediately; mempool sim contention with main detection is gone.
Steady-state allocation grows by one Arc clone of metrics + sim_ctx
+ DecodedSwap per pending swap (cheap; everything inside is Arc).

handle_event signature switches to &Arc<...> so the inner closure
can clone without dereferencing through a borrow.

Refs PR #118 review.
- Reconnect comment in AlchemyMempool::run was 'Linear bounded backoff'
  but the code is exponential (backoff *= 2). Fix to match.
- Document u256_to_f64_saturating's precision contract: 53-bit mantissa,
  exact up to 2^53, saturating to f64::MAX for adversarial inputs that
  would otherwise produce +inf in the V2 math kernel.
- is_none_or replaces map_or(true, ...) on the pair_index cache check
  (clippy suggestion under -D warnings).
- Mempool Grafana dashboard injects a job=~ filter into every metric
  selector so each panel is bounded to the producing service:
  aether_pending_* → aether-rust|aether-host-rust,
  aether_mev_share_* → aether-go|aether-host-monitor.
  Defensive cardinality control if the same metric ever leaks from
  another scrape job.

Refs PR #118 review (nits).
De-scoping the MEV-Share private-flow ingestion from the current branch.
Stage 1 plan is to prove the Alchemy public mempool path end-to-end on
mainnet first, then re-introduce MEV-Share with a Rust-side MempoolSource
that feeds the same engine pipeline (not a parallel monitor goroutine).

Removed:
- cmd/monitor/mev_share.go             (SSE consumer, retry, hint decode)
- cmd/monitor/mev_share_test.go
- 4 atomic counters from cmd/monitor/metrics.go (hints, with_logs,
  with_calldata, errors) and their /metrics text exposition
- spawn block in monitor main() guarded by mempoolTrackingEnabled()
- 3 panels from the mempool Grafana dashboard
  (hints rate, hints-with-logs %, stream health)

Kept:
- env vars MEV_SHARE_URL placeholder in .env.example
- general architecture mentions in CLAUDE.md / docs-site (system-level
  description of intended bundle submission paths is unaffected)
- prometheus.yml comment mentioning MEV-Share counters (harmless)

Both go build and cargo check are clean post-removal.
Hoist the (token_in, token_out, protocol) pool-registry membership check
out of `try_post_state_scan` and into `handle_event`, before the
`spawn_blocking` that clones the live price graph (~3.8 MB per event).
Drop swaps that would land nowhere useful — same-token, zero-amount, or
no pool registered for the pair — under a new
`aether_mempool_filtered_total{reason}` counter.

Stable label set:
  - same_token       — decoder returned a self-swap
  - zero_amount      — amount_in == 0 (no profit possible)
  - not_in_registry  — neither (token, token, protocol) tuple is in
                       pool_registry

Distinct from `pending_arb_sim_skipped_total`: that counter fires
*after* the sim task starts and discovers a missing graph edge / zero
reserves / etc. The new filter short-circuits before the spawn so the
graph-clone cost is avoided.

V3 / Balancer protocols still flow into the sim task and bump
`pending_arb_sim_skipped{protocol_unsupported}` so the existing dashboard
panel keeps measuring the analytical-sim coverage gap.

Live verification (60 s window, mainnet via Alchemy WSS, 3 pools loaded):
  aether_mempool_filtered_total{reason="not_in_registry"} 2
  aether_mempool_filtered_total{reason="zero_amount"} 3
  aether_pending_arb_sim_skipped_total                  0

Also adds scripts/mempool_smoke.sh — a 60 s engine boot that scrapes
/metrics twice, prints a PASS / FAIL verdict from the pending-tx
counters. Runs without Postgres / Anvil / executor, NoopLedger path.

Tests:
  - pre_sim_filter_drops_same_token_swaps
  - pre_sim_filter_drops_zero_amount
  - pre_sim_filter_drops_pair_absent_from_registry
  - pre_sim_filter_passes_unsupported_protocols_through
  - decoder_protocol_to_type_maps_supported_protocols
Companion of scripts/mempool_smoke.sh. Boots aether-rust against the
configured Alchemy WSS endpoint with MEMPOOL_TRACKING=1, runs for the
configured duration (default 600 s), scrapes /metrics at midpoint and
end, parses the engine log into:

  - 05_pending_dex_txs.csv (ts, tx_hash, router, protocol,
                            token_in, token_out, amount_in, fee_bps)
  - 06_filter_drops.csv    (per-metric per-reason drop counts)
  - 07_summary.md          (verdict, counters, block heights, sample
                            tx hashes for Etherscan cross-check)

Outputs land in reports/mempool-stage1-proof/, which is in .gitignore by
existing repo policy — committed log files would bloat the repo.

The script is the reproducible artefact; per-run output stays local.
`-D clippy::doc_overindented_list_items` rejects 24-space continuation
under the `sim_evm_fallback_total` and `mempool_filtered_total` doc
comment label-set lists. Reflow to 4-space continuation.

No behaviour change.
Closes the V3 / Balancer half of the mempool decode pipeline gap that
PR #128 deliberately deferred. After this commit decoded V3 `exactInputSingle`
and Balancer V2 `swap` calldata flow through the same shadow-graph
arbitrage scan as V2 / Sushi, using the analytical post-state predictors
that landed on develop in PR #128.

Changes:
  - SimContext gains a `pool_states: PoolStateCache` field, populated
    from `engine.pool_states()` at startup. Cheap to clone (Arc).
  - `decoder_protocol_to_type` now maps `Protocol::UniswapV3` and
    `Protocol::BalancerV2` to their workspace ProtocolType variants, so
    `pre_sim_filter` does the registry-membership check for those
    families and short-circuits non-registered swaps before the
    spawn_blocking + 3.8 MB graph clone.
  - `try_post_state_scan` matches V3 / Balancer alongside V2 / Sushi.
    For V3 / Balancer it looks up the pool's PoolState in the cache,
    calls `predict_post_state_with_fallback` (the closure bumps
    `aether_sim_evm_fallback_total{reason}` on low-confidence paths),
    maps the returned UnifiedPostState onto graph-edge reserves via the
    new `unified_to_post_reserves` helper, and runs Bellman-Ford
    against the cloned graph the same way V2 does.
  - `unified_to_post_reserves` documents the synthetic-reserves mapping
    per family. V3 derives the post-state spot price from
    `new_sqrt_price_x96` ((sqrt/2^96)^2) and routes via the swap
    direction; Balancer reads new_balance0/1 in canonical order.
  - New skip-reasons `pool_state_missing`, `predictor_low_confidence`,
    `post_state_invalid` cover the new failure modes; the previous
    `protocol_unsupported` reason is removed since no decoded protocol
    routes there anymore.

Side-effect: `aether_pending_arb_sim_skipped_total{reason="protocol_unsupported"}`
will go to zero after this lands, replaced by traffic on the new reasons
above and (when V3 swaps cross a tick or Balancer pools have unequal
weights) on `aether_sim_evm_fallback_total`.

Tests:
  - `pre_sim_filter_drops_v3_and_balancer_when_pair_not_registered`
    replaces the prior pass-through assertion.
  - `decoder_protocol_to_type_maps_all_decoded_protocols` updated.
  - 488 workspace tests, clippy clean.
Adds info-level log lines at each pipeline decision point so operators
can trace a single pending tx end-to-end without recompiling with
`RUST_LOG=...=debug`:

- `FILTER DROP` / `FILTER PASS` in `pre_sim_filter` — exposes which
  decoded swaps the registry / zero-amount / same-token gates drop.
- `REVM SIM OK` / `REVM SIM REVERTED` on every fork sim verdict —
  previously success was silent and reverts logged only at debug.
- `CYCLE REJECTED: below min profit threshold` upgraded from debug to
  info with both wei and ETH-formatted profit fields.

Counters already exist for all of these; the log lines exist for the
'follow one tx through the pipeline' workflow that motivated PR #118's
operator-visibility goal.

No behaviour change.
…n tests

Extracts the truthy-string parser out of `is_enabled` and exercises it directly
from the unit test, removing the std::env::set_var / remove_var calls. Those
are technically `unsafe` since Rust 1.80 (process-wide env, race-prone under
parallel cargo test) and become a hard error on edition 2024 -- this keeps the
crate building cleanly across the edition bump and broadens the truthy/falsy
coverage to eight values each.

Addresses the WARNING from the latest review on PR #118. No behaviour change.
Copy link
Copy Markdown
Owner

@Pablosinyores Pablosinyores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verification on 39e920f

Both the clippy CRITICAL and the env-safety WARNING from the prior round are resolved.

CRITICAL — clippy -D warnings passes

$ cargo clippy --workspace --all-targets -- -D warnings
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 3.57s

The doc_overindented_list_items lint at metrics.rs:55-57 is gone (already addressed in 373ec1f fix(metrics): dedent doc-list items to satisfy clippy).

WARNING #1 — env mutation in tests

Fixed in 39e920f fix(mempool): pure is_enabled_from_str helper to avoid env mutation in tests. The truthy-string parser is now a pure fn is_enabled_from_str(&str) -> bool exercised directly from the unit test; is_enabled() is a one-line wrapper reading env. Removes both the Rust 1.80+ unsafe issue and the parallel-cargo-test race, and broadens coverage to 8 truthy + 8 falsy values.

WARNING #2 + SUGGESTIONs — deferred

  • Router-address label cardinality (mempool_pipeline.rs:274): bounded at 6 hard-coded routers today; reviewer agreed the dashboard aggregates by protocol/profit_bucket so the raw-address label is non-load-bearing. Will tighten when default_router_addresses() becomes config-loaded — tracked in follow-up.
  • f64 precision call-site comment + per-channel capacity: low-priority polish, deferred to follow-up.

Test results

cargo test --workspace            506 passed / 0 failed / 0 ignored
cargo clippy --workspace -D warns clean

Live mainnet evidence retained

Per the 2026-05-07 capture in the previous comment: 51 mainnet blocks observed via push, 8 decoded swaps, 6 mined (lead time +1.74s … +11.94s, mean +7.4s pre-mine), pending_arb_sim_skipped_total = 0 after pre-sim filter intercepts upstream.

Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[E5] Live mempool tracking — Phase 0/1 testing scaffold (Alchemy WS + MEV-Share SSE + pending-tx decoder + post-state sim)

3 participants