feat(observability): mempool dashboard panels for reconciler + scorer #146
Merged
Adds the Rust writer half of issue #131. Every decoded pending-tx swap whose post-state simulation succeeded now lands as a `mempool_predictions` row, gated on `MEMPOOL_LEDGER_DSN` so unset = no DB writes, no behaviour change.

- migrations/0003_mempool_predictions.sql: prediction table with the schema from #131. Distinct DSN from the trade ledger so the two ledgers are independently enable-able.
- crates/grpc-server/src/mempool_writer.rs: `MempoolPredictionSink` trait with `NoopMempoolSink` + `PgMempoolWriter` (sibling pattern to `aether_common::db::PgLedger`: bounded mpsc + dedicated writer task + saturation-drops + Prometheus surface).
- crates/grpc-server/src/mempool_pipeline.rs: `SimContext` carries the sink; `try_post_state_scan` builds a prediction after computing the post-state regardless of cycle profitability (the reconciler in #131 Go half needs the full decoded-swap population).
- crates/grpc-server/src/main.rs: reads `MEMPOOL_LEDGER_DSN` + `AETHER_GIT_SHA`, wires the sink into the mempool path.

Metrics: `aether_mempool_predictions_persisted_total{protocol}`, `aether_mempool_writer_{drops_total, queue_depth, write_latency_ms}`.

Followups in this phase:

- PR-2: Go reconciler against confirmed blocks (#131 second half).
- PR-3: aether-profit-scorer binary writing realized P&L (#132).
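The "bounded queue + dedicated writer + saturation-drops" pattern named above can be sketched as follows. This is a minimal illustration, not the code in `mempool_writer.rs`: it uses `std::sync::mpsc::sync_channel` to stay dependency-free, and the `Prediction`, `PredictionSink`, `QueueSink` and `dsn_enabled` names are hypothetical stand-ins.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for one `mempool_predictions` row.
#[derive(Debug, Clone, PartialEq)]
pub struct Prediction {
    pub tx_hash: String,
    pub protocol: String,
}

/// The hot path only ever sees this trait, never the database.
pub trait PredictionSink {
    fn record(&self, p: Prediction);
}

/// Used when no DSN is configured: no writes, no behaviour change.
pub struct NoopSink;

impl PredictionSink for NoopSink {
    fn record(&self, _p: Prediction) {}
}

/// Bounded-queue sink: never blocks the caller and counts what it drops.
pub struct QueueSink {
    tx: SyncSender<Prediction>,
    drops: AtomicU64,
}

impl QueueSink {
    /// Returns the sink plus the receiver a dedicated writer task would drain.
    pub fn new(capacity: usize) -> (Self, Receiver<Prediction>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, drops: AtomicU64::new(0) }, rx)
    }

    /// Number of predictions dropped so far (the "drops_total" idea).
    pub fn drops(&self) -> u64 {
        self.drops.load(Ordering::Relaxed)
    }
}

impl PredictionSink for QueueSink {
    fn record(&self, p: Prediction) {
        match self.tx.try_send(p) {
            Ok(()) => {}
            // Queue full or writer gone: drop and count rather than block.
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.drops.fetch_add(1, Ordering::Relaxed);
            }
        }
    }
}

/// Assumed gate: an unset or blank DSN means the no-op sink is used.
pub fn dsn_enabled(dsn: Option<&str>) -> bool {
    matches!(dsn, Some(s) if !s.trim().is_empty())
}
```

With a capacity of 2 and no writer draining the queue, five `record` calls leave two queued rows and three counted drops; the caller never waits on the database.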
Closes the loop on PR #133's persisted predictions: subscribes to newHeads, matches landed tx hashes against the predictions table, and writes one mempool_reconciliation row per prediction once the outcome is known. The two tables together answer "did the tx land where we said it would, in the order we said it would, hitting the pool we said it would?", entirely in SQL.

- migrations/0004_mempool_reconciliation.sql: reconciliation table with outcome CHECK + cascade FK to mempool_predictions, both indexes from issue #131. Separate from PR-1's migration so each PR's schema move is reviewable in isolation.
- internal/db/mempool_reconciliation_pg.go: sibling pattern to PgLedger (pgxpool, bounded channel, dedicated writer goroutine). Provides:
  * LookupPredictionByTxHash (sync, hot-path on per-block tx loop)
  * InsertReconciliation (fire-and-forget)
  * MarkStaleAsDropped (batch INSERT … SELECT for the 12-block window)
- internal/db/mempool_reconciliation_metrics.go: aether_mempool_reconciled_total{outcome}, plus writer-internal drops/queue_depth/write_latency.
- cmd/reconciler/main.go: standalone aether-reconciler binary. Two loops:
  * newHeads → BlockByHash → per-tx prediction lookup → receipt fetch for pool_path_correct → outcome=confirmed insert
  * Every 6s: MarkStaleAsDropped(currentHead) for predictions where predicted_target_block + 12 ≤ head
- internal/db/mempool_reconciliation_test.go: pure unit tests for the outcome constants + StaleConfirmationWindow + metric registration, plus two integration tests gated on MEMPOOL_LEDGER_TEST_DSN that exercise the full SQL round-trip (insert prediction → lookup → insert reconciliation → SELECT join).

Metrics: aether_mempool_reconciled_total{outcome}, aether_mempool_block_delta (histogram), aether_mempool_pool_path_total{protocol,correct}, plus the in-process counter family.

Follow-up:

- PR-3 adds the realized-profit scorer (#132).
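The two reconciler loops reduce to one small decision per prediction. The sketch below restates that rule in Rust for consistency with the rest of the stack (the reconciler itself is Go); the `Outcome` enum, the `classify` function and the `Dropped` variant name are illustrative assumptions, and only the 12-block window and the `predicted_target_block + 12 ≤ head` test come from the description above.

```rust
/// Blocks to wait past the predicted target before declaring a prediction stale.
pub const STALE_CONFIRMATION_WINDOW: u64 = 12;

/// Hypothetical outcome values for a reconciliation row.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Outcome {
    Confirmed,
    Dropped,
}

/// Decides whether a reconciliation row can be written yet.
/// `landed_block` is `Some` once the tx hash was seen in a block.
/// Returns `None` while the prediction is still inside the window.
pub fn classify(
    predicted_target_block: u64,
    landed_block: Option<u64>,
    head: u64,
) -> Option<Outcome> {
    if landed_block.is_some() {
        return Some(Outcome::Confirmed);
    }
    // Stale rule: predicted_target_block + 12 <= head.
    if predicted_target_block.saturating_add(STALE_CONFIRMATION_WINDOW) <= head {
        Some(Outcome::Dropped)
    } else {
        None
    }
}
```

A prediction targeting block 100 stays unreconciled at head 111 and is marked dropped at head 112 unless its transaction landed first.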
Closes the value loop on PR #133 (predictions) + PR #134 (reconciliation) by computing what our analytical arb cycle would have realised against the actual post-state of the pool at the block where the victim swap landed. The headline answer is `SUM(net_profit_wei) WHERE decision='profitable'` over the soak window.

- migrations/0005_mempool_profitability.sql: profitability table with cycle_path JSONB, realized_profit_wei + realized_profit_eth + gas_estimate_wei + net_profit_wei, decision CHECK + cascade FK to mempool_predictions. Renumbered from #132's literal `0003` because 0001-0004 are already taken on develop after PRs #133 and #134.
- crates/grpc-server/src/profitability_writer.rs: sibling of the mempool_writer module from PR #133 (bounded mpsc, dedicated writer task, sqlx::PgPool, drop-on-saturation). Adds NewProfitabilityScore payload, ProfitabilitySink trait, NoopSink, PgProfitabilityWriter, ProfitabilityWriterMetrics. Provides fetch_unscored_confirmed for the scoring loop's polling read.
- crates/grpc-server/src/bin/aether_profit_scorer.rs: new aether-profit-scorer binary. Bootstrap loads pools.toml and fetches reserves for every supported pool at the latest block to build a reference PriceGraph + TokenIndex. Poll loop every 30 s SELECTs confirmed-but-unscored predictions; for each, fetches the affected pool's reserves at actual_target_block (one eth_call), clones the reference graph, overwrites the affected edge, runs BellmanFord::detect_from_affected, optimises the best cycle through the same ternary-search the engine uses, and INSERTs a row with the computed decision. Inlines a few helpers (fetch_pool_state_at, build_graph, sol! getReserves/slot0) deliberately duplicated from aether_replay.rs: extracting them into a shared module would touch the merged 2200-line replay file and inflate this PR's review burden. TODO note in the module docstring for the post-phase deduplication.

Metrics: aether_mempool_profit_scored_total{decision}, aether_mempool_profit_writer_drops_total, aether_mempool_profit_writer_queue_depth, aether_mempool_profit_writer_write_latency_ms{result}. The headline gauges named in issue #132 (`net_profit_eth_sum_24h` etc.) are rendered Grafana-side from rate(realized_profit_wei[24h]) rather than as in-process metrics, matching the same PromQL-vs-in-process trade-off PR-2 used for accuracy gauges. Dashboard JSON update deferred to the same follow-up that adds the panels.
The scorer's ternary-search optimiser computes hop output entirely in f64. At mainnet pool scale (USDC pools hold ~1e14 base units, WETH pools ~1e22) the f64 mantissa loses ulps and overstates gross output by amounts that fabricate ETH-scale ghost profit. The PR #135 soak surfaced this as one 5.29 ETH USDC/WETH/DAI triangle; the current re-soak surfaced eight rows totalling 481B ETH worth of ghost net profit (same root cause, different cycle shape: degenerate self-loops with massive reserve mismatch).

Two-layer fix in `score_one`:

1. `verify_cycle_u256` re-walks every V2 hop in the optimiser's chosen cycle with exact `uniswap_v2_get_amount_out` U256 math at the same `running_states` reserves the optimiser saw, threading a local per-pool reserve copy so multi-hop cycles that revisit the same pool (Bellman-Ford self-loops) see hop N+1 reserves shifted by hop N's swap. Without the local copy, A→B→A would see pre-swap reserves on both legs and "regenerate" input, producing the same precision signature in U256 as f64. Cycles where every hop is V2/Sushi return `Some(gross_wei)`; `gross < input` ⇒ `DECISION_REVERTED`, otherwise exact `net = gross − input − gas` drives the decision.
2. When the verifier returns `None` (V3 hop, missing pool state, drained pool) the score falls back to the f64 optimiser's number, but capped: any f64-only verdict above `MAX_PLAUSIBLE_F64_NET_WEI` (1 ETH worth) is downgraded to `DECISION_REVERTED` because a 1+ ETH arb on mainnet would be captured intra-block by faster searchers and never reach our scorer. Sub-ETH V3 arbs pass through unchanged.

`OptimiserSuccess` now exposes `optimal_input_wei` so the verifier can re-walk at the same input the optimiser converged on.
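A minimal sketch of the exact re-walk described in step 1, under stated assumptions: `u128` with checked arithmetic stands in for U256 (so reserves near 1e22 overflow here and return `None`, which is why the real verifier needs 256-bit math), and `v2_amount_out`, `Hop` and `walk_cycle` are hypothetical names, not the scorer's `uniswap_v2_get_amount_out` / `verify_cycle_u256`. The formula is the standard Uniswap V2 constant-product output with the 0.3% fee.

```rust
use std::collections::HashMap;

/// Exact Uniswap V2 output: in*997*r_out / (r_in*1000 + in*997).
/// `None` signals an empty pool, zero input, or u128 overflow.
pub fn v2_amount_out(amount_in: u128, reserve_in: u128, reserve_out: u128) -> Option<u128> {
    if amount_in == 0 || reserve_in == 0 || reserve_out == 0 {
        return None;
    }
    let in_with_fee = amount_in.checked_mul(997)?;
    let numerator = in_with_fee.checked_mul(reserve_out)?;
    let denominator = reserve_in.checked_mul(1000)?.checked_add(in_with_fee)?;
    Some(numerator / denominator)
}

/// One hop of a cycle: which pool, and in which direction.
pub struct Hop {
    pub pool: u32,
    pub zero_for_one: bool, // true = token0 in, token1 out
}

/// Re-walks a cycle with integer math. A local copy of the reserves is
/// updated after every hop, so a pool visited twice sees the state left
/// behind by the earlier hop instead of its pre-swap reserves.
pub fn walk_cycle(
    hops: &[Hop],
    reserves: &HashMap<u32, (u128, u128)>,
    input: u128,
) -> Option<u128> {
    let mut local = reserves.clone();
    let mut amount = input;
    for hop in hops {
        let (r0, r1) = *local.get(&hop.pool)?; // missing pool state => None
        let (r_in, r_out) = if hop.zero_for_one { (r0, r1) } else { (r1, r0) };
        let out = v2_amount_out(amount, r_in, r_out)?;
        let new_in = r_in.checked_add(amount)?;
        let new_out = r_out.checked_sub(out)?;
        let updated = if hop.zero_for_one { (new_in, new_out) } else { (new_out, new_in) };
        local.insert(hop.pool, updated);
        amount = out;
    }
    Some(amount) // gross output; the caller compares it against `input`
}
```

On a balanced 1,000,000 / 1,000,000 pool, swapping 1,000 units out and straight back returns 994: the self-loop loses the two fees plus slippage, so `gross < input` and the cycle would be classed as reverted. The f64 side of the problem is visible in one line: `1e22_f64 + 1.0 == 1e22_f64`, because the spacing between adjacent f64 values at that magnitude is about two million.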
Adds five unit tests covering: `uniswap_v2_get_amount_out` against on-chain math, `u256_to_i128_saturating` overflow handling, verifier inconclusivity on V3 hops, verifier loss on a balanced triangle, and verifier reserve-evolution on self-loops across four orders of input magnitude.

Soak proof (29-row backlog re-scored against the live DB):

     decision     | rows | sum_net_eth
    --------------+------+-------------
     no_path      |   58 | 0.00000000
     reverted     |    8 | 641_531B (f64 noise, gated below the floor)
     profitable   |    0 |
     unprofitable |    0 |

vs the broken baseline (pre-fix, same data, same backlog):

     decision     | rows | sum_net_eth
    --------------+------+-------------
     no_path      |   46 | 0.00000000
     profitable   |    8 | 481_148_577_928 ETH ghost

`SELECT SUM(net_profit_wei) WHERE decision='profitable'` is now 0 ETH; the eight precision-bias rows land in `reverted` where the dashboard explicitly excludes them from realised P&L.

Closes #132 (precision-fix portion).
The scorer's pool registry was the static `config/pools.toml` only, but the engine's runtime pair-index extends past that every time the mempool decoder spots a new pool. Pre-fix soaks showed ~88% of confirmed predictions resolved as `decision='no_path'`, not because the cycle was unreachable in the engine's view, but because the scorer's narrower registry couldn't see the pool.

`load_predicted_pools` queries `SELECT DISTINCT ON (pool_address) pool_address, protocol, token_in, token_out FROM mempool_predictions WHERE pool_address IS NOT NULL` and folds the result into the LoadedPool registry on bootstrap and on every `GRAPH_REFRESH_INTERVAL` tick. Canonical (token0, token1) is derived from `min(token_in, token_out)` / `max(token_in, token_out)`, the direction-agnostic V2/V3 invariant.

fee_bps falls back to `DEFAULT_V2_FEE_BPS` (30) for Uni V2 / Sushi and `DEFAULT_V3_FEE_BPS` (5) for V3. V3's actual per-pool fee comes from `pool.fee()` and lives in (1, 5, 30, 100) bps; reading it would double bootstrap fan-out and the U256 verifier ignores V3 fee anyway, so the default is good enough for the f64 rate weight on the graph edge.

`MAX_DB_PREDICTED_POOLS = 256` caps the augmentation so a runaway engine writing thousands of bogus addresses can't blow the bootstrap's `eth_call` budget; the `SELECT ... ORDER BY pool_address LIMIT $1` keeps the truncation deterministic across restarts.

Protocol-string parser `parse_db_protocol` is intentionally narrow: only `uni_v2`, `uni_v3`, `sushi` map to a `ProtocolType`. Balancer / Curve / Bancor are valid engine protocols but the scorer can't compute their reserves yet; refusing them here keeps an unsupported pool from sneaking in with wrong fee_bps and nonexistent state.
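The registry-augmentation rules above (narrow protocol parsing, default fees, canonical token ordering) can be sketched as below. The constant values and the three accepted short-form strings come from the description; the `ProtocolType` variant names, the `default_fee_bps` and `canonical_pair` helpers, and the use of raw 20-byte arrays for addresses are assumptions made for illustration.

```rust
/// Hypothetical variant names; only three protocols are scoreable.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ProtocolType {
    UniswapV2,
    UniswapV3,
    SushiSwap,
}

pub const DEFAULT_V2_FEE_BPS: u32 = 30;
pub const DEFAULT_V3_FEE_BPS: u32 = 5;
pub const MAX_DB_PREDICTED_POOLS: usize = 256;

/// Intentionally narrow: anything but the three short forms is refused,
/// so unsupported pools never enter the registry.
pub fn parse_db_protocol(s: &str) -> Option<ProtocolType> {
    match s {
        "uni_v2" => Some(ProtocolType::UniswapV2),
        "uni_v3" => Some(ProtocolType::UniswapV3),
        "sushi" => Some(ProtocolType::SushiSwap),
        _ => None,
    }
}

/// Fee fallback used when the per-pool fee is not read from chain.
pub fn default_fee_bps(p: ProtocolType) -> u32 {
    match p {
        ProtocolType::UniswapV2 | ProtocolType::SushiSwap => DEFAULT_V2_FEE_BPS,
        ProtocolType::UniswapV3 => DEFAULT_V3_FEE_BPS,
    }
}

/// Canonical (token0, token1): the numerically smaller address first,
/// regardless of the swap direction the prediction was decoded from.
pub fn canonical_pair(token_in: [u8; 20], token_out: [u8; 20]) -> ([u8; 20], [u8; 20]) {
    if token_in <= token_out {
        (token_in, token_out)
    } else {
        (token_out, token_in)
    }
}
```

Byte arrays compare lexicographically, which for big-endian addresses matches numeric order, so both swap directions through a pool map to the same `(token0, token1)` key.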
Soak proof (82-row backlog re-scored, scorer running from this branch HEAD against the live DB):

     decision   | rows
    ------------+------
     reverted   |   82
     no_path    |    0
     profitable |    0

vs the immediate pre-PR-5 baseline (same DB, scorer from #136 HEAD):

     decision   | rows
    ------------+------
     no_path    |   72
     reverted   |    9

`decision='no_path'` dropped from 89% of rows to 0%; every confirmed prediction now reaches the verifier pipeline. The fact that they all land in `reverted` is PR-4's absurdity floor doing its job on V3-heavy cycles; that's expected and correct, not a regression.

Tests cover `parse_db_protocol` short-form mapping (incl. negative cases for long-form names the config uses), the V2/V3 default fee constants, and the `MAX_DB_PREDICTED_POOLS` ceiling.

Closes #132 (pool-source-narrowness portion).
Before this commit every confirmed mempool prediction whose best cycle touched a Uniswap V3 hop landed in `decision=reverted`. `verify_cycle_u256` short-circuited to `None` on the first V3 hop and the 1 ETH absurdity floor then caught the rate-only f64 verdict as precision bias. That was correct behaviour, but it meant the dashboard never saw real sub-ETH V3 arbs.

This adds `verify_cycle_revm`: for cycles with at least one V3 hop the scorer deploys AetherExecutor and runs `executeArb` inside a pure-revm fork pinned to the scorer's reference block, then measures the ERC20 balance delta on SIM_OWNER as gross profit. V2-only cycles keep the existing U256 fast path unchanged. Cycles the revm path cannot resolve (unknown profit token, Curve/Balancer/Bancor hop, build failure) fall through to the unchanged f64 absurdity-floor fallback.

Implementation:

- `EvmSimulator::deploy_and_simulate_with_erc20_profit`: two sequential `transact` calls on one revm Context. CREATE produces the executor address; CREATE's state diff is committed into the CacheDB so the CALL sees the deployed runtime bytecode. Pre/post balance diff observable via revm's returned state map.
- Scorer loads `contracts/out/AetherExecutor.sol/AetherExecutor.json` init bytecode once at boot via `--executor-artifact` (optional; scorer keeps current behaviour if absent).
- Per-token balance-slot table (WETH=3, USDC=9, DAI=2, USDT=2) keyed by the cycle's starting token. Unknown tokens cause `verify_cycle_revm` to return `None` and fall through to f64.
- `is_v3_touching_cycle` cheaply classifies each cycle before routing.

Proof:

- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- `cargo test --workspace --lib --bins` 26/26 scorer tests + 32/32 simulator tests pass. New tests cover the V2/V3 routing decision, Curve/Balancer rejection in `build_steps`, decision mapping for all three RevmVerdict outcomes, and balance-slot lookup.
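The routing between the three verification paths can be pictured with the sketch below. It is a simplification under stated assumptions: the `Protocol` and `Verifier` enums and the `route` function are hypothetical, tokens are keyed by symbol rather than address for readability, and failure modes that only show up at run time (missing pool state, build failure inside revm) are not modelled. The balance-slot numbers are the ones listed above.

```rust
/// Hypothetical protocol tags for the hops of a cycle.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Protocol {
    UniV2,
    Sushi,
    UniV3,
    Curve,
    Balancer,
    Bancor,
}

/// Which verification path a cycle is sent down.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Verifier {
    U256Walker,
    Revm,
    F64Fallback,
}

/// True when at least one hop goes through a Uniswap V3 pool.
pub fn is_v3_touching_cycle(hops: &[Protocol]) -> bool {
    hops.iter().any(|p| *p == Protocol::UniV3)
}

/// ERC20 balance mapping slot per starting token (values from the PR text).
pub fn balance_slot(symbol: &str) -> Option<u64> {
    match symbol {
        "WETH" => Some(3),
        "USDC" => Some(9),
        "DAI" | "USDT" => Some(2),
        _ => None,
    }
}

/// Static routing decision made before any simulation runs.
pub fn route(hops: &[Protocol], start_token: &str, executor_loaded: bool) -> Verifier {
    let unsupported = hops
        .iter()
        .any(|p| matches!(p, Protocol::Curve | Protocol::Balancer | Protocol::Bancor));
    if hops.is_empty() || unsupported {
        return Verifier::F64Fallback;
    }
    if !is_v3_touching_cycle(hops) {
        return Verifier::U256Walker; // V2-only fast path
    }
    if executor_loaded && balance_slot(start_token).is_some() {
        Verifier::Revm
    } else {
        Verifier::F64Fallback // no artifact, or unknown profit token
    }
}
```

Keeping the classification purely static means a V2-only cycle never pays for an EVM fork, and an unknown profit token is rejected before any bytecode is deployed.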
… on reserves_zero
Symptom: zero V3 mempool predictions in the database over a 5-day soak
window despite the engine successfully decoding 71 V3 swaps (10 of
which passed the registry filter — USDT/WETH via UniswapV3 SwapRouter,
pools we cover). Metric proof:
`aether_pending_arb_sim_skipped_total{reason="reserves_zero"} 10` —
exact match against the 10 FILTER PASSes that vanished without writing
a prediction.
Root cause: V3 graph edges were created with their weight populated
but `reserve_in = reserve_out = 0.0`. Two call sites in
`crates/grpc-server/src/engine.rs`:
* V3 bootstrap branch (`bootstrap_pools` -> `ReserveResult::V3`)
* V3 live-update handler (`PoolEvent::V3Update`)
Both used `graph.add_edge(weight = price * fee, ...)` which only
touches weight + liquidity. The V2 path next door uses
`update_edge_from_reserves(r0, r1, fee)` which populates both reserves
AND the weight, which is why V2 mempool predictions worked end-to-end.
The mempool post-state pipeline's
`try_post_state_scan` then explicitly guards against zero reserves:
    if edge_fwd.reserve_in <= 0.0 || edge_fwd.reserve_out <= 0.0 {
        metrics.inc_pending_arb_sim_skipped("reserves_zero");
        return;
    }
so every V3 swap was dropped before reaching `predict_post_state`.
Fix: after each pair of `add_edge` calls in the V3 branches, also call
`update_edge_from_reserves` with the synthetic `(1.0, spot_price)`
pair. Convention matches the scorer's `state_to_graph_reserves` V3
branch and the docstring on `mempool_pipeline::unified_to_post_reserves`
("V3 uses a synthetic `(1.0, spot_price)` pair so Bellman-Ford treats
the two families identically").
The fix is purely additive — `add_edge` keeps creating the edge and
setting weight; `update_edge_from_reserves` then populates reserves on
the existing edge (which is a no-op-if-missing on its own, hence the
pairing). The weight derived from `(1.0, price) * fee` equals the
weight `add_edge` writes (`-ln(price * fee)`), so the two paths agree.
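The claim that the two paths agree can be checked with a small model. This is a sketch under assumptions, not the engine's graph code: `Edge`, the free-function forms of `add_edge` and `update_edge_from_reserves`, and `passes_reserve_guard` are simplified stand-ins, `fee` is treated as a multiplier such as 0.9995, and token decimals are ignored when converting `sqrt_price_x96` to a price.

```rust
/// Simplified graph edge: Bellman-Ford weight plus the reserves the
/// mempool pipeline checks before simulating.
pub struct Edge {
    pub weight: f64,
    pub reserve_in: f64,
    pub reserve_out: f64,
}

/// Spot price from a Q64.96 square-root price (decimals ignored).
pub fn price_from_sqrt_x96(sqrt_price_x96: f64) -> f64 {
    let ratio = sqrt_price_x96 / 2f64.powi(96);
    ratio * ratio
}

/// Weight-only path: the edge exists but its reserves stay at zero.
pub fn add_edge(price: f64, fee: f64) -> Edge {
    Edge { weight: -(price * fee).ln(), reserve_in: 0.0, reserve_out: 0.0 }
}

/// Reserve path: stores both reserves and re-derives the weight.
pub fn update_edge_from_reserves(edge: &mut Edge, r_in: f64, r_out: f64, fee: f64) {
    edge.reserve_in = r_in;
    edge.reserve_out = r_out;
    edge.weight = -((r_out / r_in) * fee).ln();
}

/// Mirror of the guard that produced the `reserves_zero` skips.
pub fn passes_reserve_guard(edge: &Edge) -> bool {
    edge.reserve_in > 0.0 && edge.reserve_out > 0.0
}
```

Calling `update_edge_from_reserves(&mut edge, 1.0, price, fee)` right after `add_edge(price, fee)` leaves the weight unchanged, because `(price / 1.0) * fee` is the same product `add_edge` used, while the edge now passes the reserve guard.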
Proof:
- new test `test_v3_update_seeds_synthetic_reserves` asserts
`reserve_in == 1.0` and `reserve_out == price` on both forward and
reverse edges after a V3Update event with `sqrt_price_x96 = 2 * 2^96`
(price = 4.0).
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- `cargo test --workspace --lib --bins` all green (incl. existing
`test_v3_update_updates_graph`).
Unblocks PR #144 (revm V3 verifier in scorer) from "tested only in
unit tests" to "exercised on organic mainnet V3 mempool traffic" once
the engine is restarted with this build.
Extends the existing `aether-mempool` Grafana dashboard with 13 new panels (plus two row dividers) covering PR #134's reconciler accuracy gauges, PR #135's scoring throughput + writer health, and PR #137's DB-augmented pool registry impact. Existing engine-side panels (PRs #118 / #128) are untouched.

Adds the matching Prometheus scrape jobs the panels query against:

- `aether-host-reconciler` → `host.docker.internal:9094` (the Go reconciler binary, default port per `cmd/reconciler/main.go`).
- `aether-host-scorer` → `host.docker.internal:9095` and `:9097`. The Rust scorer defaults to 9095; soak ops override via `PROFIT_SCORER_METRICS_ADDR=:9097`. Listing both targets lets the scrape pick up whichever is in use without an additional config swap.

### Panel additions

PR #134: Reconciler

- Block accuracy (Δ ≤ 0) stat: `aether_mempool_block_delta_bucket{le="0"}`
- Pool-path accuracy (1h) stat: `aether_mempool_pool_path_total{correct="true"}`
- Reconciler queue depth: `aether_mempool_reconciler_queue_depth`
- Reconciler drops (5m rate): `aether_mempool_reconciler_drops_total`
- Block-delta quantiles (p50 / p90) timeseries
- Reconciliation outcomes pie: `aether_mempool_reconciled_total{outcome}`
- Reconciler error rates by source (header / lookup / receipt)
- Reconciler write latency (p50 / p95)

PR #135–#137: Scorer

- Decision breakdown pie: `aether_mempool_profit_scored_total{decision}`
- Scored rate by decision (5m) timeseries
- Scorer queue depth
- Scorer drops (5m rate)
- Scorer write latency (p50 / p95) split by result label

### Deliberately out of scope

- **PR #136 reverted-by-floor sub-counter**: the `aether_mempool_profit_scored_total` counter currently coalesces the absurdity-floor reverts with the U256-walker reverts (and after PR #144, the revm V3 verifier's reverts too) under a single `decision="reverted"` label. Splitting them requires a new label on the counter, a code change deliberately deferred. The decision-pie + scored-rate panels already plot the merged total.
- **PR #135 net_profit_eth_sum_24h**: the scorer does not expose a per-decision net-profit gauge; that figure lives only in the `mempool_profitability` table. Surfacing it would need either a new Prom metric or a Postgres datasource; both are their own follow-up decisions.
- **PR #135 top-10 unscored confirmed table**: requires a Postgres datasource that this dashboard intentionally does not add.
- **PR #137 added_from_db gauge**: the scorer logs this value as a tracing field on each registry-refresh tick but does not emit a matching metric. Cheap to add, but pure-Prom dashboard scope says leave it out of this PR.

### Validation

- `python3 -m json.tool mempool.json` parses cleanly (22 panels: 7 existing engine panels + 13 new + 2 row dividers).
- `prometheus.yml` parses cleanly through Python's `yaml.safe_load`.
- Every PromQL expression references a metric name observed live on the running engine (`localhost:9092`) or reconciler (`localhost:9094`) endpoints, or declared in `crates/grpc-server/src/profitability_writer.rs` for the scorer-side panels.
- Live render skipped: the local docker-compose obs stack (prom + grafana + alertmanager) is not currently running. Running the stack and pointing it at the three host scrape targets will render every panel against the real running services.
This was referenced May 20, 2026
Pablosinyores added a commit that referenced this pull request on May 20, 2026:
aether_mempool_profit_scored_total used to be labelled only by
decision, so the dashboard could see "10 reverted" but not whether
they came from the V2 U256 walker, the f64 absurdity floor, the V3
revm verifier, or organic revm reverts. Add a `reason` sub-label
distinguishing those code paths.
Five wire labels (pinned by unit test against the constants):
  - n/a              non-reverted decisions, no_path, or any path with
                     no sub-source worth distinguishing
  - u256_walker      V2-only exact-U256 walker reached a verdict
                     (PR #136 path)
  - absurdity_floor  f64 fallback above MAX_PLAUSIBLE_F64_NET_WEI (1 ETH)
                     downgraded to reverted (PR #136 path)
  - revm_verdict     V3-touching revm sim ran to completion with a
                     non-reverting verdict (PR #144 path)
  - revm_revert      V3-touching revm sim explicitly reverted/halted
                     (PR #144 path)
The reason is Prometheus-only and NOT persisted to the
mempool_profitability table — the migration's CHECK constraint only
covers decision, and adding a reason column would force every
existing row to back-fill. `NewProfitabilityScore.reason` skips the
DB insert path; it only flows into the metric label.
revm_verdict_to_decision and f64_fallback_verdict now return a
4-tuple (net, realised, decision, reason). no_path_outcome carries
REASON_NA. The aggregating let in score_one destructures
(net, realised, decision, reason) and threads reason into
ScoreOutcome.
Dashboard panels in deploy/docker/grafana/dashboards/mempool.json
(panel IDs 19 and 20 from PR #146) updated to sum by
(decision, reason) and legend-format {{decision}} / {{reason}}.
Title and description updated to reflect the new dimension.
Stacks on feat/dedupe-replay-scorer-helpers (PR #148).
Verification:
- cargo clippy --workspace --all-targets -- -D warnings : clean
- cargo test --workspace --lib --bins : 528 passed, 0 failed
- new test: reason_constants_are_stable_wire_labels
- existing verdict-helper tests updated to assert reason value
- python3 -m json.tool mempool.json : parses cleanly
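How the `reason` label rides alongside the decision can be sketched as below. The five label strings and the 1 ETH floor come from the commit message; everything else is an assumption for illustration: the function name `f64_fallback_sketch`, its parameters, `realised = gross - input`, and the choice to keep the computed numbers on floored rows are not confirmed by the source.

```rust
// Wire labels for the `reason` dimension (values from the commit message).
pub const REASON_NA: &str = "n/a";
pub const REASON_U256_WALKER: &str = "u256_walker";
pub const REASON_ABSURDITY_FLOOR: &str = "absurdity_floor";
pub const REASON_REVM_VERDICT: &str = "revm_verdict";
pub const REASON_REVM_REVERT: &str = "revm_revert";

/// 1 ETH in wei: f64-only verdicts above this are treated as precision bias.
pub const MAX_PLAUSIBLE_F64_NET_WEI: i128 = 1_000_000_000_000_000_000;

/// Hypothetical f64-fallback verdict returning the 4-tuple
/// (net, realised, decision, reason). The reason is meant for the metric
/// label only; it is not part of the persisted row.
pub fn f64_fallback_sketch(
    gross_wei: i128,
    input_wei: i128,
    gas_wei: i128,
) -> (i128, i128, &'static str, &'static str) {
    let realised = gross_wei - input_wei; // assumed definition
    let net = realised - gas_wei;
    if net > MAX_PLAUSIBLE_F64_NET_WEI {
        // Too good to be true for an unverified f64 estimate.
        (net, realised, "reverted", REASON_ABSURDITY_FLOOR)
    } else if net > 0 {
        (net, realised, "profitable", REASON_NA)
    } else {
        (net, realised, "unprofitable", REASON_NA)
    }
}
```

Because the reason travels as a separate tuple element, the database insert can keep using only `decision` while the counter is incremented with both labels.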
Summary
Extends `deploy/docker/grafana/dashboards/mempool.json` from 7 panels (PR #118 / #128 engine-side scaffold) to 22 panels covering the mempool reconciler (PR #134) and profitability scorer (PR #135 / #137). Adds the matching Prometheus scrape jobs so the panels actually have data when the obs stack is running.
Stacks on PR #145 (`feat/engine-v3-graph-reserves`).
What changed
`deploy/docker/prometheus.yml` — two new scrape jobs:
`deploy/docker/grafana/dashboards/mempool.json` — new panels organised into two collapsible rows:
Reconciler (PR #134)
Scorer (PR #135 – #137)
Out of scope (deliberately)
Validation
Live render not exercised
The local docker-compose obs stack (prom + grafana + alertmanager) is not currently running on this host (only `aether-postgres` is up). Bringing the stack up and pointing it at `host.docker.internal:9094` and `:9097` will render every panel against the real running services without further changes.