Skip to content

feat(ledger): trade-ledger foundation — schema + no-op access layer#115

Merged
0xfandom merged 8 commits into
mainfrom
feat/trade-ledger-foundation
May 5, 2026
Merged

feat(ledger): trade-ledger foundation — schema + no-op access layer#115
0xfandom merged 8 commits into
mainfrom
feat/trade-ledger-foundation

Conversation

@0xfandom
Copy link
Copy Markdown
Collaborator

Heads-up for reviewers — this PR does NOT close #95.
It ships only the foundation slice of the trade-ledger work: schema + no-op access layer. The Postgres-backed writes, engine / executor wiring, scheduled rollup, and CI integration-test container all land in separate follow-up PRs (links in Out of scope below). #95 stays open until those land.

Why split: the full issue touches Rust + Go + infra + CI + docs (~25-40 files) and has merge-conflict surface with in-flight PRs (#93 shadow, #98 obs, #114 polling). Foundation-only keeps this PR mergeable today with zero behaviour change when DATABASE_URL is unset.

Summary

  • Adds postgres:17 compose service (healthcheck + named volume).
  • Adds migrations/0001_trade_ledger.sql — 5 tables: arbs, bundles, inclusion_results, pool_registry, pnl_daily. U256 → NUMERIC(78,0), variable-shape path data → JSONB, all timestamps TIMESTAMPTZ. Indexes per the issue spec.
  • Adds scripts/db_migrate.sh — sqlx-cli wrapper, loads .env, applies migrations.
  • Adds Ledger trait + NoopLedger default in Rust (crates/common/src/db.rs) with NewArb / NewPool / InclusionUpdate payloads.
  • Adds Ledger interface + NoopLedger default in Go (internal/db/ledger.go) with NewBundle / NewInclusion / PnLDailyDelta payloads.
  • .env.example documents DATABASE_URL + POSTGRES_USER/PASSWORD/DB (commented out so unset behaviour stays the default).

Backward compatibility: binaries still build and run identically to current main. NoopLedger discards every write and logs once at startup so operators can grep for ledger disabled.

Files Changed

File Purpose
migrations/0001_trade_ledger.sql Initial schema (5 tables + indexes)
scripts/db_migrate.sh sqlx-cli migrate wrapper
crates/common/src/db.rs Rust Ledger trait + NoopLedger
crates/common/src/lib.rs Export db module
crates/common/Cargo.toml Promote serde_json + tracing to runtime deps
internal/db/ledger.go Go Ledger interface + NoopLedger
internal/db/ledger_test.go Go ledger smoke test
deploy/docker/docker-compose.yml postgres:17 service + postgres-data volume
.env.example DATABASE_URL + POSTGRES_* lines

Acceptance Criteria (issue scope satisfied by this PR)

  • docker compose up postgres brings up Postgres 17 with healthcheck (compose config validated; reviewer to do live up)
  • scripts/db_migrate.sh is in place; applies 0001_trade_ledger.sql idempotently against any reachable Postgres
  • With DATABASE_URL unset: both binaries boot and run identically to today (no writes, no errors)
  • cargo build --workspace, cargo clippy --workspace --all-targets -- -D warnings, cargo test -p aether-common clean
  • go build ./..., go test ./... -race -count=1, go vet ./... clean
  • Deferred to follow-up PRs: rows actually written by live / shadow / replay; row counts matching Prometheus counters; CI integration-tests job against a real Postgres service container

Test plan

  • cargo build --workspace — green
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo test -p aether-common --release — 40 pass incl new NoopLedger tests
  • go build ./... — green
  • go test ./... -count=1 — green incl new ledger smoke test
  • go vet ./... — clean
  • docker compose -f deploy/docker/docker-compose.yml config — postgres service + postgres-data volume registered correctly
  • Reviewer manual: docker compose up postgres -d + DATABASE_URL=postgres://aether:aether@localhost:5432/aether ./scripts/db_migrate.sh applies the migration

Out of scope — tracked under #95, will land in follow-ups

  • Rust PgLedgersqlx::PgPool impl of Ledger; engine wiring (insert_arb on every ARB PUBLISHED, insert_pool on auto-registered pools); workspace sqlx + uuid deps.
  • Go PgLedgerpgxpool impl; executor wiring (InsertBundle on bundle build, InsertInclusion on GetBundleStats resolve).
  • pnl_daily rollup — scheduled job, once per UTC day.
  • CIpostgres:17 service container in the Rust integration-test job, schema-applied + round-trip test.
  • Redis state cache — separate hot-path concern, different access pattern.
  • mev-inspect-py importer + Grafana dashboards backed by this schema — downstream of wiring.

Refs #95

Adds postgres:17 service, initial migration, sqlx-cli migrate script,
and Ledger traits in Rust + Go with no-op defaults. Binaries stay
backward-compatible when DATABASE_URL is unset.
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
aether Ready Ready Preview, Comment May 4, 2026 1:41pm
aether-63xv Ready Ready Preview, Comment May 4, 2026 1:41pm

Copy link
Copy Markdown
Owner

@Pablosinyores Pablosinyores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — feat(ledger): trade-ledger foundation

Foundation slice is well-scoped and the "zero behaviour change when DATABASE_URL unset" promise holds in code. Verified locally: cargo build -p aether-common, cargo test -p aether-common --lib db:: (2/2), go build ./internal/db/..., go test ./internal/db/..., go vet ./internal/db/..., docker compose config, bash -n scripts/db_migrate.sh — all clean. Schema↔payload mapping walked column-by-column on both sides.

Hot-path latency impact verified zero: NoopLedger is empty-body vtable dispatch, no alloc, no atomic. OnceLock/sync.Once only fires in the constructor.

Blocking (CRITICAL)

1. internal/db/ledger.go:40Error string is not nullable

SQL inclusion_results.error TEXT is nullable. Go zero-value "" will write an empty string instead of NULL, so WHERE error IS NULL returns zero rows for successful inclusions. Rust counterpart at crates/common/src/db.rs:70 correctly uses Option<String>.

-    Error         string
+    Error         *string

2. internal/db/ledger.go:39LandedTxHash []byte has no length contract

Schema is semantically a 32-byte tx hash; Rust counterpart at crates/common/src/db.rs:69 is Option<B256> (exactly 32 bytes). The unbounded []byte lets a caller silently write 0/20/64 bytes — Rust readers would then panic or get garbage. Suggest *[32]byte (nullable) or at minimum a documented length contract.

Should Fix Before Merge (WARNING)

  • go.mod:29github.com/google/uuid is marked // indirect but is now imported directly in internal/db/ledger.go:16. Any go mod tidy will rewrite go.mod/go.sum. Run go mod tidy and commit the result.
  • crates/common/src/db.rs:117-136uuid_compat::Uuid has no documented byte-order contract. google/uuid is RFC 4122; if the future uuid::Uuid swap uses a different layout, the FK correlation arbs.arb_idbundles.arb_id breaks silently. Add // TODO: replace with uuid::Uuid (RFC 4122 byte order) before PgLedger impl ships on the struct so the swap is gated on the byte-layout match.
  • crates/common/src/db.rs:52NewPool.protocol: String is stringly-typed; ProtocolType enum already exists at crates/common/src/types.rs:7. Casing drift in the DB will silently break WHERE protocol = ... filters. Bind through the enum's Display impl.
  • deploy/docker/docker-compose.yml:177 — postgres service starts unconditionally on every docker compose up, consuming port 5432 and ~50MB RAM even when no binary is wired to it. The reth service at line 199 uses profiles: [node] for the same reason. Add profiles: [ledger] (or similar) to preserve the "no surprise" contract for current operators.
  • Clock-authority inconsistencyinternal/db/ledger.go:25 (SubmittedAt) and :41 (ResolvedAt) are client-set time.Times that override the SQL DEFAULT now(), while arbs.ts has no Rust field and is DB-authoritative. If the Go executor's clock drifts vs Postgres, submitted_at/resolved_at aren't safely comparable with arbs.ts. Either document this as intentional (Go clock = "when Go acted", DB clock = "when row was written") or unify.

Nice to Have (SUGGESTION)

  • migrations/0001_trade_ledger.sql:62inclusion_results_bundle_id_idx is redundant with the composite PK (bundle_id, builder) (line 59); the PK B-tree already serves any leading-column query. Drop it.
  • internal/db/ledger.go:57-59 — interface methods take structs by value. NewBundle.SignedTxHex is a ~500B string. NoopLedger is fine (compiler elides), but the future PgLedger is forced to copy on every call. Pointer receivers cost nothing now and avoid a future interface-breaking change.
  • crates/common/src/db.rs:81 — trait methods are infallible by design. This boxes the future PgLedger into log-and-drop, which works for fire-and-forget but rules out caller-side retry/metric emission. Consider -> Result<(), LedgerError> with an extension trait that auto-logs-and-drops for hot-path callers.

What's Good

  • Single coherent commit, conventional format, clean message.
  • 9 files, no drive-by edits, scope matches the PR body.
  • NUMERIC(78,0) correctly sized for U256 (2^256 has 78 digits). BYTEA/TIMESTAMPTZ choices are correct.
  • OnceLock<()> / sync.Once startup-warn is symmetric across both binaries with identical log message, so operators can grep ledger disabled on either side.
  • Trait object safety is explicitly tested (db.rs:151).
  • No existing code imports aether_common::db or internal/db outside the new files — runtime behaviour vs main is genuinely unchanged.

Verdict

REQUEST CHANGES — gated on the two CRITICAL items above. Once those land, the WARNINGs are cheap fixes that lock the schema/trait into a shape that won't fight the PgLedger follow-up.

0xfandom added 6 commits May 4, 2026 18:57
…sion

NewInclusion previously used a non-nullable string for Error and an
unbounded []byte for LandedTxHash. Both diverged from the SQL schema
and the Rust counterpart (Option<String>, Option<B256>):

- string would silently write "" to a NULL-allowed column, breaking
  WHERE error IS NULL queries used by the reconciliation job.
- []byte allowed any length, while inclusion_results.landed_tx_hash is
  semantically a 32-byte tx hash and Rust enforces B256.

Switching to *string and *[32]byte makes nil round-trip to SQL NULL
and pushes the 32-byte invariant into the type system, eliminating
silent corruption when readers expect a fixed-length hash.

Refs PR #115 review.
go mod tidy hoists github.com/google/uuid (used by internal/db) and the
four go.opentelemetry.io packages already imported directly elsewhere
out of the indirect block. Required after the trade-ledger foundation
imported uuid as a direct dep.

Refs PR #115 review.
Replaces the stringly-typed protocol field with the ProtocolType enum
so callers cannot construct a pool record with a value the rest of the
system does not recognise. The Postgres column remains TEXT — the
PgLedger impl will serialise via ProtocolType's serde repr, preserving
schema compatibility while pushing the validation up to compile time.

Adds Default for ProtocolType (UniswapV2) so NewPool keeps its derived
Default; UniswapV2 = 1 mirrors the on-chain enum's first variant and
never silently maps to an undefined protocol code.

Refs PR #115 review.
Pins the in-memory layout to RFC 4122 network byte order, matching
Postgres's uuid binary wire format and uuid::Uuid::{as_bytes,
from_bytes}. Documents the contract on the module and on both
constructors so the eventual swap to uuid::Uuid is a straight
round-trip with no endianness surprises.

Refs PR #115 review.
Devs not using the trade ledger should not pay for an idle Postgres
process. Adding the service to the 'ledger' profile keeps it out of
the default 'docker compose up' set; opt in with:

  docker compose --profile ledger up -d postgres

Mirrors the existing 'node' profile used for the optional Reth
service. .env.example updated with the new bring-up sequence.

Refs PR #115 review.
Splits the timestamp columns into two classes:

- Event-time (arbs.ts, bundles.submitted_at, inclusion_results
  .resolved_at) are client-set; DEFAULT now() is a fallback for
  ad-hoc psql inserts only. Application writers populate the value
  at the moment the event happens to keep latency analysis and
  back-tests honest.
- Persistence-boundary (pool_registry.first_seen/last_seen,
  pnl_daily.updated_at) are DB-set, since they record when the row
  landed, not when the event occurred.

Documents the policy at the head of the migration so future writers
do not silently fall back to now() and skew downstream analysis.

Refs PR #115 review.
The manual Default impl introduced in 0944ef3 trips
clippy::derivable_impls under cargo clippy --workspace -- -D warnings,
which is the gate the CI rust job runs. Replacing it with #[derive
(Default)] + #[default] on UniswapV2 keeps the same semantics
(UniswapV2 = 1 as the safe sentinel) and unblocks CI.

Refs PR #115 review.
Copy link
Copy Markdown
Owner

@Pablosinyores Pablosinyores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review — all prior findings addressed

Verified locally on e4fc873: cargo build -p aether-common, cargo test -p aether-common --lib db:: (2/2), cargo clippy -p aether-common --all-targets -- -D warnings, go build ./internal/db/..., go test ./internal/db/..., go vet ./internal/db/... — all clean. CI green (rust 2m55s, go 29s, solidity 17s, toolchain 4s). mergeStateStatus: CLEAN.

Blocking findings — both resolved

internal/db/ledger.go:48-49LandedTxHash *[32]byte and Error *string now match the Rust counterpart's nullability + length contract. The struct doc explicitly calls out the bugs the previous shapes would have caused ("" instead of NULL; arbitrary-length tx hashes), which closes the loop nicely for a future reader.

Non-blocking findings — all resolved

  • go.mod:7-12github.com/google/uuid and the four go.opentelemetry.io/otel packages promoted to the direct require block. go mod tidy is now a no-op for contributors.
  • crates/common/src/db.rs:124-137uuid_compat::Uuid byte-order contract documented as RFC 4122 network byte order, with explicit notes on Postgres binary wire format compatibility, uuid::Uuid::as_bytes/from_bytes round-trip, and the gotcha around from_bytes_le / Windows GUIDs. The future uuid::Uuid swap is now a typed contract, not an implicit assumption.
  • crates/common/src/db.rs:59NewPool.protocol is now ProtocolType, not String. Stringly-typed casing drift in the DB is no longer possible. crates/common/src/types.rs also derives Default mapping to ProtocolType::UniswapV2 = 1 so the Default derive on NewPool keeps working — the rationale is in the doc comment, which I think is the right safe sentinel.
  • deploy/docker/docker-compose.ymlpostgres service is now gated behind profiles: [ledger]. Verified: docker compose config --services returns 9 services (no postgres); docker compose --profile ledger config --services returns 10 (postgres added). The "no surprise" contract for current operators is intact.
  • migrations/0001_trade_ledger.sql:11-22 — clock-authority policy is now an explicit comment in the schema header: event-time columns (arbs.ts, bundles.submitted_at, inclusion_results.resolved_at) are CLIENT-SET, persistence-boundary columns (pool_registry.first_seen/last_seen, pnl_daily.updated_at) are DB-SET. The reasoning (event-vs-row time skew under load) is captured. Future wiring PRs have a clear contract.

One minor follow-up (non-blocking, just for tracking)

The new clock-authority policy says arbs.ts is CLIENT-SET, but NewArb at crates/common/src/db.rs:30-48 still has no ts field. The wiring follow-up that ships PgLedger will need to add pub ts: chrono::DateTime<Utc> (or equivalent) to NewArb and have callers populate it at arb-creation time. Same applies to NewBundle.SubmittedAt and NewInclusion.ResolvedAt on the Go side, which already have client-set fields — those are good. Just calling out the asymmetry so it doesn't slip when PgLedger lands.

Quality of the fixes is high — author didn't just patch the field types but also added doc comments explaining the bugs the original shapes would have caused, which makes the intent durable.

Verdict: APPROVE

@0xfandom 0xfandom merged commit 2f6bc26 into main May 5, 2026
7 checks passed
0xfandom added a commit that referenced this pull request May 5, 2026
…sion

NewInclusion previously used a non-nullable string for Error and an
unbounded []byte for LandedTxHash. Both diverged from the SQL schema
and the Rust counterpart (Option<String>, Option<B256>):

- string would silently write "" to a NULL-allowed column, breaking
  WHERE error IS NULL queries used by the reconciliation job.
- []byte allowed any length, while inclusion_results.landed_tx_hash is
  semantically a 32-byte tx hash and Rust enforces B256.

Switching to *string and *[32]byte makes nil round-trip to SQL NULL
and pushes the 32-byte invariant into the type system, eliminating
silent corruption when readers expect a fixed-length hash.

Refs PR #115 review.
0xfandom added a commit that referenced this pull request May 5, 2026
go mod tidy hoists github.com/google/uuid (used by internal/db) and the
four go.opentelemetry.io packages already imported directly elsewhere
out of the indirect block. Required after the trade-ledger foundation
imported uuid as a direct dep.

Refs PR #115 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Replaces the stringly-typed protocol field with the ProtocolType enum
so callers cannot construct a pool record with a value the rest of the
system does not recognise. The Postgres column remains TEXT — the
PgLedger impl will serialise via ProtocolType's serde repr, preserving
schema compatibility while pushing the validation up to compile time.

Adds Default for ProtocolType (UniswapV2) so NewPool keeps its derived
Default; UniswapV2 = 1 mirrors the on-chain enum's first variant and
never silently maps to an undefined protocol code.

Refs PR #115 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Pins the in-memory layout to RFC 4122 network byte order, matching
Postgres's uuid binary wire format and uuid::Uuid::{as_bytes,
from_bytes}. Documents the contract on the module and on both
constructors so the eventual swap to uuid::Uuid is a straight
round-trip with no endianness surprises.

Refs PR #115 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Devs not using the trade ledger should not pay for an idle Postgres
process. Adding the service to the 'ledger' profile keeps it out of
the default 'docker compose up' set; opt in with:

  docker compose --profile ledger up -d postgres

Mirrors the existing 'node' profile used for the optional Reth
service. .env.example updated with the new bring-up sequence.

Refs PR #115 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Splits the timestamp columns into two classes:

- Event-time (arbs.ts, bundles.submitted_at, inclusion_results
  .resolved_at) are client-set; DEFAULT now() is a fallback for
  ad-hoc psql inserts only. Application writers populate the value
  at the moment the event happens to keep latency analysis and
  back-tests honest.
- Persistence-boundary (pool_registry.first_seen/last_seen,
  pnl_daily.updated_at) are DB-set, since they record when the row
  landed, not when the event occurred.

Documents the policy at the head of the migration so future writers
do not silently fall back to now() and skew downstream analysis.

Refs PR #115 review.
0xfandom added a commit that referenced this pull request May 5, 2026
Replaces fire-and-forget tokio::spawn per write with a single dedicated
writer task draining a bounded mpsc::channel(1024). Every Ledger trait
method becomes a non-awaiting try_send; saturation drops the row and
bumps a metric instead of fanning out unbounded background tasks while
Postgres is slow.

Adds LedgerMetrics, registered on the engine's existing prometheus
Registry via the new EngineMetrics::registry() accessor so a single
/metrics endpoint emits both engine and ledger families:

  aether_ledger_writes_total{op, result}     Counter
  aether_ledger_drops_total{op}              Counter
  aether_ledger_queue_depth                  Gauge
  aether_ledger_write_latency_ms{op}         Histogram

The writer task observes per-op latency from dequeue to query
completion, decrements queue_depth on dequeue, and tags every write
ok/err. A failing write logs and drops — never propagates to the
engine.

Channel size sized for ~3 s of bursty inserts at 200 arbs/s peak, so
saturation is the alert signal that Postgres is the bottleneck. The
sqlx pool stays bounded at 8 connections, so even if every slot stalls
the channel still drops cleanly rather than blocking.

Refs PR #115 follow-up; PR-1 of the 3-PR plan to close issue #95.
0xfandom added a commit that referenced this pull request May 5, 2026
Replaces fire-and-forget tokio::spawn per write with a single dedicated
writer task draining a bounded mpsc::channel(1024). Every Ledger trait
method becomes a non-awaiting try_send; saturation drops the row and
bumps a metric instead of fanning out unbounded background tasks while
Postgres is slow.

Adds LedgerMetrics, registered on the engine's existing prometheus
Registry via the new EngineMetrics::registry() accessor so a single
/metrics endpoint emits both engine and ledger families:

  aether_ledger_writes_total{op, result}     Counter
  aether_ledger_drops_total{op}              Counter
  aether_ledger_queue_depth                  Gauge
  aether_ledger_write_latency_ms{op}         Histogram

The writer task observes per-op latency from dequeue to query
completion, decrements queue_depth on dequeue, and tags every write
ok/err. A failing write logs and drops — never propagates to the
engine.

Channel size sized for ~3 s of bursty inserts at 200 arbs/s peak, so
saturation is the alert signal that Postgres is the bottleneck. The
sqlx pool stays bounded at 8 connections, so even if every slot stalls
the channel still drops cleanly rather than blocking.

Refs PR #115 follow-up; PR-1 of the 3-PR plan to close issue #95.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[E2] Trade ledger: Postgres schema + access layer (Rust + Go)

2 participants