feat(metrics): split aether_decode_errors_total by reason#109

Merged: 0xfandom merged 1 commit into main from fix/decode-errors-by-reason on Apr 27, 2026

@0xfandom
Collaborator

Summary

  • Replace single IntCounter with IntCounterVec labelled by reason
  • decode_log now returns Result<PoolEvent, DecodeReason> so the reason propagates end-to-end
  • Three reason labels: unknown_topic (benign, high-volume in discovery mode), malformed_payload (real data-integrity bug — page on spikes), insufficient_topics (upstream producer bug)
  • Per-reason counter values asserted in unit tests, not just presence
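
The summary above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the three label strings, `as_str()`, `decode_log`, `PoolEvent`, and the `Result<PoolEvent, DecodeReason>` return type come from the description. The variant names, the `Log` struct, the one-byte `SYNC_TOPIC`, and the 16-byte payload layout are all assumptions made to keep the example self-contained.

```rust
/// Why a log could not be decoded. Variant names are assumed;
/// the label strings are the ones listed in the PR summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum DecodeReason {
    UnknownTopic,
    MalformedPayload,
    InsufficientTopics,
}

impl DecodeReason {
    /// Stable string used as the value of the `reason` label.
    pub const fn as_str(self) -> &'static str {
        match self {
            DecodeReason::UnknownTopic => "unknown_topic",
            DecodeReason::MalformedPayload => "malformed_payload",
            DecodeReason::InsufficientTopics => "insufficient_topics",
        }
    }
}

/// Hypothetical, heavily simplified stand-ins for the real types.
#[derive(Debug, PartialEq, Eq)]
pub enum PoolEvent {
    Sync { reserve0: u64, reserve1: u64 },
}

pub struct Log {
    pub topics: Vec<u8>, // real logs carry 32-byte topic hashes
    pub data: Vec<u8>,
}

/// Hypothetical one-byte topic id for a Sync event.
pub const SYNC_TOPIC: u8 = 0x1c;

/// Returns the decoded event, or the reason decoding failed.
pub fn decode_log(log: &Log) -> Result<PoolEvent, DecodeReason> {
    // No topic at all: the producer sent an incomplete log.
    let topic0 = *log.topics.first().ok_or(DecodeReason::InsufficientTopics)?;
    match topic0 {
        SYNC_TOPIC => decode_sync(&log.data),
        _ => Err(DecodeReason::UnknownTopic),
    }
}

/// Inner decoders return the same error type, so `?` and plain
/// returns carry the reason up to the caller unchanged.
fn decode_sync(data: &[u8]) -> Result<PoolEvent, DecodeReason> {
    if data.len() != 16 {
        return Err(DecodeReason::MalformedPayload);
    }
    Ok(PoolEvent::Sync {
        reserve0: read_u64(&data[..8]),
        reserve1: read_u64(&data[8..]),
    })
}

/// Reads a big-endian u64 from exactly 8 bytes (length checked by the caller).
fn read_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_be_bytes(buf)
}
```

Because the reason travels in the `Err` variant, the caller that records the failure does not need to re-derive why decoding failed; it can pass `reason.as_str()` straight to the metric.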

Files Changed

| File | Change |
| --- | --- |
| crates/ingestion/src/event_decoder.rs | Add DecodeReason enum + as_str(); change decode_log + all inner decoders to return Result<PoolEvent, DecodeReason>; add 3 new reason-label tests (malformed, unknown, reason label contract) |
| crates/grpc-server/src/metrics.rs | IntCounter → IntCounterVec with reason label; inc_decode_errors(&str); render test asserts all three label series |
| crates/grpc-server/src/provider.rs | record_decode_failure now takes DecodeReason; trace log includes reason field; two new end-to-end tests (malformed_payload, insufficient_topics) in addition to the updated unknown_topic test |
| crates/integration-tests/tests/anvil_fork_test.rs | Migrate call sites from Option to Result |
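
As an illustration of the Option-to-Result migration at call sites, here is a small sketch. The `decode_log` below is a hypothetical stand-in operating on raw bytes (it is not the real decoder), and `decoded_only` and `decode_with_reasons` are names invented for this example. It shows two plausible ways a caller could adapt: keep the old "success or nothing" behaviour with `.ok()`, or match on the error to keep the reason.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DecodeReason {
    UnknownTopic,
    MalformedPayload,
    InsufficientTopics,
}

/// Hypothetical stand-in decoder: the first byte plays the role of the
/// topic, and a valid payload is exactly one further byte.
pub fn decode_log(raw: &[u8]) -> Result<u8, DecodeReason> {
    match raw {
        [] => Err(DecodeReason::InsufficientTopics),
        [0x01, value] => Ok(*value),
        [0x01, ..] => Err(DecodeReason::MalformedPayload),
        _ => Err(DecodeReason::UnknownTopic),
    }
}

/// Call site that only cares about successes: `.ok()` recovers the old
/// Option-shaped behaviour and discards the reason.
pub fn decoded_only(logs: &[Vec<u8>]) -> Vec<u8> {
    logs.iter().filter_map(|log| decode_log(log).ok()).collect()
}

/// Call site that keeps the reason for each failure, in input order.
pub fn decode_with_reasons(logs: &[Vec<u8>]) -> (Vec<u8>, Vec<DecodeReason>) {
    let mut events = Vec::new();
    let mut failures = Vec::new();
    for log in logs {
        match decode_log(log) {
            Ok(event) => events.push(event),
            Err(reason) => failures.push(reason),
        }
    }
    (events, failures)
}
```

Test code that previously unwrapped a `Some` can usually switch to unwrapping the `Ok`, while tests for failure paths can now assert the exact `DecodeReason` instead of a bare `None`.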

Acceptance Criteria

  • aether_decode_errors_total{reason="..."} emits at least the three labels (unknown_topic, malformed_payload, insufficient_topics)
  • Each call site in the decoder picks the correct label — asserted at decoder level (DecodeReason unit tests) AND at the provider level (end-to-end process_logs tests)
  • Per-reason counter values asserted in tests (aether_decode_errors_total{reason="malformed_payload"} 1), not just metric presence
  • cargo test --workspace --release + cargo clippy --workspace --release --all-targets -- -D warnings clean on touched crates
  • Grafana panel updated to stack by reason — tracked separately when dashboards PR lands (per issue spec)
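
To make the "values, not just presence" criterion concrete, the sketch below uses a std-only stand-in for a labelled counter. It is not the `IntCounterVec` used in the PR: `ReasonCounter` and its methods are hypothetical, and pre-registering every reason at zero is an assumption made so that all three series always render. The rendered lines follow the `name{label="value"} count` shape quoted in the criteria.

```rust
use std::collections::BTreeMap;

/// Std-only stand-in for a counter with a single `reason` label.
pub struct ReasonCounter {
    name: &'static str,
    counts: BTreeMap<&'static str, u64>,
}

impl ReasonCounter {
    /// Registers every known reason at 0 so each series is always emitted
    /// (an assumption of this sketch, not a claim about the real metric).
    pub fn new(name: &'static str, reasons: &[&'static str]) -> Self {
        let counts = reasons.iter().map(|reason| (*reason, 0u64)).collect();
        ReasonCounter { name, counts }
    }

    /// Increments the series for one reason.
    pub fn inc(&mut self, reason: &'static str) {
        *self.counts.entry(reason).or_insert(0) += 1;
    }

    /// Current value for one reason; unknown reasons read as 0.
    pub fn get(&self, reason: &str) -> u64 {
        self.counts.get(reason).copied().unwrap_or(0)
    }

    /// Renders one line per reason, sorted by label value.
    pub fn render(&self) -> String {
        let mut out = format!("# TYPE {} counter\n", self.name);
        for (reason, value) in &self.counts {
            out.push_str(&format!(
                "{}{{reason=\"{}\"}} {}\n",
                self.name, reason, value
            ));
        }
        out
    }
}
```

A test in this style would increment one reason, render, and then assert on the full line including the value, for example that the output contains `aether_decode_errors_total{reason="malformed_payload"} 1`, rather than only checking that the metric name appears.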

Test plan

  • cargo build -p aether-ingestion -p aether-grpc-server --release — clean
  • cargo test --workspace --release — all tests pass (72 ingestion, 64 grpc-server, rest of workspace)
  • cargo clippy --workspace --release --all-targets -- -D warnings — no warnings
  • New tests: test_decode_sync_malformed_payload, test_decode_v3_swap_malformed_payload, test_decode_reason_label_strings, test_process_logs_malformed_payload_reason_label, test_process_logs_insufficient_topics_reason_label
  • Updated tests: test_process_logs_decode_failure_increments_counter now asserts the reason="unknown_topic" series specifically

Closes #92

Replace single IntCounter with IntCounterVec labelled by reason
(unknown_topic / malformed_payload / insufficient_topics).
decode_log now returns Result<PoolEvent, DecodeReason> so the
reason propagates to record_decode_failure.
@vercel
vercel Bot commented Apr 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| aether | Ready | Preview, Comment | Apr 23, 2026 10:08am |
| aether-63xv | Ready | Preview, Comment | Apr 23, 2026 10:08am |

@0xfandom 0xfandom merged commit b49d55d into main Apr 27, 2026
7 checks passed
Linked issue: [E2] Split aether_decode_errors_total by reason label