Skip to content

Conversation

@alexluong
Copy link
Collaborator

@alexluong alexluong commented Nov 30, 2025

ClickHouse LogStore Implementation

Overview

This PR adds ClickHouse as a storage backend for the LogStore, using a single denormalized table design optimized for ClickHouse's columnar storage and aggregation functions.


Schema

CREATE TABLE event_log (
  -- Event fields (duplicated across delivery rows)
  event_id String,
  tenant_id String,
  destination_id String,
  topic String,
  eligible_for_retry Bool,
  event_time DateTime64(3),
  metadata String,
  data String,

  -- Delivery fields
  delivery_id String,
  delivery_event_id String,
  status String,            -- 'pending', 'success', 'failed'
  delivery_time DateTime64(3),
  code String,
  response_data String,

  INDEX idx_topic topic TYPE bloom_filter GRANULARITY 4,
  INDEX idx_status status TYPE set(100) GRANULARITY 4
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (tenant_id, destination_id, event_time, event_id, delivery_time);

Key Design Decisions

Decision Rationale
Single denormalized table Avoids JOINs; each row = one delivery attempt for an event
DateTime64(3) Millisecond precision for accurate time comparisons
Daily partitions Efficient pruning for time-range queries
ORDER BY Optimized for typical query pattern: tenant → destination → time range

Data Model

An event with 2 delivery attempts = 2 rows:

event_id status delivery_time delivery_id
evt_001 failed 10:00:00.000 del_001_a
evt_001 success 10:00:01.000 del_001_b

To get the "current" status: argMax(status, delivery_time) → returns success


Operations

1. InsertManyDeliveryEvent

INSERT INTO event_log (
  event_id, tenant_id, destination_id, topic, eligible_for_retry, event_time, metadata, data,
  delivery_id, delivery_event_id, status, delivery_time, code, response_data
)
  • Uses batch insert for efficiency
  • Pending events: delivery_id = "", status = "pending", delivery_time = event_time
    • Using event_time ensures pending "loses" to any real delivery in argMax()

2. ListEvent

SELECT
    e.event_id,
    any(e.tenant_id) as tenant_id,
    any(e.destination_id) as destination_id,
    any(e.topic) as topic,
    any(e.eligible_for_retry) as eligible_for_retry,
    max(e.event_time) as event_time,
    any(e.metadata) as metadata,
    any(e.data) as data,
    argMax(e.status, e.delivery_time) as status      -- Latest status
FROM event_log AS e
WHERE e.tenant_id = ?
    AND e.event_time >= fromUnixTimestamp64Milli(?)  -- Preserves ms precision
    AND e.event_time <= fromUnixTimestamp64Milli(?)
    [AND e.destination_id IN (?)]
    [AND e.topic IN (?)]
GROUP BY e.event_id
[HAVING status = ?]                                   -- Status filter (aggregate)
ORDER BY event_time DESC, event_id DESC
LIMIT N+1                                             -- +1 to detect more pages

Key points:

  • GROUP BY event_id collapses delivery rows into one event
  • argMax(status, delivery_time) returns status from the row with max delivery_time
  • any() for event fields (identical across rows)
  • HAVING for status filter (can't use WHERE on aggregates)
  • fromUnixTimestamp64Milli() for proper DateTime64(3) comparison

Cursor pagination:

-- Next page (DESC): records older than cursor
HAVING event_time < cursor OR (event_time = cursor AND event_id < cursor_id)

-- Prev page: query ASC, then reverse results in Go
HAVING event_time > cursor OR (event_time = cursor AND event_id > cursor_id)
ORDER BY event_time ASC, event_id ASC

3. RetrieveEvent / RetrieveEventByDestination

SELECT
    event_id, tenant_id, destination_id, topic, eligible_for_retry,
    event_time, metadata, data,
    argMax(status, delivery_time) as status
FROM event_log
WHERE tenant_id = ? AND event_id = ?
      [AND destination_id = ?]
GROUP BY event_id, tenant_id, destination_id, topic, eligible_for_retry, event_time, metadata, data
LIMIT 1
  • GROUP BY all event fields (required by ClickHouse for non-aggregated columns)
  • argMax() for latest status

4. ListDelivery

SELECT
    delivery_id, delivery_event_id, event_id, destination_id,
    status, delivery_time, code, response_data
FROM event_log
WHERE event_id = ?
    AND delivery_id != ''    -- Exclude pending rows
ORDER BY delivery_time DESC
  • No aggregation - returns all delivery attempts
  • Filters out pending rows (empty delivery_id)

Known Concerns

COUNT Query

The current implementation runs a COUNT query on every ListEvent call. At high volumes, this may have performance implications. Worth discussing whether we want to keep COUNT in our pagination response or rethink the approach.

@vercel
Copy link

vercel bot commented Nov 30, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
outpost-docs Ready Ready Preview Comment Dec 2, 2025 5:53am
outpost-website Ready Ready Preview Comment Dec 2, 2025 5:53am

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants