Local-first multi-machine sync

Progress branch: https://github.com/maphew/agentsview/tree/docs/local-first-multi-machine-sync

# Local-First Multi-Machine Sync: the Artifact Ledger

| | |
| ------- | ----------------------------------------------------------- |
| Status | Draft proposal |
| Date | 2026-06-12 |
| Authors | maphew, with Claude Fable 5 (multi-agent research workflow) |
| Inputs | `local-first-sync-research/` (audits, research, critique) |

## Summary

Make agentsview local-first with serverless multi-machine sync, in the
mold of fossil-scm: every machine holds the complete archive, sync is an
idempotent exchange of immutable artifacts over any dumb transport
(Syncthing folder, S3 bucket, or another agentsview instance over HTTP),
and no machine is architecturally privileged. No CRDT library is needed:
session content is single-writer-per-machine and append-mostly (a
grow-only set), and the thin layer of user-mutable metadata (renames,
trash, stars, pins) is handled by a fossil-ticket-style append-only
change log replayed deterministically with hybrid logical clocks.

SQLite remains exactly what the project already declares it to be: a
local, rebuildable derivation. The live database file never crosses the
wire. PostgreSQL push survives as an optional aggregation peer, demoted
from coordination point.

## Motivation

Today a user with a laptop and a desktop has three partial options, each
with a mandatory hub or a one-way arrow:

- `pg push` / `pg serve`: requires an always-on PostgreSQL server.
- `agentsview sync --host` (SSH pull): serverless and proven, but
  pull-only, manual/cron-only, re-downloads a full tar of every agent
  dir each run, only covers file-based agents, and never propagates
  user metadata (a rename or star made on one machine is invisible
  everywhere else).
- `duckdb push` / `quack serve`: a read mirror, beta, no FTS.

Upstream demand for something better is visible: issue 572 ("How to
synchronize data across multiple macOS devices"), issue 412 (periodic
SSH remote sync), issue 517 (multiple named pg targets), issue 484
(stars/pins in pg serve), issue 332 (pg push machine attribution), and
issue 655 (pg push same-id collision ping-pong).

### Goals

1. Every machine ends up with the full archive: all sessions from all
   machines, queryable and searchable locally (FTS5 intact).
2. User curation converges: renames, trash/restore, stars, pins made on
   any machine reach all machines.
3. No mandatory server. Any always-on peer, NAS share, or object-store
   bucket improves availability, but none is required by the
   architecture (fossil's "central server is a social convention").
4. Mixed app versions keep syncing: machine A on a newer release must
   interoperate with machine B on an older one.
5. The existing single-machine experience is unchanged when sync is not
   configured.

### Non-goals

- Real-time collaborative editing. Convergence latency is transport
  latency plus sync cadence (seconds to minutes).
- Partial/selective sync, subscriptions, or multi-tenant sharing. The
  trust model is a fully mutually trusted personal fleet (see Trust).
- Replacing `pg push`/`pg serve`; they remain as an optional hub.

## Why no CRDT engine

The data shape decides this. An audit of every table and column
(`local-first-sync-research/01-codebase-audits.md`) shows two classes:

1. **Bulk session content** (sessions, messages, tool_calls,
   tool_result_events, usage_events, secret_findings): derived
   deterministically from session files, created by exactly one
   machine, append-mostly. Merging two machines' archives is set-union
   of records that cannot conflict — the degenerate "grow-only set"
   CRDT that needs no library. Fossil's own docs describe its artifact
   bag the same way.
2. **User-mutable metadata** (sessions.display_name,
   sessions.deleted_at, starred_sessions, pinned_messages + notes,
   excluded_sessions tombstones, worktree_project_mappings): the only
   genuinely multi-writer data, edited rarely and by one human. An
   append-only log of timestamped change events with last-writer-wins
   replay — fossil's ticket model verbatim — is sufficient, auditable,
   and deterministic.

General-purpose CRDT machinery would solve a problem this data does not
have, while charging real costs (below).

## Alternatives considered and rejected

Full sourcing in `local-first-sync-research/03-technology-research.md`.
State of the ecosystem as of mid-2026:

- **Automerge**: `automerge-go` is effectively unmaintained (last
  commit Oct 2024, no tagged release, wraps a pre-3.0 core via cgo, no
  transport layer). Even Automerge 3 requires full in-memory document
  loads; the maintainers scope documents as "units of collaboration"
  and their own bulk-data research (sedimentree) moves large content
  out of the CRDT into content-addressed blobs — i.e. toward this
  design. Rejected for bulk and for metadata.
- **cr-sqlite**: upstream dormant since v0.16.3 (Jan 2024); the only
  maintained lineage is Fly.io's purpose-built fork for Corrosion. CRR
  constraints collide with this schema head-on: FTS5 virtual tables
  cannot be CRRs, the messages rowid PK and external-content FTS
  linkage break, triggers/CASCADE FKs are restricted, `__crsql_clock`
  shadow tables bloat a multi-GB DB, and Fly documented an ALTER TABLE
  metadata-backfill storm — this repo alters the sessions table
  routinely. Rejected.
- **SQLite session extension** (changesets): maintained forever as part
  of SQLite and accessible from Go via zombiezen/modernc (not via
  mattn/go-sqlite3 — issue 825 there, open since 2020) — but the binary
  changeset format is coupled to table column count, so every release
  that adds a sessions column (frequent here; dataVersion is at 36)
  bricks mixed-version sync. NDJSON's ignore-unknown-fields tolerance
  gives version skew handling for free. Rejected; its HLC/two-tier
  semantics are adopted, its codec is not.
- **Whole-DB replication** (Litestream v0.5 read replicas,
  sqlite3_rsync): healthy tools, but they produce N separate replica
  DBs. The entire Store interface, UI, and analytics assume one
  queryable DB; cross-replica fan-out would touch everything. Also
  one-way by design. Rejected as the architecture; fine as a backup
  strategy alongside.
- **Raw-session-file mirror over Syncthing** (sync the agent dirs
  themselves, let each machine parse everything): fastest to build and
  the best version-skew story, but structurally blind to non-file
  sessions — at least 7 agents in the registry are `FileBased:false`,
  plus uploads, claude.ai/ChatGPT imports, SSH-pulled sessions, and
  orphan-preserved sessions whose files are gone. It would also
  file-copy Zed's live `threads.db` SQLite database, the exact
  corruption hazard this design exists to avoid. Rejected as the sync
  unit; raw files return as optional fallback artifacts (see
  Invariants).
- **Server-light engines** (ElectricSQL, PowerSync, Evolu, Jazz,
  Ditto, Turso/libSQL embedded replicas, Marmot v2): all require a
  sync service, an always-on cluster, or have no Go story. Rejected.

## Design

### Overview

Each install maintains a write-once, content-addressed artifact store
alongside (never inside) the SQLite DB:

```text
$AGENTSVIEW_DATA_DIR/artifacts/<origin>/
  checkpoints/cp-<seq>.json        append-only numbered index files
  manifests/<hash>.json.zst        session manifests
  segments/<hash>.ndjson.zst       message segments
  meta/<hlc>-<hash>.json           user-edit change feed
  raw/<hash>                       optional raw source file fallback
```

A machine writes only under its own origin prefix (single-writer per
prefix means transports cannot conflict). Sync between any two stores —
or between a store and a folder/bucket/peer — is idempotent set-union
of immutable files. Ingestion derives the local SQLite rows from
foreign artifacts through the existing write paths.

```text
machine A                                            machine B
sync engine -> SQLite -> exporter -> artifacts/A --\
                                                    >-- transport --
ingester <- artifacts/B <---------------------------/   (folder/S3/
    |                                                    HTTP peer)
    v
SQLite (A + B merged, FTS5 maintained by normal triggers)
```

### Origin identity

Each install generates and persists an origin ID once: the configured
machine name plus a short random suffix, e.g. `thinkpad-x9k2`. The
machine name defaults to `os.Hostname()` and reuses the validation that
rejects the `local` sentinel (`internal/config/config.go`,
`internal/postgres/sync.go`). The suffix survives hostname changes and
distinguishes restored/cloned machines; persistence copies the
`EnsureAuthToken` pattern (`internal/config/config.go`).

Global session identity is `(origin, native_session_id)`. Locally
produced rows are untouched (bare IDs, `machine='local'`). Foreign
sessions are stored as `id = origin + "~" + nativeID`, `machine =
origin` — byte-for-byte the proven SSH remote-sync convention
(`EngineConfig.IDPrefix`/`Machine` in `internal/ssh/sync.go`,
`applyRemoteRewrites` in `internal/sync/engine.go`, `StripHostPrefix`
in `internal/parser/types.go`), which every read path, the UI, and
`GetMachines` already render correctly. This avoids composite-PK
surgery across SQLite/PG/DuckDB under the Backend Parity rule.

### Artifact kinds

1. **Message segment**: canonical NDJSON (zstd) of N consecutive parsed
   messages keyed by natural coordinates — ordinal, source_uuid, role,
   content, tool_calls by (ordinal, call_index), tool_result_events by
   event_index, token fields. Natural-coordinate keying is already the
   schema's cross-copy convention (secret_findings; the orphan copier's
   (session_id, ordinal) joins). Message rowids are explicitly unstable
   (`nextMessageIDTx`) and never appear in artifacts.
2. **Session manifest**: small JSON with the parser-derived session
   header (the same field set `sessionPushFingerprint` enumerates), an
   ordered list of segment hashes, inline usage_events, the producer's
   data_version, and a generation counter. A newer manifest for the
   same session supersedes older ones, ordered by (data_version,
   generation). Steady-state appends emit one tail segment plus a new
   manifest reusing prior segment hashes. Superseded manifests/segments
   become unreferenced and GC-able after a grace window.
3. **Meta change event**: tiny JSON
   `{v, hlc, origin, session_gid, op, value}` with op in {rename,
   soft_delete, restore, star, unstar, pin, unpin, purge}; pins anchor
   by source_uuid with ordinal fallback (the existing
   `savePinsTx`/`restorePinsTx` logic). Append-only forever; the full
   edit history is retained.
4. **Checkpoint**: `cp-<seq>.json` mapping session_gid to current
   manifest hash plus the meta-feed high-water mark. Append-only
   numbered files keep the store fully write-once; discovery of changes
   is O(changed), not O(store).
5. **Raw source fallback** (optional, on by default for file-based
   agents): the original session file stored as a content-addressed
   blob and referenced from the manifest. See Invariants for why.

### Export

Export reads from the DB, not from source files, so non-file agents,
uploads, imports, SSH-pulled sessions, and orphan-preserved sessions
all publish. After each successful session write, the session is queued
and debounced by reusing the existing pg-watch loop: the artifact
exporter implements the same small target interface
(`cmd/agentsview/pg_watch.go`), so `agentsview sync --watch` is the
existing daemon with a different sink. Change detection reuses the
fingerprint-skip discipline (`sessionPushFingerprint` plus per-session
last-exported-manifest state, modeled on `pg_sync_state`). Export is
scoped to machine-owned rows; `machine='local'` is rewritten to the
origin ID at export time. Uploads (which default to `machine='remote'`)
are explicitly included — both prior designs fumbled this.

### Ingestion

A new `internal/artifact` importer runs once per foreign origin. It
reads the latest checkpoint, diffs manifest hashes against an
`artifact_sync_state` table (modeled on `pg_sync_state`), fetches and
hash-verifies missing segments, assembles `db.Session` plus
`[]db.Message`, applies the `origin~` prefix, and writes through the
existing paths (`UpsertSession`,
`ReplaceSessionMessages`/`WriteSessionBatchAtomic`). That single
decision inherits, for free: FTS5 maintenance via the normal triggers
(including the bulk trigger-swap fast path), excluded/trashed tombstone
rejection, pin re-attachment by source_uuid, and stats triggers. The
importer then replays new meta events in HLC order and fires the SSE
broadcaster (closing the existing gap where non-engine writes never
emit `data_changed`).
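
The checkpoint diff step could look roughly like this. The `Checkpoint` shape and its JSON field names are assumptions (the proposal only says a checkpoint maps session_gid to the current manifest hash plus a meta-feed high-water mark), and `applied` stands in for the `artifact_sync_state` table.

```go
package main

import "sort"

// Checkpoint is a hypothetical decoded cp-<seq>.json.
type Checkpoint struct {
	Seq       int               `json:"seq"`
	Manifests map[string]string `json:"manifests"` // session_gid -> manifest hash
	MetaHLC   string            `json:"meta_hlc"`  // meta-feed high-water mark
}

// ChangedSessions returns the session_gids whose manifest hash in cp
// differs from what was last applied locally, in sorted order.
//
// Sessions present in applied but missing from cp are deliberately
// ignored: checkpoint absence is never deletion.
func ChangedSessions(cp Checkpoint, applied map[string]string) []string {
	var changed []string
	for gid, hash := range cp.Manifests {
		if applied[gid] != hash {
			changed = append(changed, gid)
		}
	}
	sort.Strings(changed) // map iteration order is random; keep output stable
	return changed
}
```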

Manifests that reference segments not yet delivered are recorded as
phantoms (fossil's term) and retried on the next pass, tolerating
arbitrary delivery order from dumb transports.
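
A small sketch of phantom handling under arbitrary delivery order: manifests whose segments are all present are ready to ingest, the rest wait for the next pass. `Manifest`, `Partition`, and `Verify` are hypothetical names, and SHA-256 is an assumed hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Manifest is a hypothetical decoded session manifest, reduced to the
// fields needed here.
type Manifest struct {
	SessionGID string
	Segments   []string // ordered segment hashes
}

// Verify reports whether data hashes to want.
func Verify(want string, data []byte) bool {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == want
}

// Partition splits manifests into those whose segments are all present
// locally and phantoms that must be retried on a later pass.
func Partition(ms []Manifest, have func(hash string) bool) (ready, phantoms []Manifest) {
	for _, m := range ms {
		complete := true
		for _, h := range m.Segments {
			if !have(h) {
				complete = false
				break
			}
		}
		if complete {
			ready = append(ready, m)
		} else {
			phantoms = append(phantoms, m)
		}
	}
	return ready, phantoms
}
```

Feeding the previous pass's phantoms back into `Partition` once more segments have arrived is enough to drain them; no ordering guarantee from the transport is needed.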

### Metadata ledger

Every user-mutation handler (rename, soft-delete/restore/permanent
delete, star, pin) additionally appends one meta event to the
machine's own feed. Replay is ordered by (HLC, artifact-hash tiebreak)
— a data-intrinsic ordering key, so every node derives identical state
from identical artifact sets regardless of local clocks. The HLC is
persisted across restarts, monotonic per node, with a bounded-drift
clamp (the Actual Budget pattern). Per-field last-writer-wins; when two
origins write the same field within clock-skew distance, the losing
value is appended to a `meta_conflicts` log and the UI shows a fork
badge with both values — converge automatically, never silently lose
(fossil's lesson). Applied events go through the existing DB mutators
in a suppress-re-export mode so no echo loops arise.
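
To make the replay rule concrete, here is a minimal sketch of a hybrid logical clock and per-field last-writer-wins replay ordered by (HLC, artifact hash). All type and function names are hypothetical. The drift clamp is modeled as outright rejection of far-future timestamps, which is one possible reading of "bounded-drift clamp"; conflict logging to `meta_conflicts` is left out.

```go
package main

import "sort"

// HLC is a hybrid logical clock timestamp: wall-clock milliseconds plus
// a counter that breaks ties when the wall clock stalls or regresses.
type HLC struct {
	Wall  int64
	Count uint32
}

// Less orders timestamps by wall time, then counter.
func (a HLC) Less(b HLC) bool {
	if a.Wall != b.Wall {
		return a.Wall < b.Wall
	}
	return a.Count < b.Count
}

// Clock issues monotonic timestamps for one node. A real implementation
// would persist last across restarts.
type Clock struct{ last HLC }

// Tick returns a timestamp for a local edit.
func (c *Clock) Tick(physicalMs int64) HLC {
	if physicalMs > c.last.Wall {
		c.last = HLC{Wall: physicalMs}
	} else {
		c.last.Count++ // never go backwards
	}
	return c.last
}

// Observe folds in a remote timestamp so later local edits sort after it.
// It returns false (and changes nothing) if the remote clock is more
// than maxDriftMs ahead of local physical time.
func (c *Clock) Observe(remote HLC, physicalMs, maxDriftMs int64) bool {
	if remote.Wall > physicalMs+maxDriftMs {
		return false
	}
	if c.last.Less(remote) {
		c.last = remote
	}
	return true
}

// Event is a reduced meta change event.
type Event struct {
	HLC     HLC
	Hash    string // artifact hash, used only as a tiebreak
	Session string
	Field   string
	Value   string
}

// Replay sorts a copy of events by (HLC, Hash) and applies per-field
// last-writer-wins. The result depends only on the set of events.
func Replay(events []Event) map[string]string {
	sorted := append([]Event(nil), events...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].HLC != sorted[j].HLC {
			return sorted[i].HLC.Less(sorted[j].HLC)
		}
		return sorted[i].Hash < sorted[j].Hash
	})
	state := make(map[string]string)
	for _, e := range sorted {
		state[e.Session+"\x00"+e.Field] = e.Value
	}
	return state
}
```

Because the sort key is carried inside the events themselves, two machines that hold the same set of events compute the same state no matter what order the files arrived in.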

### Deletes

- Soft delete / restore: ordinary meta events; `deleted_at` converges.
- Permanent delete (`purge`): per-event opt-in only ("delete
  everywhere" confirmation). It propagates an `excluded_sessions`
  tombstone, which `UpsertSession` already enforces against
  resurrection, and peers locally shun the session's bulk artifacts.
  Default remains today's semantics: EmptyTrash is local-only.
- Checkpoint absence is never deletion (see Invariants).

### Transports

One verb, three target shapes, all the same set-union:

1. **Folder** — `agentsview sync /path/to/share` (Syncthing, Dropbox,
   NFS, rclone mount). Safe for dumb file sync because every file is
   immutable, written temp+rename, and single-writer-per-prefix. The
   live SQLite DB never crosses the wire — the documented corruption
   class (SQLite's "How To Corrupt", Zotero's KB, Syncthing forums)
   does not apply to write-once artifact files.
2. **HTTP peer** — `agentsview sync https://desktop:8080`. Four routes
   on the existing embedded server behind the existing Bearer-token
   middleware: list origins, get checkpoint, get artifact by hash, post
   artifact (hash-verified, write-once, idempotent). Stateless and
   resumable; fossil's igot/gimme reduced to HTTP GETs because
   content-addressing makes "have" a stat call. Any running agentsview
   is a rendezvous, like `fossil ui`.
3. **Object storage** — same layout under an S3/B2 prefix; rclone
   against the folder shape covers it until native support lands.

### Interaction with resync and dataVersion

The artifact store lives outside the DB file, so `ResyncAll`'s atomic
swap does not touch it; `artifact_sync_state` is carried across the
swap alongside the existing metadata copy (`CopySessionMetadataFrom`).
After a parser-version resync, changed sessions re-export with bumped
data_version manifests — the same "force full push after resync" rule
pg push uses. Segments whose canonical content is unchanged keep their
hashes; a parser change touching a common message field genuinely
re-ships content, which the raw-file fallback hedges (peers may
re-derive locally instead of re-downloading).
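
The supersession order that makes a post-resync manifest win can be sketched as a tuple comparison on (data_version, generation). `ManifestRef`, `Supersedes`, and `Current` are hypothetical names, and the final hash tiebreak is an assumption added only so the choice is deterministic when both counters are equal.

```go
package main

// ManifestRef is the ordering-relevant part of a session manifest.
type ManifestRef struct {
	DataVersion int
	Generation  int
	Hash        string
}

// Supersedes reports whether a should replace b for the same session.
func Supersedes(a, b ManifestRef) bool {
	if a.DataVersion != b.DataVersion {
		return a.DataVersion > b.DataVersion
	}
	if a.Generation != b.Generation {
		return a.Generation > b.Generation
	}
	return a.Hash > b.Hash // assumed tiebreak
}

// Current picks the winning manifest among all candidates seen for one
// session, in whatever order they were delivered.
func Current(candidates []ManifestRef) (ManifestRef, bool) {
	if len(candidates) == 0 {
		return ManifestRef{}, false
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if Supersedes(c, best) {
			best = c
		}
	}
	return best, true
}
```

With this ordering a manifest produced at data_version 37 beats any generation of the same session produced at data_version 36, which is the behavior the resync rule needs.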

Mixed versions: artifacts are JSON/NDJSON with ignore-unknown-fields
and skip-unknown-ops rules plus an explicit format version, so an older
reader skips fields it does not know and a newer reader tolerates their
absence. Each machine's own parser and dataVersion govern only its own
DB.
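
The tolerance rules above map directly onto Go's `encoding/json`, which ignores unknown object keys by default. The sketch below decodes a stream of meta events, one JSON object per line, and skips ops it does not recognize. The field set follows the `{v, hlc, origin, session_gid, op, value}` shape from Artifact kinds; the string encoding of `hlc`, the line-per-event input, and the name `DecodeEvents` are assumptions.

```go
package main

import (
	"bufio"
	"encoding/json"
	"strings"
)

// MetaEvent mirrors the meta change event shape. Value stays raw because
// its type depends on the op.
type MetaEvent struct {
	V          int             `json:"v"`
	HLC        string          `json:"hlc"`
	Origin     string          `json:"origin"`
	SessionGID string          `json:"session_gid"`
	Op         string          `json:"op"`
	Value      json.RawMessage `json:"value"`
}

var knownOps = map[string]bool{
	"rename": true, "soft_delete": true, "restore": true, "star": true,
	"unstar": true, "pin": true, "unpin": true, "purge": true,
}

// DecodeEvents parses one event per line. Unknown fields are dropped by
// json.Unmarshal; events with an unknown op are counted and skipped
// instead of failing the whole feed.
func DecodeEvents(ndjson string) ([]MetaEvent, int, error) {
	var events []MetaEvent
	skipped := 0
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var e MetaEvent
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			return nil, skipped, err
		}
		if !knownOps[e.Op] {
			skipped++
			continue
		}
		events = append(events, e)
	}
	return events, skipped, sc.Err()
}
```

One design consequence worth noting: a reader must never call `DisallowUnknownFields` on these decoders, or the first release that adds a field breaks every older peer.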

### What pg push becomes

Short term: unchanged (ingested peer sessions are ordinary rows, and
their machine column carries true origin). Medium term: extract the
small SessionSink interface latent in push.go's orchestration (which
contains no SQL); PG becomes one sink, the artifact exporter another.
Long term: PostgreSQL is an optional aggregation/analytics peer.
Prerequisite fixes regardless of this design: machine-scoped export
(upstream issues 332 and 655).

## Invariants (pinned before any code)

1. **Canonical serialization is a forever-contract.** Sorted keys,
   fixed number formatting, explicit format version; golden tests
   enforce byte-stability. Any silent change re-hashes every segment
   and triggers a fleet-wide reship. The raw-source fallback artifact
   exists so that even a broken contract degrades to local re-derive,
   not re-download.
2. **Checkpoint absence is never deletion.** Tombstone events are the
   only delete mechanism. A session vanishing from an origin's
   checkpoint (local EmptyTrash, export bug, truncation) must not
   propagate removal.
3. **`ErrSessionTrashed`/`ErrSessionExcluded` on import means retry
   later, never advance the watermark.** Meta events are tiny and
   segments large, so a soft-delete routinely arrives before content;
   if the watermark advanced anyway, a later restore would strand stale
   content with nothing to trigger a re-fetch.
4. **Single-writer-per-prefix is the only write rule on shared
   transports.** Colliding origin IDs (cloned machine, restored backup)
   must be detected (checkpoint seq conflict) and surfaced loudly, not
   merged.
5. **The live SQLite file never crosses the wire.** Documentation must
   say this explicitly and warn against syncing the data dir.

## Trust model

A fully mutually trusted personal fleet. Folder transports have no
per-writer identity (prefix discipline is convention; Syncthing has no
per-subdir ACL), and HTTP mode is one shared symmetric Bearer token —
any peer can forge any origin's metadata. That is acceptable for one
person's machines and must be documented as exactly that. Per-peer
tokens are the minimum follow-up before any sharing story; origin
signatures are the eventual answer.

Practical availability note, stated plainly in docs: two
intermittently-on laptops sync only when both are online. A NAS folder,
S3 bucket, or any always-on peer is the practical rendezvous — by
social convention, not architecture, exactly as in fossil.

## Migration

Fully additive. Upgrade generates an origin ID; no rewrite of existing
rows. `agentsview sync --init` backfills artifacts for the whole
existing DB (including orphans) and seeds the meta feed from current
display_name/deleted_at/stars/pins timestamped with local_modified_at.
Machines without sync configured behave exactly as today. New tables
(`artifact_sync_state`, `meta_clock`, `meta_conflicts`) arrive via the
existing idempotent migration pattern; no dataVersion bump, no resync.

## Phasing

1. **Prereq fixes (days)**: machine-scoped pg push export; preserve
   per-session machine at push time (upstream issues 332, 655). Real
   bugs regardless of this design.
2. **Phase 1 (2-4 weeks)**: artifact store, canonical serializer with
   golden tests, exporter, folder-transport set-union, importer,
   `sync --init`. Delivers the headline want — every machine sees all
   sessions — read-only, over Syncthing/Dropbox/NFS, no schema surgery.
3. **Phase 2 (2-3 weeks)**: HLC, meta ledger, deterministic replay,
   fork badges, purge with confirm UX. Delivers converging curation.
4. **Phase 3 (1-2 weeks)**: HTTP peer endpoints behind existing auth,
   `sync --watch` via the pg-watch loop, peers page in the UI.
5. **Phase 4 (ongoing)**: GC of superseded artifacts, native S3 target,
   SessionSink refactor of pg push, two-instance E2E harness.

Estimated 6-10 weeks total for one developer; each phase ships value
alone.

## Risks

- Canonical-serialization drift (highest variance; mitigated by golden
  tests, format version, raw fallback).
- HLC/LWW edge cases: skewed clocks, restart persistence, tie
  determinism — table-driven tests required; replay must be idempotent
  under any feed permutation.
- Storage growth: a third local copy of the corpus (source files + DB +
  compressed artifacts) plus peers'. zstd gives 5-10x on JSONL; GC of
  superseded bulk artifacts is mandatory, with a grace window against
  slow peers.
- Meta feed file count: one small file per edit grows forever;
  personal-scale fine, needs a batching/compaction story eventually.
- FTS5 initial ingest at multi-GB scale re-tokenizes every peer's
  corpus; use the existing Drop/Rebuild bulk path for first ingest.
- N-times row counts make the machine filter load-bearing UX for the
  sidebar and analytics.
- Scope creep: this deliberately stops at set-union plus LWW. Partial
  sync, subscriptions, or content merging would erode the simplicity
  that makes it safe to own (~3-4k LOC).

## Open questions

- Should the raw-source fallback be mandatory rather than optional for
  file-based agents? (Cost: storage; benefit: version-skew immunity.)
- Per-agent or per-project export excludes (selective publish, fossil
  private-branch analog) in v1 or later?
- Dotfile-synced agent dirs produce visible duplicates under two
  origins (today: silent same-id merge). Document "pick one transport
  per agent dir", or attempt content-hash coalescing in the UI later?
- Does `worktree_project_mappings` (already machine-keyed) ride the
  meta ledger or stay local-only?

## Related upstream issues

- 332 — pg push overwrites original machine name on remote-synced
  sessions (open).
- 655 — pg push: sessions.id sole PG PK; same-id pushes from two
  machines silently merge and ping-pong (filed from this work).
- 412 — periodic SSH remote sync from serve (open feature request).
- 517 — multiple named pg targets (open feature request).
- 484 — stars/pins in pg serve (closed; metadata demand signal).
- 572 — multi-machine dashboard question (closed; demand signal).

## References

Full research underlying every claim here, including sources and
line-level code citations, lives in `local-first-sync-research/`:
codebase audits (01), the SSH remote-sync deep audit (02), technology
research with sources (03), the three competing design proposals (04),
and the adversarial critique that selected and hardened this design
(05).


_gpt-5.5 on behalf of maphew_.
