
fix(cli)(sphere-sdk#247): refuse CLI when a sphere daemon holds the OrbitDB lock#28

Merged
vrogojin merged 1 commit into integration/all-fixes from fix/issue-247-daemon-gate
May 24, 2026

@vrogojin
Contributor

Summary

Short-term mitigation for the §C.4 "Database is not open" failure observed in `manual-test-full-recovery.sh` (unicity-sphere/sphere-sdk#247).

The daemon parks the event loop forever with OrbitDB / Helia open, and LevelDB takes a POSIX advisory file lock (`fcntl(F_SETLK)`) on `<dataDir>/orbitdb/<dbAddress>/_index/LOCK` and on `<dataDir>/datastore/LOCK`. A sibling CLI in the same dataDir hits `LEVEL_LOCKED` → `Database is not open`, and the bounded retry from sphere-sdk PR #246 can never succeed: the daemon holds the lock for its whole lifetime, so the contention isn't transient.
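To illustrate why a bounded retry cannot help here, the following is a minimal sketch, not the retry from sphere-sdk PR #246. `SimulatedLock` and `retryBounded` are hypothetical names, and the loop is synchronous with no delay for brevity. The retry only succeeds if the holder releases the lock within the retry budget, which a parked daemon never does.

```typescript
// Hypothetical stand-in for a LevelDB LOCK file: the lock is considered
// held for the first `releaseAfter` open attempts and free afterwards.
class SimulatedLock {
  private attempts = 0;
  private readonly releaseAfter: number;

  constructor(releaseAfter: number) {
    this.releaseAfter = releaseAfter;
  }

  open(): string {
    this.attempts += 1;
    if (this.attempts <= this.releaseAfter) {
      throw new Error("LEVEL_LOCKED");
    }
    return "open";
  }
}

// Simplified bounded retry: try up to `maxAttempts` times, then rethrow
// the last error. A real retry would also wait between attempts.
function retryBounded<T>(op: () => T, maxAttempts: number): T {
  let lastError: unknown = new Error("retryBounded: maxAttempts must be >= 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return op();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Transient contention: the holder lets go after two attempts.
const transient = new SimulatedLock(2);
// Daemon-style contention: the holder never lets go.
const parkedDaemon = new SimulatedLock(Number.POSITIVE_INFINITY);
```

With a budget of five attempts, `retryBounded(() => transient.open(), 5)` returns normally, while `retryBounded(() => parkedDaemon.open(), 5)` exhausts the budget and rethrows `LEVEL_LOCKED`, no matter how large the budget is.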

This gate detects the live-daemon case in `getSphere()` and exits with `EX_TEMPFAIL` (75), telling the operator to run `sphere daemon stop` first. The gate is skipped when our own PID owns the PID file (`daemon start` calling back into `getSphere` is the legitimate owner). `daemon stop` and `daemon status` bypass it because they don't go through `getSphere`.
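The gate logic described above can be sketched roughly as follows. This is an illustrative reconstruction, not the merged code: the function names `readPidFile`, `isDaemonProcessAlive`, and `checkNoLiveDaemonOrExit` come from this PR, but their signatures, the PID-file format (a single decimal PID), the `checkNoLiveDaemon` helper, and the message text are assumptions.

```typescript
import * as fs from "node:fs";

const EX_TEMPFAIL = 75; // "temporary failure" exit code from sysexits.h

// Assumed format: the PID file holds one decimal PID. Returns null when
// the file is missing, unreadable, or does not parse to a positive integer.
function readPidFile(pidFile: string): number | null {
  try {
    const text = fs.readFileSync(pidFile, "utf8").trim();
    const pid = Number.parseInt(text, 10);
    return Number.isInteger(pid) && pid > 0 ? pid : null;
  } catch {
    return null;
  }
}

// Signal 0 performs the existence/permission check without delivering a signal.
function isDaemonProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

type GateResult = { ok: true } | { ok: false; pid: number };

// Hypothetical pure helper: decides whether the CLI may open OrbitDB.
function checkNoLiveDaemon(pidFile: string, selfPid: number = process.pid): GateResult {
  const pid = readPidFile(pidFile);
  if (pid === null) return { ok: true }; // no daemon recorded
  if (pid === selfPid) return { ok: true }; // we are the daemon (self-skip)
  if (!isDaemonProcessAlive(pid)) return { ok: true }; // stale PID file
  return { ok: false, pid };
}

function checkNoLiveDaemonOrExit(pidFile: string): void {
  const result = checkNoLiveDaemon(pidFile);
  if (!result.ok) {
    console.error(
      `A sphere daemon (PID ${result.pid}) holds the OrbitDB lock. ` +
        "Run 'sphere daemon stop' first.",
    );
    process.exit(EX_TEMPFAIL);
  }
}
```

Splitting the decision (`checkNoLiveDaemon`) from the exit (`checkNoLiveDaemonOrExit`) is a choice made for this sketch so the decision can be exercised without terminating the process. A stale PID file is treated as "no daemon", which matches the gate targeting only the live-daemon case.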

The proper long-term fix is a daemon-as-broker IPC surface (sphere-sdk#247 follow-up: a Unix domain socket at `<dataDir>/.sphere-cli/daemon.sock` plus a RemoteOrbitDbAdapter mirroring the OrbitDbAdapter interface, so CLI commands talk to the running daemon instead of opening OrbitDB directly).
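As a rough idea of what a broker surface over a stream socket would need, here is a sketch of newline-delimited JSON framing. Nothing in it is specified by this PR or by sphere-sdk#247: the `BrokerRequest` and `BrokerResponse` shapes, the method name in the usage note, and both function names are hypothetical, and the real OrbitDbAdapter interface is not shown here.

```typescript
// Hypothetical wire format: one JSON object per line over the socket.
interface BrokerRequest {
  id: number; // correlates a response with its request
  method: string; // adapter method the daemon should run on the CLI's behalf
  args: unknown[];
}

type BrokerResponse =
  | { id: number; result: unknown }
  | { id: number; error: string };

// Serialize one request as a single newline-terminated line.
function encodeRequest(req: BrokerRequest): string {
  return JSON.stringify(req) + "\n";
}

// A stream socket delivers arbitrary chunks, so the reader keeps any
// trailing partial line in `rest` and parses only the complete lines.
function decodeResponses(buffer: string): { messages: BrokerResponse[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as BrokerResponse);
  return { messages, rest };
}
```

A remote adapter built on this would write `encodeRequest({ id, method: "get", args })` to the socket and feed each received chunk, prefixed with the previous `rest`, through `decodeResponses`.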

Changes

  • `src/legacy/daemon.ts`: export `readPidFile` and `isDaemonProcessAlive` (previously file-private helpers) so the CLI's `getSphere()` can reuse them.
  • `src/legacy/legacy-cli.ts`: add `checkNoLiveDaemonOrExit()` and call it at the start of `getSphere()`. On contention, it prints a clear message and exits with `EX_TEMPFAIL`.

Test plan

  • `npm run build` — clean.
  • Manual: running `sphere balance` while a daemon is alive in the same CWD exits cleanly with the operator hint, instead of hanging on the retry budget.
  • Manual: `sphere daemon start` itself still works (it owns the PID file; the gate's self-skip kicks in).
  • Manual: `sphere daemon stop` / `status` still work (they don't go through getSphere).
  • sphere-sdk's `manual-test-full-recovery.sh` (with the companion stop+start §C.4 change) passes §A–§C.3 cleanly; previously hit "Database is not open".

Companion

Stacks with sphere-sdk `fix/issue-247-residuals` (which also updates the manual-test script to stop+start the daemons around §C.4).

…lock

The daemon parks the event loop forever with OrbitDB / Helia open;
LevelDB takes a POSIX advisory file lock (fcntl(F_SETLK)) on
<dataDir>/orbitdb/<dbAddress>/_index/LOCK and on
<dataDir>/datastore/LOCK. A sibling CLI in the same dataDir hits
LEVEL_LOCKED -> 'Database is not open', and the bounded retry from
sphere-sdk PR #246 can never succeed (the contention isn't transient).

This short-term gate detects the live-daemon case in getSphere() and
exits with EX_TEMPFAIL, telling the operator to 'sphere daemon stop'
first. Skipped when our own PID owns the PID file (daemon-start
calling back into getSphere is the legitimate owner). Bypassed for
daemon stop/status (which don't go through getSphere).

The proper fix is a daemon-as-broker IPC surface (sphere-sdk #247
long-term: Unix domain socket at <dataDir>/.sphere-cli/daemon.sock,
RemoteOrbitDbAdapter mirroring the OrbitDbAdapter interface). Until
then, this stops the script-level cascade observed at §C.4 in
manual-test-full-recovery.sh.

Exports readPidFile and isDaemonProcessAlive from daemon.ts so
legacy-cli.ts can reuse them without duplication.
@vrogojin vrogojin merged commit e73ddb7 into integration/all-fixes May 24, 2026