Skip to content

Feat/daytona sandbox sync#65

Open
jon3350 wants to merge 2 commits into
stagingfrom
feat/daytona-sandbox-sync
Open

Feat/daytona sandbox sync#65
jon3350 wants to merge 2 commits into
stagingfrom
feat/daytona-sandbox-sync

Conversation

@jon3350
Copy link
Copy Markdown
Collaborator

@jon3350 jon3350 commented May 3, 2026

No description provided.

@jon3350
Copy link
Copy Markdown
Collaborator Author

jon3350 commented May 3, 2026

@claude review this

@claude
Copy link
Copy Markdown

claude Bot commented May 3, 2026

Claude finished @jon3350's task in 2m 55s —— View job


PR Review: Feat/daytona sandbox sync

  • Gather context (reading CLAUDE.md and key files)
  • Review architecture and new features
  • Review security concerns
  • Review code quality and patterns
  • Post final review

This is a large PR (~26k lines across 100 files) introducing Daytona sandbox integration, a command palette, slash commands, rate limiting, usage display, harness-creation assistant, and a comprehensive test suite. The overall architecture is solid, but there are a few security and correctness issues worth addressing before merge.


Security Issues

1. Missing auth on getByIds in commands.ts [High]

packages/convex-backend/convex/commands.ts:56-62 — the getByIds query has no authentication check. Any caller can pass arbitrary Id<"commands"> values and read those records, including the parametersJson field which may contain sensitive configuration. If commands ever store auth tokens or credentials in parameters, this is a data leak.

// commands.ts:56 — no identity check before returning data
export const getByIds = query({
  args: { ids: v.array(v.id("commands")) },
  handler: async (ctx, args) => {
    const results = await Promise.all(args.ids.map((id) => ctx.db.get(id)));
    return results.filter(Boolean); // ← no auth guard
  },
});

Fix this →

2. Command name collision between users in commands.ts [Medium]

packages/convex-backend/convex/commands.ts:25-52 — the upsert mutation looks up existing commands by name only (no userId in the index). If two users have a command named my-tool, user B's upsert silently patches user A's record. The by_name index likely needs to be by_user_name including a userId field, and userId should be stored on each command row.

Fix this →


Architecture & Correctness Issues

3. SSE currentEvent reset is incorrect in use-chat-stream.ts [Medium]

apps/web/src/lib/use-chat-stream.ts:232 — after parsing each data: line, currentEvent is immediately reset to "message". The SSE spec says an event type persists until a blank line dispatches the event. If FastAPI ever sends a multi-line data: payload (which SSE allows), all lines after the first will be dispatched as "message" and dropped. The reset should happen on blank-line dispatch, not after each data: parse.

Additionally, silently swallowing JSON parse errors at line 229–231 makes production debugging very hard. Consider logging at console.warn level at minimum.

// line 229-232 — both issues visible here
} catch {
  // Skip malformed JSON lines  ← no log, no trace
}
currentEvent = "message";  // ← should reset on blank line, not here

Fix this →%20currentEvent%20is%20reset%20to%20%22message%22%20after%20each%20data%3A%20line%20(line%20232)%20but%20per%20SSE%20spec%20it%20should%20only%20reset%20on%20blank-line%20event%20dispatch.%202)%20JSON%20parse%20errors%20(line%20229)%20are%20silently%20swallowed%20%E2%80%94%20add%20console.warn%20so%20protocol%20bugs%20are%20visible.%20Fix%20by%20tracking%20currentData%20buffer%20and%20dispatching%20on%20blank%20lines.%20Branch%3A%20feat%2Fdaytona-sandbox-sync&repo=DIodide/Harness)

4. Blob URL memory leak in sandbox-api.ts [Medium]

apps/web/src/lib/sandbox-api.ts:251-252downloadUrl() returns a URL.createObjectURL() blob URL but there's no mechanism to revoke it. Every file download leaks memory until the page is reloaded. The caller needs to call URL.revokeObjectURL() after the download link is clicked, or the method should return an { url, revoke } pair to make cleanup explicit.

Fix this →

5. OAuth window name collision in mcp-server-status.tsx [Low]

apps/web/src/components/mcp-server-status.tsx:75 — the popup is opened with window.open(url, "mcp-oauth", ...). If a user initiates two OAuth flows in quick succession (e.g., two different MCP servers), the second call reuses the existing "mcp-oauth" window, navigating it to the new URL mid-flow and breaking the first auth flow. Use a unique suffix per flow (e.g., "mcp-oauth-" + Date.now()).

6. Conversations truncated silently in fork operations [Low]

packages/convex-backend/convex/conversations.ts — the fork/edit-fork mutations appear to use a hard limit (.take(8192)) when fetching messages. For users with very long conversations, messages beyond this limit would be silently dropped from the fork without any warning. Consider adding a guard or surfacing this limit to the user.


Privacy / Data Governance

7. Full conversation content stored in ratings [Low]

packages/convex-backend/convex/harnessConfigRatings.ts:12-17 — the rate mutation persists the entire conversationSnapshot (all message role+content pairs) alongside the harness config rating. If a conversation contains credentials, API keys pasted by the user, or PII, this data is now stored indefinitely in a separate table without a defined retention policy. Consider either truncating stored messages or documenting that this table needs the same data-handling treatment as the main conversations table.


Minor Notes

  • use-chat-stream.ts:6 — the ?? "http://localhost:8000" fallback for FASTAPI_URL means a misconfigured prod deployment silently sends chat requests to localhost rather than failing loudly. If env.ts already validates this var, the fallback is dead code and should be removed. If it doesn't validate it, add it to the Zod schema.
  • commands.ts:16parametersJson is stored as a raw JSON string. Using v.any() with a structured object would preserve type safety in Convex and prevent storing invalid JSON.
  • test.yml — good that tests were added; CI looks clean. Cache keys could include OS tag if runners are ever mixed, and PR-level coverage comments would improve visibility.

What's Good

  • sandbox-server.ts ownership verification (assertOwner) is clean and consistent across all lifecycle operations.
  • Rate limiting with Arcjet is correctly fail-open (an Arcjet outage doesn't block chat).
  • OAuth postMessage origin validation in mcp-server-status.tsx (event.origin !== BACKEND_ORIGIN) correctly prevents message hijacking.
  • Env var schema in env.ts using Zod is a solid pattern — all secrets are validated at startup.
  • Error handling across Daytona lifecycle ops is thorough, covering both not-found and transient failures.
  • Strong test coverage added for new hooks, components, and utilities.

@jon3350 jon3350 changed the base branch from main to staging May 3, 2026 16:16
@jon3350
Copy link
Copy Markdown
Collaborator Author

jon3350 commented May 3, 2026

@claude I meant to merge into stagging, not main. Can you review again.

@claude
Copy link
Copy Markdown

claude Bot commented May 3, 2026

Claude finished @jon3350's task in 3m 2s —— View job


PR Review: Feat/daytona sandbox sync (vs origin/staging)

  • Gather diff against origin/staging
  • Review changed files
  • Post final review

This PR moves sandbox lifecycle ops (start, stop, archive, delete, sync) from FastAPI into TanStack Start server functions, adds Convex intent-gating so FastAPI won't silently restart a user-stopped sandbox, and introduces an LRU eviction fallback for Daytona concurrency limits. The architecture is sound and the separation of concerns (FastAPI = inference path only, TanStack Start = browser CRUD) is clean and well-commented.


Security Issues

1. CONVEX_DEPLOY_KEY baked into Vite bundle [High]

apps/web/vite.config.ts:35-38CONVEX_DEPLOY_KEY is inlined via Vite's define, the same way ARCJET_KEY is. The deploy key is an admin credential that bypasses Convex's auth layer and can call internalMutation/internalQuery directly. If any code path causes this string to appear in the client bundle (e.g., a client-side import of env.ts that isn't fully tree-shaken), the key is served to every browser.

TanStack Start's dead-code elimination should strip .handler() bodies from the client bundle, but the getConvexDeployKey() helper (called from handlers) is not itself guarded. Verify with bun run build && strings dist/client-bundle.js | grep "CONVEX_" before shipping. A safer pattern is to read the key only from process.env.CONVEX_DEPLOY_KEY inside handlers (never through env.ts), so there's no path from client imports to the secret.

Fix this →

2. callConvex in sandbox-server.ts calls internalMutation paths without path validation [Low]

apps/web/src/lib/sandbox-server.ts:66-80callConvex takes a free-form path string. Any typo (e.g. "sandboxes:updateStatus""sandboxes:updateStatu") silently fails with a Convex 404, leaving the Convex record stale (e.g. stuck on "stopping"). Consider wrapping in typed constants so the TypeScript compiler catches path mistakes at build time.


Correctness Issues

3. Lifecycle state can get stuck if Convex write fails after Daytona call [Medium]

In sandbox-server.ts, e.g. startSandbox:

  1. Sets Convex status to "starting"
  2. Calls Daytona to start
  3. Sets Convex status to "running"

If step 3 throws (network blip, Convex downtime), Convex is permanently stuck on "starting" until the user manually syncs. The same applies to stopSandbox/archiveSandbox ("stopping" stuck). The syncSandbox action mitigates this, but it's a manual escape hatch. A try/finally that falls back to a sync on failure would auto-recover:

// in startSandbox handler
await setStatus(data.sandboxId, "starting");
try {
  const sandbox = await getDaytona().get(data.sandboxId);
  await sandbox.start();
  await setStatus(data.sandboxId, "running");
} catch (e) {
  // ... existing error handling
}

This pattern is already correct in stopSandbox for the not-found case. The missing piece is the catch that writes "error" instead of leaving it on "starting"/"stopping".

Fix this →

4. _is_started_limit_error heuristic catches unrelated 400s [Medium]

packages/fastapi/app/services/daytona_service.py:247-255 — A 400 with any message containing "limit" or "too many" triggers LRU eviction. This could false-positive on errors like "Too many query parameters" or "Limit exceeded on payload size", stopping a sibling sandbox unnecessarily. Tighten the heuristic by also requiring the status code to indicate a quota/payment issue:

if err.status_code in (402, 429):
    return True
if err.status_code == 403:  # or another Daytona-specific code
    msg = str(err).lower()
    return any(kw in msg for kw in ("limit", "quota", "exceeded", "concurrent"))
return False

Or better, document exactly what error Daytona actually returns for this case and match precisely.

5. archiveSandbox double-property cast is fragile [Low]

apps/web/src/lib/sandbox-server.ts:199-202 — the archive handler casts the sandbox object to { state?: unknown; status?: unknown } to probe both .state and .status. This suggests the SDK's type isn't known at compile time. If the SDK changes its API (e.g., drops .state), the status check silently falls back to "error" and archives unconditionally. Pin to the actual SDK type or read the status field consistently (.status already works in normalizeDaytonaStatus elsewhere).


Minor Notes

  • LRU eviction doesn't write Convex for evicted sibling (by design, acknowledged in comment) — the tradeoff is the UI shows a sibling as "running" until it's next accessed. Consider adding a touch + setStatus("stopped") for the evicted sandbox so the dashboard isn't stale.
  • UI: no mutation loading indicatorinTransition is derived from Convex's live status, so there's a brief window between clicking Start/Stop/Archive and Convex updating where the button is clickable again. React Query's mutation.isPending could gate the button immediately on click.
  • Tests — the new test suite (test_daytona_service.py) has good coverage of intent gating and LRU eviction, including edge cases (no candidates, all-stopped siblings). Nice addition.
  • getStatusByDaytonaId and listSiblingsByLastAccessed are internalQuery — correctly locked down to deploy-key callers only. Good.
  • SandboxStoppedByUserError → 409 in main.py — clean pattern. The stable "sandbox_stopped_by_user" error code lets the frontend render a friendly empty state rather than a generic 500. Well done.
  • Intent gating logic in _ensure_running — the read path (Convex → Daytona order) is correct: check intent first, then start. The touch_sandbox_sync on every cache miss gives the LRU evictor accurate signal.

What's Changed vs the Previous Review

The previous review (vs main) covered many additional features (command palette, rate limiting, slash commands, harness assistant, etc.) that are already in staging. This diff is specifically the sandbox sync architecture:

  • Removed from FastAPI: start, stop, delete REST endpoints (moved to TanStack Start)
  • Added: sandbox-server.ts with assertOwner + all lifecycle ops
  • Added: Intent-gating in _ensure_running (Convex status check before auto-start)
  • Added: LRU eviction with _start_with_lru_retry / _evict_lru_for
  • Added: Three new Convex internal functions (getStatusByDaytonaId, listSiblingsByLastAccessed, touchSandboxInternal)
  • UI: Sandbox cards now route lifecycle actions through server functions; added Archive + Sync menu items; stop confirmation dialog

The CONVEX_DEPLOY_KEY bundle exposure (#1) is the most important thing to verify before merge.

…ync etc. we rely on LRU cache and timeout to stop sandboxes ourselves
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant