Skip to content

feat(tasks): clean-sheet rebuild spec for the async tasks plugin#208

Merged
youssefhany-ixo merged 11 commits into
mainfrom
claude/bold-johnson-zfUMl
Jun 16, 2026
Merged

feat(tasks): clean-sheet rebuild spec for the async tasks plugin#208
youssefhany-ixo merged 11 commits into
mainfrom
claude/bold-johnson-zfUMl

Conversation

@youssefhany-ixo

Copy link
Copy Markdown
Collaborator

Adds specs/tasks-async-system.md as the active design for the bundled tasks
plugin. Supersedes ORA-165 (six task types, Y.Doc pages, 700-line sub-agent
prompt) and the TASK-31 port-as-is plan. Both prior docs gain a SUPERSEDED
header pointing at the new spec.

The new design collapses tasks to one primitive (TaskSpec as YAML+markdown),
puts all storage behind a TaskSpecStore port (Redis today, filesystem when
the runtime grows the API), keeps the approval gate entirely in the plugin
(replacing the deleted tryHandleApprovalResponse plumbing), and ships
preview-before-save plus conditional dedicated rooms in MVP.

claude added 6 commits June 10, 2026 19:22
Adds specs/tasks-async-system.md as the active design for the bundled tasks
plugin. Supersedes ORA-165 (six task types, Y.Doc pages, 700-line sub-agent
prompt) and the TASK-31 port-as-is plan. Both prior docs gain a SUPERSEDED
header pointing at the new spec.

The new design collapses tasks to one primitive (TaskSpec as YAML+markdown),
puts all storage behind a TaskSpecStore port (Redis today, filesystem when
the runtime grows the API), keeps the approval gate entirely in the plugin
(replacing the deleted tryHandleApprovalResponse plumbing), and ships
preview-before-save plus conditional dedicated rooms in MVP.
Amends specs/tasks-async-system.md with three simplifications driven by
review:

- Storage port reframed as a 4-method filesystem (TaskFs: read/write/delete/
  list) with paths under /users/<did>/tasks/<id>/. Redis adapter today; a
  UcanFsAdapter (same auth model as sandbox) when the runtime grows the API.
  Ephemeral state (run locks, pending approvals, BullMQ) stays on direct
  Redis and never goes through the port.
- The worker no longer rebuilds RuntimeContext. It calls
  MessagesService.invokeForAutomation — a single thin entry point on the
  runtime — which reuses every existing agent path (auth, credits middleware,
  capability gating, checkpointer). BackgroundContextBuilder and the
  attendant DI of UCAN/Secrets/Sessions/Matrix/AMBIENT are deleted.
- Webhook + chain.after triggers deferred out of this rebuild to a follow-up.
  MVP ships time.once + time.cron only.

Phase 1 effort drops from 6d to 5d; total from 8.5d to ~7d.
…istory

Three MVP cuts driven by review:

- No REST surface. Tasks are agent-managed; the 6 HTTP endpoints (list/get/
  patch/pause/resume/cancel) and the controllers folder are gone. Re-add
  later only when a non-agent consumer wants it. (FOLLOWUP-5.)
- Per-run cost/token meter pushed to "good to have". The credits middleware
  still deducts, but surfacing the numbers from MessagesService.invokeFor-
  Automation hasn't been verified, so the preview output schema and the
  AutomationResult shape drop the tokens/costUsd fields for now. (FOLLOWUP-2.)
- Per-task monthly budget enforcement pushed (depends on the cost meter).
  budget.monthlyUsd removed from the TaskSpec frontmatter and example.
  (FOLLOWUP-4.)
- Run history saving pushed. No runs.jsonl on disk in MVP. Matrix room
  messages are the natural run log; get_task returns a room link plus the
  last-error block. RunSummary type removed from folder layout.
  (FOLLOWUP-3.)

Self-healing still works in MVP via a Redis failure counter
(tasks:failures:<id> hash with count + lastError + lastFailedAt) — reset on
success, drives the failed-pending-review threshold and feeds
suggest_spec_fix.

Sections 14 (REST) and 17 (Cost accounting) deleted; remaining sections
renumbered (15→14, 16→15, 18→16, 19→17, 20→18, 21→19, 22→20). Phase 1
effort drops from 5d to 4d; total from ~7d to ~5.5d.
Replaces the BUNDLED_PLUGINS stub with a real TasksPlugin that ships:

- TaskSpec — YAML frontmatter + markdown body; round-trips via gray-matter,
  validated by Zod. Stored at /users/<did>/tasks/<id>/spec.md.
- TaskFs port — 4 methods (read / write / delete / list). RedisTaskFs is
  the only implementation today; UcanFsAdapter (FOLLOWUP-1) is the planned
  swap once the runtime grows a per-user UCAN filesystem.
- EphemeralStateService — direct ioredis for run locks, failure counters
  (count + lastError + lastFailedAt), preview tokens, and pending approval
  payloads. Never goes through TaskFs because none of it is user-readable.
- SchedulerService + 2 BullMQ queues — `task_run` (5 concurrency, 3 retries
  exponential) and `task_approval` (10 concurrency, fixed backoff). Cron
  next-fire computed via cron-parser.
- TaskRunWorker — loads spec, calls MessagesService.sendMessage with
  msgFromMatrixRoom=true, hands result to delivery or approval gate. No
  parallel buildRuntimeContext, no createMainAgent call.
- ApprovalService — owns request / approve / reject / handleTimeout, exposes
  APPROVAL_GATE_PORT for cross-plugin reuse.
- ApprovalGateMiddleware — pre-LLM intercept with fast keyword path → cheap
  LLM fallback. Re-enqueues the resolution as a BullMQ job so the worker
  owns the post-message work. Replaces the deleted tryHandleApproval-
  Response plumbing with zero edits to MessagesService.
- IntentClassifier — fast path covers yes/no/ok/cancel/approve/reject and
  short-prefix variants; slow path is utility-tier LLM with a 1-word
  return.
- RoomResolver + DedicatedRoomService — heuristic decides whether a task
  gets its own [Task] <title> room (sub-day cron, webhook/chain, "monitor"
  intents, or explicit). Spec.md posted as opening message.
- AutomationInvoker — single call into MessagesService; resolves the user's
  main room via MatrixManager.getOracleRoomIdWithHomeServer.
- 9 main-agent tools — preview_task (with required previewToken handshake),
  create_task, list_my_tasks, get_task, update_task, pause/resume/cancel
  _task, suggest_spec_fix.
- 3 starter templates — morning-brief, url-monitor, weekly-report.
- 20 unit tests covering manifest, auto-detect, configSchema, tool set,
  spec round-trip, schema validation, paths, hashing, trigger summary,
  intent classifier fast path, and room resolver truth table.

Deps added to oracle-runtime: cron-parser, gray-matter.

`apps/app/src/tasks/` is NOT removed in this change — that's coupled to
the broader TASK-32 apps/app deletion. The new plugin lives independently.
… plumbing

Full review of the first implementation found a fatal bug plus systemic
over-engineering, so this is a clean rewrite (24 internal files → 14).

The fatal bug: the worker invoked the agent with a synthetic sessionId,
but RequestPreparer.prepare() throws NotFoundException for unknown
sessions — every task run would have failed, pushing every task to
failed-pending-review. Fixed properly: each task now owns ONE real
session (SessionManagerService.createSession, anchored in the task's
dedicated room when there is one), stored as `sessionId` in the spec
frontmatter. Every run continues that LangGraph thread, which also gives
tasks memory across runs ("compare to what you reported yesterday") via
the checkpointer, for free. No new MessagesService surface needed.

Other v1 problems fixed by the rebuild:

- Dedicated rooms never worked: matrix-bot-sdk's createRoom returns the
  room id string, and v1 cast the result to `{room_id?}` — always
  undefined. (The cast hid it; removed.)
- Failure counting was per BullMQ attempt, so one flaky run (3 retries)
  hit the 3-failure threshold instantly. Now @OnWorkerEvent('failed')
  records one failure per exhausted run and reschedules the next cron
  occurrence so transient failures don't kill the task.
- The middleware created its own Redis client + Queue via a module-level
  side-channel map, and resolution took 3 hops (middleware → BullMQ
  "resolve" job → worker → service). Now: the Nest module hands its wired
  service bundle back to the plugin instance via onReady; the middleware
  shares those services and resolves approvals in one call.
- The "cheap LLM classifier" slow path was dead code — never wired. The
  design is now honest: keyword fast path resolves yes/no without any
  model call (wrapModelCall returns the ack AIMessage directly); nuanced
  replies go to the main agent with a per-call system-prompt hint and a
  new resolve_pending_approval tool. The agent IS the classifier.
- Dead config and decorative fields removed: TASKS_DEFAULT_TIMEZONE
  (never read), modelTier (sendMessage has no model override — storing
  it would lie), delivery.format (the body's "How to report" section is
  the contract), draft status (nothing ever was one), preview expiresAt
  (redundant with Redis TTL). TASKS_MAX_PER_USER is now actually
  enforced in create_task.
- Redis key names, spec parse/render, and failed-pending-review
  transitions each live in exactly one place now (RedisState, TaskStore,
  and the two services that own them) instead of being duplicated across
  files.
- An 8-key service-locator registry is gone; preview tokens are single-
  use (GETDEL); run job ids are per-fire-time (taskId:runAtIso) so
  re-enqueues are idempotent without fighting BullMQ's completed-job
  id memory.

Tools go from 9 to 10 (resolve_pending_approval). Tests go from 20 to 29
and now cover TaskStore against an in-memory TaskFs, nextRunAtFor, and
the classifier table. specs/tasks-async-system.md updated to match the
implemented design (frontmatter, approval flow, worker shape, folder
layout, follow-ups).
Full end-to-end trace of every code path turned up six real bugs. Each
fix below is independently scoped; together they fall out of one
architectural change.

Critical:

1. **Preview pollutes the user's main Matrix room.** preview_task created
   a session via SessionManagerService.createSession, which posts
   "New Conversation Started" to the resolved room. With no roomId
   passed, that's the user's main oracle room. Every preview leaked.

2. **Tasks without a dedicated room appear in the user's session list.**
   The persistent task session was anchored in the main room when the
   user opted out of a dedicated room, so SessionsService.listSessions
   surfaced it.

3. **Dedicated room posted with sessionId: ''.** create_task built the
   spec, created the dedicated room (posting renderSpec(spec)), THEN
   filled sessionId. Empty string leaked into the room.

Fix for 1/2/3: every run (and every preview) uses AgentInvoker.runOnce,
which creates a synthetic session with overrideEventId (skips the Matrix
post per SessionManagerService.createSession's `??` short-circuit), runs
one agent turn, deletes the session. No persistent task session, no
sessionId in the spec frontmatter. The user's main room and session list
stay clean. We lose "memory across runs" — accept as a non-feature for
MVP (file as FOLLOWUP if ever needed).

4. **Preview token consumed before validation.** consumePreview was a
   destructive GETDEL called before the task-limit / next-run checks. A
   user hitting the limit lost their token. Split into peekPreview
   (non-destructive) + deletePreview (destructive); peek up-front, delete
   only after the spec is saved and the run enqueued.

5. **delivery.post swallowed errors.** On Matrix outage, the worker
   thought the run succeeded — resetFailures, schedule next cron — while
   the user never saw the result. post now throws on failure; the worker
   bubbles it as a run failure (BullMQ retries; consecutive counter
   advances on the final attempt). Non-critical posts (approval prompts,
   reminders, failed-pending-review notice) call delivery.safePost, which
   keeps the swallow-and-log behaviour where surrounding state shouldn't
   roll back.

6. **claimApprovalResolution TTL was 5 min while approvals last 49 h.** A
   crash between claim and clear let the claim expire long before the
   pending payload — a later resolver could double-deliver. TTL is now
   passed in; takePending uses APPROVAL_TTL_SEC.

Minor:

- get_task now returns a curated subset instead of spreading all
  frontmatter (drops owner and raw trigger).
- Dedicated-room post is a friendly markdown summary (title, trigger,
  approval, body) instead of raw YAML frontmatter.

All 654 tests still pass; typecheck and lint clean.

Spec updated to match: the frontmatter schema drops sessionId, §10
documents the synthetic-session approach, the glossary swaps "Task
session" for "runOnce".
@youssefhany-ixo youssefhany-ixo force-pushed the claude/bold-johnson-zfUMl branch from 36c99b1 to c4fb104 Compare June 10, 2026 16:22
- Introduced `synthetic-session.ts` and `synthetic-session.test.ts` to manage synthetic session IDs for background tasks.
- Updated `MessagesService` to handle synthetic session IDs correctly, ensuring they do not interfere with thread relations and post-sync operations.
- Enhanced the `TasksPlugin` to include a new approval flow, allowing tasks to wait for user approval before executing actions.
- Added middleware for task approval, enabling fast classification of user replies and integration with the approval flow.
- Updated task specifications to support the new approval mechanism and ensure proper handling of task states.
- Refactored related components to streamline the task execution process and improve overall reliability.

All tests have been updated to reflect these changes and ensure functionality remains intact.
- Updated tests for `acquireToolLock` to assert that releasing locks does not throw exceptions.
- Added checks for independent lock keys to ensure isolation between different keys.
- Improved overall test coverage for lock functionality.
@youssefhany-ixo youssefhany-ixo changed the title docs(tasks): clean-sheet rebuild spec for the async tasks plugin feat(tasks): clean-sheet rebuild spec for the async tasks plugin Jun 16, 2026
@youssefhany-ixo youssefhany-ixo merged commit 0bef392 into main Jun 16, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants